  1. Oct 2025
    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      This publication applies 3D super-resolution STORM imaging to understanding the role of developmental neural activity in the clustering of retinal inputs to the mouse dorsal lateral geniculate nucleus (dLGN). The authors argue that retinal ganglion cell (RGC) synaptic boutons start forming clusters early in postnatal development (P2). They then argue that these clusters contribute to eye-specific segregation of retinal inputs by activity-dependent stabilization of nearby boutons from the same eye. The data provided is N=3 animals for each condition of P2, P4, and P8 animals in wild-type mice and in mice where early patterns of structured retinal activity are blocked.

      Strengths:

      The 3D STORM imaging of pre- and postsynaptic elements provides convincing high-resolution localization of synapses.

      The experimental design of comparing ipsilateral and contralateral RGC axon boutons in a region of the dLGN that is known to become contralateral is elegant. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling.

      Weaknesses:

      Based on previous literature, it is known that synapse density, synapse clustering, and synaptic specificity increase during postnatal development. Previous work has also shown that both the changes in synaptic clustering and synaptic specificity are affected by retinal activity. The data and analysis provided by the authors add little unambiguous evidence that advances this understanding.

      We agree with the reviewer that previous literature shows that synapse density, synapse clustering, and synaptic specificity increase during postnatal development and that these processes are affected by retinal activity. The majority of studies on synaptic refinement have been performed after eye-opening, when eye-specific segregation is already complete. In contrast, most studies of eye-specific segregation focus on axonal refinement phenotypes. To our knowledge, only a small number of experiments have examined retinogeniculate synaptic properties at the nanoscale during eye-specific segregation (1-4). Our broad goal is to understand the mechanisms of synaptogenesis and competition at the earliest stages of eye-specific refinement, when spontaneous retinal activity is a major driver of activity-dependent remodeling. We hope that readers will appreciate that there is still much to discover in this fascinating model system of synaptic competition.

      General problem 1: Most of the statistical analysis is limited to ANOVA comparison of axons from the contralateral and ipsilateral retina in the contralateral dLGN. The hypothesis that ipsilateral and contralateral axons would be statistically identical in the contralateral dLGN is not a plausible hypothesis so rejecting the hypothesis with P < X does not advance the authors' arguments beyond what was already known.

      General problem 2: Most of the interpretation of data is qualitative. While error bars are provided, these error bars are not used to draw conclusions. Given the small sample size (N=3), there is a large degree of uncertainty regarding the magnitude of changes (synapse size, number, specificity). The authors base their conclusions on the averages of these values when the likely degree of uncertainty could allow for the opposite interpretation.

      We appreciate the reviewer’s concerns regarding the use of ANOVA for statistical testing in the original submission. We have generated new figures that show confidence intervals for each analysis in the manuscript and these are included in the response to reviewers document below. To address the underlying concern that our N=3 sample size limits the interpretation of our results, we have revised the manuscript to be cautious in our interpretations and to discuss additional possibilities that are consistent with the anatomical data.

      General problem 3: Two of the four results sections depend on using the frequency of single active zone vGlut2 clusters near multiple active zone vGlut2 clusters as a proxy for synaptic stabilization of the single active zone vGlut2 clusters by the multiple active zone vGlut2 clusters. The authors argue that the increased frequency of same-eye single active zone clusters relative to opposite-eye single active zone clusters means that multiple active zone vGlut2 clusters are selectively stabilizing single active zone clusters. There are other plausible explanations for this observation that are not eliminated. An increased frequency of nearby single active zone clusters would also occur if RGC axons form more than one synapse in the dLGN. Eye-specific segregation is, by definition, a relative increase in the frequency of nearby boutons from the same eye. The authors were, therefore, guaranteed to observe a non-random relationship between boutons from the same eye. The authors do compare their measures to a random model, but I could not find a description of the model. I would expect that the model would need to account for RGC arbor size, arbor structure, bouton number, and segregation independent of multi-active-zone vGlut2 clusters. The most common randomization for the type of analysis described here, a shift in the positions of single-active zone boutons, would not be adequate.

      In discussing the claimed cluster-induced stabilization of nearby boutons, the authors state that the specificity increases with age due to activity-dependent refinement. Their quantification does not support an increase in specificity with age. In fact, the high degree of clustering "specificity" they observe at P2 argues for the trivial same axon explanation.

      We agree with the reviewer that individual RGC axons form multiple synapses and that, over time, eye-specific segregation must increase the frequency of like-eye synapses relative to opposite-eye synapses. Indeed, our previous study of eye-specific refinement showed that at P8, the density of eye-specific inputs had increased for the dominant-eye and decreased for the non-dominant-eye (1). However, at postnatal day 4, contralateral and ipsilateral input densities were the same in the future contralateral-eye territory. One of our goals in this study was to determine if the process of synaptic clustering begins at these earliest stages of synaptic competition and, if so, whether it is influenced by retinal wave activity. It is plausible that the RGC axons from the same eye could initially form synapses randomly and, at some later stage, synapses may be selectively added to produce mature glomeruli. Consistent with this possibility, previous analysis of JAM-B RGC axon refinement showed the progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5).

      Regarding the randomization that we employed, we performed a repositioning of synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. We agree that this type of randomization cannot account for the fine scale structure of axons and dendrites, which we did not have access to in this four-color volumetric super-resolution data set. To address this, we have performed additional clustering analyses surrounding both single-active zone and multi-active zone synapses. This new analysis showed that there is a modest clustering effect around single-active zone synapses compared to complete randomization described above. We now present this information using a normalized clustering index for direct comparison of clustering between multi-active zone and single-active zone synapses. We have measured effect sizes and confidence intervals, which we present in point-by-point responses below. We have restructured the manuscript figures and discussion to provide a balanced interpretation of our results and the limitations of our study.

      Analysis of specific claims:

      Result Section 1

      Most of the figures show mean, error bars, and asterisks, but not the three data points from which these statistics are derived. Large changes in variance from condition to condition suggest that displaying the data points would provide more useful information.

      We thank the reviewer for their suggestion. We have updated all figures to display the means of all biological replicates as individual data points.

      Claim 1: Contralateral density increases more than ipsilateral in the contralateral region over the course of development. This claim is supported by the qualitative comparison of means and error bars in Figure 2D. The argument could be made quantitative by providing a confidence interval for synapse density increase for dominant and non-dominant synapse density. A confidence interval could then be generated for the difference in this change between the two groups. Currently, the most striking effect is a big difference in variance between P4 and P8 for dominant eye complex synapses. Given that N=3, I assume there is one extreme outlier here.

      We appreciate the comment and believe the reviewer was referring to the data presented in the original Figure 1D, rather than Figure 2D.

      We agree with the reviewer that our comment on the change in synapse density across ages was not quantitatively supported by the figure as we did not perform a proper age-wise statistical comparison. We have removed this claim in the revised manuscript.

      We also appreciate the suggestions to clarify the presentation of our statistical analyses and to utilize confidence interval measurements wherever possible. We present Author response image 1 below, showing the density of multi-AZ synapses in the contralateral-eye territory over time (P2-P8), for both CTB(+) contralateral (black) and CTB(-) ipsilateral inputs (red) featuring 5/95% confidence intervals:

      Author response image 1.

      More broadly, the reviewer has raised the concern that the low number of biological replicates (N=3) presents challenges in the use of ANOVA for statistical testing. We agree with the concern and have revised the manuscript to be cautious in our statistical tests and resulting claims. We have chosen to use paired t-tests to compare measurements of eye-specific synapse properties because these measurements were always made within each individual biological replicate (paired measurements). Below, we discuss our logic for this change and the effects on the results we present in the revised manuscript.

      Considering the above image:

      (1) ANOVA: In our initial submission, we used an ANOVA test which showed P<0.05 for the CTB(+) P4 vs. P8 comparison above, leading to our statement about an age-dependent increase in multi-AZ density. However, the figure above shows that P8 data has higher variance. Thus, the homogeneity of variance assumption of ANOVA may lead to false positives in this comparison.

      (2) Confidence interval for N=3: We calculated confidence intervals for P4 and P8 data (5/95% CI shown above). Overlap between the two groups indicates the true mean values of the two groups could be identical. However, the P8 confidence intervals (as well as other confidence intervals across other comparisons in the manuscript) also include the value of 0. This indicates there actually might be no multi-active zone synapses in the mouse dLGN. The failure arises because the low number of biological replicates (N=3 data points) precludes a reliable confidence interval measurement. CI measurements require sufficient sample sizes to determine the true population variance.

      (3) Difficulty in achieving sufficient sample sizes for CI analysis in ultrastructural studies of the brain: volumetric STORM experiments are technically complex and make use of sample preparation and analysis methods that are similar to volumetric electron microscopy (physical ultrathin sectioning and computational 3D stack alignment). For these technical reasons, it is difficult to collect imaging data from >10 mice for each group of data (e.g. age and tissue location) in one single project. Because of the technical challenges, most ultrastructural studies published to date present results from single biological replicates. In our STORM dataset, we collected imaging data of N=3 biological replicates for each age and genotype. We agree that in the future the collection of additional replicates will be important for improving the reliability of statistical comparisons in super-resolution and electron-microscopy studies. Continued advances in the throughput of imaging/analysis should help to make this easier over time. 

      (4) The use of paired t-tests: In this study, we have eye-specific CTB(+) and CTB(-) synapse imaging data from the same STORM fields within single biological replicates. When there is only one measurement from each replicate (e.g. synapse density, ratio of total synapses), using paired tests to compare these groups increases statistical power and does not assume similar variance. However, this limits our analysis to comparisons within each age, and not between ages. Accordingly, we have revised our discussion of the results and interpretations throughout the manuscript. When there are thousands of measurements of synapses from each replicate (e.g. Figure 2A-B on synapse volumes), we use a mixed linear model to analyze the variance. In the revised figures we present the results using standard error of the mean and link measurements from within the same individual replicates to show the paired data structure. In cases where specific comparisons are made across ages, we present 5/95% confidence interval measurements.
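
      As an illustration only (this is not the analysis code used in the study), the paired comparison and the 5/95% confidence interval discussed in points (2) and (4) can be sketched for exactly three paired replicates. The sketch relies on the closed-form Student's t distribution with 2 degrees of freedom, so it is valid only for N=3 pairs. The function names and the density values are hypothetical and are not taken from the manuscript.

```python
import math
from statistics import mean, stdev


def t_cdf_df2(t):
    """CDF of Student's t with exactly 2 degrees of freedom (closed form)."""
    return 0.5 + t / (2.0 * math.sqrt(t * t + 2.0))


def t_quantile_df2(p):
    """Inverse of t_cdf_df2 for 0 < p < 1."""
    u = 2.0 * p - 1.0
    return u * math.sqrt(2.0 / (1.0 - u * u))


def paired_summary_n3(a, b, lower=0.05, upper=0.95):
    """Paired t statistic, two-sided p-value, and CI of the mean difference.

    Assumption: exactly three paired replicates, so df = 2.
    """
    if len(a) != 3 or len(b) != 3:
        raise ValueError("closed forms here are valid only for N=3 pairs (df=2)")
    diffs = [x - y for x, y in zip(a, b)]
    d_bar = mean(diffs)
    sem = stdev(diffs) / math.sqrt(3)  # standard error of the mean difference
    t_stat = d_bar / sem
    p_two_sided = 2.0 * (1.0 - t_cdf_df2(abs(t_stat)))
    ci = (d_bar + t_quantile_df2(lower) * sem,
          d_bar + t_quantile_df2(upper) * sem)
    return t_stat, p_two_sided, ci


# Hypothetical densities (arbitrary units) for three replicates; NOT study data.
ctb_pos = [0.12, 0.15, 0.20]
ctb_neg = [0.08, 0.10, 0.13]
t_stat, p_value, ci = paired_summary_n3(ctb_pos, ctb_neg)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, "
      f"5/95% CI of difference = ({ci[0]:.3f}, {ci[1]:.3f})")
```

      With 2 degrees of freedom the 95th-percentile multiplier is about 2.92, compared with about 1.645 for large samples, which is one reason interval estimates from three replicates are wide.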

      Claim 2: The fraction of multiple-active zone vGlut2 clusters increases with age. This claim is weakly supported by a qualitative reading of panel 1E. The error bars overlap so it is difficult to know what the range of possible increases could be. In the text, the authors report mean differences without confidence intervals (or any other statistics). The reported results should, therefore, be interpreted as a description of their three mice and not as evidence about mice in general.

      We appreciate the reviewer’s concern that statistical accuracy of our synapse density comparisons over age is limited by the small sample size as discussed above. We have removed all strong claims about age-dependent changes in the density of multi-active zone and single-active zone synapses. Instead, we focus our analyses on comparisons between CTB(+) and CTB(-) synapse measurements, which are paired within each biological replicate. To specifically address the reviewer’s concern about figure panel 1E, we present Author response image 2 with confidence intervals below.

      Author response image 2.

      Figure S1. Panel A makes the point that the study could not be done without STORM by comparing the STORM images to "Conventional" images. The images are over-saturated low-resolution images. A reasonable comparison would be to a high-quality confocal image acquired with a high NA objective (~1.4) and low laser power (PSF ~ 0.2 x 0.2 x 0.6 um) that was acquired over the same amount of time it takes to acquire a STORM volume.

      We agree with the reviewer that the presentation of low-resolution conventional images is not necessary. We have deleted the panel and modified the text accordingly.

      Result section 2.

      Claim 1: The ipsi/contra (in contra LGN) difference in VGluT2 cluster volume increases with development. While there are many p-values listed, the main point is not directly quantified. A reasonable way to quantify the relative increase in volume could be in the form: the non-dominant volumes were 75%-95%(?) of the dominant volume at P2 and 60%-80% (?) at P8. The difference in change was -5 to 15%(?).

      We thank the reviewer for their helpful suggestion to improve the clarity of the results presented in this analysis of eye-specific synapse volumes. In our original report, we found differences in eye-specific VGluT2 volume at each time point (P2/P4/P8) in control mice (1). The original measurements used the entire synapse population. Here, we aimed to determine whether eye-specific differences in VGluT2 volumes were present for both multi-AZ synapses and single-AZ synapses, and whether one population may have a greater contribution to the previous population measurement that we reported. We found that at P4 (a time when the overall eye-specific synapse density is equivalent for both eyes in the dLGN), WT multi-AZ synapses showed a greater difference (372%) in eye-specific VGluT2 volume compared with single-AZ synapses (135%). In β2KO mice multi-AZ synapses showed a greater difference (110%) in eye-specific VGluT2 volume compared with single-AZ synapses (41%). In our initial manuscript submission, we included statistical comparisons of eye-specific volume differences across ages, but we did not highlight these differences in our discussion of the results. For clarity, we have removed all statistical comparisons across ages in the revised manuscript. We have modified the text to focus on eye-specific VGluT2 volume differences at P4 described above. To specifically address the reviewer’s question, we provide the percentage differences between multi- and single-AZ eye-specific synapses for each age/genotype below:

      Author response table 1.

      Claim 2: Complex synapses (vGlut2 clusters with multiple active zones) represent clusters of simple synapses and not single large boutons with multiple active zones. The authors argue that because vGlut2 cluster volume scales roughly linearly with active zone number, the vGlut2 clusters are composed of multiple boutons each containing a single active zone. Their analysis does not rule out the (known to be true) possibility that RGC bouton sizes are much larger in boutons with multiple active zones. The correlation of volume and active zone number, by itself, does not resolve the issue. A good argument for multiple boutons might be that the variance is smallest in clusters with 4 active zones (looks like it in the plot) since they would be the average of four active zones to vesicle pool ratios. It is very likely that the multi-active zone vGlut2 clusters represent some clustering and some multi-synaptic boutons. The reference cited by the authors as evidence for the presence of single active zone boutons in young tissue does not rule out the existence of multiple active zone boutons.

      We agree with the reviewer’s comments on the challenges of classifying multi-active zone synapses in STORM images as single terminals versus aggregates of terminals. To help address this, we have performed electron microscopy imaging of genetically labeled RGC axons and identified the existence of single retinogeniculate terminals with multiple active zones. Our EM imaging was limited to 2D sections and does not rule out the clustering of small, single-active zone synapses within 3D volumes. Future volumetric EM reconstructions will be informative for this question. We have significantly updated the figures and text to discuss the new results and provide a careful interpretation of the nature of multi-AZ synapses in STORM imaging data.

      Several arguments are made that depend on the interpretation of "not statistically significant" (n.s.) meaning that "two groups are the same" instead of "we don't know if they are different". This interpretation is incorrect and materially impacts the conclusions.

      Several arguments are made that interpret statistical significance for one group and a lack of statistical significance for another group meaning that the effect was bigger in the first group. This interpretation is incorrect and materially impacts the conclusions.

      We thank the reviewer for raising these concerns. We have extensively revised the manuscript text to report the data in a more precise way without overinterpreting the results. All references to “N.S.” and associated conclusions have been either removed or substantiated with 5/95% confidence interval testing.

      Result Section 3.

      Claim 1: Complex synapses stabilize simple synapses. There are alternative explanations (mentioned above) for the observed clustering that negate the conclusions. 1) Boutons from the same axon tend to be found near one another. 2) Any form of eye-specific segregation would produce non-random associations in the analysis as performed. The authors compare each observation to a random model, but I cannot determine from the text if the model adequately accounts for alternative explanations.

      We thank the reviewer for their suggestion to consider alternative explanations for our results. We agree that our study does not provide direct molecular mechanistic data demonstrating synaptic stabilization effects. We have significantly revised the manuscript to be more cautious in our interpretations and specifically address alternative biological mechanisms that are consistent with the non-random arrangement of retinogeniculate synapses in our data.

      We agree with the reviewer that individual RGC axons form multiple synapses, however, nascent synapses might not always form close together. If synapses are initially added randomly within RGC axons, eye-specific segregation may conclude with a still-random pattern of dominant-eye inputs. At some later stage, synapses may be selectively refined to produce mature glomeruli. Consistent with this, individual RGCs undergo progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5). One of our goals in this work was to determine if the process of synaptic clustering begins at the earliest stages of synapse formation and, if so, whether it is influenced by retinal wave activity.

      To measure synaptic clustering in our STORM data, we used a randomization of single-AZ synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. Multi-AZ centroid positions were held fixed. Comparing the randomized result to the original distribution, we found a higher fraction of single-AZ synapses associated with multi-AZ synapses, arguing for a non-random clustering effect. However, we agree with the reviewer’s concern that this type of randomization cannot account for the fine scale structure of axons, which we did not have access to in this four-color volumetric super-resolution data set. Thus, there could still be errors in a purely volumetric randomization (e.g. the assignment of synapses to regions in the volume that would not be synaptic locations in the original neuropil), which would effectively decrease the measured degree of clustering after the randomization. To address this, we have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume.

      We now present the revised results as a “clustering index” for both multi-AZ and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed; 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution; and 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, any measured differences in the degree of clustering reflect the synapse type.
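
      The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study: positions are randomized uniformly inside a rectangular box (no soma exclusion or edge-effect correction), and the index is taken to be the ratio of the observed fraction to the mean randomized fraction, a definition assumed here because the exact formula is not given in this response. All function names and coordinates are hypothetical.

```python
import math
import random


def fraction_near(points, anchors, radius, same_population=False):
    """Fraction of `points` with at least one anchor within `radius`.

    When `same_population` is True, `anchors` is the same list as `points`
    and each point is not allowed to count itself as a neighbor.
    """
    if not points:
        return 0.0
    hits = 0
    for i, p in enumerate(points):
        for j, a in enumerate(anchors):
            if same_population and i == j:
                continue
            if math.dist(p, a) <= radius:
                hits += 1
                break
    return hits / len(points)


def clustering_index(single_az, multi_az, box, radius=1.5, n_shuffles=200, seed=0):
    """Observed / randomized fraction of single-AZ synapses near each synapse type.

    Assumptions (not from the source): uniform repositioning inside `box`
    (side lengths in micrometers) and a ratio-based index.
    """
    rng = random.Random(seed)
    observed = {
        "multi": fraction_near(single_az, multi_az, radius),
        "single": fraction_near(single_az, single_az, radius, same_population=True),
    }
    totals = {"multi": 0.0, "single": 0.0}
    for _ in range(n_shuffles):
        # Step 1: randomize single-AZ positions; multi-AZ positions stay fixed.
        shuffled = [tuple(rng.uniform(0.0, side) for side in box) for _ in single_az]
        # Step 2: measure both fractions in the randomized distribution.
        totals["multi"] += fraction_near(shuffled, multi_az, radius)
        totals["single"] += fraction_near(shuffled, shuffled, radius, same_population=True)
    index = {}
    for key, obs in observed.items():
        expected = totals[key] / n_shuffles
        # Step 3: compare observed with randomized; undefined if no random hits.
        index[key] = obs / expected if expected > 0 else math.nan
    return index


# Hypothetical coordinates in micrometers; NOT study data.
multi = [(3.0, 3.0, 3.0)]
single = [(3.0 + 0.05 * k, 3.0, 3.0) for k in range(20)]
print(clustering_index(single, multi, box=(6.0, 6.0, 6.0)))
```

      Because the same randomized positions are used for both measurements, any difference between the two index values in this sketch comes from the anchor type and not from the randomization itself.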

      We have updated Figure 3 in the revised manuscript to present the relative clustering index described above. We have updated the results, discussion, and methods sections accordingly.

      The authors claim that specificity increases over time. Figure 3b (middle) shows that the number of synapses near complex synapses might increase with time (needs confidence interval for effect size), but does not show that specificity (original relative to randomized) increases with time. The fact that nearby simple synapse density is always (P2) very different from random suggests a primarily non-activity-dependent explanation. The simplest explanation is that same-side boutons could be from the same axon whereas different-side axons could not be.

      We have significantly revised the analysis and presentation of results in Figure 3 to include a comparative measurement of synaptic clustering between multi-AZ and single-AZ synapses (discussed above). The data presented in the original Figure 3B have been moved to Supplemental Figure 4. Statistical comparisons in Figure S4 between the original and randomized synapse distributions are limited to within-age measurements. Cross-age comparisons were not performed or presented. To address the reviewer’s question concerning CI analysis in the original Figure 3B, we provide Author response image 3 below showing 5/95% confidence intervals for WT mice:

      Author response image 3.

      Claim 2: vGlut2 clusters more than 1.5 um away from multi-active zone vGlut2 clusters are not statistically significantly different in size than vGlut2 clusters within 1.5 um of multi-active zone vGlut2 clusters. Therefore "activity-dependent synapse stabilization mechanisms do not impact simple synapse vesicle pool size". The specific measure of 1.5 um from multi-active zone vGlut2 clusters does not represent all possible synapse stabilization mechanisms.

      We agree with the reviewer that this specific measure does not capture all possible synapse stabilization mechanisms. We have modified the text in the revised manuscript throughout to be more cautious in our data interpretation and have included additional discussion of alternative mechanisms consistent with our results.

      Result Section 4.

      Claim: The proximity of complex synapses with nearby simple synapses to other complex synapses with nearby simple synapses from the same eye is used to argue that activity is responsible for all this clustering.

      It is difficult to derive anything from the quantification besides 'not-random'. That is a problem because we already know that axons from the left and right eye segregate during the period being studied. All the measures in Section 4 are influenced by eye-specific segregation. Given this known bias, demonstrating a non-random relationship (P<X) doesn't mean anything. The test will reveal any non-random spatial relationship between same-eye and opposite-eye synapses.

      The results can be stated as: If you are a contralateral complex synapse, contralateral complex synapses that are also close to contralateral simple synapses will, on average, be slightly closer to you than contralateral complex synapses that are not close to contralateral ipsilateral synapses. That would be true if there is any eye-specific segregation (which there is).

      We appreciate the reviewer’s comments that our anatomical data are consistent with several possible mechanisms, suggesting the need for alternative interpretations of the results. In the original writing, we interpreted our results in the context of activity-dependent mechanisms of like-eye stabilization and opposite-eye competition. However, our results are also consistent with other mechanisms, including non-random molecular specification of eye-specific inputs onto subregions of postsynaptic target cells (e.g. distinct relay neuron dendrites). We have rewritten the manuscript to be more cautious in our interpretations and to provide a balanced discussion of alternative possibilities.

      Regarding the concern that the data in section four are influenced by eye-specific segregation, we previously found synapse density from both eyes is equivalent in the contralateral region at the P4 time point presented (1), which is consistent with binocular axonal overlap at this age. Within our imaging volumes, ipsilateral and contralateral inputs were broadly intermingled throughout the volume, and we did not find evidence for regional segregation within the imaging fields. By these metrics, retraction of ipsilateral inputs from the contralateral territory has not yet occurred.

      It is an overinterpretation of the data to claim that the lack of a clear correlation between vGlut2 cluster volume and distance to vGlut2 clusters with multiple active zones provides support for the claim that "presynaptic protein organization is not influenced by mechanisms governing synaptic clustering".

      We agree with the reviewer that our original language was imprecise in referring to presynaptic protein organization broadly. We have revised this text to present a more accurate description of the results.

      Reviewer #2 (Public Review):

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye-specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes in this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of presynaptic organization based on Bassoon clustering, the complex and the simple synapse. By analyzing the relative densities and distances between these proteins over age, the authors conclude that the complex synapses promote the clustering of simple synapses nearby to form the future mature glomerular synaptic structure.

      Strengths:

      The data presented is of good quality and provides an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over the previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises. Using this approach, the authors find that simple synapses cluster close to complex synapses over age and that complex synapse density increases with age.

      Weaknesses:

      From these data, the authors conclude that the complex synapse serves to "promote clustering of like-eye synapses and prohibit synapse clustering from the opposite eye". However, the authors show no causal data to support these ideas. There are a number of issues that the authors should consider:

      (1) Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps simple synapses mature into complex synapses. Simple synapses may also represent ones that are in the process of being eliminated as previously described by Campbell and Shatz, JNeurosci 1992 (consider citing). Can the authors distinguish these scenarios from the ones that they conclude?

      We thank the reviewer for their thoughtful commentary and suggestions to improve our manuscript. We agree with the reviewer that our original interpretation of synaptic clustering by activity-dependent stabilization and punishment mechanisms is not directly supported by causal data. We have extensively revised the manuscript to take a more cautious view of the results and to discuss alternative mechanisms that are consistent with our data.

      During eye-specific circuit development, there is indeed increased synaptogenesis and, ultimately, RGC terminals are closely clustered within synaptic glomeruli. This process involves the selective addition and elimination of synapses. Bouton clustering has been shown to occur within individual RGC axons after eye-opening in the mouse (5). The convergence of other RGC types into clustered boutons has been shown at eye-opening by light and electron microscopy (3). There is also qualitative evidence that synaptic clusters may form earlier during eye-specific segregation in the cat (4). Our data provide additional evidence that synaptic clustering begins prior to eye-opening in the mouse (P2-P8). Although synapse numbers also increase during this period, the distribution of synapse addition is non-random. 

      Single-active zone synapses (we previously called these “simple”) may indeed mature into multi-active zone synapses (we previously called these “complex”). At the same time, single-active zone synapses may be eliminated. We believe that each of these events occurs as part of the synaptic refinement process. Our STORM images are static snapshots of eye-specific refinement, and we cannot infer the dynamic developmental trajectory of an individual synapse in our data. Future live imaging experiments in vivo/in situ will be needed to track the maturation and pruning of individual connections. We have expanded our discussion of these limitations and future directions in the manuscript.

      (2) The argument that "complex" synapses are the aggregate of "simple" synapses (Fig 2, S2) is not convincing.

      We agree with the reviewer’s concern about the ambiguous identity of complex synapses. To clarify the nature of multi-active zone synapses, we have performed RGC-specific dAPEX2 labeling to visualize retinogeniculate terminals by electron microscopy (EM). These experiments revealed the presence of synaptic terminals with multiple active zones. We have added images and text to the results section describing these findings. Our 2D EM images do not rule out the possibility that some multi-active zone synapses observed in STORM images are in fact clusters of individual RGC terminals. We have revised the text to provide a more accurate discussion of the nature of multi-active zone synapses.  

      (3) The authors use β2KO mice to assess changes in the organization of synaptic proteins in retinal terminals that have disrupted retinal waves. However, β2-nAChRs are also expressed in the dLGN and other areas of the brain, and glutamatergic synapse development has been reported in the CNS independent of the disruption in retinal waves. This issue should be considered when interpreting the total reduced retinal synapse density in the dLGN of the mutant.

      We thank the reviewer for their suggestion to consider non-retinal effects of the germline deletion of the beta 2 subunit of the nicotinic acetylcholine receptor. Previously, Xu and colleagues reported the development of a conditional transgenic mouse model lacking β2-nAChR expression specifically in the retina (6). These retina-specific β2-nAChR mutant mice (Rx-β2cKO) have disrupted retinal wave properties and defects in eye-specific axonal segregation in binocular anterograde tracing experiments. This work suggests that the defects seen in germline β2-nAChR KO mice arise from defects in retinal wave activity rather than the loss of nicotinic receptors elsewhere in the brain. Additionally, the development of brainstem cholinergic inputs to the dLGN is delayed until the closure of the eye-specific segregation period (7), further suggesting a limited role for cholinergic transmission in the retinogeniculate refinement process.

      (4) Outside of a total synapse density difference between WT and β2KO mice, the changes in the spatial organization of synaptic proteins over development do not seem that different. In fact, the percentage of simple synapses near complex synapses from the non-dominant eye in the mutant is not that different from WT at P8 (Fig 3C), an age when eye-specific segregation is very different between the genotypes. Can the authors explain this discrepancy?

      We thank the reviewer for their question concerning differences between synapse organization in WT versus β2KO mice. In the original presentation of Figure 3C, the percentage of non-dominant eye single-AZ synapses near multi-AZ synapses increased at P4 in WT mice, but this did not occur in β2KO mice. This is consistent with our previous results showing that there is an increase in non-dominant eye synaptic density at this age, which does not occur in β2KO mice (1). At P8, this clustering effect is lost in WT as eye-specific segregation has taken place and non-dominant eye inputs have been eliminated. However, in β2KO mice, the overall synapse density is still low at this age. We interpret this result as a failure of synaptogenesis in the β2KO line, which leads to increased growth of individual RGC axons (8) and eye-specific overlap at P8 (9, 10). Evidence in support of this interpretation comes from live dynamic imaging studies of RGC axon branching in Xenopus and Zebrafish, showing that synapse formation stabilizes local axon branching and that disruptions of synapse formation or neurotransmission lead to enlarged axons (11-13).

      Our anatomical results do not provide a specific biological mechanism for the remaining clustering observed in the β2KO mice. We have revised our discussion of the fact that individual RGC axons may form multiple synaptic connections leading to clustering, which may be independent of changes in retinal wave properties in the β2KO mouse. We have also extensively revised the analysis and presentation of results in Figure 3 to directly compare synaptic clustering around both multi-AZ synapses and single-AZ synapses within the same imaging volumes.

      (5) The authors use nomenclature that has been previously used and associated with other aspects of retinogeniculate properties. For example, the phrases "simple" and "complex" synapses have been used to describe single boutons or aggregates of boutons from numerous retinal axons, whereas in this manuscript the phrases are used to describe vesicle clusters/release sites with no knowledge of whether they are from single or multiple boutons. Likewise, the use of the word "glomerulus" has been used in the context of the retinogeniculate synapse to refer to a specific pattern of bouton aggregates that involves inhibitory and neuromodulatory inputs. It is not clear how the release sites described by the authors fit in this picture. Finally the use of the word "punishment" is associated with a body of literature regarding the immune system and retinogeniculate refinement-which is not addressed in this study. This double use of the phrases can lead to confusion in the field and should be clarified by clear definitions of how they are used in the current study.

      We appreciate the reviewer’s concern that the terminology we used in the initial submission may cause confusion. We have revised the text throughout for clarity. “Simple” synapses are now referred to as “single-active zone synapses”. “Complex” synapses are now referred to as “multi-active zone synapses”. We have removed all text that previously referred to synaptic clusters in STORM images as glomeruli. We agree that we have not provided causal evidence for synaptic stabilization and punishment mechanisms, which would require additional molecular genetic studies. We have restructured the manuscript to remove these references and discuss our anatomical results impartially.  

      Reviewer #3 (Public Review):

      This manuscript is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label a number of active zones, and anti-Homer to identify postsynaptic densities. In their previous study, they compared the detailed synaptic structure across the development of synapses made with contra-projecting vs ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new analysis on the same data set in which they classify synapses into "complex" vs. "simple" and assess the number and spacing of these synapses. From these measurements, they make conclusions regarding the processes that lead to synapse competition/stabilization.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is also a plus.

      Weaknesses:

      The lack of details provided for the classification scheme as well as the interpretation of small effect sizes limit the interpretations that can be made based on these findings.

      We thank the reviewer for their reading of the manuscript and helpful comments to improve the work. We provide details on how single-active zone and multi-active zone synapses are classified in the methods section. We agree with the suggestion to be more careful in interpreting the results. We have extensively revised the manuscript to 1) include additional electron microscopy data demonstrating the presence of multi-active zone retinogeniculate synapses, 2) extend the synaptic clustering analysis to both single-active zone and multi-active zone synapses for comparison, and 3) improve the clarity and accuracy of the discussion throughout the manuscript.

      (1) The criteria to classify synapses as simple vs. complex are critical for all of the analysis in this study. Therefore these criteria for classification should be much more explicit and tested for robustness. As stated in the methods, it is based on the number of active zones which are designated by the number of Bassoon clusters associated with a Vglut2 cluster (line 697). A second part of the criteria is the size of the presynaptic terminal as assayed by "greater Vglut2 signal" (line 116). So how are these thresholds determined? For Bassoon clusters, is one voxel sufficient? Two? If it's one, how often do they see a Bassoon positive voxel with no Vglut2 cluster and therefore may represent "noise"? There is no distribution of Bassoon volumes that is provided that might be the basis for selecting this number of sites. Unfortunately, the images are not helpful. For example, does P8 WT in Figure 1B have 7 or 2? According to Figure 2C, it appears the numbers are closer to 2-4.

      The Vglut volume measurements also do not seem to provide a clear criterion. Figure 2 shows that the distributions of Vglut2 cluster volumes for complex and for simple synapses are significantly overlapping.

      The authors need to clarify the quantitative approach used for this classification strategy and test how sensitive the results of the study are to this strategy.

      We thank the reviewer for their question concerning the STORM data analysis. Here we provide a brief overview of the complete analysis details, which are provided in the methods section.

      Our raw STORM data sets consisted of spectrally separate volumetric imaging channels of VGluT2, Bassoon, and Homer1 signals. For each of these channels, raw STORM data were processed in five steps: 1) the corresponding low-resolution conventional image of each physical section was applied to the STORM data to filter artifacts in the STORM image that do not appear in the conventional image; 2) STORM images were thresholded using a 2-factor Otsu threshold that removes low-intensity background noise while preserving all single-molecule localizations that correspond to genuine antibody labeling as well as non-specific antibody labeling in the tissue; 3) the MATLAB function “conncomp” was applied to identify connected-component voxels in 3D across the image stack, and clusters were kept for further analysis only if they were connected across at least 2 continuous physical sections (140 nm Z depth); 4) for every connected component (clusters corresponding to genuine antibody labeling and background labeling), we measured the volume and signal density (intensity/volume); 5) a threshold was applied to retain clusters that have a higher volume and lower signal density. We exclude signals that have low volume and high density, which correspond to single antibody labels. This analysis retains larger clusters that correspond to synaptic objects and excludes non-specific antibody background.

      WT synaptic Bassoon clusters range in size from 55 to 3532 voxels (0.00092 to 0.059 μm³), with a median size of 460 voxels (0.0077 μm³).

      WT synaptic VGluT2 clusters range in size from 50 to 73752 voxels (0.00084 to 1.2 μm³), with a median size of 980 voxels (0.016 μm³).

      WT synaptic Homer1 clusters range in size from 63 to 7118 voxels (0.0010 to 0.12 μm³), with a median size of 654 voxels (0.011 μm³).

      In practice, any Bassoon/VGluT2/Homer1 clusters with <10 voxels are immediately filtered at the Otsu thresholding step (2) above.
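
      The connected-component and section-span filtering described in steps (3) onward can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code: the function name `connected_clusters`, the use of 26-connectivity, and the placement of the 10-voxel minimum after component detection (rather than at the Otsu step) are all assumptions made for illustration.

```python
from collections import deque

def connected_clusters(voxels, min_sections=2, min_voxels=10):
    """Group thresholded voxels into 3D clusters and filter them.

    voxels: iterable of (x, y, z) integer coordinates, where z is the
    physical section index. 26-connectivity is an assumption here.
    """
    remaining = set(voxels)
    # All 26 neighbor offsets around a voxel (hypothetical connectivity choice).
    offsets = [(dx, dy, dz)
               for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
               if (dx, dy, dz) != (0, 0, 0)]
    clusters = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        cluster = [seed]
        # Breadth-first flood fill over neighboring above-threshold voxels.
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
                    cluster.append(neighbor)
        # A connected cluster spans contiguous sections, so counting the
        # distinct z indices gives the number of continuous sections.
        sections = {z for _, _, z in cluster}
        if len(cluster) >= min_voxels and len(sections) >= min_sections:
            clusters.append(cluster)
    return clusters
```

      The later volume and signal-density threshold (step 5) is not reproduced here, because the response does not state the cutoff values.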

      The reviewer is correct that we often see Bassoon(+) clusters that are not associated with VGluT2, and these may reflect synapses of non-retinal origin or retinogeniculate synapses that lack VGluT2 expression. To identify retinogeniculate synapses containing VGluT2, we performed a synapse pairing analysis that measured the association between VGluT2 and Bassoon clusters after the synapse cluster filtering described above. We first measured the centroid-centroid distance from each VGluT2 cluster to the closest cluster in the Bassoon channel. We next quantified the signal intensity of the Bassoon channel within a 140 nm shell surrounding each VGluT2 cluster. A 2D histogram was plotted based on the measured centroid-centroid distances and opposing channel signal densities of each cluster. Paired clusters with closely positioned centroids and high intensities of apposed channel signal were identified using the OPTICS algorithm (14).
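
      The first pairing measurement above, the centroid-centroid distance from each VGluT2 cluster to its closest Bassoon cluster, can be sketched as follows. The helper names `centroid` and `nearest_partner` are hypothetical, coordinates are assumed to already be in consistent units, and the shell-intensity measurement and OPTICS clustering steps are deliberately left out.

```python
import math

def centroid(voxels):
    """Mean (x, y, z) position of a cluster given as a list of coordinates."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))

def nearest_partner(vglut2_centroids, bassoon_centroids):
    """For each VGluT2 centroid, return (index, distance) of the closest
    Bassoon centroid. Brute force; fine for a sketch, slow for full volumes."""
    pairs = []
    for v in vglut2_centroids:
        distances = [math.dist(v, b) for b in bassoon_centroids]
        best = min(range(len(distances)), key=distances.__getitem__)
        pairs.append((best, distances[best]))
    return pairs
```

      In the analysis described above, these distances form one axis of the 2D histogram; the other axis (apposed-channel signal density within the 140 nm shell) would need the raw intensity data.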

      In the original Figure 1B, the multi-active zone synapse in WT at P8 had two Bassoon clusters. To clarify this, we have revised the images in Figure 1 to include arrowheads that point to individual active zones. We have also revised Supplemental Figure 1 to show volumetric renderings of individual example synapses that help illustrate the 3D structure of these multi-active zone inputs. All details about synapse analysis and synapse pairing are provided in the methods section.

      (2) Effect sizes are quite small and all comparisons are made on medians of distributions. This leads to an n=3 biological replicates for all comparisons. Hence this small n may lead to significant results based on ANOVAS/t-tests, but the statistical power of these effects is quite weak. To accurately represent the variance in their data, the authors should show all three data points for each category (with a SD error bar when possible). They should also include the number of synapses in each category (e.g. the numerators in Figure 1D and the denominators for Figure 1E). For other figures, there are additional statistical questions described below.

      We thank the reviewer for their suggestion to improve the presentation of our results. We have added all three data points (individual biological replicates) to each figure plot when applicable. We have also included a supplemental table (Table S1) listing total eye-specific synapse numbers of each type (mAZ and sAZ) and AZ number for each biological replicate in both genotypes.

      (3) The authors need to add a caveat regarding their classification of synapses as "complex" vs. "simple" since this is a terminology that already exists in the field and it is not clear that these STORM images are measuring the same thing. For example, in EM studies, "complex" refers to multiple RGCs converging on the same single postsynaptic site. The authors here acknowledge that they cannot assign different AZs to different RGCs so this comparison is an assumption. In Figure 2 they argue this is a good assumption based on the finding that the Vglut column/active zone is constant and therefore each represents a single RGC. However, the authors should acknowledge that they are actually seeing quite different percentages than those in EM studies. For example, in Monavarfeshani et al, eLife 2018, there were no complex synapses found at P8. (Note this study also found many more complex vs. simple synapses in the adult - 70% vs. the 20% found in the current study - but this difference could be a developmental effect). In the future, the authors may want to take another data set in the adult dLGN to make a direct comparison based on numbers and see if their classification method for complex/simple maps onto the one that currently exists in the literature.

      We appreciate the reviewer’s comment that the use of the terms “complex” and “simple” may cause confusion. We have significantly revised the manuscript for clarity: 1) We now refer to “complex” synapses as “multi-active zone synapses” and “simple” synapses as “single-active zone synapses”. 2) We have performed electron microscopy analysis of dAPEX2-labeled retinogeniculate projections to confirm the existence of large synaptic terminals with multiple active zones. 3) We have expanded our discussion of previous electron microscopy results describing a lack of axonal convergence at P8 (3). 4) We have added a discussion on how individual RGCs may form multiple synapses in close proximity within their axonal arbor, which would create a clustering effect.

      We agree that it will be informative to collect a STORM data set in the adult mouse dLGN and we look forward to working on this project to compare with EM results in the future.  

      (4) Figure 3 assays the relative distribution of simple vs. complex synapses. They found that a larger percentage of simple synapses were within 1.5 microns of complex synapses than you would expect by chance for both ipsi and contra projecting RGCs, and hence conclude that complex synapses are sites of synaptic clustering. In contrast, there was no clustering of ipsi-simple to contra-complex synapses and vice versa. The authors also argue that this clustering decreases between P4 and P8 for ipsi projecting RGCs.

      This analysis needs much more rigor before any conclusions can be drawn. First, the authors need to justify the 1.5-micron criteria for clustering and how robust their results are to variations in this distance. Second, these age effects need to be tested for statistical significance with an ANOVA (all the stats presented are pairwise comparisons to means expected by random distributions at each age). Finally, the authors should consider what n's to use here - is it still grouped by biological replicate? Why not use individual synapses across mice? If they do biological replicates, then they should again show error bars for each data point in their biological replicates. And they should include the number of synapses that went into these measurements in the caption.

      We appreciate the suggestion to improve the rigor of our analysis of synaptic clustering presented in Figure 3. We have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume. 

      We now present the revised results as a “clustering index” for both multi-AZ synapses and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed, 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution, 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. 4) Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, the measured differences in the degree of clustering reflect a synapse type-specific effect.
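
      The randomization steps above can be sketched in Python under stated assumptions. The response does not give the formula for the “clustering index”, so the ratio of observed to randomized fractions used here is an assumption, as are the function names, the uniform placement of shuffled single-AZ positions within a box-shaped volume, and the number of shuffles.

```python
import math
import random

def fraction_near(singles, anchors, radius=1.5, same_set=False):
    """Fraction of single-AZ positions with at least one anchor within radius.

    Set same_set=True when anchors is the same list as singles, so that a
    synapse is not counted as its own neighbor.
    """
    count = 0
    for i, s in enumerate(singles):
        for j, a in enumerate(anchors):
            if same_set and i == j:
                continue
            if math.dist(s, a) <= radius:
                count += 1
                break
    return count / len(singles)

def clustering_index(singles, multis, bounds, radius=1.5, n_shuffles=200, seed=0):
    """Observed fraction divided by the mean fraction after shuffling
    single-AZ positions while holding multi-AZ positions fixed.
    The ratio form is an assumption, not the authors' stated definition."""
    rng = random.Random(seed)
    observed = fraction_near(singles, multis, radius)
    shuffled_fractions = []
    for _ in range(n_shuffles):
        shuffled = [tuple(rng.uniform(0.0, b) for b in bounds) for _ in singles]
        shuffled_fractions.append(fraction_near(shuffled, multis, radius))
    expected = sum(shuffled_fractions) / n_shuffles
    # Edge case: no shuffled synapse ever landed near an anchor.
    return math.inf if expected == 0 else observed / expected
```

      Under this assumed definition, a value above 1 indicates more single-AZ synapses near multi-AZ synapses than the shuffled positions produce; the single-AZ anchored comparison would reuse `fraction_near` with `same_set=True`.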

      We have also updated Supplemental Figure 3 showing the results of varying the search radius from 1-4 μm for both contralateral- and ipsilateral-eye synapses. The results showed that a search radius of 1.5 μm resulted in the largest difference between the original synapse distribution and a randomized synapse distribution (shuffling of single-active zone synapse position while holding multi-active zone synapse position fixed).

      Finally, we have removed all statistical comparisons of single measurements (means or ratios) across ages from the manuscript. We focus our statistical analysis on paired data comparisons within individual biological replicates.

      For the analysis of synapse clustering, we grouped the data by biological replicates (N=3) to look for a global effect on synapse clustering. In the revised manuscript, we added data points for each replicate in the figure and included the number of synapses in Supplementary Table 1.

      (5) Line 211-212 - the authors conclude that the absence of clustered ipsi-simple synapses indicates a failure to stabilize (Figure 3). Yet, the link between this measurement and synapse stabilization is not clear. In particular, the conclusion that "isolated" synapses are the ones that will be eliminated seems to be countered by their finding in Figure 3D/E which shows that there is no difference in vesicle pool volume between near and far synapses. If isolated synapses are indeed the ones that fail to stabilize by P8, wouldn't you expect them to be weaker/have fewer vesicles? Also, it's hard to tell if there is an age-dependent effect since the data presented in Figures 3D/E are merged across ages.

      We thank the reviewer for their suggestion to clarify the results in Figure 3. Based on the measured eye-specific differences in vesicle pool size and organization, we also expected that synapses outside of clusters would show a reduced vesicle population. However, across all ages, we found no differences in the vesicle pool size of single-active zone synapses based on their proximity to multi-active zone synapses. Below, we show cumulative distributions of these results across all ages (P2/P4/P8) for WT mice CTB(+) data. Kolmogorov-Smirnov tests show no significant differences (P = 0.880, 0.767, and 0.494, respectively). Separate 5/95% confidence interval calculations showed overlap between far and near populations at each age.

      Author response image 4.
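
      For readers unfamiliar with the test named above, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical cumulative distributions. The sketch below computes only that statistic, not the P value; the function name `ks_statistic` and the input lists are hypothetical and do not correspond to the authors' data.

```python
from bisect import bisect_right

def ks_statistic(near, far):
    """Maximum absolute difference between the empirical CDFs of two samples."""
    near_sorted = sorted(near)
    far_sorted = sorted(far)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for value in sorted(set(near_sorted) | set(far_sorted)):
        cdf_near = bisect_right(near_sorted, value) / len(near_sorted)
        cdf_far = bisect_right(far_sorted, value) / len(far_sorted)
        largest_gap = max(largest_gap, abs(cdf_near - cdf_far))
    return largest_gap
```

      A statistic near 0 means the near and far distributions overlap almost completely; a value of 1 means they do not overlap at all.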

      To clarify the presentation of the results, we have changed the text to state that the “vesicle pool size of sAZ synapses is independent of their distance to mAZ synapses”. We have removed references to stabilization and punishment from the results section of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Because none of the phenomena being measured can be expected to behave randomly (given what is already known about the system) and the sample size is small, I believe quantification of the data requires confidence intervals for effect sizes. Resolving the multi-bouton vs multi-active zone bouton with EM would also help.

      We thank the reviewer for their thorough reading of the manuscript and many helpful suggestions. We provide analysis with confidence intervals in a point-by-point response below. In the manuscript we revised our results and focused our statistical analyses on comparisons within the same biological replicate (paired effects). In addition, we have performed electron microscopy of RGC inputs to the dLGN at postnatal day 8 to demonstrate the presence of retinogeniculate synapses with multiple active zones.

      Figure 1:

      Please show data points in scatter bar plots and not just error bars.

      We have updated all plots to show data points for independent biological replicates.

      Please describe the image processing in more detail and provide an image in which the degree of off-target labeling can be evaluated.

      We have updated the description of the image processing in the methods section. We have made all the code used in this analysis freely available on GitHub (https://github.com/SpeerLab). We have uploaded the raw STORM images of the full data set to the open-access Brain Imaging Library (16). These images can be accessed here: https://api.brainimagelibrary.org/web/view?bildid=ace-dud-lid (WTP2A data for example). All 18 datasets are currently searchable on the BIL by keyword “dLGN” or PI last name “Speer” and a DOI for the grouped dataset is pending.

      How does panel 1D get very small error bars with N = 3? Please provide scatter plots.

      We have updated panel 1D to show the means for each independent biological replicate.

      Line 129: over what volume is density measured? What are the n's? What is the magnitude (with confidence intervals) of increase?

      The volume we collected from each replicate was ~80 μm × 80 μm × 7 μm (total volume ~44,800 μm³). N=3 biological replicates for each age, genotype, and tissue location. Because of concerns with the use of ANOVA for low sample numbers, we have removed a majority of the age-wise comparisons from the manuscript and instead focus on within-replicate paired data comparisons. Author response image 5 shows 5/95% confidence intervals for WT data (left panel) and β2KO data (right panel):

      Author response image 5.

      The 5/95% CI range for the increase in synapse density from P2 to P8 for CTB(+) synapses is approximately -0.001 to 0.037 synapses/μm³.

      Line 131: You say that non-dominant increases and then decreases. It appears that the error bars argue that you do not have enough information to reliably determine how much or little density changes.

      Line 140: No confidence intervals. It appears the error bars allow both for the claimed effect of increased fraction and the opposite effect of decreased density.

      Because of concerns with the use of ANOVA for low sample numbers, we have removed age-wise comparisons of single-measurements (means and ratios) from the manuscript and instead focus on within-replicate paired data comparisons.

      Line 144: Confidence intervals would be a reasonable way to argue that fraction is not changed in KO: normal fraction XX%-XX%. KO fraction XX%-XX%.

      Author response image 6 shows panels for WT (left) and β2KO mice (right) with 5/95% CIs.

      Author response image 6.

      In the revised manuscript, we have updated the text to report the measurements, but we do not draw conclusions about changes over development.

      I find it hard to estimate magnitudes on a log scale.

      We appreciate the reviewer’s concern with the presentation of results on a log scale. Because the measured synapse properties are distributed logarithmically, we have elected to present the data on a log scale so that the distribution(s) can be seen clearly. Lognormal distributions enable us to use a mixed linear model for statistical analysis.

      Line 156: Needs confidence interval for difference.

      Line 158: Needs confidence interval for difference of differences.

      Line 160: Needs confidence interval for difference of differences.

      Why only compare at P4 where there is the biggest difference? The activity hypothesis would predict an even bigger effect at P8.

      Below is a table listing the mean volume (log10μm3) and [5/95%] confidence intervals for comparisons of VGluT2 signal between CTB(+) and CTB(-) synapses from Figure 2A and 2B:

      Author response table 2.

      Based on the values given above, the mean difference of differences and [5/95%] confidence intervals are listed below:

      Author response table 3.

      We added these values to the manuscript. We have also reported the difference in median values on a linear scale (as below) so that the readers can have a straightforward understanding of the magnitude.

      Author response table 4.

      We elected to highlight the results at P4 based on our previous finding that the synapse density from each eye-of-origin is similar at this time point (1).

      At P8, there is a decrease in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to an increase in VGluT2 volume within non-dominant eye synapses that survive competition between P4-P8.

      At P8 in the mutant, there is an increase in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to delayed synaptic maturation in β2KO mice.

      Line 171: The correct statistical comparison was not performed for the claim. Lack of * at P2 does not mean they are the same. Why do you get the same result for KO?

      We have revised the statistical analysis, figure presentation, and text to remove discussion of changes in the number of active zones per synapse over development based on ANOVA. We now report eye-specific differences at each time point using paired t-test analysis, which is mathematically equivalent to comparing the 5/95% confidence interval in the difference.
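
      The relationship between a paired comparison and a confidence interval on the within-replicate difference can be sketched as below, assuming N = 3 replicates. The function name and example values in the usage note are hypothetical. One point to keep in mind: a 5/95% interval is a two-sided 90% interval, so whether it excludes zero matches a two-sided paired t-test at α = 0.10 (or a one-sided test at α = 0.05), whereas a two-sided test at α = 0.05 corresponds to a 2.5/97.5% interval.

```python
import math
import statistics

# Two-sided Student's t critical values for 2 degrees of freedom (N = 3 pairs).
T_CRIT_DF2 = {0.90: 2.920, 0.95: 4.303}

def paired_difference_ci(a, b, level=0.90):
    """Mean paired difference (a - b) and its t-based confidence interval.

    Only N = 3 pairs are supported, because the critical values above are
    hard-coded for 2 degrees of freedom.
    """
    diffs = [x - y for x, y in zip(a, b)]
    if len(diffs) != 3:
        raise ValueError("this sketch only handles N = 3 paired replicates")
    mean_diff = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    half_width = T_CRIT_DF2[level] * sem
    return mean_diff, (mean_diff - half_width, mean_diff + half_width)
```

      For example, with hypothetical replicate values `a = [1.0, 2.0, 3.0]` and `b = [0.5, 1.0, 1.5]`, the 90% interval excludes zero while the 95% interval does not, which shows how much the chosen level matters at N = 3.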

      Line 175: Qualitative claim. Correlation coefficients and magnitudes of correlation coefficients are not reported.

      Linear fit slopes and R² values are given below:

      Author response table 5.

      The values are added to the manuscript to support the conclusions.
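
      As a reference for how a slope and R² value of this kind are obtained, here is a minimal ordinary least-squares sketch. The function name `linear_fit` is hypothetical, and `x` and `y` stand for whichever pair of per-synapse measurements is being fit; the code does not reproduce the values in the table.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared
```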

      Line 177: n.s. does not mean that you have demonstrated the values are the same. An argument for similarity could be made by calculating a confidence interval for a potential range of differences. Example: Complex were 60%-170% of Simple.

      Author response image 7 with 5/95% CI is shown below (WT and β2KO):

      Author response image 7.

      Comparing the difference between multi-AZ synapse and single-AZ synapse revealed that the difference in average VGluT2 cluster volume per AZ is:

      Author response table 6.

      The values are added to the manuscript for discussion.

      Line 178: There is no reason to think that the vesicle pool for a single bouton does not scale with active zone number within the range of uncertainty presented here.

      We have collected EM images of multi-AZ synapses and modified our discussion and conclusions in the revised text.

      Line 196: "non-random clustering increased progressively" is misleading. The density of the boutons increases for both the Original and Randomized. Given the increase in variance at P8, it is unlikely that the data supports the claim that the non-randomness increased. Would be easy to quantify with confidence intervals for a measure of specificity (O/R).

      We have revised the manuscript to remove analysis and discussion of changes in clustering over development. We have modified this section of the manuscript and figures to present a normalized clustering index that describes the non-random clustering effect present at each time point.

      Line 209: Evidence is for correlation, not causation and there is a trivial potential explanation for correlation.

      We appreciate the reviewer’s concern with overinterpretation of the results. We have changed the text to more accurately reflect the data.

      Line 238:239: Authors failed to show effect is activity-dependent. Near/Far distinction is not necessarily a criterion for the effect of activity. The claim is likely false in other systems.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to more accurately reflect the data. 

      Line 265-266: Assumes previous result is correct and measure of vGlut2 provides information about all presynaptic protein organization.

      We thank the reviewer for pointing out the incorrect reference to all presynaptic protein organization. We have corrected the text to reference only the VGluT2 and Bassoon signals that were measured.

      Line 276: There are many other interpretations that include trivial causes. It is unclear what the measure indicates about the biology and there is no interpretable magnitude of effect.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to remove references to mechanisms of synaptic stabilization.

      Line 289: Differences cannot be demonstrated by comparing P-values. Try comparing confidence intervals for effect size or generate a confidence interval for the difference between the two groups.

      5/95% confidence intervals are given below for Figure 4C/D:

      Author response table 7.

      We have added these values to the manuscript to support our conclusion.
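
      The reviewer's suggestion of a confidence interval for the difference between two groups can be illustrated with a percentile bootstrap. The following is a minimal sketch only, not the procedure used to produce Author response tables 7 and 8; the function name, the example values, and the resampling scheme are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_diff_ci(group_a, group_b, n_boot=5000, lower=5.0, upper=95.0, seed=0):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b).

    Hypothetical helper: resamples each group with replacement and reads
    the requested percentiles off the sorted bootstrap differences.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resampled_a = [rng.choice(group_a) for _ in group_a]
        resampled_b = [rng.choice(group_b) for _ in group_b]
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()

    def percentile(p):
        # Nearest-rank percentile on the sorted bootstrap distribution.
        index = round(p / 100.0 * (len(diffs) - 1))
        return diffs[index]

    return percentile(lower), percentile(upper)


# Hypothetical per-animal values (arbitrary units), three animals per group.
near = [10.0, 11.0, 12.0]
far = [1.0, 2.0, 3.0]
ci_low, ci_high = bootstrap_diff_ci(near, far)
print(f"5/95% interval for the difference in means: [{ci_low:.2f}, {ci_high:.2f}]")
```

      An interval that excludes zero supports a difference between the groups, whereas comparing two separate P-values does not. With three animals per group the bootstrap distribution is coarse, so such an interval should be read cautiously.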

      Line 305: "This suggests that complex synapses from the non-dominant-eye do not exert a punishment effect on synapses from the dominant-eye" Even if all the other assumptions in this claim were true, "n.s." just means you don't know something. It cannot be compared with an asterisk to claim a lack of effect.

      We thank the reviewer for raising this concern. We have modified the text to remove references to synaptic punishment mechanisms in the results section.

      Below are the 5/95% confidence intervals for the results in Figure 4F:

      Author response table 8.

      We have added these values to the manuscript to support our conclusion.

      Line 308: "mechanisms that act locally". 6 microns is introduced based on differences in curves above(?). I don't see any analysis that would argue that longer-distance effects were not present.

      The original reference referred to the differences in the cumulative distribution measurements between multi-active zone synapses versus single-active zone synapses in their distance to the nearest neighboring multi-active zone synapse. For clarity, we have deleted the reference to the 6 micron distance in the revised text.

      Reviewer #2 (Recommendations For The Authors):

      (1) This data set would be valuable to the community. However, unless the authors can show experiments that manipulate the presence of complex synapses to test their concluding claims, the manuscript should be rewritten with a reassessment of the conclusions that is more grounded in the data.

      We thank the reviewer for their careful reading of the manuscript and we agree the original interpretations were not causally supported by the experimental results. We have made substantial changes to the text throughout the introduction, results, and discussion sections so that the conclusions accurately reflect the data.

      (2) To convincingly address the claim that "complex synapses" are aggregates of simple synapses, the authors should perform experiments at the EM level showing what the bouton correlates are to these synapses.

      We thank the reviewer for their suggestion to perform EM to gain a better understanding of retinogeniculate terminal structure. We generated an RGC-specific transgenic line expressing the EM reporter dAPEX2 localized to mitochondria. We have collected EM images of retinogeniculate terminals that demonstrate the presence of multiple active zones within individual synapses. These results are now presented in Figure 1. The text has been updated to reflect the new results.

      (3) Experiments using the conditional β2KO mice would help address questions of the contribution of β2-nAChRs in dLGN to the synaptic phenotype.

      We appreciate the reviewer’s concern that the germline β2KO model may show effects that are not retina-specific. To address this, Xu and colleagues generated a retina-specific conditional β2KO transgenic and characterized wave properties and defective eye-specific segregation at the level of bulk axonal tracing (6). The results from the conditional mutant study suggest that the main effects on eye-specific axon refinement in the germline β2KO model are likely of retinal origin through impacts on retinal wave activity. Additionally, anatomical data shows that brainstem cholinergic axons innervate the dLGN toward the second half of eye-specific segregation and are not fully mature at P8 when eye-specific refinement is largely complete (7). We agree with the reviewer that future synaptic studies of previously published wave mutants, including the conditional reporter line, would be needed to conclusively assess a contribution of non-retinal nAChRs. These experiments will take significant time and resources and we respectfully suggest this is beyond the scope of the current manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to be more transparent that they are using the same data set from the previous publication (right now it does not appear until line 471) and clarify what was found in that study vs what is being tested here.

      We thank the reviewer for their thoughtful reading of the manuscript and helpful recommendations to improve the clarity of the work. We have edited the text to make it clear that this study is a reanalysis of an existing data set. We have revised the text to discuss the results from our previous study and more clearly define how the current analysis builds upon that initial work. 

      (2) The authors restricted their competition argument in Figure 4 to complex synapses, but why not include the simple ones? This seems like a straightforward analysis to do.

      We appreciate the reviewer’s suggestion to measure spatial relationships between “clustered” and “isolated” single-AZ synapses as we have done for multi-AZ synapses in Figure 4. However, we are not able to perform a direct and interpretable comparison with the results shown for multi-AZ synapses. First, we would need to classify “clustered” and “isolated” single-AZ synapses. This classification convolves two effects: 1) a distance threshold to define clustering and 2) subsequent distance measurements between clustered synapses.

      If we apply an equivalent 1.5 μm distance threshold (or any other threshold) to define clustered synapses, the distance from each “clustered” single-AZ synapse to the nearest other single-AZ synapse will always be smaller than the defined threshold (1.5 μm). Alternatively, if all of the single-AZ synapses within each local 1.5 μm shell are excluded from the subsequent intersynaptic distance measurements, this will set a hard lower boundary on the distance between synaptic clusters (1.5 μm minimum). The two effects discussed above were separated in our original analysis of multi-AZ synapses defined as “clustered” and “isolated” based on their relationship to single-AZ synapses, but these effects cannot be separated when analyzing single-AZ distributions alone.
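
      The circularity described above can be illustrated with a short simulation. This is a sketch under stated assumptions: the synapse positions are random points in a hypothetical volume, the helper names are invented for illustration, and distances are taken between point centers. It only shows that once "clustered" is defined by a nearest-neighbor threshold, the nearest-neighbor distances of the clustered group are bounded by that threshold by construction.

```python
import math
import random


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def split_by_threshold(points, threshold_um=1.5):
    """Split nearest-neighbor distances into 'clustered' and 'isolated' groups."""
    nn = nearest_neighbor_distances(points)
    clustered = [d for d in nn if d < threshold_um]
    isolated = [d for d in nn if d >= threshold_um]
    return clustered, isolated


rng = random.Random(1)
# Hypothetical single-AZ synapse positions in a 10 x 10 x 10 um volume.
synapses = [tuple(rng.uniform(0.0, 10.0) for _ in range(3)) for _ in range(300)]
clustered, isolated = split_by_threshold(synapses)

# By construction, every "clustered" distance is below the threshold and
# every "isolated" distance is at or above it, whatever the biology.
print(len(clustered), len(isolated))
```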

      (3) The Discussion seems much too long and speculative from the current data that is represented - particularly without verification of complex synapses actually being inputs from different RGCs. Along the same lines, figure captions are misleading. For example, for Figure 4 - the title indicates that the complex synapses are driving the rearrangements. But of course, these are static images. The authors should use titles that are more reflective of their findings rather than this interpretation.

      We thank the reviewer for these helpful suggestions. We have changed each of the figure captions to more accurately reflect the results. We have deleted all of the speculative discussion and revised the remaining text to improve the accuracy of the presentation.

      (4) In the future, the authors may want to consider an analysis as to whether ipsi and contra projections contribute to the same synapses.

      We agree with the reviewer that it is of interest to investigate the contribution of binocular inputs to retinogeniculate synaptic clusters during development. At maturity, some weak binocular input remains in the dominant-eye territory (15). To look for evidence of binocular synaptic interactions, we measured the percentage of the total small single-active zone synapses that were within 1.5 micrometers of larger multi-active zone synapses of the opposite eye. On average, ~10% or less of the single-active zone synapses were near multi-active zone synapses of the opposite eye. This analysis is presented in Supplemental Figure S3C/D.

      It is possible that some large mAZ synapses might reflect the convergence of two or more smaller inputs from the two eyes. Our current analyses do not rule this out. However, previous EM studies have found limited evidence for convergence of multiple RGCs (3) at P8 and our own EM images show that larger terminals with multiple active zones are formed by a single RGC bouton. Future volumetric EM reconstructions with eye-specific labels will be informative to address this question.
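
      The proximity measurement described in this response (the percentage of single-active zone synapses within 1.5 micrometers of an opposite-eye multi-active zone synapse) can be sketched as follows. The function name and the coordinates are hypothetical, and the sketch assumes center-to-center distances; the actual analysis is the one presented in Supplemental Figure S3C/D.

```python
import math


def percent_near_opposite_eye(single_az, multi_az_opposite, radius_um=1.5):
    """Percentage of single-AZ synapses within radius_um of any
    opposite-eye multi-AZ synapse (center-to-center distance)."""
    if not single_az:
        return 0.0
    near = sum(
        1
        for s in single_az
        if any(math.dist(s, m) <= radius_um for m in multi_az_opposite)
    )
    return 100.0 * near / len(single_az)


# Hypothetical coordinates in micrometers.
single_az = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.5, 0.5, 0.0), (20.0, 20.0, 20.0)]
multi_az_opposite = [(1.0, 0.0, 0.0)]
print(percent_near_opposite_eye(single_az, multi_az_opposite))
```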

      References

      (1) Zhang C, Yadav S, Speer CM. The synaptic basis of activity-dependent eye-specific competition. Cell Rep. 2023;42(2):112085.

      (2) Bickford ME, Slusarczyk A, Dilger EK, Krahe TE, Kucuk C, Guido W. Synaptic development of the mouse dorsal lateral geniculate nucleus. J Comp Neurol. 2010;518(5):622-35.

      (3) Monavarfeshani A, Stanton G, Van Name J, Su K, Mills WA, 3rd, Swilling K, et al. LRRTM1 underlies synaptic convergence in visual thalamus. Elife. 2018;7.

      (4) Campbell G, Shatz CJ. Synapses formed by identified retinogeniculate axons during the segregation of eye input. J Neurosci. 1992;12(5):1847-58.

      (5) Hong YK, Park S, Litvina EY, Morales J, Sanes JR, Chen C. Refinement of the retinogeniculate synapse by bouton clustering. Neuron. 2014;84(2):332-9.

      (6) Xu HP, Burbridge TJ, Chen MG, Ge X, Zhang Y, Zhou ZJ, et al. Spatial pattern of spontaneous retinal waves instructs retinotopic map refinement more than activity frequency. Dev Neurobiol. 2015;75(6):621-40.

      (7) Sokhadze G, Seabrook TA, Guido W. The absence of retinal input disrupts the development of cholinergic brainstem projections in the mouse dorsal lateral geniculate nucleus. Neural Dev. 2018;13(1):27.

      (8) Dhande OS, Hua EW, Guh E, Yeh J, Bhatt S, Zhang Y, et al. Development of single retinofugal axon arbors in normal and beta2 knock-out mice. J Neurosci. 2011;31(9):3384-99.

      (9) Rossi FM, Pizzorusso T, Porciatti V, Marubio LM, Maffei L, Changeux JP. Requirement of the nicotinic acetylcholine receptor beta 2 subunit for the anatomical and functional development of the visual system. Proc Natl Acad Sci U S A. 2001;98(11):6453-8.

      (10) Muir-Robinson G, Hwang BJ, Feller MB. Retinogeniculate axons undergo eye-specific segregation in the absence of eye-specific layers. J Neurosci. 2002;22(13):5259-64.

      (11) Fredj NB, Hammond S, Otsuna H, Chien C-B, Burrone J, Meyer MP. Synaptic Activity and Activity-Dependent Competition Regulates Axon Arbor Maturation, Growth Arrest, and Territory in the Retinotectal Projection. J Neurosci. 2010;30(32):10939.

      (12) Hua JY, Smear MC, Baier H, Smith SJ. Regulation of axon growth in vivo by activity-based competition. Nature. 2005;434(7036):1022-6.

      (13) Rahman TN, Munz M, Kutsarova E, Bilash OM, Ruthazer ES. Stentian structural plasticity in the developing visual system. Proc Natl Acad Sci U S A. 2020;117(20):10636-8.

      (14) Ankerst M, Breunig MM, Kriegel H-P, Sander J. OPTICS: ordering points to identify the clustering structure. SIGMOD Rec. 1999;28(2):49–60.

      (15) Bauer J, Weiler S, Fernholz MHP, Laubender D, Scheuss V, Hübener M, et al. Limited functional convergence of eye-specific inputs in the retinogeniculate pathway of the mouse. Neuron. 2021;109(15):2457-68.e12.

      (16) Benninger K, Hood G, Simmel D, Tuite L, Wetzel A, Ropelewski A, et al. Cyberinfrastructure of a Multi-Petabyte Microscopy Resource for Neuroscience Research. Practice and Experience in Advanced Research Computing; Portland, OR, USA: Association for Computing Machinery; 2020. p. 1–7.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers for their overall careful evaluation of our work, the constructive criticism, and their many helpful suggestions. We feel that our revision built on the strengths identified by the reviewers, and addressed all the concerns they have raised. Both reviewers recognize that our revisions have improved the paper. Since the first submission we have:

      • Rewritten large parts of the paper to improve clarity and make it more concise where possible

      • Simulated an alternative working memory model, as recommended by Reviewer 1

      • Included 4 new/revised supplementary figures, following the reviewers’ suggestions for additional analysis.

      Below we provide a brief response to the Reviewers’ comments on our manuscript revision.

      Reviewer #1: Public Review:

      Strengths:

      Overall, the work offers a very interesting approach to a topic which is hard to address experimentally; therefore the computational take is entirely justified and extremely useful. The authors carefully designed the computational experiments to shed light on the demyelination effects on working memory from multiple levels of description, increasing the reliability of their conclusions. I think this work now provides convincing evidence and has the potential to be influential in future studies of myelin alterations (and related disorders such as multiple sclerosis).

      Weaknesses:

      In its current form, the authors have improved the clarity of the results and the model details, and have provided a new set of simulations to complement and reinforce the original ones (including the development of a new spatial working memory model based on silent working memory principles). I do not appreciate any significant weaknesses at this point.

      We thank the reviewer for these positive comments on our revision and for the suggestion of adding the silent memory model, as we feel this has strengthened our findings.

      Reviewer #2: Public Review:

      This paper analyzes the effect of axon de-myelination and re-myelination on action potential speed and propagation failure. The findings are then incorporated into a standard spiking ring attractor model of working memory.

      I think the results are not very surprising or solid and there are issues with method and presentation.

      The authors did many simulations with random parameters, then averaged the result, and found for instance that the Conduction Velocity drops in demyelination. It gives the reader little insight into what is really going on. My personal preference is for a well understood simple model rather than a poorly understood complex model. The link between the model outcome of WM and data remains qualitative and is further weakened by the existence of known other age-related effects in PFC circuits.

      Comments on revised version:

      The paper has improved in the revision, although I still think a reduced model would have been nice.

      As noted above, in addition to our spiking bump attractor model, our revision includes a second network-level model: an activity-silent working memory model for continuous features. We found qualitatively similar effects as in our bump attractor network model, showing that our main conclusions do not critically depend on the exact working memory mechanism (active vs. activity-silent). This new model was described in two new supplementary figures and a new paragraph in the Results section.

      We did not add a reduced model in our revision to this paper, since neither reviewer explicitly recommended that we add one. As we noted in our private response to reviewers that accompanied our revision: we share the view that understanding simple models can provide critical insights into brain function (and we believe that many of our papers related to attractor dynamics in working memory and decision-making fall into this category, e.g. Wimmer et al. 2014, Esnaola-Acebes et al. 2022, Ibañez et al. 2020). We disagree with the reviewer on an important point: we feel that the model complexity that we have chosen is appropriate and necessary to study the phenomenon at hand. Our modeling efforts are principled, with complexity added as necessary. We started with a biophysical single neuron model with firing dynamics fit to empirical data in pyramidal neurons of rhesus monkey dlPFC (Rumbell et al. 2016) – the same type of neurons and cortical region analyzed in the Peters et al. work on structural changes to myelin seen during aging (e.g., Figure 1). Because simple models do not accurately capture the CV along thin axons like those in the PFC, we attached a multicompartment axon with detailed myelinated segments, and constructed a cohort of feasible models. We then used this cohort to get quantitative estimates of the effects of variable degrees of demyelination and remyelination. This would not be possible with a simpler model. We then study the consequences of de- and re-myelination in a spiking neural network model. Again, we could not use a simpler model (e.g. a firing rate attractor model) without making gross assumptions about how demyelination affects circuit function. In sum, we believe that our models are relatively simple but comprehensive given the phenomenon that we are studying.

      The reviewer is correct in that there exist “known other age-related effects in PFC circuits”. These are reviewed in the introduction and we discuss future extensions of our model that would incorporate those effects as well. It is important to note that this is the first comprehensive study of demyelination effects in aging PFC, demonstrating that myelin changes alone predict working memory changes associated with aging.

      While we agree that averaging results across different parameter sets provides a limited understanding of the system, we persist in our belief that such analyses provide an important baseline. We acknowledge that results vary across our model cohort; this is why we included the heatmaps of our single cell model perturbation results (Figure 3 and Supplementary Figure 3), and simulated network models representing a heterogeneity of neuronal axons with healthy and altered myelin sheaths to different degrees, as likely occurs in the aging brain (Figures 7 and 8). The model framework we present here is well-suited for more targeted analyses and better insights, including those which we are pursuing currently.


      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful evaluation of our work, the constructive criticism, and their many helpful suggestions. We feel that our revision builds on the strengths identified by the reviewers, and addresses all the concerns they have raised. We have:

      • Rewritten large parts of the paper to improve clarity and make it more concise where possible

      • Simulated an alternative working memory model

      • Included 4 new/revised supplementary figures, following the reviewers’ suggestions for additional analysis

      Reviewer #1 (Public Review):

      Summary:

      The authors study the effects of myelin alterations in working memory via the complementary use of two computational approaches: one based on the de- and re-myelination in multicompartmental models of pyramidal neurons, and one based on synaptic changes in a spiking bump attractor model for spatial working memory. The first model provides the most precise angle (biophysically speaking) on the different effects (loss of myelin lamellae or segments, remyelination with thinner and shorter nodes, etc.), while the second model allows one to infer the consequences of myelin alterations in working memory performance, including memory stability, duration, and bump diffusion. The results indicate (i) a slowing down and failure of propagation of spikes with demyelination and partial recovery with remyelination, with detailed predictions on the role of nodes and myelin lamellae, and (ii) a decrease in memory duration and an increase in memory drift as a function of the demyelination, in agreement with multiple experimental studies.

      Strengths:

      Overall, the work offers a very interesting approach to a topic which is hard to address experimentally; therefore the computational take is entirely justified and extremely useful. The authors carefully designed the computational experiments to shed light on the demyelination effects on working memory from multiple levels of description, increasing the reliability of their conclusions. I think this work is solid and has the potential to be influential in future studies of myelin alterations (and related disorders such as multiple sclerosis).

      We thank the reviewer for these positive comments on our manuscript.

      Weaknesses:

      In its current form, the study still presents several issues which prevent it from achieving a higher potential impact. These can be summarized in two main items. First, the manuscript is missing some important details about how demyelination and remyelination are incorporated in both models (and what is the connection between both implementations). For example, it is unclear whether an unperturbed axon and a fully remyelinated axon would be mathematically equivalent in the multicompartment model, or how the changes in the number of nodes, myelin lamellae, etc., are implemented in the spiking neural network model.

      We thank the reviewer for these suggestions to improve the clarity of our manuscript. A ‘fully remyelinated’ axon is not mathematically equivalent to the unperturbed axon: it has shorter and thinner myelinated segments, and additional nodes in between. This is consistent with empirical observations in rhesus monkey dlPFC, as reviewed in Peters et al. (2009): a 90% increase in paranode profiles, and myelin sheaths that were thinner than expected for the size of the enclosed axon. With no empirical observations of fewer numbers of nodes (but rather, the opposite) or bare sections of axon, we assumed that the remyelination process also creates new nodes (which are identical to existing nodes), as also modeled in Scurfield & Latimer (2018). We have added two new sentences to the results to clarify this fact, before presenting the first set of results for the single cell model (starting at line 137):

      “To simulate demyelination, we removed lamellae from selected myelinated segments; for remyelination we replaced a fraction of myelinated segments by two shorter and thinner segments with a node in between. As such, a ‘fully remyelinated axon’ had all the demyelinated segments subsequently remyelinated, but with fewer lamellae and additional nodes compared to the unperturbed control case, consistent with empirical observations (Peters, 2009).”

      We also state the maximal amount of remyelination more explicitly in the Results, starting on lines 164-165: “We next examined the extent to which remyelination with shorter and thinner segments, occurring after demyelination, restored axonal AP propagation (Figure 4).”

      Also on line 192-193: “Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%).”

      Finally, in Methods we also clarified the structure of the added node (starting at line 634): “Remyelination was performed by replacing an affected (previously demyelinated) segment with two shorter segments, each including paranodes, juxtaparanodes, and an internode, and a new node between them that was identical to existing nodes.”

      We have also provided further details describing how myelin dystrophy was simulated in the network model in Results (lines 243 - 249) and in Methods (lines 722 - 747). How myelin alterations have been implemented in the network model is one of the questions of the reviewer (Question 5 in Reviewer #1: Recommendations for the Authors). We have addressed this question by describing in detail how we adjusted CV and AP failure rate to the values produced by the multicompartment neuron model. Please see our answer to Question 5 for the details.

      Second, it is unclear whether some of the conclusions are strong computational predictions or just a consequence of the model chosen. For example, the lack of effect of decreasing the conduction velocity on working memory performance could be due to the choice of considering a certain type of working memory model (continuous attractor), and therefore be absent under other valid assumptions (i.e. a silent working memory model, which has a higher dependence on temporal synaptic dynamics).

      Whether some conclusions are strong predictions or just a consequence of the model chosen is an important concern and indeed a general problem of computational modeling of working memory. For example, Stein et al. (Stein et al. Towards biologically constrained attractor models of schizophrenia, Curr. Opin. Neurobiol. 2021) showed that opposed manipulations of E/I ratio can produce the same behavioral pattern in different alternative, plausible biological network models. As long as we do not fully understand the neural mechanisms underlying working memory, modeling studies of how alterations (e.g. in E/I ratio or in the reliability and timing of axonal transmission, as we did here) affect circuit function need to be interpreted critically and tested against new experimental data.

      One way to strengthen model predictions is by showing that different computational models make similar predictions. To do this, we implemented an activity-silent working memory model for continuous features, as suggested by the reviewer, and we found qualitatively similar effects as in our bump attractor network model. Thus, our main conclusions do not critically depend on the exact working memory mechanism (active vs. activity-silent).

      In the revised manuscript, we have added two new supplementary figures (Supplementary Figures 8 and 9, shown below) and a new paragraph in the Results section about activity-silent working memory (starting at line 319):

      “Alternative working memory mechanisms. Working memory in our neural network is maintained in an attractor state with persistent neural activity (Compte et al., 2000; Hansel and Mato, 2013). Other mechanisms have been proposed, including that working memory maintenance may rely on activity-silent memory traces (Mongillo et al., 2008; Stokes, 2015; Barbosa et al., 2020). In activity-silent models, a slowly decaying transient of synaptic efficacy preserves information without the need for persistent ongoing activity. We implemented an activity-silent model, to our knowledge the first one for continuous spatial locations, and tested how working memory performance is affected by AP failures and propagation delays. We found that AP failures corresponding to demyelination caused working memory errors qualitatively similar to the delay-active network (Supplementary Figure 8). On the other hand, increasing propagation delays did not lead to additional working memory errors, unless we include unrealistically high values (uniform distribution in the range of 0 to 100 ms; Supplementary Figure 9). These results are qualitatively similar to the delay active network model. Thus, our main findings do not critically depend on the exact working memory mechanism (active vs. activity-silent).”
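
      The "slowly decaying transient of synaptic efficacy" in the quoted paragraph can be sketched with the facilitation (u) and depression (x) variables of the short-term plasticity model of Mongillo et al. (2008). This is an illustrative sketch only: the parameter values (U, tau_f, tau_d), the spike train, and the order of the updates at each spike are assumptions, not the settings of the network in the revised manuscript.

```python
import math


def stp_after_spikes(spike_times, t_end, U=0.2, tau_f=1.5, tau_d=0.2):
    """Facilitation u and depression x at time t_end (seconds).

    Between spikes, u relaxes to U with time constant tau_f and x relaxes
    to 1 with time constant tau_d. At each presynaptic spike, u jumps by
    U * (1 - u) and x is then reduced by u * x (one common convention).
    """
    u, x, t = U, 1.0, 0.0
    for t_spike in sorted(spike_times):
        dt = t_spike - t
        u = U + (u - U) * math.exp(-dt / tau_f)
        x = 1.0 + (x - 1.0) * math.exp(-dt / tau_d)
        u += U * (1.0 - u)  # facilitation increases release probability
        x -= u * x          # resources are consumed by the spike
        t = t_spike
    dt = t_end - t
    u = U + (u - U) * math.exp(-dt / tau_f)
    x = 1.0 + (x - 1.0) * math.exp(-dt / tau_d)
    return u, x


# Hypothetical cue-evoked burst: 10 spikes at 50 Hz, then silence until t = 1 s.
burst = [0.02 * k for k in range(10)]
u_end, x_end = stp_after_spikes(burst, t_end=1.0)
print(f"u = {u_end:.3f}, x = {x_end:.3f}")
```

      Because tau_f is much longer than tau_d in this sketch, u is still above its baseline U after the silent interval while x has nearly recovered, so the synapse stays transiently stronger without ongoing spiking.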

      Author response image 1.

      Action potential failures impair working memory performance in a network model with activity-silent memory traces. (A) Spiking and synaptic activity in an unperturbed, activity-silent working memory model. Top: Raster plot showing the activity for each excitatory neuron (labeled by its preferred direction) in a single trial with a cue stimulus presented at 180°. We modified our spiking neural network model such that it does not show elevated persistent firing throughout the delay period (see Figure 5B for comparison). In particular, we reduced the external background input to excitatory neurons by a factor of 3.61% and we increased the cue stimulus amplitude by 12.5%. Even though spiking activity decays to baseline (close to 0 Hz), a memory trace is imprinted in enhanced synaptic strength due to short-term synaptic facilitation (Mongillo et al., 2008). Selective spiking activity is recovered by a non-selective constant input applied during 300 ms to all excitatory neurons during the two reactivation periods (marked by yellow and green rectangles in the raster plot). The amplitude of the input was 11 mV during the first and 13 mV during the second reactivation period. Reactivation periods are marked in light gray shading in the remaining panels below and the cue period is indicated by dark gray shading. Firing rates (second row), synaptic facilitation variable u (third row), and synaptic depression variable x (bottom row) for the same trial, averaged for 500 neurons around the neuron with 180° as preferred direction (solid lines) and around the neuron with 0° as preferred direction (dashed lines). Note that reactivation recovers the activity bump (C) but also causes elevated firing and subsequent enhancement of synapses at all positions in the networks. (B) Activity in a network with demyelination of 50% of the myelinated segments by removing 60% of the myelin lamellae. AP failures lead to reduced firing rates in the cue and early delay periods and consequently to weaker synaptic enhancement. (C) Average spike counts of the excitatory neurons during the cue period (black lines), and the two reactivation periods indicated in the raster plots in A and B (yellow and green lines). Solid lines correspond to the control network and dashed lines to the perturbed network. (D) Memory strength as a function of time for the control and perturbed networks. (E-F) Trajectories of the bump center (i.e., remembered cue location) read out from the neural activity across the cue and delay periods using a population vector (see Methods). Cue position was 180° in all trials. The perturbed network (F) shows larger working memory errors towards the end of the delay period compared to the control network (E).
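
      The population-vector readout of the bump center mentioned in panels E-F can be sketched as a rate-weighted circular mean of the neurons' preferred directions. This is a generic illustration: the function name and the synthetic firing rates are assumptions, and the exact readout used in the manuscript is the one given in its Methods.

```python
import cmath
import math


def population_vector_deg(rates, preferred_deg):
    """Decode a direction (degrees, in [0, 360)) as the angle of the
    rate-weighted sum of unit vectors along each preferred direction."""
    total = sum(
        r * cmath.exp(1j * math.radians(theta))
        for r, theta in zip(rates, preferred_deg)
    )
    return math.degrees(cmath.phase(total)) % 360.0


# Hypothetical ring of 360 neurons with a bump of activity centered at 180 deg.
preferred = [float(k) for k in range(360)]
rates = [math.exp(3.0 * math.cos(math.radians(theta - 180.0))) for theta in preferred]
decoded = population_vector_deg(rates, preferred)
print(f"decoded bump center: {decoded:.2f} deg")
```

      Applying such a readout in successive time windows gives a trajectory of the remembered location, and its deviation from the cue position is the working memory error.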

      Author response image 2.

      Effect of propagation delays on control and perturbed activity-silent network models. (A) Memory strength during the whole simulation time for the young, control networks relying on activity-silent working memory (Supplementary Figure 8) with zero propagation delays (blue line), and with propagation delays from a uniform distribution with a range between 0 and 40 ms (yellow line) and between 0 and 100 ms (orange line). (B) Memory strength for perturbed networks when demyelinating 25% of the myelinated segments by removing 50% of the myelin lamellae, without delays (red line), and with uniformly distributed delays between 0 and 40 ms (light gray line) and between 0 and 100 ms (black line). The cue period is indicated by dark gray shading and reactivation periods are marked in light gray. Memory strength was calculated by averaging across 280 trials for one network. Shaded areas indicate SEM for each case. For the young, control networks (A), working memory was not affected by including delays of up to 40 ms. Unrealistically long delays ranging up to 100 ms did cause an impairment (the longest delays found for the most extreme perturbation condition – demyelination of 75% of the segments by removing 100% of the myelin lamellae – were of 49.9 ms on average). When also incorporating AP failures to the networks (B), we observed a similar trend. For this perturbation condition, delays of up to 40 ms were already much larger than the delays quantified in the single neuron model (for the case of 25% of the segments demyelinated by removing 50% of the myelin lamellae, the average delay in the cohort was 3.75 ms).
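
      The two perturbations compared in this figure, AP failures and propagation delays drawn from a uniform distribution, can be sketched at the level of a single connection. This is a simplified illustration and not the network implementation: the helper name, the choice of one fixed delay per connection, and the independent failure of each spike are assumptions.

```python
import random


def make_connection(p_failure, max_delay_ms, rng):
    """Return (transmit, delay_ms) for one hypothetical connection.

    The delay is drawn once from a uniform distribution on [0, max_delay_ms].
    Each presynaptic spike independently fails to arrive with probability
    p_failure.
    """
    delay_ms = rng.uniform(0.0, max_delay_ms)

    def transmit(spike_times_ms):
        # Keep a spike only if it does not fail, then shift it by the delay.
        return [t + delay_ms for t in spike_times_ms if rng.random() >= p_failure]

    return transmit, delay_ms


rng = random.Random(0)
spikes = [10.0, 20.0, 30.0, 40.0]

# Delays up to 40 ms, as in the range explored above, with no AP failures.
transmit_ok, delay = make_connection(p_failure=0.0, max_delay_ms=40.0, rng=rng)
arrivals_ok = transmit_ok(spikes)

# Complete conduction block: no spike reaches the postsynaptic neuron.
transmit_blocked, _ = make_connection(p_failure=1.0, max_delay_ms=40.0, rng=rng)
arrivals_blocked = transmit_blocked(spikes)

print(len(arrivals_ok), len(arrivals_blocked))
```

      In this picture a delay only shifts arrival times, whereas a failure removes synaptic input altogether, which is one way to see why the two perturbations can affect memory strength so differently.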

      With additional simulations to address these issues, I consider that the present study would become a convincing milestone in the computational modeling of myelin-related models, and an important study in the field of working memory.

      Again, we would like to thank the reviewer for the positive comments. We have addressed all the main issues raised (see below our response to the “recommendations for the authors”).

      Reviewer #2 (Public Review):

      This paper analyzes the effect of axon de-myelination and re-myelination on action potential speed and propagation failure. The findings are then incorporated in a standard spiking ring attractor model of working memory.

      I think the results are not very surprising or solid and there are issues with method and presentation.

      The authors did many simulations with random parameters, then averaged the result, and found for instance that the Conduction Velocity drops in demyelination. It gives the reader little insight into what is really going on. My personal preference is for a well understood simple model rather than a poorly understood complex model. The link between the model outcome of WM and data remains qualitative, and is further weakened by the existence of known other age-related effects in PFC circuits.

      We thank the reviewer for the critical assessment of our work. We share the view that understanding simple models can provide critical insights into brain function (and we believe that many of our papers related to attractor dynamics in working memory and decision making fall into this category, e.g. Wimmer et al. 2014, Esnaola-Acebes et al. 2022, Ibañez et al 2020). However, we respectfully disagree with the reviewer on an important point: the model complexity that we have chosen is appropriate and necessary to study the phenomenon at hand. Our modeling efforts are principled, with complexity added as necessary. We started with a biophysical single neuron model with firing dynamics fit to empirical data in pyramidal neurons of rhesus monkey dlPFC (Rumbell et al. 2016) – the same type of neurons and cortical region analyzed in the Peters et al. work on structural changes to myelin seen during aging (e.g., Figure 1). Because simple models do not accurately capture the CV along thin axons like those in the PFC, we attached a multicompartment axon with detailed myelinated segments, and constructed a cohort of feasible models. We then used this cohort to get quantitative estimates of the effects of variable degrees of demyelination and remyelination. This would not be possible with a simpler model. We then study the consequences of de- and re-myelination in a spiking neural network model. Again, we could not use a simpler model (e.g. a firing rate attractor model) without making gross assumptions about how demyelination affects circuit function. In sum, we believe that our models are relatively simple but comprehensive given the phenomenon that we are studying.

      The reviewer is correct in that there exist “known other age-related effects in PFC circuits”. These are reviewed in the introduction and we discuss future extensions of our model that would incorporate those effects as well. It is important to note that this is the first comprehensive study of demyelination effects in aging PFC, demonstrating that myelin changes alone predict working memory changes associated with aging.

      The specific issues about modeling choices and interpretation of the results are discussed below.

      Both for the de/re myelination the spatial patterns are fully random. Why is this justified?

      We agree that myelin dystrophy during aging could be non-random, that is, localized to certain regions of an axon. Our collaborators (Drs Jennifer Luebke, Maya Medalla, and Patrick Hof) are currently addressing this question using 3D electron microscopy and immunohistochemistry on axons of individual neurons and their associated myelin, but results are not available yet. Early on in this study we examined how the location of myelin alterations affected AP propagation. Focusing demyelination along a section of axon led to more AP slowing and failure than when spatially randomized. Likewise, remyelination of such spatially localized dystrophy led to greater recovery, as there were fewer transitions between long and short internodes (Supplemental Figure 4). Since otherwise the effects in the localized cases were largely similar to those in the spatially random case (see Author response image 3 below), for brevity in this paper we assumed myelin alterations were randomly distributed. Our next paper, extending this study to collateralized axons and which was presented as a poster at the 2023 Society for Neuroscience meeting, will include an examination of localized myelin dystrophy.

      Author response image 3.

      Effect of localized myelin alterations on CV change. Myelin alterations were either focused on the third of myelinated segments closest to the initial segment (‘proximally clustered’), the third of myelinated segments furthest from the initial segment (‘distally clustered’), or distributed according to a uniform distribution as in the current study. For demyelination, all lamellae were removed from 25% of myelinated segments (showing mean +/- SEM of all 50 cohort models, 30 randomized trials each). For remyelination, affected segments were replaced by two shorter segments with 75% of the original lamellae thickness and a node in between.

      We have added two sentences in Methods to justify this assumption more clearly (line 510): “Evidence suggests that aging affects oligodendrocytes in several ways, including the ability for oligodendrocyte precursor cells to mature (Dimovasili et al., 2022). Knowing that individual oligodendrocytes myelinate axons of many different neurons, but without data quantifying how oligodendrocyte dystrophy affects myelination in individual axons, we assumed that myelin alterations were randomly distributed.”

      We have also added a sentence in the Discussion alluding to our upcoming study (line 434): “Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures.”

      Similarly, the myelin model parameters were drawn from uniform distributions (Table 1, I guess). Again, why is this reasonable?

      The reviewer is correct that our initial Latin hypercube sample generated a uniform distribution. However, parameters of the random sample of models selected as biologically feasible were not uniformly distributed. We have added a new figure (Supplementary Figure 1A) to illustrate the parameter distributions, and have added two sentences in Methods (starting on line 596):

      “Of the 1600 simulated models, 138 met these criteria; for the present study, we randomly selected 50 models to comprise the young, control model cohort. Along most dimensions, the chosen cohort was approximately normally distributed (Supplementary Figure 1). The g-ratio (ratio of axon to fiber diameter) among models in the cohort was 0.71 ± 0.02, with total axon lengths of 1.2 ± 0.1 cm.”
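
A minimal sketch of the sampling-and-selection procedure described above, assuming a two-parameter space. The parameter ranges, the `is_feasible` criterion, and all function names are hypothetical placeholders, not the authors' code; only the counts (1600 candidates, a cohort of 50) come from the text.

```python
import random

def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points so that, for every parameter, each of the
    n_samples equal-width strata of its range is used exactly once."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniformly placed value per stratum, then shuffle the strata
        # so that parameters are paired at random across dimensions.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose: each row is one candidate model.
    return [list(point) for point in zip(*columns)]

def is_feasible(point):
    # Hypothetical stand-in for the biological feasibility criteria.
    axon_diameter_um, segment_length_um = point
    return segment_length_um / axon_diameter_um > 100.0

rng = random.Random(0)
bounds = [(0.4, 1.2), (40.0, 160.0)]   # hypothetical ranges, illustration only
candidates = latin_hypercube(1600, bounds, rng)
feasible = [p for p in candidates if is_feasible(p)]
cohort = rng.sample(feasible, min(50, len(feasible)))
```

The marginal distribution of each parameter is uniform across the 1600 candidates by construction, but nothing forces the feasible subset, or the 50 models drawn from it, to remain uniform.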

      Author response image 4.

      Distribution of parameters and conduction velocities in the single neuron model cohort. (A) Histograms of axon morphology parameters of models selected for the single neuron cohort. Top: axon diameter; middle: length of unperturbed myelin segments; bottom: total myelin thickness in unperturbed segments, computed as the product of lamella thickness and number of lamellae. (B) Histograms of the CV for the 50 axons of the unperturbed model cohort (top), and representative demyelination and remyelination perturbations: mild demyelination (removing 25% of lamellae from 25% of the myelinated segments, second row); severe demyelination (removing all lamellae from 75% of the myelinated segments, third row); and complete (100%) remyelination (where the demyelinated segments from the third row were remyelinated by two shorter segments with 75% of lamellae). CVs averaged over 30 trials in each case. (C) Changes in CV (measured in %) in response to demyelination and remyelination versus the magnitude of current clamp step (+180, +280, or +380 pA). Shown are mean +/- SEM for demyelinating 50% of myelinated segments (removing all lamellae), and subsequent remyelination of those segments by shorter segments with 75% of lamellae.

      The focus of most analysis is on the conduction velocity but in the end, this has no effect on WM, so the discussion of CV remains sterile.

      CV delays likely do affect brain functions that rely on neuronal oscillations and synchrony, as mentioned in the Discussion. As such, we feel that our single neuron model results on CV delays as well as AP failures are valuable for the scientific community. Yet, given the results of our network models here, the reviewer has a valid point. We have clarified in the introduction that AP failures but not CV delays affected the network output (line 115):

      “Higher degrees of demyelination led to slower propagation and eventual failure of APs along the axons of the multicompartment models. In the network models, an increase in AP failure rate resulted in progressive working memory impairment, whereas slower conduction velocities, in the range observed in the multicompartment models, had a negligible effect.”

      We have also revised the single neuron section of the Results throughout, to better highlight the effects of myelin dystrophy on AP failures. Revisions to address this in the demyelination section start on line 148:

      “AP propagation was progressively impaired as demyelination increased (Figure 3): CV became slower, eventually leading to AP failure. Removing 25% of lamellae had a negligible effect on CV, regardless of how many segments were affected. However, when all lamellae were removed, CV slowed drastically – by 38 ± 10% even when just 25% of the segments were demyelinated in this way, and 35 ± 13% of APs failed. When 75% of segments lost all their lamellae, CV slowed by 72 ± 8% and 45 ± 13% of APs failed.”

      Similarly, we have added several sentences about AP failures that remain after remyelination of the single neuron model (starting on line 190):

      “Results for the percentage of AP failures (Figure 4C,F) were consistent with those for CV recovery. Remyelinating all previously demyelinated segments, even adding just 10% of lamellae, brought AP failure rates down to 14.6 ± 5.1%. Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%). Incomplete remyelination, where some segments were still demyelinated, still had relatively high AP failure rates. For example, when one eighth of segments were remyelinated with the maximal amount of lamellae and one eighth were left bare, 25.7 ± 11.5% of APs failed across the cohort (Figure 4C, red dashed line and arrow). AP failure rates were slightly lower when starting with partial demyelination: 10.6 ± 7.6% of APs failed in the analogous paradigm (Figure 4F, red dashed line and arrow). In short: combinations of demyelinated and remyelinated segments often led to sizable CV delays and AP failures.”

      The more important effect of de/re myelination is on failure. However, the failure is, AFAIK, just characterized by a constant current injection of 380pA. From Fig 2 it seems however that the first spike is particularly susceptible to failure. In other words, it has not been justified that it is fine to use the failure rates from this artificial protocol in the I&F model. I would expect the temporal current trace to affect whether the propagation fails or not.

      In general, we did not find the first spike to be more susceptible to failure than later spikes; the trace in Figure 2 is a representative snapshot intended to illustrate CV slowdown, AP failure, and recovery. Regarding the constant current injection: while the reviewer is correct that neurons do not receive such inputs in vivo, the applied current injections were designed to match in vitro current clamp protocols for these rhesus monkey neurons. While our future studies will include responses to more realistic synaptic inputs, we focused on somatic current injections here. We have added a new panel (C) to Supplementary Figure 1 (see previous response above) showing that the current step magnitude had little effect on the CV change after myelin perturbations; there was little effect on AP failure rates too. We now also state this finding more explicitly in Methods (starting on line 561):

      “As done during in vitro electrophysiological experiments (Chang et al., 2005; Ibanez et al., 2020) and past modeling studies (Coskren et al., 2015; Rumbell et al., 2016), we first applied a holding current to stabilize the somatic membrane potential at -70 mV, then injected a current step into the somatic compartment for 2 seconds. …The CV changes in response to myelin alterations were relatively insensitive to variations in the magnitude of suprathreshold somatic current steps (Supplementary Figure 1C), and whether the current was constant or included Gaussian noise. Therefore, here we quantified CV changes and AP failures from responses to constant +380 pA current steps only.”

      I don't know if there are many axon-collaterals in the WM circuits and or distance dependence in the connectivity, but if so, then the current implementation of failure would be questionable.

      We agree that axon collaterals may affect our results; our unpublished morphological analyses of individual neuron axons indicate that there is a high degree of local axon collateralization in Layer 3 pyramidal neurons in LPFC. In this first study from our group on myelin perturbations, we chose to focus here on unbranched axons. There was some distance dependence of AP failure along the length of the axon. For example, in our most extreme demyelination case (75% of segments losing all their lamellae), about 14% of the axons showed more AP failure at their distal ends relative to the middle (mean difference 6.33%). We are examining this distance dependence more broadly in our next study, now cited in the Discussion (line 434): “Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures.”

      I would also advise against thresholding at 75% failure in Fig3C. Why don't the authors simply plot the failure rate?

      We thank the reviewer for this suggestion, and have made this change. As suggested by the reviewer, we now show the AP failure rate in Figure 3 and Figure 4. The trends shown are nearly identical to those from the high failure trials.

      Regarding the presentation, there are a number of dead-end results that are not used further on. The paper is rather extensive, and it would be clearer if written up in half the space. In addition, much information is really supplementary. The issue of the CV I already mentioned, also the Lasso regression for instance remains unused.

      We understand the reviewer’s perspective, and we do value brevity when possible. During the revision process we examined the paper carefully, and made things more concise when it was feasible. As mentioned above, reporting CV results is important, though our revision now places greater emphasis on the results for AP failures. We combined the two Supplementary Figures about remyelination in the single neuron model into one (Supplementary Figure 3). We also moved the Lasso figure and associated methods to the Supplementary Material (Supplementary Figure 2), and have separated the Lasso results for demyelination and remyelination into their respective paragraphs (lines 154-160 and lines 200-204 respectively). While we do not use the Lasso results explicitly later in Results, we cite them in the Discussion when comparing our findings to previous work (starting on line 417):

      “Since our single neuron cohort sampled a wide range of parameter space, we used Lasso regression to identify which of the complex, interacting parameters contributed most to CV delays (which preceded AP failures). Parameters including axon diameter, node length, length of myelinated segments, and nodal ion channel densities predicted how our models responded to demyelination and remyelination; these findings are consistent with past modeling studies over more limited parameter ranges (e.g., Goldman and Albus, 1968; Moore et al., 1978; Babbs and Shi, 2013; Young et al., 2013; Schmidt and Knösche, 2019).”

      We hope that our revision has struck an appropriate balance between clear and concise writing, and addressing concerns from both reviewers. We greatly value the time you have given to help us to improve our manuscript.

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      As I mentioned above, I consider that this study is well designed and it offers very interesting results. I have detailed below some of the issues that should be addressed to improve its potential impact in the field:

      (1) Across the manuscript, it is not entirely clear how the results of the multicompartmental model compare to existing modeling results on demyelination and CV changes (such as in the papers cited by the authors). Is this section confirming previous results with a new (more accurate) computational model, or are there any new insights previously unreported? A new paragraph in the Discussion putting these results in context would be very useful for the reader.

      We thank the reviewer for this suggestion. We have added two new subheadings to organize the Discussion better, and have expanded the single neuron section to three paragraphs. We feel this now clarifies how our model fits in with previous work while stating its novelty more explicitly. Starting on line 391:

      “Myelin changes affect AP propagation in a cohort of model neurons

      The novelty of our neuron model lies in its systematic exploration of a combination of different myelin perturbation types known to occur in myelin dystrophies, across a wide range of biologically feasible models. Our single neuron model assumed that age-related myelin dystrophies (e.g., Figure 1) alter the insulative properties of lamellae analogously to demyelination, and examined interactions between demyelination and remyelination. Past studies of myelin dystrophy examined how either demyelination or remyelination of all segments affected AP propagation for a few representative axon morphologies. For example, Scurfield and Latimer (2018) explored how remyelination affected CV delays, finding that axons with more transitions between long and short myelinated segments had slower CV (Supplementary Figure 4), and were the first to explore how remyelination interacts with tight junctions. However, their study did not couple remyelination and demyelination together or examine AP failures. Other basic findings from our single neuron cohort are consistent with past modeling studies, including that demyelination caused CV slowing and eventual AP failures (Stephanova et al., 2005; Stephanova and Daskalova, 2008; Naud and Longtin, 2019), and, separately, that remyelination with shorter and thinner myelinated segments led to CV slowing (Lasiene et al., 2008; Powers et al., 2012; Scurfield and Latimer, 2018). However, by assuming that some previously demyelinated segments were remyelinated while others were not, we found that models could have much higher AP failure rates than previously reported. Such a scenario, in which individual axons have some segments that are normal, some demyelinated, and some remyelinated, is likely to occur. We also found a few neurons in our cohort showing a CV increase after remyelination, which has not generally been reported before and is likely due to an interplay between ion channels in the new nodes and altered electrotonic lengths in the perturbed myelinated segments (e.g., Waxman, 1978; Naud and Longtin, 2019).

      Since our single neuron cohort sampled a wide range of parameter space, we used Lasso regression to identify which of the complex, interacting parameters contributed most to CV delays (which preceded AP failures). Parameters including axon diameter, node length, length of myelinated segments, and nodal ion channel densities predicted how our models responded to demyelination and remyelination; these findings are consistent with past modeling studies over more limited parameter ranges (e.g., Goldman and Albus, 1968; Moore et al., 1978; Babbs and Shi, 2013; Young et al., 2013; Schmidt and Knösche, 2019). Better empirical measurements of these parameters in monkey dlPFC, for example from 3-dimensional electron microscopy studies or single neuron axon studies combined with markers for myelin, would help predict the extent to which myelin dystrophy and remyelination along individual axons with aging affect AP propagation.

      Another important feature of our multicompartment model is that it was constrained by morphologic and physiological data in rhesus monkey dlPFC —an extremely valuable dataset from an animal model with many similarities to humans (Upright and Baxter, 2021; Tarantal et al., 2022). While beyond the scope of the current study, this computational infrastructure –with a detailed axon, initial segment, soma, and apical and basal dendrites– enables simultaneous investigations of signal propagation through the dendritic arbor and axon. Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures. Integrating such results from single neuron models into network models of working memory, as we have done here, is a powerful way to connect empirical data across multiple scales.”

      (2) Although the authors provide a well-designed study for the multi-compartmental model, it would be useful to add more details about how an unperturbed model and a completely remyelinated model differ in practice, perhaps right before the first results on the single cell model are presented. Are the new myelin sheaths covering the same % of axon as in the original case? Are there the same number of nodes? It is hard to distinguish which of these results are due to a compensation by the new myelin sheaths and which ones are just the model coming back to its original (and mathematically equivalent) starting point.

      A ‘fully remyelinated’ axon is not mathematically equivalent to the unperturbed axon. Newly remyelinated segments had at most 75% of the original number of myelin wraps, with a new node in between, consistent with empirical observations in rhesus monkey dlPFC. Our manuscript changes in response to this recommendation are described in detail above in our response to the public review of the same reviewer.
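
To make the structural difference concrete, the following sketch replaces one myelinated segment by two shorter segments with 75% of the lamellae and a new node in between, as described above. The `Segment` class, the 1 µm node length, the rounding of the lamella count, and the assumption that total length is preserved are illustrative choices, not details taken from the manuscript.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    length_um: float
    n_lamellae: int        # 0 for a node (no myelin wraps)
    is_node: bool = False

def remyelinate(segment, node_length_um=1.0, lamellae_fraction=0.75):
    """Replace one myelinated segment by two shorter, thinner segments
    separated by a new node. Total length is kept constant (assumption)."""
    half_length = (segment.length_um - node_length_um) / 2.0
    # Lamella count rounded down to a whole number of wraps (assumption).
    thin = int(segment.n_lamellae * lamellae_fraction)
    return [
        Segment(half_length, thin),
        Segment(node_length_um, 0, is_node=True),
        Segment(half_length, thin),
    ]

# Hypothetical unperturbed segment: 100 um long with 12 lamellae.
original = Segment(100.0, 12)
replacement = remyelinate(original)
extra_nodes = sum(1 for s in replacement if s.is_node)
```

Under these assumptions, every remyelinated segment adds one node and thins the myelin, so the remyelinated axon has more nodes and fewer wraps per segment than the unperturbed one.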

      (3) The authors observe a directed component in the bias that is known to be caused by heterogeneities in network connectivity, as stated in the text. It occurs to me that similar effects could also be caused by a heterogeneous demyelination in parts of the network. Inducing these biases could be another potential effect of demyelination in practice, and could be easily revealed by the authors' current model (and displayed in a supplementary figure).

      As suggested by the reviewer, we have tested heterogeneous demyelination in parts of the network and the results confirm the reviewer’s intuition. We have included these new results as new Supplementary Figure 7 (see below) and we have added the following sentences in the Legend of Figure 5, line 1265: “When demyelination is restricted to a part of the network, diffusion only increases in the perturbed zone (Supplementary Figure 7).” and in the Discussion (line 457): “In addition to age-related changes in memory duration and precision, our network model predicts an age-related increase in systematic errors (bias) due to an increased drift of the activity bump (Supplementary Figure 11). Moreover, if demyelination is spatially localized in a part of the network, the model predicts a repulsive bias away from the memories encoded in the affected zone (Supplementary Figure 7).”

      Author response image 5.

      Effect of spatially heterogeneous demyelination of the model neurons according to their preferred angle. We also tested working memory performance in the network when demyelination affects only parts of the network. The figure shows the decoded bump center position during the cue and delay period for the eight possible cue directions when a fraction of neurons was perturbed and the rest of the neurons in the circuit were unaltered (Figure 5B). We perturbed 10% of the neurons around the neuron with preferred direction 90° (left panel), 25% of the neurons around -90° (middle panel), and 50% of the neurons around 180° (right panel). Bump traces for cues that lie inside the perturbed portion of the circuit are shown in blue. Network perturbation in the three cases consisted of demyelinating 25% of the segments along the axons of model neurons, by removing 70% of the myelin lamellae. In each case, 280 trials were simulated for one network. These simulations show an increased drift and diffusion inside the perturbed zone, consistent with the increased drift and diffusion when perturbing the entire network (Figure 6B and Supplementary Figure 11). In particular, spatially heterogeneous demyelination in our network leads to a bias away from the affected zone and to increased trial-to-trial variability. Note that this is a model prediction, but we are not aware of empirical data showing heterogeneous demyelination with aging. Further, note that while our network model has a topological ring structure, neurons in PFC are not anatomically arranged depending on their preferred features. Thus, spatially heterogeneous demyelination would likely affect neurons with different feature preferences (i.e., neurons throughout our ring model).
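
One way the perturbed zone could be selected on a ring is sketched below. The network size of 1000 neurons, the evenly spaced preferred angles, and the function names are assumptions for illustration; the caption specifies only the perturbed fraction and the center angle.

```python
def circular_distance_deg(a, b):
    """Smallest absolute angular difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def perturbed_indices(n_neurons, center_deg, fraction):
    """Indices of neurons whose preferred angle lies within an arc covering
    `fraction` of the ring, centered on center_deg."""
    # Preferred angles evenly tile the ring (assumption).
    preferred = [360.0 * i / n_neurons for i in range(n_neurons)]
    half_width = 180.0 * fraction   # half of the perturbed arc, in degrees
    return [i for i, theta in enumerate(preferred)
            if circular_distance_deg(theta, center_deg) <= half_width]

# The three cases in the figure, for a hypothetical 1000-neuron ring.
zone_a = perturbed_indices(1000, 90.0, 0.10)
zone_b = perturbed_indices(1000, -90.0, 0.25)
zone_c = perturbed_indices(1000, 180.0, 0.50)
```

Only neurons inside the returned index set would receive the AP failure probabilities of the demyelinated condition; the rest keep the control values.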

      (4) The bump attractor model of WM relies on a continuous attractor dynamics to encode the information stored in memory --a fixed point dynamics that can only vary via the slow noise-driven drift. This means, as the authors mention, that changes in CV won't affect the performance of WM in their model. This seems to be a limitation of the model, or at least an effect which is highly dependent on the modeler's choice, rather than an accurate prediction. While testing the effects of oscillations (as the authors argue in the Discussion) might be out of the scope of this work, there are other WM models which are more sensitive to temporal differences in activity. The authors should test whether the same (lack of) effects are also found in other WM models. A silent WM model seems to be the ideal candidate for this, as the authors already have the key dynamics of that model incorporated in their computational framework (namely, short-term synaptic facilitation in excitatory synapses).

      We fully agree that considering the effects of demyelination in networks with alternative mechanisms would strengthen our manuscript. As suggested by the reviewer, we have simulated demyelination effects (AP failures and changes in CV) in an activity silent working memory model. The results are described in detail above in our response to the public review of the same reviewer.

      We also would like to mention that we have now also tested larger conduction delays in the bump attractor model, revealing additional working memory errors. This is shown in the revised version of Supplementary Figure 6 (see below). However, those delays are unrealistically large and thus the main effect in both the bump attractor and the activity-silent model is due to AP failures.

      Author response image 6.

      Effect of propagation delays on control and perturbed networks. (A) Memory strength (left panels) and diffusion (right panels) for the young, control networks with zero propagation delays (blue solid line), as in Figure 5, and with propagation delays from a uniform distribution with a range between 0 and 100 ms (yellow dashed line). (B) Memory strength and diffusion for perturbed networks when demyelinating 50% of the segments along the axons of model neurons, by removing 60% of the myelin lamellae without delays (red solid line), and with delays from a uniform distribution with a range between 0 and 40 ms (gray dashed line) and between 0 and 85 ms (black dash-dotted line). The measures of working memory performance were calculated by averaging across 20 networks and 280 trials for each network. Shaded areas indicate SEM for each case. For the young, control networks, there was no difference with and without propagation delays, even though the delays used in the network simulations were much larger than the delays quantified in the single neuron model (the longest delays found for the most extreme perturbation condition –demyelination of 75% of the segments by removing 100% of the myelin lamellae– were of 49.9 ms on average; A). Working memory performance was also unaffected in the perturbed network with AP failures for delays ranging between 0 and 40 ms, also larger than the ones quantified in the single neuron model (for the case of 50% of the segments demyelinated by removing 60% of the myelin lamellae, the average delay in the cohort was 4.6 ms and the maximum delay was 15.7 ms; B). However, including extremely long delays of up to 85 ms did further impair memory compared to the impairment level introduced by AP failures alone (B).

      (5) Impact of demyelination and remyelination on working memory: Could the authors explain here how these biologically detailed alterations are implemented in the bump attractor model? Is the CV and AP failure rate adjusted to the values produced by the multicompartment neuron model with these myelin alterations?

      Yes, the reviewer is right, the CV and AP failure rate have been adjusted to the values produced by the multicompartment neuron model. To clarify this in the manuscript, we have restated the text as follows:

      Lines 243 - 249 (Results):

      To investigate how myelin alterations affect working memory maintenance, we explored in the network model the same demyelination and remyelination conditions as we did in the single neuron model. Because our network model consists of point neurons (i.e., without detailed axons), we incorporated CV slowing as an effective increase in synaptic transmission delays (see Methods). To simulate AP failures, we adjusted the AP failure rate to the values given by the single neuron model, by creating a probabilistic model of spike transmission from the excitatory presynaptic neurons to both the excitatory and inhibitory postsynaptic neurons (see Methods).

      Lines 722 - 747 (Methods):

      Modeling action potential propagation failures in the network. The network model is composed of point neurons without an explicit model of the axon. To effectively model the action potential failures at the distal end of the axons quantified with the single neuron model under the different demyelination and remyelination conditions, the AP failure rate was adjusted to the values produced by the single neuron model. To do this, we perturbed the 10 control networks by designing a probabilistic model of spike transmission from the excitatory presynaptic neurons to both the excitatory and inhibitory postsynaptic neurons. From the single neuron model, for each demyelination/remyelination condition, we quantified the probability of AP failure for each of the neurons in the control cohort, as well as the percentage of those neurons that shared the same probabilities of failure. That is, the percentage of neurons that had probability of failure = 0, probability of failure = 1 or any other probability. Then, we computed the probability of transmission, $p_{\text{transmission}} = 1 - p_{\text{failure}}$, and we specified $p_{\text{transmission}}$ for the corresponding percentages of excitatory neurons in the networks. Thus, in the network model, we took into account the heterogeneity observed in the single neuron model under each demyelination/remyelination condition.
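
The probabilistic transmission scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the failure distribution, the network size, the random assignment of probabilities to neurons, and the function names are all hypothetical.

```python
import random

def assign_transmission_probabilities(failure_distribution, n_excitatory, rng):
    """failure_distribution: list of (p_failure, fraction_of_cohort) pairs.
    Returns one transmission probability (1 - p_failure) per excitatory
    neuron, with group sizes matching the cohort fractions."""
    probs = []
    for p_failure, fraction in failure_distribution:
        probs.extend([1.0 - p_failure] * round(fraction * n_excitatory))
    # Absorb any rounding mismatch so the list has exactly n_excitatory entries.
    probs = (probs + [probs[-1]] * n_excitatory)[:n_excitatory]
    # Which neurons receive which probability is random here (assumption).
    rng.shuffle(probs)
    return probs

def transmit(spiking_neurons, p_transmission, rng):
    """Keep each presynaptic spike with its neuron's transmission probability."""
    return [i for i in spiking_neurons if rng.random() < p_transmission[i]]

rng = random.Random(1)
# Hypothetical cohort: 50% never fail, 30% fail 30% of the time, 20% always fail.
distribution = [(0.0, 0.5), (0.3, 0.3), (1.0, 0.2)]
probs = assign_transmission_probabilities(distribution, 1000, rng)
delivered = transmit(range(1000), probs, rng)
```

Because each neuron keeps its own probability, the network inherits the heterogeneity of the single neuron cohort rather than a single average failure rate.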

      Modeling conduction velocity slowing in the network. To explore the effect of CV slowing along the axons of model neurons, we simulated 20 young, control networks and 20 perturbed networks with AP failure rates adjusted for the case of single model neurons with 50% of the segments demyelinated along the axons by removing 60% of the myelin lamellae (we ran 280 trials for each network). Then, we added random delays uniformly distributed with a minimum value of 0 ms in both cases, a maximum value of 100 ms in the control networks, and maximum values of 40 ms and 85 ms in the perturbed networks, in both the AMPA and NMDA excitatory connections to both E and I neurons (Supplementary Figure 6). These large values were chosen because we wanted to illustrate the potential effect of CV slowing in our network and smaller, more realistic, values did not have any effect.
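
      The delay manipulation described above can be sketched in a few lines. This is a simplified illustration rather than the simulation code: the function names and the number of connections are hypothetical, and only the uniform distribution, the 0 ms minimum and the 40 ms maximum come from the text.

```python
import random

def sample_delays(n_connections, max_delay_ms, rng, min_delay_ms=0.0):
    """Draw one synaptic delay per excitatory connection, uniform on [min, max]."""
    return [rng.uniform(min_delay_ms, max_delay_ms) for _ in range(n_connections)]

def arrival_times(spike_time_ms, delays_ms):
    """Time at which a presynaptic spike reaches each postsynaptic target."""
    return [spike_time_ms + d for d in delays_ms]

rng = random.Random(7)
# 1000 connections is an arbitrary choice for this example.
delays = sample_delays(1000, max_delay_ms=40.0, rng=rng)
arrivals = arrival_times(250.0, delays)
print(f"mean added delay: {sum(delays) / len(delays):.1f} ms")
```

      In a full network simulation the same sampled delays would be applied to both the AMPA and NMDA components of each excitatory connection.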

      (6) "We also sought to reveal the effect on working memory performance of more biologically realistic network models with AP transmission probabilities matched to both axons with intact and with altered myelin sheaths, as likely occurs in the aging brain (Figure 1). Thus, we ran network model simulations combining AP failure probabilities corresponding to groups of neurons containing intact axons and axons presenting different degrees of demyelination." I fail to see the difference with respect to the results in previous sections. Is it that now we have subnetworks in which axons are intact and subnetworks with significant AP failures, while before there was no topological separation between both cases? Please clarify.

      In Figures 5 and 6 the AP failure rate of the neural population in the network simulations was matched to the AP failure rate of the cohort of single model neurons for each demyelination/remyelination condition. Since not all model neurons have equal features, a given condition produces different levels of impairment in each neuron. Thus, we quantified the probability of AP failure for each neuron in the control cohort, as well as the percentage of those neurons that shared the same probabilities of failure. Then, we computed the probability of AP transmission for the corresponding percentages of excitatory neurons in the networks. Thus, in the network model, we took into account the heterogeneity observed in the single neuron model under each demyelination/remyelination condition.

      However, in Figures 7 and 8, we consider additional heterogeneity due to different degrees of demyelination/remyelination in different neurons. Here, excitatory neurons in the network model are not perturbed according to a single demyelination/remyelination condition. Instead, we allowed different percentages of excitatory neurons to have AP failure rates corresponding to different demyelination/remyelination conditions: some were unperturbed, while others had different degrees of demyelination (Figure 7) and different degrees of remyelination (Figure 8). We have modified the text for clarification in several places.

      First, when we describe the impact of demyelination on working memory, we already mention that (line 271): “In each of the 10 networks, we set the AP failure rate of the excitatory neurons according to the distribution of failure probabilities of the neurons in the single neuron cohort for the given demyelination or remyelination condition. Thus, we took into account the heterogeneity of demyelination and remyelination effects from our single neuron cohort (Figure 3A; Supplementary Figure 3). Note that this heterogeneity originates from differences in axon properties, but probabilities of failure for all neurons in the network correspond to the same degree of demyelination (Figure 6). We will also consider networks that contain different combinations of axons with either intact or perturbed myelin (Figure 7 and Figure 8).”

      Second, we have combined the text describing Figures 7 and 8 under a single section title, which reads “Simulated heterogenous myelin alterations match empirical data” (line 334) and start this section with (line 337): “Up to this point we have studied network models with AP failure probabilities corresponding to a single degree of myelin alterations (i.e., with all excitatory neurons in the network having AP failure rates matched to those of the single neuron cohort for one particular demyelination or remyelination condition). Next, we sought to reveal the effect on working memory performance of more biologically realistic network models, where excitatory neurons in the networks were perturbed according to a combination of different demyelination or remyelination conditions. That is, we simulated networks with excitatory neurons having AP failure probabilities matched to both neuronal axons with intact and with altered myelin sheaths in different degrees, as likely occurs in the aging brain (Figure 1).”

      (7) "Unexpectedly, our model indicates that compared to the performance of networks composed of neurons possessing axons with intact myelin sheaths, both demyelination and remyelination leads to an impaired performance." This conclusion is quite interesting, but I lack intuition from the paper as of why it is happening. In fact, the authors say in the Discussion that "complete remyelination of all the previously demyelinated segments with sufficient myelin, with fewer transitions between long and short segments, recovered working memory function." Would we then see a minimum and then an increase in memory duration in Figure 9B if we extended the X-axis until we hit 100% of new myelin sheaths?

      This is a very important question that we have carefully addressed in Results and Discussion. We distinguish between two remyelination cases in the models. Complete remyelination: when all (100%) the previously demyelinated segments have been subsequently remyelinated, and incomplete remyelination: when less than 100% (25%, 50% or 75%) of the demyelinated segments have been remyelinated. Figure 6 (middle and right columns) shows the two cases (black lines for any percentage of lamellae added vs. colored lines): for 100% of the segments remyelinated, the network performance is nearly or completely (when enough lamellae are added) recovered to the young network performance. In fact, with the single neuron model we observe that (lines 192 - 193 in Results): “Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%)”. However, incomplete remyelination recovers the performance compared to demyelination (middle and right columns in Figure 6 vs left column), but this performance is worse than the performance of the young networks. The single neuron model shows that (lines 194 - 197 in Results): “Incomplete remyelination, where some segments were still demyelinated, still had relatively high AP failure rates. For example, when one eighth of segments were remyelinated with the maximal amount of lamellae and one eighth were left bare, 25.7 ± 11.5% of APs failed across the cohort (Figure 4C, red dashed line and arrow).”

      In Figure 9B (now Figure 8B), we combine intact axons with axons that are only partially remyelinated (i.e., incomplete remyelination). Extending the X-axis in Figure 8B until 100% of new myelin sheaths would not imply a minimum and a subsequent increase, but a continuous impairment: the more axons we perturb (remyelinate) the higher is the impairment compared to the young cases where all the axons are intact.

      The sentence "Unexpectedly, our model indicates that compared to the performance of networks composed of neurons possessing axons with intact myelin sheaths, both demyelination and remyelination leads to an impaired performance.", now reads as (lines 379 - 380 in Results): “Therefore, both demyelination and incomplete remyelination lead to impaired performance in our networks, compared to networks with intact myelin sheaths”. We have also rewritten the corresponding section in Discussion (lines 486 - 489) as follows: “Therefore, it is reasonable to assume that ineffective remyelination may lead to working memory impairment. In fact, complete remyelination of all previously demyelinated segments with sufficient myelin, with fewer transitions between long and short segments, led to full recovery of working memory function.”

      (8) [minor] "Our recent network model found that age-related changes in firing rates and synapse numbers in individual neurons can lead to working memory impairment (Ibañez et al., 2020), but did not consider myelin dystrophy." Could you be more precise about which age-related changes were studied in Ibanez et al. 2020? From the paper it seems like it was mostly cellular excitability and synaptic density, so this should be added here for more context.

      To clarify this, we have added the following sentences in the Introduction (line 105):

      “Our recent network model revealed that the empirically observed age-related increase in AP firing rates in prefrontal pyramidal neurons (modeled through an increased slope of the f-I curve) and loss of up to 30% of both excitatory and inhibitory synapses (modeled as a decrease in connectivity strength) can lead to working memory impairment (Ibañez et al., 2020), but this model did not incorporate the known changes to myelin structure that occur during normal aging.”

      (9) [minor] "Recurrent excitatory synapses are facilitating, which promotes robust and reliable persistent activity despite spatial heterogeneities in the connectivity or in the intrinsic properties of the neurons." It would be great to add a reference here to justify the inclusion of this type of plasticity in the excitatory circuit (for example Wang, Markram et al. Nat Neuro 2006).

      We have added the references suggested by the reviewer and a further one in the Results (line 216):

      “Recurrent excitatory synapses are facilitating, as has been empirically observed in PFC (Hempel et al., 2000; Wang et al., 2006), which promotes robust and reliable persistent activity despite spatial heterogeneities in the connectivity or in the intrinsic properties of the neurons.”

      References:

      Hempel, C. M., Hartman, K. H., Wang, X. J., Turrigiano, G. G., and Nelson, S. B. (2000). Multiple forms of short-term plasticity at excitatory synapses in rat medial prefrontal cortex. J. Neurophysiol. 83, 3031–3041. doi: 10.1152/jn.2000.83.5.3031

      Wang, Y., Markram, H., Goodman, P. H., Berger, T. K., Ma, J., and Goldman-Rakic, P. S. (2006). Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat. Neurosci. 9, 534–542. doi: 10.1038/nn1670

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript examines the contribution of the dorsal and intermediate hippocampus to goal-directed navigation in a wide virtual environment where visual cues are provided by the scenery on the periphery of a wide arena. Among a choice of 2 reward zones located near the arena periphery, rats learn to navigate from the center of the arena to the reward zone associated with the highest reward. Navigation performance is largely assessed from the rats' body orientation when they leave the arena center and when they reach the periphery, as well as the angular mismatch between the reward zone and the site rats reach the periphery. Muscimol inactivation of the dorsal and intermediate hippocampus alters rat navigation to the reward zone, but the effect was more pronounced for the inactivation of the intermediate hippocampus, with some rat trajectories ending in the zone associated with the lowest reward. Based on these results, the authors suggest that the intermediate hippocampus is critical, especially for navigating to the highest reward zone.

      Strengths:

      - The authors developed an effective approach to study goal-directed navigation in a virtual environment where visual cues are provided by the peripheral scenery.

      - In general, the text is clearly written and the figures are well-designed and relatively straightforward to interpret, even without reading the legends.

      - An intriguing result, which would deserve to be better investigated and/or discussed, was that rats tended to rotate always in the counterclockwise direction. Could this be because of a hardware bias making it easier to turn left, some aspect of the peripheral landscape, or a natural preference of rats to turn left that is observable (or reported) in a real environment?

      Thank you for the insightful question. As the reviewer mentioned, the counterclockwise rotation behavior was intriguing and unexpected. To answer the reviewer’s question properly, we examined whether such stereotypical turning behavior appeared before the rats acquired the task rule and reward zones in the pre-surgical training phase of the task. Data from the last day of shaping and the first day of the pre-surgical main task showed no significant difference in the number of trials in which the first body-turn was either clockwise or counterclockwise, suggesting that the rats did not have a bias toward a specific side (p=0.46 for Shaping; p=0.76 for the Main task, Wilcoxon signed-rank test). These results excluded the possibility that there was something in the apparatus's hardware that made the rats turn only to the left. Also, since we used the same peripheral landscape for the shaping and main task, we could assume that the peripheral landscape did not cause movement bias.

      Author response image 1.

      Although it remains inconclusive, we have noticed that some prior studies alluded to a phenomenon similar to this issue, framed as the topic of lateralization or spatial preference by comparing left and right biases. For example, Whishaw et al. (1992) suggested that there was natural lateralization in rats (“Most of the rats displayed either a strong right limb bias or a strong left limb bias.”) but no dominance of a specific side. Andrade et al. (2001) also claimed that “83% of Wistar rats spontaneously showed a clear preference for left or right arms in the T-maze.” However, to the best of our knowledge, there has been no direct evidence that rats have a dominant natural preference for only one side.

      Therefore, while the left-turning behavior remains an intriguing topic for further investigation, we find it difficult to pinpoint the reason behind the behavior in the current study. However, we would like to emphasize that this behavior did not interrupt testing our hypothesis. Nonetheless, we agree with the reviewer’s point that the counterclockwise rotation needs to be discussed more, so we revised the manuscript as follows:

      “To rule out the potential effect of hardware bias or any particular aspect of peripheral landscape to make rats turn only to one side, we measured the direction of the first body-turn in each trial on the last day of shaping and the first day of the main task (i.e., before rats learned the reward zones). There was no significant difference between the clockwise and counterclockwise turns (p=0.46 for shaping, p=0.76 for main task; Wilcoxon signed-rank test), indicating that the stereotypical pattern of counterclockwise body-turn appeared only after the rats learned the reward locations.” (p.6)
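
      For readers unfamiliar with the test, the paired comparison of clockwise and counterclockwise first turns can be sketched as follows. This is an exact two-sided Wilcoxon signed-rank test written with the standard library only; the per-rat counts are hypothetical placeholders, and whether the original analysis used an exact or an approximate p-value is not stated here.

```python
from itertools import product

def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (W_plus, p_value). Suitable for small samples only,
    because all 2**n sign patterns are enumerated.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    total = sum(ranks)
    w_obs = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Hypothetical per-rat trial counts (not data from the study).
clockwise = [19, 22, 18, 21, 20, 23]
counterclockwise = [21, 18, 20, 19, 22, 17]
w_plus, p_value = wilcoxon_signed_rank_exact(clockwise, counterclockwise)
print(f"W+ = {w_plus}, p = {p_value:.3f}")
```

      A large p-value in such a comparison would indicate no detectable turning bias before learning, which is the pattern reported in the quoted passage.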

      - Another interesting observation, which would also deserve to be addressed in the discussion, is the fact that dHP/iHP inactivations produced to some extent consistent shifts in departing and peripheral crossing directions. This is visible from the distributions in Figures 6 and 7, which still show a peak under muscimol inactivation, but this peak is shifted to earlier angles than the correct ones. Such change is not straightforward to interpret, unlike the shortening of the mean vector length.

      Maybe rats under muscimol could navigate simply by using the association of reward zone with some visual cues in the peripheral scene, in brain areas other than the hippocampus, and therefore stopped their rotation as soon as they saw the cues, a bit before the correct angle. In contrast, with their hippocampus intact, rats could estimate precisely the spatial relationship between the reward zone and visual cues.

      We agree with the possibility suggested by the reviewer. However, although not described in the original manuscript, we performed several different control experiments in a few rats using various visual stimulus manipulations to test how their behaviors change as a result. One of the experiments was the landmark omission test, where one of the landmarks was omitted. The landmark to be omitted was pseudorandomly selected on a trial-by-trial basis. We observed that the omission of one landmark, regardless of its identity, did not cause a specific behavioral change in finding the reward zones, suggesting that the rats were not relying on a single visual landmark when finding the reward zone.

      Author response image 2.

      Therefore, it is unlikely that rats used the spatial relationship between the reward zone and a specific visual cue to solve the task in our study. However, the result was based on an insufficient sample size (n=3), not permitting any meaningful statistical testing. Thus, we have now updated this information in the manuscript as an anecdotal result as follows:

      “Additionally, to investigate whether the rats used a certain landmark as a beacon to find the reward zones, we conducted the landmark omission test as a part of control experiments. Here, one of the landmarks was omitted, and the landmark to be omitted was pseudorandomly selected on a trial-by-trial basis. The omission of one landmark, regardless of its identity, did not cause a specific behavioral change in finding the reward zones, suggesting that the rats were not relying on a single visual landmark when finding the reward zones. The result can be reported only anecdotally because of an insufficient sample size (n=3), which does not permit any meaningful statistical testing.” (p.9)

      Weaknesses:

      - I am not sure that the differential role of dHP and iHP for navigation to high/low reward locations is supported by the data. The current results could be compatible with iHP inactivation producing a stronger impairment on spatial orientation than dHP inactivation, generating more erratic trajectories that crossed by chance the second reward zone.

      To make the point that iHP inactivation affects the disambiguation of high and low reward locations, the authors should show that the fraction of trajectories aiming at the low reward zone is higher than expected by chance. Somehow we would expect to see a significant peak pointing toward the low reward zone in the distribution of Figures 6-7.

      We thank the reviewer for the valuable comments. We agree that it is difficult to rigorously distinguish the loss of value representation from spatial disorientation in our experiment. Since the trial ended once the rat touched either reward zone, it was difficult to specify whether they intended to arrive at the location or just moved randomly and arrived there by chance. Moreover, it is possible that the drug infusion did not completely inactivate the iHP but only partially did so.

      To investigate this issue further, we checked whether the distribution of the departure direction (DD) differed between the trials in which rats initially headed north (NW, N, NE) and south (SE, S, SW) at the start. In the manuscript, we demonstrated that DD aligned with the high-value zone, indicating that the rat remembered the scenes associated with the high-value zone (p.8). Given the rats’ characteristic counterclockwise rotation, the first reward zone a rat would face when starting while heading north would be the high-value zone. On the other hand, the rat would first face the low-value reward zone when starting while heading south. In this case, normal rats would refrain from leaving the start zone and rotate further until they faced the high-value zone before finally departing the start location. If the iHP inactivation caused a more severe impairment in spatial orientation but not in value representation, it is likely that the iHP-inactivated rats in both north- and south-starting trials would behave similarly to the dHP-inactivated rats, but with a larger deviation from the high-value zone. However, if the iHP inactivation affected the disambiguation of high and low reward locations, north- and south-starting trials would show different DD distributions.

      The circular plots shown below are the DD distributions of dMUS and iMUS. We could see that when they started facing north, iHP-inactivated rats still aligned themselves towards the high-value zone and thus remained spatially oriented, similar to the dHP inactivation session. However, in the south-starting trials, the DD distribution was completely different from the north-starting trials; the rats failed in body alignment towards the high-value zone. Instead, they departed the start point while heading south in most trials. This pattern was not seen in dMUS sessions, even in their south-starting trials, illustrating the distinct deficit caused by iHP inactivation. Additionally, most of the rats with iHP inactivation visited the low-value zone more in south-headed starting trials than in the north-headed trials, except for one rat.

      Author response image 3.
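
      The comparison of north- and south-starting trials can be summarized with standard circular statistics. The sketch below groups trials by start heading and computes the mean direction and mean vector length of the departure directions; the trial tuples and function names are hypothetical, and the angle convention (degrees, 0 to 360) is an assumption.

```python
import math

NORTH_STARTS = {"NW", "N", "NE"}
SOUTH_STARTS = {"SE", "S", "SW"}

def mean_vector(angles_deg):
    """Return (mean vector length, mean direction in degrees) of circular data."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    length = math.hypot(c, s)  # 1 = perfectly aligned, 0 = no common direction
    direction = math.degrees(math.atan2(s, c)) % 360
    return length, direction

def split_by_start(trials):
    """Group departure directions by start heading; other headings are ignored."""
    north = [dd for start, dd in trials if start in NORTH_STARTS]
    south = [dd for start, dd in trials if start in SOUTH_STARTS]
    return north, south

# Hypothetical trials: (start heading, departure direction in degrees).
trials = [("N", 40.0), ("NE", 55.0), ("NW", 35.0),
          ("S", 190.0), ("SE", 60.0), ("SW", 300.0)]
north_dd, south_dd = split_by_start(trials)
for label, dd in (("north", north_dd), ("south", south_dd)):
    length, direction = mean_vector(dd)
    print(f"{label}-starting: mean direction {direction:.0f} deg, length {length:.2f}")
```

      A short mean vector in one group but not in the other would correspond to the dissociation described above, where south-starting trials lose their alignment with the high-value zone.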

      Furthermore, we would like to clarify that we do not limit the effect of iHP inactivation to the impairment in distinguishing the high and low reward zones. It is possible that iHP inactivation resulted in the loss of a global value-representing map, leading to the impairment in distinguishing both reward zones from other non-rewarded areas in the environment. Figures 6 and 7 hint at this possibility by showing that the peaks are not restricted to the reward zones. Unfortunately, we cannot rigorously address this in the current study because of the limitations of our experimental design mentioned above.

      Nonetheless, we agree with the reviewer that this limitation needs to be addressed, so we have now noted in the Limitations section (p.21) that further investigation is needed to clarify what causes the behavioral change after iHP inactivation.

      Reviewer #2 (Public Review):

      Summary:

      The aim of this paper was to elucidate the role of the dorsal HP and intermediate HP (dHP and iHP) in value-based spatial navigation through behavioral and pharmacological experiments using a newly developed VR apparatus. The authors inactivated dHP and iHP by muscimol injection and analyzed the differences in behavior. The results showed that dHP was important for spatial navigation, while iHP was critical for both value judgments and spatial navigation. The present study developed a new sophisticated behavioral experimental apparatus and proposed a behavioral paradigm that is useful for studying value-dependent spatial navigation. In addition, the present study provides important results that support previous findings of differential function along the dorsoventral axis of the hippocampus.

      Strengths:

      The authors developed a VR-based value-based spatial navigation task that allowed separate evaluation of "high-value target selection" and "spatial navigation to the target." They were also able to quantify behavioral parameters, allowing detailed analysis of the rats' behavioral patterns before and after learning or pharmacological inactivation.

      Weaknesses:

      Although differences in function along the dorsoventral axis of the hippocampus is an important topic that has received considerable attention, differences in value coding have been shown in previous studies, including the work of the authors; the present paper is an important study that supports previous studies, but the novelty of the findings is not that high, as the results are from pharmacological and behavioral experiments only.

      We appreciate the reviewer's insightful comments. In response, we would like to emphasize that a very limited number of studies investigated the function of the intermediate hippocampus, especially in spatial memory tasks. We tested the differential functions of the dorsal and intermediate hippocampus using a within-animal design and used reversible inactivation manipulation (i.e., muscimol injection) to prevent potential compensation by other brain regions when using irreversible manipulation techniques (i.e., lesion). Also, very few studies have analyzed the navigation trajectories of animals as closely as in the current study. We emphasize the novelty of our study by comparing it with prior studies, as shown below in Table 1.

      Author response table 1.

      Comparison of our study with prior studies

      Moreover, to the best of our knowledge, the current manuscript is the first to investigate the hippocampal subregions along the long axis in a VR environment using a hippocampal-dependent spatial memory task. Nonetheless, we agree that the current study has a limitation as a behavior-only experiment. We have now added a comment in the Limitations section (p.21) on how other techniques, such as electrophysiology, could extend our findings.

      Reviewer #3 (Public Review):

      Summary:

      The authors established a new virtual reality place preference task. On the task, rats, which were body-restrained on top of a moveable Styrofoam ball and could move through a circular virtual environment by moving the Styrofoam ball, learned to navigate reliably to a high-reward location over a low-reward location, using allocentric visual cues arranged around the virtual environment.

      The authors also showed that functional inhibition by bilateral microinfusion of the GABA-A receptor agonist muscimol, which targeted the dorsal or intermediate hippocampus, disrupted task performance. The impact of functional inhibition targeting the intermediate hippocampus was more pronounced than that of functional inhibition targeting the dorsal hippocampus.

      Moreover, the authors demonstrated that the same manipulations did not significantly disrupt rats' performance on a virtual reality task that required them to navigate to a spherical landmark to obtain reward, although there were numerical impairments in the main performance measure and the absence of statistically significant impairments may partly reflect a small sample size (see comments below).

      Overall, the study established a new virtual-reality place preference task for rats and established that performance on this task requires the dorsal to intermediate hippocampus. They also established that task performance is more sensitive to the same muscimol infusion (presumably - doses and volumes used were not clearly defined in the manuscript, see comments below) when the infusion was applied to the intermediate hippocampus, compared to the dorsal hippocampus, although this does not offer strong support for the authors' claim that dorsal hippocampus is responsible for accurate spatial navigation and intermediate hippocampus for place-value associations (see comments below).

      Strengths:

      (1) The authors established a new place preference task for body-restrained rats in a virtual environment and, using temporary pharmacological inhibition by intra-cerebral microinfusion of the GABA-A receptor agonist muscimol, showed that task performance requires dorsal to intermediate hippocampus.

      (2) These findings extend our knowledge about place learning tasks that require dorsal to intermediate hippocampus and add to previous evidence that, for some place memory tasks, the intermediate hippocampus may be more important than other parts of the hippocampus, including the dorsal hippocampus, for goal-directed navigation based on allocentric place memory.

      (3) The hippocampus-dependent task may be useful for future recording studies examining how hippocampal neurons support behavioral performance based on place information.

      Weaknesses:

      (1) The new findings do not strongly support the authors' suggestion that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      The authors base this claim on the differential effects of the dorsal and intermediate hippocampal muscimol infusions on different performance measures. More specifically, dorsal hippocampal muscimol infusion significantly increased perimeter crossings and perimeter crossing deviations, whereas dorsal infusion did not significantly change other measures of task performance, including departure direction and visits to the high-value location. However, these statistical outcomes offer only limited evidence that dorsal hippocampal infusion specifically affected the perimeter crossing, without affecting the other measures. Numerically the pattern of infusion effects is quite similar across these various measures: intermediate hippocampal infusions markedly impaired these performance measures compared to vehicle infusions, and the values of these measures after dorsal hippocampal muscimol infusion were between the values in the intermediate hippocampal muscimol and the vehicle condition (Figures 5-7). Moreover, I am not so sure that the perimeter crossing measures really reflect distinct aspects of navigational performance compared to departure direction and hit rate, and, even if they did, which aspects this would be. For example, in line 316, the authors suggest that 'departure direction and PCD [perimeter crossing deviation] [are] indices of the effectiveness and accuracy of navigation, respectively'. However, what do the authors mean by 'effectiveness' and 'accuracy'? Accuracy typically refers to whether or not the navigation is 'correct', i.e. how much it deviates from the goal location, which would be indexed by all performance measures.

      So, overall, I would recommend toning down the claim that the findings suggest that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      The reviewer mentioned that the statistical outcomes offer limited evidence as the dHP inactivation results were always positioned between the results of the iHP inactivation and controls. However, we would like to emphasize that, projecting to each other, the two subregions are not completely segregated anatomically. It is highly likely this is also true functionally and there should be some overlap in their roles. Considering such relationships between the dHP and iHP, it could be natural to see an intermediate effect after inactivating the dHP, and that is why we focused on the “magnitude” of behavioral changes after inactivation instead of complete dissociation between the two subregions in our manuscript. Unfortunately, because of the nature of the drug infusion study, further dissociation would be difficult, requiring further investigation with different experimental techniques, such as physiological examinations of the neural firing patterns between the two regions. We mentioned this caveat of the current study in the Limitations as follows:

      “However, our study includes only behavioral results and further mechanistic explanations as to the processes underlying the behavioral deficits require physiological investigations at the cellular level. Neurophysiological recordings during VR task performance could answer, for example, the questions such as whether the value-associated map in the iHP is built upon the map inherited from the dHP or it is independently developed in the iHP.” (p.21)

      Regarding the reviewer’s comment on the meaning of measuring the perimeter crossing directions, we would like to draw the reviewer’s attention to the individual trajectories during the iMUS sessions described in Figure 5. Particularly when they were not confident with the location of the higher reward, rats changed their heading directions during the navigation, which resulted in a less efficient route to the goal location. Rats showing this type of behavior tended to hit the perimeter of the arena first before correcting their routes toward the goal zone. In contrast, rats showing effective navigation hardly bumped into the wall or perimeter before hitting the goal zone. Thus, their PCDs matched DDs almost always. When considered together with DD, our PCD measure could tell whether rats not hitting the goal zone directly after departure were impaired in either maintaining the correct heading direction to the goal zone at the start location or orienting themselves to the target zone accurately from the start. Our results suggest that the latter is the case. We included the relevant explanation in the Discussion section as follows:

      “Particularly, rats changed their heading directions during the navigation when they were not confident with the location of the higher reward, resulting in a less efficient route to the goal location. Rats showing this type of behavior tended to hit the perimeter of the arena first before correcting their routes. Therefore, when considered together with DD, our PCD measure could tell that the rats not hitting the goal zone directly after departure were impaired in orienting themselves to the target zone accurately from the start, not in maintaining the correct heading direction to the goal zone at the start location.” (p.19)

      Nonetheless, we agree with the reviewer that the term ‘accuracy’ might be confusing with performance accuracy, so we replaced the term with ‘precision’ throughout the manuscript, referring to the precise targeting of the reward zones.

      (2) The claim that the different effects of intermediate and dorsal hippocampal muscimol infusions reflect different functions of intermediate and dorsal hippocampus rests on the assumption that both manipulations inhibit similar volumes of hippocampal tissue to a similar extent, but at different levels along the dorso-ventral axis of the hippocampus. However, this is not a foregone conclusion (e.g., drug spread may differ depending on the infusion site or drug effects may differ due to differential expression of GABA-A receptors in the dorsal and intermediate hippocampus), and the authors do not provide direct evidence for this assumption. Therefore, a possible alternative account of the weaker effects of dorsal compared to intermediate hippocampal muscimol infusions on place-preference performance is that the dorsal infusions affect less hippocampal volume or less markedly inhibit neurons within the affected volume than the intermediate infusions. I would recommend that the authors briefly consider this issue in the discussion. Moreover, from the Methods, it is not clear which infusion volume and muscimol concentration were used for the different infusions (see below, 4.a.), and this must be clarified.

      We appreciate these insightful comments from the reviewer and agree that we do not provide direct evidence for the point raised by the reviewer. To the best of our knowledge, most of the behavioral studies on the long axis of the hippocampus did not particularly address the differential expression of GABA-A receptors along the axis. We could not find any literature that specifically introduced and compared the levels of expression of GABA-A receptors or the diffusion range of muscimol in the intermediate hippocampus to the other subregions. However, we found that Sotiriou et al. (2005) made such comparisons with respect to the expression of different GABA-A receptors. They concluded that the dorsal and ventral hippocampi have different levels of the GABA-A receptor subtypes. The α1/β2/γ2 subtype was dominant in the dorsal hippocampus, while the α2/β1/γ2 subtype was prevalent in the ventral hippocampus. Sotiriou and colleagues also mentioned the lower affinity of GABA-A receptor binding in the ventral hippocampus, and this result is consistent with the Papatheodoropoulos et al. (2002) study that showed a weaker synaptic inhibition in the ventral hippocampus compared to the dorsal hippocampus. Papatheodoropoulos et al. speculated that differences in GABA receptors are one of the potential causes underlying the differential synaptic inhibition between the dorsal and ventral hippocampal regions. Based on these findings, the same volume of muscimol is more likely to cause a more severe effect on the ventral hippocampus than the dorsal hippocampus. Therefore, we do not believe that the less significant changes after the dorsal hippocampal inactivation were induced by the expression level of GABA-A receptors. Additionally, we have demonstrated in our previous studies that muscimol injections in the dorsal hippocampus impair performance to the chance level in scene-based behavioral tasks (Lee et al., 2014; Kim et al., 2012).

      Nonetheless, following the reviewer’s suggestion, we now mention in the Discussion the possibility that GABA-A receptor expression differs between the two target regions, as follows:

      “Although there is still a possibility that the levels of expression of GABA-A receptors might be different along the longitudinal axis of the hippocampus, …” (p.20)

      Regarding the drug infusion volume and concentration, we included these details in the Methods. Please see our detailed response to 4.a. below.

      (3) It is good that the authors included a comparison/control study using a spherical beacon-guided navigation task, to examine the specific psychological mechanisms disrupted by the hippocampal manipulations. However, as outlined below (4.b.), the sample size for the comparison study was lower than for the main study, and the data in Figure 8 suggest that the comparison task may be affected by the hippocampal manipulations similarly to the place-preference task, albeit less markedly. This would raise the question as to which mechanisms that are common to the two tasks may be affected by hippocampal functional inhibition, which should be considered in the discussion.

      The sample size for the object-guided navigation task was smaller because we initially did not plan the experiment, but later in the study decided to conduct the control test. Therefore, the object-guided navigation task was added to the study design after finishing the first three rats, resulting in a smaller sample size than the place preference task. We included this detail in the manuscript, as follows:

      “Note the smaller sample size in the object-guided navigation task. This was because the task was later added to the study design.” (p.24)

      Regarding the mechanism behind the two different tasks, we did not perform the same heading direction analysis here as in the place preference task because the two tasks have different characteristics such as task complexity. The object-guided navigation task is somewhat similar to the visually guided (or cued) version of the water maze task, which is widely known as hippocampal-independent (Morris et al., 1986; Packard et al., 1989; also see our descriptions on p.15). Therefore, we would argue that the two tasks (i.e., place preference task and object-guided navigation task) used in the current manuscript do not share neural mechanisms in common. Additionally, we confirmed that several behavioral measurements related to motor capacity, such as travel distance and latency, along with the direct hit proportion provided in Figure 8, did not show any statistically significant changes across drug conditions.

      4. Several important methodological details require clarification:

      a. Drug infusions (from line 673):

      - '0.3 to 0.5 μl of either phosphate-buffered saline (PBS) or muscimol (MUS) was infused into each hemisphere'; the authors need to clarify when which infusion volume was used and why different infusion volumes were used.

      We thank the reviewer for carefully reading our manuscript. We were cautious about side effects, such as suppressed locomotion or overly aggressive behavior, since the iHP injection site was close to the ventricle. We were keenly aware that the intermediate to ventral hippocampal regions are sensitive to the drug dosage from our previous experiments. Thus, we observed the rat’s behavior for 20 minutes after drug injection in a clean cage. We started from 0.5 μl, based on our previous study, but if the injected rat showed any sign of side effects in the cage, we stopped the experiment for the day and tried with a lower dosage (i.e., 0.4 μl first, then 0.3 μl, etc.) until we found the right dosage under which the rat did not show any side effect. This procedure is necessary because cannula tip positions are slightly different from rat to rat. When undergoing this procedure, five out of eight rats received 0.4 μl, two received 0.3 μl, and one received 0.5 μl. Still, there was no significant difference in performance, including the high-value visit percentage, departing and perimeter crossing directions, across all dosages. This information is now added in the Methods section as follows:

      “If the rat showed any side effect, particularly sluggishness or aggression, we reduced the drug injection amount in the rat by 0.1 μl until we found the dosage with which there was no visible side effect. As a result, five of the rats received 0.4 μl, two received 0.3 μl, and one received 0.5 μl.” (p.25)

      - I could not find the concentration of the muscimol solution that was used. The authors must clarify this and also should include a justification of the doses used, e.g. based on previous studies.

      Thank you for the suggestion. We used a drug concentration of 1 mg/ml, which was adapted from our previous muscimol studies (Lee et al., 2014; Kim et al., 2012). The manuscript is now updated, as follows:

      “…or muscimol (MUS; 1 mg/ml, dissolved in saline) was infused into each hemisphere via a 33-gauge injection cannula at an injection speed of 0.167 μl/min, based on our previous study (Lee et al., 2014; Kim et al., 2012).” (p.25)

      -  Please also clarify if the injectors and dummies were flush with the guides or by which distance they protruded from the guides.

      The injection and dummy cannula both protruded from the guide cannula by 1 mm, and this information is now added to the Methods section, as follows:

      “The injection cannula and dummy cannula extended 1 mm below the tip of the guide cannula.” (p.25)

      b. Sample sizes: The authors should include sample size justifications, e.g. based on considerations of statistical power, previous studies, practical considerations, or a combination of these factors. Importantly, the smaller sample size in the control study using the spherical beacon-guided navigation task (n=5 rats) limits comparability with the main study using the place-preference task (n=8). Numerically, the findings on the control task (Figure 8) look quite similar to the findings on the place-preference task, with intermediate hippocampal muscimol infusions causing the most pronounced impairment and dorsal hippocampal muscimol infusions causing a weaker impairment. These effects may have reached statistical significance if the same sample size had been used as in the place-preference study.

      We set the current sample size for several reasons. First, based on our previous studies, we assumed that eight, or at least six, animals would be enough to achieve sufficient statistical power in a “within-animal design” study. Also, considering the ethical commitments, we tried to keep the number of animals used in the study to a minimum. Last, our paradigm required very long training periods (3 months on average per animal), so we could not increase the sample size for practical reasons. Regarding the reasons for the smaller sample size for the object-guided navigation task, please see our response to point 3 above. The manuscript is now revised as follows:

      “Based on our prior studies (Park et al., 2017; Yoo and Lee, 2017; Lee et al., 2014), the sample size of our study was set to the least number to achieve the necessary statistical power in the current within-subject study design for ethical commitments and practical considerations (i.e., relatively long training periods).” (p.22)

      c. Statistical analyses: Why were the data of the intermediate and dorsal hippocampal PBS infusion conditions averaged for some of the analyses (Figure 5; Figure 6B and C; Figure 7B and C; Figure 8B) but not for others (Figure 6A and Figure 7A)?

      The reviewer is correct that we only illustrated the separate dPBS and iPBS data for Figures 6A and 7A. Since the directional analysis is the main focus of the current manuscript, we tried to provide better visualization and more detailed examples of how the drug infusion changed the behavioral patterns between the PBS and MUS conditions in each region. Except for the visualization of DD and PCD, we averaged the PBS sessions to increase statistical power, as described on p.9. We added a detailed description of the reasons for illustrating dPBS and iPBS data separately in the manuscript, as follows:

      “Note that dPBS and iPBS sessions were separately illustrated here for better visualization of changes in the behavioral pattern for each subregion.” (p.12)

      Reviewing Editor (Recommendations For The Authors):

      The strength of evidence rating in the assessment is currently noted as "incomplete." This can be improved following revisions if you amend your conclusions in the paper, including in the title and abstract, such that the paper's major conclusions more closely match what is shown in the Results.

      Following the suggestions of the reviewing editor, we have mentioned the caveats of our study in the Limitations section of our revised manuscript (p.21). In addition, the manuscript has been revised so that the conclusions in the paper more closely match the experimental results, as can be seen in some of the relevant sentences in the abstract and main text as follows:

      “Inactivation of both dHP and iHP with muscimol altered efficiency and precision of wayfinding behavior, but iHP inactivation induced more severe damage, including impaired place preference. Our findings suggest that the iHP is more critical for value-dependent navigation toward higher-value goal locations.” (Abstract; p.2)

      “Whereas inactivation of the dHP mainly affected the precision of wayfinding, iHP inactivation impaired value-dependent navigation more severely by affecting place preference.” (p.5)

      “The iHP causes more damage to value-dependent spatial navigation than the dHP, which is important for navigational precision” (p.12)

      However, we have not changed the title of the manuscript, as it accurately conveys the message of this study.

      Reviewer #1 (Recommendations For The Authors):

      - What were the dimensions of the environment? What distance did rats typically run to reach the reward zone? A scale bar would be helpful in Figure 1.

      We used the same circular arena from the shaping session, which was 1.6 meters in diameter (p.23), and the shortest path between the start location and either reward zone was 0.62 meters. We revised the manuscript for clarification as follows:

      “For the pre-training session, rats were required to find hidden reward zones…, on the same circular arena from the shaping session.” (p.23)

      “Therefore, the shortest path length between the start position and the reward zone was 0.62 meters.” (p.23)

      We also added a scale bar in Figure 1C for a better understanding.

      - Line 169: "The scene rotation plot covers the period from the start of the trial to when the rat leaves the starting point at the center and the departure circle (Figure 2B)."

      The sentence is unclear. Maybe it should be "... from the start of the trial to when the rat leaves the departure circle”.

      The sentence has been revised following the reviewer's suggestion. (p.7)

      - Line 147: "First, they learned to rotate the spherical treadmill counterclockwise to move around in the virtual environment (presumably to perform energy-efficient navigation)."

      It is not clear from this sentence if rats naturally preferred the counterclockwise direction or if the counterclockwise direction was a task requirement.

      We now clarified in our revised manuscript that it was not a task requirement to turn counterclockwise, as follows:

      “First, although it was not required in the task, they learned to rotate the spherical treadmill counterclockwise…” (p.6)

      - Line 149: "Second, once a trial started, but before leaving the starting point at the center, the animal rotated the treadmill to turn the virtual environment immediately to align its starting direction with the visual scene associated with the high-value reward zone."

      The sentence is unclear. Maybe "Second, once a trial started, the animal rotated the treadmill immediately to align its starting direction with the visual scene associated with the high-value reward zone.”

      We have updated the description following the suggestion. (p.6)

      Reviewer #2 (Recommendations For The Authors):

      - There are some misleading descriptions of the conclusion of the results in this paper. In this study, the functions of (a) selection of high-value target and (b) spatial navigation to the target were assessed in the behavioral experiments. The results of the pharmacological experiments showed that dHP inactivation impaired (b) and iHP inactivation impaired both (a) and (b) (Figures 5 B & D). However, the last sentence of the abstract states that dHP is important for the functions of (a) and iHP for (b). There are several other similar statements in the main text. Since the separation of (a) and (b) is an important and original aspect of this study, the description should clearly show the conclusion that dHP is important for (a) and iHP is important for both (a) and (b).

      Related to the above, the paragraph title in the Discussion "The iHP may contain a value-associated cognitive map with reasonable spatial resolution for goal-directed navigation (536-537)" is also somewhat misleading: "with reasonable resolution for goal-directed behavior" seems to reflect the results of an object-guided navigation task (Figure 8). However, the term "goal-directed behavior" is also used for value-dependent spatial navigation (i.e., the main task), which causes confusion. I would like to suggest clarifying the wording on this point.

      First, we need to correct the reviewer’s statement regarding our descriptions of the results. As the reviewer mentioned, our results indicated that the dHP inactivation impaired (b) but not (a), while the iHP inactivation impaired both (a) and (b). Regarding the iHP inactivation result, we focused on the impairment of (a) since our aim was to investigate spatial-value association in the hippocampus. Also, it was more likely that (a) affected (b), but not the other way, because (a) remained intact when (b) was impaired after dHP inactivation. We emphasized this difference between dHP and iHP inactivation, which was (a). Therefore, we mentioned in the last sentence of the abstract that the dHP is important for (b), which is the precision of spatial navigation to the target location, and the iHP is critical for (a).

      Moreover, we would like to clarify that we were not referring to the object-guided navigation task in Figure 8 in the phrase ‘with a reasonable spatial resolution for goal-directed navigation.’ Please note that the object-guided navigation task did not require fine spatial resolution to find the reward. The phrase instead referred to the dHP inactivation result (Figures 5 and 6), where the rats could find the high-value zone even with dHP inactivation, although the navigational precision decreased. Nonetheless, we agree with the reviewer that the title might cause confusion, so we have updated it as follows:

      “The iHP may contain a value-associated cognitive map with reasonable spatial resolution for value-based navigation” (p.19)

      - As an earlier study focusing on the physiology of iHP, Maurer et al, Hippocampus 15:841 (2005) is also a pioneering and important study, and I suggest citing it.

      Thank you for the suggestion. We included the Maurer et al. (2005) study in the Introduction section as follows:

      “…Specifically, there is physiological evidence that the size of a place field becomes larger as recordings of place cells move from the dHP to the vHP (Jung et al., 1994; Maurer et al., 2005; Kjelstrup et al., 2008; Royer et al., 2010).” (p.4)

      - One of the strengths of this paper is that we have developed a new control system for the VR navigation task device, but I cannot get a very detailed description of this system in the Methods section. Also, no information about the system control has been uploaded to GitHub. I would suggest adding a description of the manufacturer, model number, and size of components, such as a rotary encoder and ball, and information about the software of the control system, with enough detail to allow the reader to reconstruct the system.

      We have now added detailed descriptions of the VR system in the Methods section (see “2D VR system”). (p.22)

      Reviewer #3 (Recommendations For The Authors):

      (1) Some comments on specific passages of text:

      Lines 87 to 89: 'Surprisingly, beyond the recognition of anatomical divisions, little is known about the functional differentiation of subregions along the dorsoventral axis of the hippocampus. Moreover, the available literature on the subject is somewhat inconsistent.'

      I would recommend to rephrase these statements. Regarding the first statement, there is substantial evidence for functional differentiation along the dorso-ventral axis of the hippocampus (e.g., see reviews by Moser and Moser, 1998, Hippocampus; Bannerman et al., 2004, Neurosci Biobehav Rev; Bast, 2007, Rev Neurosci; Bast, 2011, Curr Opin Neurobiol; Fanselow and Dong, 2010, Neuron; Strange et al., 2014, Nature Rev Neurosci). Regarding the second statement, the authors may consider being more specific, as the inconsistencies demonstrated seem to relate mainly to the hippocampal representation of value information, instead of functional differentiation along the dorso-ventral hippocampal axis in general.

      We agree with the reviewer that the abovementioned statements need further clarification. The manuscript is now revised as follows:

      “Surprisingly, beyond the recognition of anatomical divisions, the available literature on the functional differentiation of subregions along the dorsoventral axis of the hippocampus, particularly in the context of value representation, is somewhat inconsistent.” (p.4)

      Lines 92 to 93: 'Thus, it has been thought that the dHP is more specialized for precise spatial representation than the iHP and vHP.'

      I think 'fine-grained' may be the more appropriate term here. Also, check throughout the manuscript when referring to the differences of spatial representations along the hippocampal dorso-ventral axis.

      Thank you for the insightful suggestion. We changed the term to ‘fine-grained’ throughout the manuscript, as follows:

      “Thus, it has been thought that the dHP is more specialized for fine-grained spatial representation than the iHP and vHP.” (p.4)

      “Consequently, the fine-grained spatial map present in the dHP…” (p.20)

      Line 217: well-'trained' rats?

      We initially used the term ‘well-learned’ to focus on the effect of learning, not training. Please note that the rats were already adapted to moving freely in the VR environment during the Shaping sessions, but the immediate counterclockwise body alignment only appeared after they acquired the reward locations for the main task. Nonetheless, we agree that the term might cause confusion, so we revised the manuscript as the reviewer suggested, as follows:

      “This implies that well-trained rats aligned their bodies more efficiently…” (p.8)

      Lines 309 to 311: 'Taken together, these results indicate that iHP inactivation severely damages normal goal-directed navigational patterns in our place preference task.'

      Consider to mention that dHP inactivation also causes impairments, albeit weaker ones.

      We thank the reviewer for the suggestion. We revised the manuscript by mentioning dHP inactivation as follows:

      “Taken together, these results indicate that iHP inactivation more severely damages normal goal-directed navigational patterns than dHP inactivation in our place-preference task.” (p.11-12)

      Lines 550 to 552: 'The involvement of the iHP in spatial value association has been reported in several studies. For example, Bast and colleagues reported that rapid place learning is disrupted by removing the iHP and vHP, even when the dHP remains undamaged (Bast et al., 2009).'

      Bast et al. (2009) did not directly show the role of iHP in 'spatial value associations'. They suggested that the importance of iHP for behavioral performance based on rapid, one-trial, place learning may reflect neuroanatomical features of the intermediate region, especially the combination of afferents that could convey the required fine-grained visuo-spatial information with relevant afferent and efferent connections that may be important to translate hippocampal place memory into appropriate behavioral performance (this may include afferents conveying value information). More recent theoretical and empirical research suggests that projections to the (ventral) striatum may be relevant (see Tessereau et al., 2021, BNA and Bauer et al., 2021, BNA).

      We appreciate the reviewer for this insightful comment. We agree with the reviewer that Bast et al. (2009) did not directly mention spatial value association; however, learning a new platform location needs an update of value information in the spatial environment. Therefore, we thought the study, though indirectly, suggested how the iHP contributes to spatial value associations. Nonetheless, to avoid confusion, we revised the manuscript, as follows:

      “The involvement of the iHP in spatial value association has been reported or implicated in several studies” (p.20)

      (2) Figures and legends:

      Figure 2B: What do the numbers after novice and expert indicate?

      The numbers indicate the rat ID, followed by the session number. We added the details to the Figure legend, as follows:

      “The numbers after ‘Novice’ and ‘Expert’ indicate the rat and session number of the example.” (p.34)

      Figure 2C: Please indicate units of the travel distance and latency measurements.

      The units are now described in the Figure legends, as follows:

      “Mean travel distance in meters and latency in seconds are shown below the VR arena trajectory.” (p.34)

      Figure 3Aii: Here and in other figures - do the vector lengths have a unit (degree?)?

      No, the mean vector length is an averaged value of the resultant vectors, thus having no specific unit.
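
      To illustrate why the value carries no unit, the following is a minimal sketch. It assumes the mean vector length is the standard mean resultant length from circular statistics; the response does not give the formula, so this is an assumption, and the function name `mean_vector_length` and its degree-based input are hypothetical.

```python
import math


def mean_vector_length(angles_deg):
    """Mean resultant length of a set of directions given in degrees.

    Hypothetical helper for illustration only; it is not taken from the
    authors' analysis code.
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    # Represent each direction as a unit vector and average the components.
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    # Length of the averaged vector: near 0 when directions cancel out,
    # 1 when all directions are identical. It is a ratio, so it has no unit.
    return math.hypot(mean_cos, mean_sin)
```

      Because each direction enters as a unit vector, the result is bounded between 0 and 1 regardless of whether the angles were recorded in degrees or radians.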

      Figure 5A: Please explain what the numbers on top of the individual sample trajectories indicate.

      The numbers are IDs for rats, sessions, and trials of specific examples. We added the explanation to the Figure legends, as follows:

      “Numbers above each trajectory indicate the identification numbers for rat, session, and trial.” (p.35)

      (3) Additional comments on some methodological details:

      a. Why was the non-parametric Wilcoxon signed-rank test used for the planned comparison between intermediate and dorsal hippocampal PBS infusions, whereas parametric ANOVA and post-hoc comparisons were used for other analyses? This probably doesn't make a big difference for the interpretation of the present data (as a parametric pairwise comparison would also not have revealed any significant difference between intermediate and dorsal hippocampal PBS infusions), but it would nevertheless be good to clarify the rationale for this.

      We used non-parametric statistics because our sample size (n=8) was rather small for parametric statistics, although we used the parametric ANOVA for some of the results because it is the most commonly known and widely used statistical test in such comparisons. However, we also checked the statistics with the alternatives (i.e., a parametric paired t-test in place of the non-parametric Wilcoxon signed-rank test, and a non-parametric Friedman’s test with Dunn’s post hoc test in place of the parametric one-way RM ANOVA with Bonferroni post hoc test), and the statistical significance did not change with any of the tests. We have now added the explanation to the manuscript, as follows:

      “Although most of our statistics were based on the non-parametric tests for the relatively small sample size (n=8), we used the parametric RM ANOVA for comparing three groups (i.e., PBS, dMUS, and iMUS) because it is the most commonly known and widely used statistical test in such comparison. However, we also performed statistical tests with the alternatives for reference, and the statistical significances were not changed with any of the results.” (p.26)
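
      For readers unfamiliar with the non-parametric test named above, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test for paired samples. It is an illustration under stated assumptions, not the authors' analysis: the response does not say which software was used or whether exact or approximate p-values were computed, and the function name `wilcoxon_signed_rank_exact` is hypothetical.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. The full enumeration of
    sign patterns is only practical for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)
    # Under the null hypothesis every rank is equally likely to carry
    # a positive or a negative sign, so enumerate all 2**n patterns.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

      With eight pairs and no ties or zero differences, the smallest two-sided p-value this exact test can return is 2/256, or about 0.0078, which is one reason a cross-check against a parametric alternative can be informative at this sample size.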

      b. Single housing of rats:

      Why was this chosen? Based on my experience, this is not necessary for studies involving cannula implants and food restriction. Group housing is generally considered to improve the welfare of rats.

      We chose single housing of rats because our training paradigm required precise restrictions on the food consumption of individual rats, which could be difficult in group housing.

      c. Anesthesia:

      Why was pentobarbital used, alongside isoflurane, to anesthetize rats for surgery (line 663)? The use of gaseous anesthesia alone offers very good control of anesthesia and reduces the risk of death from anesthesia compared to the use of pentobarbital.

      Why was anesthesia used for the drug infusions (line 674)? If rats are well-habituated to handling by the experimenter, manual restraint is sufficient for intra-cerebral infusions. Therefore, anesthesia could be omitted, reducing the risk of adverse effects on the experimental rats.

      I do not think that points b. and c. are relevant for the interpretation of the present findings, but the authors may consider these points for future studies to improve further the welfare of the experimental rats.

      We appreciate the reviewer’s careful suggestions. For both the use of pentobarbital during surgery and anesthesia for the drug infusion, we chose to do so to avoid any risk of rats being awake and becoming anxious and to ensure safety during the procedures. They might not be necessary, but they were helpful for the experimenters to proceed with sufficient time to maintain precision. Nonetheless, we agree with the reviewer’s concern, which was the reason why we monitored the rats’ behavior for 20 minutes in the cage after drug infusion to minimize any potential influence on the task performance. We updated the relevant details in the Methods section, as follows:

      “The rat was kept in a clean cage to recover from anesthesia completely and monitored for side effects for 20 minutes, then was moved to the VR apparatus for behavioral testing.” (p.25)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Neuronal activity spatiotemporal fine-tuning of cerebral blood flow balances metabolic demands of changing neuronal activity with blood supply. Several 'feed-forward' mechanisms have been described that contribute to activity-dependent vasodilation as well as vasoconstriction leading to a reduction in perfusion. Involved messengers are ionic (K+), gaseous (NO), peptides (e.g., NPY, VIP), and other messengers (PGE2, GABA, glutamate, norepinephrine) that target endothelial cells, smooth muscle cells, or pericytes. Contributions of the respective signaling pathways likely vary across brain regions or even within specific brain regions (e.g., across the cortex) and are likely influenced by the brain's physiological state (resting, active, sleeping) or pathological departures from normal physiology.

      The manuscript "Elevated pyramidal cell firing orchestrates arteriolar vasoconstriction through COX-2-derived prostaglandin E2 signaling" by B. Le Gac, et al. investigates mechanisms leading to activity-dependent arteriole constriction. Here, mainly working in brain slices from mice expressing channelrhodopsin 2 (ChR2) in all excitatory neurons (Emx1-Cre; Ai32 mice), the authors show that strong optogenetic stimulation of cortical pyramidal neurons leads to constriction that is mediated through the cyclooxygenase-2 / prostaglandin E2 / EP1 and EP3 receptor pathway with contribution of NPY-releasing interneurons and astrocytes releasing 20-HETE. Specifically, using a patch clamp, the authors show that 10-s optogenetic stimulation at 10 and 20 Hz leads to vasoconstriction (Figure 1), in line with a stimulation frequency-dependent increase in somatic calcium (Figure 2). The vascular effects were abolished in the presence of TTX and significantly reduced in the presence of glutamate receptor antagonists (Figure 3). The authors further show with RT-PCR on RNA isolated from patched cells that ~50% of analyzed cells express COX-1 or -2 and other enzymes required to produce PGE2 or PGF2α (Figure 4). Further, blockade of COX-1 and -2 (indomethacin), or COX-2 (NS-398) abolishes constriction. In animals with chronic cranial windows that were anesthetized with ketamine and medetomidine, 10-s long optogenetic stimulation at 10 Hz leads to considerable constriction, which is reduced in the presence of indomethacin. Blockade of EP1 and EP3 receptors leads to a significant reduction of the constriction in slices (Figure 5). Finally, the authors show that blockade of 20-HETE synthesis caused moderate and NPY Y1 receptor blockade a complete reduction of constriction.

      The mechanistic analysis of neurovascular coupling mechanisms as exemplified here will guide further in-vivo studies and has important implications for human neuroimaging in health and disease. Most of the data in this manuscript uses brain slices as an experimental model which contrasts with neurovascular imaging studies performed in awake (head-fixed) animals. However, the slice preparation allows for patch clamp as well as easy drug application and removal. Further, the authors discuss their results in view of differences between brain slices and in vivo experiments, including the absence of vascular tone as well as blood perfusion required for metabolite (e.g., PGE2) removal, and the presence of network effects in the intact brain. The manuscript and figures present the data clearly; regarding the presented mechanism, the data supports the authors' conclusions.

      We thank the reviewer for his/her supportive comments as well as for pointing out pros and cons of the brain slice preparation.

      Some of the data was generated in vivo in head-fixed animals under anesthesia; in this regard, the authors should revise the introduction and discussion to include the important distinction between studies performed in slices, or in acute or chronic in-vivo preparations under anesthesia (reduced network activity and reduced or blocked neuromodulation), or in awake animals (virtually undisturbed network and neuromodulatory activity).

      We have now added a paragraph in the introduction (lines 52-64) to highlight the distinction between ex vivo and in vivo models. We now also discuss that anesthetized animals exhibit slower NVC (Line 308-309).

      Further, while discussed to some extent, the authors could improve their manuscript by more clearly stating if they expect the described mechanism to contribute to CBF regulation under 'resting state conditions' (i.e., in the absence of any stimulus), during short or sustained (e.g., visual, tactile) stimulation, or if this mechanism is mainly relevant under pathological conditions; especially in the context of the optogenetic stimulation paradigm being used (10-s long stimulation of many pyramidal neurons at moderate-high frequencies) and the fact that constriction leading to undersupply in response to strongly increased neuronal activity seems counterintuitive?

      We now discuss more extensively the physiological relevance (lines 422-434 and 436-439) and the conditions where the described mechanisms of neurogenic vasoconstriction may occur.

      We agree with the reviewer that vasoconstriction in response to a large increase in neuronal activity is counterintuitive as it leads to undersupply despite an increased energy demand. We now discuss its potential physio/pathological role in attenuating neuronal activity by reducing energy supply (lines 453-464).

      Reviewer #2 (Public review):

      Summary:

      The present study by Le Gac et al. investigates the vasoconstriction of cerebral arteries during neurovascular coupling. It proposes that pyramidal neurons firing at high frequency lead to prostaglandin E2 (PGE2) release and activation of arteriolar EP1 and EP3 receptors, causing smooth muscle cell contraction. The authors further claim that interneurons and astrocytes also contribute to vasoconstriction via neuropeptide Y (NPY) and 20-hydroxyeicosatetraenoic acid (20-HETE) release, respectively. The study mainly uses brain slices and pharmacological tools in combination with Emx1-Cre; Ai32 transgenic mice expressing the H134R variant of channelrhodopsin-2 (ChR2) in the cortical glutamatergic neurons for precise photoactivation. Stimulation with 470 nm light using 10-second trains of 5-ms pulses at frequencies from 1-20 Hz revealed small constrictions at 10 Hz and robust constrictions at 20 Hz, which were abolished by TTX and partially inhibited by a cocktail of glutamate receptor antagonists. Inhibition of cyclooxygenase-1 (COX-1) or -2 (COX-2) by indomethacin blocked the constriction both ex vivo (slices) and in vivo (pial artery), and inhibition of EP1 and EP3 showed the same effect ex vivo. Single-cell RT-PCR from patched neurons confirmed the presence of the PGE2 synthesis pathway.

      While the data are convincing, the overall experimental setting presents some limitations. How is the activation protocol comparable to physiological firing frequency? 

      As also suggested by Reviewer #1 we have now discussed more extensively the physiological relevance of our observations (lines 422-434 and 436-439).

      The delay (minutes) between the stimulation and the constriction appears contradictory to the proposed pathway, which would be expected to occur rapidly. The experiments are conducted in the absence of vascular "tone," which further questions the significance of the findings. 

      The slow kinetics observed ex vivo are probably due to the low recording temperature and the absence of pharmacologically induced vascular tone, as already discussed (lines 312-317). Furthermore, as recommended by reviewer #1, we have presented the advantages and limitations of ex vivo and in vivo approaches (lines 52-64).

      Some of the targets investigated are expressed by multiple cell types, which makes the interpretation difficult; for example, cyclooxygenases are also expressed by endothelial cells.

      Under normal conditions, endothelial cells only express COX-1 and barely COX-2, whose expression is essentially observed in pyramidal cells (see Tasic et al. 2016, Zeisel et al. 2015, Lacroix et al., 2015). As pointed out by Reviewer #1, our ex vivo pharmacological data clearly indicate that vasoconstriction is mostly due to COX-2 activity, and to a much lesser extent to COX-1. Since it is well established that the previously described vascular effects of pyramidal cells are essentially mediated by COX-2 activity (Iadecola et al., 2000; Lecrux et al., 2011; Lacroix et al., 2015), we are quite confident that the vasoconstriction described here is mainly due to COX-2 activity of pyramidal cells.

      Finally, how is the complete inhibition of the constriction by the NPY Y1 receptor antagonist BIBP3226 consistent with a direct effect of PGE2 and 20-HETE in arterioles? 

      We agree with both reviewers that the complete blockade of the constriction by the NPY Y1 receptor antagonist BIBP3226 needs to be more carefully discussed. We have now included in the discussion the possible involvement of Y1 receptors in pyramidal cells, which could promote glutamate release and possibly COX-2, thereby contributing to PGE2 and 20-HETE signaling (lines 402-409).

      Overall, the manuscript is well-written with clear data, but the interpretation and physiological relevance have some limitations. However, vasoconstriction is a rather understudied phenomenon in neurovascular coupling, and the present findings may be of significance in the context of pathological brain hypoperfusion.

      We thank the reviewer for his/her comment and suggestions, which have helped us to improve our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Methods:

      It is not clear if brain slices (or animals) underwent one, two, or several optogenetic stimulations - especially for experiments where 'control' is compared to 'treated' - does this data come from the same vessels (before and after treatment) or from two independent groups of vessels? If repeated stimulations are performed, do these repeated stimulations cause the same vascular response?

      As indicated in the Materials and Methods section (line 543), the statement “Only one arteriole was monitored per slice” implies that the comparisons between the ‘control’ and ‘treated’ groups were made from independent groups of vessels. To clarify this point, we have added “receiving a single optogenetic or pharmacological stimulation” to this sentence (lines 543-544).

      For in vivo experiments, animals underwent 10-20 optogenetic stimulations with a 5-minute interstimulus interval during an experiment lasting a maximum of 2 hours. Trials from the same vessel were averaged (with a 0.1 s interpolation) for analysis, and the mean per vessel is presented in the graphs.
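      The per-vessel averaging described above can be sketched as follows. This is only an illustrative sketch, assuming linear interpolation onto a common 0.1 s grid; the function names, the `(times, values)` trial layout, and the example numbers are hypothetical and are not taken from the authors' analysis code.

```python
def interpolate(times, values, grid):
    """Linearly interpolate (times, values) onto an increasing grid.

    Assumes `times` is strictly increasing and that the grid lies within
    [times[0], times[-1]]; points outside are linearly extrapolated.
    """
    out = []
    j = 0
    for t in grid:
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def average_trials(trials, t_start, t_stop, step=0.1):
    """Resample every trial on a common grid, then average across trials.

    `trials` is a list of (times, values) pairs recorded from one vessel.
    """
    n = int(round((t_stop - t_start) / step)) + 1
    grid = [t_start + i * step for i in range(n)]
    resampled = [interpolate(t, v, grid) for t, v in trials]
    mean_trace = [sum(col) / len(col) for col in zip(*resampled)]
    return grid, mean_trace


# Hypothetical example: two trials sampled at different time points.
trial_a = ([0.0, 1.0, 2.0], [0.0, 10.0, 20.0])
trial_b = ([0.0, 0.5, 2.0], [2.0, 2.0, 2.0])
grid, mean_trace = average_trials([trial_a, trial_b], 0.0, 2.0)
```

      Resampling first means that trials acquired with slightly different time bases contribute equally to each point of the averaged trace.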

      Figure 2:

      Can the authors speculate about the cause for the slow increase in indicator fluorescence from minute 1.5 onward, which seems dependent on stimulation frequency? Is this increase also present when slices from a ChR2-negative animal undergo the same stimulation paradigm?

      Rhod2 was delivered by the patch pipette as indicated in the Materials and Methods section (line 514). Although a period of “at least 15 min after passing in whole-cell configuration to allow for somatic diffusion of the dye” (lines 551-552) was observed, this single-wavelength Ca2+ indicator likely continued to diffuse into the cells during the optical recording, thereby inducing a slight increase in delta F/F0, which is consistent with the positive slopes of the mean fluorescence changes observed during the 30-s control baseline (Fig. 2b).

      Figure 4: Why did the authors include panel a) here? Also, do the authors observe that cells with different COX-1 or -2 expression profiles show different (electrical, morphological) properties?

      The purpose of panel a) in Fig. 4 was to ensure the regular spiking electrophysiological phenotype of the pyramidal neurons whose cytoplasm was harvested for subsequent RT-PCR analysis. Despite our efforts, we found no difference in the 32 electrophysiological features between COX-1 or COX-2 positive and negative cells. This is now clearly stated in the result section (lines 210-212) and a supplementary table of electrophysiological features is now provided. Because it is difficult to determine the morphology of neurons analyzed by single-cell RT-PCR (Devienne et al. 2018), these cells were not processed for biocytin labeling.

      Figure 5: (1) Maybe the authors could highlight panels b-f as in vivo experiments to emphasize that these are in-vivo observations while the other experiments (especially panels g, h) are made in slices? 

      We thank the reviewer for this suggestion. A black frame is now depicted in Figure 5 to emphasize in vivo experiments.

      (2) What is the power of the optogenetic stimulus in this experiment? 

      The power of the optogenetic stimulus was 38 mW/mm² in ex vivo experiments (see line 527). For in vivo experiments, 1 mW pulses of 5 ms were used, the intensity being measured at the fiber end. We now provide the information for in vivo experiments in the Methods (lines 639-640).

      (3) Experiments were performed with Fluorescein-Dextran at 920-nm excitation which would overlap with EYFP fluorescence from the ChR2-EYFP transgene. Did the authors encounter any issues with crosstalk between the two labels? 

      Crosstalk between EYFP and fluorescein fluorescence was indeed an issue. This is why arterioles were monitored at the pial level to avoid fluorescence contamination from the cortical parenchyma. Because of the perivascular space around pial arterioles, it was possible to measure vessel diameter without contamination from the parenchyma (see Author response image 1 below). To clarify this point we added the statement “which are not compromised by the fluorescence from the ChR2-EYFP transgene in the parenchyma (Madisen et al. 2012),” (lines 628-629). Note that line scan acquisitions without photoactivation stimulation did not trigger any progressive change in the vessel size or resting fluorescence.

      Author response image 1.

      Example of a pial arteriole filled with fluorescein dextran (cyan) in an Emx1-EYFP mouse (parenchyma labeled with YFP, in cyan). The red line represents a line scan to record the change in diameter. Due to the perivascular space surrounding the arterioles, the vessel walls are clearly identified and separated from the fluorescent parenchyma.

      (4) Could the authors potentially extend the time course in panel e) to show the recovery of the preparation to the baseline? 

      Because arterioles were only monitored for a 40-s period during a session of optogenetic stimulation/imaging, we cannot extend panel e. Nonetheless, a 5-minute interstimulus interval was observed to allow the full recovery of the preparation to the baseline. This is now clarified (line 640). Of note, the arteriole shown in panel d before indomethacin treatment fully recovered to baseline after this treatment.

      Also, did the authors observe any 'abnormal' behavior of the vasculature after stimulation, such as large-amplitude oscillations?

      We did not specifically investigate resting state oscillations, such as vasomotion, but the 10-s long baseline recording for each measurement indicates no long-lasting, abnormal, de novo behavior with a frequency higher than 0.1-0.2 Hz.

      (5) Can the authors show in vivo data from control experiments in EYFP-expressing or WT mice that underwent the same stimulation paradigm (Supplementary Figure 1 shows data from brain slices)?

      The reviewer is correct to point out this important control, as optogenetic stimulation at high power can induce a vascular response without channelrhodopsin activation (see our study on the topic, Rungta et al., Nat Commun 2017). We therefore tested this potential artefact in a WT mouse using our setup, with different intensities and durations of optogenetic stimulation.

      Author response image 2A shows that stimulations of 10 seconds, 10 Hz, 1 mW, 5 ms pulses, i.e. the conditions we used for the experiments in Emx1 mice, did not induce dilation or constriction. Stimulation for 5 seconds with the same number of pulses, but with a higher power (4 mW), longer duration (20 ms pulses) and at a higher frequency elicited a small dilation in 1 of 2 pial arterioles (Author response image 2B). For this reason, we used only shorter (5ms) and less intense (1 mW) optogenetic stimulation to ensure that the observed dilation was solely due to Emx1 activation and not to light-induced artefactual dilation.

      Author response image 2.

      Optogenetic stimulation in a wild-type mouse. A. No diameter changes upon stimulations of 10 seconds, 10 Hz, 1 mW, 5 ms pulses, i.e. the conditions we used for the experiments in Emx1 mice. B. Stimulation of higher power (4 mW), longer duration (20 ms pulses) and at a higher frequency elicited a small dilation in 1 (grey traces) of 2 pial arterioles.

      Figures 6 and 7: It is surprising that blockade of NPY Y1 receptors leads to a complete loss of the constriction response. As shown in Figure 7, the authors suggest that pyramidal neuron-released PGE2 (and glutamate) initiate several cascades acting on smooth muscle directly (PGE2-EP1/EP3), through astrocytes (Glu/COX-1/PGE2 or 20-HETE), or through NPY interneurons (Glu/NPY/Y1 or PGE2/NPY/Y1). This would imply that COX-1/2 and NPY/Y1 pathways act in series (as discussed by the authors). Besides the potential effects on NPY release mentioned in the discussion, could the authors comment if both (NPY and PGE2) pathways need to be co-activated in smooth muscle cells to cause constriction?

      We thank the reviewer for raising this surprising complete loss of vasoconstriction by Y1 antagonism, despite the contribution of other vasoconstrictive pathways. We now discuss (lines 402-409) the possibility that activation of the neuronal Y1 receptors in pyramidal cells may also have contributed to the vasoconstriction by promoting glutamate and possibly PGE2 release. The combined activation of vascular and neuronal Y1 receptors may explain the complete blockage of optogenetically induced vasoconstriction by BIBP3226.

      Reviewer #2 (Recommendations for the authors):

      The complete block of the constriction by BIBP3226 needs to be carefully considered.

      We thank the reviewer for stressing this point also raised by Reviewer #1. As mentioned above we now discuss (lines 402-409) the possibility that activation of the neuronal Y1 receptors in pyramidal cells may also have contributed to the vasoconstriction by promoting glutamate and possibly PGE2 release. The combined activation of vascular and neuronal Y1 receptors may explain the complete blockage of optogenetically induced vasoconstriction by BIBP3226.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We thank the Reviewers and the Editor for their thoughtful and constructive feedback. In the revised manuscript, we have addressed all comments thoroughly and made several substantial improvements:

      ● Benchmarking against state-of-the-art methods: We now provide a detailed comparison of our method, PGBAR, with MLspike and CASCADE using our cerebellar dataset recorded at high sampling rates. This comparison demonstrates that PGBAR offers more reliable spike time estimates with significantly lower variability in temporal accuracy (Figure 9).

      ● Quantitative analyses: We replaced qualitative statements with quantitative metrics. For example, we now report Pearson’s correlation (>0.95) of spike probabilities across trials and 100% of posterior samples with correct spike number detection during low SNR conditions (Figures 7 and 8).

      ● Clarified modeling rationale: We elaborated on the motivation behind modeling bursting dynamics using a hidden two-state process, which helps mitigate bias in spike detection under non-stationary firing conditions.

      ● Model identifiability and robustness: We demonstrate that our approach avoids parameter degeneracy through careful model design and parameter reparameterization. Sensitivity analyses (Figure 10) show that PGBAR is more robust to hyperparameter variation than MLspike.

      ● Improved clarity and accessibility: We revised the Introduction and Results sections to better explain the context, goals, and implications of our method, and clarified the advantages of joint parameter and state inference within our Bayesian framework.

      We believe that these additions significantly strengthen our manuscript and demonstrate the utility of PGBAR for high-temporal-precision spike inference. Please find below our detailed responses to both Public Reviews and Recommendations for the authors.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modelled process. The authors then focus on the quantification of spike time uncertainties in simulated data and in data recorded with a high sampling rate in cerebellar slices with GCaMP8f.

      Strengths:

      - The authors provide a solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al., and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in the cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - The algorithm is designed to predict single spike times. Currently, it is not benchmarked against other algorithms in terms of single spike precision and spike time errors. A benchmarking with the most recent other SMC model and another good model focused on single spike outputs (e.g., MLSpike) would be useful to have.

      We thank the reviewer for the observation. In our revised manuscript, we have included a detailed comparison of spike time accuracy between our method, MLspike, and the supervised method, CASCADE, now summarized in Figure 9. In this analysis, we used our in vitro dataset to estimate the average temporal accuracy of spike detection across the three methods. As discussed in the main text, the average temporal accuracy was defined as the time difference between ground truth and the nearest detected spikes averaged across the ground truth. The distributions of temporal accuracies across our experiments obtained from MLspike, CASCADE, and PGBAR differ in their spread, with 10th-to-90th percentile ranges of 14 ms, 8 ms, and 3 ms, respectively. This result demonstrates that PGBAR spike time estimates are more reliable than MLspike and CASCADE across trials, with a narrower unbiased distribution of temporal accuracy.
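      The temporal accuracy metric defined above can be sketched as follows. This is a minimal illustration, assuming the offset is signed (detected minus ground truth) so that a systematic bias would show up as a non-zero mean; the function names and the example spike times are hypothetical and not taken from the PGBAR code.

```python
from bisect import bisect_left
from statistics import mean


def nearest_spike_offsets(true_times, detected_times):
    """Signed offset (detected - true) to the nearest detected spike,
    computed for each ground-truth spike."""
    detected = sorted(detected_times)
    if not detected:
        raise ValueError("at least one detected spike is required")
    offsets = []
    for t in true_times:
        i = bisect_left(detected, t)
        # Only the detected spikes just before and just after t can be nearest.
        candidates = detected[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda d: abs(d - t))
        offsets.append(nearest - t)
    return offsets


def temporal_accuracy(true_times, detected_times):
    """Offset to the nearest detected spike, averaged over ground-truth spikes."""
    return mean(nearest_spike_offsets(true_times, detected_times))


# Hypothetical spike times in seconds.
offsets = nearest_spike_offsets([0.010, 0.050], [0.200, 0.049, 0.012])
accuracy = temporal_accuracy([0.010, 0.050], [0.200, 0.049, 0.012])
```

      Computing one such value per trial and then comparing the spread of the resulting distribution across methods corresponds to the percentile ranges quoted above.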

      A direct comparison of PGBAR with the Sequential Binding Model (SBM) developed by Greenberg et al. was not possible since the biophysical model is designed around early GCaMP variants and thus not suitable for inference with our GCaMP8f dataset. We generally agree that employing realistic models of the calcium indicator can improve inference, however, PGBAR responds to a different question, namely how to simultaneously infer spike times and model parameters, which was still an issue with the SBM approach. 

      Some of the analyses and benchmarks seem too cursory, and the reporting simply consists of a visual impression of results instead of proper analysis and quantification. For example, the authors write "The spike patterns obtained using our method are very similar across trials, showing that PGBAR can reliably detect single-trial action potential-evoked GCaMP8f fluorescence transients." This is a highly qualitative statement, just based on the (subjective) visual impression of a plot. Similarly, the authors write "we could reliably identify the two spikes in each trial", but this claim is not supported by quantification or a figure, as far as I can see. 

      We thank the reviewer for this remark. We have now justified quantitatively our statement regarding the similarity across trials. In the revised preprint, we explain that in the specific experiment illustrated in Figure 7, Pearson’s pairwise correlation between spike probabilities (Gaussian filtered with 20 ms bandwidth) across trials is always larger than 0.95. The statement quoted by the reviewer, "we could reliably identify the two spikes in each trial" refers to the fact that in 100% of the posterior samples, generated from the analysis of each trial, we detected 2 spikes in the time window considered. The temporal accuracy of our detection was then illustrated for all trials in Figure 7H, where we compared the posterior distribution of the inter-spike interval between the first two spikes across trials. 

      The statement referred to by the Reviewer has been revised to read:

      (line 319) “The Pearson’s pairwise correlation between spike probabilities (Gaussian filtered with 20 ms bandwidth) across trials is always larger than 0.95, which demonstrates that PGBAR provides robust predictions across trials and it can reliably detect single-trial action potential-evoked GCaMP8f fluorescence transients.”

      We revised the second statement as:

      (line 324) “Despite the relatively low SNR, 100% of the posterior samples contained two spikes in the considered time interval.” 
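      The trial-to-trial similarity measure quoted above (pairwise Pearson correlation of Gaussian-filtered spike probabilities) can be sketched as follows. This is an illustration only: it assumes that the 20 ms "bandwidth" is the standard deviation of the Gaussian kernel and that the kernel is truncated at four standard deviations; the function names and the synthetic trials are hypothetical.

```python
import math


def gaussian_smooth(signal, dt, sigma):
    """Smooth with a Gaussian kernel (std `sigma`, truncated at 4 sigma).

    The kernel is renormalized near the edges so that a constant signal
    stays constant.
    """
    half = int(math.ceil(4 * sigma / dt))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k * dt / sigma) ** 2) for k in offsets]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * signal[j]
                norm += w
        out.append(acc / norm)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def min_pairwise_correlation(trials, dt, sigma):
    """Smallest Pearson correlation over all pairs of smoothed trials."""
    smoothed = [gaussian_smooth(t, dt, sigma) for t in trials]
    return min(
        pearson(a, b)
        for i, a in enumerate(smoothed)
        for b in smoothed[i + 1:]
    )


def make_trial(spike_bins, n_bins=200):
    """Synthetic spike-probability trace with unit probability in given bins."""
    return [1.0 if i in spike_bins else 0.0 for i in range(n_bins)]


# Two spikes per trial at 1 ms bins; the second trial is jittered by 2 ms.
trials = [make_trial({50, 120}), make_trial({52, 122}), make_trial({50, 120})]
lowest = min_pairwise_correlation(trials, dt=0.001, sigma=0.020)
```

      Reporting the minimum over all trial pairs matches the "always larger than" phrasing of the revised statement.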

      The authors write "but the trade-off between temporal accuracy, SNR and sampling frequency must be considered", but they don't discuss these trade-offs systematically.

      We thank the reviewer for the comment. In the updated preprint, we have replaced the quoted sentence with the following statement:

      (line 302) “Based on this analysis we expect PGBAR to provide accurate estimates of inter-spike intervals down to 5 ms.”

      It has been shown several times from experimental data that spike inference with single spike resolution does not work well (Huang et al. eLife, 2021; Rupprecht et al., Nature Neuroscience, 2021) in general. This limitation should be discussed with respect to the applicability of the proposed algorithm for standard population calcium imaging data.

      We thank the reviewer for this comment. Detecting single spike times is indeed a difficult task. Compared to previous methods for single spike estimation, the advantage of our statistical approach is the rigorous analysis of uncertainties propagated by unknown model parameters and noisy recordings. This is an important aspect that was missing in previous approaches and that we were able to address thanks to our fully probabilistic approach. 

      Several analyses are based on artificial, simulated data with simplifying assumptions. Ever since Theis et al., Neuron, 2016, it has been known that artificially generated ground truth data should not be used as the primary means to evaluate spike inference algorithms. It would have been informative if the authors had used either the CASCADE dataset or their cerebellum dataset for more detailed analyses, in particular of single spike time precision.

      We thank the reviewer for this comment. 

      To address the reviewer’s concern about single spike time precision, we have added to our revised preprint a further comparison between the temporal accuracy of PGBAR, CASCADE, and MLspike for our cerebellar dataset (Fig. 9, already discussed above). 

      Nevertheless, as pointed out by the reviewer, simulated data should not be used as the primary means to evaluate the performance of an inference algorithm. However, it is standard practice in the field of model-based inference to validate the approach first with data generated by the same model used for inference. This step is usually done for two main reasons: first, for internal consistency of the method, and second, to explore the regimes where inference is achievable. We made use of simulated data to address specific questions. Specifically, in Figure 2, we illustrate the analysis of data simulated using the same model for inference. In Figure 3, we used simulated data to highlight the importance of modeling bursting activity to avoid biases induced by non-homogeneous firing rates. In Figure 6, we used simulated data to explore the theoretical accuracy of PGBAR under different conditions of signal-to-noise ratio and acquisition frequencies.

      In its current state, the sum of the current weaknesses makes the suggested method, while interesting for experts, rather unattractive for experimentalists who want to perform spike inference on their recorded calcium imaging data.

      In our preprint, we illustrated the application of PGBAR to benchmark data and our cerebellar recordings. Therefore, our approach can be part of the calcium imaging data analysis pipeline. The advantage of estimating statistical uncertainties and model parameters makes PGBAR an attractive tool for the wide neuroscience community interested in spike inference and statistical accuracy. In addition, as noted by Reviewer 2, our code is well documented. User-friendliness and integrating our method within GUI analysis software might be the next step if there is increasing interest in using this method.

      Other comments:

      One of the key features of the SMC model is the assumption of two states (bursting vs. non-bursting). However, while it seems clear that this approach is helpful, it is not clear where this idea comes from, from an observation of the data or another concept.

      We thank the reviewer for this comment. As the reviewer pointed out, accounting for two firing regimes is helpful as it prevents biases in estimating the number of spikes when the firing rate is non-stationary and does not follow single-frequency Poisson statistics (as shown in Figure 3 of our preprint), as expected during in vivo recordings. Animals can alter their behavioral state and be exposed to different sensory stimulations, which condition the activity of neurons. A first step beyond the assumption of a steady firing rate is indeed to introduce a hidden two-state process to separate periods of high and low firing rates. In our revised text, we explicitly discuss the rationale behind this choice. We want to emphasize that PGBAR is the only model-based approach that accounts for nonhomogeneous firing rates. In addition, due to the binary character of the underlying bursting state and the high dimensionality of the problem, traditional optimization methods would not be applicable. We solved this problem by applying modern sequential Monte Carlo algorithms (PGAS, Lindsten 2014, for joint estimation for time-varying signals and model parameters) for the first time in the context of spike inference. In summary, the novelty of our work is both in modeling the firing statistics and the inference strategy used.
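      The idea of a hidden two-state process separating periods of low and high firing can be sketched with a simple generative simulation. This is not the PGBAR model or its inference procedure: it assumes a discrete-time Markov chain for the hidden state and at most one spike per time bin, and every parameter name and value is a hypothetical placeholder.

```python
import random


def simulate_two_state_spikes(n_bins, dt, rate_low, rate_high,
                              p_enter_burst, p_exit_burst, seed=0):
    """Simulate a hidden burst state and the spikes it conditions.

    state 0: low firing rate; state 1: bursting (high firing rate).
    In each bin the state may switch, then a spike is emitted with
    probability rate * dt (a Bernoulli approximation valid for rate * dt <= 1).
    """
    rng = random.Random(seed)
    state = 0
    states, spikes = [], []
    for _ in range(n_bins):
        if state == 0 and rng.random() < p_enter_burst:
            state = 1
        elif state == 1 and rng.random() < p_exit_burst:
            state = 0
        rate = rate_high if state == 1 else rate_low
        spikes.append(1 if rng.random() < rate * dt else 0)
        states.append(state)
    return states, spikes


# Hypothetical parameters: 1 ms bins, 0.5 Hz baseline, 50 Hz during bursts.
states, spikes = simulate_two_state_spikes(
    n_bins=5000, dt=0.001, rate_low=0.5, rate_high=50.0,
    p_enter_burst=0.002, p_exit_burst=0.01, seed=1,
)
```

      A single-rate Poisson model fitted to such a trace would have to compromise between the two regimes, which is the kind of bias the two-state description is meant to avoid.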

      Another SMC algorithm (Greenberg et al., 2018) stated that the fitted parameters showed some degeneracy, resulting in ambiguous fitting parameters. It would be good to know if this problem was avoided by the authors.

      As the reviewer pointed out, one of the weaknesses of the SBM approach is the optimization of the model parameters. This is expected, as SBM uses a biophysical model of the calcium indicator, and a general issue of dynamical models is the presence of so-called sloppy directions in the parameter space, which leads to ambiguous estimations. This is an intrinsic problem due to the model complexity also associated with poorly known parameters such as kinetic constants, which are hard to constrain experimentally. PGBAR uses a much simpler model to describe calcium transients (a second-order autoregressive process) precisely to avoid the non-identifiability of model parameters. Furthermore, we employed a parameterization of the autoregressive model (discussed in the Reparameterization section of Materials and Methods) in terms of peak response to a single action potential, decay constant, and rise time (i.e., time to peak). These phenomenological parameters are well documented for different calcium indicators, which enables us to design appropriate prior distributions that significantly facilitate the identifiability of parameters.
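      The link between a second-order autoregressive process and phenomenological transient parameters can be illustrated with the commonly used difference-of-exponentials convention. This sketch is an assumption for illustration and is not necessarily the reparameterization used in PGBAR: it maps two hypothetical time constants (`tau_rise`, `tau_decay`) to AR(2) coefficients and computes the continuous-time time to peak; the numerical values are placeholders.

```python
import math


def ar2_coefficients(tau_rise, tau_decay, dt):
    """AR(2) coefficients whose impulse response is a sampled difference of
    exponentials: c[t] = g1 * c[t-1] + g2 * c[t-2] + s[t]."""
    d = math.exp(-dt / tau_decay)
    r = math.exp(-dt / tau_rise)
    return d + r, -d * r


def impulse_response(g1, g2, n_steps):
    """Response of the AR(2) recursion to a single spike at step 0."""
    c = []
    for k in range(n_steps):
        spike = 1.0 if k == 0 else 0.0
        prev1 = c[k - 1] if k >= 1 else 0.0
        prev2 = c[k - 2] if k >= 2 else 0.0
        c.append(g1 * prev1 + g2 * prev2 + spike)
    return c


def time_to_peak(tau_rise, tau_decay):
    """Time at which exp(-t / tau_decay) - exp(-t / tau_rise) is maximal."""
    return (tau_rise * tau_decay / (tau_decay - tau_rise)
            * math.log(tau_decay / tau_rise))


# Placeholder values in seconds (fast indicator, 1 kHz sampling).
tau_rise, tau_decay, dt = 0.002, 0.020, 0.001
g1, g2 = ar2_coefficients(tau_rise, tau_decay, dt)
h = impulse_response(g1, g2, 50)
k_peak = max(range(len(h)), key=lambda k: h[k])
t_peak = time_to_peak(tau_rise, tau_decay)
```

      Under this convention the transient shape is fixed by two interpretable time constants plus an amplitude, which is the kind of quantity for which indicator-specific priors can be written down.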

      Reviewer #2 (Public Review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the GitHub repository is well-organised.

      Weaknesses:

      On the other hand, the accuracy of spike train reconstructions is not higher than that of other model-based approaches, and clearly lower than the accuracy of a model-independent approach based on a deep network. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz). It would be interesting to more systematically compare the performance of PGBAR to other methods in this regime of high temporal resolution, which has not been explored much.

      We appreciate the Reviewer’s comment. In response to this observation, we have now included a thorough comparison of PGBAR, MLspike, and CASCADE in addition to the analysis of our cerebellar dataset acquired with a high sampling rate (Figure 9 in the revised preprint). PGBAR and CASCADE predictions are comparable in terms of correlation with the ground truth spikes, and both outperform MLspike. We have also quantified the spike time accuracy as the average distance between ground-truth spikes and the nearest prediction for all the methods. Among the three, PGBAR has the lowest variability of spike time accuracy across our experimental trials. We concluded that while PGBAR and CASCADE show comparable correlations with ground truth, our method provides more reliable spike time estimates.  

      Recommendations for the authors

      Reviewing Editor (Recommendations For The Authors):

      In the discussion with reviewers, it was also suggested that while the manuscript emphasized the high temporal resolution of the method (5 ms), this was achieved under favorable conditions (very high sampling rate, fast indicator). Results cannot be compared easily to alternative methods based on published data because these conditions are unusual. Do other methods (at least some of which are presumably easier to use) achieve similar temporal resolution when applied to the same dataset? I feel this could be addressed easily and add valuable information.

      We thank the Reviewing Editor for the suggestion. In our revised preprint, we have now added a full comparison between the performance of PGBAR, MLspike (as an alternative Bayesian approach), and CASCADE (as a state-of-the-art supervised method) tested on our cerebellar dataset. This analysis highlights the improved reliability of our method in terms of temporal accuracy and trial-to-trial variability.

      Reviewer #1 (Recommendations For The Authors):

      - It is in several places difficult to understand the bigger context of some details. For example, the authors write "In this work, we use Monte Carlo methods to approximate the posterior distribution in Eq. (13)." It would be helpful to state what the bigger goal behind this procedure is, here and at other places. Please go through the Introduction and the Results, there is some room for improvement in terms of accessibility.

      We thank the Reviewer for the comment. Monte Carlo methods are generally used when dealing with intractable (non-analytical) probability distributions, which is the case for the models used for spike inference. The “bigger goal behind this procedure” is the numerical approximation of posterior probabilities, which formalizes the question of estimating unknowns from data given a statistical model according to Bayes’ theorem. The advantage of Monte Carlo methods over other techniques (e.g., variational methods) is that they are statistically unbiased, which is one of the main reasons why we developed this approach. We clarified the goal of the Monte Carlo inference in the Introduction by adding the following text:

      (line 79) “In this work we employ the particle Gibbs (PG) sampler on a bursting autoregressive (BAR) model of time series calcium-dependent fluorescence to provide not only point estimates of spike times but also quantify the statistical uncertainty of each estimate. This is important for downstream analyses such as comparing activity across neurons or conditions.”

      We introduce the Results/Model section with the sentence:

      (line 91) “To infer spike times and their uncertainty from noisy fluorescence traces, we first build a probabilistic generative model that captures the main dynamics underlying the fluorescence signal.”

      And later on in the Results/Sequential Monte Carlo section, we added:

      (line 156) “The model described in the previous section is analytically intractable, therefore we employ Monte Carlo methods to sample from the posterior distribution in Eq. (13) of spike times and model parameters, allowing us to make probabilistic inferences rather than relying on point estimates alone.”
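
      For readers less familiar with this setting, the quantity being approximated can be written generically as follows. This is a sketch in assumed notation, not a reproduction of Eq. (13): here x_{1:T} stands for the latent states (spikes, baseline), \theta for the time-independent parameters, F_{1:T} for the fluorescence trace, and g for any quantity of interest.

```latex
% Joint posterior over latent states and static parameters (Bayes' theorem)
p(x_{1:T}, \theta \mid F_{1:T})
  = \frac{p(F_{1:T} \mid x_{1:T}, \theta)\, p(x_{1:T} \mid \theta)\, p(\theta)}{p(F_{1:T})}

% Monte Carlo approximation of a posterior expectation from N samples
\mathbb{E}\big[g(x_{1:T}, \theta) \mid F_{1:T}\big]
  \approx \frac{1}{N} \sum_{n=1}^{N} g\big(x_{1:T}^{(n)}, \theta^{(n)}\big)
```

      The normalizing constant p(F_{1:T}) is the term that is generally unavailable in closed form, which is why sampling is used instead of direct evaluation.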

      In the Abstract: "it provides a flexible statistical framework to test more specific models of calcium indicators". What is meant by this sentence? I was unable to find any results related to this statement.

      In our work, we propose a statistical model (depicted in Figure 1A) that combines a binary model for non-homogeneous firing, a Gaussian random walk describing the modulation of the baseline fluorescence, and an autoregressive process linking spikes to fluorescence. The phrase quoted by the Reviewer refers to the possibility of replacing the autoregressive model with more specific models of calcium indicators in the future, for instance by employing biophysical models of calcium indicators to refine the link between spikes and calcium fluorescence. The inference algorithm does not depend on the specific spike-to-fluorescence model. In this sense, our framework is flexible, as it offers the opportunity to analyze data acquired using other calcium indicators.
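
      A minimal sketch of a generative model of this kind is given below. It assumes a homogeneous Bernoulli spike train and a first-order autoregressive calcium trace, whereas the model in the manuscript uses a bursting, non-homogeneous firing process. All function names, parameter names and default values are hypothetical and are not taken from the PGBAR code.

```python
import numpy as np

def simulate_fluorescence(n_steps, spike_prob=0.05, amplitude=1.0, decay=0.95,
                          baseline_sd=0.01, noise_sd=0.1, seed=0):
    """Toy model: Bernoulli spikes -> AR(1) calcium -> noisy fluorescence.

    Simplifying assumptions (not the published model): constant spike
    probability per time step and a single exponential decay.
    """
    rng = np.random.default_rng(seed)
    # Binary spike train, one Bernoulli draw per time step.
    spikes = rng.random(n_steps) < spike_prob
    # Slowly drifting baseline modelled as a Gaussian random walk.
    baseline = np.cumsum(rng.normal(0.0, baseline_sd, n_steps))
    # Autoregressive calcium trace: decays each step, jumps on a spike.
    calcium = np.zeros(n_steps)
    for t in range(n_steps):
        previous = calcium[t - 1] if t > 0 else 0.0
        calcium[t] = decay * previous + amplitude * spikes[t]
    # Observed fluorescence = baseline + calcium + measurement noise.
    fluorescence = baseline + calcium + rng.normal(0.0, noise_sd, n_steps)
    return spikes, baseline, calcium, fluorescence
```

      Swapping the calcium update inside the loop for a different spike-to-fluorescence rule is the kind of substitution the response refers to; the rest of the sketch would stay unchanged.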

      The authors write "One of the key advantages of our sampling algorithm is the joint estimation of latent states and time-independent model parameters." Why is this an advantage? Advantage compared to which alternative algorithm?

      We thank the reviewer for this comment. All existing spike inference algorithms use ad-hoc techniques to choose or calibrate the hyperparameters they introduce. The estimation of spike times is in general highly sensitive to parameters such as the peak fluorescence in response to single action potentials, kinetic constants, noise levels, baseline, or any regularization or model parameter. These parameters are usually unknown, and all available inference methods provide additional prescriptions to calibrate them. This can lead to the propagation of errors and systematic biases. Modern Monte Carlo algorithms, such as the ones employed in our work, specifically address this problem by targeting the joint posterior distribution of all time-dependent variables and the model parameters. Compared to previous approaches, our method offers a statistically rigorous algorithm to identify the parameters. Furthermore, this approach enables us to use Bayesian priors to constrain their ranges without introducing ad-hoc biases, and it reduces the sensitivity to inaccurate choices of hyperparameters compared to other methods (MLspike). This is shown in our new Figure 10 (following a suggestion from Reviewer 2), where we present a parameter sensitivity analysis for MLspike and PGBAR (see responses to Reviewer 2 for further details). We clarified this in the Introduction by adding the sentence:

      (line 60) “[...] Moreover, current Bayesian methods do not treat time-independent model parameters (e.g. rate constants) and dynamic variables equally. Instead, they require additional optimization procedures to calibrate model parameters, typically relying on ad-hoc tuning or grid search. This separation can lead to biased inference and poorly calibrated uncertainty estimates, particularly when parameters such as calcium decay time or spike amplitude are inaccurately specified. In contrast, our approach jointly infers both spike times and model parameters within a unified Bayesian framework, enabling uncertainty-aware estimation and avoiding separate, error-prone calibration steps.”

      and in the section “Validation and performance of PGBAR” we added the text:

      (line 201) “One of the key advantages of our sampling algorithm is the joint estimation of latent states and time-independent model parameters, such as spike amplitude, decay time, noise level, and baseline variance. This stands in contrast to most existing spike inference algorithms, which rely on fixed or externally calibrated parameters. Such fixed-parameter methods are vulnerable to systematic errors when parameter values are uncertain or misestimated. By jointly sampling from the posterior of all variables and parameters, our method propagates uncertainty correctly and mitigates bias due to manual tuning or poor initialization.”

      We also added the following text in the discussion:

      (line 411) “The estimation of time-independent model parameters is a well-known issue in spike detection algorithms, typically requiring ad-hoc calibration procedures, grid search, or manual settings. Because spike inference is sensitive to parameters such as the calcium response amplitude, rise and decay kinetics, and noise level, errors in these parameters can substantially affect the accuracy of spike time estimates. By jointly sampling model parameters and latent variables, PGBAR eliminates the need for separate calibration and ensures that uncertainty in parameters is propagated to spike time estimates in a principled way. As illustrated in Figure 10, this leads to a more robust inference compared to existing methods like MLspike, which show greater sensitivity to parameter variation. In addition, PGBAR enables users to calibrate the inference of action potentials by setting the prior mean and variance of phenomenological parameters (e.g. rise and decay constants, firing rates, bursting frequencies).”

      The authors write "We tested our approach on the fast calcium indicator GCaMP8f (...)". Be more precise. Why exactly were these experiments done, what aspects of the algorithm were supposed to be tested? It is left to the reader to make sense out of these experiments. Please provide the logic of this experiment.

      We thank the reviewer for the comment. We developed our method specifically for regimes of high firing rates. For this reason, in addition to the CASCADE benchmark dataset, we have tested our approach on recordings of cerebellar granule cells due to their fast spiking patterns. For this purpose, we have employed the ultrafast state-of-the-art calcium indicator GCaMP8f combined with linescan imaging techniques to enable fast acquisition rates. We added the following text in the manuscript to clarify:

      (line 306) “We tested our approach on the fast calcium indicator GCaMP8f by performing high-speed (2.8 kHz) two-photon linescan calcium imaging of cerebellar granule cells in vitro. GCaMP8f was expressed in the Crus I region of the cerebellum using adeno-associated virus (AAV) injection (Fig. 7A). Compared to GCaMP6f, GCaMP8f exhibits a rise time that is nearly an order of magnitude faster, which we expected to translate into substantially improved temporal accuracy in spike time detection.”

      The authors write "If we consider as reference correlation the average across the CASCADE dataset (0.75) (...)". Why would this threshold be appropriate? This sounds arbitrary; this experiment was conducted with 2.8 kHz line scan imaging of GCaMP8, while the reference stems from low-rate imaging of older indicators.

      We thank the reviewer for the remark. In the sentence quoted, we have used 0.75 as a reference for the state-of-the-art correlation between ground truth and predicted spikes and indicated the lowest temporal resolution (10 ms) where the PGBAR correlation is larger than the reference value. As the Reviewer correctly pointed out, the reference 0.75 refers to datasets with much lower acquisition frequency; therefore, in our revised preprint, we have added a comparison of the correlations obtained from PGBAR, CASCADE, and MLspike using high-speed recordings of cerebellar GCs (Figure 9), showing the increased performance of our method at high temporal resolution.  

      How was PGBAR evaluated using a given dataset in Figure 4c or in Figure 7? It is unclear to the reviewer whether the priors were automatically/manually adjusted for each data set.

      We thank the Reviewer for this comment. Briefly, for the CASCADE dataset, we have designed the priors for all parameters according to the existing characterization of the calcium indicator used in each experiment (Chen et al. 2013). For our cerebellar data, we have performed single stimulation trials for each recording, which we used to design priors on peak fluorescence response, decay constant, and time to peak fluorescence. In the Results section of the revised preprint, we clarified more specifically how priors were designed for the CASCADE and our cerebellar datasets. We have added the following statements:

      (line 239) “Bayesian priors for all PGBAR parameters were adapted to each experiment according to the existing characterization of the different calcium indicators used (Chen et al., 2013).”

      (line 314) “For each recorded soma and bouton we applied two types of stimulation: a single-time-point stimulation and a fixed stimulation pattern generated from a 20 Hz Poisson process with 29 stimulation time points. First, we used the single-stimulation trials to design prior distributions of amplitudes and rise and decay constants (Fig. 7C). Next, we used PGBAR to analyze each Poisson stimulation trial in Figure 7E independently. By generating thousands of posterior samples of spike time patterns, we obtained the spike probability for all time frames and trials (Fig. 7F).”
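
      The last step of this passage, turning posterior samples into a per-frame spike probability, can be sketched as below. The sketch assumes the samples are stored as a binary array with one row per posterior sample and one column per time frame; this layout and the function names are assumptions made for illustration, not the output format of PGBAR.

```python
import numpy as np

def spike_probability(posterior_samples):
    """Per-frame spike probability from binary posterior samples.

    posterior_samples: array of shape (n_samples, n_frames), where entry
    [n, t] is 1 if sample n places a spike in frame t (assumed layout).
    """
    samples = np.asarray(posterior_samples, dtype=float)
    # The fraction of samples with a spike in each frame estimates the
    # posterior probability of a spike in that frame.
    return samples.mean(axis=0)

def spike_count_per_sample(posterior_samples):
    """Number of spikes in each posterior sample (one value per row)."""
    samples = np.asarray(posterior_samples, dtype=int)
    return samples.sum(axis=1)
```

      The spread of the per-sample counts gives a simple view of the uncertainty in the total number of spikes, which can be compared with the known number of stimulations.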

      The authors write "This analysis illustrates the variability expected when analyzing multiple trials of the same neuron." Variability across trials of neuronal activity? Or variability of spike inference?

      We thank the reviewer for the comment. In the revised text, we clarify that we refer to the variability of spike inference across trials.

      The original statement has been revised to read: 

      (line 301) “This analysis illustrates the expected variability of spike inference when analyzing multiple trials of the same neuron.”

      Technical question: How can the authors be sure that glass electrode stimulation only elicits a single AP per stimulation? This was not clear to me from the manuscript alone.

      We thank the reviewer for the question. Our experimental protocol is designed to ensure that, in each trial, a single electrical stimulation elicits a single AP. We adjusted the stimulation strength until we observed an all-or-none calcium transient in response to a single AP. Given the fast temporal properties of GCaMP8f, we could distinguish a single-AP response from multiple APs during a single electrical stimulation. We then introduced a single-stimulation trial ahead of each Poisson-train trial to check whether our stimulation strength elicited a single-AP response reliably and consistently. In this way, we ensured that each stimulation produced a single AP.

      Figure 8: Please explain what you mean by "bouton". What is the dashed line in (A)? Why is it interesting to look at the differences between bouton and soma?

      We thank the Reviewer for the comment. In the updated text we clarified that we refer to synaptic boutons along the parallel fiber (line 311) and that the dashed line in Figure 8 refers to the ground-truth number of spikes (29). We also pointed out that the estimated delay between somas and boutons is compatible with the proximity of synaptic boutons to the stimulation site along the parallel fiber by adding the following text: 

      (line 340) “This result is compatible with the proximity of synaptic boutons to the electrical stimulation along the parallel fiber. We analyzed both signals from somata and synaptic boutons because in vivo two-photon imaging can be made from both parts of the cell. Here we showed that our method performs reliably on both, demonstrating its robustness across recording sites.”

      Reviewer #2 (Recommendations For The Authors):

      The authors emphasised the result that PGBAR can resolve spike timing differences of 5 ms. However, this result was obtained based on fluorescence data measured with a very fast calcium indicator at very high sampling rates. It remains unclear how the performance of PGBAR compares to other methods in this regime of high temporal resolution, which has not been explored much in previous comparisons of methods. Potential users interested in this regime would benefit from a direct comparison to other approaches.

      We thank the Reviewer for this suggestion. In our revised manuscript, we have included a detailed comparison of spike time accuracy between our method, MLspike, and CASCADE, summarized in Figure 9. In this analysis, we used our in vitro dataset to estimate the average temporal accuracy of spike detection across the three methods. As discussed in the main text, the average temporal accuracy was defined as the temporal offset between each ground-truth spike and the nearest detected spike, averaged across the ground-truth spikes. The distributions of temporal accuracies across our experiments obtained from MLspike, CASCADE, and PGBAR differ in their spread, with 10th-to-90th percentile ranges of 14 ms, 8 ms, and 3 ms, respectively. This result demonstrates that PGBAR estimates are more reliable than those of MLspike and CASCADE across trials, with a narrower, unbiased distribution of temporal accuracy.
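
      The metric described above can be sketched as follows. The sketch assumes a signed offset (detected minus ground truth), so that systematic early or late detection shows up as bias; the sign convention, the function names and the use of NumPy are assumptions made here for illustration, not details taken from the manuscript.

```python
import numpy as np

def mean_temporal_offset(true_times, detected_times):
    """Mean signed offset from each ground-truth spike to its nearest detection.

    Times are in any consistent unit (e.g. ms). Returns NaN if either
    list is empty.
    """
    true_times = np.asarray(true_times, dtype=float)
    detected_times = np.asarray(detected_times, dtype=float)
    if true_times.size == 0 or detected_times.size == 0:
        return float("nan")
    # Pairwise differences: rows are ground-truth spikes, columns detections.
    diffs = detected_times[None, :] - true_times[:, None]
    # For each ground-truth spike, pick the detection closest in time.
    nearest = np.argmin(np.abs(diffs), axis=1)
    offsets = diffs[np.arange(true_times.size), nearest]
    return float(offsets.mean())

def percentile_range(values, low=10, high=90):
    """Width of the interval between two percentiles of per-trial values."""
    low_value, high_value = np.percentile(values, [low, high])
    return float(high_value - low_value)
```

      Applying `mean_temporal_offset` to each trial and then `percentile_range` to the resulting values gives a spread of the kind quoted in the response, one number per method.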

      In practice, approaches are more appealing to users when they do not require dedicated measurements to estimate parameters such as rise/decay time constants of calcium fluorescence signals within cells. Users may therefore be interested to know how results would be affected if these parameters are estimated only crudely. It would thus be useful to know how spike probability estimates vary as a function of these parameters, which should be easy to test systematically, and whether the sensitivity of PGBAR to inaccurate initial parameter estimates is lower or higher than that of other methods, which should also be easy to test. As PGBAR jointly estimates spike probabilities and model parameters, it may have an advantage here over other methods.

      We thank the Reviewer for this suggestion. In the new Figure 10, we show a parametric sensitivity analysis for both PGBAR and MLspike. For PGBAR, we considered the hyperparameters of the Bayesian priors associated with the peak response to a single spike and the baseline variance, which influences how much of the fluorescence can be attributed to baseline modulation. For MLspike, we considered the transient amplitude and the decay time constant. For both methods, we varied the parameters between -50% and +50% of their optimal value and estimated the correlation between predictions and ground truth as well as the number of spikes (Figure 10A). Next, we calculated the coefficient of variation across all parameter configurations for each trial (Figure 10B). Our analysis shows that, compared to previous methods, PGBAR has a much lower sensitivity to the initial choices of the hyperparameters, confirming the intuition of the Reviewer thanks to the simultaneous inference of spike times and model parameters. This result provides an important addition to our work.  
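
      The two ingredients of this analysis, a grid of parameter values around the optimum and a per-trial coefficient of variation, can be sketched as below. The number of grid points and the use of the population standard deviation are assumptions made for illustration; the response does not specify either, and the function names are hypothetical.

```python
import numpy as np

def relative_grid(optimal_value, fraction=0.5, n_points=5):
    """Parameter values spanning -fraction to +fraction around the optimum."""
    return optimal_value * (1.0 + np.linspace(-fraction, fraction, n_points))

def coefficient_of_variation(values):
    """Standard deviation divided by the absolute mean.

    values: one performance number (e.g. correlation with ground truth)
    per parameter configuration, all from the same trial.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0.0:
        return float("nan")
    # Population standard deviation (ddof=0) is an assumption of this sketch.
    return float(values.std(ddof=0) / abs(mean))
```

      A method that is insensitive to its hyperparameters yields nearly identical performance across the grid and hence a coefficient of variation close to zero.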

      Equation 10: -1 should be in subscript (t-1). Remark: I have not fully verified the mathematical parts because some of it is beyond my expertise. 

      We thank the Reviewer for pointing out the typo. This has been corrected in the revised preprint.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors intended to prove that gut GLP-1 expression and secretion can be regulated by Piezo1, and hence by mechanistic/stretching regulation. For this purpose, they have assessed Piezo1 expression in STC-1 cell line (a mouse GLP-1 producing cell line) and mouse gut, showing the correlation between Piezo1 level and Gcg levels (Figure S1). They then aimed to generate gut L cell-specific Piezo1 KO mice, and claimed the mice show impaired glucose tolerance and GLP-1 production, which can be mitigated by Ex-4 treatment (Figures 1-2). Pharmacological agents (Yoda1 and GsMTx4) and mechanic activation (intestinal bead implantation) were then utilized to prove the existence of ileal Piezo1-regulated GLP-1 synthesis (Figure 3). This was followed by testing such mechanism in a limited amount of primary L cells and mainly in the STC-1 cell line (Figures 4-7).

      While the novelty of the study is somehow appreciable, the bio-medical significance is not well demonstrated in the manuscript. The authors stated (in lines 78-83) a number of potential side effects of GLP-1 analogs; how, then, can the mechanistic study of GLP-1 production on its own be essential for the development of new drug targets for the treatment of diabetes? Furthermore, the study does not provide a clear mechanistic insight into how the claimed CaMKKbeta/CaMKIV-mTORC1 signaling pathway upregulated both GLP-1 production and secretion. This reviewer also has concerns about the experimental design and data presented in the current manuscript, including the issue of how proglucagon expression can be assessed by Western blotting.

      Strengths:

      The novelty of the concept.

      Weaknesses:

      Experimental design and key experiment information.

      We appreciate the reviewer's comments. GLP-1-based therapy is now well recognized and commonly used in the treatment of Type 2 Diabetes Mellitus (T2DM). Therefore, elucidation of the mechanism that regulates GLP-1 production is essential for the development of new drug targets for the treatment of diabetes. We have revised the relevant wording in the manuscript.

      In our previous studies, we elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice and rapamycin or L-leucine treatment to modulate mTOR activity, we demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15;416:9-18). Building on these studies, we found in the present study that Piezo1 regulates the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through a Ca2+/CaMKKbeta/CaMKIV pathway. Although we could not exclude the involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provides evidence to support the involvement of the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway in mediating the role of Piezo1 in proglucagon expression and GLP-1 production.

      The reviewer also expressed concerns about the use of Western blotting to detect proglucagon expression. Proglucagon is encoded by the GCG gene and is cleaved by PC1/3 in L cells to form mature GLP-1. Measurement of intestinal proglucagon protein is a common approach for assessing GLP-1 production in the intestine. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800; Gastroenterology. 2011 May;140(5):1564-74; 2004 Jul 23;279(30):31068-75. The proglucagon antibody used in our study was purchased from Abcam (Cat#ab23468) and detects proglucagon at 21 kDa.

      Reviewer #2 (Public Review):

      Summary:

      The study by Huang and colleagues focuses on GLP-1 producing entero-endocrine (EEC) L-cells and their regulation of GLP-1 production by a mechano-gated ion channel Piezo1. The study describes Piezo1 expression by L-cells and uses an exciting intersectional mouse model (villin to target epithelium and Gcg to target GLP-1-producing cells and others like glucagon-producing pancreatic endocrine cells), which allows L-cell specific Piezo1 knockout. Using this model, they find an impairment of glucose tolerance, increased body weight, reduced GLP-1 content, and changes to the CaMKKbeta-CaMKIV-mTORC1 signaling pathway using a normal diet and then high-fat diet. Piezo1 chemical agonist and intestinal bead implantation reversed these changes and improved the disrupted phenotype. Using primary sorted L-cells and cell model STC-1, they found that stretch and Piezo1 activation increased GLP-1 and altered the molecular changes described above.

      Strengths:

      This is an interesting study testing a novel hypothesis that may have important mechanistic and translational implications. The authors generated an important intersectional genetics mouse model that allowed them to target Piezo1 L-cells specifically, and the surprising result of impaired metabolism is intriguing.

      Weaknesses:

      However, there are several critical limitations that require resolution before making the conclusions that the authors make.

      (1) A potential explanation for the data, and one that is consistent with existing literature [see for example, PMC5334365, PMC4593481], is that epithelial Piezo1, which is broadly expressed by the GI epithelium, impacts epithelial cell density and survival, and as such, if Piezo1 is involved in L-cell physiology, it may be through regulation of cell density. Thus, it is critical to determine L-cell densities and epithelial integrity in controls and Piezo1 knockouts systematically across the length of the gut, since the authors do not make it clear which gut region contributes to the phenotype they see. Current immunohistochemistry data are not convincing.

      We appreciate the reviewer's comment and agree that Piezo1 may impact L-cell density and epithelial integrity. To address this, we have incorporated quantification of L-cell density in new Figure Supplement 7. The quantitative results demonstrate that the specific deletion of the Piezo1 gene in L cells did not significantly impact L-cell density.

      Regarding epithelial integrity, we assessed the expression of tight junction proteins (ZO-1 and Occludin). As demonstrated in new Figure Supplement 8, the expression of tight junction proteins such as ZO-1 and Occludin did not show significant changes in IntL-Piezo1-/- mice compared to littermate controls.

      Furthermore, we conducted double immunofluorescence of Piezo1 and GLP-1 in the duodenum, jejunum, ileum, and colon of control and IntL-Piezo1-/- mice. As illustrated in new Figure Supplement 5, Piezo1 is expressed in GLP-1-positive cells of the duodenum, jejunum, ileum, and colon of control mice, but not in IntL-Piezo1-/- mice.

      (2) Calcium signaling in L-cells is implicated in their typical role of being gut chemo-sensors, and Piezo1 is a calcium channel, so it is not clear whether any calcium-related signaling mechanism would phenocopy these results.

      We agree with the reviewer that Piezo1 is a calcium channel (validation of Piezo1-mediated Ca2+ influx in primary L cells and STC-1 cells is shown in Figure 4A-C and Figure 5A-C). According to our study, calcium-related signaling mechanisms such as the calcium/calmodulin-dependent protein kinase kinase 2 (CaMKKβ)-calcium/calmodulin-dependent protein kinase IV (CaMKIV) pathway may contribute to the phenotype seen in IntL-Piezo1-/- mice. In addition, we discussed other potential calcium-related signaling mechanisms in the Discussion section (lines 645-656).

      (3) Intestinal bead implantation, while intriguing, does not have clear mechanisms and is likely to provide a point of intestinal obstruction and dysmotility.

      We appreciate the reviewer’s comment. To ascertain whether intestinal bead implantation led to intestinal obstruction and dysmotility, we conducted a bowel transit time test and measured postoperative defecation (new Figure Supplement 9). The results revealed no difference in bowel transit time or fecal mass between the sham-operated mice and those implanted with beads. Furthermore, to assess whether the animals were in pain or discomfort after intestinal bead implantation, we performed an abdominal mechanical sensitivity test three days after the surgery. As indicated in Figure Supplement 9C, no difference in abdominal pain threshold was observed between sham and bead-implanted mice. These results suggest that the mice did not experience discomfort during the experiment.

      (4) Previous studies, some that are very important, but not cited, contradict the presented results (e.g., epithelial Piezo1 role in insulin secretion) and require reconciliation.

      Thank you for raising this point. We have cited additional previous studies. The lack of changes in blood glucose in Villin-Piezo1-/- mice reported by Sugisawa et al. (Cell. 2020 Aug 6;182(3):609-624.e21) is not surprising. In another recent study from our group, we found similar results when Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed a normal chow diet. Since Villin-1 is expressed in all epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to Piezo1fl/fl control mice after 8 weeks of high-fat diet. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang*, Geyang Xu*, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, accepted, 2024; https://doi.org/10.1016/j.apsb.2024.04.016).

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns

      (1) Figure 1L was labeled incorrectly, and the co-localization was not clear. The KO leads to such a strong effect on the percentage of GLP-1 positive cells (panel M), but this was not clearly demonstrated with immunostaining. Additional experiments are needed to prove tissue-specific knockout in gut GLP-1-producing cells only, but not in other cell lineages or elsewhere. If so, how did gut Gcg mRNA expression change? Importantly, this reviewer is not clear on how Western blotting was used to measure proglucagon expression in the tissue samples. What is the size of the product? The antibody information was not provided in the manuscript. Figure 1N shows a potential mechanism that affects GLP-1 production involving mTORC and downstream molecules. This comes from nowhere.

      We appreciate the reviewer's feedback. The incorrect label has been corrected in the new Figure 1L. As suggested, we have performed additional experiments to demonstrate tissue-specific knockout of Piezo1 in gut GLP-1-producing cells exclusively, excluding other cell lineages or locations.

      As shown in Figure Supplement 6, Piezo1 remains expressed in ileal ghrelin-positive cells and pancreatic glucagon-positive cells of IntL-Piezo1-/mice, suggesting that Piezo1 was specifically knocked out in L cells, but not in other endocrine cell types. Furthermore, the decrease was only observed in GLP-1 levels, but not PYY levels, in L cells of IntL-Piezo1-/- mice compared to controls, suggesting that the loss of Piezo1 in L cells affects GLP-1 levels specifically, but not the secretion of other hormones produced by L cells (Figure Supplement 7A-D).

      In our previous studies, we elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice and rapamycin or L-leucine treatment to modulate mTOR activity, we demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15:416:9-18.). Building on these studies, we found in the present study that Piezo1 regulates the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through a Ca2+/CaMKKbeta/CaMKIV pathway.

      Although we could not exclude the involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provides evidence that the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway mediates the role of Piezo1 in proglucagon expression and GLP-1 production.

      The reviewer also expressed concerns about the use of Western blot to detect proglucagon expression. Proglucagon is encoded by the GCG gene and is cleaved by PC1/3 in L cells to form mature GLP-1. Measurement of intestinal proglucagon protein is a common approach for assessing GLP-1 production in the intestine. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800. Gastroenterology. 2011 May;140(5):1564-74. 2004 Jul 23;279(30):31068-75. The proglucagon antibody used in our study was purchased from Abcam (Cat#ab23468) and detects proglucagon at 21 kDa.

      (2) In Figure 2, the LFD control mouse group was missing. Again, I don't understand the detection of proglucagon by Western blotting in this figure.

      We appreciate the reviewer's comments. Figure 1 presents the phenotypic changes of transgenic mice under low-fat diet feeding, while Figure 2 focuses on the phenotypic changes of transgenic mice under high-fat diet feeding. As mentioned above, Western blot is often used to detect proglucagon, the precursor of GLP-1.

      (3) Why show body weight change but not body weight itself? How are the changes compared (which one serves as the control)? Again, how to do Western blotting on pro-glucagon detection?

      We appreciate the reviewer's comments. Body weight has been added in the new Figure 3. Proglucagon is the precursor of GLP-1. Intestinal proglucagon protein measurement is commonly used to assess GLP-1 production in the intestine.

      (4) After reading the whole manuscript, this reviewer cannot get a clear picture of how the claimed CaMKKbeta-mTORC1 pathway mediates the function of Piezo1 activation (via the utilization of Yoda1 or intestinal bead implantation) on Gcg expression (at the transcription level or mRNA stability level?), hormone production, the genesis of GLP-1 producing cells, and the secretion of the hormone.

      We appreciate the reviewer's comments. Figure 7 showed that overexpression of CaMKKbeta and CaMKIV enhanced mTOR and S6K phosphorylation, proglucagon expression and GLP-1 secretion, while the CaMKKbeta inhibitor STO609 inhibited mTOR and S6K phosphorylation, proglucagon expression and GLP-1 secretion, suggesting that CaMKKbeta and CaMKIV are involved in GLP-1 production. Moreover, the mTOR inhibitor rapamycin inhibited Yoda1-induced proglucagon expression and GLP-1 secretion. These results suggest that CaMKKbeta/CaMKIV/mTOR mediates the effect of Piezo1 on GLP-1 production.

      I strongly suggest that authors focus on more solid findings and dissect the mechanistic insight on something more meaningful, but not on everything (hormone coding gene expression, hormone production, and hormone secretion).

      GLP-1 production involves multiple steps, including proglucagon expression, protein cleavage, granule packaging and final release. In our present study, we focused on how mechanical signals regulate proglucagon expression in L-cells and thus promote GLP-1 production. We did not exclude the possibility that mechanical signals could also affect other steps of GLP-1 production, and we address this possibility in the Discussion section.

      Minor concerns

      (1) Figure S1A. STC-1 is a Gcg-expressing cell line, which shows a lower amount of Piezo1 mRNA than most of the primary tissue samples tested. This does not support the fundamental role of Piezo1 in regulating Gcg expression. Maybe qRT-PCR will be more helpful for establishing the correlation.

      Thanks a lot for the comments. As suggested, the results of qRT-PCR have been added in new Figure S1A.

      (2) There are numerous scientific presentation problems in the written manuscript. Necessary literature citations are missing, especially for key methods (such as bead implantation).

      Thank you very much for your comments. We have made every effort to enhance the scientific presentation and have included the necessary literature citations.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      (1) There needs to be data localizing Piezo1 to L-cells and importantly, this needs to be quantified - are all L-cells (small bowel and colon) Piezo1 positive?

      Thank you very much for your comments. We performed double immunofluorescence of Piezo1 and GLP-1 in the duodenum, jejunum, ileum, and colon of control and IntL-Piezo1-/- mice. As shown in new Figure Supplement 5, Piezo1 is expressed in about 90% of GLP-1-positive cells in the duodenum, jejunum, ileum, and colon of control mice, but not in IntL-Piezo1-/- mice.

      (2) The intersectional model for L-cell transduction needs deeper validation. Images in Figure 1e are not convincing for the transduction of GFP in L-cells. The co-localization studies are not convincing, especially because Piezo1 labeling is very broad. There needs to be stronger validation of the intersectional Gcg-Villin-Piezo1 KO model. It is important to determine whether L-cell Piezo1 localization epithelium in the small bowel and colon is present (above) and affected specifically in the knockout.

      Thanks a lot for the comments. We performed double immunofluorescence for Piezo1 and GLP-1 across the duodenum, jejunum, ileum, and colon of both control and IntL-Piezo1-/- mice. As illustrated in the new Figure Supplement 5, Piezo1 is expressed in GLP-1-positive cells in these gastrointestinal segments in control mice. In contrast, no Piezo1 expression was detected in these cells in IntL-Piezo1-/- mice. Consistent with these findings, in situ hybridization confirmed the absence of Piezo1 expression within GLP-1-positive cells in IntL-Piezo1-/- mice, providing evidence for the successful knockout of Piezo1 in the L cells of these mice (Figure 1L and M).

      In Figure 1E, IntL-Cre mice were bred with mT/mG reporter mice to further validate Cre recombinase activity and specificity. All tissues and cells of mT/mG mice express red fluorescence (membrane-targeted tdTomato; mT) at baseline, and switch to membrane-targeted EGFP in the presence of cell-specific Cre. EGFP expression was observed only in scattered cells in the intestine, and not in the pancreas, indicating intestine-specific Cre activity in the IntL-Cre mice (Figure 1E). We have revised the relevant wording in the main text.

      (3) The authors state that "Villin-1 (encoded by Vill1 gene) is expressed in the gastrointestinal epithelium, including L cells, but not in pancreatic α cells" (lines 378-379). However, Villin is highly expressed in whole mouse islets (https://doi.org/10.1016/j.molmet.2016.05.015, Figure 1A).

      Thanks a lot for the comments. Although Hassan Mziaut et al. reported that Villin is highly expressed in whole mouse islets, that article describes only the co-localization of Villin with insulin-positive cells; co-localization of glucagon and Villin is not shown.

      According to our research (see Author response image 1 below) and a previous study (Rutlin, M. et al, 2020, The Villin1 Gene Promoter Drives Cre Recombinase Expression in Extraintestinal Tissues. Cell Mol Gastroenterol Hepatol, 10(4), 864-867.e865.), Villin is sparsely expressed in pancreatic tissue but not highly expressed in islets. We did not observe co-localization of glucagon and Villin in the pancreas (see Author response image 1A and B below). The same antibody was used to stain the intestine, where it showed specific expression on the apical side of the intestinal villi (see Author response image 1C below).

      Author response image 1.

      (4) There needs to be quantification of L-cells in Piezo1 knockout. This is because several studies show Piezo1 affecting epithelial cell densities. If there are changes in L-cell or other EEC densities in Piezo1 knockout, that shift can potentially explain the changes that the authors see in glucose metabolism and weight.

      We appreciate the reviewer’s comment. We agree that Piezo1 may affect L-cell density and epithelial integrity.

      To assess epithelial integrity we examined the expression of tight junction proteins (ZO-1 and Occludin). As shown in new Figure Supplement 8, the expression of tight junction proteins, including ZO-1 and Occludin, remained unchanged in IntL-Piezo1-/- mice when compared to littermate controls.

      To assess L-cell density, we stained for PYY, another hormone mainly secreted by L cells, in both control and IntL-Piezo1-/- mice. As shown in new Figure Supplement 7A and B, the percentage of PYY-positive cells was not significantly different between control and IntL-Piezo1-/- mice, suggesting that L-cell density was not affected by Piezo1 knockout.

      (5) L-cells are classically considered to be chemosensors. Do nutritive signals, which presumably also increase calcium, compete with, complement, or dominate L-cell GLP1 synthesis regulation?

      We appreciate the reviewer's comment and agree that L-cells are traditionally considered to be chemosensors. It is also recognized that nutritive signals regulate L-cell GLP1 synthesis. We have addressed these points in lines 568-595. Both nutritive and mechanical signals regulate GLP-1 production. While food needs to be digested and nutrients absorbed before L-cells can detect the nutritive signals, mechanical stimulation provides a more direct and rapid response. However, whether nutritive signals compete with, complement, or dominate mechanical signals in L-cell GLP-1 production remains to be explored.

      (6) The mechanism of Glp1 synthesis vs release downstream of Piezo1 is not clear. The authors hypothesize that "Piezo1 might regulate GLP-1 synthesis through the CaMKKβ/CaMKIV-mTOR signaling pathway". However, references cited suggest that Ca2+ or cAMP leads to GLP-1-release, while mTOR primarily acts on the regulation of gene expression by promoting Gcg gene expression. These pathways do not clearly link to Piezo1 GLP-1 production. These mechanisms need to be reconciled.

      Thanks a lot for raising this point. The effect of the Piezo1-mediated Ca2+ increase on GLP-1 production may be two-fold: promoting Gcg gene expression through CaMKKβ/CaMKIV-mTOR, and promoting GLP-1 release by degranulation. Both gene expression and release are important for sustained GLP-1 production.

      (7) Previous study PMID 32640190 (not cited here) found that Villin-driven Piezo1 knockout, which knocks out Piezo1 from all epithelial intestinal cells (including L-cells), showed no significant alterations in blood glucose or body weight. This is the opposite of the presented findings and therefore the current results require reconciliation.

      We have cited PMID 32640190 in our revised manuscript. The lack of changes in blood glucose in the Villin-Piezo1-/- mice reported by Sugisawa et al. is not surprising (Cell. 2020 Aug 6;182(3):609-624.e21.). In fact, in another recent study from our group, we found similar results when Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed a normal chow diet. Since Villin-1 is expressed in all the epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to the Piezo1fl/fl control mice after 8 weeks of high-fat diet. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang, Geyang Xu, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, Accepted, 2024, https://doi.org/10.1016/j.apsb.2024.04.016).

      Reviewing Editor (Recommendations For The Authors):

      Your paper - while innovative in concept and interesting - has many flaws that in my opinion need to be corrected before the paper and pre-print should be published or uploaded as pre-print. Can you please make every effort to address the missing data that the Reviewers have asked for and correct the lack of references as noted in the reviews? Thank you.

      Thank you for the invaluable suggestions provided by the editors and reviewers. In response to these suggestions, we have included the missing data as requested and rectified the lack of references to the best of our ability. We hope that these revisions will effectively address the concerns raised by the editors and reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Although the use of antimony has been discontinued in India, the observation that there are Leishmania parasites that are resistant to antimony in circulation has been cited as evidence that these resistant parasites are now a distinct strain with properties that ensure their transmission and persistence. It is of interest to determine what are the properties that favor the retention of their drug resistance phenotype even in the absence of the selective pressure that would otherwise be conferred by the drug. The hypothesis that these authors set out to test is that these parasites have developed a new capacity to acquire and utilize lipids, especially cholesterol which affords them the capacity to grow robustly in infected hosts.

      We sincerely appreciate Reviewer 1's thoughtful and positive evaluation of our manuscript. We acknowledge that the reviewer has a few major concerns, and we would like to address them one by one in the following section.

      Major issues:

      (1) There are several experiments for which they do not provide sufficient details, but proceed to make significant conclusions.

      Experiments in section 5 are poorly described. They supposedly isolated PVs from infected cells. No details of their protocol for the isolation of PVs are provided. They reference a protocol for PV isolation that focused on the isolation of PVs after L. amazonensis infection. In the images of infection that they show, by 24 hrs, infected cells harbor a considerable number of parasites. Is it at the 24 hr time point that they recover PVs? What is the purity of PVs? The authors should provide evidence of the success of this protocol in their hands. Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.

      We would like to thank the reviewer for correctly pointing out the lack of detail regarding PV isolation and purity. The reviewer raises multiple questions, which we answer one by one below:

      Firstly, “Is it at the 24 hr time point that they recover PVs?”

      In the ‘Methods’ section of the original submission (lines 606-611), there is a separate section on “Parasitophorous vacuole (PV) Isolation and cholesterol measurement”, which clearly states: “24Hrs LD infected KCs were lysed by passing through a 22-gauge syringe needle to release cellular contents. Parasitophorous vacuoles (PV) were then isolated using a previously outlined protocol [Ref: 73].” However, we acknowledge that further details would enrich this section, and hence we would like to include the following in the Methods section of the revised manuscript (lines 663-678): “Parasitophorous vacuoles (PV) were isolated using a previously outlined protocol with slight modifications [76]. 10⁷ KCs were seeded in a 100 mm plate and allowed to adhere for 24Hrs. Following this, infection was performed with Leishmania donovani (LD) for 24Hrs; the infected KCs were then harvested by gentle scraping and lysed by five successive passages through an insulin needle to ensure membrane disruption while preserving organelle integrity. The lysate was centrifuged at 200 × g for 10mins at 4°C to remove intact cells and large debris. The resulting supernatant was carefully collected and subjected to a discontinuous sucrose density gradient (60%, 40%, and 20%). The gradient was centrifuged at 700 × g for 25mins at 4°C to facilitate organelle separation. The interphase between the 40% and 60% sucrose layers, enriched with PVs, was carefully collected and subjected to a final centrifugation step at 12,000 × g for 25mins at 4°C. The supernatant was discarded, and the resulting pellet was enriched for purified parasitophorous vacuoles, suitable for downstream biochemical and molecular analyses. Cholesterol and protein contents in PV were determined by an Amplex Red assay kit and Bradford assay, respectively. Resulting data were represented as micrograms of cholesterol per microgram of protein.”
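
      The normalization described in the last sentence of the quoted Methods text (micrograms of cholesterol per microgram of protein) can be sketched as follows. This is a minimal illustration only: the function name and the example readings are hypothetical and are not data from the study.

```python
def cholesterol_per_protein(cholesterol_ug: float, protein_ug: float) -> float:
    """Return micrograms of cholesterol per microgram of protein.

    Both inputs are assumed to come from the same PV fraction
    (cholesterol from an Amplex Red assay, protein from a Bradford assay).
    """
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return cholesterol_ug / protein_ug


# Hypothetical readings for illustration only (NOT values from the manuscript).
example_fractions = {
    "LD-S PV": (1.2, 10.0),  # (cholesterol in ug, protein in ug)
    "LD-R PV": (3.0, 10.0),
}

for name, (chol_ug, prot_ug) in example_fractions.items():
    ratio = cholesterol_per_protein(chol_ug, prot_ug)
    print(f"{name}: {ratio:.2f} ug cholesterol / ug protein")
```

      Dividing by protein content lets fractions with different total yields be compared on the same scale.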

      Secondly, What is the purity of PVs? Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.

      We thank the reviewer for pointing out this critical gap in the submitted manuscript. As the reviewer rightly notes, we needed to assess the purity of the isolated PVs in our experiment. In the revised manuscript, we now provide data on the purity of the isolated fraction, obtained by confocal imaging and by Western blot of the PV and cytoplasmic fractions (Figure 3C i, C ii and C iii). Our results clearly show efficient PV isolation, with LAMP-1-positive staining demarcating LD amastigotes, which was further validated by Western blot showing significant enrichment of LAMP-1 specifically in the PV fraction. This has been included in the revised manuscript (lines 225-234), which reads: “Parasitophorous vacuole fractions were isolated from LD-S and LD-R-infected KCs at 24Hrs p.i. using a previously established protocol [35]. Following isolation, PV purity was confirmed through LAMP-1 staining which showed a significant enrichment around isolated PV in Confocal microscopy (Figure 3C i). Purity of isolated PV fractions was further confirmed by Western blot which showed an enhanced enrichment of LAMP-1 for LD-R-PV fraction as compared to LD-S-PV fraction, while PV excluded cellular fraction showed residual LAMP-1 expression confirming the purity of the isolated PV fractions (Figure 3C ii, iii). Following isolation, protein concentration was measured for isolated PV fractions using the Bradford assay, and PV fractions from both LD-S- and LD-R-infected KCs were normalized accordingly.”

      (2) In section 6 they evaluate the mechanism of LDL uptake in macrophages. Several approaches and endocytic pathway inhibitors are employed. The authors must be aware that the role of cytochalasin D in the disruption of fluid phase endocytosis is controversial. Although they reference a study that suggests that cytochalasin D has no effect on fluid-phase endocytosis, other studies have found the opposite (doi: 10.1371/journal.pone.0058054). It wasn't readily evident what concentrations were used in their study. They should consider testing more than 1 concentration of the drug before they make their conclusions on their findings on fluid phase endocytosis.

      We thank the reviewer for this insightful comment and apologise for omitting the Cytochalasin-D concentration. To clarify, LDL uptake by LD-R-infected KCs is LDL-receptor independent, as shown in Section 6, Figure 4A, Figure S4A, Figure S4B i and Figure S4B ii of the submitted manuscript. In Figure 4F and Figure S4D of the submitted manuscript, to which the reviewer refers, Cytochalasin-D was used at a concentration of 2.5µg/ml. At this concentration, we did not observe any effect of Cytochalasin-D on LDL-receptor-independent fluid-phase endocytosis: intracellular LD-R amastigotes were able to take up LDL and proliferate in infected Kupffer cells, unlike with Latrunculin-A (5µM) treatment, which completely inhibited intracellular proliferation of LD-R amastigotes by blocking only receptor-independent fluid-phase endocytosis (Video 2A and 2B and Figure 4E in the submitted manuscript). In fact, the study cited by the reviewer (doi: 10.1371/journal.pone.0058054) used 4µg/ml Cytochalasin-D, which did affect both LDL-receptor-dependent and receptor-independent endocytosis in bone marrow-derived macrophages. We would also like to clarify that, during our preliminary experiments, we also tested a higher concentration of Cytochalasin-D (5µg/ml). However, even at this higher concentration, Cytochalasin-D had no significant effect on LD-R-induced LDL-receptor-independent fluid-phase endocytosis, as judged by intracellular LD-R amastigote counts. Thus, we strongly believe that Cytochalasin-D does not affect LD-R-induced fluid-phase endocytosis even at the higher concentration. We have now included these data as Figure 4F and Figure S4E in the revised manuscript. Further, to avoid any confusion for readers, the concentrations of all inhibitors used in the study will be stated in the Results section (lines 278 and 284) as well as in the revised figure labels.

      (3) In Figure 5 they present a blot that shows increased Lamp1 expression from as early as 4 hrs after infection with LD-R and by 12 hrs after infection of both LD-S and LD-R. Increased Lamp1 expression after Leishmania infection has not been reported by others. By what mechanism do they suggest is causing such a rapid increase (at 4hrs post-infection) in Lamp-1 protein? As they report, their RNA seq data did not show an increase in LAMP1 transcription (lines 432-434).

      We would like to express our gratitude to the reviewer for highlighting the novelty of this observation. Indeed, to the best of our knowledge, no similar findings have been reported previously in primary macrophages infected with Leishmania donovani (LD); we could not find any reference to a quantitative Western blot for LAMP-1. Firstly, we would like to point out that, as stated in the Methods section (lines 556–566) of the submitted manuscript: “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This means that our actual study points correspond to approximately the 8th and 28th hours post-infection. We mention this only to prevent any potential confusion about the time points.

      Regarding LAMP-1 expression, although we could not find any previous reports of its expression in LD-infected primary macrophages, a previous report (doi.org/10.1128/mBio.01464-20) shows a similar punctate LAMP-1 upregulation (as we observed in Figure 5A i of the submitted manuscript) in response to Leishmania infection in nonphagocytic fibroblasts. It is tempting to speculate that the increased LAMP-1 expression observed in LD-R-infected macrophages might be due to increased lysosomal biogenesis, required for degrading the increased endocytosed LDL into bioavailable cholesterol. However, since the RNA seq data showed no change in LAMP-1 expression (Figure 6 of the submitted manuscript), we can only speculate that this results from post-transcriptional or post-translational modifications. Further work will be required to investigate this mechanism in detail, which is beyond the scope of this work. That is why, in the submitted manuscript (lines 432-435), we discussed this as follows: “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 in the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV is worthy of further investigation.”

      However, we agree with the reviewer that this might not be sufficient clarification. Hence, in the revised manuscript, the Discussion section (lines 465-472) has been updated as follows: “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 in the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. How LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV, are worthy of further investigation. It is possible, and has been reported earlier, that LD infection can regulate host protein expression through post-transcriptional and post-translational modifications [61-63]. It is tempting to speculate that LD-R amastigotes might be promoting increased lysosomal biogenesis through such a mechanism to increase the supply of bioavailable cholesterol through the action of lysosomal acid hydrolases on LDL.”

      (4) In Figure 6, amongst several assays, they reported on studies where NPC-1 is knocked down in PECs. They failed to provide any evidence of the success of the knockdown, but nonetheless showed greater LD-R after NPC-1 was knocked down. They should provide more details of such experiments.

      Although we understand the concern raised by the reviewer, the statement in question is factually incorrect. In Figure 6F i of the submitted manuscript (Figure 6G ii in the revised manuscript), we demonstrated decreased NPC-1 staining following transfection with NPC-1-specific siRNA, whereas no such reduction was observed with scrambled RNA. Similar immunofluorescence data confirming LDL-receptor knockdown were also provided in Figure S4B i of the submitted manuscript (Figure S4B ii in the revised manuscript). However, we acknowledge that the reviewer may be referring to the lack of quantitative validation of the knockdown by Western blot. Although we already had these data, we had not included them, to avoid duplication and to reduce the data density of the manuscript. As suggested by the reviewer, we have now included Western blots for both NPC-1 and LDL-receptor knockdown in the revised manuscript as Figure 6G i and Figure S4B i, which again confirm efficient knockdown of NPC-1 and LDLr, consistent with what we observed by IFA.

      Additionally, prompted by the reviewer's comment, we noticed a lack of detail in the Methods section of the submitted manuscript concerning siRNA-mediated knockdown (KD). We have therefore included more details in the revised manuscript (lines 821-828), which read: “For all siRNA transfections, Lipofectamine® RNAiMAX Reagent (Life Technologies, 13778100) specifically designed for knockdown assays in primary cells was used according to the manufacturer's instructions with slight modifications. PECs were seeded into 24-well plates at a density of 1×10⁵ per well, and incubated at 37°C with 5% CO2. The transfection complex, comprising (1µl Lipofectamine® RNAiMAX and 50µl Opti MEM) and (1 µl siRNA and 50µl Opti MEM), was mixed together and added directly to the incubated PECs. Gene silencing was checked by IFA and by Western blot as mentioned previously.”

      Minor issues

      (1) There is an implication that parasite replication occurs well before 24hrs post-infection?

      Studies on Leishmania parasite replication have reported on the commencement of replication after 24hrs post-infection of macrophages (PMCID: PMC9642900). Is this dramatic increase in parasite numbers that they observed due to early parasite replication?

      We thank the reviewer for this insightful comment and appreciate the opportunity to clarify our findings. As the reviewer rightly assumes, our data suggest, and we also believe, that this increase in intracellular amastigote number is a consequence of early replication of Leishmania donovani. As already mentioned in response to Point 3 raised by Reviewer 1, we would again like to highlight that the Methods section (lines 562–566) clearly states: “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This effectively means that our actual study points correspond to approximately the 8th and 28th hours post-infection; we mention this only to avoid any confusion regarding the experimental time points.

      Regarding the specific concern about Leishmania parasite replication, we would like to point out that the study cited by the reviewer on the commencement of replication after 24hrs was conducted on Leishmania major, which may differ significantly from Leishmania donovani owing to species- and strain-specific characteristics (PMCID: PMC9642900). In fact, the doubling time of Leishmania donovani (LD) has previously been reported to be approximately 11.4 hours (doi: 10.1111/j.1550-7408.1990.tb01147.x). Moreover, multiple studies have indicated an exponential increase in intracellular LD amastigote number (a more than two-fold increase) by 24Hrs post infection (doi:10.1128/AAC.0119607, doi.org/10.1016/j.ijpara.2011.07.013). We made a similar observation for both infected PECs and KCs (Figure 1C and Figure S1C in the submitted and revised manuscripts), indicating that active replication occurs in this time frame for Leishmania donovani. Hence, it was an informed decision on our part to focus on the 24Hrs time point for the analysis of intracellular LD proliferation.
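
      As a rough check on the arithmetic behind the doubling-time argument above, the fold increase expected from a fixed doubling time can be sketched as below. This is an idealized illustration that assumes uninterrupted exponential growth with no lag phase and no parasite loss, which is an assumption and not a claim from the manuscript; the function name is hypothetical.

```python
def expected_fold_increase(elapsed_h: float, doubling_time_h: float) -> float:
    """Fold increase after `elapsed_h` hours of ideal exponential growth.

    Assumes a constant doubling time, no lag phase and no cell death,
    so it gives an upper-bound style estimate rather than a prediction.
    """
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (elapsed_h / doubling_time_h)


# Reported doubling time for Leishmania donovani cited in the response above.
DOUBLING_TIME_H = 11.4

for hours in (11.4, 24.0):
    fold = expected_fold_increase(hours, DOUBLING_TIME_H)
    print(f"{hours:>5.1f} h -> {fold:.2f}-fold")
```

      Under these idealized assumptions, 24 hours spans a little over two doublings, which is consistent with a more than two-fold increase by 24Hrs being plausible.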

      (2) Several of the fluorescence images in the paper are difficult to see. It would be helpful if blown-up (higher-magnification) versions of the images in Figure 1 (especially D), for example, were presented.

      We apologise for the inconvenience. We have provided zoomed images for several other figures in the submitted and revised manuscripts, such as Figure 4, Figure 5, Figure 6 and Figure 8. However, this was not feasible for every figure (for example, Figure 1D) because of space limitations and figure arrangement requirements. To accommodate the reviewer's request, we have provided a blown-up image for Figure 1D iii in the revised manuscript.

      (3) The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection. Also, they stop in vitro infection of Apoe/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.

      The Reviewer has raised two independent concerns, and we would like to address them individually.

      Firstly, “The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection.”

      We have already provided a detailed justification for the time point selection in our response to Reviewer 1, Minor Comment 1. As mentioned there, we observed a significant and sharp rise in the number of intracellular amastigotes between 4Hrs and 24Hrs post-infection in KC, whereas the number did not increase proportionally (did not double) after that (Figure 1C in the revised manuscript). This early stage of rapid replication of LD amastigotes therefore likely coincides with a critical period of lipid acquisition by intracellular amastigotes (Video 3A and 3B and Figure 4E in the submitted and revised manuscript), and thus 24Hrs-infected KC were specifically selected. In this regard, we would further like to add that at 72Hrs post-infection, a notable number of infected Kupffer cells began detaching from the wells, with extracellular amastigotes probably egressing. This phenomenon potentially reflects the severe impact of prolonged infection on Kupffer cell viability and adhesion properties, as shown in Video 2 in the revised manuscript and Author response image 1. This observation further influenced our decision to conclude all infection studies in Kupffer cells by 48Hrs post-infection, which necessitated ending the infection period at 24Hrs to allow treatment with Amp-B for another 24Hrs (Figure 8 and Figure S5 in the submitted and revised manuscript). We acknowledge that we should have been clearer about our selection of infection time points, and as the Reviewer has suggested, we have included this information in the revised manuscript (Line 134-141) for the reader’s clear understanding. This reads as, “Interestingly, as compared to a significant and sharp rise in the number of intracellular amastigotes between 4Hrs and 24Hrs post infected KC in response to LD-R infection, the number of intracellular amastigotes although increased significantly did not doubled from 24Hrs to 48Hrs p.i. suggesting exponential LD amastigote replication between 4Hrs and 24Hrs time frame and slowing down after that (Figure 1Ci, ii). Moreover, it was also noticed that at 72Hrs p.i. a notable number of infected-KC began detaching from the wells with extracellular amastigotes probably egressing out from the infected-KCs (Video 2). Thus, 24Hrs time point was selected to conduct all further infection studies involving KCs.”

      Author response image 1.

      Representative images of Kupffer cells infected with Leishmania donovani at 72Hrs post-infection showing a significant morphological change. Infected cells exhibit a rounded morphology and progressive detachment. Scale bar 10µm.

      Secondly, “Also, they stop in vitro infection of Apoe-/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.”

      We apologize for not providing an explanation regarding the selection of the 11-day time point for the Apoe<sup>-/-</sup> experiments (Figure 2 of the submitted and revised manuscript). Our rationale for this choice is based on both previous literature and the specific objectives of our study. A previous report suggests that Leishmania donovani infection in hypercholesterolemic Apoe<sup>-/-</sup> mice triggers a heightened inflammatory response at approximately six weeks post-infection compared to C57BL/6 mice, leading to more efficient parasite clearance. This is owing to the unique membrane composition of Apoe<sup>-/-</sup>, which rectifies Leishmania-mediated defective antigen presentation at a later stage of infection (DOI 10.1194/jlr.M026914). Additionally, previous studies have indicated that Leishmania donovani infection is well established in vivo within 6 to 11 days post-infection in murine models (doi: 10.1128/AAC.47.5.1529-1535.2003). Given that in this experiment we specifically aimed to assess the early infection status (parasite load) in diet-induced hypercholesterolemic mice, we would argue that the selection of the 11-day time point was rational and well aligned with our study objectives: a time point within this window is optimal for capturing the initial parasite burden, which depends on initial lipid utilization, before host-driven immune clearance mechanisms could significantly alter infection dynamics. We have included this explanation in the revised manuscript (Line 170-179) as suggested by the Reviewer, and it reads as, “Previous report has suggested that LD infection in hypercholesteremic Apoe<sup>-/-</sup> mice triggers a heightened inflammatory response at approximately six weeks’ post-infection compared to wild type BL/6 mice, leading to more efficient parasite clearance. This is owing to unique membrane composition of Apoe-/- which rectifies leishmania mediated defective antigen presentation at a later stage of LD infection [20]. Additionally, previous studies have also indicated that LD infection is well-established in mice within 6 to 11 days post-infection in murine models [33]. Thus to evaluate impact of initial lipid utilization on LD amastigote replication in vivo, BL/6 and diet-induced hypercholesterolemic Apoe<sup>-/-</sup> mice were infected with GFP expressing LD-S or LD-R promastigotes and sacrificed 11 days p.i.”

      Reviewer #2 (Public review):

      Summary:

      This study by Pradhan et al. offers critical insights into the mechanisms by which antimony-resistant Leishmania donovani (LD-R) parasites alter host cell lipid metabolism to facilitate their own growth and, in the process, acquire resistance to amphotericin B therapy. The authors illustrate that LD-R parasites enhance LDL uptake via fluid-phase endocytosis, resulting in the accumulation of neutral lipids in the form of lipid droplets that surround the intracellular amastigotes within the parasitophorous vacuoles (PV) that support their development and contribute to amphotericin B treatment resistance. The evidence provided by the authors supporting the main conclusions is compelling, presenting rigorous controls and multiple complementary approaches. The work represents an important advance in understanding how intracellular parasites can modify host metabolism to support their survival and escape drug treatment.

      We would like to sincerely thank the reviewer for appreciating our work and for finding the evidence compelling in addressing the emergence of drug resistance in infections with intracellular protozoan pathogens.

      Strengths:

      (1) The study utilizes clinical isolates of antimony-resistant L. donovani and provides interesting mechanistic information regarding the increased LD-R isolate virulence and emerging amphotericin B resistance.

      (2) The authors have used a comprehensive experimental approach to provide a link between antimony-resistant isolates, lipid metabolism, parasite virulence, and amphotericin B resistance. They have combined the following approaches:

      a) In vivo infection models involving BL/6 and Apoe-/- mice.

      b) Ex-vivo infection models using primary Kupffer cells (KC) and peritoneal exudate macrophages (PEC) as physiologically relevant host cells.

      c) Various complementary techniques to ascertain lipid metabolism including GC-MS, Raman spectroscopy, microscopy.

      d) Applications of genetic and pharmacological tools to show the uptake and utilization of host lipids by the infected macrophage resident L. donovani amastigotes.

      (3) The outcome of this study has clear clinical significance. Additionally, the authors have supported their work by including patient data showing a clear clinical significance and correlation between serum lipid profiles and treatment outcomes.

      (4) The present study effectively connects the basic cellular biology of host-pathogen interactions with clinical observations of drug resistance.

      (5) Major findings in the study are well-supported by the data:

      a) Intracellular LD-R parasites induce fluid-phase endocytosis of LDL independent of LDL receptor (LDLr).

      b) Enhanced fusion of LDL-containing vesicles with parasitophorous vacuoles (PV) containing LD-R parasites both within infected KCs and PECs cells.

      c) Intracellular cholesterol transporter NPC1-mediated cholesterol efflux from parasitophorous vacuoles is suppressed by the LD-R parasites within infected cells.

      d) Selective exclusion of inflammatory ox-LDL through MSR1 downregulation.

      e) Accumulation of neutral lipid droplets contributing to amphotericin B resistance.

      Weaknesses:

      The weaknesses are minor:

      (1) The authors do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation.

      (2) The study could have benefited from a more detailed analysis of how lipid droplets physically interfere with amphotericin B access to parasites.

      We have addressed both these concerns in the revised Version of this work as elaborated in the following section.

      Impact and significance:

      This work makes several fundamental advances:

      (1) The authors were able to show the link between antimony resistance and enhanced parasite proliferation.

      (2) They were also able to reveal how parasites can modify host cell metabolism to support their growth while avoiding inflammation.

      (3) They were able to show a certain mechanistic basis for emerging amphotericin B resistance.

      (4) They suggest therapeutic strategies combining lipid droplet inhibitors with current drugs.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Experimental suggestions:

      a) The authors could have provided a more detailed analysis of lipid droplet composition. This is a critically missing piece in this nice study.

      We completely agree with the Reviewer that a more detailed analysis of lipid droplet composition, the dynamics of droplet formation, and the mechanism of lipid transfer to amastigotes residing within the PV is worthy of further investigation. We are already conducting investigations in this direction and have very promising initial results, which we are willing to share with the Reviewer as an unpublished communication if requested. Since we plan to address these questions independently, we hope the Reviewer will understand our hesitation to include these data in the present work, which is already data dense. We sincerely believe that the existence of lipid droplet contact sites with the PV, along with the transfer of specific lipid types to amastigotes and its mechanism, requires special attention and could stand as an independent work.

      b) The macrophages (PEC, KC) could have been treated with latex beads as a control, which would indicate that cholesterol and lipids are indeed utilized by the Leishmania parasitophorous vacuole (PV) and essential for its survival and proliferation.

      We thank the reviewer for this nice suggestion, which we believe further strengthens the conclusion of this work. We have now included these data as Figure 5E in the revised manuscript. Our data showed that infected KC harbouring both LD-R amastigotes and fluorescent latex beads showed concentrated cholesterol staining around amastigotes, with no positive cholesterol staining around internalized latex beads, similar to LD-S amastigotes. This observation confirmed specific lipid uptake in the LD-R-PV, which cannot be replicated by phagocytosed latex beads.

      c) HMGCoA reductase is an important enzyme for the mevalonate pathway and cholesterol synthesis. The authors have not commented on this enzyme in either host or parasite. Additionally, western blots of these enzymes along with SREBP2 could have been performed.

      We appreciate the concern and see why the reviewer is suggesting this. Regarding HMGCoA reductase, we already have real-time qPCR data that perfectly align with our RNAseq data (Figure 6 A i in the submitted and revised manuscript), showing significant downregulation specifically in LD-R infected KC as compared to the uninfected control. We are including these data as Author response image 2. We did not initially check the level of HMGCoA reductase at the protein level because several previous reports have suggested that HMGCoA reductase remains under the transcriptional control of SREBP2 (doi.org/10.1016/j.cmet.2011.03.005, doi: 10.1194/jlr.C066712, doi:10.1194/jlr.RA119000201), which acts as the master regulator of the mevalonate pathway and cholesterol synthesis (doi.org/10.1161/ATVBAHA.122.317320), and SREBP2 remains significantly downregulated in response to LD-R infection (Figure 6B i and Figure 6C in the submitted and revised manuscript). However, as suggested by the Reviewer, we have added these data to the revised manuscript as Figure 6D. Western blot data further confirmed the expected significant downregulation of HMGCoA reductase in response to LD-R infection.

      Author response image 2.

      qPCR Analysis of HMGCR Expression Following Leishmania donovani Infection: Quantitative PCR analysis showing the relative expression of hmgcr (3-hydroxy-3-methylglutaryl-CoA reductase) in Kupffer cells after 24 hours of Leishmania donovani (LD) infection compared to uninfected control cells. Gene expression levels are normalized to β-actin as an internal control, and fold change is represented relative to the uninfected condition.
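      The caption above does not state how fold change was computed. As a minimal sketch, assuming the commonly used 2^(-ΔΔCt) method with β-actin as the reference gene, the calculation could look like the following. The function name and all Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_infected: float,
                        ct_reference_infected: float,
                        ct_target_control: float,
                        ct_reference_control: float) -> float:
    """Fold change of a target gene by the 2^(-ddCt) method (assumed here)."""
    # Normalize the target gene to the reference gene within each condition.
    delta_ct_infected = ct_target_infected - ct_reference_infected
    delta_ct_control = ct_target_control - ct_reference_control
    # Compare the infected condition with the uninfected control.
    delta_delta_ct = delta_ct_infected - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: hmgcr crosses threshold one cycle later in
    # infected cells while the reference gene is unchanged.
    fold = relative_expression(25.0, 15.0, 24.0, 15.0)
    print(f"hmgcr fold change relative to uninfected control: {fold:.2f}")
```

      A value below 1 corresponds to downregulation relative to the uninfected condition; the method assumes near-equal amplification efficiency for the target and reference genes.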

      d) The authors should discuss the expression pattern of any enzyme of the mevalonate pathway that they have found to be dysregulated in the transcript data.

      As per the reviewer’s suggestion, we have looked into the RNA seq data and observed that apart from hmgcr, hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase), another key enzyme in the mevalonate pathway, is significantly downregulated in host PECs in response to LD-R infection compared to LD-S infection. We have discussed this in the revised manuscript (Line 484-490), which reads as “Further RNA sequencing data also revealed a significant downregulation of hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase) in LD-R infected PECs as compared to LD-S infection. Downregulation of HMGCS which catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA to form 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA), which serves as an intermediate in both cholesterol biosynthesis and ketogenesis further supports our observation that LD-R-infected PECs preferentially rely on endocytosed low-density lipoprotein (LDL)-derived cholesterol rather than de novo synthesized cholesterol to support their metabolic needs.”

      e) The authors have followed a previously published protocol by Real F (reference 73) to enrich for parasitophorous vacuole (PV). However, they do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation. The authors should at least show Western blot data for LAMP1 for different fractions of density gradient from which they enriched the PV.

      As we previously stated in our response to Reviewer 1, we have included a detailed analysis of purity for the different fractions obtained during PV isolation in the revised manuscript. We sincerely thank the reviewer for highlighting this important concern and for suggesting an experimental approach. We have included these data as Figure 3C (i, ii, iii) in the revised manuscript. Our imaging and Western blot data showed a significant enrichment of LAMP-1 in the PV fraction, and we believe this result further reinforces the conclusions of our study on increased cholesterol.

      (2) Presentation improvements:

      a) Add a clear timeline for infection experiments.

      As suggested by the Reviewer, we have included schematic timelines for all the animal infection experiments (Figure 2Ci and Figure 7A,Fi) in the revised manuscript.

      b) Provide more details on patient sample collection and analysis.

      We have included more details on the sample collection in the Method section of the revised manuscript (Line 830-835), “Blood samples were collected from a total of 22 individuals spanning a diverse age range (8 to 70 years) by RMRI, Bihar, India. Among these, nine samples were obtained from healthy individuals residing in endemic regions to serve as controls. Serum was isolated from each blood sample through centrifugation, and the lipid profile was subsequently analysed using a specialized diagnostic kit (Coral Clinical System) following the manufacturer's protocol.”

      c) Consider reorganizing figures to better separate mechanistic and clinical findings.

      We would like to thank the reviewer for this suggestion. We felt that a major rearrangement altering the sequence of the figures as presented in the original submission would disrupt the flow of the story, and hence we did not change it. However, as suggested by the Reviewer, we have performed major rearrangements within Figure 2, Figure 5, Figure 6 and Figure 9 of the revised manuscript for better representation of the data and the convenience of the reader. If the reviewer has specific suggestions regarding the rearrangement of any particular figure, we will be happy to consider them.

      (3) Technical clarifications needed:

      a) Specify exact concentrations used for inhibitors.

      We apologise for this oversight. We have now clearly stated the concentrations of all the inhibitors used in this study in the Results section and in the figures of the revised manuscript. For ease of understanding, the revised section (Line 281-287) reads as, “Finally, we infected the KCs with GFP expressing LD-R for 4Hrs, washed and allowed the infection to proceed in presence of fluorescent red-LDL and Latrunculin-A (5µM), a compound which specifically inhibits fluid phase endocytosis by inducing actin depolymerization [41]. Real-time fluorescence tracking demonstrated that Latrunculin-A treatment not only prevented the uptake of fluorescent red-LDL but also severely impacted intracellular proliferation of LD-R amastigotes (Video 2A and 2B and Figure 4E). In contrast, treatment with Cytochalasin-D, which alters cellular F-actin organization but does not affect fluid phase endocytosis [41], had no effect on the intracellular proliferation of LD-R irrespective of Cytochalasin-D concentrations (2.5µg/ml and 5µg/ml respectively) (Figure 4F and Figure S4D).”

      b) Include more details on image analysis methods.

      Please note that in specific sections, such as Line numbers 574-579, 653-658 and 1047-1049 of the submitted manuscript, we paid special attention to describing the image analysis process. However, we agree that in some cases more details would be appreciated by the reader. Hence, we have included an additional Image Analysis section in the Methods of the revised manuscript. This section (Line 727-739) reads as, “Image processing and analysis were conducted using Fiji (ImageJ). For optimal visualization, Giemsa-stained macrophages (MΦs) were represented in grayscale to enhance contrast and structural clarity. To improve the distinction of different fluorescent signals, pseudo-colors were assigned to fluorescence images, ensuring better differentiation between various cellular components. For colocalization analysis (Figures 3, Figure 5, Figure 6, and Figure S2), we utilized the RGB profile plot plugin in ImageJ, which allows for the precise assessment of signal overlap by generating fluorescence intensity profiles across selected regions of interest. This approach provided quantitative insights into the spatial relationship between labelled molecules within infected cells. Additionally, for analyzing the distribution of cofilin in Figure 4, the ImageJ surface plot plugin was employed. This tool enabled three-dimensional visualization of fluorescence intensity variations, facilitating a more detailed examination of cofilin localization and its potential reorganization in response to infection.”

      c) Clarify statistical analysis procedures.

      We already provided a dedicated Statistical Analysis section in the Methods of the original submission and indicated the groups being compared in the figures and figure legends of the submitted manuscript. Furthermore, as suggested by the Reviewer, we have now added further clarification regarding the statistical analysis performed in the revised manuscript (Line 737-749). In the revised manuscript this section reads as, “All statistical analyses were performed using GraphPad Prism 8 on raw datasets to ensure robust and reproducible results. For datasets involving comparisons across multiple conditions, one-way or two-way analysis of variance (ANOVA) was conducted, followed by Tukey’s post hoc test to assess pairwise differences while controlling for multiple comparisons. A 95% confidence interval (CI) was applied to determine the statistical reliability of the observed differences. For non-parametric comparisons across multiple groups, Wilcoxon rank-sum tests were employed, maintaining a 95% confidence interval, which is particularly useful for analysing skewed data distributions. In cases where only two groups were compared, Student’s t-test was used to determine statistical significance, ensuring an accurate assessment of mean differences. All quantitative data are represented as mean ± standard error of the mean (SEM) to illustrate variability within experimental replicates. Statistical significance was determined at P ≤ 0.05. Notation for significance levels: *P ≤ 0.05; **P ≤ 0.001; ***P ≤ 0.0001.”
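      The analyses quoted above were run in GraphPad Prism. Purely to illustrate the underlying arithmetic, the sketch below computes mean ± SEM and the one-way ANOVA F statistic using only the Python standard library. It is not the authors' pipeline, it omits the p-value and Tukey's post hoc step, and the function names and example values are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical amastigote counts per 100 cells for three conditions.
    groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    for g in groups:
        m, sem = mean_sem(g)
        print(f"mean = {m:.2f}, SEM = {sem:.2f}")
    f_stat, dfb, dfw = one_way_anova_f(groups)
    print(f"F({dfb}, {dfw}) = {f_stat:.2f}")
```

      The F statistic would then be compared against the F distribution with the stated degrees of freedom, and pairwise differences assessed with a post hoc test such as Tukey's.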

      (4) Minor corrections:

      a) Methods section could benefit from more details on Raman spectroscopy analysis.

      We agree with this suggestion of the Reviewer. To provide more clarity, we have incorporated additional details in the Raman spectroscopy section of the Methods in the revised manuscript (Line 638-649). The updated section reads as follows in the revised manuscript. “For confocal Raman spectroscopy, spectral data were acquired from individual cells at 1000× magnification using a 100 × 100 μm scanning area, following previously established specifications. After spectral acquisition, distinct Raman shifts corresponding to specific biomolecular signatures were extracted for further analysis. These included: Cholesterol (535–545 cm<sup>-1</sup>), Nuclear components (780–790 cm<sup>-1</sup>), Lipid structures (1262–1272 cm<sup>-1</sup>), Fatty acids (1436–1446 cm<sup>-1</sup>). Following spectral extraction, pseudo-color mapping was applied to highlight the spatial distribution of each biomolecular component within the cell. These processed spectral images are presented in Figure 3D1, where the first four panels illustrate the individual biomolecular distributions. A merged composite image was then generated to visualize the co-localization of these biomolecules within the cellular microenvironment, with the final panel specifically representing the spatial distribution of key biomolecules.”
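      The band extraction described in the quoted Methods text can be sketched as follows. The four wavenumber windows are those listed above; everything else (the function names, representing a spectrum as a list of (shift, intensity) pairs, and summing intensities within each window) is an assumption made for illustration and may differ from the instrument software actually used.

```python
# Raman shift windows (in cm^-1) as listed in the quoted Methods text.
BANDS_CM1 = {
    "cholesterol": (535, 545),
    "nuclear components": (780, 790),
    "lipid structures": (1262, 1272),
    "fatty acids": (1436, 1446),
}


def band_intensity(spectrum, low, high):
    """Sum the intensities of all points whose shift lies in [low, high]."""
    return sum(intensity for shift, intensity in spectrum if low <= shift <= high)


def band_map(spectrum):
    """Return the summed intensity per biomolecular band for one pixel."""
    return {name: band_intensity(spectrum, low, high)
            for name, (low, high) in BANDS_CM1.items()}


if __name__ == "__main__":
    # Hypothetical single-pixel spectrum: (Raman shift in cm^-1, intensity).
    pixel_spectrum = [(540, 2.0), (600, 1.0), (785, 3.0), (1440, 4.0)]
    for name, value in band_map(pixel_spectrum).items():
        print(f"{name}: {value}")
```

      Repeating this per pixel across the scanned area yields one intensity map per band, to which a pseudo-color can then be assigned before merging.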

      b) In the methods section line 609, page 14, the authors cite Real F protocol as reference 73 for PV enrichment. However, in the very next section on GC-MS analysis (lines 615-616, page 15), they state they have used reference 74 for PV enrichment. Can they explain this discrepancy in the PV isolation references? Reference 74 does not mention anything related to PV isolation.

      Response: We sincerely apologise for this confusion, which probably arose from our writing of this section. We would like to confirm that our PV isolation protocol is based on the published work of Real F (reference 73). The next section of the submitted manuscript described the GC-MS analysis, which was performed based on the protocol in reference 74. In the revised manuscript, we have resolved this confusion by placing the references in the proper places. In the revised manuscript, this section (Line 663-678) reads as,

      “GC-MS analysis of LD-S and LD-R-PV

      Following a 24Hrs infection period, KCs were harvested, washed with phosphate-buffered saline (PBS), and pelleted. Subsequent to this, PV isolation was carried out using the previously described protocol [35]. After PV isolation Bradford assay was carried out for normalizing the protein concentration. The resulting equal volume of PV pellet was suspended in 20 ml of dichloromethane: methanol (2:1, vol/vol) and incubated at 4°C for 24hours. After centrifugation (11,000 g, 1 hour, 4°C), the supernatant was checked through thin layer chromatography (TLC) and subsequently evaporated under vacuum. The residue and pellet were saponified with 30% potassium hydroxide (KOH) in methanol at 80°C for 2 hours. Sterols were extracted with n-hexane, evaporated, and dissolved in dichloromethane. A portion of the clear yellow sterol solution was treated with N, O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) and heated at 80°C for 1 hour to form trimethylsilyl (TMS) ethers. Gas chromatography/mass spectrometry (GC/MS) analysis was performed using a Varian model 3400 chromatograph equipped with DB5 columns (methyl-phenylsiloxane ratio, 95/5; dimensions, 30 m by 0.25 mm). Helium was used as the gas carrier (1 ml/min). The column temperature was maintained at 270°C, with the injector and detector set at 300°C. A linear gradient from 150 to 180°C at 10°C/min was used for methyl esters, with MS conditions set at 280°C, 70 eV, and 2.2 kV[77].”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root:shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and the development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Strengths:

      The holistic approach and integrative methodologies presented in the manuscript are essential for gaining a mechanistic understanding of a complex trait such as salt tolerance. The authors focused on At3g50160 but included in their analyses additional DUF247 paralogs, which further contributes to the strength of their approach. In addition, the authors considered the developmental stage (young seedlings, early or late vegetative stages) and growth conditions of the plants (agar plates or soil) when investigating the role of SR3G in salt tolerance and root or shoot development.

      Weaknesses:

      The authors' claims and interpretation of the results are not fully supported by the data and analyses. In several cases, the authors report differences that are not statistically significant (e.g., Figures 4A, 7C, 8B, S14, S16B, S17C), use inappropriate statistical tests (e.g., t-test instead of Dunnett Test/ANOVA as in Figures 10B-C, S19-23), present standard errors that do not seem to be consistent with the post-hoc Tukey HSD Test (e.g., Figures 4, 9B-C, S16B), or lack controls (e.g., Figure 5C-E, staining of the truncated versions with FM4-64 is missing).

      We thank the reviewer for their critical thoughts on the presented data. We have revised our data interpretation in the main text to more accurately reflect the results. Given the nature of our experimental setup, where we trace the roots of individual Arabidopsis seedlings grown on plates, there is considerable biological variation, which makes achieving strong statistical significance between samples or genotypes challenging. However, we think that representing the data as transparently as possible is necessary to give readers and reviewers a true picture of the variability that we are observing. Consequently, we have centered our data interpretation around observable trends that facilitate drawing conclusions.

      The choice of statistical test is closely tied to the specific biological question being addressed. In Figures 10A-C, as in Figures 6A-B, we compared all genotypes to the wild-type Col-0 within each condition, and thus ANOVA analysis, testing the general effect of the genotype across both mutants and Col-0 wild-type is not appropriate. Similarly, in Figures S19-S23, we compared each mutant line to the wild-type Col-0 under each condition.

      We repeated the post-hoc Tukey HSD Test for Figures 4, 9B-C, and S16B and made adjustments where necessary (see tracked changes manuscript).

      The truncated versions do not localize to the plasma membrane; instead, they are targeted to the nucleus and cytosol, mimicking the localization pattern of free GFP, which was used as a control in Panel F. Therefore, we believe that FM4-64 staining would not be an informative control for these specific images, and that free GFP serves as a better control for these particular constructs.

      In other cases, traits of root system architecture and expression patterns are inconsistent between different assays despite similar growth conditions (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B), or T-DNA insertion alleles of WRKY75 that are claimed to be loss-of-function show comparable expression of WRKY75 as WT plants. Additionally, several supplemental figures are mislabeled (Figures S6-9), and some figure panels are missing (e.g., Figures S16C and S17E).

      We thank the reviewer for raising these points and noticing the inconsistency between different assays (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B). As mentioned above, considerable biological variation makes achieving strong statistical significance between samples, genotypes, or experiments challenging. Thus, we have centered our data interpretation around observable “trends” between experiments to facilitate drawing conclusions. Considering Figures S17A-B, 10A-C, and 6A, we acknowledge the reviewer's concern about inconsistencies in root system architecture across experiments. Initially, we observed that the sr3g mutant had reduced lateral root length compared to Col-0 under salt stress. This led us to focus on this specific phenotypic trait rather than the overall root system architecture. Despite some variation, the sr3g mutant consistently showed a similar trend/phenotype when compared to Col-0 under salt stress. We believe the variation in main root length and lateral root number between experiments is due to inherent differences between biological replicates.

      Regarding gene expression patterns between Figures S16B and 4A/9B, we included part of Figure 9B (SR3G gene expression in Col-0) in Figure 4A. Figure S16B represents a completely different assay. Despite variations between assays, the overall message remains consistent: SR3G gene expression is induced under salt stress in the root but not in the shoot.

      Both SR3G and WRKY75 are expressed at very low levels, even under the 75 mM salt stress condition we tested. When gene expression is so low, detecting changes is challenging due to inherent variation. Nonetheless, we observed a reduction in WRKY75 expression in the mutant lines compared to wild-type Col-0, though this reduction was not statistically significant. More importantly, we observed a similar phenotype in the wrky75 mutant, specifically reduced main root length under salt stress, consistent with the findings published in The Plant Cell by Lu et al. (2023): “Lu, K.K., Song, R.F., Guo, J.X., Zhang, Y., Zuo, J.X., Chen, H.H., Liao, C.Y., Hu, X.Y., Ren, F., Lu, Y.T. and Liu, W.C., 2023. CycC1;1–WRKY75 complex-mediated transcriptional regulation of SOS1 controls salt stress tolerance in Arabidopsis. The Plant Cell, 35(7), pp.2570-2591”.

      We thank the reviewer for spotting the mislabeled Figures S6-9. We corrected the labels in the main text, figures, and legends. We added panel C to Figure S16 and removed panel E from the Figure S17 legend, so that the legends now match the actual figures.

      Consequently, the authors' decisions regarding subsequent functional assays, as well as major conclusions about gene function, including SR3G function in root system architecture, involvement in root suberization, and regulation of cellular damage are incomplete.

      We greatly appreciate the reviewer's thorough review of our manuscript and their critical comments. We have carefully addressed all comments and concerns.

      Reviewer #2 (Public Review):

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity, and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study that demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      The abstract and beginning of the Discussion section highlight the "new tool" developed here for measuring biomass accumulation. I feel that this distracts from the central aims of the study, which is really about the role of a specific gene in root development under salt stress. I would suggest moving the tool description to less prominent parts of the manuscript.

      We appreciate the reviewer's suggestion. We believe that the innovative tool used to extract shoot-to-root ratio data from previous experiments underscores the value of reutilizing previously acquired data for new discoveries and demonstrates how reanalyzing the same data can provide fresh insights, such as identification of new allelic variation. Therefore, we decided to retain this section, as our discovery of the SR3G gene originated from this innovative tool.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Line 58 (opening sentence) - salt accumulation in the soil is not caused by evaporation exceeding input; that scenario results in soil water deficit. The issue is when the input water has dissolved ions.

      We thank the reviewer for raising this important point. While this point is theoretically true, all of the water that is found in natural environments contains some dissolved ions. Therefore, drought conditions will lead, over time, to increased soil salinization. We have amended this sentence to represent our point better.

      “Salt stress is predominant in the dryland areas where evaporation rate exceeds water input. As all water contains dissolved ions, the prolonged exposure to drought stress results in increased accumulation of salts in the upper soil layers 1–3.”

      I feel that it would be helpful, for replication and for interpretation, if the authors could provide water potentials for the growing media used throughout. What water potentials are the plants experiencing when grown in 1/2 MS + agar at 0, 75, and 150mM NaCl? Juenger and Verslues present a great recent discussion of the importance of reporting these values (Juenger, T. E. and P. E. Verslues (2023). "Time for a drought experiment: Do you know your plants' water status?" Plant Cell 35(1): 10-23.)

      Critically, how do the water potentials experienced by agar-grown plants compare to those experienced in soil-grown plants? As a stated aim of this study is to allow translation to crops these data are very important to convince physiologists of the relevance of the results.

      We thank the reviewer for raising this important point. We completely agree that growing plants on agar plates is an artificial setup and that knowing the water potential of the plants within this setup would be highly informative. However, as indicated in the review by Juenger and Verslues (2023), the agar plate setup is much more reproducible than various soil conditions, and we report the media composition in sufficient detail for it to be reproduced in other laboratories.

      Furthermore, while investigating the water status of plants and soil is indeed intriguing, it is beyond the scope of this study and would require us to redo the experiments with specific tools listed in the Juenger and Verslues review, which are not currently available in our laboratory.

      Importantly, any changes reported in this manuscript apply equally to both wild-type and mutant lines under all conditions. We provide an extensive report on the soil type used, as well as the soil quantity. We use the gravimetric method to determine the water content and to apply salt stress, as described in previous works from our lab (Yu and Sussman et al., 2024 Plant Physiology and Awlia et al., 2016 Frontiers in Plant Science).

      Nonetheless, we have now included water content measurements for soil-grown plants under different conditions, calculated by subtracting dry weight from fresh weight (new Fig. S24). Although plant water content may not fully capture the water status of the media or soil, our measurements did not reveal any significant differences in water content between genotypes across the various conditions tested.

      Line 69- missing an "and" after "(ABA)."

      Thanks. We added the missing “and”.

      Line 79 - I think the association being made is between natural variation in root and shoot growth and genetic variants, not "underlying genes."

      We thank the reviewer for this suggestion. The cause for the identified association indeed relies on allelic variation within the genetic region. We have re-phrased this sentence within the manuscript.

      “Many forward genetic studies were highly successful in associating natural variation in root and shoot growth with allelic variation in gene coding and promoter regions, thereby identifying potential new target traits for improved stress resilience 18,20,21.”

      Figure 1 - what do "seGF" and "reGF" stand for? Shoot and root growth rate, respectively, but there are extra letters in there…

      The abbreviations stand for shoot exponential Growth Factor and root exponential Growth Factor. An explanation of the acronyms has been added to the text.

      “The increase in the projected area of shoot and root (Fig. S2) was used to estimate (A) shoot and (B) root exponential growth rate (seGR and reGR respectively).”

      Figure 1 legend - there's an "s" missing in "across." And two "additionally" in the penultimate sentence.

      Thanks for spotting the errors. We fixed these errors.

      Line 109 - how was the white balance estimated for the images on the flatbed scanner?

      Within the developed tool, we have not adjusted or controlled for white balance in any way, as the white balance of the flatbed scanner is kept at one value. The tool sorts the imaged pixels into bins consisting of white (root), green (shoot), and blue (background) pixels based on the closest distance in the RGB scale to the particular color, which makes correcting for white balance unnecessary. We have provided an additional explanation for this within the M&M section.

      “A Matlab-based tool was developed to simplify and speed up the segmentation and analysis pipeline. For automatic segmentation, the tool uses a combination of image operations (histogram equalization), thresholding on different color spaces (e.g., RGB, YCbCr, Lab, HSV), and binary image processing (boundary and islands removal). As the tool is digitalizing various color scales and classifies pixels into either white (root), green (shoot) or blue (background) categories, the adjustment for white balance is obsolete. ”
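
      The nearest-colour binning described above can be illustrated with a minimal Python sketch. This is not the authors' Matlab tool: the reference colours, the function names, and the pure-RGB distance are assumptions made for illustration, and the histogram equalization, multi-colour-space thresholding, and boundary/island removal mentioned in the methods text are not reproduced.

```python
# Reference colours are illustrative assumptions, not values taken from the tool.
REFERENCE_COLOURS = {
    "root": (255, 255, 255),      # white
    "shoot": (0, 128, 0),         # green
    "background": (0, 0, 255),    # blue
}


def nearest_class(pixel):
    """Return the class whose reference colour is closest to `pixel` in RGB space."""
    def sq_dist(ref):
        # Squared Euclidean distance; the square root is not needed for ranking.
        return sum((p - r) ** 2 for p, r in zip(pixel, ref))

    return min(REFERENCE_COLOURS, key=lambda name: sq_dist(REFERENCE_COLOURS[name]))


def projected_areas(image):
    """Count pixels per class for an image given as rows of (R, G, B) tuples."""
    counts = {name: 0 for name in REFERENCE_COLOURS}
    for row in image:
        for pixel in row:
            counts[nearest_class(pixel)] += 1
    return counts
```

      Because each pixel is assigned to whichever reference colour is nearest, a small uniform colour cast would shift all distances together and would often leave the assignment unchanged, which is in line with the reasoning given in the response.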

      GWAS was performed separately on traits measured at control, 75mM, and 150mM NaCl treatments. Would it also be informative to map the STI measurement (i.e. plasticity) introduced here?

      We thank the reviewer for this important point. We performed GWAS on both “raw” and STI traits; however, we found that the associations identified for STI were not as abundant as those identified with the “raw” traits. This makes sense, as we are compounding the root or shoot growth under both conditions, and plastic responses to the environment are expected to be genetically more complex, as they involve more genetic regulators than phenotypes with low plasticity. We have added this to the description of the results, as we acknowledge that this might be an interesting observation for the field to build upon, and might provide fodder for new methods to deconvolute the complexity of mapping plastic traits.

      “To identify genetic components underlying salt-induced changes in root:shoot ratio, we used the collected data as an input for GWAS. The associations were evaluated based on the p-value, the number of SNPs within the locus, and the number of traits associated with individual loci. As Bonferroni threshold differs depending on the minor allele count (MAC) considered, we identified significant associations based on a Bonferroni threshold for each subpopulation of SNPs based on MAC (Table S3). While we conducted a GWAS on directly measured traits, as well as their Salt Tolerance Index (STI) values, however the amount of associations with STI was much lower compared to directly measured traits (Table S3). This observation aligns with the understanding that plastic responses to environmental conditions tend to be genetically more complex. This complexity likely stems from the involvement of more genetic regulators compared to low-plasticity phenotypes.”

      Line 167 - how was LD incorporated into this analysis? Did you use a genome average? Or was LD allowed to vary (as it does) across the genome?

      Initially, we used the genome-average LD for this purpose (10 kbp for Arabidopsis), and extended the region of interest based on the number of coding genes within the window. We have added this to the description in our manuscript.

      “For the most promising candidate loci (Table S4), we have identified the gene open reading frames that were located within the genome-wide linkage-disequilibrium (LD) of the associated SNPs. The LD was expanded if multiple SNPs were identified within the region, and the region of interest was expanded based on the number of coding genes within the LD window. ”
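
      As a rough illustration of the window-based candidate selection described above, the sketch below lists genes that overlap a window around the associated SNPs. The 10 kbp value is the genome-average LD stated in the response; extending the window by that distance on both sides of the outermost SNPs, as well as the function name and the gene identifiers and coordinates in the usage note, are assumptions made for illustration only.

```python
def genes_in_ld_window(snp_positions, genes, ld_bp=10_000):
    """List genes overlapping the LD window around the associated SNPs.

    snp_positions: base-pair positions of associated SNPs on one chromosome.
    genes: iterable of (gene_id, start, end) on the same chromosome.
    ld_bp: LD distance added on each side of the outermost SNPs (assumed).
    """
    window_start = min(snp_positions) - ld_bp
    window_end = max(snp_positions) + ld_bp
    # A gene overlaps the window if it starts before the window ends
    # and ends after the window starts.
    return [
        gene_id
        for gene_id, start, end in genes
        if start <= window_end and end >= window_start
    ]
```

      For example, with hypothetical SNPs at 50,000 and 52,000 bp the window spans 40,000 to 62,000 bp, so a hypothetical gene at 39,000 to 41,000 bp would be retained while one at 62,500 to 64,000 bp would not.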

      Line 291 - I think the water potentials are essential, here. What does 50% of soil water holding capacity equal in these soils? In the substrate that we use in our lab, that would represent a considerable soil water deficit even without any salts in the soil.

      We thank the reviewer for this comment. As Arabidopsis occurs naturally in soils with low water holding capacity (i.e. sandy soils), it typically grows better in soils that are not saturated with water. Across many experiments performed within this study and in other studies from our lab (results reported in Awlia et al., 2016 Frontiers in Plant Science and Yu & Sussman et al., 2024 Plant Physiology), we have not observed any drought-like symptoms at 50% soil water holding capacity. The fact that this is reproducible across similar soil types in two laboratories (one in Saudi Arabia and one in the USA) is not to be dismissed. Again, we are currently not equipped to measure water potentials for these plants, as this is not (yet) standard practice for stress experiments, but we are taking these comments on board for all of our future experiments.

      Moreover, our control plants are also “dried down” to 50% of SWHC, and soaked in non-saline water during the “salt stress treatment” to make sure that the soil water saturation is accounted for within the experimental setup. This “dry down” of soil is necessary to ensure equal and effective salt penetration into the soil particles. More details on this method can be found in Awlia et al., 2016.

      Again - We have added a new dataset measuring water content in individually soil-grown plants under different conditions as a proxy for soil water status (see new Fig. S24). While we did not observe any significant differences in water content between genotypes under the various conditions, the sr3g mutant showed a slightly higher, though non-significant, water content compared to wild-type Col-0 under control conditions.

      We have provided additional information and comments to warn the readers about this method:

      “The seeds were germinated in ½ MS media for one week, as described for the agar-based plate experiments. One week after germination, the seedlings were transplanted to the pot (12 x 4 cm insert) containing the Cornell Mix soil (per batch combine: 0.16 m3 of peat moss, 20.84 kg of vermiculite, 0.59 kg of Uni-Mix fertilizer, and 2.27 kg of lime) watered to 100% water holding capacity and placed in the walk-in growth chamber with the 16 h light / 8 h dark period, 22°C and 60% relative humidity throughout the growth period. When all of the pots dried down to the weight corresponding to 50% of their water holding capacity, they were soaked for 1 h in tap water or a 200 mM NaCl solution, resulting in an effective concentration of 100 mM NaCl based on the 50% soil water holding capacity, which corresponded to a moderate level of salt stress (Awlia et al., 2016). The control pots were soaked for the same length of time in 0 mM NaCl solution, to account for the soil saturation effect. We then allowed the pots to be drained for 2-3 h to eliminate excess moisture. The pots were placed under phenotyping rigs equipped with an automated imaging system (Yu et al., 2023) and the pot weight was measured daily to maintain the reference weight corresponding to 50% of the soil water holding capacity throughout the experiment. We would like to note that this gravimetric based method for application of salt stress has been developed for soils typically used for pot-grown plants, with relatively high water holding capacity (Awlia et al. 2016). Within these specific conditions, no drought stress symptoms were observed.”
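
      The arithmetic behind the 200 mM soak and the 100 mM effective concentration, and the reference weight used for daily re-watering, can be sketched as follows. This is a simplified illustration that assumes the soak tops the soil up to full water holding capacity, that the solution mixes evenly with the water already present, and that no salt is lost during draining; the function names and the pot weights in the usage note are hypothetical.

```python
def effective_concentration(soak_mM, fraction_filled_before_soak):
    """Concentration after the soak solution fills the remaining capacity and mixes evenly.

    soak_mM: NaCl concentration of the soaking solution.
    fraction_filled_before_soak: share of water holding capacity already
        occupied by (salt-free) water when the pot is soaked, e.g. 0.5.
    """
    return soak_mM * (1.0 - fraction_filled_before_soak)


def reference_pot_weight(dry_weight_g, saturated_weight_g, fraction=0.5):
    """Pot weight corresponding to a given fraction of soil water holding capacity."""
    return dry_weight_g + fraction * (saturated_weight_g - dry_weight_g)
```

      Under these assumptions, `effective_concentration(200, 0.5)` gives 100.0 mM, matching the value quoted in the methods text, and a hypothetical pot weighing 100 g dry and 300 g at full capacity would be kept at `reference_pot_weight(100, 300)`, i.e. 200.0 g.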

      Lines 415-416 - are these contrasts significant? Figure S3 likewise does not have any notation for significant differences in the means.

      We had not previously tested the stronger effect of 125 mM vs 75 mM NaCl on relative root and shoot growth, and thus these test results were initially not included in Fig. S3. We have now added the tests to Fig. S3 and described their significance in the main body of the manuscript:

      “In comparison, the growth rates of the shoot were significantly reduced to 0.71 and 0.43 of the control in 75 and 125 mM NaCl treatments, respectively (Fig. S3). While the mean value of root:shoot growth rate did not change upon salt stress treatment, the variance in the root:shoot ratio significantly expanded with the increasing concentrations of salt (Fig. 1C). These results suggest that while root and shoot growth are well coordinated under non-stress conditions, salt stress exposure results in loss of coordination of organ growth across Arabidopsis accessions.”

      Line 418 - same comment as preceding. Is this change in variance significant?

      We had not previously tested this. We have now added ANOVA tests to each figure and described their significance in the main body of the manuscript (see text above).

      Line 421 - why would we expect there to be a correlation between root:shoot growth ratio and seedling size?

      We were trying to use the seedling size as a proxy for “fitness” - or how well the plants can survive under these specific conditions. We were testing here whether any simple and directional strategy - such as increase or decrease in root:shoot ratio under salt stress - is resulting in better salt tolerance - which would translate into larger overall seedlings. We have rephrased this within the manuscript, to better explain the hypothesis being tested within this specific figure:

      “To test whether there is a clear directional correlation between the change in root:shoot ratio and overall salt stress tolerance, we have used the overall seedling size as a proxy for plant salt tolerance (Fig. S4, S5). No significant correlation was found between the root:shoot growth ratio and total seedling size (Fig. S4, S5), indicating that the relationship between coordination of root and shoot growth and salt tolerance during the early seedling establishment is complex.”

      Line 438 - I think a stable web link would be more appropriate than listing Dr. Nordborg's email address.

      Sorry about this. There is a glitch with our reference citing software. We agree, and thank the reviewer for noticing this! We assigned reference number 43 to it.

      Line 439 - I expect that many of your readers may not be experienced with GWAS. Can you provide an explanation as to why only one locus was detected with both the 250K SNP panel and the 4M SNP panel?

      We thank the reviewer for raising this point. We have added additional explanation to this observation:

      “Increased SNP density can provide more potential associations, highlighting the associated loci with more confidence, due to more SNPs being detected within specific region. The different panels could capture different LD blocks across the genome. If the locus detected by both panels is in a region of strong LD or under selection, it could be detected consistently. In contrast, other loci may not be captured well by the lower-density 250K SNP panel. The new GWAS revealed 32 additional loci, with only one significantly associated locus being picked up by both 250k and 4M SNPs GWAS (locus 30, Table S3). The detection of only one common locus between the two SNP panels is likely due to differences in resolution, statistical power, and how well each panel captures the genomic regions associated with the trait. ”

      Figure 2A and B - I suggest adding the p-value cutoff to the y-axis of the Manhattan Plots

      We thank the reviewer for this suggestion, however this is not appropriate. The genome wide p-value cutoffs for GWAS studies are arbitrary, and we have not used a genome-wide cutoff for our SNPs, but rather used cutoffs depending on the minor allele frequency. Therefore, we think adding a straight line to the graphs in Fig. 2A-B representing the overall cutoff, would be misleading. Please see below the text where we explain how the threshold was calculated for individual groups of SNPs with varying MAF:

      “The GWAS associations were evaluated for minor allele count (MAC) and association strength above the Bonferroni threshold with -log10(p-value/#SNPs), calculated for each sub-population of SNPs above threshold MAC (Table S3, Bonf.threshold.MAC.specific)”
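
      The MAC-specific threshold described above can be sketched in a few lines. The sketch assumes a significance level of 0.05 as the numerator (the text only says "p-value") and counts, for each MAC cutoff, the SNPs whose minor allele count is at or above that cutoff; the function names and the example cutoffs are illustrative and not taken from Table S3.

```python
import math


def bonferroni_threshold(n_snps, alpha=0.05):
    """Bonferroni threshold on the -log10(p) scale for n_snps tests."""
    return -math.log10(alpha / n_snps)


def mac_specific_thresholds(mac_values, mac_cutoffs, alpha=0.05):
    """Compute one Bonferroni threshold per MAC cutoff.

    mac_values: minor allele count of every SNP in the panel.
    mac_cutoffs: MAC cutoffs defining the SNP sub-populations (assumed values).
    """
    thresholds = {}
    for cutoff in mac_cutoffs:
        n_snps = sum(1 for mac in mac_values if mac >= cutoff)
        if n_snps:  # skip empty sub-populations to avoid division by zero
            thresholds[cutoff] = bonferroni_threshold(n_snps, alpha)
    return thresholds
```

      Because a stricter MAC cutoff leaves fewer SNPs, the corresponding threshold is lower, which is why a single horizontal line on the Manhattan plot would not represent every sub-population of SNPs.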

      Line 490-492 - Presents the results of the gene tree to support a model in which SR3G diverged from AT3G50150 prior to the speciation events leading to Capsella and Arabidopsis. But this topology requires at least two independent losses of SR3G - can you rule out the hypothesis that the position of SR3G on the gene tree is a result of long branch attraction? Given the syntenic orientation of AT3G50150 and SR3G, and apparent directional selection experienced by the latter lineage, it seems more parsimonious that AT3G50150 and SR3G arose from a very recent duplication event.

      We agree with the reviewer that it seemed most parsimonious for AT3G50160 (SR3G) to be a recent tandem duplication of AT3G50150, and this was certainly our expectation given the other tandem duplications that have occurred in this genomic region. However, irrespective of the type of alignment from which we built the phylogeny (nucleotide vs AA; nucleotide alignments are sometimes noisier but provide more information), we were never able to recover a tree in which AT3G50160 was immediately sister to AT3G50150, even with a long branch for AT3G50160 indicating a rapid pace of nucleotide/AA change relative to AT3G50150. With regard to long branch attraction, it is our interpretation that long branch attraction typically requires multiple long branches that get placed together at a poorly supported node where sampling is sparse (https://www.nature.com/articles/s41576-020-0233-0), whereas we have a single long branch for AT3G50160, and all other A/C clade (Arabidopsis/Camelina/Capsella) members form a lineage with a much shorter branch. To test the possibility of long branch attraction, we removed individual members of the AT3G50150/160 clade to see whether there was algorithmic uncertainty in the placement of AT3G50160. We did not observe this in any of the branch subtractions that we performed (see below). Thus, it appears that we must stick with our original interpretation. If the reviewer would like us to soften this interpretation, we would be more than happy to do so, as it does not impact the overall conclusion that AT3G50160 is a rapidly evolving member of this clade.

      Author response image 1.

      Line 494 (and throughout) - I expect that all of the genes being studied herein are "experiencing selection," even if it's boring-old purifying selection on functionally conserved proteins. I think you mean to say "directional selection."

      We thank the reviewer for this comment and completely agree that our statement lacked precision. We have corrected this throughout the manuscript.

      Line 497 - state the background and foreground values of omega, here.

      We apologize for not including these values and have added them at this point in the manuscript (new Table S6).

      Line 511 and Line 673 - Inspection of Figure S13B suggests that SR3G is not "predominantly" expressed nor does it have the "highest enrichment" in the root stele. Certainly, among root cell types, this is predominant. But it appears to be quite highly expressed in late-stage seeds and some floral organs, as well.

      We appreciate the reviewer for recognizing that SR3G is not a highly expressed gene. In root cell types, its expression is enriched in the root stele. Overall, SR3G is expressed at both early and later developmental stages. Our investigation of later developmental stages related to seed production did not reveal any significant phenotypic differences in fertility.

      Line 514 - "54-folds" should be "54-fold."

      Thanks. We made corrections.

      Figure 7 - For symmetry, I suggest adding the "Beginning of salt stress" arrow to the "Early Stress" panel as well (even if it's right at day 0).

      Thanks. We added the arrow to Early Stress in both Panels A and B.

      Figure S2 - both graphs should have the same scale on the y-axis

      Thanks - we have now re-plotted the graph with the matching y-axis scales.

      Line 531 - I feel that this is a significant overstatement. The strongest statement supported by the results presented here is that SR3G is the most prominent DUF247 studied herein in root development under salt stress.

      Thanks for the comment. We rephrased the statement.

      “These results suggest that SR3G is the most prominent DUF247 studied within our study to affect root development under salt stress.”

      Lines 583-605 - These data seem to me to be tangential to the central aims of the study. I suggest removing them for clarity/brevity.

      We greatly appreciate the reviewer's suggestion. Our study primarily focused on characterizing the main GWAS candidate, SR3G. Since SR3G is located within a cluster of other DUF247 genes on chromosome 3, we believe that screening the neighboring DUF247 genes could provide further insights into SR3G’s role in root development. Additionally, we believe that the generated data and lines will serve as a valuable resource for other researchers interested in studying these genes. For these reasons, we have decided to retain these datasets in the manuscript.

      Lines 650-652 - these sections 1-3 differences in suberization between SR3G and Col-0 under control conditions are not significant. At best, this may be described as a "trend" and not "higher levels." In section 4, it is VERY marginally significant (and probably not at all after the large number of tests performed, here.)

      We appreciate the reviewer's feedback and have revised the wording accordingly.

      Line 660 - this statement is only true for Section 1. I suggest adding this caveat.

      We appreciate the reviewer's comments on this matter. We quantified four suberin monomers in whole seedling roots rather than in individual root sections due to the technical challenges of separating the sections without microscopy and the limited availability of samples for GC-MS analysis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We want to thank the Editor and Reviewers for their thorough assessment of the manuscript as well as their constructive critiques. We have collated below the public review and recommendations from each Reviewer as well as our responses to them.

      eLife assessment

      This study by Verdikt et al. provided solid evidence demonstrating the potential impacts of Δ9-tetrahydrocannabinol (Δ9-THC) on early embryonic development using mouse embryonic stem cells (mESCs) and in vitro differentiation. Their results revealed that Δ9-THC enhanced mESCs proliferation and metabolic adaptation, possibly persisting through differentiation to Primordial Germ Cell-Like Cells (PGCLCs), though the evidence supporting this persistence was incomplete. Although the study is important, it was limited by being conducted solely in vitro and lacking parallel human model experiments.

      Reviewer #1 (Public Review):

      The authors investigated the metabolic effects of ∆9-THC, the main psychoactive component of cannabis, on early mouse embryonic cell types. They found that ∆9-THC increases proliferation in female mouse embryonic stem cells (mESCs) and upregulates glycolysis. Additionally, primordial germ cell-like cells (PGCLCs) differentiated from ∆9-THC-exposed cells also show alterations to their metabolism. The study is valuable because it shows that physiologically relevant ∆9-THC concentrations have metabolic effects on cell types from the early embryo, which may cause developmental effects. However, the claim of "metabolic memory" is not justified by the current data, since the effects on PGCLCs could potentially be due to ∆9-THC persisting in the cultured cells over the course of the experiment, even after the growth medium without ∆9-THC was added.

      The study shows that ∆9-THC increases the proliferation rate of mESCs but not mEpiLCs, without substantially affecting cell viability, except at the highest dose of 100 µM which shows toxicity (Figure 1). Treatment of mESCs with rimonabant (a CB1 receptor antagonist) blocks the effect of 100 nM ∆9-THC on cell proliferation, showing that the proliferative effect is mediated by CB1 receptor signaling. Similarly, treatment with 2-deoxyglucose, a glycolysis inhibitor, also blocks this proliferative effect (Figure 4G-H). Therefore, the effect of ∆9-THC depends on both CB1 signaling and glycolysis. This set of experiments strengthens the conclusions of the study by helping to elucidate the mechanism of the effects of ∆9-THC.

      Although several experiments independently showed a metabolic effect of ∆9-THC treatment, this effect was not dose-dependent over the range of concentrations tested (10 nM and above). Given that metabolic effects were observed even at 10 nM ∆9-THC (see for example Figure 1C and 3B), the authors should test lower concentrations to determine the dose-dependence and EC50 of this effect. The authors should also compare their observed EC50 with the binding affinity of ∆9-THC to cellular receptors such as CB1, CB2, and GPR55 (reported by other studies).

      The study also profiles the transcriptome and metabolome of cells exposed to 100 nM ∆9-THC. Although the transcriptomic changes are modest overall, there is upregulation of anabolic genes, consistent with the increased proliferation rate in mESCs. Metabolomic profiling revealed a broad upregulation of metabolites in mESCs treated with 100 nM ∆9-THC.

      Additionally, the study shows that ∆9-THC can influence germ cell specification. mESCs were differentiated to mEpiLCs in the presence or absence of ∆9-THC, and the mEpiLCs were subsequently differentiated to mPGCLCs. mPGCLC induction efficiency was tracked using a BV:SC dual fluorescent reporter. ∆9-THC treated cells had a moderate increase in the double positive mPGCLC population and a decrease in the double negative population. A cell tracking dye showed that mPGCLCs differentiated from ∆9-THC treated cells had undergone more divisions on average. As with the mESCs, these mPGCLCs also had altered gene expression and metabolism, consistent with an increased proliferation rate.

      My main criticism is that the current experimental setup does not distinguish between "metabolic memory" vs. carryover of THC (or its metabolites) causing metabolic effects. The authors assume that their PGCLC induction was performed "in the absence of continuous exposure" but this assumption may not be justified. ∆9-THC might persist in the cells since it is highly hydrophobic. In order to rule out the persistence of ∆9-THC as an explanation of the effects seen in PGCLCs, the authors should measure concentrations of ∆9-THC and THC metabolites over time during the course of their PGCLC induction experiment. This could be done by mass spectrometry. This is particularly important because 10 nM of ∆9-THC was shown to have metabolic effects (Figure 1C, 3B, etc.). Since the EpiLCs were treated with 100 nM, if even 10% of the ∆9-THC remained, this could account for the metabolic effects. If the authors want to prove "metabolic memory", they need to show that the concentration of ∆9-THC is below the minimum dose required for metabolic effects.

      Overall, this study is promising but needs some additional work in order to justify its conclusions. The developmental effects of ∆9-THC exposure are important for society to understand, and the results of this study are significant for public health.

      Reviewer #1 (Recommendations For The Authors):

      This has the potential to be a good study, but it's currently missing two key experiments:

      What is the minimum dose of ∆9-THC required to see metabolic effects?

      We would like to thank Reviewer 1 for their insightful comments. We have included exposures to lower doses of ∆9-THC in Supplementary Figure 1. Our data show that ∆9-THC induces mESC proliferation from 1nM onwards. However, when ESCs and EpiLCs were exposed to 1nM of ∆9-THC, no significant change in mPGCLC induction was observed (updated Figure 6B). Of note, in their public review, Reviewer 1 mentioned that “The authors should also compare their observed EC50 with the binding affinity of ∆9-THC to cellular receptors such as CB1, CB2, and GPR55 (reported by other studies).” According to the literature, stimulation of non-cannabinoid receptors and ion channels (including GPR18, GPR55, TRPVs, etc.) occurs at 40nM-10µM of ∆9-THC (Banister et al., 2019). We therefore expect that at the lower nanomolar range tested, CB1 is the main receptor stimulated by ∆9-THC, as we showed for the 100nM dose in our rimonabant experiments (Fig. 2).

      Is the residual THC concentration during the PGCLC induction below this minimum dose? Even if the effects are due to residual ∆9-THC, this would not undermine the overall study. There would simply be a different interpretation of the results.

      This experiment was particularly important to distinguish between a “true” ∆9-THC metabolic memory and residual ∆9-THC carried over during PGCLC differentiation. Our mass spectrometry quantification revealed that no significant ∆9-THC could be detected in day 5 embryoid bodies compared to treated EpiLCs prior to differentiation (Supplementary Figure 13). These results support the existence of ∆9-THC metabolic memory across differentiation.

      You also do not mention whether you tested your cells for mycoplasma. This is important since mycoplasma contamination is a common problem that can cause artifactual results. Please test your cells and report the results.

      All cells tested negative for mycoplasma by PCR (ATCC® ISO 9001:2008 and ISO/IEC 17025:2005 quality standards). This information has been added to the Material and Methods section.

      Minor points:

      1. I don't think it's correct to say that cannabis is the most commonly used psychoactive drug. Alcohol and nicotine are more commonly used. See: https://nida.nih.gov/research-topics/alcohol and https://www.cancer.gov/publications/dictionaries/cancer-terms/def/psychoactive-substance I looked at the UN drugs report [ref 1] and alcohol or nicotine were not included on that list of drugs, so the UN may use a different definition. This doesn't affect the importance or conclusions of this study, but the wording should be changed.

      We agree and now follow the WHO description of cannabis (https://www.who.int/teams/mental-health-and-substance-use/alcohol-drugs-and-addictive-behaviours/drugs-psychoactive/cannabis) by referring to it as the “most widely used illicit drug in the world” (Line 44).

      1. It would be informative to use your RNA-seq data to examine the expression of receptors for ∆9-THC such as CB1, CB2, and GPR55. CB1 might be the main one, but I am curious to see if others are present.

      We have explored the protein expression of several cannabinoid receptors, including CB2, GPR18, GPR55 and TRPV1 (Banister et al., 2019). These proteins, except TRPV1, were expressed at low levels in mouse embryonic stem cells compared to the positive control (mouse brain extract, see Author response image 1). Furthermore, our experiment with Rimonabant showed that the proliferative effects of ∆9-THC are mediated through CB1.

      Author response image 1.

      Protein expression of cannabinoid and non-cannabinoid receptors in mouse embryonic stem cells.

      1. Make sure to report exact p-values. You usually do this, but there are a few places where it says p<0.0001. Also, report whether T-tests assumed equal variance (Student's) or unequal variance (Welch's). [In general, it's better to use unequal variance, unless there is good reason to assume equal variance.]

      Prism, which was used for statistical analyses, only reports p-values to four decimal places. For all p-values reported as p<0.0001, the exact values were calculated in Excel using the “=T.DIST.2T(t, df)” function, with the t statistic and the number of degrees of freedom computed by Prism as inputs. Homoscedasticity was confirmed for all statistical analyses in Prism.
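      For readers who want to reproduce an exact two-tailed p-value outside a spreadsheet, the calculation can be sketched as follows. This is an illustrative standard-library implementation, not the authors' workflow: it relies on the standard identity that the two-tailed probability of Student's t distribution equals the regularized incomplete beta function I_x(df/2, 1/2) with x = df/(df + t²), evaluated with a textbook continued fraction. All function names are hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_tailed_t_p_value(t, df):
    """Two-tailed p-value for a t statistic with df degrees of freedom."""
    x = df / (df + t * t)  # t enters squared, so its sign does not matter
    return regularized_incomplete_beta(df / 2.0, 0.5, x)


# Example with hypothetical inputs: a large t statistic gives a very small p-value
print(two_tailed_t_p_value(12.0, 10))
```

      Because the tail probability is computed directly rather than as one minus a cumulative probability, very small p-values keep their leading digits instead of being rounded to zero.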

      1. Figure 2A: An uncropped gel image should be provided as supplementary data. Additionally, show positive and negative controls (from cells known to either express CB1 or not express CB1)

      The uncropped gel image is presented in Author response image 2. The antibody was validated on mouse brain extracts as a positive control as shown in Figure 1.

      Author response image 2.

      Uncropped gel corresponding to Fig. 2A where an anti-CB1 antibody was used.

      1. Figure 6B: Please show a representative gating scheme for flow cytometry (including controls) as supplementary data. Also, was a live/dead stain used? What controls were used for compensation? These details should be reported.

      The gating strategy is presented in Supplementary Figure 11. The Material and Methods section has also been expanded.

      1. As far as I can tell, you only used female mESCs. It would be good to test the effects on male mESCs as well since these have some differences due to differences in X-linked gene expression (female mESCs have two active X chromosomes). I understand that you might not have a male BV:SC reporter line, so it would be acceptable to omit the mPGCLC experiments on male cells.

      We have tested the 10nM-100µM dose range in the male R8 mESCs (Supplementary Figure 3). Results were similar to those obtained with the female H18 cells. Accordingly, PGCLC induction was increased when R8 ESCs + EpiLCs were exposed to 100nM of ∆9-THC (Supplementary Figure 12). This is in line with the impact of ∆9-THC on metabolic pathways that are fundamentally conserved across species and sex, although it should be noted that one representative model of each sex is not sufficient to exclude sex-specific effects.

      Reviewer #2 (Public Review):

      In the study conducted by Verdikt et al, the authors employed mouse Embryonic Stem Cells (ESCs) and in vitro differentiation techniques to demonstrate that exposure to cannabis, specifically Δ9-tetrahydrocannabinol (Δ9-THC), could potentially influence early embryonic development. Δ9-THC was found to augment the proliferation of naïve mouse ESCs, but not formative Epiblast-like Cells (EpiLCs). This enhanced proliferation relies on binding to the CB1 receptor. Moreover, Δ9-THC exposure was noted to boost glycolytic rates and anabolic capabilities in mESCs. The metabolic adaptations brought on by Δ9-THC exposure persisted during differentiation into Primordial Germ Cell-Like Cells (PGCLCs), even when direct exposure ceased, and correlated with a shift in their transcriptional profile. This study provides the first comprehensive molecular assessment of the effects of Δ9-THC exposure on mouse ESCs and their early derivatives. The manuscript underscores the potential ramifications of cannabis exposure on early embryonic development and pluripotent stem cells. However, it is important to note the limitations of this study: firstly, all experiments were conducted in vitro, and secondly, the study lacks analogous experiments in human models.

      Reviewer #2 (Recommendations For The Authors):

      1. EpiLCs, characterized as formative pluripotent stem cells rather than primed ones, are a transient population during ESC differentiation. The authors should consider using EpiSCs and/or formative-like PSCs (Yu et al., Cell Stem Cell, 2021; Kinoshita et al., Cell Stem Cell, 2021), and amend their references to EpiLCs as "formative".

      Indeed, EpiLCs are a transient pluripotent stem cell population that is “functionally distinct from both naïve ESCs and EpiSCs” and “enriched in formative phase cells related to pre-streak epiblast” (Kinoshita et al., Cell Stem Cell, 2021). Here, we used the differentiation system developed by M. Saitou and colleagues to derive PGCLCs (Hayashi et al, 2011). Since EpiSCs are refractory to PGCLCs induction (Hayashi et al, 2011), we used the germline-competent EpiLCs and took advantage of a well-established differentiation system to derive mouse PGCLCs. Most authors, however, agree that in terms of epigenetic and metabolic profiles, mouse EpiLCs represent a primed pluripotent state. We have added that PGCs arise in vivo “from formative pluripotent cells in the epiblast” on lines 85-86.

      1. Does the administration of Δ9-THC, at concentrations from 10nM to 1µM, alter the cell cycle profiles of ESCs?

      The proliferation of ESCs was associated with changes in the cell cycle, as presented in the new Supplementary Figure 2, which we discuss in lines 118-123.

      1. Could Δ9-THC treatment influence the differentiation dynamics from ESCs to EpiLCs?

      No significant changes were observed in the pluripotency markers associated with ESCs and EpiLCs (Supplementary Figure 9). We have added this information in lines 277-279.

      1. The authors should consider developing knockout models of cannabinoid receptors in ESCs and EpiLCs (or EpiSCs and formative-like PSCs) for control purposes.

      This is an excellent suggestion. Due to time and resource constraints, however, we focused our mechanistic investigation of the role of CB1 on the use of rimonabant, which revealed a reversal of Δ9-THC-induced proliferation at 100nM.

      1. Lines 134-136: "Importantly, SR141716 pre-treatment, while not affecting cell viability, led to a reduced cell count compared to the control, indicating a fundamental role for CB1 in promoting proliferation." Regarding Figure 2D, does the Rimonabant "+" in the "mock" group represent treatment with Rimonabant only? If that's the case, there appears to be no difference from the Rimonabant "-" mock. The authors should present results for Rimonabant-only treatment.

      To allow comparison of the effects with and without Rimonabant, and as stated in the figure legend, each condition was normalized to its own control (mock with, or without Rimonabant). Author response image 3 shows the unnormalized data, which display the same effects of Δ9-THC and Rimonabant on cell number.
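      The normalization described here can be sketched in a few lines. The cell counts below are invented for illustration and are not the authors' data; the function and key names are likewise hypothetical.

```python
def normalize_to_own_control(counts, control_key="mock"):
    """Divide every condition in a group by that group's own control value."""
    control = counts[control_key]
    if control == 0:
        raise ValueError("control value must be non-zero")
    return {condition: value / control for condition, value in counts.items()}


# Hypothetical raw cell counts (NOT the authors' data)
without_rimonabant = {"mock": 100_000, "thc_100nM": 130_000}
with_rimonabant = {"mock": 80_000, "thc_100nM": 82_000}

print(normalize_to_own_control(without_rimonabant))
print(normalize_to_own_control(with_rimonabant))
```

      With this scheme, both mock conditions equal 1 by construction, so any effect of Rimonabant alone is only visible in the unnormalized data.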

      Author response image 3.

      Unnormalized data corresponding to Figure 2D.

      1. In Figure 3, both ESCs and EpiLCs show a significant decrease in oxygen consumption and glycolysis at a 10µM concentration. Do these conditions slow cell growth? BrdU incorporation experiments (Figure 1) seem to contradict this. With compromised bioenergetics at this concentration, the authors should discuss why cell growth appears unaffected.

      Indeed, we believe that cell growth is progressively restricted with increasing doses of ∆9-THC (see Supplementary Figure 2). In addition, oxygen consumption and glycolysis can be decoupled from cellular proliferation, especially considering the short time frame we are working with (44-48h).

      1. Beyond Δ9-THC exposure prior to PGCLCs induction, it would be also interesting to explore the effects of Δ9-THC on PGCLCs during their differentiation.

      We agree with the Reviewer. Our aim was to study whether exposure prior to differentiation could have an impact, and if so, what are the mediators of this impact. Full exposure during differentiation is another exposure paradigm that is relevant but would not have allowed us to show the metabolic memory of ∆9-THC exposure. Future work, however, will be dedicated to analyzing the effect of continuous exposure through differentiation.

      1. As PGC differentiation involves global epigenetic changes, it would be interesting to investigate how Δ9-THC treatment at the ESCs/EpiLCs stage may influence PGCLCs' transcriptomes.

      We also agree with the Reviewer. While this paper was not primarily focused on Δ9-THC’s epigenetic effects, we have explored the impact of Δ9-THC on more than 100 epigenetic modifiers in our RNA-seq datasets. These results are shown in Supplementary Table 1 and Supplementary Figure 10 and discussed in lines 301-316.

      1. Lines 407-408: The authors should exercise caution when suggesting "potentially adverse consequences" based solely on moderate changes in PGCLCs transcriptomes.

      We agree and have modified the sentence as follows: “Our results thus show that exposure to Δ9-THC prior to specification affects embryonic germ cells’ transcriptome and metabolome. This in turn could have adverse consequences on cell-cell adhesion with an impact on PGC normal development in vivo.”

      1. Investigating the possible impacts of Δ9-THC exposure on cultured mouse blastocysts, implantation, post-implantation development, and fertility could yield intriguing findings.

      We thank the Reviewer for this comment. We have amended our discussion to include these points in the last paragraph.

      1. Given that naïve human PSCs and human PGCLCs differentiation protocols have been established, the authors should consider carrying out parallel experiments in human models.

      We have performed Δ9-THC exposures in hESCs (Supplementary Figure 4 and Supplementary Figure 5), showing that Δ9-THC alters the cell number and general metabolism of these cells. We present these results in light of the differences in metabolism between mouse and human embryonic stem cells on lines 135-141 and 185-188. Implications of these results are discussed in lines 474-486.

      Reviewer #3 (Public Review):

      Verdikt et al. focused on the influence of Δ9-THC, the most abundant phytocannabinoid, on early embryonic processes. The authors chose an in vitro differentiation system as a model and compared the proliferation rate, metabolic status, and transcriptional level in ESCs exposed to Δ9-THC. They also evaluated the changes in metabolism and transcriptome in PGCLCs derived from Δ9-THC-exposed cells. None of the methods in this paper involve the differentiation of ESCs to lineage-specific cells, so the results cannot demonstrate the impact of Δ9-THC on preimplantation developmental stages. In brief, the authors want to explore the impact of Δ9-THC on preimplantation developmental stages, but they only detected changes in ESCs exposed to Δ9-THC and in PGCLCs derived from them, which provides a molecular characterization of the impact of Δ9-THC exposure on ESCs and PGCLCs.

      Reviewer #3 (Recommendations For The Authors):

      1. To demonstrate the impact of Δ9-THC on preimplantation developmental stages, ESCs are an appropriate system. They have the ability to differentiate into three lineage-specific cell types. The authors should perform differentiation experiments under Δ9-THC exposure and assess the influence of Δ9-THC on the differentiation capacity of ESCs, rather than only differentiating them to PGCLCs.

      We apologize for the lack of clarity in our introduction. We specifically looked at the developmental trajectory of PGCs because of the sensitivity of these cells to environmental insults and their potential contribution to transgenerational inheritance. We have expanded on these points in our introduction and discussion sections (lines 89-91 and 474-486). Because our data shows the relevance of Δ9-THC-mediated metabolic rewiring in ESCs subsisting across differentiation, we agree that differentiation towards other systems (neuroprogenitors, for instance) would yield interesting data, albeit beyond the scope of the present study.

      1. Epigenetics are important to mammalian development. The authors only detect the change after Δ9-THC-exposure on the transcriptome level. How about methylation landscape changes in the Δ9-THC-exposure ESCs?

      We have explored the impact of Δ9-THC on more than 100 epigenetic modifiers in our RNA-seq datasets. These results are shown in Supplementary Table 1 and Supplementary Figure 10, discussed in lines 301-316. While changes in DNA methylation profiles do appear relevant in the context of Δ9-THC exposure (because of increased Tet2 expression in EpiLCs), we highlight that other epigenetic marks (histone acetylation, methylation or ubiquitination) might be relevant for future studies.

      1. In the abstract, the authors claimed that "the results represent the first in-depth molecular characterization of the impact of Δ9-THC exposure on preimplantation developmental stages." But they do not show whether the Δ9-THC affects the fetus through the maternal-fetal interface.

      We have addressed the need for increased clarity and have modified the sentence as follows: “These results represent the first in-depth molecular characterization of the impact of Δ9-THC exposure on early stages of the germline development.”

      1. To explore the impact of cannabis on pregnant women, the human ESCs may be a more proper system, due to the different pluripotency between human ESCs and mouse ESCs.

      We have performed Δ9-THC exposures in hESCs (Supplementary Figure 4 and Supplementary Figure 5). These preliminary results show that Δ9-THC exposure negatively impacts the cell number and general metabolism of hESCs. With the existence of differentiation systems for hPGCLCs, future studies will need to assess whether Δ9-THC-mediated metabolic remodelling is also carried through differentiation in human systems. We discuss these points in the last paragraph of our discussion section.

      1. All the experiments are performed in vitro, and the authors should validate their results in vivo, at least in a Δ9-THC-exposed pregnant mouse model.

      Our work is the first of its kind to show that exposure to a drug of abuse can alter the normal development of the embryonic germline. We agree with the Reviewer that to demonstrate transgenerational inheritance of the effects reported here, future experiments in an in vivo mouse model should be conducted. The metabolic remodeling observed upon cannabis exposure could also be directly studied in a human context, although these experiments would be beyond the scope of the present study. For instance, changes in glycolysis may be detected in pregnant women using cannabis, or directly measured in follicular fluid in a similar manner as done by Fuchs-Weizman and colleagues (Fuchs-Weizman et al., 2021). We hope that our work can provide the foundation to inform such in vivo studies.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study is an important advancement to the consideration of antimalarial drug resistance: the authors make use of both modelling results and supporting empirical evidence to demonstrate the role of malaria strain diversity in explaining biogeographic patterns of drug resistance. The theoretical methods and the corresponding results are convincing, with the novel model presented moving beyond existing models to incorporate malaria strain diversity and antigen-specific immunity. This work is likely to be interesting to malaria researchers and others working with antigenically diverse infectious diseases.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper is an attempt to explain a geographic paradox between infection prevalence and antimalarial resistance emergence. The authors developed a compartmental model that importantly contains antigenic strain diversity and in turn antigen-specific immunity. They find a negative correlation between parasite prevalence and the frequency of resistance emergence and validate this result using empirical data on chloroquine-resistance. Overall, the authors conclude that strain diversity is a key player in explaining observed patterns of resistance evolution across different geographic regions.

      The authors pose and address the following specific questions:

      1. Does strain diversity modulate the equilibrium resistance frequency given different transmission intensities?

      2. Does strain diversity modulate the equilibrium resistance frequency and its changes following drug withdrawal?

      3. Does the model explain biogeographic patterns of drug resistance evolution?

      Strengths:

      The model built by the authors is novel. As emphasized in the manuscript, many factors (e.g., drug usage, vectorial capacity, population immunity) have been explored in models attempting to explain resistance emergence, but strain diversity (and strain-specific immunity) has not been explicitly included and thus explored. This is an interesting oversight in previous models, given the vast antigenic diversity of Plasmodium falciparum (the most common human malaria parasite) and its potential to "drive key differences in epidemiological features".

      The model also accounts for multiple infections, which is a key feature of malarial infections, with individuals often infected with either multiple Plasmodium species or multiple strains of the same species. Accounting for multiple infections is critical when considering resistance emergence, as with multiple infections there is within-host competition which will mediate the fitness of resistant genotypes. Overall, the model is an interesting combination of a classic epidemiological model (e.g., SIR) and a population genetics model.

      In terms of major model innovations, the model also directly links selection pressure via drug administration with local transmission dynamics. This is accomplished by the interaction between strain-specific immunity, generalized immunity, and host immune response.

      R: We thank the reviewer for his/her appreciation of the work.

      Weaknesses:

      In several places, the explanation of the results (i.e., why are we seeing this result?) is underdeveloped. For example, under the section "Response to drug policy change", it is stated that (according to the model) low diversity scenarios show the least decline in resistant genotype frequency after drug withdrawal; however, it is not explained how this result emerges mechanistically. Without an explicit connection to the workings of the model, it can be difficult to gauge whether the result(s) seen are specific to the model itself or likely to be more generalizable.

      R: We acknowledge that the explanation of certain results needs to be improved. We have now added the explanation of why low diversity scenarios show the least decline in resistance frequency after drug withdrawal: “Two processes are responsible for the observed trend: first, resistant genotypes have a much higher fitness advantage in low diversity regions even with reduced drug usage because infected hosts are still highly symptomatic; second, due to low transmission potential in low diversity scenarios (i.e., longer generation intervals between transmissions), the rate of change in parasite populations is slower.” (L243-247). We also compared the drug withdrawal response to that of the generalized-immunity-only model (L268-271). The medium transmission region has the fastest reduction in resistance frequency, followed by the high and low transmission regions, which differs from the full model that incorporates strain-specific diversity.

      In addition, to provide the context of different biogeographic transmission zones, we now include a new figure (now Fig. 3) that presents the parameter space of transmission potential and strain diversity of different continents, which demonstrates that PNG and South America have less strain diversity than expected by transmission potential (L179-184 and L198-202). Therefore, these two regions have low disease prevalence and high resistance frequency.

      The authors emphasize several model limitations, including the specification of resistance by a single locus (thus not addressing the importance of recombination should resistance be specified by more than one locus); the assumption that parasites are independently and randomly distributed among hosts (contrary to empirical evidence); and the assumption of a random association between the resistant genotype and antigenic diversity. However, each of these limitations is addressed in the discussion.

      R: As pointed out by the referee, our model presents several limitations that have all been addressed in the discussion and considered for future extensions.

      Did the authors achieve their goals? Did the results support their conclusion?

      Returning to the questions posed by the authors:

      1. Does strain diversity modulate the equilibrium resistance frequency given different transmission intensities? Yes. The authors demonstrate a negative relationship between prevalence/strain diversity and resistance frequency (Figure 2).

      2. Does strain diversity modulate the equilibrium resistance frequency and its changes following drug withdrawal? Yes. The authors find that, under resistance invasion and some level of drug treatment, resistance frequency decreased with the number of strains (Figure 4). The authors also find that lower strain diversity results in a slower decline in resistant genotypes after drug withdrawal and higher equilibrium resistance frequency (Figure 6).

      3. Does the model explain biogeographic patterns of drug resistance evolution? Yes. The authors find that their full model (which includes strain-specific immunity) produces the empirically observed negative relationship between resistance and prevalence/strain diversity, while a model only incorporating generalised immunity does not (Figure 8).

      Utility of work to others and relevance within and beyond the field?

      This work is important because antimalarial drug resistance has been an ongoing issue of concern for much of the 20th century and now 21st century. Further, this resistance emergence is not equitably distributed across biogeographic regions, with South America and Southeast Asia experiencing much of the burden of this resistance emergence. Not only can widespread resistant strains be traced back to these two relatively low-transmission regions, but these strains remain at high frequency even after drug treatment ceases.

      Reviewer #2 (Public Review):

      Summary:

      The evolution of resistance to antimalarial drugs follows a seemingly counterintuitive pattern, in which resistant strains typically originate in regions where malaria prevalence is relatively low. Previous investigations have suggested that frequent exposures in high-prevalence regions produce high levels of partial immunity in the host population, leading to subclinical infections that go untreated. These subclinical infections serve as refuges for sensitive strains, maintaining them in the population. Prior investigations have supported this hypothesis; however, many of them excluded important dynamics, and the results cannot be generalized. The authors have taken a novel approach using a deterministic model that includes both general and adaptive immunity. They find that high levels of population immunity produce refuges, maintaining the sensitive strains and allowing them to outcompete resistant strains. While general population immunity contributed, adaptive immunity is key to reproducing empirical patterns. These results are robust across a range of fitness costs, treatment rates, and resistance efficacies. They demonstrate that future investigations cannot overlook adaptive immunity and antigenic diversity.

      R: We thank the reviewer for his/her appreciation of the work.

      Strengths:

      Overall, this is a very nice paper that makes a significant contribution to the field. It is well-framed within the body of literature and achieves its goal of providing a generalizable, unifying explanation for otherwise disparate investigations. As such, this work will likely serve as a foundation for future investigations. The approach is elegant and rigorous, with results that are supported across a broad range of parameters.

      Weaknesses:

      Although the title states that the authors describe resistance invasion, they do not support or even explore this claim. As they state in the discussion (line 351), this work predicts the equilibrium state and doesn't address temporal patterns. While refuges in partially immune hosts may maintain resistance in a population, they do not account for the patterns of resistance spread, such as the rapid spread of chloroquine resistance in Africa once it was introduced from Asia.

      R: We do agree that resistance invasion is not the focus of our manuscript. Rather we mainly investigate the maintenance and decline after drug withdrawal. Therefore, we changed the title to “Antigenic strain diversity predicts different biogeographic patterns of maintenance and decline of anti-malarial drug resistance” (L1-4).

      We did, however, present a fast initial invasion phase for the introduction of resistant genotypes regardless of transmission scenarios in Fig. 5 (now Fig. 6). Even though the focus of the manuscript is to investigate long term persistence of resistant genotypes, we did emphasize that the initial invasion phase and how that changes the host immunity profile are key to the coexistence of resistant and wild-type genotypes (L228-239).

      As the authors state in the discussion, the evolution of compensatory mutations that negate the cost of resistance is possible, and in vitro experiments have found evidence of such. It appears that their results are dependent on there being a cost, but the lower range of the cost parameter space was not explored.

      R: It is true that compensatory mutations might mitigate the negative fitness consequences. We didn’t add a no-cost scenario because in general if there is no cost but only benefit (survival through drug usage), then resistant haplotypes will likely be fixed in the population. This is contingent on the assumption that these compensatory mutations are in perfect linkage with resistant alleles, which is unlikely in high-transmission scenarios. Our model does not incorporate recombination, but earlier models (Dye & Williams 1997, Hastings & D’Alessandro 2000) have demonstrated that recombination will delay the fixation of resistant alleles in high-transmission settings.
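      The intuition that a cost-free resistant haplotype tends to fix can be illustrated with a textbook one-locus haploid selection recursion. This is a deliberately simplified toy, not the compartmental model of the manuscript: it has no immunity, strain diversity, or recombination, and therefore no mechanism for stable coexistence. The parameter names and values are hypothetical.

```python
def next_frequency(p, cost, cleared):
    """One generation of haploid selection on the resistant genotype.

    p       -- current frequency of the resistant genotype
    cost    -- fitness cost of resistance (0 means no cost)
    cleared -- fraction of sensitive parasites removed by drug treatment
    """
    w_resistant = 1.0 - cost
    w_sensitive = 1.0 - cleared
    mean_fitness = p * w_resistant + (1.0 - p) * w_sensitive
    return p * w_resistant / mean_fitness


def iterate(p0, cost, cleared, generations):
    """Apply the recursion repeatedly and return the final frequency."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, cost, cleared)
    return p


# No cost, modest drug pressure: the resistant genotype approaches fixation
print(iterate(p0=0.01, cost=0.0, cleared=0.1, generations=500))
# Cost larger than the drug benefit: the resistant genotype is lost
print(iterate(p0=0.5, cost=0.2, cleared=0.1, generations=500))
```

      In this toy, the outcome depends only on whether the cost is smaller or larger than the treatment benefit, so intermediate equilibrium frequencies require additional structure of the kind the full model provides.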

      As suggested, we ran our model with costs equal to 0 and 0.01 (Fig. 2C and L189-191). We found that resistant alleles almost always fix except for when diversity is extremely high, treatment/resistance efficacy is low. In these cases, the additional transmission gained by resistant alleles brings little benefit (as lower GI classes have a very small number of hosts). This finding does not contradict a wider range of coexistence between wild-type and resistant alleles when the cost is higher. We therefore added these scenarios to our updated results.

      Author response image 1.

      The use of a deterministic, compartmental model may be a structural weakness. This means that selection alone guides the fixation of new mutations on a semi-homogenous adaptive landscape. In reality, there are two severe bottlenecks in the transmission cycle of Plasmodium spp., introducing a substantial force of stochasticity via genetic drift. The well-mixed nature of this type of model is also likely to have affected the results. In reality, within-host selection is highly heterogeneous, strains are not found with equal frequency either in the population or within hosts, and there will be some linkage between the strain and a resistance mutation, at least at first. Of course, there is no recourse for that at this stage, but it is something that should be considered in future investigations.

      R: We thank the reviewer for their insightful comments on the constraints of the deterministic modeling approach. We’ve added these points to the discussion in the paragraph discussing the second limitation of the model (L359-364).

      The authors mention the observation that patterns of resistance in high-prevalence Papua New Guinea seem to be more similar to Southeast Asia, perhaps because of the low strain diversity in Papua New Guinea. However, they do not investigate that parameter space here. If they did and were able to replicate that observation, not only would that strengthen this work, it could profoundly shape research to come.

      R: We appreciate the suggestion to investigate the parameter space of Papua New Guinea. We now include a new figure (now Fig. 3) that presents the parameter space of transmission potential and strain diversity of different continents, which demonstrates that PNG and South America have less strain diversity than expected by transmission potential (L179-184 and L198-202). This translates to low infectivity for most mosquito bites, and most infections only occur in hosts with lower generalized immunity. Therefore resistant genotypes will help ensure disease transmission in these symptomatic hosts and be strongly selected to be maintained.

      Reviewer #1 (Recommendations For The Authors):

      1. I found lines 41-49 difficult to follow. Please rephrase (particularly punctuation) for clarity.

      R: We have edited the lines to improve the writing (L41-50):

      “Various relationships between transmission intensity and stable frequencies of resistance were discovered, each of which has some empirical support: 1) transmission intensity does not influence the fate of resistant genotypes [Models: Koella and Antia (2003); Masserey et al. (2022); Empirical: Diallo et al. (2007); Shah et al. (2011, 2015)]; 2) resistance first increases in frequency and slowly decreases with increasing transmission rates [Models: Klein et al. (2008, 2012)]; and 3) Valley phenomenon: resistance can be fixed at both high and low end of transmission intensity [Model: Artzy-Randrup et al. (2010); Empirical: Talisuna et al. (2002)]. Other stochastic models predict that it is harder for resistance to spread in high transmission regions, but patterns are not systematically inspected across the parameter ranges [Model: Whitlock et al. (2021); Model and examples in Ariey and Robert (2003)].”

      1. Line 65: There should be a space after "recombination" and before the citation.

      R: Thank you for catching the error. We’ve added the space (L64).

      1. I'm interested in the dependency of the results on the assumption that there is a cost to resistance via lowered transmissibility (lines 142-145). I appreciate that variation in the cost(s) of resistance in single and mixed infections is explored; however, from what I can tell the case of zero cost is not explored.

      R: As suggested, we have now added the no-cost scenario. Please see the response to the Reviewer2 weaknesses paragraph 2.

      1. I felt the commentary/explanation of the response to drug policy change was a bit underdeveloped. I would have liked a walk-through of why in your model low diversity scenarios show the slowest decline in resistant genotypes after switching to different drugs.

      R: We acknowledge that the explanation of the response to drug policy change needs to be improved. We have now added an explanation of why low diversity scenarios show the least decline in resistance frequency after drug withdrawal: “Two processes are responsible for the seen trend: first, resistant genotypes have a much higher fitness advantage in low diversity regions even with reduced drug usage because infected hosts are still highly symptomatic; second, due to low transmission potential in low diversity scenarios (i.e., longer generation intervals between transmissions), the rate of change in parasite populations is slower.” (L243-247). We also compared the drug withdrawal response to that of the generalized-immunity-only model. There, the medium transmission region has the fastest reduction in resistance frequency, followed by the high and low transmission regions, which differs from the full model that incorporates strain-specific diversity.

      1. Line 352: persistent drug usage?

      R: Yes, we meant persistent drug usage. We’ve clarified the writing (L389-391).

      1. The organisation of the manuscript would benefit from structuring around the focal questions so that the reader can easily find the answers to the focal questions within the results and discussion sections.

      R: This is a great suggestion. We modified the subheadings of results to provide answers to focal questions (L151, L179, L203-204, and L240).

      1. Line 353: Please remove either "shown" or "demonstrated".

      R: Thank you for catching the grammatical error, we’ve retained “shown” only for the sentence (L391-392).

      Reviewer #2 (Recommendations For The Authors):

      Overall, this was very nice work and a pleasure to read.

      Major:

      1. Please provide a much more thorough explanation of how resistance invasions are modeled. It is not clear from the text and could not be replicated.

      R: We have now added a section “drug treatment and resistance invasion” in Methods and Materials to explain how resistance invasions are modeled (L488-496):

      “Given each parameter set, we ran the ODE model six times until equilibrium with the following genotypic compositions: 1) wild-type only scenario with no drug treatment; 2) wild-type only scenario with 63.2% drug treatment (0.05 daily treatment rate); 3) wild-type only scenario with 98.2% drug treatment (0.2 daily treatment rate); 4) resistant-only scenario with no drug treatment; 5) resistance invasion with 63.2% drug treatment; 6) resistance invasion with 98.2% drug treatment. Runs 1-4 start with all hosts in G0,U compartment and ten parasites. Runs 5 and 6 (resistance invasion) start from the equilibrium state of 2 and 3, with ten resistant parasites introduced. We then followed the ODE dynamics till the next equilibrium.”
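
      The two percentages quoted above are consistent with converting a constant daily treatment rate into a cumulative probability over a 20-day window. The window length is not stated in this response; it is inferred here from the quoted numbers, so the following minimal sketch is an assumption rather than the authors' documented procedure. The function name `treated_fraction` and the constant `WINDOW_DAYS` are hypothetical.

```python
import math


def treated_fraction(daily_rate: float, days: float) -> float:
    """Probability of being treated at least once within `days`,
    assuming a constant daily treatment rate (exponential waiting time)."""
    return 1.0 - math.exp(-daily_rate * days)


# ASSUMPTION: a 20-day window, inferred from the quoted percentages,
# not taken from the manuscript.
WINDOW_DAYS = 20

for rate in (0.05, 0.2):
    pct = treated_fraction(rate, WINDOW_DAYS)
    print(f"daily treatment rate {rate}: {pct:.1%} treated")
```

      Under that assumed window, the rates 0.05 and 0.2 reproduce the 63.2% and 98.2% figures given in the quoted Methods text.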

      1. Please make your raw data, code, and replicable examples that produce the figures in the manuscript available.

      R: We have added the data availability section, which provides the GitHub site with all the code for the model, data processing, and figures: All the ODE codes, numerically-simulated data, empirical data, and analyzing scripts are publicly available at https://github.itap.purdue.edu/HeLab/MalariaResistance.

      1. Regarding the limitations described in the paragraph about the model in the public response, these results would be strengthened if there were separate compartments for strains which could be further divided into sensitive and resistant. Could you explore this for at least a subset of the parameter space?

      R: In our model, sensitive and resistant pathogens are always modeled as separate compartments (Fig. S1B and Appendix 1). In Results/Model structure, L135-136, we stated the setup:

      “The population sizes of resistant (PR) or sensitive (wild-type; PW) parasites are tracked separately in host compartments of different G and drug status.”

      1. To what extent do these results rely on a cost to resistance? Were lower costs explored? This would be worth demonstrating. If this cannot be maintained without cost, do you think this is because there is no linkage between strain and resistance?

      R: As suggested, we have now added the no-cost scenario (Fig. 2C and L189-191). Please see the response to the Reviewer1 weaknesses paragraph 2. In sum, under a no-cost scenario, if treatment rate is low, then wild-type alleles will still be maintained in high transmission scenarios; when treatment rate is high, resistant alleles will always be fixed.

      Minor:

      1. "Plasmodium" should be italicized throughout. Ironically, italics aren't permitted in this form.

      R: We did italicize “Plasmodium” or “P. falciparum” throughout the text. If the reviewer is referring to “falciparum malaria”, the convention is not to italicize falciparum in this case.

      1. Fig 1A: the image is reversed for the non-infected host with prior exposure to strain A. Additionally, the difference between colors for WT and resistant is not visible in monochrome.

      R: Thank you for pointing out the problem of color choice in monochrome. We have modified the figure. The image in Fig 1A is not reversed for non-infected hosts with prior exposure to strain A. We now spell out “S” to be “specific immunity”, and explain it better in the figure legend.

      1. Fig 2B: add "compare to the pattern of prevalence shown in Fig 2A" or something similar to make the comparison immediately clear.

      R: We thank the reviewer for the suggestion. We’ve added a sentence to contrast Fig 2A and B in the Figure legend: “A comparison between the prevalence pattern in (A) and resistance frequency in (B) reveals that high prevalence regions usually correspond to low resistance frequency at the end of resistance invasion dynamics.”

      1. Figs 2B & C: Please thoroughly explain how you produced this data in the methods section and briefly describe it in the results sections.

      R: We agree that the modeling strategies need to be explained better. Since we explained the rationale for the parameter ranges and the prevalence patterns we observe in the results section “Appropriate pairing of strain diversity and vectorial capacity” (now “Impact of strain diversity and transmission potential on disease prevalence”), we added sentences in this section to explain how we run models until equilibrium for wild-only infections with or without drug treatment (L152-178). Then in the following section “Drug-resistance and disease prevalence” section, we explain how we obtained the resistance invasion data:

      “To investigate resistance invasion, we introduce ten resistant infections to the equilibrium states of drug treatment with wild-type only infections, and follow the ODE dynamics till the next equilibrium” (L180-181).

      1. Fig 3: The axis labels are not particularly clear. For the Y axis, please state in the label what it is the frequency of (either the mutation or the phenotype). In the X axis, it is better to spell that out in words, like "P. falciparum prevalence in children".

      R: Thank you for pointing this out. We’ve modified the axes labels of Fig. 3 (now Fig. 4): X-axis: “P. falciparum prevalence in children aged 2-10”; Y-axis: “Frequency of resistant genotypes (pfcrt 76T)”.

      1. Fig 4 and the rest of the figures of this nature: Showing an equilibrium-state timestep before treatment was introduced would improve the readers' understanding of the dynamics.

      R: We agree that the equilibrium state before treatment is important. In fact, we have those states in our figure 4 (now figure 5): the left panel- “Daily treatment rate 0” indicates the equilibrium-state timestep before treatment. We clarified this point in the caption.

      1. Fig 5 is very compelling, but the relationships in Fig 5 would be clearer if the Y axes were not all different. Consider using the same scale for the hosts, and the same scale for resistant parasites (both conditions) and WT parasites, 113 strains. It may be clearer to reference them if they are given as A-F instead of three figures each for A and B.

      R: We agree with the suggested changes and have modified figure 5 (now Fig. 6): we used one Y-axis scale for the hosts, and one Y-axis scale for the parasites. The wild-type one is very low for the low diversity scenario, thus we included one inset plot for that case.

      1. Fig 5 caption: High immune protection doesn't select against resistance. The higher relative fitness of the sensitive strain selects against resistance in a high-immunity environment.

      R: Thank you for pointing this out. Here we meant that a reduction in resistant population after the initial overshoot occurs in both diversity levels. We are not comparing resistant strains to sensitive ones. We’ve modified the sentence to: “The higher specific immunity reduces the infectivity of new strains, leading to a reduction of the resistant parasite population regardless of the diversity level”.

      1. Line 242: "keep" should be plural.

      R: We’ve corrected “keep” to “keeps” (L267).

      1. Line 360 and elsewhere: The strength of the results is somewhat overstated at times. This absolutely supports the importance of strain-specific immunity, but these results do not explain patterns of the origin of resistance and there are a number of factors that are not incorporated (a necessary evil of modeling to be sure).

      R: Thank you for pointing this out. We’ve modified discussion to remove the overstated strength of results:

      1) Original: “The inclusion of strain diversity in the model provides a new mechanistic explanation as to why Southeast Asia has been the original source of resistance to certain antimalarial drugs, including chloroquine.”

      Modified: “The inclusion of strain diversity in the model provides a new mechanistic explanation as to why Southeast Asia has persisting resistance to certain antimalarial drugs, including chloroquine, despite a lower transmission intensity than Africa.” (L328-330)

      2) In sum, we show that strain diversity and associated strain-specific host immunity, dynamically tracked through the macroparasitic structure, can predict (previously “explain”) the complex relationship between transmission intensity and drug-resistance frequencies.

      1. The color palettes are not discernible in grayscale, especially the orange/blue/gray in Fig 2. The heatmaps appear to be in turbo, the only viridis palette that isn't grayscale-friendly. Just something to keep in mind for the accessibility of individuals with achromatopsia and most people who print out papers.

      R: Thank you for the visualization suggestions. We updated all the figures with the “viridis:magma” palette. As for the orange/blue/gray scale used in Fig 2C, it is difficult to pick nine colors that are discernible in brightness in grayscale. Currently, the four colors correspond to clonal genotype cost (i.e. green, red, grey, and blue), and the three-level brightness maps to mixed genotype cost.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Heitmann et al introduce a novel method for predicting the potential of drug candidates to cause Torsades de Pointes using simulations. Despite the fact that a multitude of such methods have been proposed in the past decade, this approach manages to provide novelty in a way that is potentially paradigm-shifting. The figures are beautiful and manage to convey difficult concepts intuitively.

      Strengths:

      (1) Novel combination of detailed mechanistic simulations with rigorous statistical modeling

      (2) A method for predicting drug safety that can be used during drug development

      (3) A clear explication of difficult concepts.

      Weaknesses:

      (1) In this reviewer's opinion, the most important scientific issue that can be addressed is the fact that when a drug blocks multiple channels, it is not only the IC50 but also the Hill coefficient that can differ. By the same token, two drugs that block the same channel may have identical IC50s but different Hill coefficients. This is important to consider since concentration-dependence is an important part of the results presented here. If the Hill coefficients were to be significantly different, the concentration-dependent curves shown in Figure 6 could look very different.

      See our response below.

      (2) The curved lines shown in Figure 6 can initially be difficult to comprehend, especially when all the previous presentations emphasized linearity. But a further issue is obscured in these plots, which is the fact that they show a two-dimensional projection of a 4-dimensional space. Some of the drugs might hit the channels that are not shown (INaL & IKs), whereas others will not. It is unclear, and unaddressed in the manuscript, how differences in the "hidden channels" will influence the shapes of these curves. An example, or at least some verbal description, could be very helpful.

      See our response below.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is generally well-written (with one important exception, see below). The manuscript can be improved with a few suggested modifications, ordered from most important to least important.

      (1) In this reviewer's opinion, the most important scientific issue that the authors need to address is the fact that when a drug blocks multiple channels, it is not only the IC50 but also the Hill coefficient that can differ. By the same token, two drugs that block the same channel may have identical IC50s but different Hill coefficients. This is important to consider since concentration-dependence is an important part of the results presented here.

      In a recent study (Varshneya et al, CPT PSP 2021 (PMID: 33205613)) they originally ran simulations with Hill coefficients of 1 for all the 4 drugs and 7 channels, then re-ran the simulations with differing Hill coefficients. The results were quantitatively quite different than what was originally obtained, even though the overall trends were identical. A look at the table provided in that paper's supplement shows that the estimated Hill coefficients range from 0.5 to 1.9, which is a pretty wide range.

      In this case, I don't think the authors should re-run the entire analysis. That would require entirely too much work and potentially detract from the elegant presentation of the manuscript in its current form. Although I haven't looked at the Llopis-Lorente dataset recently, I doubt that reliable Hill coefficients have been obtained for all 105 drugs. However, the Crumb et al dataset (PMID: 27060526) does provide this information for 30 drugs.

      Perhaps the authors could choose an example of two drugs that affect similar channels but with differences in the estimated Hill coefficients. Or even a carefully-designed hypothetical example could be of value. At the very least, Hill coefficients need to be mentioned as a limitation, but this would be stronger if it were coupled with at least some novel analyses.

      We fixed the Hill coefficients to h=1 because there is no evidence for co-operative drug binding in the literature that would require coefficients other than one. There is also the practical matter that only 17 of the 109 drugs in the dataset have a complete set of Hill coefficients. We have revised the Methods (Drug datasets) to make these justifications explicit:

      Lines 560-566: “… We also fixed the Hill coefficients at h = 1 because (i) there is no evidence for co-operative drug binding in the literature, and thus no theoretical justification for using coefficients other than one; (ii) only 17 of the 109 drugs in the dataset had a complete set of Hill coefficients (hCaL, hKr, hNaL, hKs) anyway. …”

      Out of interest, we re-ran our analysis using only those n=17 drugs (Amiodarone, Amitriptyline, Bepridil, Chlorpromazine, Diltiazem, Dofetilide, Flecainide, Mibefradil, Moxifloxacin, Nilotinib, Ondansetron, Quinidine, Quinine, Ranolazine, Saquinavir, Terfenadine and Verapamil). When the Hill coefficients were fixed at h=1, the prediction accuracy was 88.2% irrespective of the dosage (Author response image 1). When we used the estimated (free) Hill coefficients, the prediction accuracy remained unchanged (88.2%) for all doses except the lowest (1x to 2x) where it dropped to 82.4%. We concluded that using the Hill coefficients from the dataset made little difference to the results.

      Author response image 1.
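
      To make the role of the Hill coefficient concrete, the following is a minimal sketch of the standard Hill dose-response form, in which the fraction of conductance left unblocked is 1 / (1 + (C / IC50)^h). It is assumed here that the manuscript scales conductances in this conventional way; the function name `remaining_conductance` and the 100 nM IC50 are hypothetical, while the coefficients 0.5 and 1.9 are the extremes of the range the reviewer quotes.

```python
def remaining_conductance(conc: float, ic50: float, hill: float = 1.0) -> float:
    """Fraction of channel conductance left unblocked (standard Hill form)."""
    return 1.0 / (1.0 + (conc / ic50) ** hill)


# HYPOTHETICAL drug with IC50 = 100 nM, evaluated below, at, and above its IC50.
IC50_NM = 100.0

for conc in (10.0, 100.0, 1000.0):
    # Compare the fixed coefficient h = 1 with the extremes quoted by the reviewer.
    fractions = {h: remaining_conductance(conc, IC50_NM, h) for h in (0.5, 1.0, 1.9)}
    summary = ", ".join(f"h={h}: {f:.3f}" for h, f in fractions.items())
    print(f"{conc:7.1f} nM -> {summary}")
```

      In this form the Hill coefficient changes the steepness of the curve but not its midpoint: at C = IC50 exactly half the conductance remains for every h, and the curves diverge only away from that point.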

      (2) I initially had a hard time understanding the curved lines shown in Figure 6 when all the previous presentations emphasized linearity. After thinking for a while, I was able to get it, but there was a further issue that I still struggle with. That is the fact that the plots all show a two-dimensional projection of a 4-dimensional space. Some of the drugs might hit the channels that are not shown (INaL & IKs), whereas others will not. How will differences in the "hidden channels" influence the shapes of these curves? An example, or at least some verbal description, could be very helpful.

      We omitted GKs and GNaL from Figure 6 because they added little to the story. Those “hidden” channels operate in the same manner as GKr and GCaL. They are shown in Supplementary Dataset S1. We have included more explicit references to the Supplementary in both the main text and the caption of Figure 6. We have also rewritten the section on ‘The effect of dosage on multi-channel block’ (lines 249-268) to better convey that the drug acts in four dimensions.

      (3) I also struggled a bit with Figure 3 and the section "Drug risk metric." What made this confusing was the PQR notation on the figure and the equations represented as A and B. Can these be presented in a common notation, or can the relationship be defined?

      We have replaced the PQR notation in Figure 3A with vector notation A and B to be consistent with the equations.

      Also in Figure 3B, I was unclear about the units on the x-axis. Is each step (e.g. from 0 to 1) the same distance as a single log unit along the abscissa or ordinate in Figure 3A?

      Yes it is. We have revised the caption for Figure 3B to explain it better.

      (4) The manuscript manages to explain difficult concepts clearly, and it is generally well-written. The important exception, however, is that the manuscript contains far too many sentence fragments. These often occur when the authors explain a difficult concept, then follow up with something that is essentially "and this in addition" or "with the exception of this."

      Lines 220-223: "In comparison, Linezolid is an antibacterial agent that has no clinical evidence of Torsades (Class 4) even though it too blocks IKr. Albeit less than it blocks ICaL (Figure 5A, right)."

      Lines 242-245: "Conversely, Linezolid shifts the population 1.18 units away from the ectopic regime. So only 0.0095% of those who received Linezolid would be susceptible. A substantial drop from the baseline rate of 0.93%."

      There are several others that I didn't note, so the authors should perform a careful copy edit of the entire manuscript.

      Thank you. We have remediated the fragmented sentences throughout.

      Reviewer #2 (Public Review):

      Summary:

      In the paper from Heitmann, Vandenberg, and Hill entitled "Assessing drug safety by identifying the axis of arrhythmia in cardiomyocyte electrophysiology", the authors define a new metric, the "axis of arrhythmia", that essentially describes the parameter space of ion channel conductance combinations where early afterdepolarizations can be observed.

      Strengths:

      There is an elegance to the way the authors have communicated the scoring system. The method is potentially useful because of its simplicity, accessibility, and ease of use. I do think it adds to the field for this reason - a number of existing methods are overly complex and unwieldy and not necessarily better than the simple parameter regime scan presented here.

      Weaknesses:

      The method described in the manuscript suffers from a number of weaknesses that plague current screening methods. Included in these are the data quality and selection used to inform the drug-blocking profile. It's well known that drug measurements vary widely, depending on the measurement conditions.

      We agree and have added a new section to describe these limitations, as follows:

      Lines 467-478: Limitations. The method was evaluated using a dataset of drugs that were drawn from multiple sources and diverse experimental conditions (Llopis-Lorente et al., 2020). It is known that such measurements differ prominently between laboratories and recording platforms (Kramer et al., 2020). Some drugs in the dataset combined measurements from disparate experiments while others had missing values. Of all the drugs in the dataset, only 17 had a complete set of IC50 values for ICaL, IKr, INaL and IKs. The accuracy of the predictions are therefore limited by the quality of the drug potency measurements.

      There doesn't seem to be any consideration of pacing frequency, which is an important consideration for arrhythmia triggers, resulting from repolarization abnormalities, but also depolarization abnormalities.

      It is true that we did not consider the effect of pacing frequency. We have included this in the limitations:

      Lines 479-485: The accuracy of the axis of arrhythmia is likewise limited by the quality of the biophysical model from which it is derived. The present study only investigated one particular variant of the ORd model (O’Hara et al., 2011; Krogh-Madsen et al., 2017) paced at 1 Hz. Other models and pacing rates are likely to produce differing estimates of the axis.

      Extremely high doses of drugs are used to assess the population risk. But does the method yield important information when realistic drug concentrations are used?

      Yes it does. The drugs were assessed across a range of doses from 1x to 32x therapeutic dose (Figure 8A). The prediction accuracy at low doses is 88.1%.

      In the discussion, the comparison to conventional approaches suggests that the presented method isn't necessarily better than conventional methods.

      The comparison is not just about accuracy. Our method achieves the same results at greatly reduced computational cost without loss of biophysical interpretation. We emphasise this in the Conclusion:

      Lines 446-465: Conclusion. Our approach resolves the debate between model complexity and biophysical realism by combining both approaches into the same enterprise. Complex biophysical models were used to identify the relationship between ion channels and torsadogenic risk — as it is best understood by theory. Those findings were then reduced to a simpler linear model that can be applied to novel drugs without recapitulating the complex computer simulations. The reduced model retains a bio-physical description of multi-channel drug block, but only as far as necessary to predict the likelihood of early after-depolarizations. It does not reproduce the action potential itself. Our approach thus represents a convergence of biophysical and simple models which retains the essential biophysics while discarding the unnecessary details. We believe the benefits of this approach will accelerate the adoption of computational assays in safety pharmacology and ultimately reduce the burden of animal testing.

      In conclusion, I have struggled to grasp the exceptional novelty of the new metric as presented, especially when considering that the badly needed future state must include a component of precision medicine.

      Safety pharmacology has a different aim to precision medicine. The former concerns the population whereas the latter concerns the individual. The novelty of our metric lies in reducing the complexity of multi-channel drug effects to a linear model that retains a biophysical interpretation.

      Reviewer #2 (Recommendations For The Authors):

      A large majority of drugs have more complex effects than a simple reduction in channel conductance. Some of these are included in the 109 drugs shown in Figure 7. An example is ranolazine, which is well known to have potent late sodium channel blocking effects - how are such effects included in the model as presented? I think at least suggesting how the approach can be expanded for broader applicability would be important to discuss.

      Our method does consider the simultaneous effect of the drug on multiple ion channels, specifically the L-type calcium current (ICaL), the delayed rectifier potassium currents (IKr and IKs), and the late sodium current (INaL). In the case of ranolazine (class 3 risk), the dose-responses for all four ion channels, based on IC50s published in Llopis-Lorente et al. are given in Supplementary Dataset S1.

      The response curves in Author response image 2 show that in this dataset, ranolazine blocks IKr and INaL almost equally - being only slightly less potent against IKr. There are two issues to consider here that potentially contribute to ranolazine being misclassified as pro-arrhythmic. First, the cell model is more sensitive to block of IKr than INaL. As a result, in the context of an equipotent drug, the prolonging effect of IKr block outweighs the balancing effect of INaL block, resulting in a pro-arrhythmic risk score. Second, the potency of IKr block in this dataset may be overestimated which in turn exaggerates the risk score. For example, measurements of ranolazine block of IKr from our own laboratory (Windley et al J Pharmacol Toxicol 87, 99–107, 2017) suggest that the IC50 of IKr is higher (35700 nM) than that reported in the Llopis-Lorente dataset (12000 nM). If this were taken into account, there would be less block of IKr relative to INaL, resulting in a safer risk score.

      Author response image 2.
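
      The size of the second effect can be gauged with a short calculation. The sketch below compares predicted IKr block under the two IC50 values quoted above, using the Hill form with h = 1 as fixed in the manuscript. The concentrations are hypothetical and chosen only for illustration, not taken from the dataset, and `fraction_blocked` is a hypothetical helper name.

```python
def fraction_blocked(conc_nm: float, ic50_nm: float, hill: float = 1.0) -> float:
    """Fraction of current blocked at a given concentration (Hill form)."""
    return 1.0 - 1.0 / (1.0 + (conc_nm / ic50_nm) ** hill)


# IC50 values for ranolazine block of IKr, as quoted in the response.
IC50_DATASET_NM = 12000.0   # Llopis-Lorente dataset
IC50_WINDLEY_NM = 35700.0   # Windley et al. (2017)

# HYPOTHETICAL concentrations in nM, for illustration only.
for conc in (1000.0, 5000.0, 20000.0):
    dataset = fraction_blocked(conc, IC50_DATASET_NM)
    windley = fraction_blocked(conc, IC50_WINDLEY_NM)
    print(f"{conc:8.0f} nM: dataset IC50 -> {dataset:.1%} block, "
          f"Windley IC50 -> {windley:.1%} block")
```

      At every concentration the higher IC50 predicts less IKr block, which is the direction of change the authors describe. How far this moves the risk score depends on the axis of arrhythmia itself, which this sketch does not model.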

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper examines the Bithorax complex in several butterfly species, in which the complex is contiguous and not split, as it is in the well-studied fruit fly Drosophila. Based on genetic screens and genetic manipulations of a boundary element involved in segment-specific regulation of Ubx, the authors provide solid evidence for their conclusions, which could be further strengthened by additional data and analyses. The data presented are relevant for those interested in the evolution and function of Hox genes and of gene regulation in general.

      We are deeply grateful to the eLife editorial team and the two reviewers for their thoughtful and constructive feedback. We have used this feedback to improve our manuscript and have provided a point-by-point response below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their article, "Cis-regulatory modes of Ultrabithorax inactivation in butterfly forewings," Tendolkar and colleagues explore Ubx regulation in butterflies. The authors investigated how Ubx expression is restricted to the hindwing in butterflies through a series of genomic analyses and genetic perturbations. The authors provide evidence that a Topologically Associated Domain (TAD) maintains a hindwing-enriched profile of chromatin around Ubx, largely through an apparent boundary element. CRISPR mutations of this boundary element led to ectopic Ubx expression in forewings, resulting in homeotic transformation in the wings. The authors also explore the results of the mutation in two non-coding RNA regions as well as a possible enhancer module. Each of these induces homeotic phenotypes. Finally, the authors describe a number of homeotic phenotypes in butterflies, which they relate to their work.

      Together, this was an interesting paper with compelling initial data. That said, I have several items that I feel would warrant further discussion, presentation, or data.

      First, I would not state, "Little is known about how Hox genes are regulated outside of flies." They should add "in insects" since so much is known in vertebrates.

      Corrected

      For Figure 1, it would aid the readers if the authors could show the number of RNAseq reads across the locus. This would allow the readership to evaluate the frequency of the lncRNAs, splice variants, etc.

      We have found it useful in the past to feature “Sashimi Plots”, as they provide a good overview of transcript splicing junctions and read support. Here we could not accommodate this in our Fig. 1A as this would require compiling the RNAseq reads from many tissues and stages to be meaningful, and we would lose the resolution on forewing vs hindwing tissues that is important in this article (only the Kallima inachus dataset allows this comparison, and was used in Fig 1B). More specifically, the wing transcriptomes available for J. coenia and V. cardui are not deep enough to provide a good visualization of Antp alternative promoter usage or on AS5’ transcription.

      How common are boundary elements within introns? Typically, boundary elements are outside gene bodies, so this could be explored further. This seems like an interesting bit of biology which, following from the above point, it would be interesting to, at a minimum, discuss, but also relate to how transcription occurs through a possible boundary element (are there splice variants, for example?).

      We do not see evidence of alternative splicing, and prefer to avoid speculating on transcriptional effects, but we agree that the intragenicity of the TAD boundary is interesting. We briefly highlighted this point in the revised Discussion:

      "Lastly, it is worth noting that the Antp/Ubx TAD boundary we identified is intragenic, within the last intron of Ubx. It is unclear if this feature affects Ubx transcription, but this configuration might be analogous to the Notch locus in Drosophila, which includes a functional TAD boundary in an intronic position (Arzate-Mejía et al. 2020)."

      The CRISPR experiments led to compelling phenotypes. However, as a Drosophila biologist, I found it hard to interpret the data from mosaic experiments. For example, in control experiments, how often do butterflies die? Are there offsite effects? It's striking that single-guide RNAs led to such strong effects. Is this common outside of this system? Is it possible to explore the functional effects at the boundary element - are these generating large deletions (for example, like Mazo-Vargas et al., 2022)? For the mosaic experiments, how frequent are these effects in nature or captive stocks? Would it be possible to resequence these types of effects? At the moment, this data, while compelling, was hard to put into the context of the experiments above without understanding how common the effects are. Ideally, there would be resequencing of these tissues, which could be targeted, but it was not clear to me the general rates of these variants.

      We agree with this assessment completely: mosaics complicate the proper interpretation of CRISPR based perturbation assays in regulatory regions. Here, unlike in Mazo-Vargas et al. (2022), we were unable to breed homeotic effects to a G1 generation, possibly because the phenotypes are dominant and lethal at the embryonic stage (see also our reply to Reviewer 2). This means that mosaic mutants are often survivors with clones of restricted size in the wing, and they are probably rare, but we are unable to meaningfully measure a mutation spectrum frequency (e.g. how often large deletions are generated). As mentioned in the first paragraph of our Discussion, we think that many of the phenotypes we observed (besides the Ubx GOF effects from the BE targeting) were confounded by alleles that could include large SVs. We aim to address these questions in an upcoming manuscript, at a locus where regulatory perturbation does not impact survival, including using germline mutants and unbiased genotyping (whole genome resequencing).

      We elaborated on this issue in our Discussion:

      "It is crucial here to highlight the limitations of the method, in order to derive proper insights about the functionality of the regulatory regions we tested. In essence, butterfly CRISPR experiments generate random mutations by non-homologous end joining repair, that are usually deletions (Connahs et al. 2019; Mazo-Vargas et al. 2022; Van Belleghem et al. 2023). Ideally, regulatory CRISPR-induced alleles require genotyping in a second (G1) generation to be properly matched to a phenotype (Mazo-Vargas et al. 2022). Possibly because of lethal effects, we failed to pass G0 mutations to a G1 generation for genotyping, and were thus limited here to mosaic analysis. As adult wings have lost scale building cells that may underlie a given phenotype, we circumvented this issue by genotyping a pupal forewing displaying an homeotic phenotype in the more efficient Antp-Ubx_BE perturbation experiment (Fig. S4). In this case, PCR amplification of a 600 bp fragment followed by Sanger sequencing recovered signatures of indel variants, with mixed chromatograms starting at the targeted sites. But in all other experiments (CRM11, IT1, and AS5’ targets), we did not genotype mutant tissues, as they were only detected in adult stages and generally with small clone sizes. Some of these clones may have been the results of large structural variants, as data from other organisms suggests that Cas9 nuclease targeting can generate larger than expected mutations that evade common genotyping techniques (Shin et al. 2017; Adikusuma et al. 2018; Kosicki et al. 2018; Cullot et al. 2019; Owens et al. 2019). Even under the assumption that such mutations are relatively rare in butterfly embryos, the fact we injected >100 embryos in each experiment makes their occurrence likely (Fig. 9), and we are unable to assign a specific genotype to the homeotic effects we obtained in CRM11, IT1 and AS5’ perturbation assays."
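      The closing argument of the quoted passage (a rare event becomes likely once more than 100 embryos are injected) can be illustrated with a short calculation. This is only a sketch under stated assumptions: embryos are treated as independent, and the per-embryo probabilities below are hypothetical placeholders, not values measured in this study.

```python
def prob_at_least_one(p_per_embryo: float, n_embryos: int) -> float:
    """Probability that at least one of n independent embryos carries a rare event.

    p_per_embryo is a hypothetical per-embryo probability (an assumption for
    illustration only); n_embryos is the number of injected embryos.
    """
    if not 0.0 <= p_per_embryo <= 1.0:
        raise ValueError("p_per_embryo must lie between 0 and 1")
    if n_embryos < 0:
        raise ValueError("n_embryos must be non-negative")
    # Complement rule: 1 - P(no embryo carries the event)
    return 1.0 - (1.0 - p_per_embryo) ** n_embryos


if __name__ == "__main__":
    # Hypothetical per-embryo rates, evaluated for 100 injected embryos
    for p in (0.005, 0.01, 0.02):
        print(f"p = {p:.3f}, n = 100: P(at least one) = {prob_at_least_one(p, 100):.2f}")
```

      Under these assumptions, even a per-embryo rate of a few percent gives a high chance that at least one injected embryo carries such an allele, which is the sense in which the passage calls their occurrence likely.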

      Our revision also includes a new Fig. S4 that features the mosaic genotyping of a G0 Antp-Ubx_BE mutant tissue. While this does not fully address the reviewer questions, it provides reasonable validation that the frequent GOF effects we observed upon perturbation at this target site are generated by on-target indels from DNA repair.

      Author response image 1.

      Validation of CRISPR-induced DNA lesions in an Antp-Ubx_BE crispant pupal forewing. (A-A') Pupal forewing cuticle phenotype of an Antp-Ubx_BE J. coenia crispant, as in Fig. S3. (B-B") Aspect of the same forewing under trans-illumination following dissection out of the pupal case. Regions from mutant clones have a more transparent appearance. (C) Sanger sequencing of an amplicon targeting the Antp-Ubx_BE region in the mutant tissue shown in panel B", compared to a control wing tissue, showing mixed chromatogram around the expected CRISPR cutting site due to indel mutations from non-homologous end-joining.

      In sum, I enjoyed the extensive mosaic perturbations. However, I feel that more molecular descriptions would elevate the work and make a larger impact on the field.

      Reviewer #2 (Public Review):

      Summary:

      The existence of hox gene complexes conserved in animals with bilateral symmetry and in which the genes are arranged along the chromosome in the same order as the structures they specify along the anteroposterior axis of organisms is one of the most spectacular discoveries of recent developmental biology. In brief, homeotic mutations lead to the transformation of a given body segment of the fly into a copy of the next adjacent segment. For the sake of understanding the main observation of this work, it is important to know that in loss-of-function (LOF) alleles, a given segment develops like a copy of the segment immediately anterior to it, and in gain-of-function mutations (GOF), the affected segment develops like a copy of the immediately posterior segment. Over the last 30 years the molecular lesions associated with GOF alleles led to a model where the sequential activation of the hox genes along the chromosome result from the sequential opening of chromosomal domains. Most of these GOF alleles turned out to be deletions of boundary elements (BE) that define the extent of the segment-specific regulatory domains. The fruit fly Drosophila is a highly specialized insect with a very rapid mode of segmentation. Furthermore, the hox clusters in this lineage have split. Given these specificities it is legitimate to question whether the regulatory landscape of the BX-C we know of in D.melanogaster is the result of very high specialization in this lineage, or whether it reflects a more ancestral organization. In this article, the authors address this question by analyzing the continuous hox cluster in butterflies. They focus on the intergenic region between the Antennapedia and the Ubx gene, where the split occurred in D.melanogaster. Hi-C and ATAC-seq data suggest the existence of a boundary element between 2 Topologically-Associated-Domain (TAD) which is also characterized by the presence of CTCF binding sites. 
Butterflies have 2 pairs of wings originating from T2 (forewing) specified by Antp and T3 specified by Ubx (hindwing). Remarkably, CRISPR mutational perturbation of this boundary leads to the hatching of butterflies with homeotic clones of cells with hindwing identities in the forewing (a posteriorly oriented homeotic transformation). In agreement with this phenotype, the authors observe ectopic expression of Ubx in these clones of cells. In other words, CRISPR mutagenesis of this BE region identified by molecular tools gives rise to homeotic transformations directed towards a more posterior segment, like the boundary mutations that had first been identified on the basis of their posteriorly oriented homeotic transformation in Drosophila. None of the mutant clones they observed affect the hindwing, indicating that their scheme did not affect the nearby Ubx transcription unit. This is reassuring and important first evidence that some of the regulatory paradigms that have been proposed in fruit flies are also at work in the common ancestor of Drosophilae and Lepidoptera.

      Given the large size of the Ubx transcription unit and its associated regulatory regions, it is not surprising that the authors have identified ncRNAs that are conserved in 4 species of Nymphalinae butterflies, some of which are also present in D.melanogaster. Attempts to target the promoters by CRISPR give rise to clones of cells in both forewings and hindwings, suggesting the generation of regulatory mutations associated with both LOF and GOF transformations. The presence of clones with dual homeosis suggests the targeting of Ubx activator and repressor CRMs. Unfortunately, these experiments do not allow us to make further conclusions on the role of these ncRNAs or on the identification of specific regulatory elements. In the opinion of this reviewer, some recent papers addressing the role that these ncRNAs may play in boundary function should be taken with caution, and evidence that ncRNA(s) regulate boundaries in the BX-C in a WT context is still lacking.

      Strengths:

      The convincing GOF phenotype resulting from the targeting of the Antp-Ubx_BE.

      Weaknesses:

      The lack of comparisons with the equivalent phenotypes obtained in D.melanogaster with for example the Fub mutation.

      We are grateful for this excellent contextualization of our findings and have incorporated some of the historical elements into our revision, as detailed below.

      Reviewer #2 (Recommendations For The Authors):

      Throughout the paper, the authors approach the notion of boundaries from the angle of the existence of TADs and almost entirely omit to explain the characteristics of boundary mutations in the BX-C. To my knowledge, examples where targeted boundary deletions between TADs result in misregulation of the neighboring genes, and/or a phenotype, are extremely sparse (especially in the context of the mouse hox genes). Given the extensive literature describing the boundary mutations and their associated GOF phenotypes, the paper would certainly gain strength if the authors justified their approach through this wealth of information. I must admit that this referee is surprised by the absence of any references to the founding work of the Karch and Bender laboratories on this topic. As a matter of fact, one of the founding members of the boundary class of regulatory elements was already brought in 1993 with the Fab-7 and Mcp elements of the BX-C. Based on gain-of-function homeotic phenotypes, additional Fab boundaries were added to the list. Finally, in 2013, Bender and Lucas (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606092/) identified the Fub boundary element that delimits the Ubx and abd-A domains in the BX-C. Fub fulfills the criterion of lying at the border of 2 neighboring TADs. Significantly, a deletion of Fub leads to a very penetrant and strong homeotic gain-of-function phenotype in which the flies hatch with a 1st abdominal segment transformed into the 2nd. In agreement with this, abd-A is expressed one parasegment too anterior in embryos. This is exactly the observation gathered from the targeted mutations in the Antp-Ubx_BE; a dominant transformation of anterior to posterior wing accompanied by an ectopic expression of Ubx in the forming primordia of the forewing where it is normally silenced. I believe the paper would gain credibility if the results were reported with the knowledge of the similarities with Fub.

      Line 53, I am not aware of the existence of TADs for each of the 9 regulatory domains. The insulators delimit the extent of the regulatory domains but certainly not of TADs.

      We thank the reviewer for these suggestions, as well as for the correction – we agree our previous text suggested that all BX-C boundaries are TAD boundaries, which was incorrect. We added a new introduction paragraph that combines classic literature on GOF mutations at boundary elements with recent evidence these are TAD insulators, including Fub (as suggested), and adding Fab-7 for breadth of scope.

      "For instance, the deletion of a small region situated between Ubx and abd-A produces the Front-ultraabdominal phenotype (Fub) where the first abdominal segment (A1) is transformed into a copy of the second abdominal segment A2, due to a gain-of-expression of abd-A in A1 where it is normally repressed (Bender and Lucas 2013). At the molecular level, the Fub boundary is enforced by insulating factors that separate Topologically Associating Domains (TADs) of open-chromatin, while also allowing interactions of Ubx and abd-A enhancers with their target promoters (Postika et al. 2018; Srinivasan and Mishra 2020). Likewise, the Fab-7 deletion, which removes a TAD boundary insulating abd-A and Abd-B (Moniot-Perron et al. 2023), transforms parasegment 11 into parasegment 12 due to an anterior gain-of-expression of Abd-B (Gyurkovics et al. 1990). By extrapolation, one may expect that if the Drosophila Hox locus was not dislocated into two complexes, Antp and Ubx 3D contact domains would be separated by a Boundary Element (BE), and that deletions similar to Fub and Fab-7 mutations would result in gain-of-function mutations of Ubx that could effectively transform T2 regions into T3 identities."

      A reference to the 1978 Nature article of Lewis should be added after line 42 of introduction.

      Added

      Line 56-57; the BX-C encoded miRNAs are known to regulate Ubx and abd-A, but not Abd-B.

      Corrected

      From lines 57 to 61, the authors mention reports aimed at demonstrating a role of ncRNA into Ubx regulation. To my eyes, these gathered evidences are rather weak. A reference to the work of Pease et al in Genetics in 2013 should be mentioned (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3832271/).

      Added. Our paragraph includes qualifier language about the functionality of the Ubx-related ncRNAs (“are thought to”, “appears to”), and updated references regarding bxd (Petruk et al. 2006; Ibragimov et al. 2023).

      Line 62 authors, should write "Little is known about how Hox genes are regulated outside of Drosophila" and not flies.

      Corrected

      Lines 110-112: could lncRNA:Ubx-IT1 correspond to the PS4 antisense reported by Pease et al in 2013 (see URL above)? Lines 115-117: could lncRNA:UbxAS5' correspond to the bxd antisense of Pease et al in 2013 (see above)?

      As we could not detect sequence similarities, we preferred to avoid drawing homology, and we intentionally avoided reference to the fly transcripts when we named IT1 and AS5’. This said, we agree it is important to clarify that further studies are needed to clarify this relationship. We elaborated on this point in our discussion:

      "Of note, a systematic in-situ survey (Pease et al. 2013) showed that Drosophila embryos express an antisense transcript in the 5’ region of Ubx (lncRNA:bxd), as well as within its first intron (lncRNA:PS4). It is thought that Drosophila bxd regulates Ubx, possibly by transcriptional interference or by facilitation of the Fub-1 boundary effect (Petruk et al. 2006; Ibragimov et al. 2023), while the possible regulatory roles of PS4 remain debated (Hermann et al. 2022). While these dipteran non-coding transcripts lack detectable sequence similarity with the lepidopteran IT1 and AS5’ transcripts, further comparative genomics analyses of the Ubx region across the holometabolan insect phylogeny should clarify the extent to which Hox cluster lncRNAs have been conserved or independently evolved."

      Lines 154-155: "This concordance between Hi-C profiling and CTCF motif prediction thus indicates that Antp-Ubx_BE region functions as an insulator between regulatory domains of Antp and Ubx". This is only correlative; I would write "suggests" instead of "indicates" and add a "might function".

      Corrected as suggested.

      Line 254, I assume the authors wish to write Ubx-IT1 in V. cardui instead of Ubx-T1.

      Typo corrected

      Line 255 : Fig.5 is absent from the pdf file and replaced by table 1. I did not find a legend for Table 1.

      Corrected, with our sincere apologies for the loss of this image in our first submission.

      Line 293 "Individual with hindwing clones 2.75 times more common than...." "are" is missing?

      Corrected

      Lines 303-313, it is not entirely clear how many guide RNAs were injected. Would be useful to indicate the sites targeted in Fig.S8.

      We now specify in the revised text: "using a single guide RNA (Ubx11b9)".

      Lines 323-337: it is not entirely clear to this referee (a drosophilist) if those spontaneous mutations can be inbred or whether these individuals are occasional mosaics. In general, did anyone try to derive lines from those mosaic animals? Is it possible to hit the germline at the syncytial stages at which the guides are injected? Are the individuals with wing phenotype fertile? Given the fact that the Antp-Ubx_BE mutations should be dominant, I wonder if this characteristic would not help in identifying germline transmission. Similar remark for the discussion where the authors explain, at line 360, that genotyping can only be done in the progeny of the G0. I do not have the impression that the authors have performed this genotyping and, if I am right, I do not understand why.

      We improved our discussion section on this topic (new text in orange):

      "It is crucial here to highlight the limitations of the method, in order to derive proper insights about the functionality of the regulatory regions we tested. In essence, butterfly CRISPR experiments generate random mutations by non-homologous end joining repair, that are usually deletions (Connahs et al. 2019; Mazo-Vargas et al. 2022; Van Belleghem et al. 2023). Ideally, regulatory CRISPR-induced alleles require genotyping in a second (G1) generation to be properly matched to a phenotype (Mazo-Vargas et al. 2022). Possibly because of lethal effects, we failed to pass G0 mutations to a G1 generation for genotyping, and were thus limited here to mosaic analysis. As adult wings have lost scale building cells that may underlie a given phenotype, we circumvented this issue by genotyping a pupal forewing displaying an homeotic phenotype in the more efficient Antp-Ubx_BE perturbation experiment (Fig. S4). In this case, PCR amplification of a 600 bp fragment followed by Sanger sequencing recovered signatures of indel variants, with mixed chromatograms starting at the targeted sites. But in all other experiments (CRM11, IT1, and AS5’ targets), we did not genotype mutant tissues, as they were only detected in adult stages and generally with small clone sizes. Some of these clones may have been the results of large structural variants, as data from other organisms suggests that Cas9 nuclease targeting can generate larger than expected mutations that evade common genotyping techniques (Shin et al. 2017; Adikusuma et al. 2018; Kosicki et al. 2018; Cullot et al. 2019; Owens et al. 2019). Even under the assumption that such mutations are relatively rare in butterfly embryos, the fact we injected >100 embryos in each experiment makes their occurrence likely (Fig. 9), and we are unable to assign a specific genotype to the homeotic effects we obtained in CRM11, IT1 and AS5’ perturbation assays."

      We agree that the work we conducted with mosaics has important caveats. So far, our attempts at breeding homeotic G0 mutants have not been fruitful at this locus, while less deleterious loci can yield viable alleles into further generations, such as WntA (published) and cortex (in prep.). We prefer to stay vague about negative data here, as it is difficult to disentangle whether they were due to real mutational effects (e.g. the alleles can be dominant and lethal in the G1 generation), to a failure to obtain germline carriers of mutations as founders, or to health issues that are often amplified by inbreeding depression (including a possible iflavirus in our V. cardui cultures).

      We concur with the prediction that Antp-Ubx_BE mutations are probably dominant, and intend to follow up with similar GOF experiments in the Plodia pantry moth, a laboratory model for lepidopteran functional genomics that is more amenable than butterflies to inbreeding and long-term studies in mutant lines. In our experience (https://www.frontiersin.org/articles/10.3389/fevo.2021.643661/full), Ubx coding knock-out can be more extensive in Plodia than in butterflies, so we think these animals will also be more resilient to the deleterious effects of the GOF phenotype.

      Lines 423, 425: I am not a fan of the term "de-insulating"!

      We replaced this neologism with "Similar deletion alleles resulting in a TAD fusion and misexpression effect" (see below).

      Line 425, why bring the work on Notch while there are so many examples in the BX-C itself....

      Our revised sentence makes it clearer that we are referring here to documented examples of deletion-mediated TAD fusion (i.e. featuring a conformation capture assay such as Hi-C/micro-C):

      This suggests a possible loss of the TAD boundary in the crispant clones, resulting in a TAD fusion or in a long-range interaction between a T2-specific enhancer and Ubx promoter. Similar deletion alleles resulting in a TAD fusion and misexpression effect have been described at the Notch locus in Drosophila (Arzate-Mejía et al. 2020), in digit-patterning mutants in mice and humans (Lupiáñez et al. 2015; Anania et al. 2022), or at murine and fly Hox loci depleted of CTCF-mediated regulatory blocking (Narendra et al. 2015; Gambetta and Furlong 2018; Kyrchanova et al. 2020).

      Our revision also includes more emphasis on the Drosophila BX-C boundary elements Fub and Fab-7 (see above).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      This study investigates the hypoxia rescue mechanisms of neurons by non-neuronal cells in the brain from the perspective of exosomal communication between brain cells. Through multi-omics combined analysis, the authors revealed this phenomenon and logically validated this intercellular rescue mechanism under hypoxic conditions through experiments. The study proposed a novel finding that hemoglobin maintains mitochondrial function, expanding the conventional understanding of hemoglobin. This research is highly innovative, providing new insights for the treatment of hypoxic encephalopathy.

      Overall, the manuscript is well organized and written, however, there are some minor/major points that need to be revised before this manuscript is accepted.

      We thank the reviewer for the detailed analysis of our study. Please find our answers to the points raised by the reviewer below.

      Major points:

      (1) Hypoxia can induce endothelial cells to release exosomes carrying hemoglobin; however, how are neurons able to actively take up these exosomes? Is it also possible for other cells to take up these exosomes? This point needs to be clarified in this study.

      We sincerely appreciate the reviewer’s valuable comments. Regarding the question of how neurons actively take up extracellular vesicles (EVs) carrying hemoglobin mRNA, existing studies suggest that EVs can enter cells via three main pathways: direct fusion, receptor-mediated endocytosis, and phagocytosis (PMID: 25288114). Our experimental results show that neurons are able to actively take up EVs from endothelial cells without any treatment, and hypoxic conditions did not significantly increase the uptake of endothelial EVs by neurons (Fig. 5A and I). As for the specific uptake mechanism, there is currently no definitive conclusion. Some studies have found that hypoxic-ischemic injury may induce neurons to upregulate Cav-1, which could enhance the uptake of endothelial-derived EVs via Cav-1-mediated endocytosis (PMID: 31740664), but this mechanism still requires further validation.

      Regarding whether other cell types also take up these EVs, we focused on neurons based on existing literature and our own data, which show that the increased hemoglobin in the brain under hypoxic conditions is primarily found in neurons (Fig. 4H-J, PMID: 19116637). Moreover, we observed that, under hypoxic conditions, almost all non-neuronal supporting cells in the brain transcribe hemoglobin in large amounts and release it via EVs (Fig. 3J). Furthermore, we would like to emphasize that although neurons do not transcribe hemoglobin, we observed substantial expression of hemoglobin within neurons. This suggests that it may serve as an important protective mechanism for the brain. Therefore, the focus of our study is on the protective effect of EVs carrying hemoglobin mRNA on neurons, and the uptake by other cell types was not explored. We greatly appreciate the reviewer’s question, and we believe this is an intriguing avenue for further investigation. This could provide new insights for interventions in hypoxic brain injury, and we plan to delve into this topic in future studies.

      (2) The expression of hemoglobin in neurons is important for mitochondrial homeostasis, but its relationship with mitochondrial homeostasis needs to be further elucidated in the study.

      We sincerely appreciate the reviewer’s valuable comments. We fully agree with the importance of hemoglobin expression in neurons for mitochondrial homeostasis. In this study, we have confirmed through in vitro experiments that when neurons are treated with conditioned medium from endothelial cells, they exhibit increased hemoglobin expression. This, in turn, enhances their resistance to hypoxia by restoring mitochondrial membrane potential and increasing mitochondrial numbers, thereby effectively improving neuronal viability. Notably, this protective effect disappears when EVs are removed from the endothelial-conditioned medium or when hemoglobin in endothelial cells is disrupted, further supporting the notion that endothelial cells transfer hemoglobin via EVs, helping neurons express hemoglobin under hypoxic conditions and exert protective effects.

      In summary, hemoglobin primarily helps maintain mitochondrial membrane potential, thereby supporting the restoration of energy metabolism and production under hypoxic conditions, which effectively improves the neuronal resistance to hypoxia. Although we were unable to explore the specific mechanisms of hemoglobin’s role in mitochondrial homeostasis in detail within this study, we recognize the importance of this aspect and plan to further investigate how hemoglobin regulates mitochondrial homeostasis and function in neurons in future research.

      Once again, we greatly appreciate the reviewer’s insightful comments. We will continue to optimize our research direction and look forward to further elucidating these important biological mechanisms in future studies.

      Minor points:

      (1) In Figures 1-3, the authors use "Endo" to represent endothelial cells, while in Figures 4-7, the abbreviation "EC" is used. Please standardize the format.

      Thank you for the reviewer’s suggestion. We will use “EC” consistently to refer to endothelial cells throughout the manuscript to ensure uniformity.

      (2) In all qPCR statistical results, please italicize the gene names on the axis.

      Thank you for the reviewer’s valuable suggestion. We will make sure to italicize the gene names on the axis in all qPCR statistical results to adhere to the formatting requirements.

      (3) In the Western blot result of Figure 3C, what type of cell-derived exosomes does the Control group represent, and why can it be used as a control group for brain-derived exosomes?

      Thank you for the reviewer’s insightful question. In Fig. 3C, the control group (Control) represents the cell lysate sample, which serves as a positive control in the EVs Western blot analysis. In this experiment, the positive control is primarily used to validate the specificity of the antibody and the accuracy of the experimental procedure. We used cell lysate as the control to confirm that the antibody can detect EV-associated markers in the cell lysates, thus providing a comparative basis for the identification of brain-derived EVs.

      (4) In Figure 4F, the morphology of hemoglobin in the Con group and the H28d group is not entirely consistent with Figure 4H. Is this difference due to different experimental batches?

      Thank you for the reviewer’s careful observation. The observed difference may indeed be due to variations between different experimental batches. To ensure consistency of the results, we have updated the representative immunofluorescence images, which are now presented in Fig. 4H.

      (5) Supplement the transcription and expression levels of hemoglobin in neurons under different treatment conditions after medium exchange with exosome removal and medium exchange after HBA1 interference.

      Thank you for the reviewer’s valuable suggestions. We have added the experimental data regarding the exchange of culture medium after the removal of EVs. As shown in Fig. S6, the endothelial-derived medium without EVs does not enhance the hemoglobin levels in neurons under hypoxic conditions. Additionally, we have included the detection results of hemoglobin expression in neurons after HBA1 interference, as shown in Fig. S7E-F. The results indicate that the culture medium derived from HBA1-interfered endothelial cells also fails to help neurons increase hemoglobin expression under hypoxic conditions.

      (6) Figure S3 should be split to separately explain the increased exosome release induced by hypoxia, the non-toxic effect of endothelial cell culture medium on neurons, and the successful screening of the HBA1 interference plasmid.

      Thank you for the reviewer’s suggestions. Based on your feedback, we have split the original Fig. S3 into multiple parts to more clearly present the different experimental results. Specifically, the results showing the hypoxia-induced increase in EV release are now presented in Fig. S4, the non-toxic effects of endothelial cell culture medium on neurons are shown in Fig. S5, and the successful screening of the HBA1 interference plasmid is presented in Fig. S7.

      (7) Regarding the extracellular vesicles/exosomes, it should be expressed consistently in the whole manuscript.

      Thank you for the reviewer’s reminder. We will ensure that the term “extracellular vesicles” is used consistently throughout the manuscript.

      (8) In lines 70 and 80, the O2 should be changed to "O₂".

      Thank you for the reviewer’s careful observation. We have corrected the formatting of “O2” to “O₂” in lines 70 and 80.

      We would like to thank the Reviewer for taking the time to thoroughly examine our work, for their helpful feedback that has significantly contributed to improving our manuscript, and for their kind and encouraging words.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting study with a lot of data. Some of these ideas are intriguing. But a few major points require further consideration.

      We thank the reviewer for the detailed assessment of our study and pinpointing its current weaknesses. Please find our answers to all comments below.

      Major points:

      (1) What disease is this model of whole animal hypoxia supposed to mimic? If one is focused on the brain, can one just use a model of focal or global cerebral ischemia?

      Thank you for the reviewer’s insightful question. The chronic hypoxia model we employed is designed to mimic the multi-organ damage caused by systemic hypoxia, which is relevant to clinical conditions such as high-altitude hypoxia, chronic obstructive pulmonary disease, and acute hypoxic brain injury. In contrast to focal or global cerebral ischemia models, the focus of our study is on how the brain, under extreme systemic hypoxia, utilizes endothelial cell-derived extracellular vesicles (EVs) to transfer hemoglobin mRNA, thereby protecting neurons and aiding the brain’s response to hypoxia-induced damage.

      We understand the reviewer’s concern that focal or global ischemia models are typically used to simulate localized brain hypoxia or ischemic injury. However, the core of our research is to explore the brain’s overall adaptive mechanisms under systemic hypoxic conditions. By using a systemic hypoxia model, we can more comprehensively simulate the effects of global hypoxia on the brain and uncover how the brain engages specific molecular mechanisms for self-protection. This approach offers a novel perspective on brain hypoxic-ischemic diseases and holds potential clinical applications, particularly in the study of stroke, vascular cognitive impairment and dementia (VCID), and related conditions.

      Additionally, we have observed that hemoglobin significantly increases in the brain in an animal model of focal ischemia (as shown in Author response image 1 below). This finding further supports the idea that hemoglobin upregulation may be a universal protective mechanism for the brain’s response to hypoxic damage. While this part of the research is still ongoing, preliminary results suggest that both systemic hypoxia and focal ischemia might trigger protective effects through hemoglobin regulation.

      Author response image 1.

      The expression level of Hba-a1 in the brain of VCID mouse.

      Therefore, the core of our study is to elucidate the brain’s self-protection mechanisms under systemic hypoxia, rather than focusing solely on cerebral ischemia models. We believe this approach provides new insights into the prevention and treatment of brain hypoxic-ischemic diseases, with significant clinical application potential.

      In light of this, we have added a related discussion to the manuscript, clearly explaining the rationale for choosing the systemic hypoxia model. The updated content can be found on P11, Line 13-21 as follows: “To investigate this phenomenon, we employed a chronic hypoxia model in which mice were exposed to 7% oxygen for 28 days. This model aims to mimic systemic hypoxia-induced multi-organ damage, a condition observed in diseases such as high-altitude hypoxia, chronic obstructive pulmonary disease, and acute hypoxic brain injury. The primary goal of this model is to explore how the brain adapts under extreme low-oxygen conditions and employs specific mechanisms to protect itself from hypoxia-induced damage. This approach provides valuable insight into diseases related to hypoxic-ischemic injury in the brain, including stroke and vascular dementia, offering a novel perspective for potential clinical applications.”

      (2) If this model subjects the entire animal to hypoxia, then other organs will also be hypoxic. Should one also detect endothelial upregulation and release of extracellular vesicles containing hemoglobin mRNA in non-CNS organs? Where do these vesicles go? Into blood?

      Thank you for the reviewer’s valuable feedback. Indeed, in a whole-body hypoxia model, other organs are also affected by hypoxia. Therefore, future research may need to investigate the upregulation of endothelial cells in organs other than the central nervous system, as well as the release of EVs containing hemoglobin mRNA from these organs. However, in this study, we isolated EVs from the brain tissue in situ following perfusion with physiological saline, a method that effectively eliminates the influence of EVs from blood or other organs. As a result, our primary focus was on studying how EVs released by brain endothelial cells are actively taken up by neurons to exert neuroprotective effects. The potential for these EVs to enter the bloodstream and their subsequent fate is indeed a topic worthy of further investigation. Future research could offer new insights into the cross-organ effects of systemic hypoxia.

      (3) What other mRNA are contained in the vesicles released from brain endothelial cells?

      Thank you for the reviewer’s valuable suggestions. We have further analyzed EVs derived from brain endothelial cells, and in addition to hemoglobin mRNA, these EVs also contain a variety of other mRNAs, including Vwf, Hbb-bt, Hba-a1, Hbb-bs, Hba-a2, Acer2, Angpt2, Ldha, Gm42418, Slc16a1, Cxcl12, B2m, Ctla2a, Ccnd1, and Hmgcs2 (Log2FC > 1.2). The biological processes associated with these mRNAs primarily involve: cell-substrate adhesion, regulation of cellular amide metabolic process, negative regulation of cell migration, negative regulation of cell motility, and negative regulation of cellular component movement. These processes may be closely related to the neuroprotective effects of endothelial cell EVs in a hypoxic environment, especially in terms of regulating cell behavior and maintaining cell structure and function. Additionally, these EVs contain multiple key factors associated with intracellular metabolism, movement, and migration, which may collectively influence neuronal function and survival. Notably, our study also found that mRNA of various hemoglobin subunits ranks among the top five in terms of abundance in the mRNA secreted by hypoxic endothelial EVs, further emphasizing the importance of hemoglobin mRNA in endothelial-derived EVs. Therefore, future research may explore the functions of these mRNAs and reveal how they act in concert to protect neurons from hypoxia-induced damage.

      We have updated and added these results in Fig. S4, and have further elaborated on the findings in the revised figure. Once again, we thank the reviewer for the attention and valuable suggestions regarding our work.

      (4) Where do the endothelial vesicles go? Only to neurons? Or to other cells as well?

      Thank you for the reviewer’s important question. As previously mentioned, the focus of this study is to investigate how EVs carrying hemoglobin mRNA influence neuronal function. Through a combined analysis of single-cell transcriptomics and EV transcriptomics from brain tissue, we found that, besides neurons, almost all types of supportive cells in the brain and their secreted EVs contain a significant amount of hemoglobin mRNA (Fig. 3J, 4B). Notably, although neurons do not transcribe hemoglobin mRNA themselves, under hypoxic conditions, neurons significantly increase hemoglobin expression, resulting in a phenomenon where the transcription and expression levels of hemoglobin in neurons are inconsistent. This phenomenon has been observed both in our study and others (Fig. 4H-J, PMID: 19116637). This observation led us to focus on the active uptake of EVs by neurons and the potential neuroprotective effects they might bring.

      Whether other cell types take up these EVs, and with what functional consequences, is an important question for further investigation, although our current research is focused on neurons. Given that non-neuronal supportive cells may also transfer hemoglobin mRNA via EVs under hypoxic conditions, future research will further explore the uptake of EVs by different cell types and their roles in hypoxic adaptation.

      We are particularly interested in the hemoglobin expression in neurons under hypoxic conditions and consider neurons to be the primary expressers of hemoglobin, providing a new perspective for exploring the neuroprotective role of hemoglobin. We plan to delve deeper into these issues in future studies.

      (5) Neurons can express endogenous hemoglobin. Is it useful to subject neurons to hypoxia and then see how much the endogenous mRNA goes up? How large is the magnitude of endogenous hemoglobin gene upregulation compared to the hypothesized exogenous mRNA that is supposed to be donated from endothelial vesicles?

      Thank you for the reviewer’s valuable question. We have observed that, in the absence of treatment with endothelial cell-derived conditioned medium, there is no significant change in the transcription and expression levels of endogenous hemoglobin in neurons under hypoxic conditions (Fig. 5I, 6C-D). However, when neurons were treated with endothelial cell-conditioned medium, under the same hypoxic conditions, the transcription levels of hemoglobin increased by approximately 1.2-fold, and the expression levels increased by approximately 3.8-fold (Fig. 6B-D). Additionally, we have added pre-treatment experiments involving EVs depletion from the endothelial culture medium and HBA interference. The results show that, after these two pre-treatments, the conditioned medium lost its ability to enhance the transcription and expression of hemoglobin in neurons under hypoxic conditions (Fig. S6, S7D-F), further emphasizing the important role of endothelial EVs in this process. This finding indicates that endothelial-derived EVs significantly promote hemoglobin expression in neurons, and this effect is far greater than the upregulation of endogenous hemoglobin in neurons. Therefore, while neurons can express endogenous hemoglobin, exogenous hemoglobin significantly enhances its expression, which may help neurons tolerate the hypoxic environment and provide additional protection.

      (6) Finally, it may be useful to provide more information and data to explain how the expression of this exogenous endothelial-derived hemoglobin binds to neuronal mitochondria to alter function.

      Thank you for the reviewer’s valuable suggestion. As we previously mentioned, hemoglobin plays a protective role in neurons by maintaining mitochondrial membrane potential, helping neurons restore energy metabolism and energy production under hypoxic conditions. We fully agree on the importance of this research direction. Several studies have shown that when hemoglobin is expressed in neurons, it predominantly localizes to mitochondria, which aligns with the physiological process of heme synthesis within mitochondria (PMID: 23187133). Furthermore, in the brains of Parkinson’s disease patients, the localization of hemoglobin in neuronal mitochondria is altered compared to normal conditions (PMID: 27181046). Therefore, the interaction between hemoglobin and mitochondria plays a crucial role in neuronal function.

      Although existing research indicates the role of hemoglobin in neuronal mitochondria, studies in this area remain limited. We plan to further investigate how hemoglobin binds to mitochondria and its specific effects on mitochondrial function in our future work. We believe that a deeper understanding of this mechanism will provide essential theoretical insights into the effects of hypoxia on neurons and offer new potential strategies for neuroprotective therapies.

      We would like to thank the Reviewer for taking the time to thoroughly examine our work, for their helpful feedback that has significantly contributed to improving our manuscript, and for their kind and encouraging words.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      (1) Some details are not described for experimental procedures. For example, what were the pharmacological drugs dissolved in, and what vehicle control was used in experiments? How long were pharmacological drugs added to cells?

      We apologise for the oversight. These details have now been added to the methods section of the manuscript as well as to the relevant figure legends.

      Briefly, latrunculin was used at a final concentration of 250 nM and Y27632 at a final concentration of 50 μM. Both drugs were dissolved in DMSO. Vehicle controls contained DMSO at the higher of the two final DMSO concentrations used for the drugs.

      The details of the drug treatments and their duration were added to the methods and to figures 6, S10, and S12.

      (2) Details are missing from the Methods section and Figure captions about the number of biological and technical replicates performed for experiments. Figure 1C states the data are from 12 beads on 7 cells. Are those same 12 beads used in Figure 2C? If so, that information is missing from the Figure 2C caption. Similarly, this information should be provided in every figure caption so the reader can assess the rigor of the experiments. Furthermore, how heterogenous would the bead displacements be across different cells? The low number of beads and cells assessed makes this information difficult to determine.

      We apologise for the oversight. We have now added this data to the relevant figure panels.

      To gain a further understanding of the heterogeneity of bead displacements across cells, we have replotted the relevant graphs using different colours to indicate different cells. This reveals that different cells appear to behave similarly and that the behaviour appears controlled by distance to the indentation or the pipette tip rather than cell identity.

      We agree with the reviewer that the number of cells examined is low. This is due to the challenging nature of the experiments, which means that many attempts are necessary to obtain a successful measurement.

      The experiments in Fig 1C are a verification of a behaviour documented in a previous publication [1]. Here, we just confirm the same behaviour and therefore we decided that only a small number of cells was needed.

      The experiments in Fig 2C (that allow for a direct estimation of the cytoplasm’s hydraulic permeability) require formation of a tight seal between the glass micropipette and the cell, something known as a gigaseal in electrophysiology. The success rate of this first step is 10-30% of attempts for an experienced experimenter. The second step is forming a whole cell configuration, in which a hydraulic link is formed between the cell and the micropipette. This step has a success rate of ~ 50%. Whole cell links are very sensitive to any disturbance. After reaching the whole cell configuration, we applied relatively high pressures that occasionally resulted in loss of link between the cell and the micropipette. In summary, for the 12 successful measurements, hundreds of unsuccessful attempts were carried out.
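
      As a rough consistency check on the numbers quoted above, the two success rates can be combined. This is only a sketch: it assumes the two steps succeed independently and ignores the additional losses of the whole-cell link at high pressure, so it gives a lower bound on the effort involved. The function name is illustrative.

```python
def expected_attempts(n_success, p_seal, p_whole_cell):
    """Expected number of attempts needed for n_success measurements.

    Assumes (hypothetically) that gigaseal formation and whole-cell
    access succeed independently of each other.
    """
    p_total = p_seal * p_whole_cell
    return n_success / p_total


# Gigaseal success of 10-30% and whole-cell success of ~50%, as quoted above.
low = expected_attempts(12, 0.30, 0.5)   # optimistic bound
high = expected_attempts(12, 0.10, 0.5)  # pessimistic bound
print(f"{low:.0f} to {high:.0f} attempts for 12 successful measurements")
```

      Under these assumptions the estimate runs from about 80 to about 240 attempts, before counting links lost during pressure application, which is in line with the "hundreds" of attempts mentioned above.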

      (3) The full equation for displacement vs. time for a poroelastic material is not provided. Scaling laws are shown, but the full equation derived from the stress response of an elastic solid and viscous fluid is not shown or described.

      We thank the reviewer for this comment. Based on our experiments, we found that the cytoplasm behaves as a poroelastic material. However, to understand the displacements of the cell surface in response to localised indentation, we show that we also need to take the tension of the submembranous cortex into account. In summary, the interplay between cell surface tension generated by the cortex and the poroelastic cytoplasm controls the cell behaviour. To our knowledge, no simple analytical solutions to this type of problem exist.

      In Fig 1, we show that the response of the cell to local indentation is biphasic with a short time-scale displacement followed by a longer time-scale one. In Figs 2 and 3, we directly characterise the kinetics of cell surface displacement in response to microinjection of fluid. These kinetics are consistent with the long time-scale displacement but not the short time-scale one. Scaling considerations led us to propose that tension in the cortex may play a role in mediating the short time-scale displacement. To verify this hypothesis, we have now added new data showing that the length-scale of an indentation created by an AFM probe depends on tension in the cortex (Fig S5).  

      In a previous publication [2], we derived the temporal dynamics of cell surface displacement for a homogenous poroelastic material in response to a change in osmolarity. In the current manuscript, the composite nature of the cell (membrane, cortex, cytoplasm) needs to be taken into account as well as a realistic cell shape. Therefore, we did not attempt to provide an analytical solution for the displacement of the cell surface versus time in the current work. Instead, we turned to finite element modelling to show that our observations are qualitatively consistent with a cell that comprises a tensed submembranous actin cortex and a poroelastic cytoplasm (Fig 4). We have now added text to make this clearer for the reader.

      Reviewer #2 (Public review):

      Comments & Questions:

      The authors state, "Next, we sought to quantitatively understand how the global cellular response to local indentation might arise from cellular poroelasticity." However, the evidence presented in the following paragraph appears more qualitative than strictly quantitative. For instance, the length scale estimate of ~7 μm is only qualitatively consistent with the observed ~10 μm, and the timescale 𝜏𝑧 ≈ 500 ms is similarly described as "qualitatively consistent" with experimental observations. Strengthening this point would benefit from more direct evidence linking the short timescale to cell surface tension. Have you tried perturbing surface tension and examining its impact on this short-timescale relaxation by modulating acto-myosin contractility with Y-27632, depolymerizing actin with Latrunculin, or applying hypo/hyperosmotic shocks?

      Upon rereading our manuscript, we agree with the reviewer that some of our statements are too strong. We have now moderated these and clarified the goal of that section of the text.

      The reviewer asks if we have examined the effect of various perturbations on the short time-scale displacements. In our experimental conditions, we cannot precisely measure the time-scale of the fast relaxation because its duration is comparable to the frame rate of our image acquisition. However, we examined the amplitude of the displacement of the first phase in response to sucrose treatment and we have carried out new experiments in which we treat cells with 250 nM Latrunculin to partially depolymerise cellular F-actin. Neither of these treatments had an impact on the amplitude of vertical displacements (Fig. S3).

      The absence of change in response to Latrunculin may be because the treatment decreases both the elasticity of the cytoplasm E and the cortical tension γ. As the length-scale of the deformation of the surface scales as γ/E, the two effects of latrunculin treatment may compensate one another and result in only small changes in this length-scale. We have now added this data to supplementary information and comment on this in the text.

      The reviewer’s comment also made us want to determine how cortical tension affects the length-scale of the cell surface deformation created by localised microindentation. To isolate the role of the cortex from that of cell shape, we decided to examine rounded mitotic cells. In our experiments, we indented a mitotic cell expressing a membrane targeted GFP with a sharp AFM tip (Fig. S5).

      In our experiments, we adjusted force to generate a 2 μm depth indentation and we imaged the cell profile with confocal microscopy before and during indentation. Segmentation of this data allowed us to determine the cell surface displacement resulting from indentation and measure a length scale of deformation. In control conditions, the length scale created by deformation is on the order of 1.2 μm. When we inhibited myosin contractility with blebbistatin, the length-scale of deformation decreased significantly to 0.8 μm, as expected if we decrease the surface tension γ without affecting the cytoplasmic elasticity. We have now added this data to our manuscript.

      The authors demonstrate that the second relaxation timescale increases (Figure 1, Panel D) following a hyperosmotic shock, consistent with cytoplasmic matrix shrinkage, increased friction, and consequently a longer relaxation timescale. While this result aligns with expectations, is a seven-fold increase in the relaxation timescale realistic based on quantitative estimates given the extent of volume loss?

      We thank the reviewer for this interesting question. Upon re-examining our data, we realised that the numerical values in the text related to the average rather than the median of our measurements. The median of the poroelastic time constant increases from ~0.4 s in control conditions to 1.4 s in sucrose, representing approximately a 3.5-fold increase.

      Previous work showed that HeLa cell volume decreases by ~40% in response to hyperosmotic shock [3]. The fluid volume fraction in cells is ~65-75%. If we assume that the water is contained in N pores of volume , we can express the cell volume as with the volume of the solid fraction. We can rewrite .

      With ∅ = 0.42  -0.6.  As  does not change in response to osmotic shock, we can rewrite the volume change to obtain the change in pore size .

      The poroelastic diffusion constant scales as and the poroelastic timescale scales as . Therefore, the measured change in volume leads to a predicted increase in poroelastic diffusion time of 1.7-1.9 fold, smaller than observed in our experiments. This suggests that some intuition can be gained in a straightforward manner assuming that the cytoplasm is a homogenous porous material.
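
      The order of magnitude of this prediction can be checked with a short calculation. The sketch below is not the authors' derivation. It assumes that the fluid sits in a fixed number of pores whose volume scales as the cube of the pore size, that the solid volume is unchanged by the osmotic shock, that the poroelastic diffusion constant scales as the square of the pore size, and that the relaxation time scales inversely with the diffusion constant. The function name and these scaling exponents are assumptions made for illustration.

```python
def timescale_fold_change(volume_ratio, fluid_fraction):
    """Predicted fold change in poroelastic relaxation time after a volume change.

    Assumptions (for illustration only): pore volume ~ xi**3, constant solid
    volume, diffusion constant D_p ~ xi**2, relaxation time ~ 1 / D_p.
    """
    solid_fraction = 1.0 - fluid_fraction
    # Fluid volume after the shock, as a fraction of the initial cell volume.
    new_fluid = volume_ratio - solid_fraction
    if new_fluid <= 0:
        raise ValueError("volume loss exceeds the available fluid volume")
    pore_ratio = (new_fluid / fluid_fraction) ** (1.0 / 3.0)  # xi' / xi
    return pore_ratio ** -2  # tau' / tau


# ~40% volume loss, fluid volume fraction of 65-75%, as quoted above.
for fluid_fraction in (0.65, 0.75):
    fold = timescale_fold_change(0.6, fluid_fraction)
    print(f"fluid fraction {fluid_fraction:.2f}: {fold:.2f}-fold slower")
```

      With these inputs the sketch returns values between roughly 1.7 and 1.9, matching the range stated above and well below the measured ~3.5-fold increase.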

      However, the reality is more complex and the hydraulic pore size is distinct from the entanglement length of the cytoskeleton mesh, as we discussed in a previous publication [4]. When the fluid fraction becomes sufficiently small, macromolecular crowding will impact diffusion further and non-linearities will arise. We have now added some of these considerations to the discussion.

      If the authors' hypothesis is correct, an essential physiological parameter for the cytoplasm could be the permeability k and how it is modulated by perturbations, such as volume loss or gain. Have you explored whether the data supports the expected square dependency of permeability on hydraulic pore size, as predicted by simple homogeneity assumptions?

      We thank the reviewer for this comment. As discussed above, we have explored such considerations in a previous publication (see discussion in [4]). Briefly, we find that the entanglement length of the F-actin cytoskeleton does play a role in controlling the hydraulic pore size but is distinct from it. Membrane bounded organelles could also contribute to setting the pore size. In our previous publication, we derived a scaling relationship that indicates that four different length-scales contribute to setting cellular rheology: the average filament bundle length, the size distribution of particles in the cytosol, the entanglement length of the cytoskeleton, and the hydraulic pore size. Many of these length-scales can be dynamically controlled by the cell, which gives rise to complex rheology. We have now added these considerations to our discussion.

      Additionally, do you think that the observed decrease in k in mitotic cells compared to interphase cells is significant? I would have expected the opposite naively as mitotic cells tend to swell by 10-20 percent due to the mitotic overshoot at mitotic entry (see Son Journal of Cell Biology 2015 or Zlotek Journal of Cell Biology 2015).

      We thank the reviewer for this interesting question. Based on the same scaling arguments as above, we would expect that a 10-20% increase in cell volume would give rise to a 10-20% increase in diffusion constant. However, we also note that metaphase leads to a dramatic reorganisation of the cell interior and in particular membrane-bounded organelles. In summary, we do not know why such a decrease could take place. We now highlight this as an interesting question for further research.
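
      The expected size of this effect can be illustrated with a homogeneous-pore scaling argument. This is a hedged sketch rather than the authors' calculation: it assumes a fixed number of pores with volume scaling as the cube of the pore size, an unchanged solid volume, a fluid volume fraction of 65-75%, and a diffusion constant that scales as the square of the pore size. The function name is hypothetical.

```python
def diffusion_fold_change(volume_ratio, fluid_fraction):
    """Fold change in poroelastic diffusion constant after a volume change.

    Assumes pore volume ~ xi**3, constant solid volume and D_p ~ xi**2
    (illustrative scaling assumptions, not measured relationships).
    """
    solid_fraction = 1.0 - fluid_fraction
    new_fluid = volume_ratio - solid_fraction  # fraction of the initial volume
    pore_ratio = (new_fluid / fluid_fraction) ** (1.0 / 3.0)  # xi' / xi
    return pore_ratio ** 2


# Mitotic swelling of 10-20% for fluid volume fractions of 65-75%.
for swelling in (1.10, 1.20):
    for fluid_fraction in (0.65, 0.75):
        fold = diffusion_fold_change(swelling, fluid_fraction)
        print(f"swelling {swelling:.2f}, fluid {fluid_fraction:.2f}: {fold:.2f}")
```

      Under these assumptions the diffusion constant rises by roughly 10-20%, so swelling alone would predict a modest increase rather than the decrease observed in mitotic cells.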

      Based on your results, can you estimate the pore size of the poroelastic cytoplasmic matrix? Is this estimate realistic? I wonder whether this pore size might define a threshold above which the diffusion of freely diffusing species is significantly reduced. Is your estimate consistent with nanobead diffusion experiments reported in the literature? Do you have any insights into the polymer structures that define this pore size? For example, have you investigated whether depolymerizing actin or other cytoskeletal components significantly alters the relaxation timescale?

      We thank the reviewer for this comment. We cannot directly estimate the hydraulic pore size from the measurements performed in the manuscript. Indeed, while we understand the general scaling laws, the prefactors of such relationships are unknown.

      We carried out experiments aiming at estimating the hydraulic pore size in previous publications [3,4] and others have shown spatial heterogeneity of the cytoplasmic pore size [5]. In our previous experiments, we examined the diffusion of PEGylated quantum dots (14 nm in hydrodynamic radius). In isosmotic conditions, these diffused freely through the cell but when the cell volume was decreased by a hyperosmotic shock, they no longer moved [3,4]. This gave an estimate of the pore radius of ~15 nm.

      Previous work has suggested that F-actin plays a role in dictating this pore size but microtubules and intermediate filaments do not [4].

      There are no quantifications in Figure 6, nor is there a direct comparison with the model. Based on your model, would you expect the velocity of bleb growth to vary depending on the distance of the bleb from the pipette due to the local depressurization? Specifically, do blebs closer to the pipette grow more slowly?

      We apologise for the oversight. The quantifications are presented in Fig S10 and Fig S12. We have now modified the figure legends accordingly.

      Blebs are very heterogeneous in size and growth velocity within a cell and across cells in the population in normal conditions [6]. Other work has shown that bleb size is controlled by a competition between pressure driving growth and actin polymerisation arresting it [7]. Therefore, we did not attempt to determine the impact of depressurisation on bleb growth velocity or size.

      In experiments in which we suddenly increased pressure in blebbing cells, we did notice a change in the rate of growth of blebs that occurred after we increased pressure (Author response image 1). However, the experiments are technically challenging and we decided not to perform more.

      Author response image 1.

      A. A hydraulic link is established between a blebbing cell and a pipette. At time t>0, a step increase in pressure is applied. B. Kymograph of bleb growth in a control cell (top) and in a cell subjected to a pressure increase at t=0s (bottom). Top: In control blebs, the rate of growth is slow and approximately constant over time. The black arrow shows the start of blebbing. Bottom: The black arrow shows the start of blebbing. The dashed line shows the timing of pressure application and the red arrow shows the increase in growth rate of the bleb when the pressure increase reaches the bleb. This occurs with a delay δt.

      I find it interesting that during depressurization of the interphase cells, there is no observed volume change, whereas in pressurization of metaphase cells, there is a volume increase. I assume this might be a matter of timescale, as the microinjection experiments occur on short timescales, not allowing sufficient time for water to escape the cell. Do you observe the radius of the metaphase cells decreasing later on? This relaxation could potentially be used to characterize the permeability of the cell surface.

      We thank the reviewer for this comment.

      First, we would like to clarify that both metaphase and interphase cells increase their volume in response to microinjection. The effect is easier to quantify in metaphase cells because we assume spherical symmetry and just monitor the evolution of the radius (Fig 3). However, the displacement of the beads in interphase cells (Fig 2) clearly shows that the cell volume increases in response to microinjection. For both interphase and metaphase cells, when the injection is prolonged, the membrane eventually detaches from the cortex and large blebs form until cell lysis. In contrast to the reviewer’s intuition, we never observe a relaxation in cell volume, probably because we inject fluid faster than the cell can compensate volume change through regulatory mechanisms involving ion channels.

      When we depressurise metaphase cells, we do not observe any change in volume (Fig S10). This contrasts with the increase that we observe upon pressurisation. The main difference between these two experiments is the pressure differential. During depressurisation experiments, this is the hydraulic pressure within the cell (~500 Pa, Fig 6A), whereas during pressurisation experiments, this is the pressure in the micropipette, ranging from 1.4 to 10 kPa (Fig 3). We note in particular that, when we used the lowest pressures in our experiments, the increase in volume was very slow (see Fig 3C). Therefore, we agree with the reviewer that it is likely the magnitude of the pressure differential that explains these differences.

      I am curious about the saturation of the time lag at 30 microns from the pipette in Figure 4, Panel E for the model's prediction. A saturation which is not clearly observed in the experimental data. Could you comment on the origin of this saturation and the observed discrepancy with the experiments (Figure E panel 2)? Naively, I would have expected the time lag to scale quadratically with the distance from the pipette, as predicted by a poroelastic model and the diffusion of displacement. It seems weird to me that the beads start to move together at some distance from the pipette or else I would expect that they just stop moving. What model parameters influence this saturation? Does membrane permeability contribute to this saturation?

      We thank the reviewer for pointing this out. In our opinion, the saturation occurring at 30 microns arises from the geometry of the model. At the largest distance away from the micropipette, the cortex becomes dominant in the mechanical response of the cell because it represents an increasing proportion of the cellular material.

      To test this hypothesis, we will rerun our finite element models with a range of cell sizes. This will be added to the manuscript at a later date.

      Reviewer #3 (Public review):

      Weaknesses: I have two broad critical comments:

      (1) I sense that the authors are correct that the best explanation of their results is the passive poroelastic model. Yet, to be thorough, they have to try to explain the experiments with other models and show why their explanation is parsimonious. For example, one potential explanation could be some mechanosensitive mechanism that does not involve cytoplasmic flow; another could be viscoelastic cytoskeletal mesh, again not involving poroelasticity. I can imagine more possibilities. Basically, be more thorough in the critical evaluation of your results. Besides, discuss the potential effect of significant heterogeneity of the cell.

      We thank the reviewer for these comments and we agree with their general premise.

      Some observations could qualitatively be explained in other ways. For example, if we considered the cell as a viscoelastic material, we could define a time constant τ = η/E, with η the viscosity and E the elasticity of the material. The increase in relaxation time with sucrose treatment could then be explained by an increase in viscosity. However, work by others has previously shown that, in the exact same conditions as our experiment, viscoelasticity cannot account for the observations [1]. In its discussion, this study proposed poroelasticity as an alternative mechanism but did not investigate that possibility. This was consistent with our work that showed that the cytoplasm behaves as a poroelastic material and not as a viscoelastic material [4]. Therefore, we decided not to consider viscoelasticity as a possibility. We now explain this reasoning better and have added a sentence about a potential role for mechanotransductory processes in the discussion.

      (2) The study is rich in biophysics but a bit light on chemical/genetic perturbations. It could be good to use low levels of chemical inhibitors for, for example, Arp2/3, PI3K, myosin etc, and see the effect and try to interpret it. Another interesting question - how adhesive strength affects the results. A different interesting avenue - one can perturb aquaporins. Etc. At least one perturbation experiment would be good.

      We agree with the reviewer. In our previous studies, we already examined what biological structures affect the poroelastic properties of cells [2,4]. Therefore, the most interesting aspect to examine in our current work would be perturbations to the phenomenon described in Fig 6G and, in particular, to investigate what volume regulation mechanisms enable sustained intracellular pressure gradients. However, these experiments are particularly challenging and have very low throughput. Therefore, we feel that they are outside the scope of the present report, and we mention them as promising future directions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please add more information to Materials and methods and figure captions to more clearly share how many different cells and trials the data are coming from.

      This has been done.

      Please add the full equation for displacement vs. time for the poroelastic model and describe appropriately.

      This cannot be done but we explain why.

      Overall, the clarity of the writing in the manuscript could be improved.

      This has been done.

      Please increase text size in some of the figures.

      This has been done.

      Reviewer #2 (Recommendations for the authors):

      Figure 1 would benefit from some revisions for clarity. In Panel D, for the control experiment with 7 cells, why are only 3 data points shown?

      This was due to the use of Excel for generating the box plot: some data points overlap. We have now used different software.

      In Panel E, there is no legend explaining the red dots in the whisker plots.

      This has now been added.

      Additionally, the inset in Panel D lacks a legend, and it is unclear how k was computed.

      This inset panel has been removed.

      Moreover, I find Figure 1, Panel C somewhat pixelated, which makes it challenging to interpret. As I am colorblind, I need to zoom in significantly to distinguish the colors, and the current resolution makes this difficult. Improving the image resolution would be helpful.

      Apologies for this. We have now verified the quality of the images in our submission.

      I am unsure about the method used to compute the relaxation timescale in Figure S2. If an exponential relaxation is assumed, I would expect a function of the form:

      d(t) = d1 + Delta d * (1 - exp(-(t - t1)/tau_p))

      which implies that for t=t1+tau_p, the result should be d1+0.6*Delta d, which does not correspond to the formula given. Have you tried fitting the data with an exponential function or using the model to extract tau_p without assuming a specific functional form?

      We thank the reviewer for pointing this out. We have now added further explanation of the fitting to the figure legend.
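
      The exponential relaxation that the reviewer has in mind can be illustrated with a minimal sketch. It assumes the form d(t) = d1 + Delta d * (1 - exp(-(t - t1)/tau_p)), which is an assumption consistent with the reviewer's comment and not necessarily the fitting procedure used for Figure S2. The function names and the log-linear least-squares fit are hypothetical choices made for illustration only.

```python
import math


def exponential_relaxation(t, t1, d1, delta_d, tau_p):
    """Assumed form: d(t) = d1 + delta_d * (1 - exp(-(t - t1) / tau_p))."""
    return d1 + delta_d * (1.0 - math.exp(-(t - t1) / tau_p))


def fit_tau_p(times, displacements, t1, d1, delta_d):
    """Estimate tau_p from a log-linearised least-squares fit.

    Under the assumed form, ln(1 - (d - d1) / delta_d) = -(t - t1) / tau_p,
    i.e. a straight line through the origin with slope -1 / tau_p.
    """
    sum_xy = 0.0
    sum_xx = 0.0
    for t, d in zip(times, displacements):
        remaining = 1.0 - (d - d1) / delta_d
        if t <= t1 or remaining <= 0.0:
            continue  # skip points where the logarithm is undefined
        x = t - t1
        y = math.log(remaining)
        sum_xy += x * y
        sum_xx += x * x
    slope = sum_xy / sum_xx
    return -1.0 / slope


# One time constant after onset, the displacement has covered
# a fraction 1 - 1/e (about 0.63) of the total change delta_d.
fraction_at_tau = exponential_relaxation(2.0, 0.0, 0.0, 1.0, 2.0)
```

      With noisy experimental traces, a nonlinear fit of the untransformed data would usually be preferable, because the log transform amplifies noise at late times.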

      References:

      (1) Rosenbluth, M. J., Crow, A., Shaevitz, J. W. & Fletcher, D. A. Slow stress propagation in adherent cells. Biophys J 95, 6052-6059 (2008). https://doi.org/10.1529/biophysj.108.139139

      (2) Esteki, M. H. et al. Poroelastic osmoregulation of living cell volume. iScience 24, 103482 (2021). https://doi.org/10.1016/j.isci.2021.103482

      (3) Charras, G. T., Mitchison, T. J. & Mahadevan, L. Animal cell hydraulics. J Cell Sci 122, 3233-3241 (2009). https://doi.org/10.1242/jcs.049262

      (4) Moeendarbary, E. et al. The cytoplasm of living cells behaves as a poroelastic material. Nat Mater 12, 253-261 (2013). https://doi.org/10.1038/nmat3517

      (5) Luby-Phelps, K., Castle, P. E., Taylor, D. L. & Lanni, F. Hindered diffusion of inert tracer particles in the cytoplasm of mouse 3T3 cells. Proc Natl Acad Sci U S A 84, 4910-4913 (1987). https://doi.org/10.1073/pnas.84.14.4910

      (6) Charras, G. T., Coughlin, M., Mitchison, T. J. & Mahadevan, L. Life and times of a cellular bleb. Biophys J 94, 1836-1853 (2008). https://doi.org/10.1529/biophysj.107.113605

      (7) Tinevez, J. Y. et al. Role of cortical tension in bleb growth. Proc Natl Acad Sci U S A 106, 18581-18586 (2009). https://doi.org/10.1073/pnas.0903353106

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1:

      Summary:

      Identifying drugs that target specific disease phenotypes remains a persistent challenge. Many current methods are only applicable to well-characterized small molecules, such as those with known structures. In contrast, methods based on transcriptional responses offer broader applicability because they do not require prior information about small molecules. Additionally, they can be rapidly applied to new small molecules. One of the most promising strategies involves the use of “drug response signatures”, specific sets of genes whose differential expression can serve as markers for the response to a small molecule. By comparing drug response signatures with expression profiles characteristic of a disease, it is possible to identify drugs that modulate the disease profile, indicating a potential therapeutic connection.

      This study aims to prioritize potential drug candidates and to forecast novel drug combinations that may be effective in treating triple-negative breast cancer (TNBC). Large consortia, such as the LINCS-L1000 project, offer transcriptional signatures across various time points after exposing numerous cell lines to hundreds of compounds at different concentrations. While this data is highly valuable, its direct applicability to pathophysiological contexts is constrained by the challenges in extracting consistent drug response profiles from these extensive datasets. The authors use their method to create drug response profiles for three different TNBC cell lines from LINCS.

      To create a more precise, cancer-specific disease profile, the authors highlight the use of single-cell RNA sequencing (scRNA-seq) data. They focus on TNBC epithelial cells collected from 26 diseased individuals compared to epithelial cells collected from 10 healthy volunteers. The authors are further leveraging drug response data to develop inhibitor combinations.

      Strengths:

      The authors of this study contribute to an ongoing effort to develop automated, robust approaches that leverage gene expression similarities across various cell lines and different treatment regimens, aiming to predict drug response signatures more accurately. The authors are trying to address the gap that remains in computational methods for inferring drug responses at the cell subpopulation level.

      Weaknesses:

      One weakness is that the authors do not compare their method to previous studies. The authors develop a drug response profile by summarizing the time points, concentrations, and cell lines. The computational challenge of creating a single gene list that represents the transcriptional response to a drug across different cell lines and treatment protocols has been previously addressed. The Prototype Ranked List (PRL) procedure, developed by Iorio and co-authors (PNAS, 2010, doi:10.1073/pnas.1000138107), uses a hierarchical majority-voting scheme to rank genes. This method generates a list of genes that are consistently overexpressed or downregulated across individual conditions, which then hold top positions in the PRL. The PRL methodology was used by Aissa and co-authors (Nature Comm 2021, doi:10.1038/s41467-021-21884-z) to analyze drug effects on selective cell populations using scRNA-seq datasets. They combined PRL with Gene Set Enrichment Analysis (GSEA), a method that compares a ranked list of genes like PRL against a specific set of genes of interest. GSEA calculates a Normalized Enrichment Score (NES), which indicates how well the genes of interest are represented among the top genes in the PRL. Compared to the method described in the current manuscript, the PRL method allows for the identification of both upregulated and downregulated transcriptional signatures relevant to the drug’s effects. It also gives equal weight to each cell line’s contribution to the drug’s overall response signature.

      The authors performed experimental validation of the top two identified drugs; however, the effect was modest. In addition, the effect on TNBC cell lines was cell-line specific as the identified drugs were effective against BT20, whose transcriptional signatures from LINCS were used for drug identification, but not against the other two cell lines analyzed. An incorrect choice of genes for the signature may result in capturing similarities tied to experimental conditions (e.g., the same cell line) rather than the drug’s actual effects. This reflects the challenges faced by drug response signature methods in both selecting the appropriate subset of genes that make up the signature and managing the multiple expression profiles generated by treating different cell lines with the same drug.

      We appreciate the reviewer’s thoughtful feedback and their suggestion to refer to the Prototype Ranked List (PRL) manuscript. Unfortunately, since the PRL methodology isn’t implemented in an open-source package, direct comparison with our approach is challenging. Nonetheless, we investigated whether using ranks would yield similar results for the most likely active drug pairs identified by retriever. To do this, we calculated the rankings of the average effect sizes provided by retriever and compared the two. Although the Spearman correlation coefficient was high (ρ = 0.98), we observed that key genes are disadvantaged when using ranks compared to effect sizes. This difference is particularly evident in the gene set enrichment analysis, where using average ranks identified only one pathway as statistically significantly enriched. The code to replicate these analyses is available at https://github.com/dosorio/L1000-TNBC/blob/main/Code/.
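
      The kind of comparison described above can be illustrated with a small, self-contained sketch. It is not the analysis in the linked repository: the effect-size matrix is invented for illustration, and the helper names are hypothetical. It averages effect sizes and per-condition ranks across conditions, then computes the Spearman correlation between the two summaries.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def column_means(rows):
    """Mean of each column of a list of equal-length rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


# Hypothetical effect sizes: rows are conditions, columns are genes.
effect_sizes = [
    [2.5, 0.1, -0.2, -3.0],
    [2.0, 0.2, -0.1, -2.0],
    [3.5, 0.3, 0.1, -4.0],
]
mean_effect = column_means(effect_sizes)
mean_rank = column_means([average_ranks(row) for row in effect_sizes])
rho = spearman(mean_effect, mean_rank)
```

      In this toy example the two summaries order the genes identically, yet the averaged ranks are evenly spaced while the averaged effect sizes keep the large magnitudes of the first and last genes. This illustrates how a high rank correlation can coexist with a loss of magnitude information for strongly responding genes.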

      Author response image 1.

      Given the similarity in purpose between retriever and the PRL approach, we have added the following statement to the introduction: “Previously, this goal was approached using a majority-voting scheme to rank genes across various cell types, concentrations, and time points. This approach generates a prototype ranked list (PRL) that represents the consistent ranks of genes across several cell lines in response to a specific drug.”

      Regarding the experimental validation, we believe there is a misunderstanding about the evidence we provided. We would like to clarify that we used three different TNBC cell lines: CAL120, BT20, and DU4475. It’s important to note that CAL120 and DU4475 were not included in the signature generation process. Despite this, we observed effects that exceeded the expectation for additive effects, particularly in the CAL120 cell line (Figure 5, Panel F).

      Reviewer 2:

      Summary:

      In their study, Osorio and colleagues present ‘retriever,’ an innovative computational tool designed to extract disease-specific transcriptional drug response profiles from the LINCS-L1000 project. This tool has been effectively applied to TNBC, leveraging single-cell RNA sequencing data to predict drug combinations that may effectively target the disease. The public review highlights the significant integration of extensive pharmacological data with high-resolution transcriptomic information, which enhances the potential for personalized therapeutic applications.

      Strengths:

      A key finding of the study is the prediction and validation of the drug combination QL-XII-47 and GSK-690693 for the treatment of TNBC. The methodology employed is robust, with a clear pathway from data analysis to experimental confirmation.

      Weaknesses:

      However, several issues need to be addressed. The predictive accuracy of ’retriever’ is contingent upon the quality and comprehensiveness of the LINCS-L1000 and single-cell datasets utilized, which is an important caveat as these datasets may not fully capture the heterogeneity of patient responses to treatment. While the in vitro validation of the drug combinations is promising, further in vivo studies and clinical trials are necessary to establish their efficacy and safety. The applicability of these findings to other cancer types also warrants additional investigation. Expanding the application of ’retriever’ to a broader range of cancer types and integrating it with clinical data will be crucial for realizing its potential in personalized medicine. Furthermore, as the study primarily focuses on kinase inhibitors, it remains to be seen how well these findings translate to other drug classes.

      We thank the reviewer for their thoughtful and constructive feedback. We appreciate your insights and agree that several important considerations need to be addressed.

      We recognize that the predictive accuracy of retriever depends on the LINCS-L1000 and single-cell datasets. These resources may not fully represent the complete range of transcriptional responses to disease and treatment across different patients. As you mentioned, this is an important limitation. However, we believe that by extrapolating the evaluation of the most likely active compound to each individual patient, we can help address this issue. This approach will provide valuable insights into which patients in the study are most likely to respond positively to treatment.

      On the in-vitro validation of drug combinations, we agree that while promising, these results are not sufficient on their own to establish clinical efficacy. Additional in-vivo studies will be essential in assessing the therapeutic potential and safety of these combinations, and clinical trials will be an important next step to validate the translational impact of our findings.

      Lastly, we appreciate the reviewer’s comment about the focus of our study on kinase inhibitors. This result was unexpected, as we tested the full set of compounds from the LINCS-L1000 project. We agree that exploring other top candidates, including different drug classes, will be important for assessing how broadly the retriever approach can be applied.

      Reviewing Editor:

      I appreciate the interesting and potentially impactful nature of your research; the reviewers have some concerns that I believe need to be addressed. While your research addresses an important and timely topic in cancer treatment, the current manuscript does not provide a substantial advance in its present form.

      The significance of your findings is substantial, as you present a novel computational tool, ’retriever,’ which has the potential to revolutionize personalized cancer treatment strategies by predicting effective drug combinations for triple-negative breast cancer (TNBC). The integration of single-cell RNA-seq data with the LINCS-L1000 project’s transcriptional profiles is a powerful approach that could lead to more targeted and effective therapies. However, the manuscript would benefit from a more explicit discussion of how your work advances the field beyond current methodologies, particularly in the context of drug repurposing and combinatorial therapy.

      The strength of the evidence presented is robust, as evidenced by the systematic testing of 152 drug response profiles and 11,476 drug combinations. The identification of QL-XII-47 and GSK-690693 as promising treatment candidates for TNBC is a significant outcome that warrants further exploration. To enhance the manuscript, it would be beneficial to include a more detailed analysis of the biological pathways and mechanisms of action associated with these drugs, as well as a broader experimental validation beyond the cell lines tested.

      Taken together, I encourage you to address the issues raised and consider resubmitting a revised version of your work.

      Following the suggestions of the reviewing editor, we have included a more detailed discussion on how retriever advances the field, especially in the context of drug repurposing and combinatorial therapy in precision medicine, going beyond current methodologies.

      We agree with the suggestion of the editor to offer a more detailed analysis of the biological pathways and mechanisms of action related to these drugs. Consequently, we have expanded our evaluation of these mechanisms. We utilized the Biological Process Gene Ontology to identify changes associated with the mechanisms of each compound individually, as well as the proposed drug combination. Our findings reveal that the statistically significant processes are closely related to cancer deregulation, cross-validating our previous report using the Cancer Hallmarks.

      Author response image 2.

      Recommendations for the authors:

      Reviewer 1:

      (1) The LINCS-L1000 project is introduced in the manuscript as a resource for published transcriptional profiles of several cell lines. Since the original citation, it has been expanded into a vast resource, and the description probably needs to reflect the recent version of LINCS.

      We agree with the reviewer that the LINCS-L1000 project is introduced in the manuscript as a resource for transcriptional profiles of several cell lines. Since the original citation, the project has grown into a much larger resource.

      To reflect this, we have added a 2022 citation that summarizes efforts to link omics signatures with biological mechanisms using iLINCS: Pilarczyk, Marcin, et al. ”Connecting omics signatures and revealing biological mechanisms with iLINCS.” Nature communications 13.1 (2022): 4678.

      Reviewer 2:

      (1) It would be beneficial for the manuscript if the authors could expand on the potential limitations inherent to the ’retriever’ tool. This discussion could insightfully address how the foundational assumptions of the analysis may influence the predictive accuracy and the extent to which dataset quality could affect the reliability of the outcomes.

      We agree with the reviewer that expanding on the limitations of retriever would help raise awareness of the underlying assumptions in the analysis and how they affect the predictive accuracy and reliability of the results.

      The following paragraph was added to the Discussion section: “Although retriever represents a significant advancement in extracting disease-specific drug response profiles from the LINCS-L1000 dataset, several limitations must be considered when interpreting its results. One key limitation is the restricted scope of gene expression data in the LINCS-L1000 project, which includes expression profiles for only 1,000 genes. While this gene set provides valuable insights into broad transcriptional changes, it may not fully capture the complexity of cellular responses to drug treatments. A possible solution to this limitation relies on imputation techniques to address the missing quantification in the gene expression matrix. The accuracy of the imputed values is dependent on the quality of the imputation model and the completeness of the available data. Consequently, there is an inherent risk that the imputed values may not accurately represent the true and complete underlying biological response.”

      (2) Enhancing the manuscript with a more detailed exploration of the clinical ramifications of the study’s findings would be valuable. The authors might consider including how these predictions could be strategically incorporated into the design of clinical trials, thereby bridging the gap between computational predictions and clinical application.

      We appreciate the opportunity provided by the reviewer to expand on the potential of retriever for the design of clinical trials and clinical application.

      The following paragraph was added to the discussion section: “Finally, we have shown that the approach implemented in the retriever method can predict effective drug combinations for patients with triple-negative breast cancer (TNBC), but its potential goes beyond that. It can also be applied to single-cell RNA sequencing data from individual tumors and other diseases for which the single-cell transcriptomic profile of a normal control population is available. In line with this, the LINCS project has released datasets for iPSC-derived cardiomyocytes and motor neurons, opening up new possibilities for precision medicine not only in cancer but also in a variety of other diseases. By predicting the most effective drug and combination treatments for each patient, clinical trials can be designed to target the right populations with the responsive transcriptional phenotype, leading to more successful outcomes.”

      (3) It would be insightful if the authors could discuss the potential for drug resistance in the context of the drug combinations identified by ’retriever’. An analysis of this phenomenon could provide critical insights into the longevity and effectiveness of the proposed treatment strategies.

      We agree with the reviewer that the potential for drug resistance is a critical consideration when evaluating any therapeutic strategy in cancer, especially when using drug combinations. While the current study focuses on identifying effective drug pairings using ‘retriever’, we recognize that the emergence of resistance could limit their long-term utility. We have addressed the topic within the introduction: “Nonetheless, monotherapy in cancer is highly susceptible to the development of resistance following an initial response to treatment. Combination therapy, or the simultaneous administration of multiple drugs to treat a disease, has evolved into the standard pharmacological regimen for treating complex diseases such as cancer. Combination therapy prevents tumor evolution and helps inhibit the development of drug resistance in cancer, thereby improving patient survival.”

      (4) Providing details regarding the computational resources necessary for the implementation of ’retriever’, along with any limitations associated with these requirements, could greatly enhance the transparency and reproducibility of the methodology. Such information would be instrumental for other researchers seeking to apply this tool in their own work.

      The following paragraph was added to the data availability section of the manuscript: “The retriever package is available from the Kuijjer Lab repository https://github.com/kuijjerlab/retriever or from the CRAN repositories at https://cran.r-project.org/package=retriever, and it is implemented as an R multiplatform package that can run on standard laptops or desktops with around 16 GB of RAM, making it accessible for most users. It is designed to work on Windows, macOS, and Linux. While the package can function with modest hardware, performance may vary based on dataset size and complexity. For larger datasets, systems with more RAM or cloud-based resources may improve efficiency.”

      (5) A thoughtful discussion on the ethical considerations surrounding the use of patient-derived data in the development and validation of ’retriever’ would round out the manuscript. Addressing issues of data privacy and the ethical use of such data could set a precedent for responsible research practices in the field of computational biology and personalized medicine.

      We agree with the reviewer on the need to discuss the ethical considerations surrounding the use of patient-derived data in the validation, development, and repurposing of drugs for disease treatment.

      The following paragraph was added to the discussion section: “We want to highlight the important ethical considerations involved in using patient-derived data for drug development and repurposing, particularly around data privacy, informed consent, and the reliability of predictive models. To protect patient privacy, it is crucial to adhere to data protection laws, such as HIPAA and GDPR, and to rigorously de-identify data to minimize the risk of re-identification. Additionally, datasets must be diverse and representative to prevent bias, ensuring that predictive models are applicable to a broad population. Computational models should undergo extensive validation before being used in clinical settings to ensure their accuracy and transparency. Ethical protocols for data sharing must also be established to respect patient autonomy and control over their data. Furthermore, continuous monitoring and validation of drug predictions are necessary to ensure treatment safety, effectiveness, and fairness.”

    1. Author Response

      We appreciate your consideration of our manuscript entitled “Deciphering molecular heterogeneity and dynamics of neural stem cells in human hippocampal development, aging, and injury” (eLife-RP-RA-2023-89507). We thank all the reviewers for their valuable and thoughtful comments and suggestions. We have carefully considered all the comments and revised our manuscript (eLife-VOR-RA2023-89507) accordingly. You can find our point-by-point responses here. In the revised manuscript, we have addressed most of the issues and concerns raised by the reviewers. We hope that the changes will better illustrate the quality of our sn-RNA data and the criteria of the cell type identification. However, due to the scarcity of stroke and neonatal human brain samples, we cannot strengthen our findings and conclusions by increasing this type of hippocampal tissue for analysis within the expected timeframe. With these improvements and limitations, we would like to ask whether we could get a better judgment from the reviewers.

      Reviewer #1 (Public Review):

      In this manuscript, Yao et al. explored the transcriptomic characteristics of neural stem cells (NSCs) in the human hippocampus and their changes under different conditions using single-nucleus RNA sequencing (snRNA-seq). They generated single-nucleus transcriptomic profiles of human hippocampal cells from neonatal, adult, and aging individuals, as well as from stroke patients. They focused on the cell groups related to neurogenesis, such as neural stem cells and their progeny. They revealed genes enriched in different NSC states and performed trajectory analysis to trace the transitions among NSC states and towards astroglia and neuronal lineages in silico. They also examined how NSCs are affected by aging and injury using their datasets and found differences in NSC numbers and gene expression patterns across age groups and injury conditions. One major issue of the manuscript is questionable cell type identification. For example, in Figure 2C, more than 50% of the cells in the astroglia lineage clusters are NSCs, which is extremely high and inconsistent with classic histology studies.

      We appreciate the concerns raised by Reviewer 1 regarding the cell type identification. We suggest that the identification of the 16 main cell types in our study is accurate, as supported by the differential gene expression and the similarity of transcriptional profiles across species (Figure 1B to D, Figure Supplement 1C to E, and Figure 2A and B).

      We thank the reviewer for raising the concern about the high proportion of NSCs within the astroglia lineage clusters. Distinguishing hippocampal qNSCs from astrocytes by transcriptional profiling is a significant challenge in the field because of their high transcriptional similarity. In our previous global UMAP analysis, AS1 (adult specific) could be separated from qNSCs, but AS2 (NSC-like astrocytes) could not. Therefore, the data presented in Figure 2C to G aimed to further distinguish the qNSCs from AS2 by using gene set score analysis. Based on the different scores, we categorized the qNSC/AS lineages into qNSC1, qNSC2 and AS2. Figure 2C presents the UMAP plot of the qNSC/AS2 population from the neonatal sample only. We apologize for not clarifying this in the figure legend, and we have now added this information to the legend of Figure 2C. More importantly, we have added UMAP plots and quantifications for the other groups, including adult, aging, and injury samples, in Figure 2-Supplement 2A and B. This supplementary figure provides more complete information on the cell type composition and its dynamic variation during aging and injury. Although the ratio of NSCs in the astroglia lineage clusters remains higher than in classic histology studies, the trends indicate a reduction in qNSCs and an increase in astrocytes during aging and injury, which supports the view that cell type identification by gene set score analysis is effective, although still not optimal. Combined methods to accurately distinguish between qNSCs and astrocytes will be required in the future, and we also discuss this in the corresponding text.

      Major comments:

      In Figure 1E, the authors should provide supporting quality control of their snRNAseq dataset in the corresponding supplementary figures. Specifically, they should show that the average number of genes and transcripts detected in each cluster are similar across different conditions. This would rule out the possibility that the stem cell gene enrichment is an artifact of increased global gene expression.

      Thanks for the suggestion. We have provided the supporting quality control of our snRNA-seq dataset in Figure 1-Supplement 1A, B and F. The detailed data presented in Figure 1-Supplement 1A and Figure 1-source data 1 show that more than 2000 genes per cell were detected in all donor samples and mitochondrial genes accounted for less than 5%, suggesting that most cells were viable before freezing and underwent minimal RNA degradation. The hippocampi were dissected and collected from donors with a short post-mortem interval of about 3-4 hours to ensure low levels of RNA degradation and cellular apoptosis rates in the collected samples. For subsequent transcriptome analysis, we removed cells with fewer than 200 genes or more than 8600 genes (potentially indicating cell debris and doublets) and those with more than 20% of transcripts generated from mitochondrial genes, as shown in Figure 1-Supplement 1A and B. Figure 1-Supplement 1F provides evidence supporting that the average number of genes detected in each neurogenic cell type (AS2/qNSC, pNSC, aNSC, NB and GC) is similar across different conditions. This suggests that the enrichment of stem cell genes is not simply an artifact of increased global gene expression.
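
      The filtering criteria stated above can be expressed as a minimal sketch. The thresholds (200 and 8600 detected genes, 20% mitochondrial transcripts) are taken from the response; the record type, field names, and function names are hypothetical and do not represent the authors' pipeline.

```python
from dataclasses import dataclass

# Thresholds quoted in the response.
MIN_GENES = 200       # fewer detected genes: potential cell debris
MAX_GENES = 8600      # more detected genes: potential doublets
MAX_PCT_MITO = 20.0   # maximum percentage of mitochondrial transcripts


@dataclass
class NucleusQC:
    """Hypothetical per-nucleus quality-control summary."""
    barcode: str
    n_genes: int      # number of genes detected in the nucleus
    pct_mito: float   # percentage of transcripts from mitochondrial genes


def passes_qc(nucleus):
    """Keep nuclei inside the gene-count window and under the mito cutoff."""
    return (
        MIN_GENES <= nucleus.n_genes <= MAX_GENES
        and nucleus.pct_mito <= MAX_PCT_MITO
    )


def filter_nuclei(nuclei):
    """Return only the nuclei that pass all quality-control criteria."""
    return [n for n in nuclei if passes_qc(n)]
```

      Applied to a list of such records, `filter_nuclei` drops nuclei with fewer than 200 or more than 8600 detected genes, as well as those with more than 20% mitochondrial transcripts.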

      In Figure 2A, the authors performed a cross-species comparative analysis of neurogenic cell clusters by integrating their datasets with published datasets from mice, pigs, and macaques. They assigned cell types to the clusters based on their similarity to the same cell group across species. However, they did not address why a previous study by Franjic et al. (Neuron 2022) using the same method and analysis did not detect any neurogenic clusters in human hippocampal and entorhinal cells. This discrepancy could have implications for the validity of their approach and the interpretation of their results. The authors should provide possible explanations for the different outcomes.

      We appreciate the valuable feedback provided by the reviewer. In our dataset, we sequenced 24,671 GC nuclei and 92,966 total DG cell nuclei, which also includes neonatal samples. The number of nuclei we sequenced is 4.5 times higher than that of Wang et al. (Cell Research, 2022), who also detected NBs. Thus, it is reasonable to conclude that we were able to detect NBs. Moreover, the presence of these rare cell types has been demonstrated in our study through immunostaining techniques, which provides further evidence. In addition, we downloaded the snRNAseq data from Franjic et al. (Neuron 2022) and mapped the dataset onto our snRNAseq dataset using the “multimodal reference mapping” method. Based on the mapping analysis, astrocytes, qNSCs, and aNSCs were identified in Franjic’s data with varying correlation efficiencies, but neuroblasts or immature neurons could not be detected (Figure 6-figure supplement 11 A to G). Therefore, we speculated that the discrepancies between our study and Franjic’s might be caused by health state differences across hippocampi, which subsequently lead to different degrees of hippocampal neurogenesis and immature neuron maintenance.

      In Figure 2C-2J, the authors examined the astroglia lineage clusters to identify NSC subpopulations and their gene features. However, they did not use consistent cell types for the analysis. Some comparisons involved quiescent NSCs (qNSCs) and differentiated astrocytes, while others involved primed NSCs (pNSCs), and active NSCs (aNSCs). This could introduce bias and affect the results. The authors should consistently include all astroglia cell clusters in their analysis, such as q, p, a NSCs and astrocytes.

      We understand the concerns raised by the reviewer, and we use different cell types as the starting points for the developmental trajectory for specific reasons. pNSCs represent an intermediate state between quiescence and activation. During embryonic development, pNSCs demonstrate the greatest similarity to RGLs. Subsequently, pNSCs progressively exit the cell cycle and transition into qNSCs during the postnatal stage. These qNSCs have the ability to re-enter the cell cycle upon activation by stimuli. Based on this knowledge, we have set the pNSC population as the root of the developmental trajectory in the neonatal sample, which aligns more closely with the actual developmental process. In the adult injury sample, however, setting qNSCs as the root of the NSC developmental trajectory better fits the process of adult neurogenesis.

      In addition, the authors’ identification of qNSCs, pNSCs and aNSCs is very questionable in Figure 2. For instance, qNSC2 cells in Figure 2G express MBP, PLP1, and MOBP, which are markers of mature oligodendrocytes. They receive low scores in RGL gene module scoring in Figure 2E, even lower than those of astrocytes. These cells are likely misclassified mature oligodendrocytes. In Figure 2H-I, the authors did not present the DEGs in pNSCs and aNSCs, the GO terms of these clusters are very similar. To confirm their results, the authors should either use histology or cite literature that supports the differentiation of pNSCs and aNSCs by these genes.

      We appreciate the reviewer’s observation regarding the high expression of oligodendrocyte (OL) genes in the qNSC2 population, and we acknowledge that we currently do not have a clear explanation for this finding. However, despite the expression of OL genes in qNSC2, when we conducted a transcriptional similarity analysis comparing qNSC2 to other cell populations, we still observed a higher similarity between qNSC2 and qNSC1, as well as between qNSC2 and astrocytes, rather than oligodendrocytes. Therefore, qNSC2 are not misclassified mature oligodendrocytes (Figure 2-figure supplement 2C).

      Regarding pNSCs and aNSCs, both cell types share similar molecular characteristics, with a key distinction in their proliferation abilities. Notably, aNSCs primarily reside in the S/G2/M phase and highly express the cell cycle-related gene CCND2, reflecting active mitosis. Owing to their capacity to differentiate into neuroblasts/immature granule cells, aNSCs also express a small subset of genes associated with neuronal differentiation, including STMN2, SOX11, and SOX4 (Figure 1C, D, and Figure 2J). As per the reviewer’s request, we have presented the DEGs in pNSCs and aNSCs (Figure 2-figure supplement 2D, Figure 2-source data 2). The results of GO analysis reveal that pNSCs are more associated with the Wnt signaling pathway, axonogenesis, and Hippo signaling, while aNSCs are more associated with G2/M transition of the mitotic cell cycle, neuron projection development, axon development, and dendritic spine organization (Figure 2-figure supplement 2E, Figure 2-source data 2).

      As Figure 2C illustrates, the authors isolated qNSCs and differentiated astrocytes from the astroglia lineage clusters to identify DEGs. However, more than 50% of the cells in the astroglia lineage clusters are NSCs, which is extremely high and inconsistent with classic histology studies. This could be due to cluster misclassification or over-representation of neonatal NSCs in the NSC cluster. The authors should stratify their data by age groups and provide corresponding UMAP plots and quantification. They should also compare DEGs between NSCs and astrocytes within each age group in all of the analyses, as neonatal, adult, and aging NSCs may have different properties and outputs.

      We appreciate the reviewer bringing up the concern regarding the high proportion of NSCs within the astroglia lineage clusters. It is worth mentioning that distinguishing hippocampal qNSCs from astrocytes by transcriptional profiling poses a significant challenge in the field due to their high transcriptional similarity. In the previous global UMAP analysis, AS1 (adult specific) can be separated from qNSCs, but AS2 (NSC-like astrocytes) cannot. Therefore, the data presented in Figure 2C to G aimed to further distinguish the qNSCs from AS2 by using gene set score analysis. Based on different scores, we categorized qNSC/AS lineages into qNSC1, qNSC2 and AS2. Figure 2C presented the UMAP plot of the qNSC/AS2 population from the neonatal sample only. We apologize for not clarifying this in the figure legend. We have now clarified this information in the figure legend of Figure 2C. More importantly, we have added UMAP plots and quantifications for the other groups in Figure 2-Supplement 2A and B, including adult, aging, and injury samples. This supplementary figure provides more complete information on the cell type composition and dynamic variations during aging and injury. Although the ratio of NSCs in the astroglia lineage clusters remains higher compared to classic histology studies, the trends indicate a reduction in qNSCs and an increase in astrocytes during aging and injury, which supports that cell type identification by gene set score analysis is effective, although still not optimal. Combined methods to accurately distinguish between qNSCs and astrocytes are required in the future, and we also discuss this in the corresponding text. (The same question has been answered in the first part of this letter.)

      In Figure 3, the authors discuss the important issues of shared gene expression between interneurons and NB/im-GCs. In the published work (Zhou et al. Nature 2022; Wang et al. Cell Research 2022), however, NBs and im-GCs are not located in the interneuron cluster. This needs to be stated to avoid confusion. Specifically, this suggests the limitation of using a few preselected markers for cell type identification. The author should also examine whether these shared markers are indeed expressed in human interneurons by immunostaining as one application of these markers will be in histology for the field.

      Thanks for the reviewer’s comments. We agree that single nucleus transcriptome analysis is capable of effectively distinguishing between immature neurons and interneurons. In our UMAP plot, the NBs and im-GCs are not located in the interneuron cluster, either. When we compared the granule cell lineage, which contains NB/immature GC, and the interneuron population at the whole transcriptome level between our dataset and published mouse (Hochgerner et al. 2018), macaque and human (Franjic et al. 2022) transcriptome datasets, we found high transcriptomic congruence across different datasets (Figure 3-figure supplement 3A). Specifically, our identified human GABA-INs closely resembled the well-annotated interneurons in different species (similarity scores > 0.95) (Figure 3-figure supplement 3A). The point we want to convey here is that many markers previously used to identify immature neurons are also expressed in interneurons. Therefore, when using these markers for staining and identification purposes, there is a possibility of mistaking an interneuron for an immature neuron. Hence, when selecting markers for immature neurons, we need to be aware of this and exclude genes that are highly expressed in interneurons. To support our view, we conducted co-immunostainings of DCX (a traditional neuroblast marker) and SST (a typical interneuron marker). Our results demonstrate that SST-positive interneurons are indeed capable of being stained by the traditional neuroblast marker DCX in primates. Please see Figure 3-figure supplement 4A-C.

      In Figure 4, the authors' classification of cell subpopulations in the neuronal lineage is not convincing. They claim to have identified two subpopulations of granule cells (GCs) that derive from neuroblasts in Figure 4A-4D. However, this is inconsistent with previous single-cell transcriptomic studies of human hippocampus, which only identified one GC cluster. The differentially expressed genes (DEGs) that they used to distinguish the two GC subpopulations are not supported by prior research. This could be a result of over-classification or technical bias. CALB1 marks mature neurons whereas CALB2 marks immature neurons. However, in Figure 4F, it suggests that CALB1 is expressed in cells that have similar pseudotime scores as CALB2, both of which reside in an intermediate position during the differentiation trajectory. This does not match the known expression patterns of these markers in GCs. The authors should explain this discrepancy and provide additional evidence to support their claims. In addition, for Figure 4F, the authors should address how the different cell fate groups correspond to cell clusters.

      We appreciate the concerns raised by the reviewer. Unfortunately, despite trying various strategies to confirm the identity of the two subpopulations of granule cells (GCs) derived from neuroblasts, we were unable to find a clear answer. As a result, we can only provide an objective description of the differences in gene expression and developmental trajectory and speculate that these differences may be related to their degree of maturity but are not aligned on the same trajectory.

      Regarding the expression of CALB1 and CALB2, the original Figure 4F did not provide precise positional information for these genes due to the compression of a large amount of gene information. In order to address this, we conducted a separate trajectory analysis specifically for CALB1 and CALB2 (Figure 4-figure supplement 6B). The results of this analysis are in line with previous literature reports: CALB2 was found to be enriched in immature neurons, while CALB1 exhibited a delayed expression pattern and was enriched in mature neurons.

      The authors compared NSCs in different age groups in Figure 5, but their analysis in Figure S5A-D only included neonatal and aging stages, omitting adult stages. They should perform cross-age analyses with all three stages for consistency.

      Thank you for the reviewer's comments. We have now included the differentially expressed genes (DEGs) of the neurogenic lineage in the adult stage. Please see Figure 5-supplement 8.

      In Figure 6E, the authors should separate the data by age and calculate the proportion of the re-clustered cell groups, as they did in Figure 6B. In the re-clustered groups, how do the aNSCs and reactive astrocytes change with age?

      Thanks for the reviewer's comments. We have removed the previous Figure 6B and recalculated the proportions of the re-clustered cell groups, including reactive astrocytes (AS). The changes in the proportions of qNSC1, qNSC2, pNSC, aNSCs, and reactive astrocytes with age are now shown in Figure 6E of the updated version. We observed that the proportion of aNSCs decreases with age but increases after injury. Reactive astrocytes primarily appear in the injury group, while their proportion is very low in the other groups.

      In Figure 6E-H, the authors assert that the aNSC group in stroke injury can produce oligodendrocytes in vivo based on trajectory analysis, which is a bold claim and lacks literature support. Their evidence is insufficient, as it relies on a single in vitro study.

      Thanks for the reviewer's comments. We have provided more references to support our claim (e.g., El Waly, Cayre, and Durbec 2018; Parras et al. 2004; Enric Llorens-Bobadilla et al. 2015b; Koutsoudaki et al. 2016). These studies have indicated that under injury conditions, neural stem cells have the potential to differentiate into oligodendrocytes.

      In Figure S8 and the Discussion section, they compared their dataset with Zhou et al. (Nature 2022), a published snRNA-seq dataset of the human hippocampus across the lifespan. The authors speculated that the new neurons identified in the EdU in vitro culture analysis in Zhou et al. might be related to epilepsy, but they did not provide any evidence for this claim. To partially validate their speculation, the authors should conduct the same integrative analysis with Ayhan et al. (Neuron 2021), which examined snRNA-seq data from epileptic patient hippocampi, to demonstrate that they could detect the injury-induced aNSC population and injury-associated genes. Furthermore, they should also conduct the same integrative analysis with the other two published human hippocampal datasets, namely Franjic et al. (Neuron 2022) and Wang et al. (Cell Research 2022).

      Thanks for the reviewer's comments. As per the reviewer’s request, we downloaded the snRNA-seq data from Zhou et al. (Nature 2022), Wang et al. (Cell Research, 2022a), Franjic et al. (Neuron 2022) and Ayhan et al. (Neuron 2021) for integrative analysis. Except for the dataset from Zhou et al. (Nature 2022), which utilized machine learning and made it difficult to extract cell type information for fitting with our own data, the datasets from the other three laboratories were successfully mapped onto our dataset. Different levels of correlation were observed, confirming the presence of astrocytes, qNSCs, aNSCs, and NBs (Figure 6-figure supplement 11 E to G).

      There are a few minor concerns that the authors could improve upon. In Fig. 5D, the HOPX immunostaining pattern does not look like NSCs. In Figure 5B and 6B, the same data were presented twice. In addition, proper statistical tests are missing in Figure 6B.

      Thanks for the reviewer's comments. We have added arrowheads to indicate the typical HOPX immunostaining, which clearly shows its nuclear localization. This observation is consistent with previous reports on the subcellular distribution of HOPX protein. In the updated version, Figure 5B and 6D are distinct and not repetitive. The inclusion of the proportions of reactive astrocytes in Figure 6D provides valuable information about their distribution within the different groups. Unfortunately, statistical tests cannot be conducted for the neonatal and injury samples since only one sample is available in each case.

      # Reviewer 2

      Major points:

      1) The number of sequenced nuclei is lower than the calculated numbers of nuclei required for detecting rare cell types according to a recent meta-analysis of five similar datasets (Tosoni et al., Neuron, 2023). However, Yao et al report succeeding in detecting rare populations, including several types of neural stem cells in different proliferation states, which have been demonstrated to be extremely scarce by previous studies. It would be very interesting to read how the authors interpret these differences.

      We appreciate the valuable comments from the reviewer. We understand the reviewer’s concern and have also noticed that according to the computational modeling conducted by Tosoni et al. (Neuron, 2023), at least 21 neuroblast cells (NBs) can be identified out of 30,000 granule cells (GCs) from a total of 180,000 dentate gyrus (DG) cells. In our dataset, we sequenced 24,671 GC nuclei and 92,966 total DG cell nuclei, which also includes neonatal samples. The number of nuclei we sequenced is 4.5 times higher than that of Wang et al. (Cell Research, 2022), who also detected NBs. Therefore, it is reasonable to conclude that we were able to detect NBs. Moreover, the presence of these rare cell types has been demonstrated in our study through immunostaining techniques, which provides further evidence. In addition, we have implemented strict quality control measures to support the reliability of our sequencing data. These measures include: 1. Immediate collection of tissue samples post-mortem (3-4 hrs) to ensure the quality of isolated nuclei. 2. Only nuclei expressing more than 200 genes but fewer than 5000-8600 genes (depending on the peak of enrichment genes) were considered. On average, around 3000 genes were detected per cell. 3. The average proportion of mitochondrial genes in each sample was approximately 1.8%, with no sample exceeding 5%. The related supplementary information has been included in Figure 1-supplement 1A, B and F, and Figure 1-source data 1.
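
      To make the comparison with the Tosoni et al. figure at the start of this response concrete, the arithmetic can be sketched as below. This is only an illustration under a strong assumption that is not stated in the response or confirmed for the cited model, namely that the number of detectable NBs scales linearly with the number of GC nuclei. The function name `scaled_nb_count` is hypothetical; the input numbers are those quoted in the response.

```python
def scaled_nb_count(n_gc_sequenced, ref_nb=21, ref_gc=30_000):
    """Linearly rescale a reference NB count to a different number of GC nuclei.

    Assumption (ours, for illustration only): detectable NBs are proportional
    to the number of GC nuclei sequenced.
    """
    return ref_nb * n_gc_sequenced / ref_gc


# 24,671 GC nuclei were sequenced according to the response.
expected_nbs = scaled_nb_count(24_671)
print(f"NBs expected under proportional scaling: {expected_nbs:.1f}")
```

      Under this assumption the rescaled figure is on the order of 17 NBs, which is a back-of-the-envelope number rather than a prediction of the cited modeling.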

      2) The information regarding the donors included in this study is very scarce. Factors such as chronic conditions, medication, lifestyle parameters, inflammatory levels should be provided.

      Thanks for the reviewer's comments. We have incorporated additional details about the donors. However, we would like to clarify that information regarding lifestyle parameters has not been collected. Please refer to Figure 1-source data 1 for the updated information.

      3) The number of donors included per group is insufficient: neonatal group n=1; adult group n=2; stroke n=1. Although the scarcity and value of each human brain sample is a factor to be considered, the authors must explain why and how the results obtained from individuals can be extrapolated to the population at these low numbers, especially considering that the rate of adult hippocampal neurogenesis is assumed to be very variable across individuals (Tosoni et al., Neuron, 2023).

      Thanks for the reviewer's comments. We acknowledge these limitations and understand that the inclusion of a larger number of donors would strengthen the statistical power and generalizability of our findings. However, due to the scarcity of stroke or neonatal human samples, it was not feasible to collect a larger sample size within the expected timeframe. To explain why and how we could identify the rare neurogenic populations, we have shown that the number of cells captured from individual samples and the average number of genes detected per cell are sufficient, indicating overall good sequencing quality (Figure 1-supplement 1A and B, and Figure 1-source data 1). Additionally, we have further confirmed the presence of these cell types with low abundance by integrating immunofluorescence staining (Figure 4E and Figure 6F), cell type-specific gene expression (Figure 1C and D), overall transcriptomic characteristics (Figure 1-supplement 1E), and developmental potential (Figure 4A-D, Figure 6A-D).

      4) The definition of primed NSCs (pNSCs) is poor and questionable. "Primed" may be interpreted as a loaded term and the authors only make an effort to follow them into their neurogenic trajectory while figure 4A suggest that they also, if not preferentially judging on the directionality of the RNA velocity vectors, generate astrocytes and quiescent NSCs.

      Thanks for the reviewer's comments. We apologize for not clearly explaining the definition of pNSC in our study. We have now included an explanation in the text and added supplementary information to highlight the features of pNSCs and aNSCs (Figure 2H to J, Figure 2-figure supplement 2D and E). The results of GO analysis reveal that pNSCs are more associated with the Wnt signaling pathway, axonogenesis, and Hippo signaling, while aNSCs are more associated with G2/M transition of the mitotic cell cycle, neuron projection development, axon development, and dendritic spine organization (Figure 2-figure supplement 2E, Figure 2-source data 2). The pNSCs referred to in this study represent an intermediate state between quiescence and activation. During embryonic development, pNSCs exhibit the greatest similarity to RGLs. Subsequently, pNSCs gradually exit the cell cycle and transition into qNSCs during postnatal development (Figure 2J). Thus, in Figure 4A, for the neonatal sample analysis, some pNSCs are shown to enter the neurogenic trajectory, while others exit the cell cycle and transition into qNSCs or become astrocytes (AS2) during postnatal development, indicating a bidirectional trajectory.

      5) The experimental definition of quiescent NSCs (qNSC1) is poor and questionable. The qNSC1 cluster is defined by the expression of HOPX (page 6), which the authors indicate is a "quiescence NSC gene". However, at least in mice, HOPX colocalizes with BrdU in proliferative NSCs (Deqiang Li et al, Stem Cell Res. 2015).

      Thank you for providing the information about the study conducted by Deqiang Li et al. (Stem Cell Res. 2015). We have carefully reviewed their findings. They propose that Hopx is specifically expressed in RGL cells, which are predominantly in a quiescent state. Additionally, they observed that Hopx-positive cells are long-term BrdU-label retaining cells, and Hopx-null NSCs show enhanced neurogenesis, as evidenced by an increased number of BrdU-positive cells. These results suggest that high expression of Hopx in NSCs indicates their quiescence. Furthermore, other studies support using high expression of the HOPX gene as a marker to identify quiescent NSCs (Jaehoon Shin et al., Cell Stem Cell 2015; Daniel A. Berg et al., Cell 2019).

      6) The term quiescent is never defined in the text, and the reader is forced to assume that they refer to the absence of active proliferation genes, most commonly MKI67. Is that what the authors intended? this should be clarified.

      Thanks for the reviewer's comments. We apologize for not clearly explaining the definition of qNSC in our study. We have now included an explanation in the text. qNSCs exhibit reversible cell cycle arrest and display a low rate of metabolic activity. However, they still possess a latent capacity to generate neurons and glia when they receive activation signals. They express genes such as GFAP, ALDH1L1, ID4, and HOPX (Figure 2B). The absence or low expression of active proliferation genes is one feature of qNSCs. The main difference lies in the state of the cell cycle and metabolism.

      7) They find cell clusters that express the proliferation marker MKI67. However, previous studies have indicated the difficulty of snRNA-seq techniques to detect proliferation marker transcripts, especially MKI67, even in hippocampal samples from human infants (for example see the snRNAseq studies from Wang and from Zhou cited by the authors and previously mentioned meta-analysis).

      Thanks for the reviewer's comments. We could detect MKI67 in our snRNA-seq data, albeit with a very low number of cells (not clustered) expressing it. Here, we provide the feature plot in Author response image 1 to illustrate the expression of MKI67. In our Figure 5C, we compared the expression level of MKI67 in the neurogenic lineage among neonatal, adult and aged groups, and observed higher expression in the neonatal group than in the adult and aged groups. However, the fraction of cells expressing MKI67 is still very low. We apologize for the confusion. We did not claim that we identified specific cell clusters expressing MKI67 in our study.

      Author response image 1.

      8) The authors observe declining numbers of proliferating cells with aging and interpret this as evidence of declining neurogenesis. However, they also observe sustained neuroblast numbers in the aged brains they analyzed. Wouldn't these neuroblasts support neurogenesis? This is unclear and should be discussed.

      Thanks for the reviewer's question. We will revise the inaccurate description to clarify that the number of proliferating NPCs, rather than immature neurons, is dramatically reduced with aging. This is because, compared to rodents, immature neurons in primates are indeed retained for a longer period and possess the potential to further develop into mature neurons (Kohler, S.J., et al., PNAS, 2011). We have discussed this in the corresponding texts (Figure 5).

      9) The authors indicate that they find DCX transcript expression in interneurons. This is a potentially interesting observation. However, the authors should be very clear to state that in most studies that use DCX as a marker of immature granule cells, DCX's expression is detected by immunohistochemistry. Therefore, the fact that DCX transcripts may be present in other immature neurons does not necessarily disqualify its use as a protein marker of immature granule cells. This clarification will help to prevent misinterpretations of the data presented by the authors.

      Thanks for the reviewer's suggestion. We have clarified that we observed DCX transcripts present in interneurons in addition to immature neurons by snRNAseq. In this revised version, we conducted co-immunostainings of DCX (a traditional neuroblast marker) and SST (a typical interneuron marker). Our results demonstrate that SST-positive interneurons are indeed capable of being stained by the traditional neuroblast marker DCX in primates. Please see Figure 3-figure supplement 4A-C. A similar result has also been reported by Franjic et al. (Neuron 2022).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Continuous attractor networks endowed with some sort of adaptation in the dynamics, whether that be through synaptic depression or firing rate adaptation, are fast becoming the leading candidate models to explain many aspects of hippocampal place cell dynamics, from hippocampal replay during immobility to theta sequences during run. Here, the authors show that a continuous attractor network endowed with spike frequency adaptation and subject to feedforward external inputs is able to account for several previously unaccounted aspects of theta sequences, including (1) sequences that move both forwards and backwards, (2) sequences that alternate between two arms of a T-maze, (3) speed modulation of place cell firing frequency, and (4) the persistence of phase information across hippocampal inactivations. I think the main result of the paper (findings (1) and (2)) are likely to be of interest to the hippocampal community, as well as to the wider community interested in mechanisms of neural sequences. In addition, the manuscript is generally well written, and the analytics are impressive. However, several issues should be addressed, which I outline below.

      Major comments:

      1. In real data, population firing rate is strongly modulated by theta (i.e., cells collectively prefer a certain phase of theta - see review paper Buzsaki, 2002) and largely oscillates at theta frequency during run. With respect to this cyclical firing rate, theta sweeps resemble "Nike" check marks, with the sweep backwards preceding the sweep forwards within each cycle before the activity is quenched at the end of the cycle. I am concerned that (1) the summed population firing rate of the model does not oscillate at theta frequency, and (2) as the authors state, the oscillatory tracking state must begin with a forward sweep. With regards to (1), can the authors show theta phase spike preference plots for the population to see if they match data? With regards to (2), can the authors show what happens if the bump is made to sweep backwards first, as it appears to do within each cycle?

      Thank you for raising these two important points. As the reviewer mentioned, experimental data do show that the population activity (e.g., calculated from the multiunit activity of tetrode recordings) is strongly modulated by theta. While we mainly focused on sweeps of bump position, the population activity also shows cyclical firing at the theta frequency (we added Fig. S7 to reflect this). This is also reflected in Fig. 4d, where the bump height (representing the overall activity) oscillates in individual theta cycles. The underlying mechanism of cyclical population activity is as follows: the bump height is determined by the amount of input received by the neuron located at the center of the bump. As the activity bump sweeps away from the external input, the center neuron receives less external input, and hence the bump height is smaller. Therefore, not only does the bump position sweep around the external input, but the population activity also oscillates at the same frequency.

      For the “Nike” check marks: we first clarify that the reason we observed a forward sweep preceding a backward sweep is that we always force the artificial animal to run from left to right on the track, where we treated “right” as “forward”. At the beginning of the simulation, the external input to the network moves towards the right, and therefore the activity bump starts from a position behind the animal and sweeps towards the right (forward). In general, this means that the bump will never do a backward sweep first in our model. However, this does not mean that the forward sweeps precede the backward sweeps in each theta cycle. Experimentally, to determine the “0” phase of theta cycles, the LFP signal in CA1 was first bandpass filtered and then Hilbert transformed to get the phase at each time point. Then, a phase histogram of multiunit activity in CA1 was calculated across locomotor periods; the phase of maximal CA1 firing on the histogram was then defined to be the “0” phase. Since we didn’t model LFP oscillations in the attractor model, we cannot obtain a “0” phase reference as in the experimental procedure. Instead, we define the “0” phase using the “population activity quenched time”, where phase “0” is defined as the time of minimum population activity within each oscillation cycle, which happens when the activity bump is farthest from the animal position. In this way, we observed a “Nike” pattern where the activity bump begins with a backward sweep towards the external input, followed by a forward sweep. This is shown in Fig. 3b in the main text.
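
      The model-side phase convention described above (phase “0” at the minimum of population activity in each cycle) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `local_minima` and `phase_from_minima`, the linear interpolation of phase between successive minima, and the toy 10 Hz cosine used as a stand-in for the summed population activity are all assumptions made for illustration.

```python
import math


def local_minima(x):
    """Indices i where x[i] is strictly lower than both of its neighbours."""
    return [i for i in range(1, len(x) - 1) if x[i] < x[i - 1] and x[i] < x[i + 1]]


def phase_from_minima(activity):
    """Assign a phase in [0, 2*pi) to each sample, with phase 0 at activity minima.

    Phase is interpolated linearly between successive minima (an assumption);
    samples outside the first and last minimum are left as None.
    """
    minima = local_minima(activity)
    phase = [None] * len(activity)
    for start, end in zip(minima[:-1], minima[1:]):
        for i in range(start, end):
            phase[i] = 2 * math.pi * (i - start) / (end - start)
    return phase


# Toy population activity: a 10 Hz oscillation sampled at 1 kHz for 0.5 s.
fs = 1000
activity = [1.0 + math.cos(2 * math.pi * 10 * t / fs) for t in range(500)]
minima = local_minima(activity)
phase = phase_from_minima(activity)
```

      With this convention, the bump position at each sample can be plotted against `phase` to check whether the backward sweep precedes the forward sweep within a cycle.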

      1. I could not find the width of the external input mentioned anywhere in the text or in the table of parameters. The implication is that it is unclear to me whether, during the oscillatory tracking state, the external input is large compared to the size of the bump, so that the bump lives within a window circumscribed by the external input and so bounces off the interior walls of the input during the oscillatory tracking phase, or whether the bump is continuously pulled back and forth by the external input, in which case it could be comparable to the size of the bump. My guess based on Fig 2c is that it is the latter. Please clarify and comment.

      Thank you for your comment. We added the width of the external input to the text and table (see Table 1). The bump is continuously pulled back and forth by the external input, as guessed by the reviewer. Experimentally, theta sweeps live roughly in the window of place field size. This is also true in our model, where theta sweep length depends on the strength of recurrent connections, which determines the place field size. However, it also depends on the adaptation strength, where larger adaptation (more intrinsic mobility) leads to a larger sweep length. We presume that the reviewer considered that the bump might live within a window bounded by the external input because we also set the width of the external input comparable to the place field size. (In fact, we don’t know how wide the external location input to the hippocampal circuits is in the biological brain, but it might be reasonable to set the external input width as comparable to the place field size; otherwise the location information conveyed to the hippocampus might be too dispersed.) We added a plot in the SI (see Fig. S1) to show that when choosing a smaller external input width, but increasing the adaptation strength, the activity bump lives in a window exceeding the external input.

      We clarified this point by adding the following text to line 159

      “... It is noteworthy that the activity bump does not live within a window circumscribed by the external input bump (bouncing off the interior walls of the input during the oscillatory tracking state), but instead is continuously pulled back and forth by the external input (see Fig. S1)...”

      1. I would argue that the "constant cycling" of theta sweeps down the arms of a T-maze was roughly predicted by Romani & Tsodyks, 2015, Figure 7. While their cycling spans several theta cycles, it nonetheless alternates by a similar mechanism, in that adaptation (in this case synaptic depression) prevents the subsequent sweep of activity from taking the same arm as the previous sweep. I believe the authors should cite this model in this context and consider the fact that both synaptic depression and spike frequency adaptation are both possible mechanisms for this phenomenon. But I certainly give the authors credit for showing how this constant cycling can occur across individual theta cycles.

      Thank you for raising this point. We added a citation of the Romani & Tsodyks model in this context (line 304). As the reviewer pointed out, short-term synaptic depression (STD) can also act as a potential mechanism for this phenomenon. We also credit the Romani & Tsodyks model for showing how this “cycling spanning several theta cycles” can account for slow (~1 Hz), deliberative behaviors, namely head scanning (Johnson and Redish, 2007). We commented on this in line 302:

      “... As the external input approaches the choice point, the network bump starts to sweep onto left and right arms alternatively in successive theta cycles (Fig. 5b and video 4; see also Romani and Tsodyks (2015) for a similar model of cyclical sweeps spanning several theta cycles) ...”

      1. The authors make an unsubstantiated claim in the paragraph beginning with line 413 that the Tsodyks and Romani (2015) model could not account for forwards and backwards sweeps. Both the firing rate adaptation and synaptic depression are symmetry breaking models that should in theory be able to push sweeps of activity in both directions, so it is far from obvious to me that both forward and backward sweeps are not possible in the Tsodyks and Romani model. The authors should either prove that this is the case (with theory or simulation) or excise this statement from the manuscript.

      Thank you for your comment. Our claim about the inability of the Romani and Tsodyks (2015) model to account for both forward and backward sweeps was inappropriate. We made this claim based on our own implementation of the Romani and Tsodyks (2015) model, in which we didn’t find a parameter region where the bump oscillation shows both forward and backward sweeps. This might be due to the limited parameter range we searched. Additionally, we note a difference between the two models: the Romani & Tsodyks model has an external theta input to the attractor network, which prevents the bump from moving further. This termination may also prevent the activity bump from moving backward. We didn’t consider external theta input in our model, and the bump oscillation is based on internal dynamics. We have deleted that claim from line 424 in the revised paper, and revised that portion of the manuscript by adding the following text to line 424:

      “…Different from these two models, our model considers firing rate adaptation to implement symmetry breaking and hence generates activity propagation. To prevent the activity bump from spreading away, their model considers an external theta input to reset the bump location at the end of each theta cycle, whereas our model generates an internal oscillatory state, where the activity bump travels back due to the attraction of external location input once it spreads too far away. Moreover, theoretical analysis of our model reveals how the adaptation strength affect the direction of theta sweeps, as well as offers a more detailed understanding of theta cycling in complex environments…”

      1. The section on the speed dependence of theta (starting with line 327) was very hard to understand. Can the authors show a more graphical explanation of the phenomenon? Perhaps a version of Fig 2f for slow and fast speeds, and point out that cells in the latter case fire with higher frequency than in the former?

      Thank you for raising this valuable point. There are two different frequencies shown in Fig. 6a, c & d. One is the bump oscillation frequency, the other is the firing frequency of a single cell. To aid understanding, we included experimental results (from Geisler et al., 2007) in Fig. 6a. They show that when the animal increases its running speed, the LFP theta frequency increases only slightly (compare the blue curve and the green curve), while the oscillation frequency of the single-cell firing rate increases more. In our model, we first demonstrated this result using unimodal cells, which show significant phase precession only (Fig. 6c). While the animal runs through the firing field of a place cell, the firing phase will always precess for half a cycle in total. Therefore, a faster running speed means that the half cycle will be completed faster, and hence the single-cell oscillation frequency will be higher. We also predicted the results for bimodal cells (Fig. 6d). To make this point clearer, we modified Fig. 6 by including experimental results, and rewrote the paragraph as follows (line 337):

      “…As we see from Fig. 3d and Fig. 4a&b, when the animal runs through the firing field of a place cell, its firing rate oscillates, since the activity bump sweeps around the firing field center of the cell. Therefore, the firing frequency of a place cell has a baseline theta frequency, which is the same as the bump oscillation frequency. Furthermore, due to phase precession, there will be a half cycle more than the baseline theta cycles as the animal runs over the firing field, and hence single cell oscillatory frequency will be higher than the baseline theta frequency (Fig. 6c). The faster the animal runs, the faster the extra half cycle is accomplished. Consequently, the firing frequency of single cells will increase more (a steeper slope in Fig. 6c red dots) than the baseline frequency.…”
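
      The “extra half cycle” argument can be written as a small worked calculation. If the animal crosses a field of length L at speed v, the traversal takes T = L / v, during which the cell completes f_theta · T baseline cycles plus half a cycle from phase precession, so the single-cell frequency is f_theta + 0.5 · v / L. The sketch below is a toy calculation under that reading; it holds the baseline theta frequency fixed across speeds (the response notes that it increases slightly in the data), and the numerical values and function name are illustrative assumptions.

```python
def single_cell_frequency(theta_hz, speed, field_length):
    """Oscillation frequency (Hz) of one place cell during a field traversal.

    Assumes theta_hz * T baseline cycles plus half a cycle of phase
    precession within the traversal time T = field_length / speed.
    """
    traversal_time = field_length / speed
    cycles = theta_hz * traversal_time + 0.5
    return cycles / traversal_time


# Illustrative values only: 8 Hz baseline, 0.4 m field, speeds in m/s.
for speed in (0.2, 0.4, 0.8):
    print(speed, single_cell_frequency(8.0, speed, 0.4))
```

      Under these assumptions the gap between the single-cell frequency and the baseline grows linearly with running speed, which is the steeper slope described for Fig. 6c.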

      1. I had a hard time understanding how the Zugaro et al., (2005) hippocampal inactivation experiment was accounted for by the model. My intuition is that while the bump position is determined partially by the location of the external input, it is also determined by the immediate history of the bump dynamics as computed via the local dynamics within the hippocampus (recurrent dynamics and spike rate adaptation). So that if the hippocampus is inactivated for an arbitrary length of time, there is nothing to keep track of where the bump should be when the activity comes back online. Can the authors please explain more how the model accounts for this?

      Thank you for the comments. The easiest way to understand how the model accounts for the experimental result from Zugaro et al. (2005) is from Eq. 8:

      This equation says that the firing phase of a place cell is determined by the time the animal has traveled through the place field, i.e., the location of the animal in the place field (with d0, c0 and vext all constant, and tf the only variable). No matter how long the hippocampus is inactivated, once the external input is on, the new phase will continue from the new location of the animal in the place field. In other words, the peak firing phase keeps tracking the location of the animal. To make this point clearer, we modified Fig. 6 by including experimental results from Zugaro et al. (2005), and updated the description from line 356:

      “…Based on the theoretical analysis (Eq. 8), we see that the firing phase is determined by the location of the animal in the place field, i.e., vext tf. This means that the firing phase keeps tracking the animal's physical location. No matter how long the network is inactivated, the new firing phase will only be determined by the new location of the animal in the place field. Therefore, the firing phase in the first bump oscillation cycle after the network perturbation is more advanced than the firing phase in the last bump oscillation cycle right before the perturbation, and the amount of precession is similar to that in the case without perturbation (Fig. 6e) …”
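
      The logic of this argument can be illustrated with a toy phase-position map. Eq. 8 is not reproduced in this response, so the sketch below does not implement it: it assumes a hypothetical linear map in which the phase precesses by half a cycle across the field (as stated earlier in this response), with an arbitrary entry phase. The function names and numbers are illustrative only.

```python
import math

def firing_phase(position_in_field, field_length, entry_phase=math.pi):
    """Hypothetical linear map: half a cycle of precession across the field."""
    fraction = min(max(position_in_field / field_length, 0.0), 1.0)
    return entry_phase - math.pi * fraction

def phases_around_silence(v_ext, field_length, t_off, silence):
    """Firing phase just before the network is silenced and just after it recovers.

    The animal keeps moving at speed v_ext while the network is silent, so the
    phase after recovery depends only on the new position v_ext * (t_off + silence).
    """
    before = firing_phase(v_ext * t_off, field_length)
    after = firing_phase(v_ext * (t_off + silence), field_length)
    return before, after


before, after = phases_around_silence(v_ext=0.2, field_length=0.4,
                                      t_off=0.5, silence=0.25)
print(before, after)  # the phase after the silence is more advanced (smaller)
```

      Because the map takes only the position as input, the phase after recovery equals the phase an unperturbed run would show at the same position, whatever the duration of the silence.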

      1. Can the authors comment on why the sweep lengths oscillate in the bottom panel of Fig 5b during starting at time 0.5 seconds before crossing the choice point of the T-maze? Is this oscillation in sweep length another prediction of the model? If so, it should definitely be remarked upon and included in the discussion section.

      We appreciate the reviewer’s attention to this phenomenon. We initially thought it was a simulation artifact due to the parameter setting. However, we found that it is quite robust to different parameter settings. While we haven’t found a theoretical explanation, we provide a qualitative one: the sweep-length oscillation may be coupled to the time constant of the firing rate adaptation. Specifically, after a longer sweep, the neurons at the end of the sweep are adapted (inhibited), and hence the activity bump cannot travel as far in the next round. Therefore, the sweep length is shorter than the previous one. In the following round, the bump will sweep further again because those neurons have recovered from the previous adaptation effect. We think this length oscillation is quite interesting and will look for it in experimental data in future work. We added this point in the main text as a prediction in line 321:

      “…We also note that there is a cyclical effect in the sweep lengths across oscillation cycles before the animal enters the left or right arm (see Fig. 5b lower panel), which may be interesting to check in the experimental data in future work (see Discussion for more details) …”

      And line 466:

      “…Our model of the T-maze environment showed an expected phenomenon that as the animal runs towards the decision point, the theta sweep length also shows cyclical patterns (Fig. 5b lower panel). An intuitive explanation is that, due to the slow dynamics in firing rate adaptation (with a large time constant compared to neural firing), a long sweep leads to an adaptation effect on the neurons at the end of the sweep path. Consequently, the activity bump cannot travel as far due to the adaptation effect on those neurons, resulting in a shorter sweep length compared to the previous one. In the next round, the activity bump exhibits a longer sweep again because those neurons have recovered from the previous adaptation effect. We plan to test this phenomenon in future experiments...”

      1. Perhaps I missed this, but I'm curious whether the authors have considered what factors might modulate the adaptation strength. In particular, might rat speed modulate adaptation strength? If so, would have interesting predictions for theta sequences at low vs high speeds.

      Thank you for raising this important point. As we pointed out in line 279: “…the experimental data (Fernandez et al, 2017) has indicated that there is a laminar difference between unimodal cells and bimodal cells, with bimodal cells correlating more with the firing patterns of deep CA1 neurons and unimodal cells with the firing patterns of superficial CA1 neurons. Our model suggests that this difference may come from the different adaptation strengths in the two layers…”. Our guess is that the adaptation strength might reflect physiological differences between place cells in different pyramidal layers of the hippocampus. For example, place cells in the superficial and deep layers receive different amounts of input from MEC and sensory cortex, and such differences may contribute to different adaptation effects in the two populations of place cells.

      Our intuition is that the animal’s running speed may not directly modulate the adaptation strength. Note that the effect of adaptation and the adaptation strength are different. As the animal runs rapidly across the firing field, the place cell fires densely (in time), and therefore the adaptation effect is large; as the animal runs slowly across the field, the place cell fires sparsely (in time), and hence the adaptation effect is small. In these two situations, the adaptation strength is fixed, and the difference is due to the spike intervals.

      From Eq. 45-47, our theoretical analysis yields several predictions about how theta sequences depend on the network parameters, for example, how the sweep length varies when the running speed changes. We simulated the network at both low and high running speeds (while keeping all other parameters fixed), and found that the sweep length at low speed is larger than that at high speed. This differs from previous data showing that the sweep length increases as the animal runs faster (Maurer et al, 2012). However, we are not sure how other parameters change in the biological brain as the animal runs faster, e.g., the external input strength and the place field width might also vary as confounds. We will explore this further in the future and investigate how the adaptation strength is modulated in the brain.

      1. I think the paper has a number of predictions that would be especially interesting to experimentalists but are sort of scattered throughout the manuscript. It would be beneficial to have them listed more prominently in a separate section in the discussion. This should include (1) a prediction that the bump height in the forward direction should be higher than in the backward direction, (2) predictions about bimodal and unimodal cells starting with line 366, (3) prediction of another possible kind of theta cycling, this time in the form of sweep length (see comment above), etc.

      Thank you for pointing this out. We updated the manuscript by including a paragraph in the Discussion summarizing the predictions we made throughout the manuscript (from line 459):

      “…Our model has several predictions which can be tested in future experiments. For instance, the height of the activity bump in the forward sweep window is higher than that in the backward sweep window (Fig. 4c) due to the asymmetric suppression effect from the adaptation. For bimodal cells, they will have two peaks in their firing frequency as the animal runs across the firing fields, with one corresponding to phase precession and the other corresponding to phase procession. Similar to unimodal cells, both the phase precession and procession of a bimodal cell after transient intrahippocampal perturbation will continue from the new location of the animal (Fig. S5). Interestingly, our model of the T-maze environment showed an expected phenomenon that as the animal runs towards the decision point, the theta sweep length also shows cyclical patterns (Fig. 5b lower panel). An intuitive explanation is that, due to the slow dynamics in firing rate adaptation (with a large time constant compared to neural firing), a long sweep leads to an adaptation effect on the neurons at the end of the sweep path. Consequently, the activity bump cannot travel as far due to the adaptation effect on those neurons, resulting in a shorter sweep length compared to the previous one. In the next round, the activity bump exhibits a longer sweep again because those neurons have recovered from the previous adaptation effect. We plan to test this phenomenon in future experiments…”

      Reviewer #2:

      In this work, the authors elaborate on an analytically tractable, continuous-attractor model to study an idealized neural network with realistic spiking phase precession/procession. The key ingredient of this analysis is the inclusion of a mechanism for slow firing-rate adaptation in addition to the otherwise fast continuous-attractor dynamics. The latter classically arises from a combination of translation invariance and nonlinear rate normalization. For strong adaptation/weak external input, the network naturally exhibits an internally generated, travelling-wave dynamics along the attractor with some characteristic speed. For small adaptation/strong external stimulus, the network recovers the classical externally driven continuous-attractor dynamics. Crucially, when both adaptation and external input are moderate, there is a competition between the internally generated and externally generated mechanisms, leading to an oscillatory tracking regime. In this tracking regime, the population firing profile oscillates around the neural field tracking the position of the stimulus. The authors demonstrate by a combination of analytical and computational arguments that oscillatory tracking corresponds to realistic phase precession/procession. In particular, the authors can account for the emergence of unimodal and bimodal cells, as well as some other experimental observations with respect to the dependence of phase precession/procession on the animal's locomotion. The strengths of this work are at least three-fold: 1) Given its simplicity, the proposed model has a surprisingly large explanatory power of the various experimental observations. 2) The mechanism responsible for the emergence of precession/procession can be understood as a simple yet rather illuminating competition between internally driven and externally driven dynamical trends. 3) Amazingly, and under some adequate simplifying assumptions, a great deal of analysis can be treated exactly, which allows for a detailed understanding of all parametric dependencies. This exact treatment culminates with a full characterization of the phase space of the network dynamics, as well as the computation of various quantities of interest, including characteristic speeds and oscillating frequencies.

      1. As mentioned by the authors themselves, the main limitation of this work is that it deals with a very idealized model and it remains to see how the proposed dynamical behaviors would persist in more realistic models. For example, the model is based on a continuous attractor model that assumes perfect translation-invariance of the network connectivity pattern. Would the oscillating tracking behavior persist in the presence of connection heterogeneities?

      Thank you for raising this important point. Continuous attractor models have been widely used in modeling hippocampal neural circuits (see McNaughton et al, 2006 for a review), and researchers often assume a translation-invariant structure in these network models. The theta sweep state we present in the current work is based on the properties of the continuous attractor state. We agree with the reviewer that the place cell circuit might not be a perfect continuous attractor network. For a simple case where the connection weights are sampled from a Gaussian distribution around J_0, the theta sweep state still emerges in the network (see Fig. S8 for an example). We also believe that the model can be extended to more complex cases with over-representation of the “home” location and decision points in the real environment, i.e., where the heterogeneity is not random but connections are stronger near those locations; the theta sweeps will then be more biased towards those locations. However, if the heterogeneity breaks the continuous attractor state, the theta sweep state may not be present in the network.
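
      To make the heterogeneity manipulation concrete, the sketch below builds a translation-invariant Gaussian connectivity kernel on a ring and a heterogeneous variant in which the strength of each connection is drawn from a Gaussian distribution around J_0. How the noise enters the weights in Fig. S8 is not specified in this response, so the per-pair sampling, the relative standard deviation `rel_sd`, the ring geometry, and all names are assumptions made for illustration.

```python
import numpy as np

def ring_distance(x, length=2 * np.pi):
    """Pairwise distance between preferred locations on a ring."""
    d = np.abs(x[:, None] - x[None, :])
    return np.minimum(d, length - d)

def gaussian_kernel(x, J0, a):
    """Translation-invariant weights: strength depends only on distance."""
    d = ring_distance(x)
    return J0 / (np.sqrt(2 * np.pi) * a) * np.exp(-d**2 / (2 * a**2))

def heterogeneous_kernel(x, J0, a, rel_sd, rng):
    """Same spatial profile, but each pair's strength ~ N(J0, (rel_sd * J0)^2)."""
    d = ring_distance(x)
    strength = rng.normal(J0, rel_sd * J0, size=d.shape)
    return strength / (np.sqrt(2 * np.pi) * a) * np.exp(-d**2 / (2 * a**2))


x = np.linspace(-np.pi, np.pi, 64, endpoint=False)
W_hom = gaussian_kernel(x, J0=1.0, a=0.4)
W_het = heterogeneous_kernel(x, J0=1.0, a=0.4, rel_sd=0.1,
                             rng=np.random.default_rng(0))
# Shifting all neurons by one position leaves W_hom unchanged but not W_het.
print(np.allclose(W_hom, np.roll(W_hom, (1, 1), axis=(0, 1))))
print(np.allclose(W_het, np.roll(W_het, (1, 1), axis=(0, 1))))
```

      The shift test in the last two lines is one simple way to check whether a weight matrix is translation-invariant, which is the property the reviewer asks about.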

      1. Can the oscillating tracking behavior be observed in purely spiking models as opposed to rate models as considered in this work?

      Thank you for pointing this out. The short answer is yes. If the translation-invariance of the network connectivity pattern holds, i.e., the spiking network is still a continuous attractor network (see Tsodyks et al, 1996, and Yu et al. "Spiking continuous attractor neural networks with spike frequency adaptation for anticipative tracking"), then the adaptation, which takes the mathematical form of spike frequency adaptation (instead of firing rate adaptation), will still generate the sweep state of the activity bump. We chose the rate-based model here because it is analytically tractable, which gives us a better understanding of the underlying dynamics. Many continuous attractor models of spatially tuned cell populations are rate-based (see, for example, Zhang 1996; Burak & Fiete 2009). However, extending our model to a spike-based one would be straightforward.

      1. Another important limitation is that the system needs to be tuned to exhibit oscillation within the theta range and that this tuning involves a priori variable parameters such as the external input strength. Is the oscillating-tracking behavior overtly sensitive to input strength variations?

      Thank you for pointing this out. In rodent studies, theta sequences are thought to result from the integration of both external inputs conveying sensory-motor information and intrinsic network dynamics possibly related to memory processes (see Drieu and Zugaro 2019; Drieu et al, 2018). We clarify here that, in our modeling work, the generation of theta sweeps also depends on both the external input and the intrinsic dynamics (induced by the firing rate adaptation). Therefore, we don’t think the dependence of theta sweeps on an a priori parameter, the external input strength, is a limitation here. We agree with the reviewer that the system needs to be tuned to exhibit oscillation within the theta range. However, the parameter range that induces the oscillatory state is relatively large (see Fig. 2g in the main text). It will be interesting to investigate (and find experimental evidence for) how the biological system adjusts the network configuration to implement the sweep state in network dynamics.

      1. The author mentioned that an external pacemaker can serve to drive oscillation within the desired theta band but there is no evidence presented supporting this.

      Thank you for pointing this out. We made this argument based on our initial simulation before but didn’t go into the details of that. We have deleted that argument in the discussion and rewrote that part. We will carry out more simulations in the future to verify if this is true. See our changes from line 418 to line 431:

      “... A representative model relying on neuronal recurrent interactions is the activation spreading model. This model produces phase precession via the propagation of neural activity along the movement direction, which relies on asymmetric synaptic connections. A later version of this model considers short-term synaptic plasticity (short-term depression) to implicitly implement asymmetric connections between place cells, and reproduces many other interesting phenomena, such as phase precession in different environments. Different from these two models, our model considers firing rate adaptation to implement symmetry breaking and hence generates activity propagation. To prevent the activity bump from spreading away, their model considers an external theta input to reset the bump location at the end of each theta cycle, whereas our model generates an internal oscillatory state, where the activity bump travels back due to the attraction of external location input once it spreads too far away. Moreover, theoretical analysis of our model reveals how the adaptation strength affect the direction of theta sweeps, as well as offers a more detailed understanding of theta cycling in complex environments...”

      1. A final and perhaps secondary limitation has to do with the choice of parameter, namely the time constant of neural firing which is chosen around 3ms. This seems rather short given that the fast time scale of rate models (excluding synaptic processes) is usually given by the membrane time constant, which is typically about 15ms. I suspect this latter point can easily be addressed.

      Thank you for pointing this out. The time constant we currently chose is relatively short as used in other studies. We conducted additional simulations with the time constant adjusted to 10 ms, and the results reported in this paper remain consistent. Please refer to Fig. S9 for the results obtained with a time constant of 10 ms.

      Reviewer #3:

      With a soft-spoken, matter-of-fact attitude and almost unwittingly, this brilliant study chisels away one of the pillars of hippocampal neuroscience: the special role(s) ascribed to theta oscillations. These oscillations are salient during specific behaviors in rodents but are often taken to be part of the intimate endowment of the hippocampus across all mammalian species, and to be a fundamental ingredient of its computations. The gradual anticipation or precession of the spikes of a cell as it traverses its place field, relative to the theta phase, is seen as enabling the prediction of the future - the short-term future position of the animal at least, possibly the future in a wider cognitive sense as well, in particular with humans. The present study shows that, under suitable conditions, place cell population activity "sweeps" to encode future positions, and sometimes past ones as well, even in the absence of theta, as a result of the interplay between firing rate adaptation and precise place coding in the afferent inputs, which tracks the real position of the animal. The core strength of the paper is the clarity afforded by the simple, elegant model. It allows the derivation (in a certain limit) of an analytical formula for the frequency of the sweeps, as a function of the various model parameters, such as the time constants for neuronal integration and for firing rate adaptation. The sweep frequency turns out to be inversely proportional to their geometric average. The authors note that, if theta oscillations are added to the model, they can entrain the sweeps, which thus may superficially appear to have been generated by the oscillations.
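
      The stated dependence of the sweep frequency on the two time constants can be turned into a short worked calculation. The sketch below uses only the proportionality described above (frequency inversely proportional to the geometric mean of the neuronal and adaptation time constants, valid in a certain limit); the proportionality constant is not given here, so only frequency ratios are computed, and the numerical values and function name are illustrative assumptions.

```python
import math

def relative_sweep_frequency(tau, tau_v, tau_ref, tau_v_ref):
    """Ratio f(tau, tau_v) / f(tau_ref, tau_v_ref), assuming f ~ 1 / sqrt(tau * tau_v)."""
    return math.sqrt((tau_ref * tau_v_ref) / (tau * tau_v))


# Illustrative values in ms: doubling the adaptation time constant alone
# lowers the frequency by a factor of 1 / sqrt(2); doubling both halves it.
print(relative_sweep_frequency(3.0, 300.0, 3.0, 150.0))
print(relative_sweep_frequency(6.0, 300.0, 3.0, 150.0))
```

      Because only the product of the two time constants enters, a change in one can in principle be offset by an opposite change in the other under this scaling.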

      1. The main weakness of the study is the other side of the simplicity coin. In its simple and neat formulation, the model envisages stereotyped single unit behavior regulated by a few parameters, like the two time constants above, or the "adaptation strength", the "width of the field" or the "input strength", which are all assumed to be constant across cells. In reality, not only assigning homogeneous values to those parameters seems implausible, but also describing e.g. adaptation with the simple equation included in the model may be an oversimplification. Therefore, it remains important to understand to what extent the mechanism envisaged in the model is robust to variability in the parameters or to eg less carefully tuned afferent inputs.

      Thank you for raising this important question. As the reviewer pointed out, our model is an oversimplification of the real hippocampal circuits (see also Q1 and Q3 from Reviewer #2). We also acknowledge this in the main text (line 504):

      “…Nevertheless, it is important to note that the CANN we adopt in the current study is an idealized model for the place cell population, where many biological details are missed. For instance, we have assumed that neuronal synaptic connections are translation-invariant in the space...”

      To investigate the robustness of the model to parameter settings, we divided the parameters into two groups. The first group determines the bump state, i.e., the width of the field a, the neuronal density ρ, the global inhibition strength k, and the connection strength J_0. The second group determines the bump sweep state (which is based on the existence of the bump state), i.e., the input strength α and the adaptation strength m. For the first group of parameters, we refer the reviewer to the Methods section on the stability analysis of the bump state. This analysis gives the condition under which the continuous attractor state holds in the network (see Eq. 20, which guides our parameter selection). For the second group of parameters, we refer the reviewer to Fig. 2g, which shows when the bump sweep state occurs as a function of input strength and adaptation strength. When the input strength is small, the range of adaptation strengths that produce the bump sweep state is also small. However, as the input strength increases, we can see from Fig. 2g that this range increases linearly. Although two other states exist in the network when the two parameters are set outside the colored area in Fig. 2g, the parameter range that yields the sweep state is large, especially when the input strength is large, which is usually the case when the animal actively runs in the environment.
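
      The roles of the two parameter groups can be explored with a minimal simulation sketch. The manuscript's equations are not reproduced in this response, so the code below assumes a commonly used form of a continuous attractor network with divisive global inhibition and a slow subtractive adaptation variable; the rectification, the Gaussian input width, and every numerical value (including the 144 ms adaptation time constant) are illustrative assumptions, not the values of Table 1. Sums over neurons stand in for density-weighted integrals (ρ·dx = 1 on a uniform grid).

```python
import numpy as np

def simulate_cann(m, alpha, v_ext, n=128, a=0.4, J0=1.0, k=1.0,
                  tau=3.0, tau_v=144.0, dt=0.3, steps=2000):
    """Euler simulation of a ring attractor with adaptation (times in ms).

    m: adaptation strength, alpha: input strength, v_ext: input speed (rad/ms).
    Returns preferred locations, bump centre and input position per step,
    and the final synaptic input U.
    """
    x = np.linspace(-np.pi, np.pi, n, endpoint=False)
    d = np.abs(x[:, None] - x[None, :])
    d = np.minimum(d, 2 * np.pi - d)
    W = J0 / (np.sqrt(2 * np.pi) * a) * np.exp(-d**2 / (2 * a**2))
    U = np.zeros(n)   # synaptic input
    V = np.zeros(n)   # adaptation variable
    centers = np.empty(steps)
    stim = np.empty(steps)
    for i in range(steps):
        # External input: Gaussian bump moving at speed v_ext around the ring.
        z = (v_ext * i * dt + np.pi) % (2 * np.pi) - np.pi
        dz = np.abs(x - z)
        dz = np.minimum(dz, 2 * np.pi - dz)
        I_ext = alpha * np.exp(-dz**2 / (4 * a**2))
        # Firing rate with divisive global inhibition (rectified input, assumed).
        U_pos = np.maximum(U, 0.0)
        r = U_pos**2 / (1.0 + k * np.sum(U_pos**2))
        dU = (-U + W @ r - V + I_ext) / tau
        dV = (-V + m * U) / tau_v
        U = U + dt * dU
        V = V + dt * dV
        centers[i] = x[np.argmax(U)]
        stim[i] = z
    return x, centers, stim, U


x, centers, stim, U = simulate_cann(m=0.5, alpha=0.2, v_ext=0.002)
offset = (centers - stim + np.pi) % (2 * np.pi) - np.pi  # bump minus input
print(offset.min(), offset.max())
```

      Scanning `m` and `alpha` with such a script and inspecting `offset` over time is one way to see qualitatively which combinations give smooth tracking, oscillation around the input, or a bump that runs away from it; which regime the illustrative values above fall in is not asserted here, and Fig. 2g remains the reference for the actual model.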

      To demonstrate how variability affects the results, we added variability to the connection weights by sampling them from a Gaussian distribution around J_0 (this introduces heterogeneity in the connection structure). We found that the bump sweep state still holds under this condition (see Fig. S8 as well as Q1 from Reviewer #2). We expect similar results for variability in the other parameter values. Although adding variability to these parameters poses no difficulty for numerical simulation, it makes the theoretical analysis much more difficult.

      1. The weak adaptation regime, when firing rate adaptation effectively moves the position encoded by population activity slightly ahead of the animal, is not novel - I discussed it, among others, in trying to understand the significance of the CA3-CA1 differentiation (2004). What is novel here, as far as I know, is the strong adaptation regime, when the adaptation strength m is at least larger than the ratio of time constants. Then population activity literally runs away, ahead of the animal, and oscillations set in, independent of any oscillatory inputs. Can this really occur in physiological conditions? A careful comparison with available experimental measures would greatly strengthen the significance of this study.

      Thank you for raising this interesting question.

      Re: “…firing rate adaptation effectively moves the position encoded by population activity slightly ahead of the animal, is not novel…”, we added Treves, A (2004) as a citation where we introduce the firing rate adaptation in line 116.

      Testing whether the case of “…the adaptation strength m is at least larger than the ratio of time constants…” could occur in physiological conditions requires a measure of the adaptation strength as well as of the time constants of both neuronal firing and the adaptation effect. The most straightforward way would be in vivo patch clamp recording of hippocampal pyramidal neurons while the animal is navigating an environment. This would give us a direct measure of all these values. However, we don’t have such data to verify this hypothesis yet. Another possible way of measuring these values is through a state-space model. Specifically, we can build a state-space model (including an adaptation effect in spike generation) by taking the animal’s position as the latent dynamics and the recorded spikes as observations, and then infer parameters such as the adaptation strength and the time constant of the slow dynamics. State-space models (without firing rate adaptation) have previously been used to analyze theta sweeps and replay dynamics by Denovellis et al. (2021) and by Krause and Drugowitsch (2022). We think it might be feasible to infer the adaptation strength and adaptation time constant in a similar paradigm in future work. We thank the reviewer for pointing this out and hope our replies have addressed the reviewer’s concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Vision is a highly active process. Humans move their eyes 3-4 times per second to sample information with high visual acuity from our environment, and where eye movements are directed is critical to our understanding of active vision. Here, the authors propose that the cost of making a saccade contributes critically to saccade selection (i.e., whether and where to move the eyes). The authors build on their own recent work that the effort (as measured by pupil size) that comes with planning and generating an eye movement varies with saccade direction. To do this, the authors first measured pupil size for different saccade directions for each participant. They then correlated the variations in pupil size obtained in the mapping task with the saccade decision in a free-choice task. The authors observed a striking correlation: pupil size in the mapping task predicted the decision of where to move the eyes in the free choice task. In this study, the authors provide a number of additional insightful analyses (e.g., based on saccade curvature, and saccade latency) and experiments that further support their claim that the decision to move the eyes is influenced by the effort to move the eyes in a particular direction. One experiment showed that the same influence of assumed saccade costs on saccade selection is observed during visual search in natural scenes. Moreover, increasing the cognitive load by adding an auditory counting task reduced the number of saccades, and in particular reduced the costly saccades. In sum, these experiments form a nice package that convincingly establishes the association between pupil size and saccade selection.

      We thank the reviewer for highlighting the novelty and cogency of our findings.

      In my opinion, the causal structure underlying the observed results is not so clear. While the relationship between pupil size and saccade selection is compelling, it is not clear that saccade-related effort (i.e., the cost of a saccade) really drives saccade selection. Given the correlational nature of this relationship, there are other alternatives that could explain the finding. For example, saccade latency and the variance in landing positions also vary across saccade directions. This can be interpreted for instance that there are variations in oculomotor noise across saccade directions, and maybe the oculomotor system seeks to minimize that noise in a free-choice task. In fact, given such a correlational result, many other alternative mechanisms are possible. While I think the authors' approach of systematically exploring what we can learn about saccade selection using pupil size is interesting, it would be important to know what exactly pupil size can add that was not previously known by simply analyzing saccade latency. For example, saccade latency anisotropies across saccade directions are well known, and the authors also show here that saccade costs are related to saccade latency. An important question would be to compare how pupil size and saccade latency uniquely contribute to saccade selection. That is, the authors could apply the exact same logic to their analysis by first determining how saccade latencies (or variations in saccade landing positions; see Greenwood et al., 2017 PNAS) vary across saccade directions and how this saccade latency map explains saccade selection in subsequent tasks. Is it more advantageous to use one or the other saccade metric, and how well does a saccade latency map correlate with a pupil size map?

      We thank the reviewer for the detailed comment. 1) The reviewer first points out the correlational nature of many of our results. 2) The reviewer then asks whether saccade latencies and landing precision also predict saccade selection, and whether these potential predictors could be considered alternative explanations to the idea of effort driving saccade selection. Moreover, what can pupil size add to what can be learned from saccade latency?

      In brief, although we report a combination of correlational and causal findings, we do not know of a more parsimonious explanation for our findings than “effort drives saccade selection”. Moreover, we demonstrate that oculomotor noise cannot be construed as an alternative explanation for our findings.

      (1) Correlational nature of many findings.

      We acknowledge that many of our findings are predominantly correlational in nature. In our first tasks, we correlated pupil size during saccade planning with saccade preferences in a subsequent task. Although the link across tasks was correlational, the observed relationship clearly followed our previously specified directed hypothesis. Moreover, experiments 1 and 2 of the visual search data replicated and extended this relationship. We also directly manipulated cognitive demand in the second visual search experiment. In line with the hypothesis that effort affects saccade selection, participants executed fewer saccades overall when performing a (primary) auditory dual task, and cut the costly saccades most, which constitutes causal evidence for our hypothesis. A minimal oculomotor noise account would not directly predict a reduction in saccade rate under higher cognitive demand. To summarize, we have a combination of correlational and causal findings, although mediators cannot be ruled out fully for the latter. That said, we do not know of a more fitting and parsimonious explanation for our findings than effort predicting saccade selection (see following points for saccade latencies). We now address causality in the discussion for transparency and point more explicitly to the second visual search experiment for causal evidence.

      “We report a combination of correlational and causal findings. Despite the correlational nature of some of our results, they consistently support the hypothesis that saccade costs predict saccade selection [which we predicted previously, 33]. Causal evidence was provided by the dual-task experiment, as saccade frequencies, and especially costly saccades, were reduced under additional cognitive demand. Only a cost account predicts 1) a link between pupil size and saccade preferences, 2) a cardinal saccade bias, 3) reduced saccade frequency under additional cognitive demand, and 4) disproportional cutting of especially those directions associated with more pupil dilation. Together, our findings converge upon the conclusion that effort drives saccade selection.”

      (2) Do anisotropies in saccade latencies constitute an alternative explanation?

      First, we would like to stress that differences in saccade latencies are indeed thought to reflect oculomotor effort (Shadmehr et al., 2019; TINS). For example, saccades with larger amplitudes and saccades where distractors need to be ignored are associated with longer latencies. Therefore, even if saccade latencies predicted saccade selection, this would not contradict the idea that effort drives saccade selection. Instead, this would provide convergent evidence for our main novel conclusion: effort drives saccade selection. There are several reasons why pupil size can be used as a more general marker of effort (see responses to R2), but ultimately, our conclusions do not hinge on the employed measure of effort per se. As stressed above in 1), we see no equally parsimonious explanation besides the cost account. Moreover, we predicted this relationship in our previous publication before running the currently reported experiments and analyses (Koevoet et al., 2023). That said, we are open to discussing further alternatives and look forward to testing these accounts against each other in future work. We welcome the reviewers’ (and readers’) suggestions.

      We now discuss this in the manuscript as follows:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows us to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost.

      Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      Second, we followed the reviewer’s recommendation in testing whether other oculomotor metrics would predict saccade selection. To this end, we conducted a linear regression across directions. We calculated pupil size, saccade latency, landing precision and peak velocity maps from the saccade planning task. We then used AIC-based backward model selection to determine which combination of factors best predicted saccade selection. The best model included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). Pupil size (b = -42.853, t = 4.791, p < .001) and saccade latency (b = -.377, t = 2.106, p = .043; see Author response image 1) predicted saccade preferences significantly. In contrast, landing precision did not reach significance (b = 23.631, t = 1.675, p = .104). This analysis shows that although saccade latency also predicts saccade preferences, pupil size remains a robust predictor of saccade selection. These findings demonstrate that minimizing oculomotor noise cannot fully explain the pattern of results.

      Author response image 1.

      The relationship between saccade latency (from the saccade planning task) and saccade preferences averaged across participants. Individual points reflect directions and shading represents bootstrapped 95% confidence intervals.
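
      For readers unfamiliar with bootstrapped confidence intervals around a regression fit, the following minimal sketch resamples directions with replacement and recomputes the regression slope each time. It is an illustration only: the data are synthetic, and the function names (`slope`, `bootstrap_slope_ci`), the percentile method and the number of resamples are assumptions, not the procedure used to produce the figure.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if max(bx) == min(bx):  # degenerate resample, slope undefined
            continue
        by = [ys[i] for i in idx]
        slopes.append(slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic stand-in for 36 directions (NOT the study's data).
rng = random.Random(0)
latency = [rng.uniform(180, 260) for _ in range(36)]                # ms
preference = [6.0 - 0.015 * l + rng.gauss(0, 0.3) for l in latency]  # %

estimate = slope(latency, preference)
ci_low, ci_high = bootstrap_slope_ci(latency, preference)
print(f"slope = {estimate:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

      Resampling whole (latency, preference) pairs keeps each direction's values together, which is the usual choice when the predictor is itself measured rather than fixed by design.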

      We have added this argument into the manuscript, and discuss the analysis in the discussion. Details of the analysis have been added to the Supporting Information for transparency and further detail.

      “A control analysis ruled out that the correlation between pupil size and saccade preferences was driven by other oculomotor metrics such as saccade latency and landing precision (see Supporting Information).”

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”
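
      The AIC-based backward selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the data are synthetic, the predictors are standardized random values, and the helper names (`ols_rss`, `aic`, `backward_select`) are hypothetical. The AIC is computed for a Gaussian linear model as n·ln(RSS/n) + 2k, which is sufficient for comparing models fitted to the same data.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit (rows of X include the intercept)."""
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for j in range(col, k + 1):
                M[r][j] -= f * M[col][j]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, yi in zip(X, y))

def aic(data, y, names):
    """AIC of the linear model y ~ 1 + names (up to an additive constant)."""
    n = len(y)
    X = [[1.0] + [data[name][i] for name in names] for i in range(n)]
    k = len(names) + 2  # slopes + intercept + error variance
    return n * math.log(ols_rss(X, y) / n) + 2 * k

def backward_select(data, y):
    """Start from the full model; repeatedly drop the predictor whose removal lowers AIC most."""
    current = list(data)
    best = aic(data, y, current)
    while current:
        scores = {name: aic(data, y, [m for m in current if m != name]) for name in current}
        drop = min(scores, key=scores.get)
        if scores[drop] >= best:  # no removal improves AIC: stop
            break
        best = scores[drop]
        current.remove(drop)
    return current, best

# Synthetic example with 36 directions (NOT the study's data).
rng = random.Random(42)
names = ("pupil_size", "latency", "landing_precision", "peak_velocity")
data = {name: [rng.gauss(0, 1) for _ in range(36)] for name in names}
y = [-2.0 * p - 0.8 * l + rng.gauss(0, 0.5)
     for p, l in zip(data["pupil_size"], data["latency"])]

selected, best_aic = backward_select(data, y)
print("retained predictors:", selected)
```

      Note that AIC can retain a predictor that does not reach significance on its own, which is consistent with landing precision remaining in the selected model above while its coefficient was not significant.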

      In addition to eye-movement-related anisotropies across the visual field, there are of course many studies reporting visual field anisotropies (see Himmelberg, Winawer & Carrasco, 2023, Trends in Neuroscience for a review). It would be interesting to understand how the authors think about visual field anisotropies in the context of their own study. Do they think that their results are (in)dependent on such visual field variations (see Greenwood et al., 2017, PNAS; Ohl, Kroell, & Rolfs, 2024, JEP:Gen for a similar discussion)?

      We agree that established visual field anisotropies are interesting to discuss in the context of our own results. At the reviewer’s suggestion, we have now expanded this discussion.

      The observed anisotropies in terms of saccade costs are likely related to established anisotropies in perception and early visual cortex. However, the exact way that these anisotropies may be linked remains elusive (i.e. what is cause, what is effect, are links causal?), and more research is necessary to understand how these are related.

      “The observed differences in saccade costs across directions could be linked to established anisotropies in perception [80–86], attention [87–92], saccade characteristics [87, 88, 92, 93], and (early) visual cortex [94–98] [also see 99]. For example, downward saccades are more costly than upward saccades, which mimics a similar asymmetry in early visual areas wherein the upper visual field is relatively underrepresented [94–98]; similarly stronger presaccadic benefits are found for down- compared with upward saccades [87, 88]. Moreover, upward saccades are more precise than downward saccades [93]. Future work should elucidate where saccade cost or the aforementioned anisotropies originate from and how they are related - something that pupil size alone cannot address.”

      We also added that the finding that more precise saccades are coupled with worse performance in a crowding task might be attributed to the increased effort associated with more precise saccades (Greenwood et al., 2017).

      “Adaptive resource allocation from, and to the oculomotor system parsimoniously explains a number of empirical observations. For example, higher cognitive demand is accompanied by smooth pursuits deviating more from to-be tracked targets [137], reduced (micro)saccade frequencies [Figure 4; 63, 64, 138, 139], and slower peak saccade velocities [140–142]. Relatedly, more precise saccades are accompanied with worse performance in a crowding task [93].”

      Finally, the authors conclude that their results "suggests that the eye-movement system and other cognitive operations consume similar resources that are flexibly allocated among each other as cognitive demand changes". The authors should speculate on what these similar resources could be. What are the specific operations of the auditory task that overlap in terms of resources with the eye movement system?

      We agree that the nature of joint resources is an interesting question. Our previous discussion was likely too simplistic here (see also responses to R3). We here specifically refer to the cognitive resources that one can flexibly distribute between tasks.

      Our data do not directly speak to the question of what the shared resources between the auditory and oculomotor tasks are. Nevertheless, both tasks tax working memory: saccade targets are mandatorily encoded into working memory prior to saccade onset (Van der Stigchel & Hollingworth, 2018), and the counting task clearly engages working memory. This may indicate some domain-generality between visual and auditory working memory during natural viewing (see Nozari & Martin, 2024 for a recent review), but this remains speculative. Another possibility is that it is not the working memory encoding associated with saccades per se, but the execution of overt motor actions itself that requires cognitive processing, as suggested by Beatty (1982): “the organization of an overt motor act places additional demands on information-processing resources that are reflected in the task-evoked pupillary response”.

      We have expanded on this in more detail in the results and discussion sections.

      “Besides the costs of increased neural activity when exerting more effort, effort should be considered costly for a second reason: Cognitive resources are limited. Therefore, any unnecessary resource expenditure reduces cognitive and behavioral flexibility [22, 31, 36, 116]. As a result, the brain needs to distribute resources between cognitive operations and the oculomotor system. We found evidence for the idea that such resource distribution is adaptive to the general level of cognitive demand and available resources: Increasing cognitive demand through an additional primary auditory dual task led to a lower saccade frequency, and especially costly saccades were cut. In this case, it is important to consider that the auditory task was the primary task, which should cause participants to distribute resources from the oculomotor system to the counting task. In other situations, more resources could be distributed to the oculomotor system instead, for example to discover new sources of reward [22, 136]. Adaptive resource allocation from, and to the oculomotor system parsimoniously explains a number of empirical observations. For example, higher cognitive demand is accompanied by smooth pursuits deviating more from to-be tracked targets [137], reduced (micro)saccade frequencies [Figure 4; 63, 64, 138, 139], and slower peak saccade velocities [140–142]. Relatedly, more precise saccades are accompanied with worse performance in a crowding task [93]. Furthermore, it has been proposed that saccade costs are weighed against other cognitive operations such as using working memory [33, 143–146]. How would the resources between the oculomotor system and cognitive tasks (like the auditory counting task) be related? One possibility is that both consume from limited working memory resources [147, 148]. Saccades are thought to encode target objects in a mandatory fashion into (visual) working memory [79], and the counting task requires participants to keep track of the auditory stream and maintain count of the instructed digit in working memory. However, the exact nature of which resources overlap between tasks remains open for future investigation [also see 149]. Together, we propose that cognitive resources are flexibly (dis)allocated to and from the oculomotor system based on the current demands to establish an optimal balance between performance and cost minimization.”

      Reviewer #2 (Public Review):

      The authors attempt to establish presaccadic pupil size as an index of 'saccade effort' and propose this index as one new predictor of saccade target selection. They only partially achieved their aim: When choosing between two saccade directions, the less costly direction, according to preceding pupil size, is preferred. However, the claim that with increased cognitive demand participants would especially cut costly directions is not supported by the data. I would have expected to see a negative correlation between saccade effort and saccade direction 'change' under increased load. Yet participants mostly cut upwards saccades, but not other directions that, according to pupil size, are equally or even more costly (e.g. oblique saccades).

      Strengths:

      The paper is well-written, easy to understand, and nicely illustrated.

      The sample size seems appropriate, and the data were collected and analyzed using solid and validated methodology.

      Overall, I find the topic of investigating factors that drive saccade choices highly interesting and relevant.

      We thank the reviewer for pointing out the strengths of our paper.

      Weaknesses:

      The authors obtain pupil size and saccade preference measures in two separate tasks. Relating these two measures is problematic because the computations that underlie saccade preparation differ. In Experiment 1, the saccade is cued centrally, and has to be delayed until a "go-signal" is presented; in Experiment 2, an immediate saccade is executed to an exogenously cued peripheral target. The 'costs' in Experiment 1 (computing the saccade target location from a central cue; withholding the saccade) do not relate to Experiment 2. It is unfortunate that measuring presaccadic pupil size directly in the comparatively more 'natural' Experiment 2 (where saccades did not have to be artificially withheld) does not seem to be possible. This questions the practical application of pupil size as an index of saccade effort.

      This is an important point raised by the reviewer and we agree that a discussion on these points improves the manuscript. We reply in two parts: 1) Although the underlying computations during saccade preparation might differ, and are therefore unlikely to be fully similar (we agree), we can still predict saccade selection between (Saccade planning to Saccade preference) and within tasks (Visual search). 2) Pupil size is a sluggish physiological signal, but this is outweighed by the advantages of using pupil size as a general marker of effort, also in the context of visual selection compared with saccade latencies.

      (1) Are delayed saccades (cost task) and the much faster saccades (preference task) linked?

      As the reviewer notes, the underlying ‘type’ of oculomotor program may differ between voluntarily delayed saccades and those in the saccade preference task. There are, however, also considerable overlaps between the oculomotor programs, as the directions and amplitudes are identical. Moreover, the different types of saccades have considerable overlap in their underlying neural circuitry. Nevertheless, the underlying oculomotor programs likely still differ in some regard. Despite these differences, we were able to measure differences across directions in both tasks, and costs and preferences were negatively and highly correlated between tasks. The finding itself therefore indicates that the costs of saccades measured during the saccade planning task generalize to those in the saccade preference task. Note also that we predicted this finding and idea already in a previous publication before starting the present study (Koevoet et al., 2023).

      We now address this interesting point in the discussion as follows:

      “We observed that affordable saccades were preferred over costly ones. This is especially remarkable given that the delayed saccades in the planning task likely differ in their oculomotor program from the immediate saccades in the preference task in some regard.”

      (2) Is pupil size a sensible measure of saccade effort?

      As the reviewer points out, the pupillary signal is indeed relatively sluggish, and therefore relatively slow and more artificial tasks are preferred to quantify saccade costs. This does not preclude pupil size from being applied in more natural settings, as we demonstrate in the search experiments, but a lot of care has to be taken to control for many possible confounding factors and many trials will be needed.

      That said, as saccade latencies may also capture differences in oculomotor effort (Shadmehr et al., 2019) they are a possible alternative option to assess effort in some oculomotor tasks (see below on why saccade latencies do not provide evidence for an alternative to effort driving saccade selection, but converging evidence). Whilst we do maintain that pupil size is an established and versatile physiological marker of effort, saccade latencies provide converging evidence for our conclusion that effort drives saccade selection.

      As for the saccade preference task, we are not able to analyze the data in a similar manner as in the visual search task for two reasons. First, the number of saccades is much lower than in the natural search experiments. Second, in the saccade preference task, there were always two possible saccade targets. Therefore, even if we were able to isolate an effort signal, this signal could index a multitude of factors such as deciding between two possible saccade targets. Even simple binary decisions go hand in hand with reliable pupil dilations as they require effort (e.g. de Gee et al., 2014).

      There are three major reasons why pupil size is a more versatile marker of saccade costs than saccade latencies (although as mentioned, latencies may constitute another valuable tool to study oculomotor effort). First, pupil size is able to quantify the cost of attentional shifts more generally, including covert attention as well as other effector systems such as head and hand movements. This circumvents the issue of different latencies of different effector systems and also makes it possible to study attentional processes that are not associated with overt motor movements. Second, saccade latencies are difficult to interpret in natural viewing data, as fixation duration and saccade latencies are inherently confounded by one another. This makes it very difficult to separate oculomotor processes and the extraction of perceptual information from a fixated target. Thus, pupil size is a versatile marker of attentional costs in a variety of settings, and can measure costs that saccade latencies cannot (i.e. covert attention). Lastly, pupil size is highly established as a marker of effort, which has been demonstrated across a wide range of cognitive tasks, and is therefore not bound to eye movements alone (Bumke, 1911; Koevoet et al., 2024; Laeng et al., 2012; Loewenfeld, 1958; Mathôt, 2018; Robison & Unsworth, 2019; Sirois & Brisson, 2014; Strauch et al., 2022; van der Wel & van Steenbergen, 2018).

      We now discuss this as follows:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows us to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      The authors claim that the observed direction-specific 'saccade costs' obtained in Experiment 1 "were not mediated by differences in saccade properties, such as duration, amplitude, peak velocity, and landing precision (Figure 1e,f)". Saccade latency, however, was not taken into account here but is discussed for Experiment 2.

      The final model that was used to test for the observed anisotropies in pupil size across directions indeed did not include saccade latencies as a predictor. However, we did originally consider saccade latency as a potential predictor. Because we performed AIC-based backward model selection, this predictor was removed owing to its marginal predictive contribution beyond the other predictors explaining pupil size.

      For completeness, we here report the outcome of a linear mixed-effects model that does include saccade latency as a predictor. Here, saccade latencies did not predict pupil size (b = 1.859e-03, t = .138, p = .889). The asymmetry effects remained qualitatively unchanged: preparing oblique compared with cardinal saccades resulted in a larger pupil size (b = 7.635, t = 3.969, p < .001), and preparing downward compared with upward saccades also led to a larger pupil size (b = 3.344, t = 3.334, p = .003).

      The apparent similarity of saccade latencies and pupil size, however, is striking. Previous work shows shorter latencies for cardinal than oblique saccades, and shorter latencies for horizontal and upward saccades than downward saccades - directly reflecting the pupil sizes obtained in Experiment 1 as well as in the authors' previous study (Koevoet et al., 2023, PsychScience).

      As the reviewer notes, there are substantial asymmetries across the visual field in saccade latencies. These asymmetries in saccade latency could also predict saccade preferences. We will reply to this in three points: 1) even if saccade latency is a predictor of saccade preferences, this would not constitute an alternative explanation to the conclusion of effort driving saccade selection, 2) saccade latencies show an up-down asymmetry but oblique-cardinal effects in latency may not be generalizable across saccade tasks, 3) pupil size remains a robust predictor of saccade preferences even when saccade latencies are considered as a predictor of saccade preferences.

      (1) We first want to stress that saccade latencies are thought to reflect oculomotor effort (Shadmehr et al., 2019). For example, saccades with larger amplitudes and saccades where distractors need to be ignored are associated with longer latencies. Therefore, even if saccade latencies predict saccade selection, this would not contradict the idea that effort drives saccade selection. Instead, this would provide convergent evidence for our main conclusion: effort predicting saccade selection (rather than pupil size predicting saccade selection per se).

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows us to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      (2) We first tested anisotropies in saccade latency in the saccade planning task (Wilkinson notation: latency ~ obliqueness + updownness + leftrightness + saccade duration + saccade amplitude + saccade velocity + landing error + (1 + obliqueness + updownness | participant)). We found upward latencies to be shorter than downward saccade latencies (b = -.535, t = 3.421, p = .003). In addition, oblique saccades showed shorter latencies than cardinal saccades (b = -1.083, t = 3.096, p = .002), the opposite of what previous work has demonstrated.
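
      To make the direction predictors in the formula above concrete, the following sketch shows one plausible continuous coding of a saccade direction. The coding itself is an assumption for illustration (the response does not specify how obliqueness, updownness and leftrightness were computed), as are the function name `direction_predictors` and the convention that 0° points right and 90° points up.

```python
import math

def direction_predictors(angle_deg):
    """Hypothetical coding of a saccade direction (assumed: 0 deg = right, 90 deg = up)."""
    theta = math.radians(angle_deg)
    return {
        "obliqueness": abs(math.sin(2 * theta)),  # 0 on cardinal axes, 1 on diagonals
        "updownness": math.sin(theta),            # +1 straight up, -1 straight down
        "leftrightness": math.cos(theta),         # +1 rightward, -1 leftward
    }

# 36 directions in 10-degree steps, matching the number of directions in the maps.
table = {angle: direction_predictors(angle) for angle in range(0, 360, 10)}

for angle in (0, 40, 90, 270):
    row = table[angle]
    print(angle, {name: round(value, 3) for name, value in row.items()})
```

      Under this assumed coding, a negative coefficient for updownness means the outcome is smaller for upward than for downward saccades, and a negative coefficient for obliqueness means it is smaller for oblique than for cardinal saccades.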

      We then also tested these latency anisotropies in another dataset wherein participants (n = 20) saccaded toward a single peripheral target as fast as possible (Koevoet et al., submitted; same amplitude and eccentricity as in the present manuscript). There we did not find a difference in saccade latency between cardinal and oblique targets, but we did observe shorter latencies for up- compared with downward saccades. We are therefore not sure in which situations oblique saccades do, or do not, differ from cardinal saccades in terms of latency, or even in which direction the effect occurs.

      In contrast, we have now demonstrated a larger pupil size prior to oblique compared with cardinal saccades in two experiments. This indicates that pupil size may be a more reliable and generalizable marker of saccade costs than saccade latency. However, this remains to be investigated further.

      (3) To gain further insights into which oculomotor metrics would predict saccade selection, we conducted a linear regression across directions. We created pupil size, saccade latency, landing precision and peak velocity maps from the saccade planning task. We then used AIC-based model selection to identify the ‘best’ model, i.e. the set of factors that best predicts saccade selection. The selected model included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). Pupil size (b = -42.853, t = 4.791, p < .001) and saccade latency (b = -.377, t = 2.106, p = .043) predicted saccade preferences significantly. In contrast, landing precision did not reach significance (b = 23.631, t = 1.675, p = .104). This analysis shows that although saccade latency predicts saccade preferences, pupil size remains a robust predictor of saccade selection.

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”
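      The AIC-based backward selection described above can be illustrated with a minimal sketch. This is not our analysis code: the data below are synthetic, the function names (`ols_aic`, `backward_select`) are hypothetical, and the AIC is computed for an ordinary least-squares fit up to an additive constant. The sketch only shows the general procedure of dropping one predictor at a time for as long as doing so lowers the AIC.

```python
import numpy as np


def ols_aic(X, y):
    """AIC (up to an additive constant) of a Gaussian OLS fit.

    X must already contain an intercept column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k


def backward_select(predictors, y):
    """Drop one predictor at a time while doing so lowers the AIC."""
    kept = list(predictors)
    n = len(y)

    def aic_of(names):
        cols = [np.ones(n)] + [predictors[name] for name in names]
        return ols_aic(np.column_stack(cols), y)

    best = aic_of(kept)
    while kept:
        # AIC of every model that omits exactly one remaining predictor
        trials = {name: aic_of([p for p in kept if p != name]) for name in kept}
        drop = min(trials, key=trials.get)
        if trials[drop] >= best:
            break  # no single removal improves the model
        kept.remove(drop)
        best = trials[drop]
    return kept, best


# Synthetic, purely illustrative maps over 36 directions (NOT the study data).
rng = np.random.default_rng(0)
n_directions = 36
maps = {
    "pupil_size": rng.normal(size=n_directions),
    "latency": rng.normal(size=n_directions),
    "landing_precision": rng.normal(size=n_directions),
    "peak_velocity": rng.normal(size=n_directions),
}
# Assumed ground truth for the toy example: preferences depend on two maps.
preference = (-2.0 * maps["pupil_size"] - 1.0 * maps["latency"]
              + rng.normal(scale=0.3, size=n_directions))

selected, aic = backward_select(maps, preference)
print(selected, round(aic, 2))
```

      In this toy example the two predictors that generate the synthetic preferences are retained, because removing either would raise the AIC sharply; whether the pure-noise predictors are dropped depends on the random draw.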

      The authors state that "from a costs-perspective, it should be efficient to not only adjust the number of saccades (non-specific), but also by cutting especially expensive directions the most (specific)". However, saccade targets should be selected based on the maximum expected information gain. If cognitive load increases (due to an additional task) an effective strategy seems to be to perform less - but still meaningful - saccades. How would it help natural orienting to selectively cut saccades in certain (effortful) directions? Choosing saccade targets based on comfort, over information gain, would result in overall more saccades to be made - which is non-optimal, also from a cost perspective.

      We thank the reviewer for this comment. Although we do not fully agree, the logic is quite close to our rationale and it is worth adding a point of discussion here. A vital part of the current interpretation is the instruction given to participants. In our second natural visual search task, participants performed a dual task in which the auditory task was primary and the search task secondary. Participants are therefore likely to allocate their resources to optimize performance on the primary task, at the expense of the secondary task. As a result, fewer resources are available for search in the dual than in the single task, because these resources are needed for the auditory task. Cutting expensive directions does not help search performance, but it does reduce the cost of search, so that more resources are available for the prioritized auditory task. Also note that the search task was rather difficult: participants did it, but it was tough (see the original description of the dataset for more details), which provides another reason to go all in on the auditory task at the expense of the visual task. This, however, opens up a nice point of discussion: if one were to emphasize the importance of search (maybe with punishment or reward), we would indeed expect participants to perform whichever eye movements get them to their goal fastest, thus reducing the relative influence of costs on saccade behavior. This remains to be tested, however; we are working on this and look forward to discussing such findings in the future.

      Together, we propose that there is a trade-off between distributing resources either towards cognitive tasks or the oculomotor system (also see Ballard et al., 1995; Van der Stigchel, 2020). How these resources are distributed depends highly on the current task demands (also see Sahakian et al., 2023). This allows for adaptive behavior in a wide range of contexts.

      We now added these considerations to the manuscript as follows (also see our previous replies):

      “Do cognitive operations and eye movements consume from a similar pool of resources [44]? If so, increasing cognitive demand for non-oculomotor processes should result in decreasing available resources for the oculomotor system. In line with this idea, previous work indeed shows altered eye-movement behavior under effort as induced by dual tasks, for example by making less saccades under increased cognitive demand [62–64]. We therefore investigated whether less saccades were made as soon as participants had to count the occurrence of a specific digit in the auditory number stream in comparison to ignoring the stream (in Exp. 2; Figure 4a). Participants were instructed to prioritize the auditory digit-counting task over finding the visual search target. Therefore, resources should be shifted from the oculomotor system to the primary auditory counting task. The additional cognitive demand of the dual task indeed led to a decreased saccade frequency (t(24) = 7.224, p < .001, Cohen’s d = 1.445; Figure 4h).”

      I would have expected to see a negative correlation between saccade effort and saccade direction 'change' under increased load. Yet participants mostly cut upwards saccades, but not other directions that, according to pupil size, are equally or even more costly (e.g. oblique saccades).

      The reviewer’s point is taken from the initial comment, which we will address here. First, we’d like to point out that it is not established that saccade costs in different directions are always the same. Instead, it is possible that saccade costs differ between natural viewing and our delayed-saccade task. Therefore, we used pupil size during natural viewing for the search experiments. Second, the reviewer correctly notes that oblique saccades are hardly cut under additional cognitive demand. However, participants already hardly execute oblique saccades when not confronted with the additional auditory task (Figure 4b, d), making it difficult to reduce those further (i.e. a floor effect). Participants chose to cut vertical saccades, possibly because these are more costly than horizontal saccades.

      We incorporated these points in our manuscript as follows:

      “To test this, we analyzed data from two existing datasets [63] wherein participants (total n = 41) searched for small targets (’Z’ or ’H’) in natural scenes (Figure 4a; [64]). Again, we tested whether pupil size prior to saccades negatively linked with saccade preferences across directions. Because saccade costs and preferences across directions could differ for different situations (i.e. natural viewing vs. saccade preference task), but should always be negatively linked, we established both cost and preferences independently in each dataset.”

      “We calculated a saccade-adjustment map (Figure 4g) by subtracting the saccade preference map in the single task (Figure 4f) from the dual task map (Figure 4d). Participants seemingly cut vertical saccades in particular, and made more saccades to the top right direction. This pattern may have emerged as vertical saccades are more costly than horizontal saccades (also see Figure 1d). Oblique saccades may not have been cut because there were very few oblique saccades in the single condition to begin with (Figure 4d), making it difficult to observe a further reduction of such saccades under additional cognitive demand (i.e. a floor effect).”
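      The subtraction behind the saccade-adjustment map can be sketched as follows. This is an illustrative sketch only, under several assumptions that are not taken from the manuscript: saccade directions are given in degrees, they are binned into 36 bins of 10° centred on 0°, 10°, 20° and so on, each preference map is expressed as the percentage of saccades per bin, and the data are synthetic. The function name `preference_map` is hypothetical.

```python
import numpy as np


def preference_map(directions_deg, n_bins=36):
    """Percentage of saccades falling into each direction bin."""
    directions = np.asarray(directions_deg, dtype=float) % 360.0
    width = 360.0 / n_bins
    # Centre bins on 0, 10, 20, ... degrees so that cardinal directions
    # are not split across two neighbouring bins.
    bins = np.floor((directions + width / 2) / width).astype(int) % n_bins
    counts = np.bincount(bins, minlength=n_bins)
    return 100.0 * counts / counts.sum()


# Synthetic saccade directions (NOT the study data); fewer saccades are
# simulated in the dual task, but each map is normalised to 100%.
rng = np.random.default_rng(1)
single_task = rng.uniform(0, 360, size=5000)
dual_task = rng.uniform(0, 360, size=3000)

# Adjustment map: dual-task preferences minus single-task preferences.
adjustment = preference_map(dual_task) - preference_map(single_task)
print(adjustment.round(2))
```

      Because both maps are normalised, the adjustment map sums to zero: negative entries mark directions that were cut relatively more under the dual task, and positive entries mark directions that gained in relative preference.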

      Overall, I am not sure what practical relevance the relation between pupil size (measured in a separate experiment) and saccade decisions has for eye movement research/vision science. Pupil size does not seem to be a straightforward measure of saccade effort. Saccade latency, instead, can be easily extracted in any eye movement experiment (no need to conduct a separate, delayed saccade task to measure pupil dilation), and seems to be an equally good index.

      There are two points here.

      (1) What is the practical relevance of a link between effort and saccade selection for eye-movement research and vision science?

      We see plenty: think of changing eye movement patterns under effort (be it smooth pursuits, saccade rates, distributions of gaze positions to images etc.), which have substantial implications for human factors research, but also neuropsychology. With a cost account, one may predict (rather than just observe) how eye movements change as soon as resources are reduced or non-visual demand increases. A cost account can also parsimoniously explain effects (e.g. lower saccade rates under effort, cardinal bias, perhaps also central bias) that cannot be explained by what have so far been referred to as the three core drivers of eye movement behavior (saliency, selection history, goals; e.g., Awh et al., 2012). Conversely, one must wonder why eye-movement research/vision science simply accepts/dismisses these phenomena as such, without seeking overarching explanations.

      (2) What is the usefulness of using pupil size to measure effort?

      We hope that our replies to the comments above illustrate why pupil size is a sensible, robust and versatile marker of attentional costs. We briefly summarize our most important points here.

      - Pupil size is an established measure of effort irrespective of context, as demonstrated by hundreds of original works (e.g. working memory load, multiple object tracking, individual differences in cognitive ability). This allows pupil size to be a versatile marker of the effort, and therefore costs, of non-saccadic attentional shifts such as covert attention or those realized by other effector systems (i.e. head or hand movements).

      - Our new analysis indicates that pupil size remains a strong and robust predictor of saccade preference, even when considering saccade latency.

      - Pupil size allows to study saccade costs in natural viewing. In contrast, saccade latencies are difficult to assess in natural viewing as fixation durations and saccade latencies are intrinsically linked and very difficult to disentangle.

      - Note, however, that we think it is interesting and useful to study effects of effort/cost on eye movement behavior, whichever index is used to do so. We see plenty of potential in this line of research, and this paper is a starting point.

      Reviewer #3 (Public Review):

      This manuscript extends previous research by this group by relating variation in pupil size to the endpoints of saccades produced by human participants under various conditions including trial-based choices between pairs of spots and search for small items in natural scenes. Based on the premise that pupil size is a reliable proxy of "effort", the authors conclude that less costly saccade targets are preferred. Finding that this preference was influenced by the performance of a non-visual, attention-demanding task, the authors conclude that a common source of effort animates gaze behavior and other cognitive tasks.

      Strengths:

      Strengths of the manuscript include the novelty of the approach, the clarity of the findings, and the community interest in the problem.

      We thank the reviewer for pointing out the strengths of our paper.

      Weaknesses:

      Enthusiasm for this manuscript is reduced by the following weaknesses:

      (1) A relationship between pupil size and saccade production seems clear based on the authors' previous and current work. What is at issue is the interpretation. The authors test one, preferred hypothesis, and the narrative of the manuscript treats the hypothesis that pupil size is a proxy of effort as beyond dispute or question. The stated elements of their argument seem to go like this:

      PROPOSITION 1: Pupil size varies systematically across task conditions, being larger when tasks are more demanding.

      PROPOSITION 2: Pupil size is related to the locus coeruleus.

      PROPOSITION 3: The locus coeruleus NE system modulates neural activity and interactions.

      CONCLUSION: Therefore, pupil size indexes the resource demand or "effort" associated with task conditions.

      How the conclusion follows from the propositions is not self-evident. Proposition 3, in particular, fails to establish the link that is supposed to lead to the conclusion.

      We inadvertently laid out this rationale as described above, and we thank the reviewer for pointing out this initial suboptimal structure of argumentation. The notion that the link between pupil size and effort is established in the literature because of its neural underpinnings is inaccurate. Instead, the tight link between effort and pupil size is established based on covariations of pupil diameter and cognition across a wide variety of tasks and domains. In line with this, we now introduce this tight link predominantly based on the relationships between pupil size and cognition instead of focusing on putative neural correlates of this relationship.

      As reviewed previously (Beatty, 1982; Bumke, 1911; Kahneman, 1973; Kahneman & Beatty, 1966; Koevoet et al., 2024; Laeng et al., 2012; Mathôt, 2018; Sirois & Brisson, 2014; Strauch et al., 2022; van der Wel & van Steenbergen, 2018), any increase in effort is consistently associated with an increase in pupil size. For instance, the pupil dilates when increasing load in working memory or multiple object tracking tasks, and such pupillary effects robustly explain individual differences in cognitive ability and fluctuations in performance across trials (Alnæs et al., 2014; Koevoet et al., 2024; Robison & Brewer, 2020; Robison & Unsworth, 2019; Unsworth & Miller, 2021). This extends to the planning of movements as pupil dilations are observed prior to the execution of (eye) movements (Koevoet et al., 2023; Richer & Beatty, 1985). The link between pupil size and effort has thus been firmly established for a long time, irrespective of the neural correlates of these effort-linked pupil size changes.

      We again thank the reviewer for spotting this logical mistake, and now revised the paragraph where we introduce pupil size as an established marker of effort as follows:

      “We recently demonstrated that the effort of saccade planning can be measured with pupil size, which allows for a physiological quantification of saccade costs as long as low-level visual factors are controlled for [33]. Pupil size is an established marker of effort [36–44]. For instance, loading more in working memory or tracking more objects results in stronger pupil dilation [44–52]. Pupil size not only reflects cognitive (or mental) effort but also the effort of planning and executing movements [37, 53, 54]. We leveraged this to demonstrate that saccade costs can be captured with pupil size, and are higher for oblique compared with cardinal directions [33]. Here, we addressed whether saccade costs predict where to saccade.”

      We now mention the neural correlates of pupil size only in the discussion, where we took care to also mention roles for other neurotransmitter systems:

      “Throughout this paper, we have used cost in the limited context of saccades. However, cost-based decision-making may be a more general property of the brain [31, 36, 114–116]. Every action, be it physical or cognitive, is associated with an intrinsic cost, and pupil size is likely a general marker of this [44]. Note, however, that pupil dilation does not always reflect cost, as the pupil dilates in response to many sensory and cognitive factors which should be controlled for, or at least considered, when interpreting pupillometric data [e.g., see 39, 40, 42, 117]. Effort-linked pupil dilations are thought to be, at least in part, driven by activity in the brainstem locus coeruleus (LC) [40, 118–120] [but other neurotransmitters also affect pupil size, e.g. 121, 122]. Activity in LC with its widespread connections throughout the brain [120, 123–127] is considered to be crucial for the communication within and between neural populations and modulates global neural gain [128–132]. Neural firing is costly [22, 133], and therefore LC activity and pupil size are (neuro)physiologically plausible markers of cost [40]. Tentative evidence even suggests that continued exertion of effort (accompanied by altered pupil dilation) is linked to the accumulation of glutamate in the lateral prefrontal cortex [134], which may be a metabolic marker of cost [also see 116, 134, 135].”

      (2) The authors test one, preferred hypothesis and do not consider plausible alternatives. Is "cost" the only conceivable hypothesis? The hypothesis is framed in very narrow terms. For example, the cholinergic and dopamine systems that have been featured in other researchers' consideration of pupil size modulation are missing here. Thus, because the authors do not rule out plausible alternative hypotheses, the logical structure of this manuscript can be criticized as committing the fallacy of affirming the consequent.

      As we have noted in the response to the reviewer’s first point, we did not motivate our use of pupil size as an index of effort clearly enough. For the current purpose, the neural correlates of pupil size are less relevant than the cognitive correlates (see previous point). We reiterate that the neuromodulatory underpinnings of the observed pupil size effects (which indeed possibly include effects of the cholinergic, dopaminergic and serotonergic systems), while interesting for the discussion on the neural origin of effects, are not crucial to our conclusion. We hope the new rationale (without focusing too much on the (irrelevant) exact neural underpinnings) convinces the reviewer and reader.

      Our changes to the manuscript are shown in our reply to the previous comment.

      The reviewer notes that other plausible alternative hypotheses could explain the currently reported results. However, we did not find a more parsimonious explanation for our data than ‘Effort Drives Saccade Selection’. Effort explains why participants prefer saccading toward specific directions in (1) highly controlled and (2) more natural settings. Note that we also predicted this effect previously (Koevoet et al., 2023). Moreover, this account explains (3) why participants make fewer saccades under additional cognitive demand, and (4) why especially costly saccades are reduced under additional cognitive demand. We are very open to the reviewer presenting other possible interpretations of our data, so that these can be discussed and put to the test in future work.

      (3) The authors cite particular publications in support of the claim that saccade selection is influenced by an assessment of effort. Given the extensive work by others on this general topic, the skeptic could regard the theoretical perspective of this manuscript as too impoverished. Their work may be enhanced by consideration of other work on this general topic, e.g, (i) Shenhav A, Botvinick MM, Cohen JD. (2013) The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron. 2013 Jul 24;79(2):217-40. (ii) Müller T, Husain M, Apps MAJ. (2022) Preferences for seeking effort or reward information bias the willingness to work. Sci Rep. 2022 Nov 14;12(1):19486. (iii) Bustamante LA, Oshinowo T, Lee JR, Tong E, Burton AR, Shenhav A, Cohen JD, Daw ND. (2023) Effort Foraging Task reveals a positive correlation between individual differences in the cost of cognitive and physical effort in humans. Proc Natl Acad Sci U S A. 2023 Dec 12;120(50):e2221510120.

      We thank the reviewer for pointing us toward this literature. These papers are indeed relevant for our manuscript, and we have now incorporated them. Specifically, we now discuss how the costs of effort are weighed in relation to possible rewards during decision-making. We have also incorporated work that has investigated how the biomechanical costs of arm movements contribute to action selection.

      “Our findings are in line with established effort-based models that assume costs to be weighed against rewards during decision-making [102–107]. In such studies, reward and cognitive/physical effort are often parametrically manipulated to assess how much effort participants are willing to exert to acquire a given (monetary) reward [e.g. 108, 109]. Whereas this line of work manipulated the extrinsic costs and/or rewards of decision options (e.g. perceptual consequences of saccades [110, 111] or consequences associated with decision options), we here focus on the intrinsic costs of the movement itself (in terms of cognitive and physical effort). Relatedly, the intrinsic costs of arm movements are also considered during decision-making: biomechanically affordable movements are generally preferred over more costly ones [26–28]. We here extend these findings in two important ways. First, until now, the intrinsic costs of saccades and other movements have been inferred from gaze behavior itself or by using computational modelling [23, 25–28, 34, 35, 112]. In contrast, we directly measured cost physiologically using pupil size. Secondly, we show that physiologically measured saccade costs predict where saccades are directed in a controlled binary preference task, and even during natural viewing. Our findings could unite state-of-the-art computational models [e.g. 23, 25, 34, 35, 113] with physiological data, to directly test the role of saccade costs and ultimately further our understanding of saccade selection.”

      (4) What is the source of cost in saccade production? What is the currency of that cost? The authors state (page 13), "... oblique saccades require more complex oculomotor programs than horizontal eye movements because more neuronal populations in the superior colliculus (SC) and frontal eye fields (FEF) [76-79], and more muscles are necessary to plan and execute the saccade [76, 80, 81]." This statement raises questions and concerns. First, the basis of the claim that more neurons in FEF and SC are needed for oblique versus cardinal saccades is not established in any of the publications cited. Second, the authors may be referring to the fact that oblique saccades require coordination between pontine and midbrain circuits. This must be clarified. Second, the cost is unlikely to originate in extraocular muscle fatigue because the muscle fibers are so different from skeletal muscles, being fundamentally less fatigable. Third, if net muscle contraction is the cost, then why are upward saccades, which require the eyelid, not more expensive than downward? Thus, just how some saccades are more effortful than others is not clear.

      Unfortunately, our current data do not allow for the specification of what the source is of differences in saccade production, nor what the currency is. We want to explicitly state that while pupil size is a sensitive measure of saccade costs, pupil size cannot directly inform what underlying mechanisms are causing differences in saccade costs across conditions (e.g. directions). Nevertheless, we do speculate about these issues because they are important to consider. We thank the reviewer for pointing out the shortcomings in our initial speculations.

      Broadly, we agree with the reviewer that a neural source of differences in costs between different types of saccades is more likely than a purely muscular account (also see Koevoet et al., 2023). Furthermore, we think that the observed differences in saccade costs for oblique vs. cardinal and up vs. down could be due to different underlying mechanisms. While we caution against overinterpreting single directions, tentative evidence for this may also be drawn from the different time courses of the effects for up/down versus cardinal/oblique (Figure 1c).

      Below we speculate about why some specific saccade directions may be more costly than others:

      Why would oblique saccades be more costly than cardinal saccades? We thank the reviewer for pointing out that oblique saccades additionally require coordination between pontine and midbrain circuits (Curthoys et al., 1984; King & Fuchs, 1979; Sparks, 2002). This point warrants a more thorough discussion than it received in our initial version. We have incorporated this as follows:

      “The complexity of an oculomotor program is arguably shaped by its neural underpinnings. For example, oblique but not cardinal saccades require communication between pontine and midbrain circuits [73–75]. Such differences in neural complexity may underlie the additional costs of oblique compared with cardinal saccades. Besides saccade direction, other properties of the ensuing saccade such as its speed, distance, curvature, and accuracy may contribute to a saccade’s total cost [22, 33, 53, 76, 77] but this remains to be investigated directly.”

      Why would downward saccades be more costly than upward saccades? As the reviewer points out, from a net muscular contraction account of cost, one would expect the opposite pattern due to the movement of the eyelid. Instead, we speculate that our findings may be associated with the well-established anisotropy in early visual cortex along the vertical meridian. Specifically, the upper vertical meridian is represented in substantially less detail than the lower vertical meridian (Himmelberg et al., 2023; Silva et al., 2018). Prior to a saccade, attention is deployed towards the intended saccadic endpoint (Deubel & Schneider, 1996; Kowler et al., 1995). Attention tunes neurons to preferentially process the attended location over non-attended locations. Because the lower visual field is represented in higher detail than the upper visual field, attention may tune neuronal responses differently when preparing up- compared with downward saccades (Hanning et al., 2024; Himmelberg et al., 2023). Thus, it may be more costly to prepare down- compared with upward saccades. This proposition, however, does not account for the lower costs associated with horizontal compared with up- and downward saccades, as the horizontal meridian is represented at a higher acuity than the vertical meridian. This makes it unlikely that this explains the pattern of results completely. Again, at this point we can only speculate why costs differ, yet we demonstrate that these differences in cost are decisive for oculomotor behavior. We now explicitly state the speculative nature of these ideas, which would all need to be tested directly.

      We have updated our discussion of this issue as follows:

      “The observed differences in saccade costs across directions could be linked to established anisotropies in perception [80–86], attention [87–92], saccade characteristics [87, 88, 92, 93], and (early) visual cortex [94–98] [also see 99]. For example, downward saccades are more costly than upward saccades, which mimics a similar asymmetry in early visual areas wherein the upper visual field is relatively underrepresented [94–98]; similarly stronger presaccadic benefits are found for down- compared with upward saccades [87, 88]. Moreover, upward saccades are more precise than downward saccades [93]. Future work should elucidate where saccade cost or the aforementioned anisotropies originate from and how they are related - something that pupil size alone cannot address.”

      (5) The authors do not consider observations about variation in pupil size that seem to be incompatible with the preferred hypothesis. For example, at least two studies have described systematically larger pupil dilation associated with faster relative to accurate performance in manual and saccade tasks (e.g., Naber M, Murphy P. Pupillometric investigation into the speed-accuracy trade-off in a visuo-motor aiming task. Psychophysiology. 2020 Mar;57(3):e13499; Reppert TR, Heitz RP, Schall JD. Neural mechanisms for executive control of speed-accuracy trade-off. Cell Rep. 2023 Nov 28;42(11):113422). Is the fast relative to the accurate option necessarily more costly?

      We thank the reviewer for this interesting point that we will answer in two ways. First, we discuss the main point: the link between pupil size, effort, and cost. Second, we discuss the findings described specifically in these two papers and how we interpret these from a pupillometric account.

      First, one may generally ask 1) whether any effort results in pupil dilation, 2) whether any effort is costly, and 3) whether this means that pupil dilation always reflects effort and cost, respectively. Indeed, it has been argued repeatedly, prominently, and independently (e.g., Bumke, 1911; Mathôt, 2018) that any change in effort (no matter the specific origin) is associated with an evoked pupil dilation. Effort, in turn, is consistently and widely experienced as aversive, both across tasks and cultures (David et al., 2024). Effort minimization may therefore be seen as a universal law of human cognition and behavior, with effort as a to-be-minimized cost (Shadmehr et al., 2019; Hull, 1943; Tsai, 1932). However, this does not imply that any pupil dilation necessarily reflects effort or that, as a consequence thereof, any pupil dilation always signals cost. For instance, the pupil dark response, the pupil far response and changes in baseline pupil size are not associated with effort. Baseline and task-evoked pupil dilation responses have to be interpreted differently (see below). Moreover, the pupil also changes (and dilates) due to other factors (see Strauch et al., 2022; Mathôt, 2018; Bumke, 1911; Loewenfeld, 1999 for reviews).

      Second, as for Naber & Murphy (2020) and Reppert et al. (2023) specifically: both studies indeed demonstrate a larger baseline pupil size when participants made faster, less accurate responses. However, baseline pupil size is not an index of effort per se, whereas task-evoked pupil dilation responses (as studied in the present manuscript) are (Strauch et al., 2022). For work on differences between baseline pupil diameter and task-evoked pupil responses, and their respective links with exploration and exploitation, please see Jepma & Nieuwenhuis (2011). Indeed, the link between effort and larger pupil size holds for task-evoked responses, but not baseline pupil size per se (also see Koevoet et al., 2023).

      Still, Naber (third author of the current paper) & Murphy (2020) also demonstrated larger task-evoked pupil dilation responses when participants were instructed to make faster, less accurate responses compared with making accurate and relatively slow responses. However, this difference in task-evoked response gains significance only after the onset of the movement itself, and peaks substantially later than response offset. Whilst pupil dilation may be sluggish, it isn’t extremely sluggish either. As feedback to the performance of the participant was displayed 1.25s after performing the movement and clicking (taking about 630ms), we deem it possible that this effect may in part result from appraising the feedback to the participant rather than the speed of the response itself (in fact, Naber and Murphy also discuss this option). In addition to not measuring saccades but mouse movements, it is therefore possible that the observed evoked pupil effects in Naber & Murphy (2020) are not purely linked to motor preparation and execution per se. Therefore, future work that aims to investigate the costs of movements should isolate the effects of feedback and other potential factors that may drive changes in pupil size. This will help clarify whether fast or more accurate movements could be linked to the underlying costs of the movements.

      Relatedly, we do not find evidence that pupil size during saccade planning predicts the onset latency of the ensuing saccade (please refer to our second response to Reviewer 2 for a detailed discussion).

      Together, we therefore do not see the results from Reppert et al. (2023) and Naber & Murphy (2020) to be at odds with our interpretation of evoked pupil size reflecting effort and cost in the context of planning saccades.

      We think that these are considerations important to the reader, which is why we now added them to the discussion as follows:

      “Throughout this paper, we have used cost in the limited context of saccades.

      However, cost-based decision-making may be a more general property of the brain [31, 36, 114–116]. Every action, be it physical or cognitive, is associated with an intrinsic cost, and pupil size is likely a general marker of this [44]. Note, however, that pupil dilation does not always reflect cost, as the pupil dilates in response to many sensory and cognitive factors which should be controlled for, or at least considered, when interpreting pupillometric data [e.g., see 39, 40, 42, 117].”

      (6) The authors draw conclusions based on trends across participants, but they should be more transparent about variation that contradicts these trends. In Figures 3 and 4 we see many participants producing behavior unlike most others. Who are they? Why do they look so different? Is it just noise, or do different participants adopt different policies?

      We disagree with the transparency point of the reviewer. Note that we deviated from the norm here by being more transparent than common: we added individual data points and relationships rather than showing pooled effects across participants with error bars alone (see Figures 2c, 3b,c, 4c,e,f).

      Moreover, our effects are consistent and stable across participants and are highly significant. To illustrate, for the classification analysis based on cost (Figure 2E) 16/20 participants showed an effect. As for the natural viewing experiments (total > 250,000 fixations), we also find that a majority of participants show the observed effects: Experiment 1: 15/16 participants; Experiment 2: 16/25 participants; Experiment 2 – adjustment: 22/25 participants.

      We fully agree that it is interesting to understand where interindividual variation may originate. We currently have too little data for robust analyses across individuals, or for zooming in on individual differences in cost maps, preference maps, or potential personalized strategies of saccade selection. That said, future work could study this further. For such work, we would recommend reducing the number of directions to gain more pupil size data per direction, and therefore cleaner signals that may be more informative at the individual level. With such stronger signals, studying (differences in) links at the individual level may be feasible and would be interesting to consider, and this will be a future direction in our own work too. Nonetheless, we again stress that the reported effects are robust and consistent across participants, and that interindividual differences are therefore not extensive. Moreover, our results from four experiments consistently support our conclusion that effort drives saccade selection.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      - Based on the public review, I would recommend that the authors carefully review and correct the manuscript with regard to the causal conclusions. The study is largely correlational (i.e. the pupil was only observed, not manipulated) and therefore does not allow causal conclusions to be drawn about the relationship between pupil size and saccade selection. These causal conclusions become even more confusing when pupil size is equated with effort and saccade cost. As a consequence, an actual correlation between pupil size and saccade selection has led to the title that effort drives saccade selection. It would also be helpful for the reader to summarize in an additional section of the discussion what they consider to be a causal or correlational link based on their results.

      We agree with the reviewer, and we now state explicitly and in detail which findings are correlational and which are causal. As outlined before, we do not see a more parsimonious explanation for our findings than the one in our title, but we fully agree that the paper benefits from making the correlational or causal nature of the evidence for this idea explicit.

      “We report a combination of correlational and causal findings. Despite the correlational nature of some of our results, they consistently support the hypothesis that saccade costs predict saccade selection [which we predicted previously, 33]. Causal evidence was provided by the dual-task experiment, as saccade frequencies - and especially costly saccades - were reduced under additional cognitive demand. Only a cost account predicts 1) a link between pupil size and saccade preferences, 2) a cardinal saccade bias, 3) reduced saccade frequency under additional cognitive demand, and 4) disproportional cutting of especially those directions associated with more pupil dilation. Together, our findings converge upon the conclusion that effort drives saccade selection.”

      - Can the authors please elaborate in more detail on how they transformed the predictors of their linear mixed model for the visualization in Figure 1f? It is difficult to see how the coefficients in the table and the figure match.

      We used the ‘effectsize’ package to compute effect sizes for each predictor of the linear mixed-effects model (https://cran.r-project.org/web/packages/effectsize/index.html). We report absolute effect sizes to make it visually easier to compare different predictors. These details have now been included in the Methods section to be more transparent about how these effect sizes were computed.

      “Absolute effect sizes (i.e. r) and their corresponding 95% confidence intervals for the linear mixed-effects models were calculated using t and df values with the ’effectsize’ package (v0.8.8) in R.”
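      The conversion described above can be illustrated with a minimal sketch. It assumes the standard conversion from a t statistic and its degrees of freedom to a correlation-type effect size, r = t / sqrt(t² + df), and takes the absolute value as described in the response. The function names are hypothetical, this is not the ‘effectsize’ package itself, and the confidence-interval computation is not reproduced here.

```python
import math


def t_to_r(t: float, df: float) -> float:
    """Convert a t statistic and its degrees of freedom to an effect size r.

    Assumed conversion (standard formula): r = t / sqrt(t^2 + df).
    """
    return t / math.sqrt(t * t + df)


def abs_effect_size(t: float, df: float) -> float:
    """Absolute effect size, so predictors with opposite signs can be
    compared visually on one axis."""
    return abs(t_to_r(t, df))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the model in Figure 1f.
    for label, t, df in [("predictor A", 4.0, 19.0), ("predictor B", -2.5, 19.0)]:
        print(label, round(abs_effect_size(t, df), 3))
```

      Because the sign is dropped, a predictor with a negative coefficient and one with a positive coefficient of similar strength end up with similar bar lengths, which is why the plotted values need not match the signed coefficients in the table.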

      - Could the authors please explain in more detail why they think that a trial-by-trial analysis in the free choice task adds something new to their conclusions? In fact, a trial-by-trial analysis somehow suggests that the pupil size data would enter the analysis at a single trial level. If I understand correctly, the pupil size data come from their initial mapping task. So there is only one mean pupil size for a given participant and direction that goes into their analysis to predict free choice in a single trial. If this is the case, I don't see the point of doing this additional analysis given the results shown in Figure 2c.

      The reviewer understands correctly that pupil size data is taken from the initial mapping task. We then used these mean values to predict which saccade target would be selected on a trial-by-trial basis. While showing the same conceptual result as the correlation analysis, we opted to include this analysis to show the robustness of the results across individuals. Therefore we have chosen to keep the analysis in the manuscript but now write more clearly that this shows the same conceptual finding as the correlation analysis.

      “As another test of the robustness of the effect, we analyzed whether saccade costs predicted saccade selection on a trial-by-trial basis. To this end, we first determined the more affordable option for each trial using the established saccade cost map (Figure 1d). We predicted that participants would select the more affordable option. Complementing the above analyses, the more affordable option was chosen above chance level across participants (M = 56.64%, 95%-CI = [52.75%-60.52%], one-sample t-test against 50%: t(19) = 3.26, p = .004, Cohen’s d = .729; Figure 2e). Together, these analyses established that saccade costs robustly predict saccade preferences.”
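      The logic of this trial-by-trial analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the data layout (trials as tuples of two offered directions and the chosen one, and a per-participant cost map from direction to mean pupil size), the handling of ties, and all function names are hypothetical.

```python
import math
import statistics


def proportion_affordable_chosen(trials, cost_map):
    """Share of trials in which the cheaper of two offered directions was chosen.

    trials:   iterable of (option_a, option_b, chosen) direction labels.
    cost_map: dict mapping direction -> mean pupil size from the mapping task.
    Trials where both options have identical cost are skipped (an assumption).
    """
    hits = 0
    counted = 0
    for option_a, option_b, chosen in trials:
        if cost_map[option_a] == cost_map[option_b]:
            continue
        cheaper = option_a if cost_map[option_a] < cost_map[option_b] else option_b
        if chosen == cheaper:
            hits += 1
        counted += 1
    if counted == 0:
        raise ValueError("no trials with a strictly cheaper option")
    return hits / counted


def one_sample_t(values, mu=0.5):
    """One-sample t-test statistic against mu, plus df and Cohen's d."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    t = (mean - mu) / (sd / math.sqrt(n))
    d = (mean - mu) / sd
    return t, n - 1, d
```

      Each participant contributes one proportion, and the proportions are then tested against chance (0.5). For a one-sample test, Cohen's d equals t divided by the square root of the number of participants, which matches the reported values: 3.26 / √20 ≈ .729.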

      Reviewer #2 (Recommendations For The Authors):

      The authors report that "Whenever the difference in pupil size between the two options was larger, saccades curved away more from the non-selected option (β = .004, SE = .001, t = 4.448, p < .001; Figure 3b), and their latencies slowed (β = .050, SE = .013, t = 4.323, p < .001; Figure 3c)". I suspect this effect might not be driven by the difference but by a correlation between pupil size and latency.

      The authors correlate differences in pupil size (Exp1) with saccade latencies (Exp2). I recommend correlating pupil size with the latency directly, in either task. This would show whether it is actually the difference between choices or simply the pupil size of the respective individual option that is linked to latency/effort. The same applies to curvature.

      The reviewer raises a good point. Please see the previous analyses concerning the possible correlations between pupil size and saccade latency, and how they jointly predict saccade selection.

      Our data show that saccade curvature and latencies are linked with the difference in pupil size between the selected and non-selected options. Are these effects driven by a difference in pupil size or by the pupil size associated with the chosen option?

      To assess this, we fitted two linear mixed-effects models. We predicted saccade curvature and latency using pupil size (from the planning task) of the selected and non-selected options while controlling for the chosen direction (Wilkinson notation: saccade curvature/latency ~ selected pupil size + non-selected pupil size + obliqueness + vertical + horizontal + (1 + selected pupil size + non-selected pupil size | participant)). We found that saccades curved away more from costlier non-selected targets (β = 1.534, t = 8.151, p < .001), and saccades curved away from the non-selected target less when the selected target was cheaper (β = -2.571, t = -6.602, p < .001). As the costs of the selected and non-selected options show opposite effects on saccade curvature, this indicates that the difference between the two options drives oculomotor conflict.

      As for saccade latencies, we found saccade onsets to slow when the cost of the selected target was higher (β = .068, t = 2.844, p = .004). In contrast, saccade latencies were not significantly affected by the cost of the non-selected target (β = -.018, t = 1.457, p = .145), although numerically the effect was in the opposite direction. This shows that latencies were primarily driven by the cost of the selected target, but a difference account cannot be fully ruled out.

      Together, these analyses demonstrate that the difference in costs between two alternatives reliably affects oculomotor conflict, as indicated by the curvature analysis. However, saccade latencies are predominantly affected by the cost of the selected target, even when controlling for the obliqueness, updownness and leftrightness of the ensuing saccade. We report these analyses here for completeness, but because the findings seem inconclusive for saccade latency, and to keep the paper concise and focused, we have chosen not to include them in the current paper. We are open to including them in the supplementary materials if the reviewer and/or editor would like us to.

      I was wondering why the authors haven't analyzed the pupil size in Experiment 2. If the pupil size can be assessed during a free viewing task (Experiment 3), shouldn't it be possible to also evaluate it in the saccade choice task?

      We did not analyze the pupil size data from the saccade preference task for two reasons. First, the number of saccades is much lower than in the natural search experiments (~14,000 vs. ~250,000). Second, in the saccade preference task, there were always two possible saccade targets. Therefore, even if we were able to isolate an effort signal, this signal could index a multitude of factors, such as deciding between two possible saccade targets (de Gee et al., 2014), and could reflect two oculomotor programs being realized instead of only a single one (Van der Stigchel, 2010).

      Discussion: "due to stronger presaccadic benefits for upward compared with downward saccades [93,94]". I think this should be the other way around.

      We thank the reviewer for pointing this out. We have corrected our mistake in the revised manuscript.

      Saccade latencies differ around the visual field; to account for that, results / pupil size should be (additionally) evaluated relative to saccade onset (rather than cue offset). It is interesting that latencies were not accounted for here (Exp1), since they are considered for Exp2 (where they correlate with a pupil size difference). I suspect that latencies not only correlate with the difference in pupil size, but directly with pupil size itself.

      We agree with the reviewer that locking the pupil size signal to saccade onset instead of cue offset may be informative. We included an analysis in the supporting information that investigates this (see Figure S1). The results of the analysis were conceptually identical.

      The reviewer writes that latencies were not accounted for in Experiment 1. Although saccade latency was not included in the final model reported in the paper, it was considered during AIC-based backward model selection. As saccade latency did not predict meaningful variance in pupil size, it was ultimately not included in the analysis as a predictor. For completeness, we here report the outcome of a linear mixed-effects model that does include saccade latency as a predictor. Here, saccade latencies did not predict pupil size (β = 1.859e-03, t = .138, p = .889). The asymmetry effects remained qualitatively unchanged: preparing oblique compared with cardinal saccades resulted in a larger pupil size (β = 7.635, t = 3.969, p < .001), and preparing downward compared with upward saccades also led to a larger pupil size (β = 3.344, t = 3.334, p = .003).

      In addition, we have included a new analysis in the supporting information that directly addresses this issue. We will reiterate the main results here:

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”

      We have also added this point in our discussion:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is a highly established marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore makes it possible to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      References

      Alnæs, D., Sneve, M. H., Espeseth, T., Endestad, T., van de Pavert, S. H. P., & Laeng, B. (2014). Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. Journal of Vision, 14(4), 1. https://doi.org/10.1167/14.4.1

      Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in Cognitive Sciences, 16(8), 437–443. https://doi.org/10.1016/j.tics.2012.06.010

      Ballard, D. H., Hayhoe, M. M., & Pelz, J. B. (1995). Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience, 7(1), 66–80. https://doi.org/10.1162/jocn.1995.7.1.66

      Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91(2), 276–292. https://doi.org/10.1037/0033-2909.91.2.276

      Bumke, O. (1911). Die Pupillenstörungen bei Geistes-und Nervenkrankheiten (2nd ed.). Fischer.

      Curthoys, I. S., Markham, C. H., & Furuya, N. (1984). Direct projection of pause neurons to nystagmus-related excitatory burst neurons in the cat pontine reticular formation. Experimental Neurology, 83(2), 414–422. https://doi.org/10.1016/S0014-4886(84)90109-2

      David, L., Vassena, E., & Bijleveld, E. (2024). The unpleasantness of thinking: A meta-analytic review of the association between mental effort and negative affect. Psychological Bulletin, 150(9), 1070–1093. https://doi.org/10.1037/bul0000443

      de Gee, J. W., Knapen, T., & Donner, T. H. (2014). Decision-related pupil dilation reflects upcoming choice and individual bias. Proceedings of the National Academy of Sciences, 111(5), E618–E625. https://doi.org/10.1073/pnas.1317557111

      Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36(12), 1827–1837. https://doi.org/10.1016/0042-6989(95)00294-4

      Greenwood, J. A., Szinte, M., Sayim, B., & Cavanagh, P. (2017). Variations in crowding, saccadic precision, and spatial localization reveal the shared topology of spatial vision. Proceedings of the National Academy of Sciences, 114(17), E3573–E3582. https://doi.org/10.1073/pnas.1615504114

      Hanning, N. M., Himmelberg, M. M., & Carrasco, M. (2024). Presaccadic Attention Depends on Eye Movement Direction and Is Related to V1 Cortical Magnification. Journal of Neuroscience, 44(12). https://doi.org/10.1523/JNEUROSCI.1023-23.2023

      Himmelberg, M. M., Winawer, J., & Carrasco, M. (2023). Polar angle asymmetries in visual perception and neural architecture. Trends in Neurosciences, 46(6), 445–458. https://doi.org/10.1016/j.tins.2023.03.006

      Jepma, M., & Nieuwenhuis, S. (2011). Pupil Diameter Predicts Changes in the Exploration–Exploitation Trade-off: Evidence for the Adaptive Gain Theory. Journal of Cognitive Neuroscience, 23(7), 1587–1596. https://doi.org/10.1162/jocn.2010.21548

      Kahneman, D. (1973). Attention and Effort. Prentice-Hall.

      Kahneman, D., & Beatty, J. (1966). Pupil diameter and load on memory. Science (New York, N.Y.), 154(3756), 1583–1585. https://doi.org/10.1126/science.154.3756.1583

      King, W. M., & Fuchs, A. F. (1979). Reticular control of vertical saccadic eye movements by mesencephalic burst neurons. Journal of Neurophysiology, 42(3), 861–876. https://doi.org/10.1152/jn.1979.42.3.861

      Koevoet, D., Strauch, C., Naber, M., & Van der Stigchel, S. (2023). The Costs of Paying Overt and Covert Attention Assessed With Pupillometry. Psychological Science, 34(8), 887–898. https://doi.org/10.1177/09567976231179378

      Koevoet, D., Strauch, C., Van der Stigchel, S., Mathôt, S., & Naber, M. (2024). Revealing visual working memory operations with pupillometry: Encoding, maintenance, and prioritization. WIREs Cognitive Science, e1668. https://doi.org/10.1002/wcs.1668

      Kowler, E., Anderson, E., Dosher, B., & Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35(13), 1897–1916. https://doi.org/10.1016/0042-6989(94)00279-U

      Laeng, B., Sirois, S., & Gredebäck, G. (2012). Pupillometry: A Window to the Preconscious? Perspectives on Psychological Science, 7(1), 18–27. https://doi.org/10.1177/1745691611427305

      Loewenfeld, I. E. (1958). Mechanisms of reflex dilatation of the pupil. Documenta Ophthalmologica, 12(1), 185–448. https://doi.org/10.1007/BF00913471

      Mathôt, S. (2018). Pupillometry: Psychology, Physiology, and Function. Journal of Cognition, 1(1), 16. https://doi.org/10.5334/joc.18

      Naber, M., & Murphy, P. (2020). Pupillometric investigation into the speed-accuracy trade-off in a visuomotor aiming task. Psychophysiology, 57(3), e13499. https://doi.org/10.1111/psyp.13499

      Nozari, N., & Martin, R. C. (2024). Is working memory domain-general or domain-specific? Trends in Cognitive Sciences, 0(0). https://doi.org/10.1016/j.tics.2024.06.006

      Reppert, T. R., Heitz, R. P., & Schall, J. D. (2023). Neural mechanisms for executive control of speed-accuracy trade-off. Cell Reports, 42(11). https://doi.org/10.1016/j.celrep.2023.113422

      Richer, F., & Beatty, J. (1985). Pupillary Dilations in Movement Preparation and Execution. Psychophysiology, 22(2), 204–207. https://doi.org/10.1111/j.1469-8986.1985.tb01587.x

      Robison, M. K., & Brewer, G. A. (2020). Individual differences in working memory capacity and the regulation of arousal. Attention, Perception, & Psychophysics, 82(7), 3273–3290. https://doi.org/10.3758/s13414-020-02077-0

      Robison, M. K., & Unsworth, N. (2019). Pupillometry tracks fluctuations in working memory performance. Attention, Perception, & Psychophysics, 81(2), 407–419. https://doi.org/10.3758/s13414-018-1618-4

      Sahakian, A., Gayet, S., Paffen, C. L. E., & Van der Stigchel, S. (2023). Mountains of memory in a sea of uncertainty: Sampling the external world despite useful information in visual working memory. Cognition, 234, 105381. https://doi.org/10.1016/j.cognition.2023.105381

      Shadmehr, R., Reppert, T. R., Summerside, E. M., Yoon, T., & Ahmed, A. A. (2019). Movement Vigor as a Reflection of Subjective Economic Utility. Trends in Neurosciences, 42(5), 323–336. https://doi.org/10.1016/j.tins.2019.02.003

      Silva, M. F., Brascamp, J. W., Ferreira, S., Castelo-Branco, M., Dumoulin, S. O., & Harvey, B. M. (2018). Radial asymmetries in population receptive field size and cortical magnification factor in early visual cortex. NeuroImage, 167, 41–52. https://doi.org/10.1016/j.neuroimage.2017.11.021

      Sirois, S., & Brisson, J. (2014). Pupillometry. WIREs Cognitive Science, 5(6), 679–692. https://doi.org/10.1002/wcs.1323

      Sparks, D. L. (2002). The brainstem control of saccadic eye movements. Nature Reviews Neuroscience, 3(12), Article 12. https://doi.org/10.1038/nrn986

      Strauch, C., Wang, C.-A., Einhäuser, W., Van der Stigchel, S., & Naber, M. (2022). Pupillometry as an integrated readout of distinct attentional networks. Trends in Neurosciences, 45(8), 635–647. https://doi.org/10.1016/j.tins.2022.05.003

      Unsworth, N., & Miller, A. L. (2021). Individual Differences in the Intensity and Consistency of Attention. Current Directions in Psychological Science, 30(5), 391–400. https://doi.org/10.1177/09637214211030266

      Van der Stigchel, S. (2010). Recent advances in the study of saccade trajectory deviations. Vision Research, 50(17), 1619–1627. https://doi.org/10.1016/j.visres.2010.05.028

      Van der Stigchel, S. (2020). An embodied account of visual working memory. Visual Cognition, 28(5–8), 414–419. https://doi.org/10.1080/13506285.2020.1742827

      Van der Stigchel, S., & Hollingworth, A. (2018). Visuospatial Working Memory as a Fundamental Component of the Eye Movement System. Current Directions in Psychological Science, 27(2), 136–143. https://doi.org/10.1177/0963721417741710

      van der Wel, P., & van Steenbergen, H. (2018). Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review, 25(6), 2005–2015. https://doi.org/10.3758/s13423-018-1432-y

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Nitric oxide (NO) has been implicated as a neuromodulator in the retina. Specific types of amacrine cells (ACs) produce and release NO in a light-dependent manner. NO diffuses freely through the retina and can modulate intracellular levels of cGMP, or directly modify and modulate proteins via S-nitrosylation, leading to changes in gap-junction coupling, synaptic gain, and adaptation. Although these system-wide effects have been documented, it is not well understood how the physiological function of specific neuronal types is affected by NO. This study aims to address this gap in our knowledge. 

      There are two major findings. 1) About a third of the retinal ganglion cells display cell-type specific adaptation to prolonged stimulus protocols. 2) Application of NO specifically affected Off-suppressed ganglion cells designated as G32 cells. The G32 cluster likely contains 3 ganglion cell types that are differentially affected. 

      This is the first comprehensive analysis of the functional effects of NO on ganglion cells in the retina. The cell-type specificity of the effects is surprising and provides the field with valuable new information. 

      Strengths: 

      NO was expected to produce small effects, and considerable effort was expended in validating the system to ensure that changes in the state of the preparation would not confound any effects of NO. The authors used a sequential stimulus protocol to control for changes in the sensitivity of the retina during the extended recording periods. The approach potentially increases the sensitivity of the measurements and allows more subtle effects to be observed. 

      Neural activity was measured by Ca-imaging. Responsive ganglion cells were grouped into 32 types using a clustering analysis. Initial control experiments demonstrated that the cell-types revealed by the analysis largely recapitulate those from their earlier landmark study using a similar approach. 

      Application of NO to the retina modulated responses of a single cluster of cells, labeled G32, while having little effect on the remaining 31 clusters. In separate experiments, ganglion cell spiking activity was recorded on a multi-electrode array (MEA). Together the Ca-imaging and MEA recordings provide complementary approaches and demonstrate that NO modulates the temporal but not spatial properties of affected cell-types.

      Weaknesses: 

      The concentration of NO used in these experiments was ~0.25µM, which is 5- to 10-fold lower than the endogenous concentration previously measured in rodent retina. It is perhaps surprising that this relatively low NO concentration produced significant effects. However, the endogenous measurements were done in an eye-cup preparation, while the current experiments were performed in a bare (no choroid) preparation. Perhaps the resting NO level is lower in this preparation. It is also possible that the low concentration of NO promoted more selective effects.

      Reviewer #2 (Public review): 

      Neuromodulators are important for circuit function, but their roles in the retinal circuitry are poorly understood. This study by Gonschorek and colleagues aims to determine the modulatory effect of nitric oxide on the response properties of retinal ganglion cells. The authors used two photon calcium imaging and multi-electrode arrays to classify and compare cell responses before and after applying a NO donor DETA-NO. The authors found that DETA-NO selectively increases activity in a subset of contrast-suppressed RGC types. In addition, the authors found cell-type specific changes in light response in the absence of pharmacological manipulation in their calcium imaging paradigm. This study focuses on an important question and the results are interesting. The limitations of the method and data interpretation are adequately discussed in the revised manuscript. 

      The authors have addressed my previous comments, included additional discussions on the limitations of the method, and provided a more careful interpretation of their data. 

      Recommendations for the authors: 

      Please correct the citation that reviewer #1 mentioned. In addition, a little more discussion of the NO concentration issue would be helpful. The low NO concentration is not a weakness in the data; it simply raises questions regarding the interpretation.

      Thank you for these recommendations.

      Regarding the citation error, we are not sure if Reviewer #1 refers to a citation formatting error or incorrect placement. In any case, we modified the text: We specified the extracted information regarding the NO concentrations and put the applied concentration into that context (Lines 621-635). In addition, we made clear that the citation of Guthrie (2014) refers to the dissertation, which can be easily retrieved via Google Scholar. We also cited the mentioned ARVO abstract by Guthrie and Mieler (2014). 

      We hope that these modifications solve the above-mentioned issues. 


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary: 

      Nitric oxide (NO) has been implicated as a neuromodulator in the retina. Specific types of amacrine cells (ACs) produce and release NO in a light-dependent manner. NO diffuses freely through the retina and can modulate intracellular levels of cGMP, or directly modify and modulate proteins via S-nitrosylation, leading to changes in gap-junction coupling, synaptic gain, and adaptation. Although these system-wide effects have been documented, it is not well understood how the physiological function of specific neuronal types is affected by NO. This study aims to address this gap in our knowledge. 

      Strengths: 

      NO was expected to produce small effects, and considerable effort was expended in validating the system to ensure that any effects of NO would not be confounded by changes in the state of the preparation. The authors used a paired stimulus protocol to control for changes in the sensitivity of the retina during the extended recording periods. The approach potentially increases the sensitivity of the measurements and allows more subtle effects to be observed. 

      Neural activity was initially measured by Ca-imaging. Responsive ganglion cells were grouped into 32 types using a clustering analysis. Initial control experiments demonstrated that the cell-types revealed here largely recapitulate those from their earlier landmark study using the same approach (Fig. 2). 

      Application of NO to the retina strongly modulated responses of a single cluster of cells, labeled G32, while having little effect on the remaining 31 clusters. This result is evident in Fig. 3e. 

      Separate experiments measured ganglion cell spiking activity on a multi-electrode array (MEA). Clustering analysis of the peri-stimulus spike-time histograms (PSTHs) obtained from the MEA data also revealed 32 clusters. The PSTHs for each cluster were aligned to the Ca-imaging data using a convolution approach. The higher temporal resolution of the MEA recordings indicated that NO increased the speed of sub-cluster 2 responses but had no effect on receptive field size. The physiological significance of the small change in kinetics remains unclear. 

      We thank the reviewer for their detailed and constructive comments.

      Weaknesses: 

      The G32 cluster was further divided into three sub-types using Bayesian Information Criterion (BIC) based on the temporal properties of the Ca-responses. This sub-clustering result seems questionable due to the small difference in the BIC parameter between 2 and 3 clusters. Three sub-clusters of the G32 cluster were also revealed for the PSTH data, however, the BIC analysis was not applied to further validate this result. 

      (1.1) We agree with the reviewer that this is an important point to be clarified. To this end, we repeated the analysis with n=2 clusters (see Author response image 1 below). In brief, we found that the overall interpretation did not change: Both clusters in the Ctrl1-dataset showed barely any type-specific adaptational effects, whereas under NO application, temporal contrast responses decreased (see Author response image 1 below). If requested, we would be happy to add this image to the supplementary material. 

      Author response image 1.

      In an additional analysis, we evaluated whether n=2 or n=3 was the “better” choice for the number of clusters. In the new Supplementary Fig. S4, we compared the clusters with n=2 (top) and n=3 (bottom). For n=2, the two clusters are relatively strongly correlated for both visual stimuli, whereas for n=3, the clusters become more distinct, especially with respect to differences in the correlations for the two stimuli (Fig. S4b). For n=2, the low intra-cluster correlation (ICC) strongly suggests that cluster 2 contains multiple response types (ICC(C2) = 0.5 ± 0.48, mean ± s.d.; Fig. S4c). For n=3, the mean ICC values are high for all three clusters (ICC(C1) = 0.81 ± 0.16; ICC(C2) = 0.86 ± 0.07; ICC(C3) = 0.83 ± 0.1; mean ± s.d.). Together, this suggests that n=3 clusters capture the response diversity in G32 better than n=2 clusters.
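
      The intra-cluster correlation argument above can be illustrated with a minimal sketch. It assumes that ICC is the mean pairwise Pearson correlation between the response traces of cells assigned to the same cluster; this response does not spell out the exact definition, so the function names and the toy traces below are hypothetical and are not data from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def intra_cluster_correlation(traces):
    """Mean pairwise correlation of all traces in one cluster (assumed ICC definition)."""
    pairs = list(combinations(traces, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy traces, not data from the study.
# "homogeneous": three cells with the same response shape at different gains/offsets.
homogeneous = [[0, 1, 2, 1, 0], [0, 2, 4, 2, 0], [1, 2, 3, 2, 1]]
# "mixed": the second cell has an inverted response, as if two types were merged.
mixed = [[0, 1, 2, 1, 0], [2, 1, 0, 1, 2], [0, 2, 4, 2, 0]]

icc_homogeneous = intra_cluster_correlation(homogeneous)
icc_mixed = intra_cluster_correlation(mixed)
```

      Under this assumed definition, a cluster that merges distinct response types yields a lower mean and a wider spread of pairwise correlations, which is the pattern the n=2 solution is reported to show for cluster 2.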

      Finally, we performed a BIC analysis for the MEA dataset and found the optimal number of clusters to be also n=3 (see new Suppl. Fig. S5).

      The alignment of sub-clusters 1, 2, and 3 identified in the Ca-imaging and the MEA recordings seemed questionable, because the temporal properties of clusters did not align well, nor did the effects of NO. 

      (1.2) To address this important point, we analyzed the correlations between the control responses of the three clusters from the Ca<sup>2+</sup>-dataset with the ones from the MEA-dataset (see new Suppl. Fig. S7). To avoid confusion, we named the clusters in the MEA-dataset i,ii,iii (see Fig. 8). We found two of the three clusters to be highly correlated (Ca<sup>2+</sup> clusters 2,3 and MEA clusters iii, ii), whereas one cluster was much less so (cluster 1 vs. cluster i), likely due to differences in response kinetics. In clusters i and ii NO application led to a release of suppression for temporal contrasts – similar to what we observed in the Ca<sup>2+</sup> data (see also our new analysis of the MEA data in Suppl. Fig. S6, as discussed further below).

      We agree that the cell types underlying the Ca<sup>2+</sup> and MEA G32 clusters may not be the same – aligning functional types between those two methods is challenging due to several factors, mainly because while Ca<sup>2+</sup> is a proxy for spiking activity, other Ca<sup>2+</sup> sources as well as sub-threshold membrane potential changes affect the intracellular Ca<sup>2+</sup>, potentially in a cell type-specific way. We explain this now better in the text.

      In any case, our main point was not to unambiguously align the cell types but to show that in both datasets, we find three subclusters of G<sub>32</sub>, which are affected by NO in a differential manner, particularly their suppression to temporal contrasts.

      The title of the paper indicates that nitric oxide modulates contrast suppression in a subset of mouse retinal ganglion cells, however, this result appears to be inferred from previous results showing that G32 is identified as a "suppressed-by-contrast" cell. The present study does not explicitly evaluate the amount of contrast-suppression in G32 cells. 

      (1.3) The reviewer is correct in that we did not quantify contrast-suppression in G<sub>32</sub> in detail but focused on the responses to temporal contrast (chirp and moving bar) and its modulation by NO (Fig. 5). In this context, please note that G<sub>32</sub>’s responses to the moving bar stimulus suggest that the cells are also suppressed by spatial contrast (i.e., an edge appearing in their RF). The functional RGC type G<sub>32</sub> (“Off suppressed 2”) was defined in an earlier study (Baden et al. 2016); it was assigned to the “Suppressed-by-Contrast” (SbC) category mainly because temporal contrast suppresses its responses. Already then, coverage analysis indicated that G<sub>32</sub> may indeed contain several RGC types – in line with our clustering analysis. It is still unclear if G<sub>32</sub> contains one (or more) of the SbC cells described by Jacoby & Schwartz (2018); in their recent study, Wienbar and Schwartz (2022) introduced the novel bursty-SbC RGC, which Goetz et al. (2022) speculated to potentially align with G<sub>32</sub>.

      We now discuss the relationship between G<sub>32</sub> and the SbC RGCs defined in other studies in the revised manuscript.

      In its current form, the work is likely to have limited impact, since the morphological and functional properties of the affected sub-cluster remain unknown. The finding that there can be cell-specific adaptation effects during experiments on in vitro retina is important new information for the field.

      (1.4) Again, we thank the reviewer for the detailed and helpful feedback. We hope that the reviewer finds our revised manuscript improved.

      Reviewer #1 (Recommendations For The Authors):  

      Most of the calcium activity traces (dF/F) throughout the paper have neither vertical nor horizontal calibration bars. Presumably, most values are positive, but this is unclear as a zero level is not indicated anywhere. Without knowing where zero dF/F is, it is not possible to determine whether the NO increased the Ca-signal or blocked a decrease in the Ca-signal. 

      Both ∆F/F and z-scoring, as we used here, are ways to normalize Ca<sup>2+</sup> traces. We decided against using ∆F/F<sub>0</sub> because this typically assumes that F represents the cell’s Ca<sup>2+</sup> resting level (F<sub>0</sub>; without activity). However, in our measurements, the “resting” Ca<sup>2+</sup> levels (i.e. before presenting a stimulus) may indeed reflect no spiking activity (e.g., in an ON RGC) but may also reflect baseline spiking activity (e.g., in a G<sub>32</sub> cell, which has a baseline firing rate of ~10 Hz; see Fig. S6). Hence, we used z-scoring, which carries no assumption that the resting Ca<sup>2+</sup> level corresponds to no activity. In practice, we normalized all traces to the Ca<sup>2+</sup> level prior to the light stimulus and defined this as zero (as described in the Methods).

      We considered the reviewer’s suggestion of adding zero lines to every trace but felt that this would hamper the overall readability of the figures.

      Regarding calibration bars: We made sure that horizontal bars (indicating time) are present in all figures. We decided to leave out vertical bars in Ca<sup>2+</sup> responses, because as explained above, the traces are normalized (and unit-free), and within a figure all traces are scaled the same.

      Points of clarification for the Methods: 

      (1) The stimulus field was 800 x 600 µm. Presumably, both scan fields were contained within this region when scanning either Field 1 or Field 2 so that the adaptation level of the preparation at both locations was maintained? 

      Yes, the stimulation field is always kept centered on the respective recording (scan) field and the adaptation level for each recording field was maintained.

      (2) There appeared to be an indeterminate amount of time between the initial 10-minute adaptation period and Ctrl1, whereas there were no such gaps between subsequent scans. Is this likely to produce differences in adaptation state and thus represent a systematic error? 

      At this time point, recording (scan) fields were selected to make sure that the cells in the field were uniformly labelled with the Ca<sup>2+</sup> indicator and responsive to light stimuli. This typically happened already at the end of the light adaptation phase and/or right after. When selecting the fields, light stimuli were presented (to test responsiveness) and thereby the adaptation level was maintained independent of the duration of this procedure, minimizing systematic errors.

      (3) Was the dense white noise stimulus applied during the wash-in period to maintain the adaptation state of the preparation prior to the subsequent scan? 

      The dense noise was not applied throughout the wash-in period but at least 5-10min before the field was recorded with a drug (e.g., NO). 

      Fig. 1d illustrates very nicely how the stimuli align with the responses. It would have been helpful to have this format continue throughout the paper but unfortunately, the vertical lines are dropped in Fig. 2a and then the stimulus waveform is omitted in Fig. 2e onwards. 

      Thanks, good idea. We added the vertical lines and the stimulus waveform to the figures where they were missing to improve the readability. 

      What was the rationale for selecting the concentration of the NO donor used? Is it likely to mimic natural levels? 

      A DETA/NO concentration of 100 µM is commonly used in studies investigating NO-induced effects. DETA/NO has a half-life (t<sub>0.5</sub>) of 20 hours, which makes it more suitable for application in tissues (like our whole-mount preparation), because the donor can penetrate into the tissue before releasing NO. In turn, this long t<sub>0.5</sub> means that only a fraction of the bound NO is released per time unit.

      Based on t<sub>0.5</sub> for DETA/NO and NO, one can roughly estimate the NO range as follows: t<sub>0.5</sub> of NO strongly depends on the tissue and is estimated to lie in the range of seconds to minutes (Beckman & Koppenol, 1996). Assuming a t<sub>0.5</sub> for NO of 2 minutes, a freshly prepared 100 µM DETA/NO solution is expected to yield, within the first hour, an NO concentration of approx. 0.25 µM (taking into account that 1 mole of DETA/NO releases 1.5 moles of NO molecules; see Ramamurthi & Lewis 1997).
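
      The estimate above can be reproduced with a short sketch. It assumes a quasi-steady state in which NO production by first-order donor decay balances first-order NO decay; this is one plausible reconstruction of the calculation, not necessarily the one the authors used. The function name is hypothetical, while the half-lives, the 100 µM donor concentration, and the 1.5 stoichiometry are taken from the text.

```python
def steady_state_no_um(donor_um, t_half_donor_min, t_half_no_min, no_per_donor=1.5):
    """Quasi-steady-state NO concentration in µM (assumed model).

    Production rate: no_per_donor * k_donor * [donor]
    Decay rate:      k_no * [NO]
    With k = ln(2) / t_half, the ln(2) factors cancel at steady state.
    """
    return no_per_donor * donor_um * t_half_no_min / t_half_donor_min


T_HALF_DONOR_MIN = 20 * 60  # DETA/NO half-life of 20 hours, in minutes
T_HALF_NO_MIN = 2.0         # assumed NO half-life of 2 minutes

# Freshly prepared 100 µM DETA/NO solution.
no_start = steady_state_no_um(100.0, T_HALF_DONOR_MIN, T_HALF_NO_MIN)

# After one hour, only a small fraction of the donor has decayed.
donor_after_1h = 100.0 * 0.5 ** (60 / T_HALF_DONOR_MIN)
no_after_1h = steady_state_no_um(donor_after_1h, T_HALF_DONOR_MIN, T_HALF_NO_MIN)
```

      Under these assumptions the result is 0.25 µM at the start and only slightly lower after one hour, consistent with the approximate value quoted above.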

      In general, it is difficult to determine the physiological concentration of NO in the retina. Different measurements point at peaks of a few 100 nM (e.g., frog retina, ganglion cells: 0.25 µM, Kalamkarov et al. 2016; rodent inner retina, 0.1 to 0.4 µM, Micah et al. 2014). Hence, the NO concentrations we apply should be within the measured physiological range.

      Fig. 3e: what are the diamond symbols? If these are the individual cells, it might be better to plot them on top of the box plots so all are visible. 

      The diamond symbols do represent individual cells, but only the outliers. We decided not to plot all cells as dots on top of the box plots because readability would suffer from the large number of individual dots, e.g., n=251 for G<sub>32</sub> Ctrl and n=135 for G<sub>32</sub> DETA/NO.

      Fig. 3: please explain more clearly the x-axis units in a-d and the y-axis units in e. 

      To estimate potential response differences between the first and the second scan (i.e. either Ctrl 2 or NO), the traces were subtracted cell-pairwise (∆ Ctrl: Ctrl 2 – Ctrl 1; ∆ DETA/NO: NO – Ctrl 1). As all Ca<sup>2+</sup> traces were normalized, they are unit-free. Therefore, the x-axes in Fig. 3a-d represent the mean differences of each cell per cell type, e.g., a value of zero would mean that the traces of Ctrl 1 and Ctrl 2 for a cell are identical. The y-axis in Fig. 3e is also unit-free, because technically, it is the same measure as Fig. 3a-d. But since it summarizes the control- and NO-data, we refer to this as “delta mean trace.” We tried to make this clearer in the revised manuscript and a detailed description can be found in the Methods.
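
      The cell-pairwise difference measure described above can be sketched as follows. This is a minimal illustration that assumes both scans of a cell are sampled on the same time base; the function name and the toy traces are hypothetical and are not data from the study.

```python
def delta_mean(trace_second, trace_first):
    """Mean of the point-wise difference between two scans of the same cell."""
    if len(trace_second) != len(trace_first):
        raise ValueError("both scans must have the same number of samples")
    diffs = [b - a for a, b in zip(trace_first, trace_second)]
    return sum(diffs) / len(diffs)


# Hypothetical normalized (unit-free) traces of one cell.
ctrl1 = [0.0, 0.5, 1.0, 0.5, 0.0]
ctrl2 = [0.0, 0.5, 1.0, 0.5, 0.0]      # identical to Ctrl 1
no = [0.25, 0.75, 1.0, 0.75, 0.25]     # less suppression around the peak

delta_ctrl = delta_mean(ctrl2, ctrl1)  # Δ Ctrl: Ctrl 2 - Ctrl 1
delta_no = delta_mean(no, ctrl1)       # Δ DETA/NO: NO - Ctrl 1
```

      A value of zero means the two scans are identical, and a positive value means the second scan lies above the first on average.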

      Fig. 3: "...a substantial number of RGC types (34%) changed their responses to chirp and/or moving bar stimuli in the absence of any pharmacological perturbation in a highly reproducible manner...". How many of the cell types showed a significant difference? Two cell-types with p<0.001 are highlighted with 3 asterisks. It would be helpful to indicate on this plot which of the other cells showed significant differences. 

      Yes, this is a good idea. Thank you. We tried to add this information to the figure, but it became rather crowded. Therefore, we added a new Suppl. Fig. S3 (same style as Fig. 3) where we exclusively summarized the control-dataset. 

      Fig. 7: To illustrate the transform from PSTH to Ca-imaging, why not use G32 data as an example?

      Fair point. We modified the figure and added G<sub>32</sub> as an example.

      It would be clearer if the cells were labeled consistently throughout the paper using their Baden cluster numbers rather than switching to the older nomenclature (JAM-B, local edge, alpha, etc), e.g. Fig. 7a,b. 

      In the revised manuscript, we have changed the nomenclature to the Ca<sup>2+</sup>-based terminology of Baden et al. (2016). We had used the alternative cell type names because, at the point where Fig. 7a is discussed in the manuscript, the cell type matching had not yet been performed. But we agree that a consistent nomenclature is helpful.

      The evidence supporting the sub-clustering of the G32 cells for the two recording methods could have been stronger. In Fig. 5, the BIC difference between 2 and 3 clusters is rather small. Is this result robust enough to justify 3 rather than 2 clusters? The BIC analysis should also be performed on the PSTH data-set to support the notion that the MEA G32 cluster also contains 3 rather than 2 sub-clusters. 

      Regarding the sub-clustering of G<sub>32</sub> into n=2 or n=3 clusters for both datasets, please see our detailed reply #1.1 in our response to the public comments above.

      The alignment of the three sub-clusters across the Ca-imaging and MEA data looked questionable. For example, the cluster 2 and cluster 3 traces in Fig. 5e,f look similar, with cluster 1 being more different. In Fig. 8c on the other hand, cluster 1 and 3 look similar with cluster 2 being more different. The pharmacological results also did not align well. For the Ca-imaging, NO appeared to have a large effect on cluster 1, a more modest effect on cluster 2 and less effect on cluster 3 (Fig. 5f). In comparison, the MEA results diverged, with NO producing the largest effect on cluster 2 and very modest if any effects on clusters 1 and 3 (Fig. 8c). Moreover, the temporal properties of cluster 1 and cluster 3 look very different between the Ca-imaging and MEA data. Without further comment, these differences raise concerns about the reliability of the clustering and the validity of comparisons made across the two sets of experiments. 

      We agree that this is a critical point. Please see our reply #1.2 in our response to the public comments above.

      Fig. 8: Transforming the PSTHs into Ca-traces is important to align the MEA recordings with the Ca-imaging data. It would also be very informative to see a more detailed overall presentation of the PSTH data since it provides a much higher temporal resolution of the responses. For example, illustrating the average PSTHs for the G32 cells under all the experimental conditions could be quite illuminating. 

      To address this point, we added a new Supplementary Fig. S6, which shows the pseudo-Ca<sup>2+</sup> traces for each cluster and condition next to the PSTHs. In addition, we quantified the cumulative firing rate for response features (time windows) where temporal suppression was observed in the Ca<sup>2+</sup> data. This new analysis shows that during NO-application, we can see an increase in firing rate in all clusters. Nevertheless, the effect of NO on the PSTHs is admittedly small and it is better visible in the pseudo-Ca<sup>2+</sup> transformed traces. One possible explanation for this difference may be that the overall firing rates are quite dynamic in G<sub>32</sub> such that a significant increase in “suppression” phases relative to the peak firing appears small.
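
      The transformation of PSTHs into pseudo-Ca<sup>2+</sup> traces can be illustrated with a minimal sketch. It assumes a causal convolution of the firing rate with a normalized exponential-decay kernel; the kernel shape, the 0.5 s time constant, the 50 ms bin width, and the toy PSTH are all assumptions chosen for illustration, and the authors' actual procedure is described in their Methods.

```python
from math import exp


def exponential_kernel(tau_s, dt_s, duration_s):
    """Exponential-decay kernel sampled at dt_s and normalized to unit sum."""
    n = int(round(duration_s / dt_s))
    kernel = [exp(-i * dt_s / tau_s) for i in range(n)]
    total = sum(kernel)
    return [k / total for k in kernel]


def pseudo_calcium(psth, kernel):
    """Causal convolution: each sample is a weighted sum of current and past rates."""
    out = []
    for t in range(len(psth)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j < 0:
                break
            acc += w * psth[t - j]
        out.append(acc)
    return out


# Hypothetical PSTH (Hz) in 50 ms bins: baseline firing, brief suppression, recovery.
psth = [10.0] * 20 + [0.0] * 4 + [10.0] * 20
kernel = exponential_kernel(tau_s=0.5, dt_s=0.05, duration_s=2.0)
trace = pseudo_calcium(psth, kernel)
```

      Because the kernel low-pass filters the spike rate, a brief and complete pause in firing appears in the pseudo-Ca<sup>2+</sup> trace as a shallower and slower dip, which is one reason the same effect can look different in the two representations.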

      Reviewer #2 (Public Review):  

      Neuromodulators are important for circuit function, but their roles in the retinal circuitry are poorly understood. This study by Gonschorek and colleagues aims to determine the modulatory effect of nitric oxide on the response properties of retinal ganglion cells. The authors used two photon calcium imaging and multi-electrode arrays to classify and compare cell responses before and after applying a NO donor DETA-NO. The authors found that DETA-NO selectively increases activity in a subset of contrast-suppressed RGC types.

      In addition, the authors found cell-type specific changes in light response in the absence of pharmacological manipulation in their calcium imaging paradigm. While this study focuses on an important question and the results are interesting, the following issues need further clarification for better interpretation of the data. 

      We thank the reviewer for her/his detailed and constructive comments.

      (1) Design of the calcium imaging experiments: the control-control pair has a different time course from the control-drug pair (Fig 1e). First, the control-control pair has a 10 minute interval while the control-drug pair has a 25 minute interval. Second, Control 1 Field 2 was imaged 10 min later than Control 1 Field 1 since the start of the calcium imaging paradigm. 

      Given that the control dataset is used to control for time-dependent adaptational changes throughout the experiment, I wonder why the authors did not use the same absolute starting time of imaging and the same interval between the first and second round of imaging for both the control-control and the control-drug pairs. This can be readily done in one of two ways: 1. In a set of experiments, add DETA/NO between "Control 1 Field 1" and "Control 2 Field 1" in Fig. 1e as the drug group; or 2. Omit DETA/NO in the Fig. 1e protocol as the control group to monitor the time course of adaptational changes. 

      Thank you for raising this point. We hope that in the following we can clarify the reasoning behind our protocol and the analysis approach.

      (2.1) Initially, we performed these experiments in different ways (also in the sequence suggested by the reviewer), before homing in on the paradigm illustrated in Fig. 1. We chose this paradigm for two reasons: First, we wanted to have for each retina both Ctrl1/Ctrl2 and Ctrl1/NO data sets, to be sure that the time-dependent (adaptational) effects were not related to the general condition of an individual retina preparation. Second, we did not see obvious differences in time-dependent or NO-induced effects between paradigms. Therefore, while we cannot exclude that the absolute time between recordings can affect the observed changes, we do not think that such effects are substantial enough to change our conclusions.

      In the revised manuscript, we now explicitly point at the different intervals. 

      Related to the concern above, to determine NO-specific effect, the authors used the criterion that "the response changes observed for control (ΔR(Ctrl2−Ctrl1)) and NO (ΔR(NO−Ctrl1)) were significantly different". This criterion assumes that without DETA-NO, imaging data obtained at the time points of "Control 1 Field 2" and "DETA/NO Field 2" would give the same value of ΔR as ΔR(Ctrl2−Ctrl1) for all RGC types. It is not obvious to me why this should be the case, because of the unknown time-dependent trajectory of the adaptational change for each RGC type. For example, an RGC type could show a stable response in the first 30 min and then change significantly in the following 30 min. DETA/NO may counteract this adaptational change, leading to the same ΔR as the control condition (false negative). Alternatively, DETA/NO may have no effect, but the nonlinear time-dependent response drift can give false positive results. 

      (2.2) Initially, we assumed that after adapting the retina to a certain light level, RGCs exhibit stable responses over time, such that when adding a pharmacological agent, we can identify drug-induced response changes (e.g., by calculating the response difference). To our surprise, we found that for some RGC types the responses changed between the first and the second recording (referred to as cell type-specific adaptational effects), which is why we devised the Ctrl1/Ctrl2 vs. Ctrl1/NO analysis. 

      The reviewer is correct in that we assume in our analysis that the adaptational- and NO-induced effects are independent and sum linearly. Further, we agree with the reviewer that there may be other possibilities, two of which are highlighted by the reviewer:

      (a) Interaction: for instance, if NO compensates for the adaptational effect, we would not be able to measure this; or, if this compensation was partial, underestimate both effects. 

      (b) More complex time-dependency: for example, if an RGC shows a pronounced adaptational effect with a longer delay (i.e. only after the second scan), or if a very transient NO effect has already disappeared when we perform the second scan. On the one hand, as we can only take snapshots of the RGC responses, we cannot exclude these possibilities. On the other hand, both effects (adaptational- and NO-dependent) were type-specific and reproducible between experiments (also with varying timing, see reply #2.1), which makes complex time dependencies less likely.

      The revised manuscript now reflects these limitations of our recording paradigm and points out which effects can be detected, and which likely not.

      I also wonder why washing-out, a standard protocol for pharmacological experiments, was not done for the calcium protocol since it was done in the MEA experiments. A reversible effect by washing in and out DETA/NO in the calcium protocol would provide a much stronger support that the observed NO modulation is due to NO and not to other adaptive changes. 

      (2.3) We agree that a clear wash-out would strengthen our findings. Indeed, in the beginning of our experiments, we tried to wash-out the agent in the Ca<sup>2+</sup> recordings, as we did in the MEA recordings. We soon stopped doing this in the Ca<sup>2+</sup> experiments, because response quality decreased for the third scan of the same field, likely due to bleaching of fluorescent indicator and photopigment. This is why we typically restrict the total recording time of the same field of RGCs to about 30 min (~ two scans with all light stimuli). Moreover, our MEA data showed that DETA/NO can largely be washed-out, which supports that we observed NO-specific effects. Therefore, we decided against further attempts to establish the wash-out also in the Ca<sup>2+</sup> experiments (e.g., shortening the recording time by presenting fewer light stimuli).

      (2) Effects of Strychnine: In lines 215-219, "In the light-adapted retina, On-cone BCs boost light-Off responses in Off-cone BCs through cross-over inhibition (83, 84) and hence, strychnine affects Off-response components in RGCs - in line with our observations (Fig. S2)". However, Fig. S2 doesn't seem to show a difference in the Off-response components. Rather, the On response is enhanced with strychnine. In addition, suppressed-by-contrast cells are known to receive glycinergic inhibition from VGluT3 amacrine cells (Tien et al., 2016). However, the G32 cluster in Fig. S2 doesn't seem to show a change with strychnine. More explanation on these discrepancies will be helpful.

      (2.4) We thank the reviewer for this comment. Regarding the first part, we agree that the figure does not support differences in the Off-response components. We therefore rephrased the corresponding text accordingly. Additionally, we now show all RGC types with n>3 cells per recording condition in the revised Suppl. Fig. S2 and added statistics.

      Regarding the second part, there are several possible explanations for these discrepancies:

      (a) The SbC (transient Off SbC) studied in Tien et al. (2016) likely corresponds to the RGC type G<sub>28</sub> (see Höfling et al. 2024). As mentioned above (see reply #1.3), it is unclear if G<sub>32</sub> corresponds to a previously described SbC, and if so, to which. Goetz et al. (2022) proposed that G<sub>32</sub> may align with the bursty-SbC (bSbC) type (their Supplemental Table 3), as described also by Wienbar and Schwartz (2022). An important feature of the bSbC type is that its contrast response function is mainly driven by intrinsic properties rather than synaptic input. If G<sub>32</sub> indeed included the bSbC, this may explain why strychnine does not interfere with the suppression of temporal contrast.

      (b) In Tien et al. (2016), the authors genetically removed the VG3-ACs (see their Fig. 3) and show that this ablation reduces the inhibition of tSbC cells in a stimulus size-dependent manner. Specifically, larger light stimuli (600 µm) only show marginal effects on the IPSCs and inhibitory synaptic conductance (see their Figs. 3c,d and 3e,f, respectively). In our study, the full-field chirp had a size of 800 x 600 µm. Therefore – and assuming that G<sub>32</sub> indeed included tSbCs – our observation that strychnine did not affect temporal suppression in the full-field chirp responses would be in line with Tien et al. (2016).   

      (3) This study uses DETA-NO as an NO donor for enhancing NO release. However, a previous study by Thompson et al., Br J Pharmacol. 2009 reported that DETA-NO can rapidly and reversibly induce a cation current independent of NO release at the 100 µM used in the current study, which could potentially cause the observed effect in G32 cluster such as reduced contrast suppression and increased activity. This potential caveat should at least be discussed, and ideally excluded by showing the absence of DETA-NO effects in nNOS knockout mice, and/or by using another pharmacological reagent such as the NO donor SNAP or the nNOS inhibitor l-NAME. 

      Thank you for pointing out this potential caveat. We certainly cannot exclude such side effects. However, we think that this explanation of our observations is unlikely, because Thompson et al. barely see effects at 100 µM DETA/NO; in fact, their data suggest that clear NO-independent effects on the cation-selective channel occur at much higher DETA/NO concentrations, such as 3 mM. 

      In any case, in the revised manuscript, we refer to this paper in the Discussion.

      (4) Clarification of methods: In the Methods, lines 1119-1127, the authors describe the detrending, baseline subtraction, and averaging. Then, in line 1129, "the mean activity r(t) was computed and then traces were normalized such that: max<sub>t</sub>(|r(t)|) = 1." How is the normalization done? Is it over the entire recording (control and wash in) for each ROI? Or is it normalized based on the mean trace under each imaging session (i.e. twice for each imaging field)? 

      The normalization (z-scoring) was done for each ROI individually per stimulus and condition (Ctrl 1, Ctrl 2, DETA/NO). We normalized the traces, because the absolute Ca<sup>2+</sup> signal depends on factors, such as “resting” state of the cell (e.g., silent vs. baseline spiking activity in the absence of a light stimulus) and its fluorescent dye concentration. This also means that absolute response amplitudes are difficult to interpret. Hence, we focused on analyzing relative changes per ROI and condition, which still allowed us to investigate adaptational and drug-induced effects. In the revised manuscript, we changed the corresponding paragraph for clarification.
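
      The per-ROI, per-condition normalization can be sketched as follows. This is one possible reading that combines the pre-stimulus baseline being defined as zero with the scaling quoted from the Methods (max<sub>t</sub>(|r(t)|) = 1); the exact z-scoring procedure is not reproduced here, and the function name, the dictionary keys, and the toy values are hypothetical.

```python
def normalize_trace(trace, n_baseline):
    """Set the pre-stimulus level to zero, then scale so that max |r(t)| equals 1."""
    baseline = sum(trace[:n_baseline]) / n_baseline
    shifted = [v - baseline for v in trace]
    peak = max(abs(v) for v in shifted)
    if peak == 0:
        return shifted  # flat trace: nothing to scale
    return [v / peak for v in shifted]


# Hypothetical mean traces, keyed by (ROI, stimulus, condition).
recordings = {
    ("roi_1", "chirp", "Ctrl1"): [2.0, 2.0, 4.0, 6.0, 3.0],
    ("roi_1", "chirp", "NO"): [5.0, 5.0, 5.0, 1.0, 3.0],
}

# Each trace is normalized on its own, so absolute amplitudes are not compared
# across conditions; only the relative shape of each response is retained.
normalized = {
    key: normalize_trace(trace, n_baseline=2) for key, trace in recordings.items()
}
```

      Normalizing each condition separately removes differences in resting fluorescence and dye concentration, at the cost of making absolute response amplitudes uninterpretable, as noted above.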

      As for the clustering of RGC types, I assume that each ROI's cluster identity remains unchanged through the comparison. If so, it may be helpful to emphasize this in the text.

      Yes, this is correct. We identified G<sub>32</sub> RGCs based on their Ctrl1 responses and then compared these responses with those for Ctrl2 or NO. We have now clarified this in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):  

      The manuscript would benefit from a discussion of how the findings in this study relate to known mechanisms of NO modulation and previously reported effects of NO manipulations on RGC activity. 

      Thank you for the recommendation. We already refer to known mechanisms of NO within the retina in the Introduction. In the revised manuscript, we now added information to the Discussion.

      In the abstract, "a paired-recording paradigm" could be misleading because paired recording generally refers to the simultaneous recording of two neurons. However, the paradigm in this study is essentially imaging experiments done at two time points. 

      We agree with the reviewer. To avoid any confusion with paired electrophysiological recordings, we changed the term “paired-recording paradigm” to “sequential recording paradigm” and replaced the term “pair-/ed” with “sequentially recorded”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript investigates the role of membrane contact sites (MCSs) and sphingolipid metabolism in regulating vacuolar morphology in the yeast Saccharomyces cerevisiae. The authors show that tricalbin (1-3) deletion leads to vacuolar fragmentation and the accumulation of the sphingolipid phytosphingosine (PHS). They propose that PHS triggers vacuole division through MCSs and the nuclear-vacuolar junction (NVJ). The study presents some solid data and proposes potential mechanisms underlying vacuolar fragmentation driven by this pathway. However, there are some concerns regarding the strength and interpretation of their lipid data, and the robustness of some conclusions. The manuscript would benefit from addressing these concerns and providing more conclusive evidence to support the proposed conclusions. Overall, the study provides valuable insights into the connection between MCSs, lipid metabolism, and vacuole dynamics, but further clarification will be highly valuable to strengthen the conclusions.

      We thank Reviewer #1 for the thoughtful and positive feedback. Nevertheless, there are concerns raised regarding the strength and interpretation of the lipid data, as well as the robustness of specific conclusions. We acknowledge the importance of addressing the raised concerns and providing more conclusive evidence to support our proposed conclusions. We have responded in the "Recommendations to Authors" section and hope that our research has been further strengthened.

      Reviewer #2 (Public Review):

      This manuscript investigates the mechanism behind the accumulation of phytosphingosine (PHS) and its role in triggering vacuole fission. The study proposes that membrane contact sites (MCSs) are involved in two steps of this process. First, tricalbin-tethered MCSs between the endoplasmic reticulum (ER) and the plasma membrane (PM) or Golgi modulate the intracellular amount of PHS. Second, the accumulated PHS induces vacuole fission, most likely via the nuclear-vacuolar junction (NVJ). The authors suggest that MCSs regulate vacuole morphology through sphingolipid metabolism.

      While some of the results in the manuscript are interesting the overall logic is hard to follow. In my assessment of the manuscript, my primary concern lies in its broad conclusions which, in my opinion, exceed the available data and raise doubts. Here are some instances where this comes into play for this manuscript:

      We greatly appreciate Reviewer #2's careful insights into our research. We have addressed the points one by one below.

      Major points for revision

      1) The rationale to start investigating a vacuolar fission phenotype in the beginning is very weak. It is basically based on a negative genetic interaction with NVJ1. Based on this vacuolar fragmentation is quantified. The binning for the quantifications is already problematic as, in my experience, WT cells often harbor one to three vacuoles. How are quantifications looking when 1-3 vacuoles are counted as "normal" and more than 3 vacuoles as "fragmented"? The observed changes seem to be relatively small and the various combinations of TCB mutants do not yield a clear picture.

      The number of vacuoles at steady state can be influenced by various environmental factors, including the composition of the medium (the manufacturer supplying the reagents and local water hardness) and the strain background. Possibly for these reasons, our observations differ from the experience of Reviewer #2. Indeed, we observed that WT cells always have one vacuole in YPD medium, whereas in SD medium (Fig S3B only), WT cells have mainly one or two vacuoles per cell. In both cases, we observed that some of the mutants showed a phenotype different from that of the WT, and these differences are supported by Student’s t-test and two-way ANOVA.
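
      The alternative binning raised in point 1 can be illustrated with a minimal sketch. It assumes that each scored cell is recorded as an integer vacuole count; the function name, the threshold parameter, and the example counts are hypothetical and are not data from the study.

```python
def fraction_fragmented(vacuole_counts, max_normal=3):
    """Return the fraction of cells with more than `max_normal` vacuoles."""
    if not vacuole_counts:
        raise ValueError("no cells scored")
    fragmented = sum(1 for n in vacuole_counts if n > max_normal)
    return fragmented / len(vacuole_counts)


# Hypothetical per-cell vacuole counts, invented for illustration only.
example_counts = [1, 1, 2, 3, 4, 5, 1, 2, 6, 1]

# Only a single vacuole counts as "normal".
strict = fraction_fragmented(example_counts, max_normal=1)
# One to three vacuoles count as "normal", as the reviewer proposes.
lenient = fraction_fragmented(example_counts, max_normal=3)

print(f"fragmented (>1 vacuole):  {strict:.2f}")
print(f"fragmented (>3 vacuoles): {lenient:.2f}")
```

      Scoring the same cells under both thresholds shows how strongly the reported fraction of fragmented cells depends on the chosen bin boundary.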

      2) The analysis of the structural requirements of the Tcb3 protein is interesting but does not seem to add any additional value to this study. While it was used to quantify the mild vacuolar fragmentation phenotype it does not reoccur in any following analysis. Is the tcb3Δ sufficient to yield the lipid phenotype that is later proposed to cause the vacuolar fragmentation phenotype?

      We do not know whether tcb3Δ alone is sufficient to increase PHS, as we have not examined it. Nevertheless, as another approach, we analyzed the difference in IPC level between the tcb1Δ2Δ3Δ triple deletion and the tcb3Δ single deletion in a sec18 mutant background and showed that the reduction of IPC synthesis is similar between tcb1Δ2Δ3Δ and tcb3Δ alone (unpublished). This result suggests that, of all the tricalbins (Tcb1, Tcb2 and Tcb3), Tcb3 plays a central role. In addition, the IPC synthesis reduction phenotype was small in tcb1Δ alone and tcb2Δ alone, but a strong phenotype appeared in the tcb1Δtcb2Δ combined deletion (as strong as in tcb3Δ alone). The relationship between Tcb1, Tcb2 and Tcb3 indicated by these results is also consistent with the results of the structural analysis in this study. We have shown that Tcb3 physically interacts with Tcb1 and Tcb2 by immunoprecipitation analysis (unpublished). In the future, we plan to investigate the relationship between Tcb proteins in more detail, along with the details of the interactions between Tcb1, Tcb2, and Tcb3.

      3) The quantified lipid data also has several problems. i) The quantified effects are very small. The relative change in lipid levels does not allow any conclusion regarding the phenotypes. What is the change in absolute PHS in the cell? This would be important to know for judging the proposed effects. ii) It seems as if the lipid data is contradictory to the previous study from the lab regarding the role of tricalbins in ceramide transfer. Previously it was shown that ceramides remain unchanged and IPC levels were reduced. This was the rationale for proposing the tricalbins as ceramide transfer proteins between the ER and the mid-Golgi. What could be an explanation for this discrepancy? Does the measurement of PHS after labelling the cells with DHS just reflect differences in the activity of the Sur2 hydroxylase, or does it reflect different steady state levels?

      i) As Reviewer #2 pointed out, the change is slight, but we cannot say that it is insufficient. We have shown that PHS increases by 10-30% depending on the concentration of NaCl that induces vacuole division (this result is related to our answers to the questions from Reviewer #3 below and to the additional data in the new version). This observation supports the possibility that a small increase in PHS levels may have an effect on vacuole fragmentation. We did not analyze the total PHS level using methods such as liquid chromatography-mass spectrometry or ninhydrin staining of TLC-separated total lipids. The reason is that radiolabeling of sphingolipids using the precursor [3H]DHS provides higher sensitivity and makes it easier to detect differences. Moreover, using [3H]DHS labeling, we measure only PHS that is synthesized in the ER and that does not originate from degradation of complex sphingolipids or dephosphorylation of PHS-1P in other organelles.

      ii) In our previous study (Ikeda et al. iScience. 2020), we separated the lipids labeled with [3H]DHS into ceramides and acylceramides. There was no significant change in ceramide levels, but acylceramides increased in tcb1Δ2Δ3Δ. Since we did not separate these lipids in the present study, the data show the total amount of both ceramide and acylceramide. We apologize that the term used in Figure 3A was wrong; we have corrected it. Also, we used [3H]DHS to detect IPC levels here, which differs from the previous analysis, which used [3H]inositol. This means the lipid amounts detected are completely different. Since the amount of inositol incorporated into cells varies from cell to cell, the amount loaded on the TLC plate is adjusted so that the total amount (signal intensity) of radioactively labeled lipids is almost the same. In contrast, for DHS labeling, the amount of DHS attached to the cell membrane is almost the same between cells, so we load the total amount onto the TLC plate without adjustment. In addition, the reduction in IPC levels due to Tcb depletion that we previously reported was seen only in sec12 or sec18 mutant backgrounds, and no reduction in IPC levels was observed in tcb1Δ2Δ3Δ by [3H]inositol labeling (Ikeda et al. iScience. 2020). Therefore, we cannot simply compare the current results with the previous report, because the experimental methods differ.

      The labeling time for [3H]DHS is 3 hours, and we are not measuring steady-state amounts, but rather analyzing metabolic reactions. Since [3H]DHS is converted to PHS by Sur2 hydroxylase in the cell, the possibility that differences in PHS amounts reflect differences in Sur2 hydroxylase activity cannot be ruled out. However, this possibility is highly unlikely since we have previously observed that the distribution of ceramide subclasses is hardly affected by tcb1Δtcb2Δtcb3Δ (Ikeda et al. iScience 2020). We have added to the discussion that the possibility of differences in Sur2 hydroxylase activity cannot be excluded.

      4) Determining the vacuole fragmentation phenotype of a lag1Δlac1Δ double mutant does not allow the conclusion that elevated PHS levels are responsible for the observed phenotype. This just shows that lag1Δlac1Δ cells have fragmented vacuoles. Can the observed phenotype be rescued by treating the cells with myriocin? What is the growth rate of a LAG1 LAC1 double deletion as this strain has been previously reported to be very sick. Similarly, what is the growth phenotype of the various LCB3 LCB4 and LCB5 deletions and its combinations.

      As Reviewer #2 pointed out, the vacuolar fragmentation in lag1Δlac1Δ does not by itself support the conclusion that increased PHS levels are the cause. Since this mutant strain has decreased levels of ceramide and its downstream products IPC/MIPC, in addition to increased levels of the ceramide precursors LCB or LCB-1P, we have changed the manuscript as follows. As noted in the following comment by Reviewer #2, myriocin treatment has been reported to induce vacuolar fragmentation, so we do not believe that rescue experiments with myriocin treatment will lead to the expected results.

      ・Previous Version: We first tested whether increased levels of PHS cause vacuolar fragmentation. Loss of ceramide synthases could cause an increase in PHS levels. Our analysis showed that vacuoles are fragmented in lag1Δlac1Δ cells, which lack both enzymes for LCBs (DHS and PHS) conversion into ceramides (Fig 3B). This suggests that ceramide precursors, LCBs or LCB-1P, can induce vacuolar fragmentation.

      ・Current Version: We first evaluated whether the increases in certain lipids are the cause of vacuolar fragmentation in tcb1Δ2Δ3Δ. Our analysis showed that vacuoles are fragmented in lag1Δlac1Δ cells, which lack both enzymes for LCBs (DHS and PHS) conversion into ceramides (Fig 3B). This suggests that the increases in ceramide and subsequent products IPC/MIPC are not the cause of vacuolar fragmentation, but rather its precursors LCBs or LCB-1P.

      As Reviewer #2 pointed out, the lag1Δlac1Δ double mutant grows very slowly, as shown below (Author response image 1). We also examined the growth phenotype of the LCB3, LCB4, and LCB5 deletion strains and found that their growth was the same as that of the wild-type strain, with no significant differences (Author response image 1).

      Author response image 1.

      Cells (FKY5687, FKY5688, FKY36, FKY37, FKY33, FKY38) were adjusted to OD600 = 1.0, and fivefold serial dilutions were spotted on YPD plates and incubated at 25℃ for 3 days.

      5) The model in Figure 3 E proposes that treatment with PHS accumulates PHS in the endoplasmic reticulum. How do the authors know where exogenously added PHS ends up in the cell? It would also be important to determine the steady state levels of sphingolipids after treatment with PHS. Or in other words, how much PHS is taken up by the cells when 40 µM PHS is added?

      It has been shown that the addition of PHS effectively suppresses the Gas1 trafficking defect (Gaigg et al. J Biol Chem. 2006) and the endocytosis defect (Zanolari et al. EMBO J. 2000) in lcb1-100 mutants. This suppression depends on Lcb3, which is localized to the ER. Thus, we know that PHS added from outside the cell reaches the ER and is functional.

      We also agree that it is important to measure the amount of PHS taken up into the cells. However, this is extremely difficult to do for the following reasons. The majority of PHS added to the medium remains attached to the surface layer of the cells. If we measured the lipids in the cells by MS, we would detect lipids present on both the outside and the inside of the plasma membrane. This means we would need to separate the outside from the inside of the cell's membrane to determine the exact amount of LCB that has been taken up by the cells. Regrettably, this separation is currently technically difficult.

      6) Previous studies have observed that myriocin treatment itself results in vacuolar fragmentation (e.g. Hepowit et al. bioRxiv 2022, Fröhlich et al. eLife 2015). Why do both depletion and accumulation of PHS lead to vacuolar fragmentation?

      We agree with Reviewer #2. Consistent with previous results with myriocin treatment, we also observed vacuolar fragmentation in the lcb1-100 mutant strain. We have therefore added these papers to the references for further discussion. Our discussion is as follows.

      "Previous studies have observed that myriocin treatment results in vacuolar fragmentation (Hepowit et al. bioRxiv 2022; Now published in J Cell Sci. 2023, Fröhlich et al. eLife 2015). Myriocin treatment itself causes not only the depletion of PHS but also of complex sphingolipids such as IPC. This suggests that normal sphingolipid metabolism is important for vacuolar morphology. The reason for this is unclear, but perhaps there is some mechanism by which sphingolipid depletion affects, for example, the recruitment of proteins required for vacuolar membrane fusion. In contrast, our new findings show that both PHS increase and depletion cause vacuole fragmentation. Taken together, there may be multiple mechanisms controlling vacuole morphology and lipid homeostasis by responding to both increasing and decreasing level of PHS."

      7) The experiments regarding the NVJ genes are not conclusive. While the authors mention that a NVJ1/2/3 MDM1 mutant was shown to result in a complete loss of the NVJ, the observed effects cannot be simply correlated. It is also not clear why PHS would be transported towards the vacuole. In the cited study (Girik et al.) the authors show PHS transport from the vacuole towards the ER. Here the authors claim that PHS is transported via the NVJ towards the vacuole. Also, the origin of the rationale of this study is the negative genetic interaction of tcb1/2/3Δ with nvj1Δ. This interaction appears to result in a strong growth defect according to the Developmental Cell paper. What are the phenotypes of the mutants used here? Does the additional deletion of NVJ genes or MDM1 result in stronger growth phenotypes?

      We sincerely appreciate the concerns raised about our research. As Reviewer #2 pointed out, we have not shown evidence in this study that PHS is transported directly from the ER to the vacuole, so it remains unclear whether PHS is transported to the vacuole and what the physiological relevance would be. Girik et al. showed that the NVJ resident protein Mdm1 is important for PHS transport between the vacuole and the ER. Given that the experimental method they applied tracks PHS released in the vacuole, only transport of PHS from the vacuole to the ER was indeed verified. However, assuming that Mdm1 transports PHS along its concentration gradient, we consider that under normal conditions PHS is transported from the ER (as the organelle of PHS synthesis) to the vacuole. We clarified this interpretation by adding the following sentences to the manuscript at line 313:

      “The study applied an experimental method that tracks LCBs released in the vacuole and showed that Mdm1p is necessary for LCBs leakage into the ER. However, assuming that Mdm1p transports LCBs along its concentration gradient we consider that under normal conditions, LCBs is transported from the ER (as the organelle of PHS synthesis) to the vacuole.”

      The negative genetic interaction between tcb1/2/3Δ and nvj1Δ is consistent with this model, but under our culture conditions we did not observe a negative interaction between TCB3 and the genes encoding NVJ proteins (Author response image 2). We do not know whether this is due to strain background or culture conditions, or whether deletion of TCB1 and TCB2 is also required for the negative interaction. We would like to analyze this in detail in the future.

      Author response image 2.

      Cells (FKY3868, FKY5560, FKY6187, FKY6189, FKY6190, FKY6188, FKY6409) were adjusted to OD600 = 1.0, and fivefold serial dilutions were spotted on YPD plates and incubated at 25℃ for 3 days.

      Our results in this study show that deletion of the NVJ component gene partially suppresses vacuolar fission upon the addition of PHS. To clarify these facts, we have changed the sentences in Results and Discussion of our manuscript as follows. We hope that this change will avoid over-interpretation.

      ・Previous: To test the role of NVJ-mediated “transport” for PHS-induced vacuolar fragmentation,

      ・Current: To test the role of NVJ-mediated “membrane contact” for PHS-induced vacuolar fragmentation,

      ・Previous: Taken together, we conclude from these findings that accumulated PHS in tricalbin deleted cells triggers vacuole fission via “non-vesicular transport of PHS” at the NVJ.

      ・Current: Taken together, we conclude from these findings that accumulated PHS in tricalbin deleted cells triggers vacuole fission via “contact between ER and vacuole” at the NVJ.

      ・Previous: Because both PHS- and tricalbin deletion-induced vacuolar fragmentations were partially suppressed by the lack of NVJ (Fig 4B, 4C), it is suggested that transport of PHS into vacuoles via the NVJ is involved in triggering vacuolar fragmentation.

      ・Current: Based on the fact that both PHS- and tricalbin deletion-induced vacuolar fragmentations were partially suppressed by the lack of NVJ (Fig 4B, 4C), it is possible that the trigger for vacuolar fragmentation is NVJ-mediated transport of PHS into the vacuole.

      8) As a consequence of the above points, several results are over-interpreted in the discussion. Most important, it is not clear that indeed the accumulation of PHS causes the observed phenotypes.

      We thank Reviewer #2 for the suggestion. In particular, whether PHS accumulation really causes vacuolar fragmentation could only be verified with an in vitro assay system. This is an important issue to be resolved in the future.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigated the effects of deletion of the ER-plasma membrane/Golgi tethering proteins tricalbins (Tcb1-3) on vacuolar morphology to demonstrate the role of membrane contact sites (MCSs) in regulating vacuolar morphology in Saccharomyces cerevisiae. Their data show that tricalbin deletion causes vacuolar fragmentation possibly in parallel with TORC1 pathway. In addition, their data reveal that levels of various lipids including ceramides, long-chain base (LCB)-1P and phytosphingosine (PHS) are increased in tricalbin-deleted cells. The authors find that exogenously added PHS can induce vacuole fragmentation and by performing analyses of genes involved in sphingolipid metabolism, they conclude that vacuolar fragmentation in tricalbin-deleted cells is due to the accumulated PHS in these cells. Importantly, exogenous PHS- or tricalbin deletion-induced vacuole fragmentation was suppressed by loss of the nucleus vacuole junction (NVJ), suggesting the possibility that PHS transported from the ER to vacuoles via the NVJ triggers vacuole fission.

      This work provides valuable insights into the relationship between MCS-mediated sphingolipid metabolism and vacuole morphology. The conclusions of this paper are mostly supported by their results, but there is concern about physiological roles of tricalbins and PHS in regulating vacuole morphology under known vacuole fission-inducing conditions. That is, in this paper it is not addressed whether the functions of tricalbins and PHS levels are controlled in response to osmotic shock, nutrient status, or ER stress.

      We appreciate the comment, and we consider it an important point. To answer this, we have performed additional experiments. Please refer to the following section, "Recommendations For The Authors", for more details. These results and discussions have also been added to the revised manuscript. We believe this addition makes our findings more comprehensive.

      There is another weakness in their claim that the transmembrane domain of Tcb3 contributes to the formation of the tricalbin complex which is sufficient for tethering ER to the plasma membrane and the Golgi complex. Their claim is based only on the structural simulation, but not on biochemical experiments such as co-immunoprecipitation and pull-down.

      We appreciate your valuable suggestion and would like to attempt to improve upon it in the future.

      Author response to Recommendations:

      The following is the authors' response to the Recommendations For The Authors. We have now incorporated the changes recommended by Reviewers to improve the interpretations and clarity of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors provide additional experimental data to fully support their claims or revise the writing of their manuscript to be more precise in their conclusions. In particular, I have suggestions/questions:

      Fig. 1A: display the results as in 1B (that is, different colors for different number of vacuoles, and the x axes showing the different conditions, in this case WT vs tcb1∆2∆3∆).

      In response to the suggestion of Reviewer #1, we have changed the display of results.

      Fig. S1B: the FM4-64 pattern looks different in the KO strain as compared to those shown in Fig. 1A. Is there a reason for that? Also, no positive control of cps1p not in the vacuole lumen is shown.

      Our apologies; this was probably due to the poor resolution of the images. We have repeated the observations and revised the figure, which now includes the positive control.

      Line 172: the last condition in Fig. 2B (vi), should be compared to the tcb1∆tcb2∆ condition (shown in fig 1).

      In response to the suggestion of Reviewer #1, we have changed the manuscript as follows: We found that cells expressing Tcb3(TM)-GBP and lacking Tcb1p and Tcb2p (Fig 2B (vi)) are even more fragmented than tcb1Δ2Δ in Fig 1B and are fragmented to a similar degree as tcb3Δ (Fig 1B and Fig 2B (ii)).

      Fig 2E: the model shown here can be tested, is there binding (similar to kin recognition mechanism of some Golgi proteins) between the different Tcb TMDs?

      In line with Reviewer #1's suggestion, we have confirmed by co-immunoprecipitation that Tcb3 binds to both Tcb1 and Tcb2 (unpublished). In the future, we will also test whether this binding can be observed with the TMD alone.

      Fig 3A: you measured an increase in PHS that is metabolized from DHS (which is what you label). Are there other routes to produce PHS independently of DHS? I mean, how is the increase reporting on the total levels of this lipid?

      PHS synthesized by Sur2 is converted to PHS-1P and phytoceramide. Conversely, PHS is regenerated by dephosphorylation of PHS-1P via Lcb3 and Ysr3, and by degradation of phytoceramides via Ypc1 (Vilaça, Rita et al. Biochim Biophys Acta Mol Basis Dis. 2017. Fig1). Our analysis shows that these substrates are not decreasing but rather accumulating in the tcb1Δ2Δ3Δ strain, suggesting that the degradation pathways are not responsible for the increased PHS level. Therefore, the increase in detected PHS is most likely due to congestion in metabolic processes downstream of PHS. Possible causes of the disrupted lipid metabolism in Tcb deletion cells are discussed in the Discussion. In brief: (1) reduced activity of the PtdIns4P phosphatase Sac1, due to MCS deficiency between the ER and the PM; (2) impaired nonvesicular transport of ceramide from the ER to the Golgi; and (3) low efficiency of PHS export by Rsb1, due to insufficient PHS diffusion between the ER and the PM.

      Line 248: did the authors test if the NVJ MCS is unperturbed in the triple Tcb KO?

      This is an exciting question. We are very interested in considering whether Tcb deficiency affects NVJ formation in terms of lipid transport. We would like to conduct further analysis in this regard in our future studies.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest carefully evaluating the findings in this manuscript. Right now the connection between elevated PHS levels and vacuolar fragmentation is not really supported by the data. One of the major issues in the field of yeast sphingolipid biology is that quantification of the lipid levels is difficult and labor- and cost-intensive. But I think that it is very important to directly connect phenotypes with the lipid levels.

      Minor points:

      • In figure 1 c and d WT controls of the different treatments are lacking.

      As Reviewer #2 pointed out, the WT controls were missing; we have added these data.

      • The tcb1Δ mutant appears to be sensitive in pH 5.0 media while the triple tricalbins mutant grows fine. Is that a known phenotype?

      We have performed this assay on SD plates. Then, to check whether this phenotype of tcb1Δ was specific or general, we re-analyzed the same strain in YPD medium. In YPD medium, tcb1Δ strain grew normally, while the control, vma3Δ, was still pH sensitive. Therefore, the growth of this tcb1Δ strain is dependent on the nutrient conditions of the medium but does not appear to be pH sensitive. This new data was inserted as part of Supplementary Figure 1.

      • Line 305. There is an "of" in the sentence that needs to be deleted.

      As pointed out by Reviewer #2, we have corrected the sentence.

      Reviewer #3 (Recommendations For The Authors):

      In supplementary Fig 2, the authors show the involvement of the NVJ in hyperosmotic shock-induced vacuole fission, but the involvement of tricalbins and PHS in this process is not tested. Does osmotic shock affect the level or distribution of tricalbins and PHS? They will be able to test whether overexpression of tricalbins inhibits hyperosmotic shock-induced vacuole fission or not. Also, they will be able to perform the similar experiments upon ER stress-induced vacuole fission.

      We thank Reviewer #3 for pointing out the importance of testing the involvement of PHS in hyperosmotic shock- or ER stress-induced vacuole fission. We showed in a previous report that treatment with tunicamycin, an ER stress inducer, increased the PHS level by about 20% (Yabuki et al. Genetics. 2019. Fig4). In addition, we have now tested the effect of hyperosmolarity on PHS levels. Analysis of PHS under hyperosmotic shock conditions (0.2 M NaCl), in which vacuolar fragmentation was observed, showed an increase in PHS of about 10%. Furthermore, when the NaCl concentration was increased to 0.8 M, PHS levels increased by up to 30%. In other words, PHS increases by 10-30% depending on the concentration of NaCl that induces vacuole division. This observation supports the possibility that a small increase in PHS levels may have an effect on vacuole fragmentation. Moreover, NaCl-induced vacuolar fragmentation, like that caused by PHS treatment, was suppressed by overexpression of Rsb1, which exports PHS from the cell.

      These new data are now inserted, commented and discussed in the manuscript as Figure 5. We hope that these results will provide further insight into the more general aspects of PHS involvement in the vacuole fission process.

      Minor points:

      1) It is unclear for me whether endogenous Tcb3 is deleted in cells expressing Tcb3-GBP (FKY3903-3905 and FKY4754). They should clearly mention that these cells do not express endogenous Tcb3 in the manuscript.

      We apologize that our description was not clear. In this strain, endogenous TCB3 gene is tagged with GBP and the original Tcb3 has been replaced by the tagged version. We have changed the description in our manuscript.

      2) The strength of the effect of PHS on vacuole morphology looks different in respective WT cells in Fig 3C, 4B, and S2B. Is this due to the different yeast strains they used?

      Yes, we used BY4742 background for the strain in Figure 3C, SEY6210 background in Figure 4B, and HR background in Figure S2B. As a matter of fact, we observed that the strength of the PHS effect varies depending on their background. Strain numbers are now given in the legend so that the cells used for each data can be referenced in the strain list.

      3) p.3, line 44: the "SNARE" complex (instead of "protease")?

      We thank the reviewer for pointing out the incorrect wording. We have corrected this sentence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article identifies ADGRA3 as a candidate GPCR for mediating beige fat development. The authors use human expression data from the Human Protein Atlas and GTEx databases and combine this with experiments performed in mice and a murine cell line. They refer to a GPCR bioactivity screening tool, PRESTO-Salsa, with which it was found that Hesperetin activates ADGRA3. From their experiments, the authors conclude that Hesperetin activates ADGRA3, inducing a Gs-PKA-CREB axis resulting in adipose thermogenesis.

      Strengths:

      The authors analyze human data from public databases and perform functional studies in mouse models. They identify a new GPCR with a role in the thermogenic activation of adipocytes.

      Weaknesses:

      (1) Selection of ADGRA3 as a candidate GPCR relevant for mediating beiging in humans:

      The authors identify genes upregulated in iBAT compared to iWAT in response to cold, and among these differentially expressed genes, they identify highly expressed GPCRs in human white adipocytes (visceral or subcutaneous). Finally, among these genes, they select a GPCR not previously studied in the literature.

      If the authors are interested in beiging, why do they not focus on genes upregulated in iWAT (the depot where beiging is described to occur in mice), comparing thermoneutral to cold-induced genes? I would expect that genes induced in iWAT in response to cold would be extremely relevant targets for beiging. With their strategy, the authors exclude receptors that are induced in the tissue where beiging is actually described to occur.

      Furthermore, the authors are comparing genes upregulated in cold in BAT (but not WAT) to highly expressed genes in human white adipocytes during thermoneutrality. Overall, the authors fail to discuss the logic behind their strategy and the obvious limitations of it.

      Thank you for your valuable advice. In this study, we focus on genes that exhibited higher expression in BAT than in iWAT under cold stimulation, as these genes might play a role in adipose thermogenesis. Regarding the genes that are upregulated in iWAT following cold stimulation, we did identify other intriguing targets among them in another ongoing study, although these fall outside the scope of this study. Moreover, rather than making a comparison, we intersected the 27 GPCR-coding genes that were highly expressed in BAT compared to iWAT with genes that were highly expressed in human adipocytes (Figure 1C).

      With your suggestions, we realized that the description of the screening strategy in the manuscript was not clear enough, so we made the following supplement:

      “…dataset obtained from the Gene Expression Omnibus (GEO) database. Additionally, we utilized the human subcutaneous adipocytes dataset (Figure 1C, red) and human visceral adipocytes dataset (Figure 1C, purple) from the human protein atlas database to obtain genes that are highly expressed in human white adipocytes. The GSE118849 dataset comprises samples of brown adipose tissue (BAT) and inguinal white adipose tissue (iWAT) obtained from mice subjected to a 72-hour cold exposure at a temperature of 4℃.

      A total of 1134 differentially expressed genes (DEGs) that exhibited up-regulation in BAT compared to iWAT under cold stimulation were identified in the analysis, which might play a role in adipose thermogenesis. These DEGs were further screened to identify highly…”
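
      The screening logic described above amounts to a set intersection, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name and all gene identifiers are hypothetical placeholders, the subcutaneous and visceral lists are assumed to be combined as a union, and the mapping between mouse and human gene symbols is assumed to have been done beforehand.

```python
def screen_candidates(bat_up_degs, gpcr_genes, human_sc_high, human_visc_high):
    """Intersect BAT-enriched GPCR genes with genes highly expressed in human adipocytes."""
    # GPCR-coding genes among the DEGs up-regulated in BAT vs iWAT under cold exposure.
    bat_gpcrs = set(bat_up_degs) & set(gpcr_genes)
    # Assumption: a gene qualifies if it is highly expressed in either human depot.
    human_high = set(human_sc_high) | set(human_visc_high)
    return sorted(bat_gpcrs & human_high)


# Placeholder gene identifiers, invented for illustration only.
candidates = screen_candidates(
    bat_up_degs={"GeneA", "GeneB", "GeneC", "GeneD"},
    gpcr_genes={"GeneB", "GeneC", "GeneD", "GeneE"},
    human_sc_high={"GeneB", "GeneF"},
    human_visc_high={"GeneD"},
)
print(candidates)
```

      Replacing the union with an intersection of the two human lists would give a stricter screen; which of the two matches Figure 1C is not stated in the text.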

      (2) Relevance of ADGRA3 and comparison to established literature:

      There has been a lot of literature and discussion about which receptor should be targeted in humans to recruit thermogenic fat. The current article unfortunately does not discuss this literature nor explain how it relates to their findings. For example, O'Mara et al (PMID: 31961826) demonstrated that chronic stimulation with the B3 adrenergic agonist, Mirabegron, resulted in the recruitment of thermogenic fat and improvement in insulin sensitivity and cholesterol. Later, Blondin et al (PMID: 32755608), highlighted the B2 adrenergic receptor as the main activation path of thermogenic fat in humans. There is also a recent report on an agonist activating B2 and B3 simultaneously (PMID: 38796310). Thus, to bring the literature forward, it would be beneficial if the current manuscript compared their identified activation path with the activation of these already established receptors and discussed their findings in relation to previous studies.

      Thank you for your suggestion. We have included a supplementary discussion on the relevant human adipose thermogenic receptors in the discussion section, as presented below:

      “The induction of beige fat has been investigated as a potentially effective therapeutic approach in combating obesity [23]. A clinical trial revealed that treatment with the chronic β3-AR agonist mirabegron leads to an increase in human brown fat, HDL cholesterol, and insulin sensitivity [24]. Subsequently, Blondin et al discovered that oral administration of mirabegron only elicits an increase in BAT thermogenesis when administered at the maximal allowable dose, indicating that human brown adipocyte thermogenesis is primarily driven by β2-adrenoceptor (β2-AR) stimulation [11]. Consistent with this finding, we found much higher levels of ADRB2 expression in human white adipose tissue than ADRB3 (Figure S1E). Furthermore, a recent study has demonstrated that simultaneous activation of β2-AR and β3-AR enhances whole-body metabolism through beneficial effects on skeletal muscle and BAT [25].”

      In Figures 1d and e, the authors show the expression of ADGRA3 in comparison to the expression of ADRB3. In human brown adipocytes, ADRB2 has been shown to be the main receptor through which adrenergic activation occurs (PMID: 32755608), thus authors should show the relative expression of this gene as well.

      We wholeheartedly endorse the proposal to augment the ADRB2 expression data in Figures 1D and E. However, it is regrettable to note that the pertinent databases (PRJNA66167 and PRJEB4337) are deficient in ADRB2 expression information. Fortunately, the GTEx database houses the ADRB2 expression data. Consequently, we have integrated these crucial data into Figure S1E.

      (3) Strategy to investigate the role of ADGRA3 in WAT beiging:

      Having identified ADGRA3 as their candidate receptor, the authors proceed with investigations of this receptor in mouse models and the murine inguinal adipocyte cell line 3T3.

      First of all, in Figure 1D, the authors show a substantially lower expression of ADGRA3 compared to ADRB3. It could thus be argued that a mouse would not be the best model system for studying this receptor. It would be interesting to see data from experiments in human adipocytes.

      Thanks for your helpful advice. We induced human adipose-derived mesenchymal stem cells (hADSCs) into adipocytes to evaluate the effect of ADGRA3 on human adipocytes (Figure 8).

      Moreover, if the authors are interested in inducing beiging, why do they show expression in iBAT and not iWAT?

      Maybe the description of this article wasn't clear enough, but we did show the expression and effects of ADGRA3 in iWAT and BAT (Author response image 1, Figure 3F-J and Figure 4F-J).

      Author response image 1.

      The authors perform in vivo experiments using intraperitoneal injections of shRNA or overexpression CMV-driven vectors and report effects on body temperature and glucose metabolism. It is here important to note that ADGRA3 is not uniquely expressed in adipocytes. A major advantage of databases like the Human Protein Atlas and GTEx is that they give an overview of the gene expression across tissues and cell types. When looking up ADGRA3 in these databases, it is expressed in subcutaneous and visceral adipocytes. However, other cell types and tissues demonstrate an even higher expression. In the Human Protein Atlas, the enhanced cell types are astrocytes and hepatocytes. In the GTEx database, the tissues with the highest expression are brain, liver, and thyroid.

      With this information in mind, IP injections for modification of ADGRA3 receptor expression could be expected to affect any of these tissues and cells.

      The manuscript reports changes in body temperature. However, temperature is regulated by the brain and also affected by thyroid activity. Did the authors measure the levels of circulating thyroid hormones? Gene expression changes in the brain? The authors report that Adgra3 overexpression decreased the TG level in serum and liver. The liver could be the primary targeted organ here, and the adipose effects might be secondary. The data would be easier to interpret if authors reported the effects on the liver, thyroid, and brain, and the gene expression across tissues should be discussed in the article.

      Thank you for your valuable advice. We have added results on the effect of local BAT injection of Adgra3 OE on thermogenic genes (Figures S5G-H), the levels of circulating thyroid hormones (Figures S2H, S4F and S5B), and the effects of Adgra3 overexpression/knockdown on Adgra3 expression levels in multiple tissues (Figures S2A-B and S4B-C). We also discuss these points in the article, as follows:

      “Given the consideration that the non-targeted nanoparticle approach utilized in this study for modulating Adgra3 expression levels in vivo alters Adgra3 expression in tissues beyond adipose tissue (Figures S2A-B and S4B-C), notably the liver and skeletal muscle, the construction of Adgra3 adipose tissue-specific knockout/overexpression mouse models is imperative for a more nuanced understanding of the precise mechanisms underlying the influence of ADGRA3 on adipose thermogenesis. We will employ more sophisticated models in subsequent studies to further elucidate the effects of ADGRA3 on adipose thermogenesis and metabolic homeostasis. Nevertheless, our findings underlie a potential therapeutic feature of…”

      Finally, the identification of Hesperetin using the PRESTO-Salsa tool, and how specific the effect of Hesperetin is on ADGRA3, is currently unclear. This should be better discussed, and authors should consider measuring the established effects of Hesperetin in their model systems, including apoptosis.

      Thanks for your suggestion. We have further discussed the relevant content and added it in the discussion section as follows:

      “Previously, the influence of hesperetin on ADGRA3 has remained unreported. In this study, we screened hesperetin as a potential agonist for ADGRA3 by using the PRESTO-Salsa tool and found, through a series of experiments, that hesperetin has an agonist effect on ADGRA3. This study focuses on the regulatory effect of hesperetin on adipose thermogenesis and explores whether this effect is dependent upon ADGRA3. As such, we refrained from conducting further investigations into other potential effects of hesperetin, including its potential antioxidant activity and its role in apoptosis.”

      Reviewer #2 (Public Review):

      Based on bioinformatics and expression analysis using mouse and human samples, the authors claim that the adhesion G-protein coupled receptor ADGRA3 may be a valuable target for increasing thermogenic activity and metabolic health. Genetic approaches to deplete ADGRA3 expression in vitro resulted in reduced expression of thermogenic genes including Ucp1, reduced basal respiration, and metabolic activity as reflected by reduced glucose uptake and triglyceride accumulation. In line, nanoparticle delivery of shAdgra3 constructs is associated with increased body weight, reduced thermogenic gene expression in white and brown adipose tissue (WAT, BAT), and impaired glucose and insulin tolerance. On the other hand, ADGRA3 overexpression is associated with an improved metabolic profile in vitro and in vivo, which can be explained by increasing the activity of the well-established Gs-PKA-CREB axis. Notably, a computational screen suggested that ADGRA3 is activated by hesperetin. This metabolite is a derivative of the major citrus flavonoid hesperidin and has been described to promote metabolic health. Using appropriate in vitro and in vivo studies, the authors show that hesperetin supplementation is associated with increased thermogenesis, UCP1 levels in WAT and BAT, and improved glucose tolerance, an effect that was attenuated in the absence of ADGRA3 expression.

      Overall, the data suggest that ADGRA3 is a constitutively active Gs-coupled receptor that improves metabolism by activating adaptive thermogenesis in WAT and BAT. The conclusions of the paper are partly supported by the data, but some experimental approaches need further clarification.

      (1) The in vivo approaches to modulate Adgra3 expression in mice are carried out using non-targeted nanoparticle-based approaches. The authors do not provide details of the composition of the nanomaterials, but it is highly likely that other metabolically active organs such as the liver are targeted. This is critical because Adgra3 is expressed in many organs, including the liver, adrenal glands, and gastrointestinal system. Therefore, many of the observed metabolic effects could be indirect, for example by modulating bile acids or corticosterone levels. Consistent with this, after digestion in the gastrointestinal tract, hesperetin is rapidly metabolized in intestinal and liver cells. Thus, hesperetin levels in the systemic circulation are likely to be insufficient to activate Adgra3 in thermogenic adipocytes/precursors. Overall, the authors need to repeat the key metabolic experiments in adipose-specific Adgra3 knockout/overexpression models to validate the reliability of the in vivo results. In addition, to validate the relevance of hesperetin supplementation for adaptive thermogenesis in BAT and WAT in vivo, the levels of hesperetin present in the systemic circulation should be quantified.

      Thank you for your valuable advice. Unfortunately, we could not quantify the hesperetin concentration in the systemic circulation because the serum of hesperetin-treated mice had already been used to quantify serum insulin, fT4 and TG. In line with your other suggestions, we have added results on the effect of local BAT injection of Adgra3 OE on thermogenic genes (Figures S5G-H), the levels of circulating thyroid hormones (Figures S2H, S4F and S5B), and the effects of Adgra3 overexpression/knockdown on Adgra3 expression levels in multiple tissues (Figures S2A-B and S4B-C). We also discuss these points in the article, as follows:

      “Given the consideration that the non-targeted nanoparticle approach utilized in this study for modulating Adgra3 expression levels in vivo alters Adgra3 expression in tissues beyond adipose tissue (Figures S2A-B and S4B-C), notably the liver and skeletal muscle, the construction of Adgra3 adipose tissue-specific knockout/overexpression mouse models is imperative for a more nuanced understanding of the precise mechanisms underlying the influence of ADGRA3 on adipose thermogenesis. We will employ more sophisticated models in subsequent studies to further elucidate the effects of ADGRA3 on adipose thermogenesis and metabolic homeostasis. Nevertheless, our findings underlie a potential therapeutic feature of…”

      (2) Standard measurements for energy balance are not presented. Quantitative data on energy expenditure, e.g. by indirect calorimetry, and food intake are missing and need to be included to validate the authors' claims.

      We are in full agreement with your proposal. Regrettably, owing to the constraints of our experimental facilities, we are presently unable to obtain quantitative data on the energy expenditure of the animals. However, we believe that the present results partially support the idea that ADGRA3 promotes energy metabolism. The effects of ADGRA3 on food intake are shown in Figures S2C and S5A.

      (3) The thermographic images used to determine the BAT temperature are not very convincing. The distance and angle between the thermal camera and the BAT have a significant effect on the determination of the temperature, which is not taken into account, at least in the images presented.

      Thank you very much for pointing out the gap in our method description. Following published methods (Xia, Bo et al. PLoS biology. 2020. doi:10.1371/journal.pbio.3000688) and (Warner, Amy et al. PNAS. 2013. doi:10.1073/pnas.1310300110), all representative infrared images of mice within the same batch were captured using a thermal imaging camera (FLIR ONE PRO) at the same distance, perpendicular to the plane on which the mice were located. We have added this description to the Materials and Methods section, as shown below:

      “2.20. Infrared Thermography.

      BAT temperature was measured at room temperature by infrared thermography according to previous publications [22, 23]. The same batch of representative infrared images of mice were all captured using a thermal imaging camera (FLIR ONE PRO), measured at the same distance perpendicular to the plane on which the mice were located. To quantify interscapular region temperature, the average surface temperature from a region of the interscapular BAT was taken with FLIR Tools software.”
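      The quantification step described above, averaging the surface temperature over a region of the interscapular BAT, can be illustrated with a minimal sketch. This is not the FLIR Tools implementation, which is not described in the response. The function name `roi_mean`, the rectangular region, and the small temperature grid are all hypothetical and serve only to show what a region-averaged temperature means.

```python
def roi_mean(temps, top, left, height, width):
    """Return the mean temperature inside a rectangular region of interest.

    temps is a 2D list of per-pixel temperatures in degrees Celsius
    (hypothetical data; a real thermal image would be exported from the
    camera software). The rectangle is an assumption made for simplicity.
    """
    rows = temps[top:top + height]
    cells = [t for row in rows for t in row[left:left + width]]
    if not cells:
        raise ValueError("region of interest is empty")
    return sum(cells) / len(cells)


# Hypothetical 3x3 thermal frame with a warmer patch in the lower right.
frame = [
    [30.0, 30.0, 30.0],
    [30.0, 36.0, 38.0],
    [30.0, 34.0, 36.0],
]
bat_temperature = roi_mean(frame, top=1, left=1, height=2, width=2)
```

      Because the result depends on which pixels fall inside the region, keeping the camera distance and angle fixed, as the methods text states, keeps the region comparable between animals.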

      (4) The 3T3-L1 cell line is not an adequate cell culture model to study thermogenic adipocyte differentiation. To validate their results, the key experiments showing that ADGRA3 expression modulates thermogenic marker expression in a hesperetin-dependent manner need to be performed in a reliable model, e.g. primary murine adipocytes.

      Inducing the 3T3-L1 cell line into white adipocytes is indeed not suitable for studying thermogenic adipocyte differentiation. However, following previous studies (Wei, Gang et al. Cell metabolism. 2021. doi:10.1016/j.cmet.2021.08.012) and (Bae IS, Kim SH. Int J Mol Sci. 2019. doi:10.3390/ijms20246128), the 3T3-L1 cell line was differentiated into beige-like adipocytes in this study, and many studies consider this method suitable for studying the thermogenic effect of adipocytes in vitro. In addition, we provided a more detailed description of the induction of beige-like adipocytes from 3T3-L1 cells in the Materials and Methods section and induced human adipose-derived stem cells (hADSCs) into adipocytes to evaluate the effect of ADGRA3 on human adipocytes (Figure 8).

      “…supplemented with 10% FBS. Confluent 3T3-L1 pre-adipocytes were induced into mature beige-like adipocytes with 0.5 mM isobutyl methylxanthine (IBMX), 1 μM dexamethasone, 5 μg/ml insulin, 1 nM 3,3',5-Triiodo-L-thyronine (T3), 125 μM indomethacin and 1 μM rosiglitazone in high-glucose DMEM containing 10% FBS for 2 days, then treated with high-glucose DMEM containing 5 μg/ml insulin, 1 nM T3, 1 μM rosiglitazone and 10% FBS for 6 days and cultured with high-glucose DMEM containing 10% FBS for 2 days. hADSCs were seeded on plates coated with 0.1% gelatin, then cultured and grown to confluence in human mesenchymal stem cells (hMSCs) specialized culture medium (ZQ-1320). Confluent hADSCs were induced into mature human adipocytes with adipogenic induction medium (PCM-I-004) according to the manufacturer’s instructions.”

      (5) The experimental setup only allows the measurement of basal cellular respiration. More advanced approaches are needed to define the contribution of ADGRA3 versus classical adrenergic receptors to UCP1-dependent thermogenesis.

      Thanks for your suggestion. The maximum oxygen consumption rate of the cells was also measured (Figures 2G and 2N) by adding FCCP, an uncoupler of oxidative phosphorylation (OXPHOS) in mitochondria.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Zhao et al. explored the function of adhesion G protein-coupled receptor A3 (ADGRA3) in thermogenic fat biology.

      Strengths:

      Through both in vivo and in vitro studies, the authors found that gain of function of ADGRA3 leads to browning of white fat and ameliorates insulin resistance.

      Weaknesses:

      There are several weak methodologies, such as using 3T3-L1 adipocytes and intraperitoneal (i.p.) injection of virus. Moreover, as the authors stated that ADGRA3 is constitutively active, how could the authors then identify a chemical ligand?

      (1) Primary cultured cells should be used to perform gain- and loss-of-function analysis of ADGRA3, instead of using 3T3-L1. It is impossible to detect Ucp1 expression in 3T3-L1 cells.

      It is indeed difficult to detect UCP1 expression when the 3T3-L1 cell line is induced into white adipocytes. However, following previous studies (Wei, Gang et al. Cell metabolism. 2021. doi:10.1016/j.cmet.2021.08.012) and (Bae IS, Kim SH. Int J Mol Sci. 2019. doi:10.3390/ijms20246128), the 3T3-L1 cell line was differentiated into beige-like adipocytes in this study, and many studies consider this method suitable for studying the thermogenic effect of adipocytes in vitro. In addition, we provided a more detailed description of the induction of beige-like adipocytes from 3T3-L1 cells in the Materials and Methods section and induced human adipose-derived stem cells (hADSCs) into adipocytes to evaluate the effect of ADGRA3 on human adipocytes (Figure 8).

      “…supplemented with 10% FBS. Confluent 3T3-L1 pre-adipocytes were induced into mature beige-like adipocytes with 0.5 mM isobutyl methylxanthine (IBMX), 1 μM dexamethasone, 5 μg/ml insulin, 1 nM 3,3',5-Triiodo-L-thyronine (T3), 125 μM indomethacin and 1 μM rosiglitazone in high-glucose DMEM containing 10% FBS for 2 days, then treated with high-glucose DMEM containing 5 μg/ml insulin, 1 nM T3, 1 μM rosiglitazone and 10% FBS for 6 days and cultured with high-glucose DMEM containing 10% FBS for 2 days. hADSCs were seeded on plates coated with 0.1% gelatin, then cultured and grown to confluence in human mesenchymal stem cells (hMSCs) specialized culture medium (ZQ-1320). Confluent hADSCs were induced into mature human adipocytes with adipogenic induction medium (PCM-I-004) according to the manufacturer’s instructions.”

      (2) For virus treatment, the authors should consider performing local tissue injection, rather than IP injection. If it is IP injection, have the authors checked other tissues to validate whether the phenotype is fat-specific?

      Thank you for your valuable advice. We supplemented the results of the effect of local BAT injection of Adgra3 OE on thermogenic genes (Figures S5G-H) and the effects of Adgra3 overexpression/knockdown on Adgra3 expression levels (Figures S2A-B and S4B-C) in other tissues.

      (3) The authors should clarify how constitutively active GPCR needs further ligands.

      Thank you for your suggestion. In fact, we only identified hesperetin as a potential agonist of ADGRA3 rather than a ligand. The results also indicate that overexpression of ADGRA3 without additional hesperetin is sufficient to activate downstream PKA signaling pathways through constitutive activity (Figure 5). Recently, Chen et al identified oleoylethanolamide (OEA) as a potential endogenous agonist of GPR3, which is also a constitutively active GPCR. Overall, the high constitutive activity of constitutively active GPCRs arises from the combined effects of stimulation by endogenous agonists and their basal coupling with Gs.

      As for why we screened and identified potential agonists of ADGRA3, we hope to find a route to clinical application that is more convenient than gene overexpression, as described in the article:

      “Considering the difficulty of overexpressing ADGRA3 in clinical application, hesperetin was screened as a potential agonist of ADGRA3 by PRESTO-Salsa database (Figure 6A). The…”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The title appears to be overstated as no clinical trials were performed and experiments were not even performed in human brown adipocytes.

      Thank you for your critical suggestion. We have added experimental results from human adipocytes (Figure 8) and revised the title to “Constitutively active receptor ADGRA3 signaling induces adipose thermogenesis”.

      Please specify n-number and what are replicates or independent experiments. Please also state if any outliers were excluded and why.

      Thanks for your valuable suggestion. We have added the n-number to the Figure legends section, and the number of independent experiments and the exclusion criteria for outliers to the Materials and Methods section, as follows:

      “…of tissue samples. Cohorts of ≥4 mice per genotype or treatment were assembled for all in vivo studies. All in vivo studies were repeated 2-3 independent times. All procedures related to…”

      “…μM H-89) was added to 3T3-L1 mature beige-like adipocytes for 48 hours. All in vitro studies were repeated 2-3 independent times.”

      “All data are presented as mean ± SEM. In this study, outliers that met the three-sigma rule were excluded from analysis, with the exception of those presented in Figure S1E. Given the possibility that the outliers in Figure S1E represent extreme expressions of the inherent variability within the population sample, we have chosen to retain these specific outliers for further analysis. Student’s t-test was used to compare two groups. One-way analysis of…”
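      The statistical handling quoted above (three-sigma exclusion followed by mean ± SEM) can be sketched as follows. This is an illustration only, not the authors' analysis code. It assumes a single pass using the sample standard deviation, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def exclude_three_sigma(values):
    """Drop values lying more than three sample SDs from the mean.

    Single pass, sample SD (n - 1 denominator); both are assumptions.
    Note: with the sample SD, no point can lie further than
    (n - 1) / sqrt(n) SDs from the mean, so nothing can be flagged
    unless there are at least 11 values.
    """
    if len(values) < 3:
        return list(values)
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return list(values)
    return [v for v in values if abs(v - centre) <= 3 * spread]


def mean_sem(values):
    """Return (mean, standard error of the mean) for at least two values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical measurements with one extreme value.
data = [10, 11, 9, 10, 11, 9, 10, 11, 9, 10, 11, 9, 10, 10, 50]
cleaned = exclude_three_sigma(data)
average, sem = mean_sem(cleaned)
```

      The sketch makes the order of operations explicit: exclusion is applied first, and the mean and SEM are then computed on the retained values.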

      Authors use Infrared Thermography to measure body temperature. Depending on the distance between the mouse and the camera, the mouse needs to be at the same spot.

      Thank you very much for pointing out the gap in our method description. Following published methods (Xia, Bo et al. PLoS biology. 2020. doi:10.1371/journal.pbio.3000688) and (Warner, Amy et al. PNAS. 2013. doi:10.1073/pnas.1310300110), all representative infrared images of mice within the same batch were captured using a thermal imaging camera (FLIR ONE PRO) at the same distance, perpendicular to the plane on which the mice were located. We have added this description to the Materials and Methods section, as shown below:

      “2.20. Infrared Thermography.

      BAT temperature was measured at room temperature by infrared thermography according to previous publications [22, 23]. The same batch of representative infrared images of mice were all captured using a thermal imaging camera (FLIR ONE PRO), measured at the same distance perpendicular to the plane on which the mice were located. To quantify interscapular region temperature, the average surface temperature from a region of the interscapular BAT was taken with FLIR Tools software.”

      Please discuss the limitations of the experiments and discuss the relevant literature.

      Thanks for your recommendations. We discussed the limitations of the experiments and the relevant literature in the discussion section, as follows:

      “The induction of beige fat has been investigated as a potentially effective therapeutic approach in combating obesity [23]. A clinical trial revealed that treatment with the chronic β3-AR agonist mirabegron leads to an increase in human brown fat, HDL cholesterol, and insulin sensitivity [24]. Subsequently, Blondin et al discovered that oral administration of mirabegron only elicits an increase in BAT thermogenesis when administered at the maximal allowable dose, indicating that human brown adipocyte thermogenesis is primarily driven by β2-adrenoceptor (β2-AR) stimulation [11]. Consistent with this finding, we found much higher levels of ADRB2 expression in human white adipose tissue than ADRB3 (Figure S1E). Furthermore, a recent study has demonstrated that simultaneous activation of β2-AR and β3-AR enhances whole-body metabolism through beneficial effects on skeletal muscle and BAT [25].”

      “Given the consideration that the non-targeted nanoparticle approach utilized in this study for modulating Adgra3 expression levels in vivo alters Adgra3 expression in tissues beyond adipose tissue (Figures S2A-B and S4B-C), notably the liver and skeletal muscle, the construction of Adgra3 adipose tissue-specific knockout/overexpression mouse models is imperative for a more nuanced understanding of the precise mechanisms underlying the influence of ADGRA3 on adipose thermogenesis. We will employ more sophisticated models in subsequent studies to further elucidate the effects of ADGRA3 on adipose thermogenesis and metabolic homeostasis. Nevertheless, our findings underlie a potential therapeutic feature of…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      Casas-Tinto et al. present convincing data that injury of the adult Drosophila CNS triggers transdifferentiation of glial cells and even the generation of neurons from glial cells. This observation opens up the possibility of getting a handle on the molecular basis of neuronal and glial generation in the vertebrate CNS after traumatic injury caused by stroke or crush injury. The authors use an array of sophisticated tools to follow the development of glial cells at the injury site in very young and mature adults. The results in mature adults reveal a remarkable plasticity in the fly CNS and dispel the notion that repair after injury may be only possible in nerve cords which are still developing. The observation of so-called VC cells which do not express the glial marker repo could point to the generation of neurons by former glial cells.

      Conclusion:

      The authors present an interesting story that is technically sound and could form the basis for an in-depth analysis of the molecular mechanism driving repair after brain injury in Drosophila and vertebrates.

      Strengths:

      The evidence for transdifferentiation of glial cells is convincing. In addition, the injury to the adult CNS shows an inherent plasticity of the mature ventral nerve cord which is unexpected.

      Weaknesses:

      Traumatic brain injury in Drosophila has been previously reported to trigger mitosis of glial cells and generation of neural stem cells in the larval CNS and the adult brain hemispheres. Therefore this report adds to but does not significantly change our current understanding. The origin and identity of VC cells is unclear.

      The Reviewer correctly points out that traumatic brain injury has been reported to trigger the generation of neural stem cells. However, according to previous reports, those cells were quiescent Dpn+ neuroblasts. We now report that already differentiated adult neuropil glia transdifferentiate into neurons, which is a new mechanism not previously reported.

      We agree with the reviewer regarding the identity of VC neurons, although according to the results of the G-TRACE experiments their origin is clear: they originate from neuropil glia (i.e. astrocyte-like glia and ensheathing glia). We have used a battery of antibodies previously reported to identify specific subtypes of neurons to identify these newly generated neurons (Figure S1). We did not find any neuronal marker other than Elav that co-localizes with VC cells.

      Reviewer #2:

      Summary:

      Casas-Tinto et al. provide new insight into glial plasticity using a crush injury paradigm in the ventral nerve cord (VNC) of adult Drosophila. The authors find that both astrocyte-like glia (ALG) and ensheathing glia (EG) divide under homeostatic conditions in the adult VNC and identify ALG as the glial population that specifically ramps up proliferation in response to injury, whereas the number of EGs decreases following the insult. Using lineage-tracing tools, the authors interestingly observe the interconversion of glial subtypes, especially of EGs into ALGs, which occurs independent of injury and is dependent on the availability of the transcription factor Prospero in EGs, adding to the plasticity observed in the system. Finally, when tracing the progeny of differentiated glia, Casas-Tinto and colleagues detect cells of neuronal identity and provide evidence that such glia-derived neurogenesis is specifically favored following ventral nerve cord injury, which puts forward a remarkable way in which glia can respond to neuronal damage.

      Numerous experiments have been carried out in 7-day-old flies, showing that the observed plasticity is not due to residual developmental remodeling or a still immature VNC.

      By elegantly combining different genetic tools, the authors show glial divisions with mitotic-dependent tracing and find that the number of generated glia is refined by apoptosis later on.

      The work identifies Prospero in glia as an important coordinator of glial cell fate, from development to the adult context, which draws further attention to the upstream regulatory mechanisms.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Weaknesses:

      Although the authors do use a variety of methods to show glial proliferation, the EdU data (Figure 1B) could be more informative by displaying images of non-injured animals and providing quantifications or the mention of these numbers based on results previously acquired in the system.

      We appreciate the Reviewer’s comment. We believed that adding images of non-injured animals would not add new information, as we already quantified the increase in glial proliferation upon injury in Losada-Perez et al. 2021. Moreover, the purpose of this experiment was to determine whether the dividing cells were astrocyte-like glia, not to count the dividing cells. Comparing independent experiments can be tricky, but if we compare the quantifications of G2-M glia (repo>fly-Fucci) in Losada-Perez et al. 2021 (Fig. 1C) with the quantifications of G2-M neuropil glia in this work (Fig. 1C), we can see that the numbers are comparable.

      The experiments relying on the FUCCI cell cycle reporter suggested considerable baseline proliferation for EGs and ALGs, but when using an independent method (Twin Spot MARCM), mitotic marking was only detected for ALGs. This discrepancy could be addressed by assessing the co-localization of the different glia subsets using the identified driver lines with mitotic markers such as PH3.

      In our understanding, this discrepancy could be explained by the magnitude of proliferation. The lower proliferation rate of EG (as indicated by the Fly-FUCCI experiments), combined with the incomplete efficiency of MARCM clone induction, considerably reduces the chances of finding EG MARCM clones. PH3 is a mitotic marker, but it is also found in apoptotic cells (Kim and Park 2012. DOI: 10.1371/journal.pone.0044307). Nevertheless, we stained injured VNCs with anti-PH3 and found ALG cells positive for PH3 (Author response image 1).

      Author response image 1.

       

      The data in Figure 1C would be more convincing in combination with images of the FUCCI Reporter as it can provide further information on the location and proportion of glia that enter the cell cycle versus the fraction that remains quiescent.

      We added a Figure 1 V2 (version 2) with the suggested images (1-C’).

      The analyses of inter-glia conversion in Figure 3 are complicated by the fact that Prospero RNAi is used to suppress EG-to-ALG conversion while Prospero also serves as a marker to establish ALG nature. Clarification of whether the GFP+ cells still expressed Pros or were classified as NP-like GFP cells is required here.

      As described in the text, Pros is a marker for ALG, and the results suggest that Prospero expression is required for the EG-to-ALG transition. We clarified these concepts in the text accordingly. In Figure 3 we show images of NP-like cells originating from EG that are Prospero+, supporting the transdifferentiation from EG to ALG.

      The conclusion that ALG and EG glial cells can give rise to cells of neuronal lineage is based on glial lineage information (GFP+ cells from glial G-trace) and staining for the neuronal marker Elav. The use of other neuronal markers apart from Elav or morphological features would provide a more compelling case that GFP+ cells are mature neurons.

      We completely agree with the reviewer's observation regarding the identity of VC neurons. We have used a battery of antibodies previously reported to identify specific subtypes of neurons to identify these newly generated neurons (Figure S1). We did not find any neuronal marker other than Elav that colocalizes with VC cells.

      Although the text discusses in which contexts, glial plasticity is observed or increased upon injury, the figures are less clear regarding this aspect. A more systematic comparison of injured VNCs versus homeostatic conditions, combined with clear labelling of the injury area would facilitate the understanding of the panels.

      We appreciate the Reviewer’s observation. We have carefully checked all figures and labelled them as “Injured” or “Not Injured”. We added a Figure 2-V2 and a Figure 4-V2.

      Context/Discussion

      The study finds that glia in the ventral cord of flies have latent neurogenic potential. Such observations have not been made regarding glia in the fly brain, where injury is reported to drive glial divisions or the proliferation of undifferentiated progenitor cells with neurogenic potential.

      Discussing this different strategy for cell replacement adopted by glia in the VNC and pointing out differences to other modes seems fascinating. Highlighting differences in the reactiveness of glia in the VNC compared to the brain also seems highly relevant as they may point to different properties to repair damage.

      Based on the assays employed, the study points to a significant amount of glial "identity" changes or interconversions, which is surprising under homeostatic conditions. The significance of this "baseline" plasticity remains undiscussed, although glia unarguably show extensive adaptations during nervous system development.

      It would be interesting to know if the "interconversion" of glia is determined by the needs in the tissue or would shift in the context of selective ablation/suppression of a glial type.

      We deeply appreciate the Reviewer’s enthusiasm for this subject; it is indeed fascinating. We kept the discussion brief to meet the eLife Short Report requirements, but the specific conditions that trigger glial interconversion are of great interest to us. Compromising EG or ALG viability and evaluating the behaviour of glial cells is of great interest for developmental biology and regeneration, but the precise scenario in which to perform these experiments is not well defined. In this report, we aim to reproduce an injury in the Drosophila brain, and this model should serve to analyze cellular behaviours. The scenario in which we deplete one specific subpopulation of glial cells is conceptually attractive, but beyond the scope of this report.

      Reviewer #3:

      In this manuscript, Casas-Tintó et al. explore the role of glial cells in the response to a neurodegenerative injury in the adult brain. They used Drosophila melanogaster as a model organism and found that glial cells are able to generate new neurons through the mechanism of transdifferentiation in response to injury.

      This paper provides a new mechanism in regeneration and gives an understanding of the role of glial cells in the process.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this elegant and thorough study, Sánchez-León et al. investigate the effects of tDCS on the firing of single cerebellar neurons in awake and anesthetized mice. They find heterogeneous responses depending on the orientation of the recorded Purkinje cell.

      Strengths:

      The paper is important in that it may well explain part of the controversial and ambiguous outcomes of various clinical trials. It is a well-written paper on a deeply analyzed dataset.

      We sincerely thank Reviewer #1 for their positive feedback and insightful comments. We are pleased to know that you found our study elegant and thorough, and we appreciate your recognition of its potential to clarify the controversial and ambiguous outcomes seen in various clinical trials. Your acknowledgment of the depth of our analysis and the clarity of the writing is highly encouraging, and we are grateful for your thoughtful evaluation of our work.

      Weaknesses:

      The sample size could be increased for some of the experiments.

      We sincerely thank the reviewer for their thoughtful suggestion to increase the sample size. While we understand the importance of this consideration, we believe it is not feasible at this stage due to several factors. First, the complexity of our experiments, which include single-neuron recordings in awake animals during electric field application, juxtacellular neurobiotin injections post-tDCS (with a low success rate), and high-density recordings from Purkinje cells across different layers in awake animals, significantly limits the throughput of data collection. Second, the statistical outcomes obtained from our analyses, which combine multiple techniques, are robust and provide a strong basis for our conclusions. Third, the current study already involves a substantial number of animals (74 mice), which aligns with ethical considerations for minimizing animal use while ensuring robust results.

      We believe that the current sample size is sufficient to support the findings presented in the manuscript. Expanding the sample size further would require considerable additional resources and time, without a clear indication that it would fundamentally alter the conclusions of the study. We are grateful for the reviewer’s understanding of these limitations and their acknowledgment of the value of the current dataset.

      Reviewer #2 (Public review):

      Summary:

      In this study by Sánchez-León and colleagues, the authors attempted to determine the influence of neuronal orientation on the efficacy of cerebellar tDCS in modulating neural activity. To do this, the authors made recordings from Purkinje cells, the primary output neurons of the cerebellar cortex, and determined the inter-dependency between the orientation of these cells and the changes in their firing rate during cerebellar tDCS application.

      Strengths:

      (1) A major strength is the in vivo nature of this study. Being able to simultaneously record neural activity and apply exogenous electrical current to the brain during both an anesthetized state and during wakefulness in these animals provides important insight into the physiological underpinnings of tDCS.

      (2) The authors provide evidence that tDCS can modulate neural activity in multiple cell types. For example, there is a similar pattern of modulation in Purkinje cells and non-Purkinje cells (excitatory and inhibitory interneurons). Together, these data provide holistic insight into how tDCS can affect activity across different populations of cells, which has important implications for basic neuroscience, but also clinical populations where there may be non-uniform or staged effects of neurological disease on these various cell types.

      (3) There is a systematic investigation into the effects of tDCS on neural activity across multiple regions of the cerebellum. The authors demonstrate that the pattern of modulation is dependent on the target region. These findings have important implications for determining the expected neuromodulatory effects of tDCS when applying this technique over different target regions noninvasively in animals and humans.

      We sincerely thank Reviewer #2 for their detailed and thoughtful comments on our study. We are pleased that you recognized the importance of our in vivo approach, allowing for simultaneous neural recordings and tDCS application in both anesthetized and awake states. Your acknowledgment of our findings regarding the modulation of neural activity across different cell types, including Purkinje and non-Purkinje cells, is greatly appreciated. We also value your recognition of the implications of our work for understanding how tDCS can affect diverse neuronal populations, particularly in the context of clinical applications. Additionally, your positive feedback on our systematic investigation across multiple cerebellar regions highlights the relevance of our work for determining the region-specific effects of tDCS. Thank you for your encouraging and insightful evaluation.

      Weaknesses:

      (1) In the introduction, there is a lack of context regarding why neuronal orientation might be a critical factor influencing the responsiveness to tDCS. The authors allude to in vitro studies that have shown neuronal orientation to be relevant for the effects of tDCS on neural activity but do not expand on why this might be the case. These points could be better understood by informing the reader about the uniformity/non-uniformity of the induced electric field by tDCS. In addition, there is a lack of an a priori hypothesis. For example, would the authors have expected that neuronal orientation parallel or perpendicular to the electrical field to be related to the effects of tDCS on neural activity?

      We thank Reviewer #2 for this insightful comment. In response, we have expanded the Introduction with two new paragraphs that provide clearer context regarding the influence of neuronal orientation on the effects of tDCS.

      “For neurons whose somatodendritic axis is aligned with the electric field, the field induces a pronounced somatic polarization. In the case of anodal stimulation, where the positive electrode is positioned near the dendrites and the soma is oriented away, positively charged ions accumulate near the soma, leading to depolarization and increased excitability, thus facilitating action potential generation. Conversely, neurons whose orientation opposes the field, such as when the soma is closer to the positive electrode and the dendrites face away, experience hyperpolarization, reducing excitability. Lastly, neurons oriented perpendicular to the electric field would exhibit minimal somatic polarization, as the field does not induce significant redistribution of charges along the somatodendritic axis.”

      Additionally, we have now clarified our a priori hypothesis regarding neuronal orientation and its expected influence on tDCS efficacy.

      “We hypothesized that the orientation of PCs relative to the electric field would influence the effects of tDCS on neural activity. In the Vermis, PCs oriented parallel to the field are expected to exhibit stronger effects due to greater somatic polarization, leading to depolarization or hyperpolarization depending on the orientation of the somatodendritic axis. Conversely, PCs in Crus I/II, which are oriented obliquely to the field, are expected to exhibit intermediate effects, as the oblique alignment reduces the strength of polarization compared to parallel alignment.”
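
      The orientation dependence described in the two quoted paragraphs can be sketched with a common first-order approximation, in which somatic polarization scales with the field component along the somatodendritic axis. This is only an illustrative sketch, not the authors' model: the cosine relationship, the sign convention, the function name and the coupling constant `coupling_mv_per_v_m` are all assumptions introduced here. The 5.9 V/m value is the lowest field reported in this response.

```python
import math

def somatic_polarization(field_v_per_m, angle_deg, coupling_mv_per_v_m=0.1):
    """First-order estimate of somatic polarization in mV.

    angle_deg is the angle between the electric field vector and the
    dendrite-to-soma axis (assumed convention):
      0   -> field points from dendrites to soma (depolarizing)
      180 -> field points from soma to dendrites (hyperpolarizing)
      90  -> field perpendicular to the axis (no net polarization)
    coupling_mv_per_v_m is a hypothetical coupling constant, not a measured value.
    """
    return coupling_mv_per_v_m * field_v_per_m * math.cos(math.radians(angle_deg))

# Parallel, oblique, perpendicular and anti-parallel orientations at 5.9 V/m.
for angle in (0, 45, 90, 180):
    print(angle, round(somatic_polarization(5.9, angle), 3))
```

      Under these assumptions, an oblique neuron (45°) receives a polarization between the parallel and perpendicular cases, which mirrors the intermediate effect hypothesized for Crus I/II.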

      (2) It is unclear how specific stimulation parameters were determined. First, how were the tDCS intensities used in the present experiments determined/selected, and how does the relative strength of this induced electric field equate to the intensities used non-invasively during tDCS experiments in humans? Second, there is also a fundamental difference in the pattern of application used here (e.g., 15 s pulses separated by 10 s of no stimulation) compared to human studies (e.g., 10-20 min of constant stimulation).

      We thank Reviewer #2 for their observations. We have addressed their concerns by including the following text in the Discussion section of the main manuscript:

      “We used higher values than those applied in human experiments to achieve more reliable results. As seen in Supplementary Fig. 3, neurons are modulated in a similar way for 100, 200 or 300 µA but higher intensities elicited significant changes in a greater proportion of these neurons. In addition, a previous study from our lab [23] using the same methodology showed that 100, 200 and 300 µA (eliciting from 5.9 to 125.7 V/m in the current study) were ideal to obtain reliable and robust results in neuronal modulation, while keeping animal awareness of the stimulation at a minimum level. Besides, Asan et al. have recently shown that epidural stimulation in anesthetized rats under an electric field closer to human studies (1.5–7.5 V/m) was also able to modulate the activity of cerebellar neurons.”

      In addition, we added the following text to the Results section under the ‘tDCS modulates Purkinje cell activity in awake mice in a heterogeneous manner’ heading:

      “This protocol allows us to avoid the development of plasticity effects, which are known to require at least several minutes of tDCS administration, and to test the direct electrical modulation exerted by the externally applied currents.”

      (3) In their first experiment, the authors measure the electric field strength at increasing depths during increasing stimulation intensities. However, it appears that an alternating current rather than a direct current, which is usually employed in tDCS protocols, was used. There is a lack of rationale regarding why the alternating current was used for this component. Typically, this technique is more commonly used for entraining/boosting neural oscillations compared to studies using tDCS which aim to increase or decrease neural activity in general.

      We appreciate Reviewer #2’s assessment of the differences between tDCS and tACS. We will clarify this distinction. We chose tACS for measuring electric field strength for two main reasons:

      • Amplifier Limitations: The amplifiers commonly used in electrophysiology are designed to filter out low-frequency components, including direct current (DC) signals, using a high-pass filter. This is because the neuronal signals of interest, such as action potentials, typically occur at higher frequencies (several Hz to kHz). Consequently, any DC signal applied is filtered out from the recordings, preventing us from measuring changes in voltage effectively.

      • Impedance Changes: DC stimulation can alter the impedance of electrodes and surrounding tissue over time. To mitigate this effect and maintain stable recordings, it is advantageous to frequently alternate the polarity and intensity of the stimulation.

      The following text has been included in the 'Transcranial Electrical Stimulation' section of the 'Materials and Methods' part of the manuscript:

      “We selected tACS to measure electric field strength due to two main reasons: (1) amplifiers used in electrophysiology filter out low-frequency signals like DC, making voltage changes from tDCS undetectable, and (2) DC stimulation can alter electrode and tissue impedance over time, whereas alternating the polarity in tACS helps maintain stable recordings.”

      It is important to note that our aim with tACS is to provide an approximation of current propagation through the tissue, rather than to exactly replicate the baseline conditions encountered during continuous tDCS stimulation.
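
      The amplifier argument above can be illustrated with a first-order high-pass filter: a DC offset decays to zero at the output, whereas an alternating signal passes almost unchanged. The sketch below is a generic discrete RC filter, not a model of the amplifier actually used; the 1 Hz cutoff, the 1 kHz sampling rate and the 10 Hz test frequency are hypothetical values chosen only for illustration.

```python
import math

def high_pass(signal, cutoff_hz, fs_hz):
    """Discrete first-order RC high-pass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    alpha = rc / (rc + dt)
    out = [signal[0]]
    for i in range(1, len(signal)):
        # Each output sample keeps only the change in the input, with slow decay.
        out.append(alpha * (out[-1] + signal[i] - signal[i - 1]))
    return out

fs = 1000.0   # hypothetical sampling rate (Hz)
n = 5000      # 5 s of samples

# DC-like input: an offset switched on at 0.1 s and held constant.
dc_step = [0.0] * 100 + [1.0] * (n - 100)
# AC-like input: a 10 Hz sine wave of unit amplitude (hypothetical frequency).
ac_wave = [math.sin(2.0 * math.pi * 10.0 * i / fs) for i in range(n)]

dc_out = high_pass(dc_step, cutoff_hz=1.0, fs_hz=fs)
ac_out = high_pass(ac_wave, cutoff_hz=1.0, fs_hz=fs)

dc_residual = abs(dc_out[-1])                       # what is left of the DC offset
ac_amplitude = max(abs(v) for v in ac_out[-1000:])  # amplitude over the last second

print("DC residual:", dc_residual)
print("AC amplitude:", ac_amplitude)
```

      In this toy setting the constant offset is essentially invisible at the output while the sine wave retains nearly all of its amplitude, which is the reason an alternating waveform is the practical choice for estimating field strength from recorded voltages.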

      Reviewer #3 (Public review):

      Summary:

      In this study, Sanchez-Leon et al. combined extracellular recordings of Purkinje cell activity in awake and anesthetized mice with juxtacellular recordings and Purkinje cell staining to link Purkinje cell orientation to their stimulation response. The authors find a relationship between neuron orientation and firing rate, dependent on stimulation type (anodal/cathodal). They also show the effects of stimulation intensity and rebound effects.

      Strengths:

      Overall, the work is methodologically sound and the manuscript is well written. The authors have taken great care to explain their rationale and methodological choices.

      We sincerely thank Reviewer #3 for their positive feedback and constructive comments regarding our study. We are pleased that you found our work methodologically sound and well written. Your acknowledgment of our efforts to explain our rationale and methodological choices is greatly appreciated. We believe that the insights gained from linking Purkinje cell orientation to their stimulation response will contribute significantly to our understanding of cerebellar function and tDCS effects. Thank you for your thoughtful evaluation of our manuscript.

      Weaknesses:

      My only reservation is the lack of reporting of the precise test statistics, p-values, and multiple comparison corrections. The work would benefit from adding this and other information.

      We sincerely thank Reviewer #3 for their valuable feedback and for highlighting an important aspect of our analysis. We agree that the inclusion of precise test statistics, p-values, and details on multiple comparison corrections would strengthen the robustness of our findings. In response to your suggestion, we have now added this information to the Results section, ensuring that all statistical tests, exact p-values, and corrections for multiple comparisons are clearly reported. We believe these additions provide greater transparency and rigor to our analysis, and we appreciate your thoughtful recommendation.

      Major Comments:

      (1) The authors should report the exact test statistics. These are missing for all comparisons and hinder the reader from understanding what exactly was tested for each of the experiments. For example, having the exact test statistics would help better understand the non-significant differences in Figure 1h where there is at least a numeric difference in CS firing rate during tDCS.

      As mentioned before, we have now included the precise test statistics for all statistical comparisons throughout the manuscript. Specifically, in the case of Supplementary Figure 1h, we have added the exact values for the comparisons of CS firing rates during tDCS, even for nonsignificant differences, to ensure transparency and to clarify the observed numerical differences. We believe these additions will help readers better interpret the data and understand the statistical underpinnings of our findings. 

      However, given the large amount of data analyzed, particularly related to individual neuronal activity, it is not feasible to present all of the data for each individual neuron. We have aimed to provide a comprehensive statistical summary without overwhelming the reader with an excessive amount of detailed data.

      (2) Did the authors apply any corrections for multiple comparisons? Generally, it would be helpful if they could clarify the statistical analysis (which values were subjected to the tests, how many tests were performed for each question, etc.).

      We appreciate the reviewer’s comment regarding the need for clarification on the statistical analysis and the application of multiple comparison corrections. In response, we have updated the main text to include all the requested information. Specifically, we have added the appropriate multiple comparison tests (Tukey's or Nemenyi) where applicable to each analysis. These corrections have been applied to ensure that the results are robust and account for the number of comparisons made. We have also clarified the specific tests used for each analysis, the values subjected to these tests, and the number of comparisons performed for each question. This information is now detailed in the Methods section under 'Statistical Analysis' for transparency and to aid in the interpretation of the results.

      (3) The relationship shown in Figure 2g seems to be influenced by the two outliers. Have the authors confirmed the results using a robust linear regression method?

      We agree with the reviewer that the two neurons in Figure 2g could appear as outliers. To address this, we applied the ROUT method with a stringent Q = 1% to detect potential outliers, and none were found. In addition, we have confirmed the robustness of our results by performing a complementary analysis using robust linear regression methods (e.g., M-estimators), which showed consistent findings with our original analysis. For this purpose, we used the 'Huber' loss function, which combines least squares with robustness against outliers. The regression line obtained with this method (y = -0.5650x + 157.4556) differs minimally from the originally presented value, with the p-values of the slope and the intercept being p = 1.4846 × 10⁻⁴ (t(22) = -4.5740) and p = 1.1382 × 10⁻¹¹ (t(22) = 12.8010), respectively. Author response image 1 shows both regression fits to facilitate their comparison. These additional steps ensure the reliability of the relationship observed in the figure, even when accounting for the potential influence of the two data points.

      Author response image 1.
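
      For readers unfamiliar with Huber regression, the idea can be sketched as iteratively reweighted least squares, where points with large residuals are progressively down-weighted. This is a minimal illustration and not the authors' analysis code: the data are synthetic, and the function names, the tuning constant `delta = 1.345` and the MAD-based scale estimate are assumptions commonly paired with the Huber loss.

```python
import statistics

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def huber_line_fit(x, y, delta=1.345, n_iter=50):
    """Huber M-estimator via iteratively reweighted least squares."""
    slope, intercept = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(n_iter):
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        # Robust scale: median absolute deviation, rescaled for normal data.
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break
        limit = delta * scale
        # Huber weights: 1 inside the limit, shrinking as limit / |residual| outside.
        w = [1.0 if abs(r) <= limit else limit / abs(r) for r in resid]
        slope, intercept = weighted_line_fit(x, y, w)
    return slope, intercept

# Synthetic data: true slope -0.5, small deterministic noise, two large outliers.
x = [10.0 * i for i in range(20)]
y = [-0.5 * xi + 150.0 + ((i % 3) - 1) for i, xi in enumerate(x)]
y[18] += 60.0
y[19] += 60.0

ols_slope, ols_intercept = weighted_line_fit(x, y, [1.0] * len(x))
huber_slope, huber_intercept = huber_line_fit(x, y)
print("OLS slope:", round(ols_slope, 4), "Huber slope:", round(huber_slope, 4))
```

      On this synthetic example the ordinary least-squares slope is pulled toward the outliers, while the Huber fit stays close to the underlying slope. When the two fits agree closely, as the authors report for Figure 2g, the extreme points are not driving the relationship.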

      (4) The authors conclude that tDCS modulates vermal PCs more than Crus I/II PCs - but they don't seem to test this statistically. It would be helpful to submit the firing rate change values to an actual statistical test to conclude this directly from the data.

      We agree that it would be appropriate to apply a statistical test to determine whether there is similarity in the level of modulation. To this end, we have normalized the modulation so that all data are positive. For example, a neuron that increases or decreases its activity by 50% relative to the baseline period will be considered as having a modulation of 50% in both cases. This yields a mean modulation of 9.42% for neurons recorded in Crus I/II and 62.35% for those in the Vermis. Since the two distributions do not meet the normality assumption (Shapiro-Wilk test), we used a Mann-Whitney test, which resulted in a p-value < 0.0001, thus demonstrating a significant difference in modulation between the two cerebellar regions analyzed. We added this information to the main text. Additionally, we included a new panel in Supplementary Figure 3 (Supplementary Figure 3i) to visually represent these data.
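
      The normalization and test described above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the firing rates are invented, the function names are hypothetical, and the p-value uses the normal approximation without tie correction, so it is only approximate for small samples.

```python
import math

def absolute_modulation(baseline_rate, stim_rate):
    """Unsigned firing-rate change relative to baseline, in percent."""
    return abs(stim_rate - baseline_rate) / baseline_rate * 100.0

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic for sample a, approximate p-value).
    Tied values receive average ranks; the variance is not tie-corrected.
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2.0))

# Invented (baseline, during-tDCS) firing rates in Hz, for illustration only.
crus_pairs = [(40, 42), (55, 50), (30, 33), (60, 57), (45, 49), (50, 46), (35, 36), (70, 75)]
vermis_pairs = [(40, 70), (55, 20), (30, 50), (60, 25), (45, 80), (50, 15), (35, 60), (70, 30)]

crus = [absolute_modulation(b, s) for b, s in crus_pairs]
vermis = [absolute_modulation(b, s) for b, s in vermis_pairs]

u_stat, p_value = mann_whitney_u(crus, vermis)
print("U =", u_stat, "approximate p =", p_value)
```

      Taking the absolute value first means that a 50% increase and a 50% decrease count as the same magnitude of modulation, so the test compares how strongly each region is modulated rather than in which direction.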

      Reviewer #1 (Recommendations for the authors):

      I have several suggestions to further improve the paper:

      (1) It remains unclear how many tDCS trials were done during each single-cell recording. What were the inclusion criteria? Were tens of trials done per cell or was a cell already included if the recording was stable during a few trials? Please clarify.

      For every single-cell recording, we applied the maximum number of trials allowed by the recording stability. A neuron was included in the analysis if the recording was stable for at least 2 trials at a given intensity and polarity, up to a maximum of 1 hour of recording. We introduced a paragraph in the Methods section explaining this.

      (2) Along the same line, could the authors show cell responses to individual consecutive trials? Do the responses change over time? For example, does a cell increase the firing rate more during early trials compared to late trials? Please clarify.

      We appreciate the reviewer’s suggestion to investigate whether cell responses change over consecutive trials. In our data, when tDCS effects were observed, the changes in firing rate were evident from the very first trials in some neurons. To illustrate this, we have included Author response image 2, which shows examples of individual neuron responses (2 non-PC on the left and 2 PC on the right) across consecutive trials. Red and blue histogram bars indicate anodal and cathodal tDCS periods, respectively.

      Author response image 2.

      However, a rigorous analysis of the stimulation effect over time across trials was not feasible due to the considerable variability in the number of trials applied to different recorded neurons. This variability arose from differences in the duration for which stable recordings could be maintained.

      Despite this limitation, the early responses to tDCS provide valuable insights into the immediate effects of stimulation on neuronal activity.

      (3) Neurons are recorded very superficially, just below a 2 mm wide craniotomy. The temperature of the brain is likely lower than a normal physiological temperature. Did the authors consider the potential effects of temperature? Please address.

      We acknowledge the reviewer's concern regarding the potential effects of temperature on the recorded neurons. While it is challenging to precisely control the temperature of the tissue in the recording area, it is important to note that the temperature conditions were consistent across both the control and stimulation phases of the experiment. This consistency ensures that any potential effects of temperature are evenly distributed across conditions, thereby minimizing its impact on the observed changes in neuronal activity. Furthermore, although the recordings are conducted 2 mm below the craniotomy, this region is continuously bathed in saline, with an additional 3 mm of fluid maintained at physiological temperature, effectively preventing dehydration and cooling of the surface tissue. 

      (4) More general, but along the same line, is there any effect of the depth of the recorded cells on its response to stimulations for any of the data collected in this study? Figure 1 nicely shows that there is a significant electric field at depths up to 4 mm, but do more superficial cells have stronger/weaker responses to cathodal/anodal stimulation, as the electric field there is much stronger?

      We also expected to see some correlation between depth and degree of modulation. However, a linear regression analysis showed very low R² values (see Author response images 3-6), suggesting a negligible correlation between depth of recording and neuronal activity modulation. We did this analysis for Purkinje and non-Purkinje cells separately, as well as for recordings in Crus I/II or Vermis, with similar negative results in all cases.

      Author response image 3.

      Author response table 1.

      Author response image 4.

      Author response table 2.

      Author response image 5.

      Author response table 3.

      Author response image 6.

      Author response table 4.
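
      As a reminder of what a low R² means in the depth analysis above, the sketch below computes the coefficient of determination for a simple linear regression. The depths and modulation values are invented for illustration and do not come from the study; the function name is hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # For one predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)

# Invented recording depths (mm) and firing-rate modulation (% of baseline).
depth_mm = [0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4]
modulation_pct = [20.0, -15.0, 35.0, -30.0, 25.0, -10.0, 30.0, -25.0]

depth_r2 = r_squared(depth_mm, modulation_pct)
print("R^2 =", round(depth_r2, 4))
```

      An R² close to zero, as in this toy example, indicates that depth explains almost none of the variance in modulation, even when individual neurons are strongly modulated.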

      (5) The authors are recording the movements of the mouse on a treadmill. Was there any correlation between tDCS and behavior? And between behavior and firing patterns? Please address.

      We appreciate the reviewer’s question regarding the potential correlation between tDCS and behavior, as well as between behavior and firing patterns. In our experimental setup, the movement of the mouse typically introduces electrical artifacts in the recordings, particularly during running on the treadmill. To ensure the accuracy of our data, trials that coincided with running or other significant movements were excluded from the analysis. This is explained in the Methods section of the main text under 'Data analysis', within the description of how single-cell activity was processed. In addition, because animal movement or specific behaviors can themselves modulate neuronal firing rates, excluding trials involving movement avoids any potential confounding with the effects of current application.

      (6) The strength of the electrical field seems highly variable. Do the authors have an explanation for this? Please address.

      We appreciate the reviewer’s observation regarding the variability in the strength of the electric field. This variability is indeed expected, given the inherent inter-individual differences in skull thickness across animals (which, as discussed in the main manuscript, attenuates around 20% of the current), as well as slight variations in the precise placement of the tES active electrode during surgery. These factors can lead to fluctuations in the electric field, although they remain within the same order of magnitude.

      (7) As the authors stated, even for cells recorded at a depth of over 2 mm, the electric fields are still much higher than the fields generated in human studies. Why were there no comparable strengths used? Please address.

      We thank the reviewer for raising this important point. Previous studies from our lab (Sánchez-León et al., 2021) demonstrated minimal modulation in neuronal activity (LFP) when using tDCS intensities below 200 µA in awake animals. To achieve stronger and more consistent effects, we selected an intensity of 200 µA for our experiments. It is well-established that small animals, such as mice, require higher electric field strengths than humans to induce observable effects (Ozen et al., 2010; Vöröslakos et al., 2018; Asan et al., 2020). This discrepancy may be attributed to several factors, including differences in neuronal density within the stimulated networks (Herculano-Houzel, 2009), as well as variations in axonal length and diameter (Chakraborty et al., 2018). However, as we stated in the Discussion, we also found modulated neurons for electric fields close to those in humans:

      “Importantly, we observe clear firing rate modulation of PCs and non-PCs at depths of 2.3 mm and tDCS intensity of 100 μA, where the measured electric field is as low as 5.9 V/m.”

      Despite these limitations, animal models remain invaluable for obtaining high-resolution invasive data that cannot be collected in human studies. Such experiments are crucial for understanding the basic mechanisms underlying non-invasive brain stimulation, validating computational models, and exploring the therapeutic potential of these techniques for various neurological conditions.

      References:

      Asan, A. S., Lang, E. J., & Sahin, M. (2020). Entrainment of cerebellar purkinje cells with directional AC electric fields in anesthetized rats. Brain stimulation, 13(6), 1548–1558. https://doi.org/10.1016/j.brs.2020.08.017 

      Chakraborty, D., Truong, D. Q., Bikson, M., & Kaphzan, H. (2018). Neuromodulation of Axon Terminals. Cerebral cortex (New York, N.Y. : 1991), 28(8), 2786–2794. https://doi.org/10.1093/cercor/bhx158

      Herculano-Houzel S. (2009). The human brain in numbers: a linearly scaled-up primate brain. Frontiers in human neuroscience, 3, 31. https://doi.org/10.3389/neuro.09.031.2009

      Ozen, S., Sirota, A., Belluscio, M. A., Anastassiou, C. A., Stark, E., Koch, C., & Buzsáki, G. (2010). Transcranial electric stimulation entrains cortical neuronal populations in rats. The Journal of neuroscience : the official journal of the Society for Neuroscience, 30(34), 11476–11485. https://doi.org/10.1523/JNEUROSCI.5252-09.2010

      Vöröslakos, M., Takeuchi, Y., Brinyiczki, K., Zombori, T., Oliva, A., Fernández-Ruiz, A., Kozák, G., Kincses, Z. T., Iványi, B., Buzsáki, G., & Berényi, A. (2018). Direct effects of transcranial electric stimulation on brain circuits in rats and humans. Nature communications, 9(1), 483. https://doi.org/10.1038/s41467-018-02928-3

      (8) It seems that there is a very high number of mice used for a relatively small number of cellular recordings. Can the authors explain this?

      We appreciate the reviewer’s observation regarding the number of mice used relative to the number of recorded neurons. There are several factors contributing to this:

      (1)  In vivo juxtacellular labeling is a complex, multi-step process where each step must be executed precisely to successfully label a neuron. During blind recordings, it is impossible to ensure with 100% certainty that the neuron targeted for juxtacellular labeling will later be recoverable with sufficient staining (Pinault, 1996). To maintain confidence in the correspondence between the recorded and labeled neuron, we typically limit our attempts to label one neuron per mouse, or at most, two neurons located far apart from each other.

      (2)  Recording duration limitations: The probability of maintaining a well-isolated, stable neuronal recording decreases significantly as the recording time increases. To obtain sufficient data with multiple tDCS trials, it is necessary to conduct numerous independent recordings. Additionally, each time the recording pipette penetrates the recording site, there is a minor but cumulative impact on the dura mater and neural tissue, leading to tissue degradation in subsequent recordings.

      (3)  Diverse experimental conditions: This study explores several conditions, including recordings in anesthetized and awake mice, targeting different cerebellar regions (Crus I/II and vermis), and utilizing a range of techniques (single-unit extracellular recordings using glass pipettes, juxtacellular recording and labeling, and high-density recordings using the Neuropixels system). These distinct approaches required the establishment of independent experimental animal groups, which contributed to the higher number of subjects used in the study.

      Although we were often able to record several neurons per mouse, the final number of neurons that met all criteria for analysis was reduced due to these limitations.

      References:

      Pinault D. (1996). A novel single-cell staining procedure performed in vivo under electrophysiological control: morpho-functional features of juxtacellularly labeled thalamic cells and other central neurons with biocytin or Neurobiotin. Journal of neuroscience methods, 65(2), 113–136. https://doi.org/10.1016/0165-0270(95)00144-1

      (9) The N for both the neurobiotin-stained neurons and the Neuropixels recordings was relatively low. If possible, it would be nice to see a few more cells.

      We sincerely thank the reviewer for their thoughtful suggestion to increase the sample size. While we understand the importance of this consideration, we believe it is not feasible at this stage due to several factors. First, the complexity of our experiments, which include single-neuron recordings in awake animals during electric field application, juxtacellular neurobiotin injections post-tDCS (with a low success rate), and high-density recordings from Purkinje cells across different layers in awake animals, significantly limits the throughput of data collection. Second, the statistical outcomes obtained from our analyses, which combine multiple techniques, are robust and provide a strong basis for our conclusions. Third, the current study already involves a substantial number of animals (74 mice), which aligns with ethical considerations for minimizing animal use while ensuring robust results.

      We believe that the current sample size is sufficient to support the findings presented in the manuscript. Expanding the sample size further would require considerable additional resources and time, without a clear indication that it would fundamentally alter the conclusions of the study. We are grateful for the reviewer’s understanding of these limitations and their acknowledgment of the value of the current dataset.

      (10) tDCS and tES seem to be used interchangeably; please make it consistent.

      We agree that this could cause confusion. To address this, we have added a clarification at the first mention of tES in the manuscript, indicating that tES (transcranial Electrical Stimulation) is an umbrella term that encompasses both tDCS (transcranial Direct Current Stimulation) and tACS (transcranial Alternating Current Stimulation). We have ensured consistent use of the appropriate term throughout the rest of the text.

      (11) Did the authors apply saline or agar to the craniotomy while recording? Or was the dura dried out? Can the authors clarify this, and relate the answer to a potential interaction of either the medium or dryness of the dura with the tDCS?

      We appreciate the reviewer’s inquiry. To prevent the dura from drying out during our recordings, we applied saline to the cranial window throughout the experiment. Additionally, in our setup, the tDCS ring-shaped electrode was placed over the skull and sealed with dental cement to prevent any leakage of currents into the craniotomy, which was positioned at the center of the preparation. This precaution also helped minimize electrical noise reaching the recording electrode. In instances where the seal was not perfectly executed, the electrical noise from tDCS leaked into the saline solution, causing amplifier saturation and rendering neuronal activity recordings impossible.

      (12) There are several mistakes in spelling and grammar throughout the document; please check carefully.

      We appreciate the reviewer’s attention to detail regarding spelling and grammar. We have carefully reviewed the manuscript and corrected all identified errors to ensure clarity and proper language use throughout the document.

      (13) Can the authors briefly explain why tACS (and not tDCS) is used to measure the effectiveness of the stimulation at the different depths as shown in Figure 1? As the rest of the paper focuses entirely on tDCS, it is important to understand why tACS is used in Figure 1.

      We will clarify this distinction. We chose tACS for measuring electric field strength for two main reasons:

      • Amplifier Limitations: The amplifiers commonly used in electrophysiology are designed to filter out low-frequency components, including direct current (DC) signals, using a high-pass filter. This is because the neuronal signals of interest, such as action potentials, typically occur at higher frequencies (several Hz to kHz). Consequently, any DC signal applied is filtered out from the recordings, preventing us from measuring changes in voltage effectively.

      • Impedance Changes: DC stimulation can alter the impedance of electrodes and surrounding tissue over time. To mitigate this effect and maintain stable recordings, it is advantageous to frequently alternate the polarity and intensity of the stimulation.

      The following text has been included in the 'Transcranial Electrical Stimulation' section of the 'Materials and Methods' part of the manuscript:

      “We selected tACS to measure electric field strength due to two main reasons: (1) amplifiers used in electrophysiology filter out low-frequency signals like DC, making voltage changes from tDCS undetectable, and (2) DC stimulation can alter electrode and tissue impedance over time, whereas alternating the polarity in tACS helps maintain stable recordings.”

      It is important to note that our aim with tACS is to provide an approximation of current propagation through the tissue, rather than to exactly replicate the baseline conditions encountered during continuous tDCS stimulation.
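
      The amplifier limitation described above can be illustrated with a minimal sketch. It passes a constant (DC-like) offset and a sinusoid (tACS-like) through a first-order digital high-pass filter. The sampling rate, cutoff frequency, stimulation frequency, and amplitudes are illustrative assumptions only, not the settings of the amplifiers or the stimulation protocol used in the study.

```python
import math

def high_pass(signal, fs_hz, cutoff_hz):
    """First-order RC-style high-pass filter (discrete-time approximation)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    alpha = rc / (rc + dt)
    out = [signal[0]]
    for i in range(1, len(signal)):
        # Only changes in the input are passed on; a constant input decays away.
        out.append(alpha * (out[-1] + signal[i] - signal[i - 1]))
    return out

def peak_after_settling(signal, fs_hz, settle_s):
    """Largest absolute value once the initial transient is discarded."""
    start = int(settle_s * fs_hz)
    return max(abs(v) for v in signal[start:])

FS_HZ = 1000.0      # assumed sampling rate
CUTOFF_HZ = 1.0     # assumed high-pass cutoff of the amplifier
AC_FREQ_HZ = 10.0   # assumed alternating-current frequency

n = int(5.0 * FS_HZ)
times = [i / FS_HZ for i in range(n)]
dc_input = [1.0] * n                                            # DC-like offset
ac_input = [math.sin(2.0 * math.pi * AC_FREQ_HZ * t) for t in times]

dc_peak = peak_after_settling(high_pass(dc_input, FS_HZ, CUTOFF_HZ), FS_HZ, 2.0)
ac_peak = peak_after_settling(high_pass(ac_input, FS_HZ, CUTOFF_HZ), FS_HZ, 2.0)

print(f"DC component remaining after filtering: {dc_peak:.2e}")
print(f"AC component remaining after filtering: {ac_peak:.3f}")
```

      In this toy setting the constant offset is almost entirely removed while the alternating signal passes nearly unchanged, which is why a field applied as tACS can be read out from the recording whereas a DC field cannot.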

      (14) How do Figures 2e and f relate to each other? Figure 2e has 6 red lines, but 6f has 8 red explicitly states that 8 cells were recorded.

      We thank the reviewer for highlighting this discrepancy. The reviewer is correct that in Figure 5e the lines are too densely packed to distinguish all of them easily. Additionally, the activity of two neurons under anodal tDCS was greatly suppressed, which placed their corresponding arrowheads close to the origin of the arrows and made them less visible. To clarify, while Figure 5f shows all 8 recorded cells, the compression of the data in Figure 5e makes it challenging to distinguish all individual responses visually. We have added a clarifying note to the figure legend explaining that “densely packed lines and suppressed activity of two neurons under anodal tDCS reduce the visibility of their responses”.

      (15) Figure 2g contains two outliers that seem critical to the correlation, this is noticeable as nearly all other cells seem to modulate much more modestly. Maybe add a few more cells to convince everyone?

      We agree with the reviewer that the two neurons in Figure 2g could appear as outliers. To address this, we applied the ROUT method with a stringent Q = 1% to detect potential outliers, and none were found. In addition, we confirmed the robustness of our results with a complementary analysis using robust linear regression methods (e.g., M-estimators), which gave findings consistent with our original analysis. For this purpose, we used the 'Huber' loss function, which combines least squares with robustness against outliers. The regression line obtained with this method (y = -0.5650x + 157.4556) differs minimally from the originally presented one, with p-values for the slope and the intercept of p = 1.4846 × 10<sup>-4</sup> (t<sub>(22)</sub> = -4.5740) and p = 1.1382 × 10<sup>-11</sup> (t<sub>(22)</sub> = 12.8010), respectively. Author response image 1 shows both regression fits to facilitate their comparison. These additional steps ensure the reliability of the relationship observed in the figure, even when accounting for the potential influence of the two data points.
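
      For readers unfamiliar with Huber-type robust regression, the following is a minimal sketch of the general technique, fitted by iteratively reweighted least squares. It is not the authors' analysis pipeline: the data are synthetic (a known line with two injected outliers), and the tuning constant 1.345 and the MAD-based scale estimate are conventional choices assumed here for illustration.

```python
import statistics

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def huber_line_fit(x, y, k=1.345, n_iter=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    slope, intercept = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(n_iter):
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        mad = statistics.median([abs(r - med) for r in resid])
        scale = mad / 0.6745  # robust estimate of the residual spread
        if scale == 0.0:
            break
        # Residuals within k * scale keep full weight; larger ones are down-weighted.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        slope, intercept = weighted_line_fit(x, y, w)
    return slope, intercept

# Synthetic data only: true line y = 2x + 1 with small deterministic noise.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + 0.1 * ((i * 7) % 5 - 2) for i, xi in enumerate(x)]
y[3] += 30.0    # injected outlier
y[16] -= 30.0   # injected outlier

ols_slope, ols_intercept = weighted_line_fit(x, y, [1.0] * len(x))
huber_slope, huber_intercept = huber_line_fit(x, y)
print(f"OLS slope:   {ols_slope:.3f}")
print(f"Huber slope: {huber_slope:.3f}")
```

      When the robust and ordinary fits agree closely, as reported in the response above, no small subset of points is dominating the relationship. In this synthetic example the two fits are deliberately made to disagree so that the effect of down-weighting is visible.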

      (16) 'From these experiments we can conclude that 1) tDCS in vermis of anesthetized mice modulates PCs and non-PCs in a heterogeneous way'. Figure 4d shows no correlation between cathodal versus anodal stimulation for non-PCs, so how does the data suggest heterogeneous modulation of non-PCs? Is it simply heterogeneous because the data is very scattered?

      Thank you for your observation. By 'heterogeneous modulation,' we indeed refer to the scattered nature of the responses in non-PCs. Although Figure 4d shows a wide spread of data points and the linear regression is not statistically significant, a general trend can still be observed, where 11 out of 15 non-PCs show modulation in opposite directions with anodal and cathodal tDCS. However, this trend is not consistent across all neurons, hence our description of this modulation as heterogeneous. Importantly, this contrasts with the response observed in Purkinje cells (PCs), where a more consistent modulation pattern is evident, and the p-value for the linear regression is significant. Therefore, we conclude that while PCs show a clearer, more predictable modulation, the scattered data in non-PCs supports a more heterogeneous response.

      (17) The authors state that it is not possible to discriminate the non-PCs, even though some published papers suggest this is quite possible (see e.g., work by Simpson and Ruigrok; please discuss). For sure, the authors of the current manuscript should be able to discriminate the interneurons in the molecular layer from those in the granular layer (if it were only by identifying the polarity of the complex spikes). The authors may want to consider redoing the analyses of the non-PCs, and at least present and compare the outcomes of these two main subgroups of non-PCs.

      The authors are indeed familiar with the work of Simpson, Ruigrok, and others in linking electrophysiological recordings with neuronal class identity. Prior to proceeding with juxtacellular labeling, we conducted preliminary attempts to categorize non-PC neurons based on firing characteristics. However, we ultimately chose not to include neuronal sorting for non-PCs in this study for two main reasons. 

      First, the baseline recording period without tDCS was very short (10 seconds), and once tDCS was applied, the firing rate, coefficient of variation, and interspike intervals (ISI) of neurons were already altered. This made it difficult to reliably classify neurons based on their spontaneous activity, which is critical for precise sorting.

      Second, unlike PCs—where the presence of complex spikes and the resulting inhibition provide a clear ground truth—there is no analogous, unequivocal marker for non-PCs. Even following the reviewer's suggestion, while it might be possible in the molecular layer to identify a neuron as a molecular layer interneuron (MLI), this approach does not allow for a rigorous distinction between basket cells and stellate cells. These two cell types, despite their distinct morphologies—which could significantly affect their responses to tDCS—cannot be reliably differentiated without a true ground truth. Therefore, in the absence of such definitive markers, we believe that further subclassification of non-PCs based solely on electrophysiological properties would not be sufficiently rigorous for the purposes of our study.

      (18) Can the authors briefly discuss possible reasons why non-PCs in Crus1/2 do show heterogeneous responses similar to that of PCs, whereas the non-PCs in the vermis do not?

      We appreciate the reviewer’s insightful question regarding the different modulation patterns observed in non-PCs between Crus I/II and the vermis. Several potential factors could contribute to these differences, including variations in local cerebellar circuit connectivity between the two regions, differences in the cellular diversity of non-PCs due to the lack of a "ground truth" for their classification, or disparities in somatodendritic orientation and cell distribution. In the vermis, PCs are organized into different layers with opposing orientations (as shown in Figure 6), which could result in a more stable, polarity-dependent modulation, making their response more distinct from that of non-PCs. In contrast, in Crus I/II, the orientation of PCs is more heterogeneous and less aligned with the electric field, potentially leading to a more variable modulation pattern in both PCs and non-PCs. 

      However, it is important to note that we did not aim to juxtacellularly label non-PCs in this study, so we cannot offer a definitive answer regarding their precise orientation or identity. Additionally, the observed differences could be partially attributed to statistical power: we recorded 50 non-PCs in Crus I/II compared to only 25 in the vermis. Out of the 15 neurons in the vermis that showed statistically significant modulation, 11 displayed polarity-dependent modulation in opposite directions, but the smaller sample size might have limited our ability to detect the full range of possible effects. Furthermore, recordings in Crus I/II were conducted in awake animals, whereas the neurons recorded in Figure 4 in the vermis were obtained from anesthetized animals. This difference in physiological state could also be related to the observed changes.

      (19) 'The importance of PC axodendritic orientation in determining the effect of tDCS on firing rate modulation is further highlighted by our observation that pre-synaptic non-PC neurons providing inputs to PCs modulate their activity in a very heterogeneous way.' This is based on the finding that non-PCs modulate heterogeneously, but that is not what is shown for the vermis. Please address.

      Thank you for pointing this out. By 'heterogeneous modulation,' we are referring to the observation that non-Purkinje cells (non-PCs) respond in various ways under tDCS. Specifically, some non-PCs increase their activity under anodal stimulation and decrease it under cathodal stimulation (and vice versa), while others exhibit more complex patterns, such as increasing their activity under both anodal and cathodal stimulation or decreasing for both polarities. Additionally, some non-PCs only respond to one polarity, and others show no response at all.

      Our reasoning is that if the presynaptic non-PCs providing inputs to Purkinje cells (PCs) were the primary drivers of PC modulation, we would expect them to behave in a manner opposite to how PCs are modulated. For instance, if most non-PCs increased their activity under anodal stimulation while PCs decreased theirs, this could suggest that tDCS modulates non-PCs to fire more, imposing greater inhibition on PCs since many non-PCs are inhibitory. However, what we observe is a highly heterogeneous response from non-PCs, with no clear pattern that would consistently explain the modulation of PCs through presynaptic inputs alone. While non-PCs must certainly exert some influence on PC activity, their variable responses suggest that the modulation of PCs may also be driven by direct effects of tDCS on the PCs themselves, in addition to any indirect presynaptic influence.
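
      The response categories listed above can be made explicit with a small sketch that labels a neuron by the sign of its firing-rate change under each polarity. The function name, the category labels, and the simple threshold are hypothetical and chosen for illustration; in the study itself, whether a change counts as a response is decided by statistical testing, not by a fixed threshold.

```python
def classify_response(anodal_change, cathodal_change, threshold=0.0):
    """Label a neuron from its firing-rate change under anodal and cathodal tDCS.

    Changes are relative to baseline (positive = increase). The threshold is a
    placeholder for whatever significance criterion is actually applied.
    """
    def sign(value):
        if value > threshold:
            return 1
        if value < -threshold:
            return -1
        return 0

    a, c = sign(anodal_change), sign(cathodal_change)
    if a == 0 and c == 0:
        return "no response"
    if a == 0 or c == 0:
        return "single-polarity response"
    if a != c:
        return "opposite directions"
    return "increase under both" if a > 0 else "decrease under both"

# Hypothetical firing-rate changes (arbitrary units), one pair per neuron.
examples = [(2.0, -1.5), (-3.0, 2.5), (1.0, 1.2), (-0.8, -0.4), (1.5, 0.0), (0.0, 0.0)]
for anodal, cathodal in examples:
    print(anodal, cathodal, "->", classify_response(anodal, cathodal))
```

      A population in which these labels are spread across all categories, with no single label dominating, is what the response calls heterogeneous modulation.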

      (20) To help in reinforcing the hypothesis that stimulation response depends on dendritic orientation, the authors could show, with the existing data, how PCs in different layers of the vermis respond to cathodal or anodal stimulations. The data shown in Figure 4a-c already has a large number of PCs recorded in different layers of the vermis. As shown in Figure 4b, PCs in specific layers of the vermis have specific dendritic orientations. Can the authors show that PCs recorded for Figure 4, in the different layers (implying similar dendritic orientation) have similar (or different) stimulation responses? This would greatly improve their argument for the importance of dendritic orientation for tDCS responses.

      We appreciate the reviewer’s suggestion and the valuable insight it provides. In fact, this was one of the main motivations for performing the experiments shown in Figure 6, where we conducted simultaneous recordings of different Purkinje cells (PCs) in distinct layers. This allowed us to directly compare responses in neurons with different somatodendritic orientations. Unfortunately, the data presented in Figure 4 were obtained using glass micropipettes for juxtacellular labeling, a method that permits recording from only one neuron at a time, thus precluding a robust analysis of the correlation between dendritic orientation and tDCS responses. Furthermore, it should be noted that Figure 4a represents an idealized approximation; since these recordings were performed in different animals with variations along the anteroposterior axis, precise dendritic orientation cannot be reliably attributed to each cell (except for those that were juxtacellularly labeled).

      Additionally, unlike recordings with Neuropixels, where we have numerous contacts positioned at known distances from each other, enabling us to precisely locate cells within the cerebellar layers, the localization of neurons recorded with glass pipettes is less accurate. This is due to factors such as tissue displacement during insertion and animal movements, which further complicates the precise determination of neuronal layer placement during the stimulation protocol.

      While the data in Figure 4 do not allow us to definitively test our hypothesis, the results shown in Figure 6 provide a more direct comparison of the responses of PCs across different layers to tDCS, thereby reinforcing the hypothesis that dendritic orientation is a key factor in modulating neuronal activity.

      (21) The data shown in Figure 5e-f feels underpowered, although the statistical correlation between dendritic orientation and response is strong. For example, currently, the authors show that at an angle of ~0 degrees, two cells increase their firing to anodal stimulation, and 1 cell at ~180 degrees decreases its firing. Again, the manuscript would be much improved if the authors could increase the sample sizes for these experiments.

      We appreciate the reviewer’s concern regarding the sample size in Figure 5e-f. While the statistical correlation between dendritic orientation and response to tDCS is strong, we understand that the data may feel underpowered, particularly given the limited number of cells observed at specific angles such as ~0 degrees and ~180 degrees.

      It’s important to note that although visually it may appear there is only one neuron at 180 degrees during anodal stimulation, there are actually three neurons at this orientation. This is more clearly visible in the same figure during cathodal stimulation. However, the firing rate of these neurons during anodal stimulation is so low that the arrow representing their response appears very small, making it difficult to distinguish. (We have added a clarifying note to the figure legend explaining that “densely packed lines and suppressed activity of two neurons under anodal tDCS reduce the visibility of their responses”.)

      Unfortunately, increasing the sample size for these specific experiments is not feasible within the current study due to the technical complexity and time-consuming nature of the recordings, especially when incorporating juxtacellular labeling or high-density electrode arrays. Despite these challenges, we believe the current sample provides valuable insights into the relationship between dendritic orientation and firing rate modulation under tDCS. The significant statistical correlation suggests that the observed trend is robust, even with the existing sample size. Additionally, the different experimental approaches used in this study—single-unit extracellular recordings in different regions of the cerebellum in both awake and anesthetized animals, juxtacellular recordings and labeling, and high-density multi-unit recordings—provide a robust and comprehensive view of the results. Each technique offers complementary insights, strengthening our conclusions and ensuring that the observed patterns are not the result of one specific method or condition. Future studies could aim to expand on these findings, but we are confident that the results presented here contribute meaningfully to our understanding of how dendritic orientation influences neuronal responses to tDCS.

      (22) The authors, rightly so, address the potential impact of plasticity in the discussion. Here, the authors may want to cite other studies that have directly addressed this question: E.g., Das et al., 2017 (Frontiers Neuroscience, 11:444; doi: 10.3389/fnins.2017.00444) and van der Vliet et al., 2018 (Brain Stimul, 11(4):759-771; doi: 10.1016/j.brs.2018.04.009).

      We appreciate the reviewer’s suggestion to include additional studies addressing the impact of plasticity on the effects of cerebellar tDCS. In response, we have added a new sentence in the discussion section that cites both Das et al. (2017) and van der Vliet et al. (2018), highlighting the importance of synaptic plasticity in the effects of tDCS. 

      “These findings are consistent with previous work suggesting that synaptic plasticity is crucial for the effects of tDCS, as demonstrated by the importance of PC plasticity in behavioral outcomes(51) and the role of BDNF-mediated plasticity in motor learning(52).”

      Reviewer #2 (Recommendations for the authors):

      In the introduction, it would be beneficial to provide additional context regarding the influence of neuronal orientation on modulation shown from in-vitro studies. In addition, some explanation of the uniformity/non-uniformity of the electrical field would help. From here, the authors should provide their specific hypotheses for these experiments.

      We thank Reviewer #2 for this insightful comment. In response, we have expanded the Introduction to provide clearer context regarding the influence of neuronal orientation on the effects of tDCS, adding two new paragraphs to address these points.

      “For neurons whose somatodendritic axis is aligned with the electric field, the field induces a pronounced somatic polarization. In the case of anodal stimulation, where the positive electrode is positioned near the dendrites and the soma is oriented away, positively charged ions accumulate near the soma, leading to depolarization and increased excitability, thus facilitating action potential generation. Conversely, neurons whose orientation opposes the field, such as when the soma is closer to the positive electrode and the dendrites face away, experience hyperpolarization, reducing excitability. Lastly, neurons oriented perpendicular to the electric field would exhibit minimal somatic polarization, as the field does not induce significant redistribution of charges along the somatodendritic axis.”

      Additionally, we have now clarified our a priori hypothesis regarding neuronal orientation and its expected influence on tDCS efficacy.

      “We hypothesized that the orientation of PCs relative to the electric field would influence the effects of tDCS on neural activity. In the Vermis, PCs oriented parallel to the field are expected to exhibit stronger effects due to greater somatic polarization, leading to depolarization or hyperpolarization depending on the orientation of the somatodendritic axis. Conversely, PCs in Crus I/II, which are oriented obliquely to the field, are expected to exhibit intermediate effects, as the oblique alignment reduces the strength of polarization compared to parallel alignment.”
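
      The qualitative dependence on orientation described in these paragraphs can be sketched with a simple projection model. This is an assumption made here for illustration, not a model stated in the manuscript: the relative somatic polarization is taken to be proportional to the cosine of the angle between the somatodendritic axis and the electric field, with 0 degrees defined as the alignment that depolarizes the soma. The function name is hypothetical.

```python
import math

def relative_somatic_polarization(angle_deg):
    """Relative somatic polarization under an assumed cosine projection model.

    angle_deg is the angle between the somatodendritic axis and the field:
    0 = aligned (depolarizing), 180 = opposed (hyperpolarizing), 90 = perpendicular.
    Returns a unitless value between -1 and 1.
    """
    return math.cos(math.radians(angle_deg))

for angle in (0, 45, 90, 135, 180):
    value = relative_somatic_polarization(angle)
    print(f"{angle:3d} degrees -> relative polarization {value:+.2f}")
```

      Under this assumption, parallel orientations give the strongest effect with a sign set by the direction of the axis, perpendicular orientations give essentially none, and oblique orientations fall in between, which mirrors the intermediate effects hypothesized for Crus I/II.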

      Justification of the stimulation parameters used (i.e., intensity and pattern) should be included in the Methods.

      We chose a stimulation duration of only a few seconds to avoid the confounding effects of plasticity, which is known to require several minutes of tDCS administration. Regarding the intensities, we refer to previous studies from our lab using the exact same methodology, in which we found that 100, 200 and 300 µA were ideal for obtaining reliable and robust neuronal modulation while keeping the animal’s awareness of the stimulation to a minimum. We have also added this clarification to the main text.

      Please also justify the use of tACS rather than tDCS in the first experiment.

      We appreciate Reviewer #2’s assessment of the differences between tDCS and tACS. We will clarify this distinction. We chose tACS for measuring electric field strength for two main reasons:

      • Amplifier Limitations: The amplifiers commonly used in electrophysiology are designed to filter out low-frequency components, including direct current (DC) signals, using a high-pass filter. This is because the neuronal signals of interest, such as action potentials, typically occur at higher frequencies (several Hz to kHz). Consequently, any DC signal applied is filtered out from the recordings, preventing us from measuring changes in voltage effectively.

      • Impedance Changes: DC stimulation can alter the impedance of electrodes and surrounding tissue over time. To mitigate this effect and maintain stable recordings, it is advantageous to frequently alternate the polarity and intensity of the stimulation.

      The following text has been included in the 'Transcranial Electrical Stimulation' section of the 'Materials and Methods' part of the manuscript:

      “We selected tACS to measure electric field strength due to two main reasons: (1) amplifiers used in electrophysiology filter out low-frequency signals like DC, making voltage changes from tDCS undetectable, and (2) DC stimulation can alter electrode and tissue impedance over time, whereas alternating the polarity in tACS helps maintain stable recordings.”

      It is important to note that our aim with tACS is to provide an approximation of current propagation through the tissue, rather than to exactly replicate the baseline conditions encountered during continuous tDCS stimulation.

      Reviewer #3 (Recommendations for the authors):

      (1) A suggestion would be to highlight which of the data points in Figure 2g are the neurons they show as representative in Figure 2e-f. This would give the reader insights into how a standard neuron would behave/how representative these neurons are.

      We appreciate the reviewer’s comment and, in response, we have highlighted the two exemplary neurons from Figures 2e-f in Figure 2g to provide better insight into how these representative neurons behave in the context of the overall data. This will help the reader understand how typical these neurons are in relation to the broader dataset. Additionally, we have applied the same approach to Figure 3, highlighting the representative neurons for further clarity.

      (2) It would also be interesting to add figures to the supplementary materials that show the waveforms of non-PC neurons during anodal and cathodal tDCS, as done for PC neurons in the supplementary materials (as stated at the bottom of page 14, the authors chose to mention but not show these).

      We understand the reviewer’s interest in visualizing the waveforms of non-Purkinje neurons during anodal and cathodal tDCS. To address this, we have carefully examined the waveforms of both example non-Purkinje neurons under these conditions. However, given the absence of notable changes in their waveforms, we believe that these data do not have sufficient standalone significance to justify the inclusion of a new figure. We are, of course, happy to provide these data upon request or to incorporate them into the supplementary materials if deemed necessary.

      Author response image 7.

      Superimposed averaged SS waveforms under control (black), anodal (red) and cathodal (blue) tDCS from the example neurons shown in panels A and B in Fig. 3.

      (3) In Figure 5d, there is a significant aftereffect of the stimulation on the Purkinje cell firing rate - do the authors have an idea why this occurred?

      We appreciate the reviewer’s observation, as it highlights an interesting phenomenon that we have not been able to fully explain. We observed this aftereffect in many of the recorded neurons, and intriguingly, it often occurred in the opposite direction to the modulation seen during tDCS. We addressed a potential explanation for this in the discussion section:

      ‘Nonetheless, we cannot rule out the possibility of indirect synaptic effects. Indeed, the electric field gradient imposed by tDCS could indirectly modulate a specific neuron firing rate by increasing (or decreasing) its pre-synaptic activity, i.e. by modulating the firing rate of other neurons that synapse onto it. Indeed, these synaptic changes could explain the rebound effect observed after tDCS termination. The synapses involved in the modulation of firing rate may undergo a short-term plasticity process(47–50), which can continue to affect the firing rate even after the external currents have been turned off and no polarization is exerted on the neuron. These findings are consistent with previous work suggesting that synaptic plasticity is crucial for the effects of tDCS, as demonstrated by the importance of PC plasticity in behavioral outcomes(51) and the role of BDNF-mediated plasticity in motor learning(52).’

      This explanation highlights the potential role of synaptic plasticity and the indirect modulation of neuronal networks, but further investigation would be required to fully understand the mechanisms underlying this aftereffect.

      (4) I'm having trouble understanding the reference electrode positioning from schematics 1a & 1b: The text and 1a suggest that the reference electrode was positioned on the back of the mouse, outside of the brain. But Figure 1b looks as if the reference electrode was on the mouse cerebral cortex. Could the authors adapt schematic 1b to clarify the reference location or add this information to the legend?

      We agree that the figure showing two different reference electrodes was confusing, and we have now modified it to better clarify the distinction between the recording reference electrode and the stimulation reference electrode. Additionally, we have specified in Figures 1A and 1B whether the reference pertains to the transcranial alternating stimulation or to the electrophysiological recording.

      (9) In the discussion, (page 22) the authors highlight the importance of axodendritic orientation, but they analyze only somatodendritic orientation. Are the two so similar that they can be used synonymously? This would be good to clarify.

      We appreciate the reviewer’s clarification and fully agree. While Purkinje cells (PCs) do indeed have a highly polarized morphology, with the axon generally oriented in the opposite direction to the main dendrites, this is not always the case, especially for other types of neurons. Therefore, our results strictly refer to the somatodendritic axis, as this is the one we can most clearly observe through our juxtacellular labeling. In response, we have changed all instances where the term 'axodendritic' appeared in the text to 'somatodendritic' for accuracy.

      (10) It would be helpful to clarify that Supplementary Figure 3b and 3e are the same as Figures 4 c and 4d, respectively. This was confusing to me.

      We appreciate the reviewer’s feedback and have now modified the caption of Supplementary Figure 3 to indicate that Supplementary Figures 3b and 3e correspond to Figures 4c and 4d, respectively. This should help clarify any confusion.

      (11) Typo: 'consisting in' → 'consisting of'

      We thank the reviewer for their clarification. The typo has been corrected to 'consisting of'.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The concept that trained immunity, as defined, can be beneficial to subsequent immune challenges is important in the broad context of health and disease. The significance of this manuscript is the finding that trained immunity is actually a two-edged sword, herein, detrimental in the context of LPS-induced Acute Lung Injury that is mediated by AMs.

      Strengths:

      Several lines of evidence in different mouse models support this conclusion. The postulation that differences in immune responses in individuals are linked to differences in the mycobiome and consequent β-glucan makeup is provocative.

      Weaknesses:

      The findings that the authors state are relevant to sepsis are actually confined to a specific lung injury model and not classically defined sepsis. In addition, the ontogeny of the reprogrammed AMs is uncertain. Links in the proposed signaling pathways need to be strengthened.

      Reviewer #2 (Public review):

      Summary:

      Prével et al. present an in vivo study in which they reveal an interesting aspect of β-glucan, a known inducer of enhanced immune responses termed trained immunity, in sterile inflammation. The authors show that β-glucan can reprogram alveolar macrophages (AMs) in the lungs through neutrophils and IFNγ signaling, independently of Dectin1. This reprogramming occurs at both transcriptional and metabolic levels. After β-glucan training, LPS-induced sterile inflammation exacerbated acute lung injury via enhanced immunopathology. These findings highlight a new aspect of β-glucan's role in trained immunity and its potential detrimental effects when enhanced pathogen clearance is not required.

      Strengths:

      (1) This manuscript is well-written and effectively conveys its message.

      (2) The authors provide important evidence that β-glucan training is not solely beneficial, but depending on the context can also enhance immunopathology. This will be important to the field for two reasons. It shows again, that trained immunity can also be harmful. Jentho et al. 2021 have already provided further evidence for this aspect. And it highlights anew that LPS application is an insufficient infection model.

      Weaknesses:

      (1) Only a little physiological data is provided by the in vivo models.

      (2) The effects in histology appear to be rather weak.

      Reviewer #1 (Recommendations for the authors):

      The opening paragraph in the introduction focuses on sepsis. This is misleading since this manuscript does not address sepsis but rather intranasal-administered LPS-induced acute lung injury.

      We are in total agreement with the reviewer and have modified the introduction to focus on acute lung injury with clinical relevance more associated to TLR4-mediated acute lung injury and lung inflammation.

      The authors make definitive statements that AMs originate from fetal liver monocytes. However, it is well known that the ontogeny of AMs is complex and AMs can be populated, in part, from peripheral monocytes. The ontogeny of reprogrammed AMs was not addressed in this study but they may come from monocyte-derived AMs following β-glucan training (transfer of AMs into Csf2rb KO mice does not prove the contrary). In this regard, do, for example, the percentages of CD11b+ AMs change? More phenotyping of the control and reprogrammed AMs would enhance the interpretation of the findings.

      The reviewer is correct that the ontogeny of AMs can be heterogeneous, especially following a pulmonary challenge. In β-glucan-treated mice, Figure 1I shows no changes in the frequency or number of AMs in the BAL. As the reviewer suggested, we repeated this experiment and incorporated more markers for AMs. New Supplementary Figure 1C shows the expression of CD11b on AMs (CD11c⁺SiglecF⁺) from control and β-glucan-treated mice. While the frequency increases with LPS administration, we show no difference between control and β-glucan groups, suggesting that β-glucan does not induce the expansion of monocyte-derived AMs. Additionally, in New Supplementary Figure 1D, we show the expression of AM-associated markers in order to better delineate their phenotype. We observed no differences in MHCII, CD169, CD64 and F4/80 in β-glucan-treated mice, but an increase in CD80⁺ AMs following β-glucan, suggesting enhanced activation and corroborating their proinflammatory phenotype. Collectively, these data indicate that while the frequency and number of either yolk-sac or BM-derived AMs are unchanged in the β-glucan-treated mice, the activation of AMs is enhanced after the systemic treatment with β-glucan.

      The abstract seems to overpromise a bit. First, it mentions trained immunity and HSCs, but they don't seem to formally address either in the context of this model (there is reprogramming as assessed by transcriptome and metabolic analyses which is suggestive as stated by the authors, but do the changes overlap significantly with classically trained immunity?), and second, it links phenotypes together in a pathway(s) that they haven't actually interrogated - although they look at transcripts and do a seahorse assay they don't actually confirm that any of those findings are related to the increased response to LPS in vivo. The long discussion with all the caveats highlights these limitations, all relegated to future studies.

      We thank the reviewer for this comment. In response, we have revised the abstract to more accurately highlight the key findings of this study. Specifically, we introduced the concept of central trained immunity to describe the phenomena commonly observed with β-glucan treatment, contrasting it with the peripheral trained immunity detailed in the manuscript.

      The use of Csf2rb-/- mice to complement the clodronate approach is interesting (this approach has been used in the past with influenza virus). In addition to lacking AMs, these mice develop pulmonary alveolar proteinosis. Do the authors have histopathology from these mice in the current model? They mention PAP in the discussion.

      Pulmonary alveolar proteinosis (PAP) typically develops in Csf2rb-/- mice from 12 weeks of age onwards (Stanley et al., Proc Natl Acad Sci USA, 1994). However, in our model, mice were euthanized at 6 weeks, ensuring that pulmonary function and structure remained intact. A hallmark of PAP is the accumulation of protein, primarily surfactant, in BAL. To investigate this, we measured BAL protein concentration and observed no differences at baseline (Figure 2F). These findings were further supported by the absence of differences in BAL proinflammatory cytokine concentrations (Figure 2H).

      A question about their BAL technique? In the control mice without glucan/LPS stimulation, only 40% of BAL cells are AMs [and the total number of AMs (range of <10³ to 2-3 x 10⁴) is at least 5-fold lower than typically seen in BALs from healthy mice (10⁵)], and there didn't seem to be many PMNs either. Are 60% of the BAL cells lymphocytes/ RBCs? Is it possible that overall AM numbers are changing, but CD11c/SiglecF-positive cell numbers stay the same (only assessed 2 markers)? More phenotyping would help.

      We appreciate the reviewer’s comment and would like to clarify that alveolar macrophages (AMs) are presented in the manuscript as a frequency of viable cells rather than as a frequency of CD45⁺ cells, to ensure consistency throughout the study. The remaining cells in the samples are likely epithelial cells and lymphocytes, as red blood cells are lysed during sample processing. For additional context, we now provide data showing AMs as a percentage of CD45⁺ cells, which account for 80–90% of leukocytes. Furthermore, in New Supplementary Figure 1D, we highlight the expression of AM-associated markers to better define their phenotype. We observed no differences in MHCII, CD169, CD64, or F4/80 expression in β-glucan-treated mice. However, there was an increase in CD80⁺ AMs, indicating enhanced activation and corroborating their proinflammatory phenotype.

      Author response image 1.

      AMs as percentage of CD45⁺ cells. Mice were treated with β-glucan for seven days. We show CD11c⁺SiglecF⁺ cells in the bronchoalveolar lavage (BAL) as a percentage of CD45⁺ cells (n=5).

      Line 130-131. TNF is decreased and not pointed out.

      In the poly(I:C) model, the BAL TNF-α concentration is not statistically different between naïve and trained mice owing to the high variability of the data. The reviewer is correct that TNF-α does not appear to reflect poly(I:C)-mediated ALI. We have included this point in the revised manuscript (Lines 146-148).

      Reviewer #2 (Recommendations for the authors):

      Suggestions:

      (1) The authors provide evidence for enhanced ALI via different techniques, e.g. histology, vascular leakage, immune cell composition in BAL etc. It would be interesting to see whether there were any changes in the disease severity of ALI. If possible the authors could provide data for survival, temperature, weight, and/or glucose in the different groups.

      Mice are extremely resistant to the pulmonary LPS model. We have previously assessed the lethality of our LPS model, and all mice survive even with an increased intranasal dose of LPS (200 μg) (Pernet et al., Nature, 2023). To address the reviewer's concerns, we next assessed morbidity by monitoring weight loss following LPS challenge and showed that β-glucan-treated mice exhibit a delayed recovery time after 4 days of LPS treatment (New Supplementary Figure 1B).

      (2) The authors show that β-glucan mediated training enhances ALI. Conversely, the opposite, decreased immunopathology should be observed in case an LPS tolerance model would be used. I am wondering whether this has already been performed, given that the (LPS/immune)tolerance field is already older than the training field. If not, I suggest incorporating this feature in their discussion.

      Thank you for this insightful comment. While LPS has long been recognized to induce tolerance, studies have also shown that intranasal exposure to ambient levels of LPS can induce alveolar macrophage (AM) training via type I interferon signaling (Zahalka et al., Mucosal Immunol, 2022). In contrast, Mason et al. demonstrated that systemic LPS stimulation induces tolerance through TNF-α signaling, resulting in diminished AM phagocytosis and superoxide production. This leads to reduced neutrophil recruitment and impaired bacterial clearance in a Pseudomonas aeruginosa pneumonia model (J Infect Dis, 1997). Furthermore, we recently reported that systemic administration of β-glucan induces central trained immunity, generating a distinct subset of regulatory neutrophils that promote disease tolerance against influenza viral infection (Khan et al., Nat Immunol, 2025). These findings highlight the complex and context-dependent interplay between training and tolerance. We have expanded on this point in the discussion section of the revised manuscript (Lines 289-297).

      (3) The finding that trained immunity can exert not only beneficial effects but also enhance immunopathology is interesting and should be further explored. Already Jentho et al. (PNAS 2021) have shown that upon sterile inflammation as imposed by LPS, (heme) training can lead to enhanced mortality. This might be a relevant trade-off in trained immunity since no beneficial resistance effect by pathogen killing can be obtained. It would be interesting to see, in their model, whether heme would also enhance ALI after intranasal LPS application. Or at least, can the authors discuss this finding more, also in relation to the already published evidence?

      Thank you for raising this interesting point, which is indeed relevant to our study. Jentho et al. demonstrated that training by heme can be beneficial in combating infectious challenges but can have deleterious effects in the context of sterile inflammation. The concept of endogenous training agents like heme, with their diverse effects on immune cells, aligns well with our β-glucan model, particularly given the high prevalence of fungal agents in the microbiome.

      While investigating the effects of heme on alveolar macrophages would certainly be intriguing, Jentho and colleagues have already reported the maladaptive effects of heme, such as tissue damage, during sterile LPS-induced inflammation. As such, these findings might be redundant in the context of our model. However, we have drawn a relevant parallel and expanded on this discussion in the revised manuscript (Lines 382-385).

      (4) It is not clear how the histologies were evaluated. This is a field of great subjectivity. The authors should describe it in more detail. The best option would have been a blinded observer. Was this done?

      Histology samples were evaluated according to ATS 2011 guidelines regarding “Features and measurements of experimental acute lung injury in animals” by a blinded pathologist. We have specified this in the methods of the revised manuscript.

      Minor:

      (1) Line 108 and ff. Please change TNF, not TNFa

      Since we used an ELISA specific for TNF-α rather than general TNF, it is more accurate to refer to it as TNF-α.

      (2) Line 513 and ff. Please use Greek letters when appropriate, e.g. IFN-γ not IFNg.

      Thank you for pointing out these mistakes; we have rectified them in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      The authors compared four types of hiPSCs and four types of hESCs at the proteome level to elucidate the differences between hiPSCs and hESCs. Semi-quantitative calculations of protein copy numbers revealed increased protein content in iPSCs. Particularly in iPSCs, proteins related to mitochondria and the cytoplasm were suggested to reflect the state of the original differentiated cells to some extent. However, the most important result of this study is the calculation of the protein copy numbers per cell, and the validity of this result is problematic. In addition, several experiments need to be improved, such as using cells of different genders (iPSC: female, ESC: male) in mitochondrial metabolism experiments.

      Strengths: 

      The focus on the number of copies of proteins is exciting and appreciated if the estimated calculation result is correct and biologically reproducible. 

      Weaknesses: 

      The proteome results in this study were likely obtained by simply looking at differences between clones, and the proteome data need to be validated. First, there were only a few clones for comparison, and the gender and number of cells did not match between ESCs and iPSCs. Second, no data show the accuracy of the protein copy number per cell obtained by the proteome data. 

      We agree with the reviewer that it would be useful to have data from more independent stem cell clones, and an equal gender balance of the donors would ideally be preferable. As usual, practical cost-benefit considerations and the time available affect the scope of work that can be performed. We note that the impact of biological donor sex on proteome expression in iPSC lines has already been addressed in previous studies (13). We will, however, revise the manuscript to include specific mention of these limitations and propose a larger-scale follow-up when resources are available.

      Regarding the estimation of protein copy numbers in our study, we would like to highlight that the “proteomic ruler” approach we have used has been employed extensively in the field previously, with direct validation of differences in copy numbers provided using methods orthogonal to MS, e.g., FACS (2-4, 7, 10). Furthermore, the original manuscript (14) directly compared the copy numbers estimated using the “proteomic ruler” to spike-in protein epitope signature tags and found remarkable concordance. This original study was performed with an older generation mass spectrometer and reduced peptide coverage, compared with the instrumentation used in our present study. Further, we note that these authors predicted that higher peptide coverage, such as we report in our study, would further increase quantitative performance.
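      For readers unfamiliar with the approach, the core scaling step of a histone-based "proteomic ruler" can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name, its parameters, and the default DNA mass of 6.5 pg per diploid human cell are illustrative choices, and the sketch assumes that the summed histone MS signal corresponds to a histone mass roughly equal to the DNA mass per cell.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def proteomic_ruler_copies(protein_intensity, histone_intensity_sum,
                           protein_mw_da, dna_mass_pg=6.5):
    """Estimate protein copies per cell by histone-based scaling.

    Hypothetical helper for illustration only.
    protein_intensity      -- MS signal of the protein of interest
    histone_intensity_sum  -- summed MS signal of all histones in the sample
    protein_mw_da          -- molecular weight of the protein (Da = g/mol)
    dna_mass_pg            -- assumed DNA mass per cell in picograms
    """
    if histone_intensity_sum <= 0 or protein_mw_da <= 0:
        raise ValueError("histone signal and molecular weight must be positive")
    # Assumption: histone mass per cell is approximately the DNA mass per cell,
    # so the histone signal calibrates MS signal to mass per cell.
    protein_mass_g = (protein_intensity / histone_intensity_sum) * dna_mass_pg * 1e-12
    # Convert mass per cell to molecules per cell.
    return protein_mass_g * AVOGADRO / protein_mw_da
```

      Because the calibration comes from the histone signal within each sample, the estimate does not depend on knowing how many cells were loaded, which is why a global increase in protein content per cell remains visible after this scaling.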

      Reviewer #2 (Public Review):

      Summary: 

      Pluripotent stem cells are powerful tools for understanding development, differentiation, and disease modeling. The capacity of stem cells to differentiate into various cell types holds great promise for therapeutic applications. However, ethical concerns restrict the use of human embryonic stem cells (hESCs). Consequently, induced human pluripotent stem cells (ihPSCs) offer an attractive alternative for modeling rare diseases, drug screening, and regenerative medicine. A comprehensive understanding of ihPSCs is crucial to establish their similarities and differences compared to hESCs. This work demonstrates systematic differences in the reprogramming of nuclear and non-nuclear proteomes in ihPSCs. 

      We thank the reviewer for the positive assessment.

      Strengths: 

      The authors employed quantitative mass spectrometry to compare protein expression differences between independently derived ihPSC and hESC cell lines. Qualitatively, protein expression profiles in ihPSC and hESC were found to be very similar. However, when comparing protein concentration at a cellular level, it became evident that ihPSCs express higher levels of proteins in the cytoplasm, mitochondria, and plasma membrane, while the expression of nuclear proteins is similar between ihPSCs and hESCs. A higher expression of proteins in ihPSCs was verified by an independent approach, and flow cytometry confirmed that ihPSCs had larger cell sizes than hESCs. The differences in protein expression were reflected in functional distinctions. For instance, the higher expression of mitochondrial metabolic enzymes, glutamine transporters, and lipid biosynthesis enzymes in ihPSCs was associated with enhanced mitochondrial potential, increased ability to uptake glutamine, and increased ability to form lipid droplets. 

      Weaknesses: 

      While this finding is intriguing and interesting, the study falls short of explaining the mechanistic reasons for the observed quantitative proteome differences. It remains unclear whether the increased expression of proteins in ihPSCs is due to enhanced transcription of the genes encoding this group of proteins or due to other reasons, for example, differences in mRNA translation efficiency. Another unresolved question pertains to how the cell type origin influences ihPSC proteomes. For instance, whether ihPSCs derived from fibroblasts, lymphocytes, and other cell types all exhibit differences in their cell size and increased expression of cytoplasmic and mitochondrial proteins. Analyzing ihPSCs derived from different cell types and by different investigators would be necessary to address these questions. 

      We agree with the Reviewer that our study does not extend to also providing a detailed mechanistic explanation for the quantitative differences observed between the two stem cell types and did not claim to have done so. We have now included an expanded section in the discussion where we discuss potential causes. However, in our view fully understanding the reasons for this difference is likely to involve extensive future in-depth analysis in additional studies and is not something that can be determined just by one or two additional supplemental experiments.

      We also agree studying hiPSCs reprogrammed from different cell types, such as blood lymphocytes, would be of great interest. Again, while we agree it is a useful way forward, in practice this will require a very substantial additional commitment of time and resources. We have now included a section discussing this opportunity within the discussion to encourage further research into the area.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) aizi1 and ueah1 clones, which were analyzed in Figure 1A, were excluded from the proteome analysis. In particular, the GAPDH expression level of the aizi1 clone is similar to that of ESCs and different from other iPSC clones. An explanation of how the clones were selected for proteome analysis is needed. Previously, the comparative analysis of iPSCs and ESCs reported in many studies from 2009-2017 (Ref#1-7) has already shown that the number of clones used in the comparative analysis is small, claiming differences (Ref#1-3) and that the differences become indistinguishable when the number of clones is increased (Ref#4-7). Certainly, few studies have been done at the proteome level, so it is important to examine what differences exist in the proteome. Also, it is interesting to focus on the amount of protein per cell. However, if the authors want to describe biological differences, it would be better to get the proteome data in biological duplicate and state the reason for selecting the clones used.

      (1) M. Chin, Cell Stem Cell, 2009, PMID: 19570518

      (2) K. Kim, Nat Biotechnol., 2011, PMID: 22119740

      (3) R. Lister, Nature, 2011, PMID: 21289626

      (4) A.M. Newman, Cell Stem Cell, 2010, PMID: 20682451

      (5) M.G. Guenther, Cell Stem Cell, 2010, PMID: 20682450

      (6) C. Bock, Cell, 2010, PMID: 21295703

      (7) S. Yamanaka, Cell Stem Cell, PMID: 22704507

      We agree with the reviewer that analysing more clones would be beneficial. We have included a section on this topic in the discussion. In our study, we only had access to the 4 hESC lines included; therefore, in the original proteomic study we also analysed 4 hiPSC lines, which were routinely grown within our stem cell facility. Although the stem cell facility expanded the culture of additional hiPSC lines as the study progressed, we unfortunately could not access additional hESC lines.

      We agree that ideally combining each biological replicate with additional technical replicates would provide extra robustness. As usual, cost and practical considerations at the time the experiments were performed affected the experimental design chosen. For the experimental design, each experiment was contained within 1 batch to avoid the strong batch effects present in TMT (Brenes et al 2019).

      (2) iPSC samples used in the proteome analysis are two types of female and two types of male, while ESC samples are three types of female and one type of female. The number of sexes of the cells in the comparative analysis should be matched because sex differences may bias the results.

      While we agree with the reviewer in principle, we have previously performed detailed comparisons of proteome expression in many independent iPSC lines from both biological male and female donors (see Brenes et al., Cell Reports 2021), and it seems unlikely that biological sex differences alone could account for the proteome differences between iPSC and ESC lines uncovered in this study. However, as this is a relevant point, we have revised the manuscript to explicitly mention this caveat within the discussion section.

      (3) In Figure 1h, I suspect that the variation of PCA plots is very similar between ESCs and iPSCs. In particular, the authors wrote "copy numbers for all 8 replicates" in the legend, but if Figure 1b was done 8 times, there should be 8 types of cells x 8 measurements = 64 points. Even if iPSCs and ESCs are grouped together, there should be 8 points for each cell type. Is it possible that there is only one TMT measurement for this analysis? If so, at least technical duplicates or biological duplicates would be necessary. I also think each cell should be plotted in the PCA analysis instead of combining the four types of ESCs and iPSCs into one.

      We thank the reviewer for bringing this error to our attention. The legend has been corrected to state, “for all 8 stem cell lines”. Each dot represents the proteome of each of the 4 hESCs and 4 hiPSCs that were analysed using proteomics.

      (4) It is necessary to show what functions are enriched in the 4408 proteins whose protein copies per cell were increased in the iPSCs obtained in Figure 2B.

      The enrichment analysis requested has been performed and is now included as a new Supplemental Figure 2. We find it very interesting that despite the large number of proteins involved here (4,408), the enrichment analysis still shows clear enrichment for specific cellular processes. The summary plot using affinity propagation within WebGestalt is included here:

      Author response image 1.

      (5) The Proteomic Ruler method used in this study is a semi-quantitative method to calculate protein copy numbers and is a concentration estimation method. Therefore, if the authors want to have a biological discussion based on the results, they need to show that the estimated concentrations are correct. For example, there are Western Blotting (WB) results for genes with no change in protein levels in hESC and hiPSC in Fig. 6ij, but the WB results for the group of genes that are claimed to have changed are not shown throughout the paper. Also, there is no difference in the total protein level between iPSCs and ESCs from the ponceau staining in Fig.6ij. WB results for at least a few genes are needed to show whether the concentration estimates obtained from the proteome analysis are plausible. If the protein per cell is increased in these iPSC clones, performing WB analysis using an equal number of cells would be better.

      Regarding the ‘proteomic ruler’ approach, we would like to highlight that this method has previously been used extensively in the field, with detailed validation, as already explained above. It is also not ‘semi-quantitative’ and can estimate absolute abundance, as well as concentrations. Our work does not use their concentration formulas, but the estimation of protein copy numbers, which was shown to closely match the observed copy numbers as determined when spike-ins are used (14).

      In providing additional validation by Western blotting (WB) here, we prioritised the pluripotency markers, which are vital to determine the pluripotency state of the hESCs and hiPSCs, as well as histone markers. We have included a section in the discussion concerning additional validation data and agree in general that further validation is always useful.

      (6) Regarding the experiment shown in Figure 4l, the gender of iPSC used (wibj2) is female and WA01 (H1; WA01) is male. Certainly, there is a difference in the P/E control ratio, but isn't this just a gender difference? The sexes of the cells need to be matched.

      We accept that the sexes of donors should ideally have been matched and have mentioned this within the discussion. Nonetheless, as previously mentioned, our previous detailed proteomic analyses of multiple hiPSC lines (13) derived from both biological male and female donors provide relevant evidence that the results shown in this study are not simply a reflection of the sex of the donors for the respective iPSC and ESC lines. When comparing eroded and non-eroded female hiPSCs to male hiPSCs, we found no significant differences in any electron transport chain proteins, nor in TCA proteins, between males and females.

      Minor comments:

      (1) Method: Information on the hiPSCs and hESCs used in this study should be described. In particular, the type of differentiated cells, gender, and protocols that were used in the reprogramming are needed.

      We agree with the reviewer on this. The hiPSC lines were generated by the HipSci consortium, as described in the flagship HipSci paper (15), which specifies in great detail the reprogramming protocols and quality control measures, including analysis of copy number variations. However, this information may not be easily accessible for readers, and it is relevant to include it explicitly in our present manuscript rather than expecting readers to consult the flagship paper. These details have therefore been added to the revised version.

      (2) Method: In Figure1a, Figure 6i, j, the antibody information of Nanog, Oct4, Sox2, and Gapdh is not written in the method and needs to be shown.

      The data relating to these has now been included within the methods section.

      (3) Method: In Figure 1b and other figures, the authors should indicate which iPSC corresponds to which TMT label; the data in the Supplemental Table also needs to indicate which data is which clone.

      We have now added this to the methods section.

      (4) Method: The method of the FACS experiment used in Figure 2 should be described.

      The methods related to the FACS analysis have now been included within the manuscript.

      (5) Method: The cell name used in the mitochondria experiment shown in Figure 4 is listed as WA01, which is thought to be H1. Variations in notation should be corrected.

      This has now been corrected.

      (6) Method: The name of the cell clone shown in Figure 3l,m should be mentioned.

      We have now added these details on the corresponding figure and legend.

      Reviewer #2 (Recommendations For The Authors):

      This study utilized quantitative mass spectrometry to compare protein expression in independently derived 4 ihPSC and 4 hESC cell lines. The investigation quantified approximately 7,900 proteins, and employing the "Proteome ruler" approach, estimated protein copy numbers per cell. Principal component analyses, based on protein copy number per cell, clearly separated hiPSC and hESC, while different hiPSCs and hESCs grouped together. The study revealed a global increase in the expression of cytoplasmic, mitochondrial, membrane transporters, and secreted proteins in hiPSCs compared to hESCs. Interestingly, standard median-based normalization approaches failed to capture these differences, and the disparities became apparent only when protein copy numbers were adjusted for cell numbers. Increased protein abundance in hiPSC was associated with augmented ribosome biogenesis. Total protein content was >50% higher in hiPSCs compared to hESCs, an observation independently verified by total protein content measurement via the EZQ assay and further supported by the larger cell size of hiPSCs in flow cytometry. However, the cell cycle distribution of hiPSC and hESC was similar, indicating that the difference in protein content was not due to variations in the cell cycle. At the phenotypic level, differences in protein expression also correlated with increased glutamine uptake, enhanced mitochondrial potential, and lipid droplet formation in hiPSCs. ihPSCs also expressed higher levels of extracellular matrix components and growth factors.

      Overall, the presented conclusions are adequately supported by the data. Although the mechanistic basis of proteome differences in ihPSC and hESC is not investigated, the work presents interesting findings that are worthy of publication. Below, I have listed my specific questions and comments for the authors.

      (1) Figure 1a displays immunoblots from 6 iPSC and 4 ESC cell lines, with 8 cell lines (4 hESC, 4 hiPSC) utilized in proteomic analyses (Fig. 1b). The figure legend should specify the 8 cell lines included in the proteomic analyses. The manuscript text describing these results should explicitly mention the number and names of cell lines used in these assays.

      We agree with the reviewer and have now marked in figure 1 all the lines that were used for proteomics and have added a section in the methods specifying which cell lines were analysed in each TMT channel.

      (2) In most figures, the quantitative differences in protein expression between hiPSC and hESC are evident, and protein expression is highly consistent among different hiPSCs and hESCs. However, the glutamine uptake capacity of different hiPSC cell lines, and to some extent hESC cell lines, appears highly variable (Figure 3e). While proteome changes were measured in 4 hiPSCs and 4 hESCs, the glutamine uptake assays were performed on a larger number of cell lines. The authors should clarify the number of cell lines used in the glutamine uptake assay, clearly indicating the cell lines used in the proteome measurements. Given the large variation in glutamine uptake among different cell lines, it would be useful to plot the correlation between the expression of glutamine transporters and glutamine uptake in individual cell lines. This may help understand whether differences in glutamine uptake are related to variations in the expression of glutamine transporters.

      The “proteomic ruler” has the capacity to estimate the protein copy numbers per cell; as such, changes in the absolute number of cells that were analysed do not cause major complications in quantification. Furthermore, TMT-based proteomics is the most precise proteomics method available, where the same peptides are detected in all samples across the same data points and peaks, as long as the analysis is done within a single batch, as is the case here.
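
      As a rough illustration of the proteomic ruler calculation (reference 14), the sketch below scales each protein's MS signal by the summed histone signal and by the DNA mass per cell. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the function name is hypothetical, and the default of 6.5 pg DNA per diploid human cell is an assumed value.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def proteomic_ruler_copies(intensity, mw_da, histone_intensity_sum, dna_mass_pg=6.5):
    """Estimate copies per cell for one protein.

    intensity             -- MS signal of the protein of interest
    mw_da                 -- molecular weight of the protein in Da (g/mol)
    histone_intensity_sum -- summed MS signal of all histones in the same sample
    dna_mass_pg           -- DNA mass per cell in pg (assumed value, see lead-in)
    """
    # The ruler assumes total histone mass per cell is about equal to DNA mass,
    # so the histone signal calibrates MS signal to mass per cell.
    protein_mass_pg = intensity / histone_intensity_sum * dna_mass_pg
    protein_mass_g = protein_mass_pg * 1e-12
    return protein_mass_g / mw_da * AVOGADRO


# Loading twice as much material doubles both signals, leaving the estimate unchanged.
copies_1x = proteomic_ruler_copies(1.0, 50_000.0, 1.0)
copies_2x = proteomic_ruler_copies(2.0, 50_000.0, 2.0)
```

      Because the estimate depends only on the ratio of protein signal to histone signal, it is insensitive to how many cells were loaded, which is the property described in the paragraph above.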

      The glutamine uptake assay is much more sensitive to the variation in the number of cells. The number of cells was estimated by plating approximately 5e4 cells two days before the assay, which creates variability. Furthermore, hESCs and hiPSCs are more adhesive than the cells used in the original protocol, hence the quench data was noisier for these lines, making the data from the assay more variable.

      (3) In Figure 4j, it would be helpful to indicate whether the observed differences in the respiration parameters are statistically significant.

      We have now modified the plot to show which proteins were significantly different.

      (4) The iPSCs used here are generated from human primary skin fibroblasts. Different cells vary in size; for instance, fibroblast cells are generally larger than blood lymphocytes. This raises the question of whether the parent cell origin impacts differences in hiPSCs and hESC proteomes. For example, do the authors anticipate that hiPSCs derived from small somatic cells would also display higher expression of cytoplasmic, mitochondrial, and membrane transporters compared to ESC? The authors may consider discussing this point.

      This is a very interesting point. We have now added an extension to the discussion focussed on this subject.

      (5) One wonders if the "Proteome ruler" approach could be applied retrospectively to previously published hiPSC and hESC proteome data, confirming higher expression of cytoplasmic and mitochondrial proteins in hiPSCs, which may have been masked in previous analyses due to median-based normalization.

      We agree with the reviewer and think this is a very good suggestion. Unfortunately, in the main proteomic papers comparing hESC and hiPSCs (16, 17), the authors did not upload their raw files to a public repository (as it was not mandatory at that period in time), and they also used the International Protein Index (IPI), which is a discontinued database. So the raw files can’t be reprocessed and the database doesn’t match the modern SwissProt entries. Therefore, reprocessing the previous data was impractical.

      (6) The work raises a fundamental question: what is the mechanistic basis for the higher expression of cytoplasmic and mitochondrial proteins in hiPSCs? Conceivably, this could be due to two reasons: (a) Genes encoding cytoplasmic and mitochondrial proteins are expressed at a higher level in hiPSCs compared to hESC. (b) mRNAs encoding cytoplasmic and mitochondrial proteins are translated at a higher level in hiPSCs compared to hESC. The authors may check published transcriptome data from the same cell lines to shed light on this point.

      This is a very interesting point. We believe that the reprogrammed cells contained mature mitochondria, which are not fully regressed upon reprogramming, and that this can establish a growth advantage in the normoxic environments in which the cells are grown. Unfortunately, the available transcriptomic data lacked spike-ins, and thus only enables comparison of concentration, not of copy numbers (13). Therefore, we could not determine with the available data if there was an increase in the copies of specific mRNAs. However, a future study with a transcriptomic dataset that included spike-ins would make this very interesting to analyse.

      Reviewer #3 (Recommendations For The Authors):

      It is unclear whether changes in protein levels relate to any phenotypic features of cell lines used. For example, the authors highlight that increased protein expression in hiPSC lines is consistent with the requirement to sustain high growth rates, but there is no data to demonstrate whether hiPSC lines used indeed have higher growth rates.

      We respectfully disagree with the reviewer on this point. Our data show that hESCs and hiPSCs show significant differences in protein mass and cell size, with the MS data validated by the EZQ assay and FACS, while having no significant differences in their cell cycle profiles. Thus, increased size and protein content would require higher growth rates to sustain the increased mass, which is what we observe.

      The authors claim that the cell cycle of the lines is unchanged. However, no details of the method for assessing the cell cycle were included so it is difficult to appreciate if this assessment was appropriately carried out and controlled for.

      We apologise for this omission; the details have been included in the revised version of the manuscript.

      Details and characterisation of iPSC and ESC lines used in this study are overall lacking. The lines used are merely listed in methods, but no references are included for published lines, how lines were obtained, what passage they were used at, their karyotype status etc. For details of basic characterisation, the authors should refer to the ISSCR Standards for the use of human stem cells in research. In particular, the authors should consider whether any of the changes they see may be attributed to copy number variants in different lines.

      We agree with the reviewer on this and refer to the reply above concerning this issue.

      The expression data for markers of undifferentiated state in Figure 1a would ideally be shown by immunocytochemistry or flow cytometry as it is impossible to tell whether cultures are heterogeneous for marker expression.

      We agree with the reviewer on this. FACS is indeed much more quantitative and a better method to study heterogeneity. However, we did not have protocols to study these markers using FACS.

      TEM analysis should ideally be quantified.

      We agree with the reviewer that it would be nice to have a quantitative measure.

      All figure legends should explicitly state what graphs are representing (e.g. average/mean; how many replicates (biological or technical), which lines)? Some data is included in Methods (e.g. glutamine uptake), but not for all of the data (e.g. TEM).

      We agree with the reviewer. This has been corrected in the revised version of the manuscript, with additional details included.

      Validation experiments were performed typically on one or two cell lines, but the lines used were not consistent (e.g. wibj_2 versus H1 for respirometry and wibj_2, oaqd_3 versus SA121 and SA181 for glutamine uptake). Can the authors explain how the lines were chosen?

      The validation experiments were performed at different time points, and the selection of lines reflected the availability of hiPSC and hESC lines within our stem cell facility at a given point in time.

      We chose to use a range of different lines for comparison, rather than always comparing only one set of lines, to try to avoid a possible bias in our conclusions and thus to make the results more general.

      The authors should acknowledge the need for further functional validation of the results related to immunosuppressive proteins.

      We agree with the reviewer and have added a sentence in the discussion making this point explicitly.

      Differences in H1 histones abundance were highlighted. Can the authors speculate as to the meaning of these differences?

      Regarding H1 histones, our study of the literature, as well as discussions with chromatin and histone experts, both within our institute and externally, have not shed light on what the differences could imply. We think therefore that this is a striking and interesting result that merits further study, but we have not yet been able to formulate a clear hypothesis on the consequences.

      (1) Howden, A. J. M. et al. Quantitative analysis of T cell proteomes and environmental sensors during T cell differentiation. Nat Immunol, doi:10.1038/s41590-019-0495-x (2019).

      (2) Marchingo, J. M., Sinclair, L. V., Howden, A. J. & Cantrell, D. A. Quantitative analysis of how Myc controls T cell proteomes and metabolic pathways during T cell activation. Elife 9, doi:10.7554/eLife.53725 (2020).

      (3) Damasio, M. P. et al. Extracellular signal-regulated kinase (ERK) pathway control of CD8+ T cell differentiation. Biochem J 478, 79-98, doi:10.1042/BCJ20200661 (2021).

      (4) Salerno, F. et al. An integrated proteome and transcriptome of B cell maturation defines poised activation states of transitional and mature B cells. Nat Commun 14, 5116, doi:10.1038/s41467-023-40621-2 (2023).

      (5) Antico, O., Nirujogi, R. S. & Muqit, M. M. K. Whole proteome copy number dataset in primary mouse cortical neurons. Data Brief 49, 109336, doi:10.1016/j.dib.2023.109336 (2023).

      (6) Edwards, W. et al. Quantitative proteomic profiling identifies global protein network dynamics in murine embryonic heart development. Dev Cell 58, 1087-1105 e1084, doi:10.1016/j.devcel.2023.04.011 (2023).

      (7) Barton, P. R. et al. Super-killer CTLs are generated by single gene deletion of Bach2. Eur J Immunol 52, 1776-1788, doi:10.1002/eji.202249797 (2022).

      (8) Phair, I. R., Sumoreeah, M. C., Scott, N., Spinelli, L. & Arthur, J. S. C. IL-33 induces granzyme C expression in murine mast cells via an MSK1/2-CREB-dependent pathway. Biosci Rep 42, doi:10.1042/BSR20221165 (2022).

      (9) Niu, L. et al. Dynamic human liver proteome atlas reveals functional insights into disease pathways. Mol Syst Biol 18, e10947, doi:10.15252/msb.202210947 (2022).

      (10) Murugesan, G., Davidson, L., Jannetti, L., Crocker, P. R. & Weigle, B. Quantitative Proteomics of Polarised Macrophages Derived from Induced Pluripotent Stem Cells. Biomedicines 10, doi:10.3390/biomedicines10020239 (2022).

      (11) Ryan, D. G. et al. Nrf2 activation reprograms macrophage intermediary metabolism and suppresses the type I interferon response. iScience 25, 103827, doi:10.1016/j.isci.2022.103827 (2022).

      (12) Nicolas, P. et al. Systems-level conservation of the proximal TCR signaling network of mice and humans. J Exp Med 219, doi:10.1084/jem.20211295 (2022).

      (13) Brenes, A. J. et al. Erosion of human X chromosome inactivation causes major remodeling of the iPSC proteome. Cell Rep 35, 109032, doi:10.1016/j.celrep.2021.109032 (2021).

      (14) Wisniewski, J. R., Hein, M. Y., Cox, J. & Mann, M. A "proteomic ruler" for protein copy number and concentration estimation without spike-in standards. Mol Cell Proteomics 13, 3497-3506, doi:10.1074/mcp.M113.037309 (2014).

      (15) Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370-375, doi:10.1038/nature22403 (2017).

      (16) Phanstiel, D. H. et al. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nat Methods 8, 821-827, doi:10.1038/nmeth.1699 (2011).

      (17) Munoz, J. et al. The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells. Mol Syst Biol 7, 550, doi:10.1038/msb.2011.84 (2011).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary

      The mammalian Shieldin complex consisting of REV7 (aka MAD2L2, MAD2B) and SHLD1-3 affects pathway usage in DSB repair favoring non-homologous end-joining (NHEJ) at the expense of homologous recombination (HR) by blocking resection and/or priming fill-in DNA synthesis to maintain or generate near blunt ends suitable for NHEJ. While the budding yeast Saccharomyces cerevisiae does not have homologs to SHLD1-3, it does have Rev7, which was identified to function in conjunction with Rev3 in the translesion DNA polymerase zeta. Testing the hypothesis that Rev7 also affects DSB resection in budding yeast, the work identified a direct interaction between Rev7 and the Rad50-Mre11-Xrs2 complex by two-hybrid and direct protein interaction experiments. Deletion analysis identified that the 42 amino acid C-terminal region was necessary and sufficient for the 2-hybrid interaction. Direct biochemical analysis of the 42 aa peptide was not possible. Rev7 deficient cells were found to be sensitive to HU only in synergy with G2 tetraplex forming DNA. Importantly, the 42 aa peptide alone suppressed this phenotype. Biochemical analysis with full-length Rev7 and a C-terminal truncation lacking the 42 aa region shows G4-specific DNA binding that is abolished in the C-terminal truncation and with a substrate containing mutations to prevent G4 formation. Rev7 lacks nuclease activity but inhibits the dsDNA exonuclease activity of Mre11. The C-terminal truncation protein lacking the 42 aa region also showed some inhibition suggesting the involvement of additional binding sites besides the 42 aa region. Also, the Mre11 ssDNA endonuclease activity is inhibited by Rev7 but not the degradation of linear ssDNA. Rev7 does not affect ATP binding by Rad50 but inhibits in a concentration-dependent manner the Rad50 ATPase activity. The C-terminal truncation protein lacking the 42 aa region also showed some inhibition but significantly less than the full-length protein.

      Using an established plasmid-based NHEJ assay, the authors provide strong evidence that Rev7 affects NHEJ, showing a four-fold reduction in this assay. The mutations in the other Pol zeta subunits, Rev3 and Rev1, show a significantly smaller effect (~25% reduction). A strain expressing only the Rev7 C-terminal 42 aa peptide showed no NHEJ defect, while the truncation protein lacking this region exhibited a smaller defect than the deletion of REV7. The conclusion that Rev7 supports NHEJ mainly through the 42 aa region was validated using a chromosomal NHEJ assay. The effect on HR was assessed using a plasmid:chromosome system containing G4 forming DNA. The rev7 deletion strain showed an increase in HR in this system in the presence and absence of HU. Cells expressing the 42 aa peptide were indistinguishable from the wild type as were cells expressing the Rev7 truncation lacking the 42 aa region. The authors conclude that Rev7 suppresses HR, but the context appears to be system-specific and the conclusion that Rev7 abolished HR repair of DSBs is unwarranted and overly broad.

      Strength

      This is a well-written manuscript with many well-executed experiments that suggest that Rev7 inhibits MRX-mediated resection to favor NHEJ during DSB repair. This finding is novel and provides insight into the potential mechanism of how the human Shieldin complex might antagonize resection.

      We thank Reviewer 1 for their comprehensive summary of our work. The Reviewers' recognition that our manuscript is “well-written” with “many well-executed experiments” and our findings are “novel” is greatly appreciated.

      Weaknesses

      The nuclease experiments were conducted using manganese as a divalent cation, and it is unclear whether there is an effect with the more physiological magnesium cation. Additional controls for the ATPase and nuclease experiments to eliminate non-specific effects would be helpful. Evidence for an effect on resection in cells is lacking. The major conclusion about the role of Rev7 in regulating the choice between HR and NHEJ is not justified, as only a highly specialized assay is used that does not warrant the broad conclusion drawn. Specifically, the results that the Rev7 C terminal truncation lacking the 42 aa region still suppresses HR is unexpected and unexplained. The effect of Rev7 on G4 metabolism is underdeveloped and distracts from the main results that Rev7 modulated MRX activity. The authors should consider removing this part and develop a more complete story on this later.

      We have addressed each point identified as “Weaknesses” by the reviewer, as described below:

      The nuclease experiments were conducted using manganese as a divalent cation, and it is unclear whether there is an effect with the more physiological magnesium cation.

      We acknowledge the Reviewer’s concern and apologize for not having been clear in our first submission. However, several studies have demonstrated that Mre11 exhibits all three DNase activities, namely single-stranded endonuclease, double-stranded exonuclease and DNA hairpin opening, only in the presence of Mn²⁺ but not with other divalent cations, such as magnesium or calcium (Paull and Gellert, Mol. Cell 1998; 2000; Usui et al., Cell 1998; Ghosal and Muniyappa, JMB, 2007; Arora et al., Mol Cell Biol. 2017). For this reason, Mn²⁺ was used as a cofactor for the Mre11 nuclease assays. We have clarified this in the revised manuscript. As a side note, Mg²⁺ serves as a cofactor for Rad50’s ATPase activity.

      Additional controls for the ATPase and nuclease experiments to eliminate non-specific effects would be helpful.

      We thank the Reviewer for raising this important point, as it led us to evaluate and confirm the specificity of Rev7 and exclude its potential non-specific effects. To this end, we have performed additional experiments, which showed that (a) the S. cerevisiae Dmc1 ATPase activity was not affected by Rev7, contrary to its inhibitory effect on Rad50 and (b) Rev7 had no discernible impact on the endonucleolytic activity of S. cerevisiae Sae2, whereas it inhibits DNase activities of Mre11. Thus, the lack of inhibitory effects on the ATPase activity of Dmc1 and nuclease activity of Sae2 confirm the specificity of Rev7 for Mre11 and Rad50 subunits. We have included this new data in Figure 6H and 6J and in Figure 5 –figure supplement 1, respectively, in the revised manuscript.

      Evidence for an effect on resection in cells is lacking. The major conclusion about the role of Rev7 in regulating the choice between HR and NHEJ is not justified, as only a highly specialized assay is used that does not warrant the broad conclusion drawn.

      We agree with the Reviewer that in vivo evidence demonstrating the inhibitory effect of REV7 on DNA end resection was lacking in the first submission. Reviewers 2 and 3 have also raised this point. We have now measured the rate of DNA end resection using a qPCR-based assay (Mimitou and Symington, EMBO J. 2010; Gnugge et al., Mol. Cell 2023). The results revealed that deletion of REV7 led to an enhancement in the rate of DNA end resection at a DSB site inflicted by HO endonuclease (Figure 9—figure supplement 3), providing direct evidence that loss of REV7 contributes to an increase in DNA end resection at DSBs.
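
      For readers unfamiliar with qPCR-based resection measurements, the sketch below shows one commonly used way of converting a Ct shift into a resected fraction. It is a simplified illustration and not necessarily the exact calculation used in the revised manuscript: it assumes that the amplicon spans a restriction site, that digestion destroys duplex template but spares single-stranded (resected) template, and that the fraction of cells cut by HO is known. The function and parameter names are hypothetical.

```python
def resected_fraction(delta_ct, cut_fraction=1.0):
    """Fraction of HO-cut molecules that are single-stranded at the amplicon.

    delta_ct     -- Ct(digested sample) minus Ct(mock-digested sample)
    cut_fraction -- fraction of molecules cleaved by HO (assumed to be
                    measured separately), between 0 (exclusive) and 1
    """
    if not 0.0 < cut_fraction <= 1.0:
        raise ValueError("cut_fraction must be in the interval (0, 1]")
    # Duplex DNA contributes two template strands and resected DNA one, so
    # digested/mock = x*f / (2 - x*f) = 2**(-delta_ct); solving for x gives:
    return 2.0 / ((1.0 + 2.0 ** delta_ct) * cut_fraction)


fully_resected = resected_fraction(0.0)       # no Ct shift after digestion
partly_resected = resected_fraction(3.0, 0.9)
```

      A Ct shift of zero corresponds to complete resection, and larger shifts correspond to progressively less single-stranded DNA at that distance from the break.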

      Specifically, the results that the Rev7 C-terminal truncation lacking the 42 aa region still suppresses HR is unexpected and unexplained.

      This is a fair point, and we thank the reviewer for raising it. Although the interaction of Rev7-C1 in the yeast two-hybrid assays was not apparent, surprisingly, it partially suppressed HR (Figure 9). In line with this, biochemical assays showed that it exerts a partial inhibitory effect on the Mre11 nuclease (Figure 5) and Rad50 ATPase (Figure 6) activities compared with the full-length Rev7. Consistent with the in vitro data, the AF2 models revealed that, in addition to the C-terminal 42-aa region, residues in the N-terminal region of Rev7 also interact with the Mre11 and Rad50 subunits (Figure 2—figure supplement 2).

      The effect of Rev7 on G4 metabolism is underdeveloped and distracts from the main results that Rev7 modulated MRX activity. The authors should consider removing this part and develop a more complete story on this later.

      We agree with the reviewer’s comment that the effect of Rev7 on G4 DNA metabolism “is underdeveloped and distracts” from the central theme of the present paper, and with the suggestion that we develop this part as a complete story later. This point has also been raised by Reviewers 2 and 3; therefore, the relevant figures and associated text were removed from the revised version of the manuscript.

      Reviewer 2 (Public Review):

      In this study, Badugu et al investigate the Rev7 roles in regulating the Mre11-Rad50-Xrs2 complex and in the metabolism of G4 structures. The authors also try to make a conclusion that REV7 can regulate the DSB repair choice between homologous recombination and non-homologous end joining.

      The major observations of this study are:

      (1) Rev7 interacts with the individual components of the MRX complex in a two-hybrid assay and in a protein-protein interaction assay (microscale thermophoresis) in vitro.

      (2) Modeling using AlphaFold-Multimer also indicated that Rev7 can interact with Mre11 and Rad50.

      (3) Using a two-hybrid assay, a 42-amino-acid C-terminal domain in Rev7 responsible for the interaction with MRX was identified.

      (4) Rev7 inhibits Mre11 nuclease and Rad50 ATPase activities in vitro.

      (5) Rev7 promotes NHEJ in a plasmid cutting/religation assay.

      (6) Rev7 inhibits recombination between chromosomal ura3-1 allele and plasmid ura3 allele containing G4 structure.

      (7) Using an assay developed in V. Zakian's lab, it was found that rev7 mutants grow poorly when both G4 is present in the genome and yeast are treated with HU.

      (8) In vitro, purified Rev7 binds to G4-containing substrates.

      In general, a lot of experiments have been conducted, but the major conclusion about the role of Rev7 in regulating the choice between HR and NHEJ is not justified.

      We thank Reviewer 2 for their comprehensive assessment of our manuscript and their insightful comments. However, we believe that the data (Figures 7-9) in our manuscript, together with new data (Figure 9-figure supplement 2 and 3) in the revised manuscript, clearly demonstrate that Rev7 regulates the choice between HR and NHEJ.

      (1) Two stories that do not overlap (regulation of MRX by Rev7 and Rev7's role in G4 metabolism) are brought under one umbrella in this work. There is no connection unless the authors demonstrate that Rev7 inhibits the cleavage of G4 structures by the MRX complex.

      We agree with the reviewer’s point that the themes associated with the regulation of the functions of MRX subunits by Rev7 and its role in G4 DNA metabolism do not overlap. This concern has also been expressed by Reviewers 1 and 3. According to their suggestion, we have deleted all figures and text describing the role of Rev7 in G4 DNA metabolism from the revised manuscript.

      (2) The authors cannot conclude based on the recombination assay between G4-containing 2-micron plasmid and chromosomal ura3-1 that Rev7 "completely abolishes DSB-induced HR". First of all, there is no evidence that DSBs are formed at G4. Why is there no induction of recombination when cells are treated with HU? Second, as the authors showed, Rev7 binds to G4, therefore it is not clear if the observed effects are the result of Rev7 interaction with G4 or its impact on HR. The established HO-based assays where the speed of resection can be monitored (e.g., Mimitou and Symington, 2010) have to be used to justify the conclusion that Rev7 inhibits MRX nuclease activity in vivo.

      We thank the Reviewer for the insightful comments and for drawing our attention to the inference "completely abolishes DSB-induced HR". We have rephrased the conclusion, and replaced it with “REV7 gene product plays an anti-recombinogenic role during HR”. Then, the reviewer refers to the lack of “evidence that DSBs are formed at G4”. At this point, unfortunately, our attempts to identify a DSB at the G4 DNA site in the 2-micron plasmid did not provide a clear answer to this question. This might be related to the existence of myriad DNases in the cell and technical issues associated with the isolation of low-abundance, linearized 2-micron plasmid molecules. For these reasons, we cannot provide any data on DSBs at the G4 site in the 2-micron plasmid.

      The reviewer then correctly asks “Why is there no induction of recombination when cells are treated with HU?” These findings are consistent with previous studies which have shown that Mre11-deficient cells are sensitive to HU, resulting in cell death (Tittel-Elmer et al., EMBO J. 28, 1142-1156, 2009; Hamilton and Maizels, PLoS One, 5, e15387, 2010). However, a novel finding of our study is that ura3-1 rev7Δ cells and ura3-1 cells expressing the Rev7 42-amino-acid peptide (to a limited extent) produce Ura3+ papillae. We have included this information in the Results section and adjusted the text to make this point clear to the reader.

      In the same paragraph, the Reviewer expresses a concern about the interaction of Rev7 with G4 DNA substrates and its impact on HR. As discussed above, in response to your comment (1) and similar comments from Reviewers 1 and 3, we have deleted all figures and text describing the role of Rev7 in G4 DNA metabolism in the revised manuscript. The reviewer specifically refers to a study by Mimitou and Symington, 2010 in which the speed of DNA end resection at the HO endonuclease-inflicted DSB was quantified. We have carried out the suggested experiment and the results are presented in Figure 9─figure supplement 3.

      Reviewer 3 (Public Review):

      Summary:

      REV7 facilitates the recruitment of Shieldin complex and thereby inhibits end resection and controls DSB repair choice in metazoan cells. Puzzlingly, Shieldin is absent in many organisms and it is unknown if and how Rev7 regulates DSB repair in these cells. The authors surmised that yeast Rev7 physically interacts with Mre11/Rad50/Xrs2 (MRX), the short-range resection nuclease complex, and tested this premise using yeast two-hybrid (Y2H) and microscale thermophoresis (MST). The results convincingly showed that the individual subunits of MRX interact robustly with Rev7. AlphaFold Multimer modelling followed by Y2H confirmed that the carboxy-terminal 42 amino acid is essential for interaction with MR and G4 DNA binding by REV7. The mutant rev7 lacking the binding interface (Rev7-C1) to MR shows moderate inhibition to the nuclease and the ATPase activity of Mre11/Rad50 in biochemical assays. Deletion of REV7 also causes a mild reduction in NHEJ using both plasmid and chromosome-based assays and increases mitotic recombination between chromosomal ura3-01 and the plasmid ura3 allele interrupted by G4. The authors concluded that Rev7 facilitates NHEJ and antagonizes HR even in budding yeast, but it achieves this by blocking Mre11 nuclease and Rad50 ATPase.

      Weaknesses

      There are many strengths to the studies and the broad types of well-established assays were used to deduce the conclusion. Nevertheless, I have several concerns about the validity of experimental settings due to the lack of several key controls essential to interpret the experimental results. The manuscript also needs a few additional functional assays to reach the accurate conclusions as proposed.

      We are happy that the Reviewer has found “many strengths” in our manuscript and further noted that “results convincingly showed that the individual subunits of MRX interact robustly with Rev7”. We greatly appreciate the Reviewer for these encouraging words, and for specific suggestions that helped us to improve the manuscript. As suggested, we have performed additional experiments including key controls and the data is presented in the revised manuscript.

      (1) AlphaFold model predicts that Mre11-Rev7 and Rad50-Rev7 binding interfaces overlap and Rev7 might bind only to Mre11 or Rad50 at a time. Interestingly, however, Rev7 appears dimerized (Figure 1). Since the MR complex also forms with 2M and 2R in the complex, it should still be possible if REV7 can interact with both M and R in the MR complex. The author should perform MST using MR complex instead of individual MR components. The authors should also analyze if Rev7-C1 is indeed deficient in interaction with MR individually and with complex using MST assay.

      Thank you for the valuable suggestion. As requested, MST titration experiments have been performed to examine the affinity of purified GFP-tagged Rev7-C1 for the Mre11, Rad50 and MR complex. The results revealed that Rev7-C1 binds to the Mre11 and Rad50 subunits with about 3- and 8.8-fold reduced affinity, respectively; whereas it binds to the MR complex with ~5.6-fold reduced affinity compared with full-length Rev7. The data is shown in Figure 1─figure supplement 4A-C.

      (2) The nuclease and the ATPase assays require additional controls. Does Rev7 inhibit the other nuclease or ATPase non-specifically? Are these outcomes due to the non-specific or promiscuous activity of Rev7? In Figure 6, the effect of REV7 on the ATP binding of Rad50 could be hard to assess because the maximum Rad50 level (1 mM) was used in the experiments. The author should use the suboptimal level of Rad50 to check if REV7 still does not influence ATP binding by Rad50.

      We thank the Reviewer for these valuable comments (Reviewer 1 has raised similar issues). Thus, we performed additional control experiments and the results indicate that (a) the ATPase activity of S. cerevisiae Dmc1 was not affected by Rev7 and (b) Rev7 does not inhibit the endonucleolytic activity of S. cerevisiae Sae2. The results are depicted in Figure 6H and 6J and Figure 5 –figure supplement 1A-D, respectively.

      As suggested by the Reviewer, using suboptimal levels of Rad50 (0.2 mM), we carried out experiments to test the effect of varying concentrations of Rev7 on the ability of Rad50 to bind ATP and catalyse its hydrolysis. The results showed that Rev7 had no discernible effect on its ability to bind ATP, even at concentrations 30 times higher than the concentration of Rad50 (Figure 6B and 6D). However, Rev7 suppresses the ATPase activity of Rad50, but not that of Dmc1, in a concentration-dependent manner (Figure 6G, 6J).

      (3) The moderate deficiency in NHEJ using plasmid-based assay in REV7 deleted cells can be attributed to aberrant cell cycle or mating type in rev7 deleted cells. The authors should demonstrate that rev7 deleted cells retain largely normal cell cycle patterns and the mating type phenotypes. The author should also analyze the breakpoints in plasmid-based NHEJ assays in all mutants, especially from rev7 and rev7-C1 cells.

      We appreciate the Reviewer's critical and insightful comment. We monitored cell-cycle progression of both wild-type and rev7Δ cells over time using FACS. The results revealed that the cell cycle profiles and mating type phenotypes of rev7Δ cells were similar to those of wild-type cells. The data is presented in Figure 7-figure supplement 1. This indicates that rev7Δ cells do not possess aberrant cell cycle or mating type defects as compared with the wild-type cells.

      We find the second point raised by the Reviewer although is intriguing, its relevance to the current study is unclear. In our view, identification of breakpoints using plasmid-based NHEJ assays in all the mutants will require a significant amount of time, and the insight that we may gain is unlikely to add to the central theme of this paper.  Moreover, we know for sure that Rev7 has no DNA cleavage/nicking activity.

      (4) It is puzzling why the authors did not analyze end resection defects in rev7 deleted cells after a DSB. The author should employ the widely used resection assay after a HO break in rev3, rev7, and mre11 rev7 cells as described previously.

      Thank you for the suggestion. Reviewer 1 has also raised this point. As suggested, we have analysed end resection in rev7Δ cells at an HO-inflicted DSB site using a qPCR assay (Mimitou and Symington, EMBO J. 2010; Gnugge et al., Mol. Cell 2023). The results revealed that deletion of REV7 led to an enhancement in the rate of DNA end resection at a DSB inflicted by HO endonuclease (Figure 9-figure supplement 3).

      (5) Is it possible that Rev7 also contributes to NHEJ as the part of TLS polymerase complex? Although NHEJ largely depends on Pol4, the authors should not rule out that the observed NHEJ defect in rev7 cells is due at least partially to its TLS defect. In fact, both rev3 or rev1 cells are partially defective in NHEJ (Figure 7). Rev7-C1 is less deficient in NHEJ than REV7 deletion. These results predict that rev7-C1, rev3 should be as defective as the rev7 deletion. Additionally, the authors should examine if Rev7-C1 might be deficient in TLS. In this regard, does rev7-C1 reduce TLS and TLS-dependent mutagenesis? Is it dominant? The authors should also check if Rev3 or Rev1 are stable in Rev7 deleted or rev7-C1 cells by immunoblot assays.

      We agree with the possibility that Rev7 may play a role in translesion DNA synthesis and TLS-dependent mutagenesis. Accordingly, Rev7-C1 might be deficient in TLS. While we do not rule out such scenarios, we respectfully suggest that this is outside the scope of the current manuscript. This manuscript focuses on the role of Rev7 in NHEJ and HR pathways, not on translesion DNA synthesis. Nevertheless, we recognise the importance of this line of investigation, and we will certainly consider this suggestion in our future work. Thank you.

      (6) Due to the G4 DNA and G4 binding activity of REV7, it is not clear which class of events the authors are measuring in plasmid-chromosome recombination assay in Figure 9. Do they measure G4 instability or the integrity of recombination or both in rev7 deleted cells? Instead, the effect of rev7 deletion or rev7-C1 on recombination should be measured directly by more standard mitotic recombination assays like mating type switch or his3 repeat recombination.

      We thank the Reviewer for highlighting this important point and would like to take the opportunity to clarify the rationale behind the plasmid-chromosome recombination assay, as previously described (Paeschke et al., Cell 145, 678, 2011). In this assay, we measure the rate of Ura+ papillae formation arising from integration of the targeting plasmid into the genome at the ura3-1 locus of wild-type and rev7Δ cells. Analysis of PCR-generated DNA fragments indicates that the pFAT10-G4 plasmid integrates at the ura3-1 genomic locus of rev7Δ cells, but not in wild-type cells (Figure 9-figure supplement 2). Further, we also measured the stability of G4 DNA, and the results indicate that it is stable in rev7Δ cells.

      Recommendations for the authors:

      Reviewer 1 (Recommendations for the authors):

      (1) Title: The word 'choice' implies a regulator. Is that the model here? Alternatively, is it pathway properties that define the preference of usage?

      This is an excellent suggestion. In the revised submission, we rephrased the title as “Saccharomyces cerevisiae Rev7 promotes non-homologous end-joining by inhibiting Mre11 nuclease and Rad50 ATPase activities and Homologous recombination.”

      (2) Line 83, Introduction: Titia De Lange proposed an alternative/complementary model for Shieldin and REV7 to support fill-in by DNA polymerases including Pol alpha. This should be discussed.

      We thank the reviewer for pointing out that we have not discussed the work from Titia De Lange’s research group. We have now added new sentences to the Introduction to describe the alternative model involving Polα-primase fill-in synthesis (p3.2.7).

      (3) Line 131: The paragraph title needs to change. 2-hybrid assays cannot establish direct interaction especially when analyzing yeast proteins by yeast 2-hybrid. I agree that direct interaction is established by other means later.

      Per the Reviewer’s suggestion, we have deleted the word “directly” from the title of the paragraph.

      (4) Figure 1 D-F: The purity of the Rev7-GFP fusion is shown in Figure S1, and the purity of the Rad50, Mre11, and Xrs2 subunits as assessed by PAGE should be shown as well.

      Following this suggestion, we have included images of Coomassie blue-stained SDS-polyacrylamide gels (Figure 1-figure supplement 1), which show the purity and size of GFP tagged Rev7, Rad50, Mre11, Xrs2, Rev1, Sae2 and Dmc1 proteins.

      (5) Please check the Kd values. In the graph in D, the differences between Rad50, Mre11, and Xrs2 look much larger than the values in F suggest.

      This is a fair point, and we thank the reviewer for highlighting it. The differences between the binding profiles of Rad50, Mre11, and Xrs2 with Rev7, as shown in the previous version of the manuscript, were not obvious because the binding curves were cluttered. Therefore, the binding profile of each interacting pair of proteins was plotted separately to highlight the differences (Figure 1-figure supplement 3). Further, we rigorously analysed the dataset to ascertain the binding affinities and found that the Kd values obtained were in good agreement with the values shown in Figure 1D.

      (6) Figure 1S3: Please label the bands.

      In the revised manuscript, the protein bands in Figure1-figure (previously Figure 1S3) are identified with their names.

      (7) Line 195: Change Figure 1 to Figure 1S4.

      We have introduced the correction in the revised manuscript.

      (8) Line 202: The minimal interaction domain of 42 aa is only described in the next paragraph. The description anticipates a result about the 42 aa fragment that has not been shown to this point. Please reorder results or descriptions to make this coherent.

      We have implemented the change, as per the Reviewer’s suggestion.

      (9) Figure 2: The two-hybrid analysis in Figures 1 and 2 also identifies Rev7 self-interaction, which is not discussed. This serves as another control against the artifact of the truncation proteins and should be discussed.

      We have now discussed the significance of Rev7 self-interaction in the Y2H experiments wherever relevant in the text.

      (10) Is the 42 aa fragment sufficient to elicit a two-hybrid signal?

      We thank the reviewer for this insightful comment. To test this premise, we expressed the terminal 42 amino acid sequence of Rev7 using bait pGBKT7 vector. The results revealed that the 42 residue fragment of ScRev7 alone is sufficient for a two-hybrid interaction with the MRX subunits (Figure 2-figure supplement 1).

      (11) Line 289: Why are the EMSA conditions described as physiological? As per Material and Methods, the reaction mixtures contain 20 mM Tris-HCl (pH 7.5), 0.1 mM DTT, 0.2 mg/ml BSA, and 5% glycerol, which is far from physiological.

      As suggested by all three reviewers, the data showing the interaction of Rev7 and its truncation derivative Rev7-C1 with G4 DNA has been deleted in the revised version of the manuscript.

      (12) Figure 4C: The figure needs to increase in size. The plotting symbols are not all visible, and it is undefined what the black squares represent.

      Following the reviewer's suggestion, Figure 4C has been omitted in the revised version of the manuscript.

      (13) Figure 5: The MRX nuclease assays were conducted in the presence of Manganese. Has the more physiological divalent cation magnesium been tested?

      This has been addressed in response to the query of Reviewer 1 (Public Review). As noted above, Mre11 exhibits DNase activities only in the presence of Mn²⁺.

      (14) In Figure 5D, lane 2: What is the concentration of Rev7?

      We appreciate the reviewer for catching this. The concentration of ScRev7 used for the reaction shown in Figure 5D, lane 2 was 2 μM, as specified in the Figure legend.

      (15) Figure 6 legend: Lane 1620 "same as in lane "Is there a "1" missing?

      We thank the reviewer for pointing out the typographical error, which has been corrected in the revised manuscript.

      (16) Figure 9: Rev7-C1 lacks the 42 aa peptide that is postulated to mediate anti-resection but shows normal HR here. This seems unexpected based on the premise that the 42 aa fragment supports end-joining. Rev7 seems to suppress HR independent of the function of the 42 aa peptide.

      This has been addressed in response to the query posed by Reviewer 1 in the Public Review. We do see that the Rev7-C1 lacking the 42 aa peptide suppresses HR, but the suppression was only partial as compared with the wild type. This is consistent with biochemical assays suggesting that Rev7-C1 exerts partial inhibition on the Mre11 nuclease (Figure 5) and Rad50 ATPase (Figure 6) activities. Further, the AF2 models indicate that, in addition to the C-terminal 42-aa region, other regions of Rev7 also interact with the Mre11 and Rad50 subunits (Figure 2—figure supplement 2), consistent with biochemical and genetic data.

      (17) Line 478: The conclusion that "these findings are consistent with the idea that REV7 completely abolishes DSB-induced HR in S. cerevisiae." is overly broad as the assay

      We agree with the reviewer's assessment. Accordingly, we have rephrased the sentence to soften the claim.

      Line 483ff: Based on the comments on Figure 9, the introductory sentences of the discussion do not seem to be supported by the data, as Rev7 appears to regulate HR independent of the 42 aa peptide.

      Please refer to the response to comment #16 above.

      (18) Line 536: Similarly to above 17, the conclusion about the effect of the 42 aa peptide on HR appears unwarranted.

      We have revised the statement to moderate the previously exaggerated claims.

      (19) In all figures, please list in the legend, which exact strains have been used referring to Table S5.

      We have now included mentions of the strains in the figure legend wherever applicable.

      (20) Line 351: linear.

      It is corrected in the revised manuscript.

      Reviewer 2 (Recommendations For The Authors):

      (1) It is very strange and unusual that Rev7 independently binds to all three subunits of the MRX complex, raising a question of how specific these interactions are. At least, there should be a negative control in their Y2H assay and in vitro protein-protein interaction assay showing that Rev7 does not bind to some other proteins. For example, Sae2 and Rev7 interactions can be tested.

      The reviewer is right that it is important to validate the specificity of the Y2H interactions as well as of the in vitro enzyme assays. These findings are shown in Figure 6 and Figure 5-figure supplement 1. As suggested by the Reviewer, we included SAE2 in the Y2H and MST assays, and Dmc1 and Sae2 in the in vitro enzyme assays. Our results showed that Sae2 does not interact with the MRX subunits in Y2H assays (Figure 1A-C), and that Rev7 inhibits neither the nuclease activity of Sae2 nor the ATPase activity of Dmc1 in vitro (Figure 6 and Figure 5-figure supplement 1).

      (2) It is surprising that in the Discussion the authors speculate that Rev7 might recruit Mus81 nuclease for cleavage, completely ignoring their own publication on the cleavage of G4 by MRX.

      We agree with the reviewer, and we have added a discussion of the MRX study mentioned above by the reviewer in the revised version.

      (3) How does the AlphaFold-Multimer modeling predict the interaction between Rev7 and MRX as a complex? Are the same regions of MRX accessible for the interaction with Rev7 in this case? Similarly, how are the activities of the MRX complex and phosphorylated Sae2 (see P. Cejka's work) affected by Rev7?

      Thank you for pointing this out. In this study, we investigated the interaction between Rev7 and Mre11, and between Rev7 and Rad50, using the AF2 algorithm. However, the three-dimensional structure of the S. cerevisiae MRX-Rev7 complex could not be constructed because of the size limits imposed by the AF2 algorithm. Therefore, we are unable to comment on whether the same regions of the MRX subunits in the complex are accessible for interaction with Rev7. That said, the AF2 algorithm has recently been used for structural modelling of the S. cerevisiae Mre11 (1–533)-Rad50 (1–260 + 1,057–1,312) complex (Nicolas et al., Mol. Cell 84, 2223, 2024). As such, there are no AF2 structural models that cover the whole length of the Mre11-Rad50 proteins.

      Regarding the second point raised by the Reviewer, our results suggest that Rev7 does interact with Sae2 in Y2H assays. However, whether phosphorylated Sae2 could potentially affect the interaction between MRX subunits and Rev7 warrants further studies.

      Minor points:

      (1) Figure 1. The labeling of the strains in A and B is genes and in C is proteins.

      The reviewer is correct. We have now corrected the error in Figures 1 and 2.

      (2) Abstract. Carefully check English grammar.

      We thank the Reviewer for spotting this, which has been corrected in the revised manuscript.

      (3) Line 322 "Further, it has been demonstrated that Mre11 cleaves non-B DNA structures such as DNA hairpins, cruciforms and intra- and inter-molecular G-quadruplex structures)." It has not been shown that Mre11 cuts cruciform structures.

      We thank the referee for spotting this error. Mre11 does not cleave cruciform DNA structures. This error is corrected in the revised manuscript.

      (4) Page 14. Lines 452-455. What does "selective and non-selective media" mean? Is it without and with HU treatment?

      Thanks very much for the comment. In our manuscript, selective medium is composed of SC/-Leu with HU and non-selective medium is without HU. We have clarified this point in the revised version.

      (5) Page 15. Line 472 "To assess whether increased frequency of HR is due to the instability of G-quadruplex DNA in rev7Δ cells, we examined the length of G4 DNA inserts in the plasmids carrying sequences during HR assay". It is not clear what "during HR assay" means. Did you examine the presence of G4 in Ura+ recombinants? If not, this analysis is not meaningful.

      The reviewer is correct. We measured the presence of G4 DNA insert in Ura+ recombinants. The text has been appropriately edited to reflect these necessary modifications.

      (6) What is the nature of the ura3-1 allele? Can it revert to URA3 in rev7 mutants?

      The ura3-1 allele (glycine-to-glutamate substitution) reverts to Ura3+ at a low rate of ~2.5 × 10⁻⁹ in both orientations (Johnson et al., Mol. Cell 59, 163, 2015).

      (7) From the way that the recombination process is depicted, it seems that the authors believe that the plasmid should integrate into the chromosome. In reality, in most cases it should be a gene conversion in which the G4 sequence (if it indeed induces DSBs) is replaced by the wild-type segment from ura3-1. Integration is not required, since it is a 2-micron plasmid.

      We apologize for not having made this clearer. The recombination assay with targeting plasmids containing G4 DNA-forming sequences was performed as previously described (Paeschke et al., Cell 145, 678, 2011). In this assay, Ura+ recombinants arise from integration of the targeting plasmid bearing the ura3G4 allele (with a G4 DNA-forming insert) into the genome at the ura3-1 locus. As shown in Author response image 1B, this is confirmed by PCR amplification of the insert in the genomic DNA of wild-type and rev7Δ cells.

      Reviewer 3 (Recommendations For The Authors):

      (1) All Y2H experiments were performed with REV7 fusion to pGBKT7 and MRX to pGADT7. It will be helpful to test if pGAD-Rev7 also interacts with pGBK-Mre11 or Rad50 by Y2H.

      Following the reviewer's suggestion, we performed Y2H experiments in wild-type PJ69-4a cells co-transformed with the pGBKT7 vector expressing MRX subunits and the pGADT7 vector expressing Rev7. The results showed that Rev7 interacts with the Mre11, Rad50 and Xrs2 subunits, indicating that the interactions are vector-independent.

      Author response image 1.

      Yeast two-hybrid analysis suggests interaction between Rev7 and MRX subunits. PJ69-4A cells were co-transformed with bait vector expressing Rev7 or the Mre11, Rad50 or Xrs2 subunits and prey vector expressing Rev7 protein. Equal numbers of cells were spotted onto –Trp –Leu and –Trp –Leu –His dropout plates containing 3-AT, and images were obtained following 48 h of incubation at 30°C. The data are representative of three independent experiments.

      (2) G4 studies are under-developed and do not add much or even negatively to the manuscript. The author might consider revising the manuscript to improve their integration with better rationales or logic. Alternatively, the authors should consider removing the G4 part for another paper.

      This concern was also raised by Reviewers 1 and 2. Following the suggestions of all reviewers, the figures and text related to the G4 DNA studies have been deleted in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The conclusions of this paper are mostly well supported by data, but some aspects need to be corrected.

      1) Line 99. The title is not suitable for summarizing this part of the results. In this paragraph, the results mainly describe SRSF1 expression pattern and binding of spermatogonia-associated gene's transcripts in testes. There is no functional assay to conclude SRSF1 has an essential role in mouse testes. The data only indicate that SRSF1 may have a vital role in posttranscriptional regulation in the testes.

      Thank you for the professional suggestions. Following this advice, we have corrected the text in this revised version (Page 4, Line 98 and 112).

      2) Line 141. In the mating scheme, Vasa-Cre Srsf1Fl/del mice should be obtained instead of Vasa-Cre Srsf1Fl/Fl mice.

      Thank you for the professional suggestions. Following this advice, we have corrected the text in this revised version (Page 4, Line 118).

      3) Fig 2 C, "PZLF" should be corrected to "PLZF".

      Thank you very much for the helpful comments. We have corrected this in Figure 2C.

      4) Fig 5 B, "VASA" and "Merge" should be interchanged.

      Thank you very much for the helpful comments. We have interchanged "VASA" and "Merge" in Figure 5B.

      5) Fig 5 D, "Ctrl" should be added in the up panel.

      Thank you very much for the helpful suggestions. We have added "Ctrl" in Figure 5C.

      6) The legend for Figure 6 D should be revised.

      Thank you very much for the helpful suggestions. We have revised the legend for Figure 7D

      7) The legend for Figure 7 G should be revised.

      Thank you very much for the helpful suggestions. We have revised the legend for Figure 8D

      8) Immunoprecipitation mass spectrometry (IP-MS) data showed that SRSF1 interacts with other RNA splicing-related proteins (e.g., SRSF10, SART1, RBM15, SRRM2, SF3B6, and SF3A2). The authors should verify the interactions in testis or cells.

      We thank the reviewer for the professional comments and suggestions. Following this advice, we performed co-transfection and co-IP to verify the protein-protein interactions in 293T cells. The results showed that the RRM1 domain of SRSF1 interacted with SART1, RBM15 and SRSF10 in 293T cells. In addition, the fluorescence results showed complete co-localization of mCherry-SRSF1 with eGFP-SART1, eGFP-RBM15 and eGFP-SRSF10 in 293T cells. We have therefore incorporated the data into Figure 9G-J. These results have also been incorporated into the text, described, and highlighted (Page 17, Lines 338-347).

      9) To avoid overstatement, the authors should pay attention to the use of adjectives and adverbs in the article, especially when drawing conclusions about the role of Tial1.

      We thank the reviewer for the professional comments and suggestions. To avoid overstatement, we have revised the entire text (Page 4, Lines 98, and 112; Page 16, Lines 308; Page 17, Lines 346-347; Page 20, Lines 413-414; Page 21, Lines 432-433).

      Reviewer #2 (Recommendations For The Authors):

      Major

      1) I find the use of "SSC homing" misleading/confusing because this "homing" or relocation of postnatal gonocytes/nascent spermatogonia to the basement membrane precedes the maturation of the nascent spermatogonia into SSCs. In addition, "SSC homing" is commonly used in the SSC transplantation field to describe a transplanted SSC's ability to find and colonize its niche within the seminiferous tubules. I appreciate that "postnatal gonocytes/nascent spermatogonia homing" is not easily grasped by a broader audience. Perhaps "homing of precursor SSCs" is more appropriate.

      Thank you very much for the helpful comments and suggestions. Following this advice, we have corrected the text in this revised version (Line 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433)

      2) If I am misunderstanding the description of the Srsf1 cKO phenotype, and the authors truly believe SSCs have formed in the Srsf1 cKO testis, I strongly recommend immunostaining to show that the cKO germ cells robustly express SSC markers, not just markers of undifferentiated spermatogonia.

      We thank the reviewer for the professional suggestions. We fully agree with the reviewer. Immunohistochemical staining for FOXO1 and statistical results indicated a reduced number of prospermatogonia (Figure 6C-E). So, we have corrected the text in this revised version (Line 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433).

      3) If the authors have the available resources, the significance of this report would be enhanced by additional characterization of the cKO phenotype at the transition from gonocyte to nascent spermatogonia. Do any cKO germ cells exhibit defects in maturing from gonocytes to nascent spermatogonia at the molecular level? I.e., by P5-7, do all cKO germ cells express PLZF and localize FOXO1 to cytoplasm, as expected of nascent spermatogonia? If the cKO germ cells are actually a heterogenous population of gonocytes and nascent spermatogonia, what is the distribution of each subpopulation in the lumen vs basement membrane?

      Thank you for the professional suggestions. Following this advice, immunohistochemical staining for FOXO1 was performed on 5 dpp mouse testis sections (Figure 6C). Counts of germ cells with nuclear FOXO1 expression showed a reduced number of prospermatogonia in cKO mice (Figure 6D). Germ cells with nuclear FOXO1 expression similarly underwent abnormal homing (Figure 6E). Taken together, these data indicate that SRSF1 has an essential role in the homing of precursor SSCs. We have incorporated the data into Figure 6C-E. These results have also been incorporated into the text, described, and highlighted (Page 9, Lines 191-201; Page 20, Lines 389-391).

      Minor

      1) Could the authors clarify why Tial1 exon exclusion in the cKO results in reduced protein expression? Is it creating a transcript isoform that undergoes nonsense-mediated decay?

      Thank you for the professional suggestions. Following this advice, we analyzed Tial1 transcripts again, and we found that Tial1 exon exclusion resulted in reduced expression of protein isoform X2 (Figure 8J). Since this region is not in the CDS, no clear evidence of nonsense-mediated decay was found in the analysis.

      2) Could the authors confirm that the TIAL1 antibody is not detecting the portion of the protein encoded by the alternatively spliced exon?

      Thank you for the helpful comments. The TIAL1 monoclonal antibody is produced by Proteintech Group under the product number 66907-1-Ig. Immunogen is TIAL1 fusion protein Ag11981. The sequence is as follows. MDARVVKDMATGKSKGYGFVSFYNKLDAENAIVHMGGQWLGGRQIRTNWATRKPPAPKSTQENNTKQLRFEDVVNQSSPKNCTVYCGGIASGLTDQLMRQTFSPFGQIMEIRVFPEKGYSFVRFSTHESAAHAIVSVNGTTIEGHVVKCYWGKESPDMTKNFQQVDYSQWGQWSQVYGNPQQYGQYMANGWQVPPYGVYGQPWNQQGFGVDQSPSAAWMGGFGAQPPQGQAPPPVIPPPNQAGYGMASYQTQ The homology was 99% in mice and all TIAL1 isoforms were detected. So, TIAL1 antibody is detecting the portion of the protein encoded by the alternatively spliced exon.

      3) Lines 143: should "cKO" actually be "control"?

      Thank you for the helpful suggestions. There was indeed an error in the text description. We have corrected the text in this revised version (Page 6, Line 138-139).

      4) Lines 272-3 "visual analysis using IGV showed the peak of Tial1/Tiar was stabilized in 5 dpp cKO mouse testes (Figure 7H)": "peak stabilization" is not evident to me from the figure nor do I see Tial1 listed as differentially expressed in the supplemental. I would refrain from using IGV visualization as the basis for the differential abundance of a transcript.

      Thank you very much for the helpful comments and suggestions. Tial1/Tiar is one of 39 stabilizing genes that are bound by SRSF1 and undergo abnormal AS. Following this advice, we have replaced the IGV peaks of Tial1/Tiar with its FPKM values (Figure 8H). Meanwhile, we have corrected the text in this revised version (Page 15, Line 296-300; Page 16, Line 303-304).

      5) Lines 468-473: please clarify the background list used for GO enrichment analyses. By default, the genes expressed in the testis are enriched for spermatogenesis-related genes. To control for this and test whether a gene list is enriched for spermatogenesis-related genes beyond what is already seen in the testis, I recommend using a list of all expressed genes (for example, defined by TPM>=1) as the background list.

      We thank the reviewer for the professional comments and suggestions. Following this advice, all expressed genes (TPM sum of all samples >=1) were used as the background list for the GO enrichment analyses. The results of the GO enrichment analysis of the AS genes remained the same. The results of the GO enrichment analyses of the SRSF1 peak-containing genes, differential genes, and IP protein-associated genes have been corrected in the figures (Figure 2A, 7E, and 9E).
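
      The effect of the background list on a GO enrichment test can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test, which is a common choice but is not stated in the response as the method actually used; the function names and the input layout (a dictionary mapping each gene to its per-sample TPM values) are hypothetical.

```python
from math import comb


def expressed_background(tpm_by_gene, min_tpm_sum=1.0):
    """Return the genes whose TPM summed over all samples is >= min_tpm_sum.

    tpm_by_gene is a hypothetical layout: {gene_id: [tpm_sample1, tpm_sample2, ...]}.
    """
    return {gene for gene, tpms in tpm_by_gene.items() if sum(tpms) >= min_tpm_sum}


def hypergeom_enrichment_p(background_size, term_in_background, list_size, term_in_list):
    """One-sided hypergeometric p-value, P(X >= term_in_list).

    Models drawing list_size genes without replacement from a background of
    background_size genes, term_in_background of which carry the GO term.
    """
    upper = min(term_in_background, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_in_background, k)
        * comb(background_size - term_in_background, list_size - k)
        for k in range(term_in_list, upper + 1)
    )
    return tail / total
```

      With this setup, restricting the background to expressed genes raises the baseline frequency of testis-enriched terms, so a term counts as enriched only if it is over-represented beyond what expression in the testis already implies.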

      6) Figure 2B: Could the authors mark where the statistically significant peaks appear on the tracks? There are many small peaks and it's unclear if they are significant or not.

      Thank you for the helpful suggestions. Following this advice, we have marked the areas of higher peaks in the figure (Figure 2B). We generally believe that any region above the peaks of IgG is likely to be a binding region, and of course, the higher the peak value, the more pre-mRNA is bound by SRSF1 in that region.

      7) Figure 7A: I assume the SRSF1 CLIP-seq genes are all the genes from the adult testis experiments. I would suggest limiting the CLIP-seq gene set to only those expressed in the P5 RNA-seq data, as if the target is not expressed at P5, there's no way it will be differentially expressed or differentially spliced in at P5.

      Thank you very much for the helpful comments and suggestions. Following this advice, we found that 3543 of the 4824 genes bound by SRSF1 were expressed in testes at 5 dpp. We have corrected the figure accordingly (Figure 8A). These results have been incorporated into the text, described, and highlighted (Page 14, Lines 274-277).

      8) Figure 7F: Could the authors clarify where the alternatively spliced exon is relative to the total transcript, shown in 7H?

      Thank you for the helpful suggestions. Following this advice, we have labeled the number of the exon where alternative splicing occurs (Figure 8F).

      9) Please include where the sequencing and mass spec data will be publicly available.

      Thank you very much for the helpful comments and suggestions. Following this advice, these have been incorporated into the text, given descriptions, and highlighted (Page 25, Lines 560-565).

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improving the data and analysis

      1) The claim that TIAL1 mediates SRSF1 effects is not well supported; this claim should be adjusted or additional supporting data should be provided. To support a claim that alternative splicing of Tial1 mediates the effects of SRSF1, at least two additional pieces of data are needed: first, a demonstration that the two alternative protein isoforms have different molecular functions, either in vitro or in vivo; and second, a better quantitation of the levels and ratios of expression of the two different isoforms in vivo.

      Thank you for the helpful comments and suggestions. Following this advice, we quantified the expression levels and ratios of the two different isoforms in vivo, and we found that Tial1 exon exclusion resulted in reduced expression of protein isoform X2 (Figure 8J). However, we were not able to show that the two alternative protein isoforms have different molecular functions, so this claim has been adjusted in the text. These changes have been incorporated into the text, described, and highlighted (Lines 1-2, 43-45, 95, 306, 323-325, 408, 413-414).

      2) Likewise, the claim that SRSF1 is required for "homing and self-renewal" of SSCs should be adjusted or better supported. As of now, the data supports a claim that SRSF1 is required for the establishment of the SSC population in the testis after birth. This could be due to defects in homing, self-renewal, or survival. To support claims about homing and self-renewal, these phenotypes should be tested more directly, for example by quantitating numbers of spermatogonia at the basal membrane in juvenile testes (homing) and expression of SSC markers in addition to the pan-germ cell marker VASA across early postnatal time points.

      Thank you very much for the helpful comments and suggestions. Immunohistochemical staining for FOXO1 was performed on 5 dpp mouse testis sections (Figure 6C). Counts of germ cells with nuclear FOXO1 expression showed a reduced number of prospermatogonia in cKO mice (Figure 6D). Germ cells with nuclear FOXO1 expression similarly underwent abnormal homing (Figure 6E). Taken together, these data indicate that SRSF1 has an essential role in the homing of precursor SSCs. We have incorporated the data into Figure 6C-E. These results have been incorporated into the text, described, and highlighted (Page 9, Lines 191-201; Page 20, Lines 387-389). Meanwhile, the description of "homing and self-renewal" of SSCs has been corrected in the text of this revised version (Line 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433).

      3) Additional, more detailed analyses of CLIP-seq and RNA-seq data at least showing that the libraries are of good quality should be provided.

      Thank you very much for the suggestions. Following this advice, detailed analyses of the RNA-seq data have been incorporated into the figures (Figure S2). However, detailed analyses of the CLIP-seq data have already been published in another paper (Sun et al., 2023), and we have not provided them here in order to avoid reusing a figure. Meanwhile, we have cited that paper in the article (Page 4, Lines 105; Page 25, Lines 564-565).

      4) Gene Ontology analyses should be redone with a more appropriate background gene set.

      Thank you for the helpful suggestions. All expressed genes (TPM sum across all samples >= 1) were used as the background for the GO enrichment analyses. The GO enrichment results for the AS genes were unchanged. The GO enrichment results for the SRSF1 peak-containing genes, differential genes, and IP protein-associated genes have been corrected in the figures (Figure 2A, 7E, and 9E).
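      The background choice described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy gene names, and the TPM values are all hypothetical, and the one-sided hypergeometric test is an assumed (though common) choice of enrichment statistic.

```python
from math import comb


def expressed_background(tpm_by_gene, min_total_tpm=1.0):
    """Keep genes whose TPM summed over all samples is >= min_total_tpm."""
    return {gene for gene, tpms in tpm_by_gene.items() if sum(tpms) >= min_total_tpm}


def hypergeom_enrichment_p(background, term_genes, query_genes):
    """One-sided hypergeometric P(X >= k) for over-representation of a GO term.

    Genes outside the background are ignored, so the test is restricted
    to genes that are actually expressed in the tissue.
    """
    query = set(query_genes) & background
    term = set(term_genes) & background
    n_bg, n_term, n_query = len(background), len(term), len(query)
    k = len(query & term)
    total = comb(n_bg, n_query)
    tail = sum(
        comb(n_term, i) * comb(n_bg - n_term, n_query - i)
        for i in range(k, min(n_term, n_query) + 1)
    )
    return tail / total


# Toy data (invented): three samples per gene.
tpm_by_gene = {
    "GeneA": [5.0, 3.2, 4.1],
    "GeneB": [0.0, 0.2, 0.1],   # TPM sum 0.3, excluded from the background
    "GeneC": [0.5, 0.5, 0.0],   # TPM sum 1.0, kept
    "GeneD": [12.0, 9.5, 10.1],
    "GeneE": [2.0, 0.0, 1.0],
}
background = expressed_background(tpm_by_gene)
p_value = hypergeom_enrichment_p(
    background,
    term_genes={"GeneA", "GeneB", "GeneD"},
    query_genes={"GeneA", "GeneD"},
)
print(sorted(background), p_value)
```

      Restricting both the term and the query to the expressed background is what prevents tissue-specific terms from appearing enriched merely because the tissue expresses them.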

      Minor points about the text and figures

      5) The species (mouse) should be stated earlier in the Introduction.

      Thank you for the professional suggestions. Following this advice, the species (mouse) is now stated earlier in the Introduction (Page 3, Line 65).

      6) In Fig. 1C (Western blot), the results would be more convincing if quantitation of band intensities normalized to the loading control was added.

      Thank you very much for the comments and suggestions. Following this advice, ACTB served as the loading control. The value in 16.5 dpc testes was set to 1.0, and the values for testes at other developmental stages are given relative to it. We have incorporated these data into the figure (Figure 1C).
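      The normalization described above amounts to two divisions per lane. The sketch below is only an illustration under stated assumptions: the intensity values are invented, the function name is hypothetical, and the time points other than 16.5 dpc are placeholders.

```python
def relative_expression(target, loading, reference):
    """Normalize each lane to its loading control, then to a reference lane.

    target and loading map a lane label to a band intensity (arbitrary units).
    The reference lane ends up with a relative value of 1.0.
    """
    ratios = {lane: target[lane] / loading[lane] for lane in target}
    ref_ratio = ratios[reference]
    return {lane: ratio / ref_ratio for lane, ratio in ratios.items()}


# Invented band intensities, for illustration only.
srsf1 = {"16.5 dpc": 1200.0, "1 dpp": 1500.0, "20 dpp": 3000.0}
actb = {"16.5 dpc": 4000.0, "1 dpp": 4000.0, "20 dpp": 5000.0}

relative = relative_expression(srsf1, actb, reference="16.5 dpc")
for lane, value in relative.items():
    print(f"{lane}: {value:.2f}")
```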

      7) In Fig 5D, TUNEL signal in the single-channel image is difficult to see; please adjust the contrast.

      Thank you for the professional suggestions. Following this advice, the images of the channels have been replaced by enlarged images for better visibility (Figure 5C).

      Major comments

      1) In Fig 1D, it appears that SRSF1 is expressed most strongly in spermatogonia by immunofluorescence, but this is inconsistent with the sharp rise in expression detected by RT-qPCR at 20 days post partum (dpp) (Fig. 1B), which is when round spermatids are first added; this discrepancy should be explained or addressed.

      We appreciate the reviewer's important comments. In another of our studies, we showed that SRSF1 expression is higher in pachytene spermatocytes and round spermatids (Sun et al., 2023). The sharp rise in expression detected by RT-qPCR at 20 days post partum (dpp) is therefore expected.

      Author response image 1.

      Dynamic localization of SRSF1 in male mouse germ cells. (Sun et al., 2023)

      2) It is important to provide a more comprehensive basic description of the CLIP-seq datasets beyond what is shown in the tracks shown in Fig. 2B. This would allow a better assessment of the data quality and would also provide information about the transcriptome-wide patterns of SRSF1 binding. No information or quality metrics are provided about the libraries, and it is not stated how replicates are handled to maximize the robustness of the analysis. The distribution of peaks across exons, introns, and other genomic elements should also be shown.

      Thank you very much for the helpful comments and suggestions. In fact, detailed analyses of CLIP-seq have already been presented in another paper (Sun et al., 2023), and we have not provided it in order to avoid multiple uses of one figure. Meanwhile, we made a citation in the article (Page 4, Lines 105; Page 25, Lines 564-565). In addition, the distribution of peaks in exons, introns, and other genomic elements is shown in Figure 2B.

      3) The claim that SRSF1 is required for "homing and self-renewal" of SSCs is made in multiple places in the manuscript. However, neither homing nor self-renewal is ever directly tested. A single image is shown in Fig. 5E of a spermatogonium at 5dpp that does not appropriately sit on the basal membrane, potentially indicating a homing defect, but this is not quantified or followed up. There is good evidence for depletion of spermatogonia starting at 7 dpp, but no further explanation of how homing and/or self-renewal fit into the phenotype.

      Thank you very much for the helpful comments and suggestions. Following this advice, immunohistochemical staining for FOXO1 was performed on 5 dpp mouse testis sections (Figure 6C). Quantification of germ cells with nuclear FOXO1 showed a reduced number of prospermatogonia in cKO mice (Figure 6D), and germ cells with nuclear FOXO1 similarly underwent abnormal homing (Figure 6E). Together, these data indicate that SRSF1 has an essential role in the homing of precursor SSCs. We have incorporated the data into Figure 6C-E, and described and highlighted them in the text (Page 9, Lines 191-201; Page 20, Lines 387-389). In addition, the statements about "homing and self-renewal" of SSCs have been corrected throughout this revised version (Lines 1-2, 39, 44, 49, 54-55, 68, 70, 72-73, 77, 84, 93-95, 191, 201, 240, 384-387, 397, 417-422, and 433).

      4) In Fig. 6A (lines 258-260) very few genes downregulated in the cKO are bound by SRSF1 and undergo abnormal splicing. The small handful that falls into this overlap could simply be noise. A much larger fraction of differentially spliced genes are CLIP-seq targets (~33%), which is potentially interesting, but this set of genes is not explored.

      Thank you for the helpful comments. We fully agree with the reviewer's comments. Following this advice, we now state explicitly that 39 stabilizing genes were bound by SRSF1 and underwent abnormal AS, and that Tial1/Tiar is one of these 39 genes. These points have been added in this revised version (Page 14, Lines 279-280; Page 15, Lines 296-300).

      5) The background gene set for Gene Ontology analyses is not specified. If these were done with the whole transcriptome as background, one would expect enrichment of spermatogenesis genes simply because they are expressed in testes. The more appropriate set of genes to use as background in these analyses is the total set of genes that are expressed in testis.

      We thank the reviewer for the professional comments and suggestions. All expressed genes (TPM sum across all samples >= 1) were used as the background for the GO enrichment analyses. The GO enrichment results for the AS genes were unchanged. The GO enrichment results for the SRSF1 peak-containing genes, differential genes, and IP protein-associated genes have been corrected in the figures (Figure 2A, 7E, and 9E).

      6) In general, the model is over-claimed: aside from interactions by IP-MS, little is demonstrated in this study about how SRSF1 affects alternative splicing in spermatogenesis, or how alternative splicing of TIAL1 specifically would result in the phenotype shown. It is not clear why Tial1/Tiar is selected as a candidate mediator of SRSF1 function from among the nine genes that are downregulated in the cKO, are bound by SRSF1, and undergo abnormal splicing. Although TIAL1 levels are reduced in cKO testes by Western blot (Fig. 7J), this could just be due to a depletion of germ cells from whole testis. The reported splicing difference for Tial1 seems very subtle and the ratio of isoforms does not look different in the Western blot image.

      Thank you very much for the helpful comments and suggestions. In our study, Tial1/Tiar is one of 39 stabilizing genes that are bound by SRSF1 and undergo abnormal AS. However, Western blotting showed that expression levels of TIAL1/TIAR isoform X2 were significantly suppressed (Figure 8J). So, the data indicate that SRSF1 is required for TIAL1/TIAR expression and splicing.

      Sun, L., Chen, J., Ye, R., Lv, Z., Chen, X., Xie, X., Li, Y., Wang, C., Lv, P., Yan, L., et al. (2023). SRSF1 is crucial for male meiosis through alternative splicing during homologous pairing and synapsis in mice. Sci Bull 68, 1100-1104. 10.1016/j.scib.2023.04.030.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This paper presents a computational model of the evolution of two different kinds of helping ("work," presumably denoting provisioning, and defense tasks) in a model inspired by cooperatively breeding vertebrates. The helpers in this model are a mix of previous offspring of the breeder and floaters that might have joined the group, and can either transition between the tasks as they age or not. The two types of help have differential costs: "work" reduces "dominance value," (DV), a measure of competitiveness for breeding spots, which otherwise goes up linearly with age, but defense reduces survival probability. Both eventually might preclude the helper from becoming a breeder and reproducing. How much the helpers help, and which tasks (and whether they transition or not), as well as their propensity to disperse, are all evolving quantities. The authors consider three main scenarios: one where relatedness emerges from the model, but there is no benefit to living in groups, one where there is no relatedness, but living in larger groups gives a survival benefit (group augmentation, GA), and one where both effects operate. The main claim is that evolving defensive help or division of labor requires the group augmentation; it doesn't evolve through kin selection alone in the authors' simulations.

      This is an interesting model, and there is much to like about the complexity that is built in. Individual-based simulations like this can be a valuable tool to explore the complex interaction of life history and social traits. Yet, models like this also have to take care of both being very clear on their construction and exploring how some of the ancillary but potentially consequential assumptions affect the results, including robust exploration of the parameter space. I think the current manuscript falls short in these areas, and therefore, I am not yet convinced of the results. Much of this is a matter of clearer and more complete writing: the Materials and Methods section in particular is incomplete or vague in some important junctions. However, there are also some issues with the assumptions that are described clearly.

      Below, I describe my main issues, mostly having to do with model features that are unclear, poorly motivated (as they stand), or potentially unrealistic or underexplored.

      We would like to thank the reviewer for the thoughtful comments that helped us to greatly improve the clarity of our paper.  

      One of the main issues I have is that there is almost no information on what happens to dispersers in the model. Line 369-67 states dispersers might join another group or remain as floaters, but gives no further information on how this is determined. Poring through the notation table also comes up empty as there is no apparent parameter affecting this consequential life history event. At some point, I convinced myself that dispersers remain floaters until they die or become breeders, but several points in the text contradict this directly (e.g., l 107). Clearly this is a hugely important model feature since it determines fitness cost and benefits of dispersal and group size (which also affects relatedness and/or fitness depending on the model). There just isn't enough information to understand this crucial component of the model, and without it, it is hard to make sense of the model output.

      We use the same dispersal gene β to represent the likelihood an individual will either leave or join a group, thereby quantifying both dispersal and immigration using the same parameter. Specifically, individuals with higher β are more likely to remain as floaters (i.e., disperse from their natal group to become a breeder elsewhere), whereas those with lower β are either more likely to remain in their natal group as subordinates (i.e., queue in a group for the breeding position) or join another group if they dispersed.  

      We added in the text “Dispersers may migrate to another group to become subordinates or remain as floaters waiting for breeding opportunities, which is also controlled by the same genetic dispersal propensity as subordinates” to clarify this issue. We also added in Table 1 that β is the “genetic predisposition to disperse versus remain in a group”, and to Figure 1 that “subordinates in the group (natal and immigrants) […]” after we already clarified that “Dispersers/floaters may join a random group to become subordinates.”

      Related to that, it seems to be implied (but never stated explicitly) that floaters do not work, and therefore their DV increases linearly with age (H_work in eq.2 is zero). That means any floaters that manage to stick around long enough would have higher success in competition for breeding spots relative to existing group members. How realistic is this? I think this might be driving the kin selection-only results that defense doesn't evolve without group augmentation (one of the two main ways). Any subordinates (which are mainly zero in the no GA, according to the SI tables; this assumes N=breeder+subordinates, but this isn't explicit anywhere) would be outcompeted by floaters after a short time (since they evolve high H and floaters don't), which in turn increases the benefit of dispersal, explaining why it is so high. Is this parameter regime reasonable? My understanding is that floaters often aren't usually high resource holding potential individuals (either b/c high RHP ones would get selected out of the floater population by establishing territories or b/c floating isn't typically a thriving strategy, given that many resources are tied to territories). In this case, the assumption seems to bias things towards the floaters and against subordinates to inherit territories. This should be explored either with a higher mortality rate for floaters and/or a lower DV increase, or both.

      When it comes to floaters replacing dead breeders, the authors say a bit more, but again, the actual equation for the scramble competition (which only appears as "scramble context" in the notation table) is not given. Is it simply proportional to R_i/\sum_j R_j ? Or is there some other function used? What are the actual numbers of floaters per breeding territory that emerge under different parameter values? These are all very important quantities that have to be described clearly.

      Although it is true that dispersers do not work when they are floaters, they may later help if they immigrate into a group as a subordinate. Consequently, immigrant subordinates have no inherent competitive advantage over natal subordinates (as step 2.2. “Join a group” is followed by step 3. “Help”, which occurs before step 5. “Become a breeder”). Nevertheless, floaters can potentially outcompete subordinates of the same age if they attempt to breed without first queuing as a subordinate (step 5) when subordinates are engaged in work tasks. We believe that this assumption is realistic and constitutes part of the costs associated with work tasks. However, floaters are at a disadvantage for becoming a breeder because: (1) floaters incur higher mortality than individuals within groups (Eq. 3); and (2) floaters may only attempt to become breeders in some breeding cycles (versus subordinate group members, who are automatically candidates for an open breeding position in the group in each cycle). Therefore, due to their higher mortality, floaters are rarely older than individuals within groups, which heavily influences their dominance value and competitiveness. Additionally, any competitive advantage that floaters might have over other subordinate group members is unlikely to drive the kin selection-only results because subordinates would preferably choose defense tasks instead of work tasks so as not to be at a competitive disadvantage compared to floaters.

      Regarding whether floaters aren't usually high resource holding potential (RHP) individuals and, therefore, our assumptions might be unrealistic; empirical work in a number of species has shown that dispersers are not necessarily those of lower RHP or of lower quality. In fact, according to the ecological constraints hypothesis, one might predict that high quality individuals are the ones that disperse because only individuals in good condition (e.g., larger body size, better energy reserves) can afford the costs associated with dispersal (Cote et al., 2022). To allow differences in dispersal propensity depending on RHP, we extended our model in the Supplemental Materials by incorporating a reaction norm of dispersal based on their rank (D = 1 / (1 + exp (β<sub>R</sub> * Rβ<sub>0</sub>)) under the section “Dominance-dependent dispersal propensities” and now referenced in L195. This approach allows individuals to adjust their dispersal strategy to their competitiveness and to avoid kin competition by remaining as a subordinate in another group. Results show that the addition of the reaction norm of dispersal to rank did not qualitatively influence the results described in the main text.  
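      A logistic reaction norm of this kind can be sketched in a few lines. The operator between the slope term and the intercept is not legible in the expression as printed above, so the sketch assumes the exponent is beta_r * rank - beta_0; that sign convention, the function name, and the parameter values are all assumptions for illustration rather than the authors' implementation.

```python
import math


def dispersal_propensity(rank, beta_r, beta_0):
    """Logistic reaction norm mapping dominance rank to a dispersal probability.

    ASSUMPTION: the exponent is beta_r * rank - beta_0. The source text does
    not show the operator clearly, so the sign convention here is a guess.
    """
    return 1.0 / (1.0 + math.exp(beta_r * rank - beta_0))


# Hypothetical genotype: slope 0.5, intercept 1.0.
for rank in range(0, 6):
    print(rank, round(dispersal_propensity(rank, beta_r=0.5, beta_0=1.0), 3))
```

      Under this assumed convention, a positive slope makes the propensity fall as rank rises and a slope of zero makes dispersal independent of rank, which is the property that lets the strategy respond (or not) to competitiveness.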

      We also added “number of floaters” present in the whole population to the summary tables as requested.  

      As a side note, the “scramble context” we mention was an additional implementation in which we made rank independent of age. However, since the main conclusions remained unchanged, we decided to remove it for simplicity from the final manuscript, but we forgot to remove it from Table 1 before submission.  

      I also think the asexual reproduction with small mutations assumption is a fairly strong one that also seems to bias the model outcomes in a particular way. I appreciate that the authors actually measured relatedness within groups (though if most groups under KS have no subordinates, that relatedness becomes a bit moot), and also eliminated it with their ingenious swapping-out-subordinates procedure. The fact remains that unless they eliminate relatedness completely, average relatedness, by design, will be very high. (Again, this is also affected by how the fate of the dispersers is determined, but clearly there isn't a lot of joining happening, just judging from mean group sizes under KS only.) This is, of course, why there is so much helping evolving (even if it's not defensive) unless they completely cut out relatedness.

      As we showed in the Supplementary Tables and the section on relatedness in the SI (“Kin selection and the evolution of division of labor"), high relatedness does not appear to explain our results. In evolutionary biology generally and in game theory specifically (with the exception of models on sexual selection or sex-specific traits), asexual reproduction is often modelled because it reduces unnecessary complexity. To further study the effect of relatedness on kin structures more closely resembling those of vertebrates, however, we created an additional “relatedness structure level”, where we shuffled half of the philopatric offspring using the same method used to remove relatedness completely, effectively reducing within-group relatedness structure by half. As shown in the new Figure S3, the conclusions of the model remain unchanged.

      Finally, the "need for division of labor" section is also unclear, and its construction also would seem to bias things against division of labor evolving. For starters, I don't understand the rationale for the convoluted way the authors create an incentive for division of labor. Why not implement something much simpler, like a law of minimum (i.e., the total effect of helping is whatever the help amount for the lowest value task is) or more intuitively: the fecundity is simply a function of "work" help (draw Poisson number of offspring) and survival of offspring (draw binomial from the fecundity) is a function of the "defense" help. As it is, even though the authors say they require division of labor, in fact, they only make a single type of help marginally less beneficial (basically by half) if it is done more than the other. That's a fairly weak selection for division of labor, and to me it seems hard to justify. I suspect either of the alternative assumptions above would actually impose enough selection to make division of labor evolve even without group augmentation.

      In nature, multiple tasks are often necessary to successfully rear offspring. We simplify this principle in the model by maximizing reproductive output when both tasks are carried out to a similar extent, allowing for some flexibility from the mean. We added to the manuscript “For example, in many cooperatively breeding birds, the primary reasons that individuals fail to produce offspring are (1) starvation, which is mitigated by the feeding of offspring, and (2) nest depredation, which is countered by defensive behavior. Consequently, both types of tasks are necessary to successfully produce offspring, and focusing solely on one while neglecting the other is likely to result in lower reproductive success than if both tasks are performed by individuals within the group.”

      Regarding making fecundity a function of work tasks and offspring survival as a function of defensive tasks, these are actually equivalent in model terms, as it’s the same whether breeders produce three offspring and two die, or if they only produce one. This represents, of course, an oversimplification of the natural context, where breeding unsuccessfully is more costly (in terms of time and energy investment) than not breeding at all.
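      One way to make this equivalence concrete is the standard thinning property of the Poisson distribution, applied to the reviewer's proposed formulation. The symbols below are illustrative assumptions and do not come from the manuscript: $\lambda$ stands for the expected fecundity set by work help, and $p$ for the per-offspring survival probability set by defensive help.

```latex
% Assume N ~ Poisson(lambda) offspring are produced and each survives
% independently with probability p, so S | N ~ Binomial(N, p).
\begin{align*}
P(S = s)
  &= \sum_{n \ge s} \frac{e^{-\lambda}\lambda^{n}}{n!}
     \binom{n}{s} p^{s} (1-p)^{n-s} \\
  &= \frac{e^{-\lambda} (\lambda p)^{s}}{s!}
     \sum_{m \ge 0} \frac{\bigl(\lambda (1-p)\bigr)^{m}}{m!} \\
  &= \frac{e^{-\lambda p} (\lambda p)^{s}}{s!},
\end{align*}
% so S ~ Poisson(lambda * p).
```

      Under these assumptions, the number of surviving offspring depends on the two kinds of help only through the product $\lambda p$, so drawing fecundity and then survival gives the same distribution as drawing once from a Poisson with the combined mean.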

      Overall, this is an interesting model, but the simulation is not adequately described or explored to have confidence in the main conclusions yet. Better exposition and more exploration of alternative assumptions and parameter space are needed.

      We hope that our clarifications and extension of the model satisfy your concerns.  

      Reviewer #2 (Public review):

      Summary:

      This paper formulates an individual-based model to understand the evolution of division of labor in vertebrates. A main conclusion of the paper is that direct fitness benefits are the primary factor causing the evolution of vertebrate division of labor, rather than indirect fitness benefits.

      Strengths:

      The paper formulates an individual-based model that is inspired by vertebrate life history. The model incorporates numerous biologically realistic details, including the possibility to evolve age polyethism where individuals switch from work to defence tasks as they age or vice versa, as well as the possibility of comparing the action of group augmentation alone with that of kin selection alone.

      Weaknesses:

      The model makes assumptions that restrict the possibility that kin selection leads to the evolution of helping. In particular, the model assumes that in the absence of group augmentation, subordinates can only help breeders but cannot help non-breeders or increase the survival of breeders, whereas with group augmentation, subordinates can help both breeders and non-breeders and increase the survival of breeders. This is unrealistic as subordinates in real organisms can help other subordinates and increase the survival of non-breeders, even in the absence of group augmentation, for instance, with targeted helping to dominants or allies. This restriction artificially limits the ability of kin selection alone to lead to the evolution of helping, and potentially to division of labor. Hence, the conclusion that group augmentation is the primary factor driving vertebrate division of labor appears forced by the imposed restrictions on kin selection. The model used is also quite particular, and so the claimed generality across vertebrates is not warranted.

      We would like to thank the reviewer for the in-depth review. We respond to these and other comments below.  

      I describe some suggestions for improving the paper below, more or less in the paper's order.

      First, the introduction goes to great lengths trying to convince the reader that this model is the first in this or another way, particularly in being only for vertebrates, as illustrated in the abstract where it is stated that "we lack a theoretical framework to explore the conditions under which division of labor is likely to evolve" (line 13). However, this is a risky and unnecessary motivation. There are many models of division of labor and some of them are likely to be abstract enough to apply to vertebrates even if they are not tailored to vertebrates, so the claims for being first are not only likely to be wrong but will put many readers in an antagonistic position right from the start, which will make it harder to communicate the results. Instead of claiming to be the first or that there is a lack of theoretical frameworks for vertebrate division of labor, I think it is enough and sufficiently interesting to say that the paper formulates an individual-based model motivated by the life history of vertebrates to understand the evolution of vertebrate division of labor. You could then describe the life history properties that the model incorporates (subordinates can become reproductive, low relatedness, age polyethism, etc.) without saying this has never been done or that it is exclusive to vertebrates; indeed, the paper states that these features do not occur in eusocial insects, which is surprising as some "primitively" eusocial insects show them. So, in short, I think the introduction should be extensively revised to avoid claims of being the first and to make it focused on the question being addressed and how it is addressed. I think this could be done in 2-3 paragraphs without the rather extensive review of the literature in the current introduction.

      We have revised the novelty statements in the Introduction by more clearly emphasizing how our model addresses gaps in the existing literature. More details are provided in the comments below.

      Second, the description of the model and results should be clarified substantially. I will give specific suggestions later, but for now, I will just say that it is unclear what the figures show. First, it is unclear what the axes in Figure 2 show, particularly for the vertical one. According to the text in the figure axis, it presumably refers to T, but T is a function of age t, so it is unclear what is being plotted. The legend explaining the triangle and circle symbols is unintelligible (lines 227-230), so again it is unclear what is being plotted; part of the reason for this unintelligibility is that the procedure that presumably underlies it (section starting on line 493) is poorly explained and not understandable (I detail why below). Second, the axes in Figure 3 are similarly unclear. The text in the vertical axis in panel A suggests this is T, however, T is a function of t and gamma_t, so something else must be being done to plot this. Similarly, in panel B, the horizontal axis is presumably R, but R is a function of t and of the helping genotype, so again some explanation is lacking. In all figures, the symbol of what is being plotted should be included.

      We added the symbols of the variables to the Figure axes to increase clarity. In Figure 3A, we corrected the subindex t in the x-axis; it should be subindex R (reaction norm to dominance rank instead of age). As described in Table 1, all values of T, H and R are phenotypically expressed values. For instance, T values are the phenotypically expressed values from the individuals in the population according to their genetic gamma values and their current dominance rank at a given time point.  

      Third, the conclusions sound stronger than the results are. A main conclusion of the paper is that "kin selection alone is unlikely to select for the evolution of defensive tasks and division of labor in vertebrates" (lines 194-195). This conclusion is drawn from the left column in Figure 2, where only kin selection is at play, and the helping that evolves only involves work rather than defense tasks. This conclusion follows because the model assumes that without group augmentation (i.e., xn=0, the kin selection scenario), subordinates can only help breeders to reproduce but cannot help breeders or other subordinates to survive, so the only form of help that evolves is the least costly, not the most beneficial as there is no difference in the benefits given among forms of helping. This assumption is unrealistic, particularly for vertebrates where subordinates can help other group members survive even in the absence of group augmentation (e.g., with targeted help to certain group members, because of dominance hierarchies where the helping would go to the breeder, or because of alliances where the helping would go to other subordinates). I go into further details below, but in short, the model forces a narrow scope for the kin selection scenario, and then the paper concludes that kin selection alone is unlikely to be of relevance for the evolution of vertebrate division of labor. This conclusion is particular to the model used, and it is misleading to suggest that this is a general feature of such a particular model.

      The scope of this paper was to study division of labor in cooperatively breeding species with fertile workers (i.e., primarily vertebrates), in which help is exclusively directed towards breeders to enhance offspring production (i.e., alloparental care). Our focus is in line with previous work in most other social animals, including eusocial insects and humans, which emphasizes how division of labor maximizes group productivity. Other forms of “general” help are not considered in the paper, and such forms of help are rarely considered in cooperatively breeding vertebrates or in the division of labor literature, as they do not result in task partitioning to enhance productivity.

      Overall, I think the paper should be revised extensively to clarify its aims, model, results, and scope of its conclusions.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      I reserved this section for more minor comments, relating to clarity and a general admonition to give us more detail and exploration of some basic population genetic quantities.

      Another minor point, although depending on whether I assume right or wrong, it could be major: I am not entirely sure that dispersers help in the groups they join as helpers, because of line 399, which states specifically that individuals who do remain in natal territories do. But I assume dispersers help (elsewhere, the authors state helping is not conditional on relatedness to the breeder). Otherwise, this model becomes even weirder for me. Either way, please clarify.

      Apologies if this was not clear. Immigrants that join a group as a subordinate (i.e., dispersers from another group) help and queue for a breeding position, as does any natal subordinate born into the group. We rephrased the sentence to “Subordinate group members, either natal or immigrants to the group, […]”

      More generally, in simulation studies like this, there can be interactions between the strength of selection (which affects overall genetic variation maintained in the population), population size, and mutation rate/size, which can affect, for example, relatedness values. None of these quantities is explored here (and their interactions are not quantified), so it is not possible to evaluate the robustness of any of these results.

      Thank you for your comments about the parameter landscape. It is important to point out that variations in the mutation rate do not qualitatively affect our results, as this is something we explored in previous versions of the model (not shown). Briefly, we find that variations in the mutation rates only alter the time required to reach equilibrium. Increasing the step size of mutation diminishes the strength of selection by adding stochasticity and reducing the genetic correlation between offspring and their parents. Population size could, in theory, affect our results, as small populations are more prone to extinction. Since this was not something we planned to explore in the paper directly, we specifically chose a large population size, or better said, a large number of territories (i.e. 5000) that can potentially host a large population.  

      The authors also never say how it is actually determined. There is the evolved helping variable, and there is also the evolved reaction norm. I assume that the actual amount of help of each type is given by the product of T (equation 1) and H (for defense) and (1-T) and H (for work), but this should be stated explicitly.  

      Help provided is an interaction between H (total effort) and T (proportion of total effort invested in each type of task). To clarify the distinction between these two processes, we have now added “Hence, the gene α regulates the amount of help expressed, while the genes γ determine which specific helping tasks are performed at different time points in the breeding cycle”.  
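
      As an illustration only, the split described above can be sketched in a few lines of Python. The sketch assumes, following the reviewer's reading, that the effort devoted to defense is T × H and the effort devoted to work is (1 − T) × H. The function and argument names are hypothetical and are not taken from the model code.

```python
def partition_help(total_help: float, defense_share: float) -> tuple[float, float]:
    """Split total helping effort H into (defense, work) amounts.

    total_help    -- H, the total effort invested in helping (assumed >= 0)
    defense_share -- T, the proportion of H allocated to defense (assumed in [0, 1])

    Assumption: defense = T * H and work = (1 - T) * H.
    """
    if total_help < 0:
        raise ValueError("total_help (H) must be non-negative")
    if not 0.0 <= defense_share <= 1.0:
        raise ValueError("defense_share (T) must lie in [0, 1]")
    defense = total_help * defense_share
    work = total_help * (1.0 - defense_share)
    return defense, work
```

      Under this assumption, an individual with H = 0 provides no help whatever its value of T, which is consistent with subordinates not necessarily being helpers.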

      It is also weird that after introducing the T variable as a function of age, Figure 3 actually depicts it as a function of dominance value.

      Thank you for pointing out an error in Eq. 1. This inequality was indeed written incorrectly in the paper (but is correct in the model code); it is dominance rank instead of age (see code in Individual.cpp lines 99-119). We corrected this mistake throughout the manuscript.

      What is "scramble context"?

      “Scramble context” was an additional implementation that we decided to remove from the final manuscript but forgot to remove from Table 1 before submission. We have now removed it from the table.

      Reviewer #2 (Recommendations for the authors):

      Some specific comments:

      (1) L 31: "All theoretical..." These absolute statements are risky and unnecessary.

      Rephrased to “To date, most theoretical and empirical work…”

      (2) L 46: I believe Tom Wenseleers has published on the evolution of division of labor with reproductive workers and high within-colony conflict.

      Tom Wenseleers has indeed produced some models on the evolution of cooperation in social insects where some workers may reproduce. However, these models focus on the relevance of relatedness and policing selecting for a reduction in within-group conflict and the evolution of reproductive division of labor. Our model focuses instead on division of labor among workers (helpers). We have rephrased this section to “task specialization is linked to sterility and where conflict of interest is generally low” to account for social insect species in which variation in relatedness between group members and higher levels of reproductive conflict may arise. We also cited one of his papers.

      (3) L 57: Again, unnecessary categorical statements.

      Rephrased to “Although a great deal of recent empirical work highlights the importance of direct benefits in the evolution of cooperative breeding behavior in vertebrates [21–24], we lack understanding on the joint influence of direct and indirect fitness benefits in the evolution of division of labor.”

      (4) L 67: This is said to be a key distinction, but in the paper, such a key role is not clearly shown. This and other tangential points are unnecessary to keep the introduction to the point.

      The different fitness costs of different tasks are the basis of our model of division of labor. Therefore, this is a key distinction and the basis from which to describe different tasks in the model. We have left this sentence unchanged.

      (5) L 61-73: "In vertebrates, however, helpers may obtain fitness benefits directly via reproduction..." Some social insects may do so as well. It seems unnecessary and incorrect to say that vertebrate sociality is fundamentally different from invertebrate one. I think it is sufficiently interesting to say this work aims to understand vertebrate division of labor, by explicitly modeling aspects of its life history, without saying this can't happen in invertebrates or that no other model has ever done anything like it.

      Our point is not that workers in some social insects cannot obtain direct fitness benefits, but that previous models focusing on the colony's reproductive outcome are only a good approximation for eusocial insects with sterile workers. However, to make this clearer we have added “In vertebrates and social insect with fertile workers, however, helpers may obtain fitness benefits directly via […]”.

      (6) L 74-86: By this point, the introduction reads like a series of disconnected comments without a clear point.

      In L60 we added: “Understanding how direct and indirect benefits interact is particularly important in systems where individuals may differentially bear the fitness costs of cooperation”. By adding this sentence, we emphasize our focus on the largely unexplored direct fitness benefits and costs, as well as their interaction with indirect fitness. We then proceed to explain why it is crucial to consider that tasks have varying direct fitness costs and how the fitness benefits derived from cooperation change with age and resource-holding potential. These elements are essential for studying the division of labour in species with totipotent workers.

      (7) L 87: This sentence gives a clear aim. It would be clearer if the introduction focused on this aim.

      With the new sentence added in L60 (see previous comment), we bring the focus to the main question that we are trying to address in this paper earlier in the Introduction.  

      (8) L 88: "stochastic model" should be changed to "individual-based model".

      Done.

      (9) L 104: "limited number" is unclear. Say a fixed finite number, or something specific.

      Done.

      (10) L 105: "unspecified number" is unclear. Say the number of subordinates emerges from the population dynamics.

      Changed to “variable number of subordinate helpers, the number of which is shaped by population dynamics, with all group members capable of reproducing during their lifetime”.

      (11) L 112: "Dispersers" is used, but in the previous lines 107-109, the three categories introduced used different terms. Those three terms introduced should be used consistently throughout the paper, without using two or more terms for one thing.

      We use the term “disperser” to describe individuals that disperse from their natal group.

      Dispersers can assume one of three roles: (1) they can join another group as "subordinates"; (2) they can join another group as "breeders" if they successfully outcompete others; or (3) they can remain as "floaters" if they fail to join a group. "Floaters" are individuals who persist in a transient state without access to a breeding territory, waiting for opportunities to join a group in an established territory. We rephrased the sentence to “Dispersers cannot reproduce without acquiring a territory (denoted here as floaters)”. This was also clarified in other instances where the term “dispersers” was used (e.g. L407). In other instances where this might not have been clear, we replaced “dispersers” with “floaters”.

      (12) L 112: "(floaters)" Unclear parenthesis.

      See previous comment.  

      (13) L 115: There should be a reference to Methods around here.

      Added a reference to Figure 1.

      (14) L 117: To be clearer, say instead that dominance value is a linearly increasing function of age as a proxy of RHP and a linearly decreasing function of help provided due to the costs of working tasks. And refer to equation 2.

      Rephrased to “We use the term dominance value to designate the competitiveness of an individual compared to other candidates in becoming a breeder, regardless of group membership, that increases as a function of age, serving as a proxy for resource holding potential (RHP), and decreases as a function of help provided, reflecting costs to body condition from performing working tasks (Eq. 2).” We did not include “linearly” to keep it simpler, since it is clear from Eq. 2, which is now referenced here.  

      (15) L 119: "Subordinate helpers". As all subordinates are helpers, the helper qualifier is confusing.

      Subordinates are not necessarily helpers, as they can evolve help values of 0, which is why we make it explicit here.

      (16) L 119: "choose". This terminology may be misleading. The way things are implemented in the model is that individuals are assigned a task depending on their genetic traits gamma. Perhaps it would be better to use a less intentional term, like perform one of two tasks.

      We changed “choose between two” to “engage in one of two”, which has fewer connotations of intentionality.

      (17) L 124: "Subordinates can [...] exhibit task specialization that [...] varies with their dominance value". It should be that it varies with age.

      Apologies. The equation was wrong; it does vary with dominance value. We corrected it accordingly.

      (18) L 133: "maximised" This is apparently important for the modelling procedure, but it is completely unclear what it means. Equation 4 comes out of nowhere, and it is said that such an equation is the maximum amount of help that can affect fecundity. Why? What does this mean? If there is something that is maximised, this should be proven. This value is then used for something (line 507), but it is unclear why or what it is used for (it says "we use the value of Hmax instead" without saying what for, no justification for the listed inequalities are given, and the claimed maximisation of an unspecified variable at those H values is not proven). Moreover, the notation in this section is also unclear: what are the sums over? Also, Hdefence and Hwork should vary over the index that is summed over, but the notation suggests that those quantities don't vary.

      We changed “maximized” to “greatest”, and we added a clarification of the rationale behind maximizing the impact of help on the breeder’s productivity: “For example, in many cooperatively breeding birds, the primary reasons that breeders fail to produce offspring are (1) starvation, which is mitigated by the feeding of offspring, here considered as a work task, and (2) nest depredation, which is countered by defensive behavior. Consequently, both types of tasks are often necessary for successful reproduction, and focusing solely on one while neglecting the other is likely to result in lower reproductive success than if both tasks are performed by helpers within the group.”

      We now also clarify that the sums are for help given within a group (L 507), and added indexes to the equations.

      (19) L 152: "habitat saturation" How is this implemented? How is density dependence implemented? Or can the population size keep increasing indefinitely? It would be good to plot the population size over time, the group size over time, and the variance in group size over time. This could substantiate later statements about enhancing group productivity and could all be shown in the SI.

      Habitat saturation emerges from the population dynamics: territories are limited while the number of individuals fluctuates, so highly productive environments become saturated. Because the number of group members is not restricted in our model, the population could in theory increase indefinitely. However, this is not observed in the results presented here, as we selected parameter landscapes that stabilize population numbers. We confined our parameters to those where the population neither increased indefinitely nor collapsed because, for simplicity, we did not incorporate density-dependent mortality. Consequently, the group sizes reported in the SI, which already include the standard deviation, closely represent group size at any other time during equilibrium.

      L 336: we changed “environments with habitat saturation” to “environments that lead to habitat saturation”, to increase clarity.

      (20) L 152: "lifecycle". Rather than the lifecycle, the figure describes the cycle of events in a single time step. The lifecycle (birth to death) goes over multiple time steps (as individuals live over multiple steps). So this figure shouldn't be called a life cycle.

      We changed “lifecycle” to “breeding cycle”.

      (21) L 156: "generation". This is not a generation but a time step.

      We changed “generation” to “breeding cycle”.

      (22) L 157: "previous life cycle" would mean that the productivity of a breeder depends on the number of helpers that its parents had, which is not what is meant.

      We changed “lifecycle” to “breeding cycle”.

      (23) L 158: "Maximum productivity is achieved when different helping tasks are performed to a similar extent." Again, unclear why that is the case.

      We added a clarification on this, see response to comment 18.  

      (24) L 160: "Dispersers/floaters". Use just one term for a single thing.

      See response to comment 11.   

      (25) L 162: "dispersal costs". I don't recall these being described in Methods.

      Individuals that disperse do not enjoy the protection of living in a territory and within a group of other individuals, so they have a higher mortality risk, described in Eq. 3.3 (negative values in the exponential part of the equation increase survival). The cost of dispersal is the same as that borne by individuals that remain floaters at a given time step.

      (26) L 164: "generation" -> time step.

      We changed this to “breeding cycle”.  

      (27) L 170: "Our results show that division of labor initially emerges because of direct fitness benefits..." This is a general statement, but the results are only particular to the model. So this statement and others in the manuscript should be particular to the model. Also, Figure 2 doesn't say anything about what evolves "initially" as it only plots evolutionary equilibria.

      We rephrased this statement to “Our results suggest that voluntary division of labor involving tasks with different fitness costs is more likely to emerge initially because of direct fitness benefits”, to more accurately represent the conditions under which we modeled the division of labor.  

      Our reference to “initially” is regarding group formation (family groups versus aggregations of unrelated individuals or a mix). This is shown in the comparison between the different graphs at equilibrium. The initial state of the simulation is that all individuals disperse and do not cooperate.  

      (28) L 171: "but a combination of direct and indirect fitness benefits leads to higher rates and more stable forms of division of labor". What do you mean by "higher rates and more stable forms of division of labor"? Say how division of labor is shown in the figure (with intermediate T?).

      Yes, intermediate values of T show division of labor if γR ≠ 0. This is described under the section “The role of dominance in task specialization”. We added “with intermediate values suggesting a division of labor” to the Figure 2 legend.  

      (29) L173-175: "as depicted in Figure 2, intermediate values of task specialization indicate in all cases age/dominance-mediated task specialization (γt ≠ 0; Table 1) and never a lack of specialization (γt = 0; Table 1)". This sentence is unclear and imprecise. Does this sentence want to say that in Figure 2, all plots with intermediate values of T involve gamma t different from zero? If so, just say that.

      Rephrased to: “In Figure 2, all plots depicting intermediate values of T exhibit non-zero γR values and, hence, division of labor”.

      (30) L179-180: "forms of help that impact survival never evolve under any environmental condition when only kin selection occurs". This is misleading because under the KS scenario, help cannot positively impact survival in this model, so they never evolve.

      Help cannot affect survival but could potentially affect group persistence. If helpers increase breeder productivity and offspring remain philopatric and queue for the breeding position, then they will receive help from related individuals.   

      (31) L 210: "initially". What do you mean by that?

      Help only evolves in our model in family groups, which may then open the door for the evolution of help in mixed-kin groups. Therefore, we use “initially” to refer to the ancestral group structure that likely led to cooperation under benign environmental conditions. We rephrased this section to “in more benign (and often highly productive) environments that lead to habitat saturation, help likely evolved initially in family groups, and defensive tasks are favored because competition for the breeding position is lower under kin selection.”

      (32) L 212: "kin selection is achieved". What does that mean?

      Rephrased to “kin selection acts not only by selecting subordinates in their natal group to increase the productivity of a related breeder […]”

      (33) L 216: "division of labor seems to be more likely to evolve in increasingly harsh environments". Say in parentheses where this is shown.

      Added.  

      (34) L 218: "help evolves in benign environments". I don't see where this is shown. Figure 2 doesn't show that H is higher with lower m (e.g., in KS+GA column).

      Help does not evolve in benign environments under only direct fitness benefits derived from group augmentation (shown in Figure 2).  

      (35) L 225: "y-axis" should be "vertical axis", as y has another meaning in the model.

      Done.

      (36) L 226: "likelihood". Here and throughout, "likelihood" should be changed to probability. Likelihood means something else.

      Thank you for the advice; we have corrected this throughout the manuscript.

      (37) L 236: "the slope of the reaction norm for the dominance value in task specialization".

      Unclear. Clearer to say: the rate at which individuals shift from defense to work as they age.

      The important part is not so much the rate but the direction, that is, from work task to defense (or vice versa) as their rank increases. Changed to “the direction and rate of change in task specialization with dominance”.

      (38) L 257: "(task = 0; cost to dominance value)," This seems out of place.

      This aims to clarify that work tasks have a cost to dominance, while defense tasks have a cost to survival. This is particularly relevant in this model since different helping tasks are defined by their fitness costs.

      (39) L 258: "increase"-> "increase with age".

      Added “with dominance”.

      (40) L 262: "division of labor equilibria" What is that?

      Changed to “at equilibrium when division of labor evolves”

      (41) L 268: "Our findings suggest that direct benefits of group living play a driving role in the evolution of division of labor via task specialization in species with totipotent workers". This is a very general statement, but the results are much more circumscribed. First, the model is quite specific by assuming that, in the absence of group augmentation (xn=0), indirect fitness benefits can only be given to breeders (Equation 5) but not to other subordinates (Equations 2, 3.1). This is unrealistic, particularly for vertebrates, and reduces the possibility that indirect fitness benefits play a role.  

      As previously discussed, the scope of this paper was to study division of labor in cooperatively breeding species with fertile workers in which help is exclusively directed towards breeders to enhance offspring production through alloparental care. Other forms of “general” help do not result in task partitioning to enhance productivity.

      Second, the difference in costs of work and defense are what drive the evolution of "division of labor" (understood as intermediate T in case this is what the authors mean) in the KS scenario, but the functional forms of those two costs are quite specific and not of the same form, so these functions may bias the results found. Specifically, R is an unbounded linear function of work and the effect of this function becomes weaker as the individual ages due to the weakening force of selection with age (Equation 2) whereas Sh is a particular bounded nonlinear function of defense (Equation 3.1). These differences may tend to make the effect of Sh stronger due to the particular functions chosen.  

      The difference in costs is inherent to the nature of the different tasks (work versus defense): while survival is naturally bounded, with death as the lower bound, dominance costs are potentially unbounded, as they are influenced by dynamic social contexts and potential competitors. Therefore, we believe that the model’s cost structure is not too different from that in nature.  

      Third, no parameter sweep is given to see to what extent these results hold across the many parameters involved. So, in summary, the discussion should at least reflect that the results are of a restricted nature rather than giving the impression that they are of the suggested level of generality.

      During the exploratory phase of the model development, various parameters and values were assessed. However, the manuscript only details the ranges of values and parameters where changes in the behaviors of interest were observed, enhancing clarity and conciseness. For instance, variation in yh (the cost of help on dominance when performing “work tasks”) led to behavioral changes similar to those caused by changes in xh (the cost of help on survival when performing “defensive tasks”), as both are proportional to each other. Specifically, since an increase in defense costs raises the proportion of work relative to defense tasks, while an increase in the costs of work tasks has the opposite effect, only results for the variation of xh were included in the manuscript to avoid redundancy. Added to Table 1: “To maintain conciseness, further exploration of the parameter landscape was not included in the manuscript”.

      (42) L 270: "in eusocial insects often characterized by high relatedness and reproductive inhibition, sterile workers acquire fitness benefits only indirectly". This is misleading. Sterile workers of any taxa, be it insects or vertebrates, can only acquire fitness benefits indirectly as they are sterile, but eusocial insects involve not only sterile workers.

      Rephrased to “In contrast, in eusocial species characterized by high relatedness and permanent worker sterility, such as most eusocial insects, workers acquire fitness benefits only indirectly”. In any case, permanent sterility only occurs in eusocial invertebrates; in vertebrates with reproductive inhibition, sterility is only temporary and context-dependent. Therefore, in vertebrates, sterile workers may potentially obtain direct fitness benefits if the social context changes, as is the case in naked mole-rats.

      (43) L 273: "Group members in eusocial species are therefore predicted to maximize colony fitness due to the associated lower within-group conflict". Again, this is incorrect. Primitively eusocial insects have high conflict.

      We added “Group members in such eusocial species” to clarify that we are not referring here to primitively eusocial species but to those with permanently sterile workers.

      (44) L 277: "when the benefits of cooperation are evenly distributed among group members". In this model, the benefits of cooperation are not evenly distributed among group members: breeders reproduce, but subordinates don't.

      Subordinates may reproduce if they become breeders later in life. However, subordinates also benefit from cooperation as subordinates directly (greater survival in larger groups), and indirectly if they are related to the breeder. Here we refer to the first one, and we expand on that in the following sentence.  

      (45) L 280: "survival fitness benefits derived from living in larger groups seem to be key for the evolution of cooperative behavior in vertebrates [22, 63], and may also translate into low within-group conflict. This suggests that selection for division of labor in vertebrates is stronger in smaller groups". I don't see how the previous sentence suggests this. The paper does not present results to support this statement (i.e., no selection gradients in smaller vs larger groups are shown).

      The benefits of living in a larger group entail diminishing returns, so individuals living in smaller groups benefit more from an increase in productivity and group size than those in larger groups.

      (46) L 284: "Our model demonstrates that vertebrates evolve a more stable division of labor". Where is that shown? How is "more stable" measured?

      Rephrased to “vertebrates are more likely to evolve division of labor”. This is shown in Figure 2, which illustrates that division of labor evolves under a wider range of environmental conditions and to a higher degree (intermediate values of T).

      (47) L 287: "direct fitness benefits in the form of group augmentation select more strongly for defensive tasks". Where is that shown? Establishing this would entail comparing selection gradients with direct fitness benefits of group augmentation and without them.

      In Figure 2, when we compare the GA column to the KS+GA column, we see that at equilibrium, more helpers choose defense tasks, especially when they are free to choose their preferred task (circles).

      (48) L 288: "kin selection alone seems to select only for work tasks." Again, this may be an artifact of the model assuming that helpers cannot increase non-breeders' fitness components except via group augmentation, and that defense tasks are inherently more costly than work tasks.

      As stated previously, we are studying task specialization in cooperative breeders where help is in the form of alloparental care (from allofeeding and egg care to defense from predators). We also assume that the costs are different, but whether one or the other is more costly depends on the relative context (e.g., a task can be more costly if it affects competitiveness in a very competitive environment). It is important to note that we name these tasks “work” and “defense” for practical reasons, but the focus of the paper is on tasks with different fitness costs that, because of their characteristics, may not fit so well under this terminology. While we acknowledge that most tasks have both kinds of fitness costs to a degree, here we focus on the main fitness costs of each kind of task (L430-436).

      (49) L 290: "are comparatively large". This sounds as if the tasks are large, which is presumably not what is meant.

      Rephrased to “costs to dominance value and to the probability of attaining a breeding position are comparatively larger than survival costs.”

      (50) L 298: "helpers are predicted to increase defensive tasks with age or rank, whereas in harsh environments, work tasks are predicted to increase with age or rank." Add parentheses referring to where this is shown.

      This is shown in Figure 3, but since this is described in the discussion, we did not add a reference to the figure. If the editor would like us to refer to figures here, we can (see also comments below relating to the same issue).

      (51) L 308: "the role of age and environmental harshness on the evolution of division of labor". What is the prediction? Simply, the role of age is an assumption, not a prediction.

      Rephrased to “the role of environmental harshness on the evolution of division of labor via age-dependent task specialization”.

      (52) L 315: "individuals shifting from work tasks such as foraging for food, digging, and maintaining the burrow system, to defensive tasks such as guarding and patrolling as individuals grow older and larger". Say in parentheses where this is predicted.

      This prediction comes from Figure 3; we do not reference it here since we are in the Discussion section.

      (53) L 320: "Under these conditions, our model predicts the highest levels of task partitioning and division of labor." Where is this predicted? Add parentheses referring to where this is shown. As it is, it is not possible to check the validity of the statement.

      This prediction comes from Figure 2, column KS+GA; we do not reference it here since we are in the Discussion section. The results with references to the figures are found under the Results section. In the discussion, we reiterate the results already described and add some examples from real data that seem to confirm our predictions.

      (54) L 322: "In line with our model predictions, larger and older helpers of this species invest relatively more in territory maintenance, whereas younger/smaller helpers defend the breeding shelter of the dominant pair to a greater extent against experimentally exposed egg predators". These predictions are neat, but are now very difficult to understand from the figures. Maybe at the bottom of 3A, you could add a diagram work->defense for negative gamma_t and defense>work for positive gamma_t (or whatever order it is).

      Done.

      (55) L 325: "Territory maintenance has been shown to greatly affect routine metabolic rates and, hence, growth rates [80], which directly translates into a decrease in the likelihood of becoming dominant and attaining breeding status, as predicted by our model." This seems to be an assumption, not a prediction.

      That is true. We removed: “as predicted by our model”.  

      (56) L 352: "controlled". This means something else.

      Changed to “addressed”.

      (57) L 356: "summary, our study represents the first theoretical model aimed at elucidating the potential mechanisms underlying division of labor between temporal non-reproductives via task specialization in taxa beyond eusocial organisms". Again, claiming to be the first is risky and unnecessary.

      Rephrased to “our study helps to elucidate”.

      (58) L 358: "Harsh environments, where individuals can obtain direct fitness benefits from group living, favor division of labor, thereby enhancing group productivity and, consequently, group size." I'm not sure about this conclusion as harsh environments (large m in Figure 2) also involve the evolution of no division of labor (from the triangles and circles that are zero in the right bottom panel) and perhaps more so than with less harsh environments (intermediate m). Incidentally, in the bottom right panel of Figure 2, do the two separate clusters of triangles and circles mean that there is some sort of evolutionary branching?

      Yes, there are two different equilibria for the same set of conditions. Although it is true that for m=0.3 less division of labor evolves when kin selection and group augmentation act together, this is not the case when only group augmentation takes place. In addition, we qualify m=0.2 as harsh, as opposed to benign (m=0.1), in which we observe the rise of habitat saturation. m=0.3 is then an extremely harsh environment, in which several parameter landscapes cause population collapse (see figures in the Supplemental Material).

      (59) L 360: "Variation in the relative fitness costs of different helping tasks with age favors temporal polyethism". I don't see that this has been shown. Temporal polyethism evolves here whenever gamma_t evolves non-zero values. Figure 3A shows that non-zero gamma_t evolves with harsher environments, but I don't see what the "variation in relative fitness costs of different helping tasks" refers to.

      The evolved reaction norms of the model are towards different fitness costs depending on the task performed, since this is how we define the different types of tasks in the model.  

      (60) L 382: "undefined". Say variable. Undefined is something else.

      Undefined is more accurate, since we did not define how many subordinates there were per group, while “variable” could have been defined within a range, which was not the case in this model.  

      (61) L 390: "each genetic locus". Say earlier that each genetic trait is controlled by a single locus.

      Added.  

      (62) L 395: "complete" and "consistent" -> "certain".

      We changed one to “certain” and another to “absolute” to avoid using the same adjective twice in a sentence.  

      (63) L 396: What determines whether dispersers become subordinates or floaters? A trait? Or a fixed probability?

      We added “which is also controlled by the same genetic dispersal predisposition as for subordinates”.

      (64) L 412-413: "cycle". This should be a breeding step.

      Changed to “season” instead.

      (65) L 418: Say negatively impacts (it could also be positively impacts, which I guess is not what you mean).

      Done.

      (66) L 425: "a sample of floaters". Chosen how?

      Added “randomly drawn”.

      (67) L 426-428. But the equation in Table 1 indicates that all floaters compete for breeding spots, not a sample of floaters. This is not clear.

      The number of floaters sampled to try to breed at a given group is N<sub>f,b</sub> = f × N<sub>f</sub> / N<sub>b</sub> (Table 1).

      Therefore, N<sub>f,b</sub> is the sample size of floaters for a given open breeding position, and f is how many groups on average a floater attempts to access in each time step.  
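
      As an illustration only, the sampling rule above can be sketched as follows. The function names, the rounding to a whole number of individuals, and the reading of N<sub>b</sub> as the number of breeding positions are assumptions made for this sketch and are not taken from the model code.

```python
import random


def floaters_per_breeding_position(f, n_floaters, n_breeders):
    """N_fb = f * N_f / N_b, rounded to whole individuals (rounding is an assumption)."""
    if n_breeders <= 0:
        raise ValueError("n_breeders must be positive")
    return round(f * n_floaters / n_breeders)


def sample_floaters(floaters, f, n_breeders, rng):
    """Randomly draw the floaters that compete for one open breeding position."""
    k = floaters_per_breeding_position(f, len(floaters), n_breeders)
    # Never ask for more individuals than exist in the floater pool.
    return rng.sample(floaters, min(k, len(floaters)))


# Example: 150 floaters, 100 breeding positions, each floater tries f = 2 groups.
competitors = sample_floaters(list(range(150)), 2, 100, random.Random(1))
```

      With these example values, three randomly drawn floaters compete for each open position.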

      (68) L 432. In the figure, the breeding cycle is called a step, but here it is called a cycle. There should be a single term used throughout. Breeding is not really a cycle here (it doesn't involve multiple steps that are repeated cyclically), so it seems more appropriate to call this breeding steps or breeding seasons.

      Taking into account previous comments, we changed the terms “generation” and “life cycle” to “breeding cycle”. We added “or seasons”.

      (69) L 439: "generations". What are generations here, as generations are overlapping? You probably mean time steps or something else.

      Changed to “breeding cycles”.

      (70) L 439: "equilibrium was reached". Presumably, equilibrium is reached only asymptotically, so some cutoff is implemented in practice. So maybe say explicitly what cutoff was implemented.

      As mentioned, we ran the model for 200,000 time steps; if the phenotypic values had not reached equilibrium, we ran the model for longer, up to a maximum of 400,000 time steps, by which point all simulations had reached equilibrium. In some cases, genetic values did not reach equilibrium in ranges where they had no impact on phenotypic values, so these were disregarded when assessing whether equilibrium was reached.

      (71) L 452: "Even though individuals are likely to change the total amount of help given throughout their lives". Do you mean in real organisms or in the model? Say which. If it is in the model, it is not clear how.

      We added “in nature” to clarify that this was not the case in the model.  

      (72) L 455: "For more details on how individuals may adapt their level of help with age and social and environmental conditions, see [63]." Do you mean real individuals or in the model? Again, if it is in the model, it is unclear how this is possible and should be explained in this paper at least briefly rather than citing another one.

      We rephrased it to “How individuals in the model may adapt their level of help with age and social and environmental conditions has been described elsewhere.” We do not go into detail here because it is not within the scope of the paper, and those results have been described elsewhere.  

      (73) L 475: "helpers". Make terminology consistent throughout.

      All helpers are subordinates, but not all subordinates are helpers, as they may evolve no help. Since here we are describing those subordinates that do help, we use that terminology. We added “subordinate helpers” to clarify this further.  

      (74) L 476: "proportional". The dependence in Equation 1 is not "proportional to". Say something like "a survival probability (not rate) that decreases with the amount of help provided".

      Done.

      (75) L 482: "environmental"-> baseline, as defined first.

      Done.

      (76) L 486: "benefits". Can you briefly say in parentheses what those benefits are in real organisms? As in line 475, where you reminded the reader of survival costs due to predator defense.

      Added “such as those offered by safety in numbers or increased resource defense potential”.

      (77) L 494. "we first outline a basic model in which individuals". It is not clear what this sentence says, and the remainder of this section does not clarify it.

      We made two models for comparison, one where individuals can choose freely which task they prefer to perform, and another in which there is an increase in productivity when both kinds of tasks are performed to a similar extent at group level. In the latter model, individuals may choose an unpreferred task at certain times during their lives to increase the effect of the help provided on the breeder’s (and group’s) productivity.

      We rephrased this section to “we first outline a basic model where individuals evolve their preferred helping task. Then we compare this to another model in which the breeder’s reproductive outcome is maximized when the group’s helping effort in each kind of tasks is performed to a roughly equal degree.”

      (78) L 496: "by performing both tasks". Sounds as if the breeder performs both tasks, not helpers.

      We changed to “when the group’s helping effort in each kind of tasks”.

      (79) L 497: "the maximum amount of cumulative help of each type (sigma Hmax) that can affect fecundity is given by Eq. 4:" This statement is imprecise. Presumably, what is meant is that this level of help maximises breeder productivity, as stated earlier in the paper. However, there is no proof that this level of help maximises breeder productivity, so this expression seems unjustified and it is unclear how it is used.

      This is a description of the model setup. As described later in the same section, the cumulative help of each type that can influence the breeder’s fecundity is capped at Hmax. Therefore, Hmax does represent the maximum amount of cumulative help of each type that can affect the breeder’s fecundity.

      (80) L 500: "reproduced" -> "reproduce".

      Done.  

      (81) L 503. Say here what K is so that the reader knows what equation 5 is showing.

      Added “K” to the “The quantity of offspring produced (K)”.

      (82) L 503: "diminishing returns" -> "diminishing returns as help increases".

      Done.  

      (83) L 507: Why these inequalities?

      These inequalities explain the use of Hmax (response to comment 79). We rephrased it to “the cumulative defense effort is larger than or the cumulative work effort is larger than ”.

      (84) L 526: "removing the influence of relatedness from the model". It would be helpful to plot relatedness in this and the other scenario to check that it is indeed low here and high in the other.

      The actual values of relatedness are provided in the Supplemental Material Table S1. We added this reference to Figure 2.  

      (85) L 528: "It is possible that direct and indirect fitness benefits could have an additive effect on the evolution of alloparental care". This is technically incorrect. It is also unclear what the point of this sentence is.

      We have removed this sentence.  

      (86) Table 1: Say what are the allowed values for these genotypic traits (can they take negative values, be greater than one, are they continuous or discrete?): e.g., alpha \in [0,1] or alpha \in (-infinity, infinity). For phenotypic traits, it would be helpful if the third column lists the equation where the trait is defined. As the variables in the first column are scalars, they should not be bold face. Survival "rate" should be survival "probability" throughout.

      All genetic traits can take any real number (-infinity, infinity), but the phenotypic values are either constrained by the equation like for logistic formulas, or manually constrained like for dispersal propensity or help (only positive numbers allowed). We added “Each genetic trait is controlled by a single locus, and may take any real number” (L403), and added the boundaries for help and dominance value in Table 1. We decided against including the equations in the table due to space constraints. We removed the bold face as suggested. We changed all instances of “survival rate” to “survival probability”.

      (87) Figures S1, S2: I don't recall seeing references to these figures in the main text, but there should be, as well as for Tables S1-S3.

      Table S1 is now referenced in Figure 2. The other figures are now referenced in the main text when we reference the different sections in the Supplemental Materials (L190 and L198). Other Tables are referenced in their respective Figures in the SI.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank all reviewers for their thorough and thoughtful comments. We have carefully addressed each point raised, conducting new experiments and analyses to strengthen the manuscript. Below is a summary:

      · Synchronous ensembles in new experiments: New experiments demonstrated synchronous ensembles during immobility in a novel environment (Figure 3-figure supplement 2) and revealed a significant reduction in such synchrony following familiarization training (Figure 4D).

      · Ripple-associated activity: We detected a much larger number of ripple events to confirm (a) the suppression of CA1PC spiking during ripples (Figure 4Ai) and (b) that synchronous ensembles mostly occur outside ripples (Figure 3-figure supplement 3). Additionally, spiking suppression was accompanied by decreased subthreshold membrane potentials (Figure 4Bi, Ci). Ripple-associated spiking and membrane potential dynamics shifted toward higher firing rates and more depolarization after familiarization training (Figure 4).

      · Public data analysis: Analysis of publicly available data identified theta-associated synchronous ensembles, demonstrating the generalizability of our findings across different experimental conditions (Supplementary Figure 5).

      · Neuron morphology and algorithm validation: Images of recorded neurons after experiments confirmed their intact morphology. We also provided details on validating spike detection algorithms (Methods and Supplementary Figure 1).

      · Cell soma locations: New data and analyses illustrate the distribution of cells labeled at different embryonic days along the radial axis of the pyramidal layer (Supplementary Figure 1).

      · Analyses testing the robustness of synchronous ensembles: Additional analyses examined the impact of complex bursts and theta-phase locking, confirming the robustness of synchronous ensemble detection (Supplementary Figures 3 and 4).

      · Additional analyses and figures: We conducted further analyses and created new figures to address all remaining concerns (Response to Reviewer Figures 1-6).

      We believe these revisions have significantly enhanced the paper, and we sincerely thank all reviewers for their invaluable feedback.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for recognizing the strength of our study.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the apparent deviation of our observation from conventional understanding, which we address in the following sections.

      Reviewer #1 (Recommendations For The Authors):

      (1) I am not particularly inclined to strongly adhere to conventional insights, but the findings obtained through this imaging method seem significantly different from those known from conventional electrophysiological recordings. For instance, there are noticeable differences in several basic firing characteristics. First, the average firing rates of 2.3-4.3 Hz (Line 97) appear higher than the distribution of firing frequencies reported in many electrophysiological recordings of pyramidal cells (e.g., Mizuseki et al., Cell Rep, 2013).

      We understand that some of our findings differ from conventional insights. However, it is important to emphasize that many of our observations align closely with prior electrophysiological recordings. For instance, individual neurons in our study exhibit expected modulation by locomotion, spatial locations, novelty, and theta oscillations, all of which are hallmarks of normal hippocampal physiology.

      Regarding the firing rates, it is important to highlight the heterogeneity of the firing rates, which range from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies(1). While our values (2.3-4.3 Hz) are higher than those reported by Mizuseki et al. (2013)(1) in rats, our recordings were obtained from mice and align with studies using mice, including firing rates of 2.1 Hz reported by McHugh et al. (1996) and 2.4-2.6 Hz by Buzsaki et al. (2003)(2,3).

      In addition, our recordings were performed in a novel environment, which is known to enhance the firing rates of the hippocampal neurons(4). Consistent with this, our new recordings in a familiar environment demonstrate significantly lower firing rates (see below).

      Results (line 279)

      “Mean firing rates were significantly reduced in the familiar group compared to the novel group (Familiar group: 1.1 to 5.2 Hz (25<sup>th</sup>-75<sup>th</sup> percentiles), median=2.3 Hz, n=66 cells, 6 sessions, 4 mice; Novel group: 1.7 to 6.0 Hz (25<sup>th</sup>-75<sup>th</sup> percentiles), median=4.2 Hz, n=111 cells, 6 sessions, 6 mice, p=0.0083, Wilcoxon signed-rank test).”

      Second, while this finding suggests that spike synchrony is entirely unrelated to ripple-triggered events, it is indeed difficult to comprehend (researchers who have analyzed electrophysiological data, at the very least, should have experienced some degree of correlation between ripples and spikes).

      We thank the reviewer for raising this important point. We, too, found it surprising that population synchrony appears largely unrelated to ripples. To ensure the robustness of this observation, we conducted new experiments under conditions optimized for ripple detection to (a) confirm that the lack of positive correlation is also observed under conditions where we can detect more ripples and (b) demonstrate that our imaging methods can detect a higher correlation between ripples and spikes in a familiar environment (see details below).

      Results (line 251)

      “It was puzzling that these CA1PCs exhibited robust spiking activities outside of ripples yet generated few spikes during ripples. To further investigate neuronal activities during ripples, we established a recording condition that allowed us to capture more ripple episodes. Specifically, we immobilized mice in a tube to promote behaviors favoring ripple generation. The mice were habituated to head fixation in a tube in a room distinct from the one where imaging experiments were conducted. On the imaging day, the mice were introduced to the recording room and head-fixed under the microscope for the first time.

      CA1PCs were labeled in utero on embryonic day (E) 14.5 (n=56 cells from 3 sessions in 3 mice) and E17.5 (n=55 cells from 3 sessions in 3 mice) and imaged in adult brains. Both neuronal populations exhibited prominent peaks in their grand average CCGs and significantly higher synchronous event rates compared to jittered data (Figure 3-figure supplement 2A, B). Approximately 40% of the recorded neurons participated in synchronous ensembles, indicating robust synchronous activity involving a substantial proportion of the recorded cells (Figure 3-figure supplement 2C).

      In total, 1052 synchronous ensembles and 174 ripple episodes were detected across these imaging sessions. Consistent with findings from walking animals, few synchronous ensembles occurred during ripples when animals were immobilized in a tube (Figure 3-figure supplement 3A, B). Moreover, no distinguishable ripple oscillations were observed in synchronous events, and the average firing rates during ripple episodes were near zero (Figure 3-figure supplement 3C, D). At the single-cell level, 90% of neurons showed significant negative spiking modulation during ripples, with ripple modulation indexes close to -1, indicating strong suppression of spiking (Figure 4Ai). This suppression extended to subthreshold membrane potentials, as nearly all cells exhibited decreased fluorescence during ripples compared to baseline (Figure 4Bi, Ci). These results demonstrate that spiking activity and subthreshold membrane potentials are robustly suppressed during ripples.

      Contextual novelty plays a critical role in shaping hippocampal neuronal activities. To assess its influence, we trained mice to become familiar with the imaging procedure and the recording environment over five days and recorded CA1PC activities on the final day. Mean firing rates were significantly reduced in the familiar group compared to the novel group (Familiar group: 1.1 to 5.2 Hz (25<sup>th</sup>-75<sup>th</sup> percentiles), median=2.3 Hz, n=66 cells, 6 sessions, 4 mice; Novel group: 1.7 to 6.0 Hz (25<sup>th</sup>-75<sup>th</sup> percentiles), median=4.2 Hz, n=111 cells, 6 sessions, 6 mice, p=0.0083, Wilcoxon signed-rank test). Additionally, 15% of the neurons in the familiar group exhibited significantly positive spiking modulation by ripples, while fewer cells showed negative modulation compared to the novel group (Figure 4A). During ripples, neurons in the novel group predominantly displayed hyperpolarizing membrane voltage responses, whereas a subset of neurons in the familiar group exhibited prominent depolarizing responses (Figure 4B). The mean fluorescence changes in the familiar group shifted toward depolarization compared to the novel group (Figure 4C). Finally, synchronous event frequencies were significantly lower in the familiar context, indicating weaker synchronous activities under familiar conditions (Figure 4D). These results demonstrate that hippocampal neuronal activities, particularly synchronous ensembles, are strongly influenced by contextual novelty.”

      Third, the fact that more than 40% of cells frequently exhibit synchronous firing other than during ripples has not been reported before, and if it were the case, many electrophysiologists would have likely noticed it. Overall, the excitability of cells seems too high.

      We thank the reviewer for raising this point. As discussed above, the reported spike rates are within the range expected from the previous electrophysiology recordings in mice, especially given that we record cells in a novel environment. In addition, our jittering procedure ensures that the observed synchrony exceeds what could be expected from the given level of spike rates alone. These analyses support the robustness of our observations.
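
      For readers unfamiliar with jitter controls, a minimal sketch of the general idea is given below. It is not the analysis code used in the manuscript: the bin width, the minimum number of co-active cells, the jitter window, and all function names are assumptions chosen for illustration. Each spike is shifted by a small random offset, which preserves firing rates but destroys fine-timescale coincidences, and the observed number of synchronous bins is compared with the surrogate distribution.

```python
import random


def count_synchronous_bins(spike_trains, bin_width, min_cells):
    """Count time bins in which at least `min_cells` distinct cells fire."""
    cells_per_bin = {}
    for cell, spikes in enumerate(spike_trains):
        for t in spikes:
            cells_per_bin.setdefault(int(t // bin_width), set()).add(cell)
    return sum(1 for cells in cells_per_bin.values() if len(cells) >= min_cells)


def jitter(spike_trains, max_shift, rng):
    """Shift every spike by a uniform random offset in [-max_shift, max_shift]."""
    return [[t + rng.uniform(-max_shift, max_shift) for t in spikes]
            for spikes in spike_trains]


def jitter_test(spike_trains, bin_width, min_cells, max_shift, n_surrogates, seed=0):
    """Return (observed count, surrogate counts, one-sided Monte Carlo p-value)."""
    rng = random.Random(seed)
    observed = count_synchronous_bins(spike_trains, bin_width, min_cells)
    surrogate = [
        count_synchronous_bins(jitter(spike_trains, max_shift, rng), bin_width, min_cells)
        for _ in range(n_surrogates)
    ]
    # Add one to numerator and denominator so the estimate is never exactly zero.
    n_at_least = sum(1 for c in surrogate if c >= observed)
    return observed, surrogate, (n_at_least + 1) / (n_surrogates + 1)


# Toy data: 5 cells that all fire together at 20 time points (times in seconds).
trains = [[k * 0.1 + 0.0025 for k in range(1, 21)] for _ in range(5)]
observed, surrogate, p = jitter_test(trains, 0.005, 3, 0.075, 99)
```

      Because jittering leaves each cell's spike count unchanged, any excess of observed over surrogate synchrony cannot be attributed to firing rate alone.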

      As mentioned below, there are concerns about experimental artifacts and analytical issues from optical imaging.

      (2) Method: In surgery, the cortical tissue above the hippocampus was aspirated, which is a general method for in vivo calcium imaging from the hippocampus. Furthermore, they use a CAG promoter to express the sensors. To my knowledge, this promoter is excessively strong and may sometimes be toxic to cells. In addition, for imaging, they use DMSO and Pluronic F-127, which are relatively toxic materials (please describe their concentrations). These conditions might be damaging to hippocampal neurons.

      We thank the reviewer for raising these comments. As the reviewer mentioned, cortical aspiration is a general method for in vivo imaging from the hippocampus and has been employed in numerous studies, including behavioral and systems-level investigations(5-15). For example, place cells are routinely recorded in both familiar and novel environments using this method and other approaches. Additionally, synchronous population activities have been observed and studied in the hippocampus both with and without cortical aspiration(6,15-18). These findings demonstrate that the hippocampal neuronal network generates place cells and synchronous activities regardless of whether the cortical tissue above it has been aspirated.

      DMSO and Pluronic F-127 are used as solvents for dissolving the JF<sub>552</sub>-HaloTag ligand, and the resulting solution is injected into the bloodstream rather than directly into brain tissue. The concentrations of these reagents in the dye solution are now described in the text (see below). Assuming a blood volume of 2 ml in adult mice, the final concentrations of DMSO and Pluronic F-127 in the bloodstream are estimated to be 1% upon injection and then decrease rapidly as they are metabolized and excreted. Moreover, the effective concentrations in the brain tissue would be even lower. These low concentrations have been demonstrated to have minimal impact on cells and tissue(19-22).

      Methods (line 616)

      “JF<sub>552</sub>-HaloTag ligand (a generous gift from Dr. Luke Lavis) was first dissolved in DMSO (20 μl, Sigma) and then diluted in Pluronic<sup>TM</sup> F-127 (20 μl, P3000MP, Invitrogen) and PBS to achieve a final concentration of 0.83 mM of JF<sub>552</sub>-HaloTag ligand. The solution was then injected intravenously through the retro-orbital sinus. Imaging sessions were initiated 3 hours after the injection of the JF<sub>552</sub>-HaloTag ligand.”
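
      The 1% estimate given in this response follows from simple dilution arithmetic, sketched here under the stated assumptions of a 2 ml blood volume, complete mixing, and a negligible contribution of the injected volume itself. The helper name is hypothetical.

```python
def percent_v_v(reagent_ul, blood_ml):
    """Volume percent of a reagent after dilution into the blood volume."""
    # Assumes complete mixing and ignores the injected volume itself.
    return 100.0 * reagent_ul / (blood_ml * 1000.0)


# 20 microlitres of reagent diluted into an assumed 2 ml of blood.
dmso_percent = percent_v_v(20, 2)
```

      With the 20 μl volumes stated in the Methods, this evaluates to 1%.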

      We understand that the CAG promoter may sometimes be toxic to cells if it drives high expression. However, it is important to note that we injected highly diluted virus (20x, final titer: 2.7x10<sup>12</sup> GC/ml) to avoid excessive expression levels. This titer was determined from serial dilution experiments to ensure an optimal expression level free from toxicity (see below). The same titer was used in a previous study(23) to label CA1 interneurons, which exhibited physiological spike rates and synchrony (see Abdelfattah 2023, Neuron, Figure 8). Furthermore, Voltron expression does not significantly affect key cellular properties, including membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, and spike width (see Abdelfattah 2019, Science, Supplementary Figures 11 and 12). In our recordings, individual neurons exhibit the expected modulation by locomotion, spatial locations, novelty, and theta oscillations. We now include images of the recorded neurons to demonstrate their intact morphology and healthy appearance following imaging experiments (Supplementary Figure 1A, B), further supporting minimal cytotoxic effects.

      Methods (line 577)

      “A serial dilution experiment was conducted to determine an optimal titer of the virus carrying Voltron2 genes, minimizing cell toxicity, for use in this and in previous imaging experiments. A fine injection pipette (tip diameter 10-60 μm) was used to inject AAV2/1-CAG-flex-Voltron2-ST (2.7x10<sup>12</sup> GC/ml, a generous gift from Dr. Eric Schreiter and the GENIE team at HHMI Janelia Research Campus) into the exposed regions at a depth of 200 μm (up to six injection sites and 100-200 nL of viral suspension).”

      (3) Another concern is the relatively low number of cells simultaneously recorded during imaging compared to typical hippocampal imaging such as Inscopix which often records several hundred cells. In this study, however, this number is 20 or fewer. This is likely because the visualized cells at baseline were limited to this extent. It is possible that these cells represent particularly too strong sensor expression, which may facilitate visualization and high signal-to-noise ratio in voltage imaging. Consequently, there is a possibility of abnormal activity occurring in these cells.

      The Inscopix studies use calcium imaging, which has a temporal resolution that is too slow to resolve the fast synchrony central to our study. To enable high-speed voltage imaging at 2000 frames per second, we employed strategies to achieve sparse labeling and carefully limited the number of labeled cells to minimize out-of-focus contamination. In our analysis, we applied a criterion to include only cells separated by 70 μm or longer, reducing the potential for channel cross-talk among nearby neurons. These criteria limited the number of simultaneously imaged cells in our experiments. To address this issue, we have now included new data from 12 additional animals with 177 neurons to support our findings.

      Furthermore, despite the limited number of simultaneously imaged cells, population synchrony beyond what could be expected by chance can be detected using rigorous statistical procedures. As discussed earlier, neuronal activities were within the expected range; they were modulated by animals’ locomotion (Figure 2 and Supplementary Figure 2), exhibited place tuning, and were significantly reduced when the recording context became familiar, supporting the normal physiology of the recorded cells.

      (4) Analysis: There are some criteria for detecting spikes (described in the Methods), but there are concerns about whether these criteria truly extract only spike activity. When examining the traces in Figure 1 and Figure 2, there appear to be some activities that show fluorescence increases up to the level of putative spikes. How can we determine that these are indeed subthreshold changes? Conversely, some activities detected as spikes may also be subthreshold synaptic potential (this possibility concerns me). There is a need for more precise validation of spike detection analysis to ensure its accuracy.

      Regarding spike detection, we used validated algorithms(23-25) to ensure robust and reliable spike identification. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This approach prevents slow fluorescence increases from being misinterpreted as spikes, even if their amplitude is large. We benchmarked this detection algorithm in our recent publication (Huang et al., 2024)(24), demonstrating its high sensitivity and specificity in spike detection (see the figure below). While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figures 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease, rather than an increase, in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and reliable manner.

      Method (line 670)

      “Previous studies have described and validated the procedure for imaging preprocessing and spike detection. In short, the fluorescence intensities of individual neurons were calculated by averaging the fluorescence intensities of pixels from the same ROIs. Bleaching was corrected by calculating the baseline fluorescence (F<sub>0</sub>) at each time point as an average of the fluorescence intensities within ±0.5 seconds around the time point. The dF/F was calculated as the F<sub>0</sub> minus the fluorescence intensity of the same time point divided by F<sub>0</sub>. Positive fluorescence transients were detected to identify spikes from the high-passed dF/F traces created by subtracting the dF/F traces from the median-filtered version with a 5-ms window. To simulate the noise of recordings, high-passed dF/F traces were inverted, and the amplitudes of the transients detected from the inverted traces were used to construct a noise distribution of the spike amplitudes. A threshold was set by comparing the amplitudes of the detected transients with the noise distribution of the spike amplitudes to minimize the sum of type I and type II errors. Spikes were first detected when transients were larger than the threshold. Then, spike amplitudes smaller than half of the top 5% spike amplitudes were excluded. The signal-to-noise ratio (SNR) was calculated for each neuron as a ratio of the averaged spike amplitudes over the standard deviation of the high-passed dF/F traces, excluding points 2 ms before and 4 ms after each detected spike to estimate the quality of the recordings.”
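
      To make the steps of this procedure concrete, a simplified sketch in plain Python follows. It is not the validated pipeline cited above. Several details are assumptions made for illustration: the high-passed trace is taken as dF/F minus its median-filtered version so that spikes are positive, "missed" spikes are estimated as candidates below threshold in excess of the noise peaks below threshold, "the top 5%" is summarized by its mean, and all function names are hypothetical.

```python
import bisect
import statistics


def moving_average(x, half_width):
    """Mean of x within +/- half_width samples (window truncated at the edges)."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half_width), min(len(x), i + half_width + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def median_filter(x, width):
    """Running median over about `width` samples (window truncated at the edges)."""
    half = width // 2
    return [statistics.median(x[max(0, i - half):i + half + 1]) for i in range(len(x))]


def positive_peaks(x):
    """(index, amplitude) of every positive local maximum."""
    return [(i, x[i]) for i in range(1, len(x) - 1)
            if x[i] > 0 and x[i] > x[i - 1] and x[i] >= x[i + 1]]


def detect_spikes(f, fs):
    """Return (spike indices, SNR) for one raw fluorescence trace sampled at fs Hz."""
    f0 = moving_average(f, round(0.5 * fs))           # bleaching baseline, +/-0.5 s
    dff = [(b - v) / b for v, b in zip(f, f0)]        # (F0 - F) / F0
    slow = median_filter(dff, round(0.005 * fs))      # 5-ms median filter
    hp = [d - s for d, s in zip(dff, slow)]           # high-passed dF/F

    candidates = positive_peaks(hp)
    amps = sorted(a for _, a in candidates)
    noise = sorted(a for _, a in positive_peaks([-v for v in hp]))  # inverted trace
    if not amps:
        return [], 0.0

    # Pick the threshold that minimizes estimated false positives plus misses.
    best_thr, best_err = amps[0], None
    for thr in amps:
        noise_below = bisect.bisect_left(noise, thr)
        false_pos = len(noise) - noise_below
        missed = max(0, bisect.bisect_left(amps, thr) - noise_below)
        if best_err is None or false_pos + missed < best_err:
            best_thr, best_err = thr, false_pos + missed
    spikes = [(i, a) for i, a in candidates if a >= best_thr]

    # Drop spikes smaller than half of the mean of the largest 5% of amplitudes.
    top = sorted((a for _, a in spikes), reverse=True)[:max(1, len(spikes) // 20)]
    cutoff = 0.5 * statistics.mean(top)
    spikes = [(i, a) for i, a in spikes if a >= cutoff]

    # SNR: mean spike amplitude over the SD of the trace away from spikes.
    excluded = set()
    for i, _ in spikes:
        excluded.update(range(i - round(0.002 * fs), i + round(0.004 * fs) + 1))
    background = [v for k, v in enumerate(hp) if k not in excluded]
    snr = statistics.mean([a for _, a in spikes]) / statistics.pstdev(background)
    return [i for i, _ in spikes], snr
```

      Because the slow component is removed by the median filter before detection, a slow fluorescence change of large amplitude does not by itself produce a candidate spike in this sketch.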

      (5) If the authors aim to establish this new physiological phenomenon, it is necessary to compare it with electrophysiological data or verify if similar phenomena can be detected from electrophysiological data. Recently, various datasets have been made publicly available (e.g. CRCNS and Mendeley data), and these should be easily verifiable without the need for conducting experiments.

      We thank the reviewer for the suggestion. To address this, we analyzed a publicly available dataset (hc-11 on CRCNS), which contains hippocampal recordings from rats navigating novel mazes for water rewards. Using our algorithm, we detected significant population synchrony in the dataset (Supplementary Figure 5A). The synchronous event rates were 6.4-fold higher than those in jittered controls, demonstrating the reliability of our findings.

      Additionally, these synchronous events mostly occurred in the absence of ripples and were coupled to theta oscillations (Supplementary Figure 5B-D). These results not only validate our findings using independent datasets but also highlight the generalizability of synchronous ensembles as a distinct network phenomenon relevant to hippocampal function.

      Results (line 366)

      “To further investigate synchronous ensembles across different datasets, we analyzed publicly available hippocampal recordings ‘hc-11’ from the CRCNS repository, where rats navigated novel mazes for water rewards (see Method). Using our algorithm, we identified a significant number of synchronous ensembles during the first three minutes of novel navigation. On average, the rates of synchronous events were 6.4-fold higher than those detected in jittered controls (mean event rate: 2.0 ± 0.3 Hz for the original data vs. 0.32 ± 0.03 Hz for jittered data, n = 8 sessions, p = 0.0078, W = 36, Wilcoxon signed-rank test; Supplementary Figure 5A). To assess whether ripple oscillations were associated with these synchronous ensembles, we analyzed ripple event rates and their relationship to population synchrony. During this period, ripple events were infrequent (mean ripple rate: 0.02 ± 0.01, n = 8 sessions), and ripple power peaked during ripple episodes but remained low at the timings of population synchrony (Supplementary Figure 5B). Nevertheless, LFP traces aligned to population synchrony revealed prominent theta oscillations (Supplementary Figure 5C). Synchronous ensembles were modulated by LFP theta oscillation (modulation strength: 0.30 ± 0.04, n = 8 sessions, p < 0.001), and the timings of individual ensembles were consistently locked to the preferred phase of each session, suggesting a functional coupling of synchronous ensembles to theta oscillations important for information processing (Supplementary Figure 5D).”
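
      As a consistency check on the reported statistics, the exact two-sided p-value of a Wilcoxon signed-rank test can be enumerated directly for small samples. The sketch below assumes that the reported W is the sum of positive ranks and that there are no tied or zero differences; the function name is hypothetical. With n = 8 sessions, W = 36 is the largest possible value (1 + 2 + ... + 8), meaning every session showed a higher event rate than its jittered control.

```python
from itertools import product


def exact_signed_rank_p(n, w_plus):
    """Exact two-sided p-value for a signed-rank sum with n untied, nonzero pairs."""
    total = n * (n + 1) // 2
    extreme = max(w_plus, total - w_plus)
    count = 0
    # Under the null hypothesis every sign pattern of the ranks is equally likely.
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if max(w, total - w) >= extreme:
            count += 1
    return count / 2 ** n


p_value = exact_signed_rank_p(8, 36)
```

      Only two of the 256 sign patterns are this extreme, giving p = 2/256 ≈ 0.0078, which matches the value reported in the quoted passage.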

      (6) Please describe exact statistical information (e.g. statistical values, degree of freedom, and test types) throughout the manuscript.

      Statistical values, degrees of freedom, and test types are now reported throughout the manuscript. An example is shown below:

      Result (line 96)

      “Consistent with previous studies, neurons labeled on E14.5 were located more on the deep side of the pyramidal layer than those labeled on E17.5 (t<sub>(601)</sub>=22.8, p<0.0001, Student’s t-test; Supplementary Figure 1C, D).”

      Minor comment - Figure 2A legend: what is "gray rectangles"?

      We apologize for the inconsistency in nomenclature in the figure legends. We have now corrected this issue and consistently use the term “gray vertical bars” to indicate the timings and durations of synchronous events throughout the article.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs but instead occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our study.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the dataset size. In the revised manuscript, we have included additional data to further strengthen our conclusions and provide a more robust dataset. Specifically, we expanded our analysis by increasing the number of sessions and neurons recorded, ensuring that the findings are more representative and less likely to be influenced by sample sizes.

      Moreover, synchronous ensembles exceeding what could be expected by chance were detected in all examined data, validating our claims regarding population synchrony. We have also carefully considered the potential impact of the technology used in our experiments and included additional validation and comparison with results from other studies employing complementary techniques to support the reliability of our conclusions.

      Regarding the link to novelty, we have included data from subsequent visits, as suggested by the reviewer. These new data demonstrate that the observed changes in synchronous ensembles are context-dependent and significantly influenced by novelty. This confirms the novelty-related effects observed during initial visits and further supports the conclusions drawn in the manuscript. Please see below for our detailed replies to each of the reviewer’s points.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x the neurons simultaneously recording compared to these present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior- again, not agreeing with much of the previous electrophysiological recordings.

      We thank the reviewer for raising these important questions. To address the first question, it is possible that synchronous ensembles were not previously detected in extracellular recordings due to differences in detection methods or analysis approaches. To investigate this further, we analyzed a publicly available dataset (hc-11 on CRCNS), which contains hippocampal recordings from rats navigating novel mazes for water rewards. Using our algorithm, we detected robust synchronous ensembles in the dataset (Supplementary Figure 5). The rates of synchronous events were significantly higher than those in jittered controls, demonstrating the reliability and generalizability of these synchronous ensembles.

      Results (line 366)

      “To further investigate synchronous ensembles across different datasets, we analyzed publicly available hippocampal recordings ‘hc-11’ from the CRCNS repository, where rats navigated novel mazes for water rewards (see Method). Using our algorithm, we identified a significant number of synchronous ensembles during the first three minutes of novel navigation. On average, the rates of synchronous events were 6.4-fold higher than those detected in jittered controls (mean event rate: 2.0 ± 0.3 Hz for the original data vs. 0.32 ± 0.03 Hz for jittered data, n = 8 sessions, p = 0.0078, W = 36, Wilcoxon signed-rank test; Supplementary Figure 5A). To assess whether ripple oscillations were associated with these synchronous ensembles, we analyzed ripple event rates and their relationship to population synchrony. During this period, ripple events were infrequent (mean ripple rate: 0.02 ± 0.01, n = 8 sessions), and ripple power peaked during ripple episodes but remained low at the timings of population synchrony (Supplementary Figure 5B). Nevertheless, LFP traces aligned to population synchrony revealed prominent theta oscillations (Supplementary Figure 5C). Synchronous ensembles were modulated by LFP theta oscillation (modulation strength: 0.30 ± 0.04, n = 8 sessions, p < 0.001), and the timings of individual ensembles were consistently locked to the preferred phase of each session, suggesting a functional coupling of synchronous ensembles to theta oscillations important for information processing (Supplementary Figure 5D).”

      To address the second question, we conducted new experiments under conditions optimized for ripple generation. Specifically, we recorded neurons in mice head-fixed in a novel environment, resulting in 174 ripple episodes across six sessions. Consistent with our original findings, spiking rates were significantly suppressed and membrane potentials were hyperpolarized during ripples (Figure 4Ai-Ci of the manuscript). Despite this suppression, the same neurons exhibit rich synchronous activities outside of ripples (Figure 3-figure supplement 3 of the manuscript). These results confirm that these synchronous ensembles are distinct from ripple-related neuronal activity and strengthen our claim that the observed synchronous ensembles represent a distinct physiological phenomenon, consistent across different datasets and experimental conditions.

      Results (line 251)

      “It was puzzling that these CA1PCs exhibited robust spiking activities outside of ripples yet generated few spikes during ripples. To further investigate neuronal activities during ripples, we established a recording condition that allowed us to capture more ripple episodes. Specifically, we immobilized mice in a tube to promote behaviors favoring ripple generation. The mice were habituated to head fixation in a tube in a room distinct from the one where imaging experiments were conducted. On the imaging day, the mice were introduced to the recording room and head-fixed under the microscope for the first time.

      CA1PCs were labeled in utero on embryonic day (E) 14.5 (n=56 cells from 3 sessions in 3 mice) and E17.5 (n=55 cells from 3 sessions in 3 mice) and imaged in adult brains. Both neuronal populations exhibited prominent peaks in their grand average CCGs and significantly higher synchronous event rates compared to jittered data (Figure 3-figure supplement 2A, B). Approximately 40% of the recorded neurons participated in synchronous ensembles, indicating robust synchronous activity involving a substantial proportion of the recorded cells (Figure 3-figure supplement 2C).

      In total, 1052 synchronous ensembles and 174 ripple episodes were detected across these imaging sessions. Consistent with findings from walking animals, few synchronous ensembles occurred during ripples when animals were immobilized in a tube (Figure 3-figure supplement 3A, B). Moreover, no distinguishable ripple oscillations were observed in synchronous events, and the average firing rates during ripple episodes were near zero (Figure 3-figure supplement 3C, D). At the single-cell level, 90% of neurons showed significant negative spiking modulation during ripples, with ripple modulation indexes close to -1, indicating strong suppression of spiking (Figure 4Ai). This suppression extended to subthreshold membrane potentials, as nearly all cells exhibited decreased fluorescence during ripples compared to baseline (Figure 4Bi, Ci). These results demonstrate that spiking activity and subthreshold membrane potentials are robustly suppressed during ripples.”

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      Following the reviewer’s suggestion, we recorded neuronal activities in a familiar context to test the proposed link between synchronous activity and contextual novelty. We found that synchronous activity levels were significantly lower in the familiar context than in the novel context, demonstrating that synchronous activity is strongly modulated by contextual novelty (Figure 4D of the manuscript). These findings further support a link between the synchronous ensembles and novel environmental contexts.

      Result (line 277)

      “Contextual novelty plays a critical role in shaping hippocampal neuronal activities. To assess its influence, we trained mice to become familiar with the imaging procedure and the recording environment over five days and recorded CA1PC activities on the final day. Mean firing rates were significantly reduced in the familiar group compared to the novel group (Familiar group: 1.1 to 5.2 Hz (25<sup>th</sup>-75<sup>th</sup> percentiles), median=2.3 Hz, n=66 cells, 6 sessions, 4 mice; Novel group: 1.7 to 6.0 Hz (25<sup>th</sup>-75<sup>th</sup> percentiles), median=4.2 Hz, n=111 cells, 6 sessions, 6 mice, p=0.0083, Wilcoxon signed-rank test). Additionally, 15% of the neurons in the familiar group exhibited significantly positive spiking modulation by ripples, while fewer cells showed negative modulation compared to the novel group (Figure 4A). During ripples, neurons in the novel group predominantly displayed hyperpolarizing membrane voltage responses, whereas a subset of neurons in the familiar group exhibited prominent depolarizing responses (Figure 4B). The mean fluorescence changes in the familiar group shifted toward depolarization compared to the novel group (Figure 4C). Finally, synchronous event frequencies were significantly lower in the familiar context, indicating weaker synchronous activities under familiar conditions (Figure 4D). These results demonstrate that hippocampal neuronal activities, particularly synchronous ensembles, are strongly influenced by contextual novelty.”

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for the suggestion. As the reviewer points out, we did observe a frequency shift in synchrony-associated theta during immobility compared to locomotion (see Figure 5B, red vs. blue curves). We have now highlighted this result in the discussion section. Please refer to the text below.

      Discussion (line 471)

      “On the other hand, type 2 theta, or attentional theta, is slightly slower and is blocked by muscarinic receptor antagonists, emerging during states of arousal or attention, such as when entering a new environment. Consistent with these distinctions, the peak of the power spectrum density shows a distinctively slower theta frequency during immobility compared to locomotion (Figure 5B).”

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for the constructive suggestion. In response, we have added images of slices from both E14.5 and E17.5 brains and analyzed soma locations along the radial axis of the pyramidal layer. The results are included in the main text, Methods, and Supplementary Figure 1 of the manuscript (see below).

      Result (line 96)

      “Consistent with previous studies, neurons labeled on E14.5 were located more on the deep side of the pyramidal layer than those labeled on E17.5 (t<sub>(601)</sub>=22.8, p<0.0001, Student’s t-test; Supplementary Figure 1C, D).”

      Methods (line 563)

      “The injection resulted in Cre expression among neurons born on the day of injection, with earlier injection labeling neurons located on the deeper side of the cell layer.”

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without compromising the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work: (1) experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s concern regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe these contralateral LFP recordings still provide valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases due to differences in recording locations and depths, the occurrence and amplitudes of theta oscillations are generally well-coordinated across hemispheres (Buzsaki et al., 2003, Fig 5)(3). The presence of prominent contralateral LFP theta activity around the times of synchronous ensembles in our study (Figure 5A of the manuscript) strongly supports our conclusion about their association with theta oscillations, even with LFP collected from the opposite hemisphere.

      Additionally, we explicitly noted in the manuscript that the “preferred phases” varied between sessions, likely reflecting variability in recording locations (see below). Thus, we believe the concern about theta phase variability has already been adequately addressed in the current manuscript.

      Result (line 321)

      “Although the preferred phases varied from session to session due to differences in recording sites along the proximal-distal axis of the hippocampus, the timings of individual ensembles were consistently locked to the preferred phase of each session (Figure 5C).”

      While we acknowledge that ripple oscillations can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%)(3,26), as demonstrated both in the literature (Szabo et al., 2022, Supplementary Figure 2) and by data from our lab (Huang et al., 2024, Figure S6). As a result, using contralateral LFP to infer ripple occurrence on the ipsilateral side is a well-established practice in the field, commonly employed by many studies published in reputable journals(26-29). Given the high co-occurrence of both theta and ripple oscillations across hemispheres, we maintain that the two main messages of our manuscript are supported by data, despite the concern regarding phase discrepancy mentioned by the reviewer.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Complex bursts or differences in behavioral conditions can indeed lead to variations in spike counts, which could potentially affect the detection of synchronous ensembles. However, our jittering procedure is specifically designed to account for variations in spike counts. Notably, although the jittered spike trains retain the same spike count variations, we observed 7.8 times more synchronous events in our data than in the jittered controls (Figure 1G of the manuscript). This indicates that the specific spike timings in the original data, which are disrupted in the jittered data, are responsible for the observed synchrony.

      To further address the concern that complex bursts might influence the observed synchrony, we performed additional analyses in which we excluded all later spikes in bursts, considering only single spikes and the first spikes of bursts. Importantly, this procedure did not affect the rate or size of synchronous ensembles and did not significantly alter the grand-average CCG (Supplementary Figure 3). These results explicitly demonstrate that complex bursts do not significantly impact the analysis of synchronous ensembles.

      Result (line 131)

      “The observed population synchrony was not attributable to spikes in complex bursts, as synchronous event rates did not differ significantly with or without the inclusion of later spikes in bursts (Supplementary Figure 3).”

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, previous research has demonstrated that theta phase locking can “reduce” population synchrony(30). Thus, the presence of theta phase locking cannot be considered a simple alternative explanation for the synchronous ensembles.

      The idea that theta phase locking does not necessarily lead to population synchrony is illustrated in Author response image 1A. In this example, while all three neurons are perfectly locked to specific theta phases, no synchrony among neurons is evident. In contrast, our data align with the scenario depicted in Author response image 1B, where spikes occur not only at specific theta phases but also in the same cycles, thereby facilitating population synchrony.

      Author response image 1.

      Illustrative diagram of the relationship between theta phase coupling and population synchrony. (A) Illustration of theta phase coupling with low population synchrony. (B) Illustration of population synchrony with theta phase coupling.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we performed a new analysis in which the specific theta cycles during which neurons spike were randomized while keeping the spike phases unchanged. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony. We found that theta-cycle randomization significantly reduced the rate of synchronous events, by 4.5-fold (Supplementary Figure 4). This new analysis demonstrates that theta phase locking alone cannot account for the population synchrony observed in our data.

      Result (line 358)

      “Correlated intracellular theta and theta-phase locking of the synchronous ensembles raise the question of whether population synchrony among CA1PCs extends beyond synchrony derived from these effects. To address this, we analyzed population synchrony after randomizing the theta cycles during which neurons spiked, while keeping their theta phases unchanged. Supplementary Figure 4 illustrates a significant reduction in synchronous event rates following theta cycle randomization. The finding indicates spiking at specific theta cycles plays a major role in driving population synchrony.”
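
      The logic of this control can be illustrated with a toy example. The sketch below is not the analysis code used in the study: it assumes each spike is represented as a (theta cycle index, phase) pair, and the function names, cell counts, and co-activity criterion are hypothetical. Moving each spike to a randomly chosen cycle leaves every phase untouched, so phase locking is preserved while cycle-level co-activity is destroyed.

```python
import random


def randomize_cycles(spikes, n_cycles, rng):
    """Move each spike to a randomly chosen theta cycle, keeping its phase.

    `spikes` is a list of (cycle_index, phase_in_radians) pairs for one cell.
    """
    return [(rng.randrange(n_cycles), phase) for _cycle, phase in spikes]


def coactive_cycles(cells, min_cells):
    """Number of theta cycles in which at least `min_cells` cells spike."""
    tally = {}
    for spikes in cells:
        for cycle in {c for c, _phase in spikes}:
            tally[cycle] = tally.get(cycle, 0) + 1
    return sum(1 for n in tally.values() if n >= min_cells)


rng = random.Random(0)
n_cycles = 1000
# Toy data: 10 cells, all phase-locked near the trough and all firing in the
# same 50 cycles, i.e. phase locking plus cycle-level co-activity.
shared = rng.sample(range(n_cycles), 50)
cells = [[(c, 3.14 + rng.uniform(-0.2, 0.2)) for c in shared] for _ in range(10)]
shuffled = [randomize_cycles(s, n_cycles, rng) for s in cells]

print(coactive_cycles(cells, min_cells=5))     # 50
print(coactive_cycles(shuffled, min_cells=5))  # far fewer co-active cycles
```

      In this toy case the phase distribution of every cell is identical before and after shuffling, yet the number of cycles with shared firing collapses, which is the dissociation the control is designed to expose.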

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens or even hundreds of milliseconds is commonly used in the context of synchrony terminology for CA1 pyramidal neurons(6,31-33). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it aligns with the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe this timescale is highly relevant and consistent with established practices in the field.

      Reviewer #3 (Recommendations For The Authors):

      (1) L19-20: "these synchronous ensembles were not associated with ripple oscillations" - this is a main fallacy in the present work (ripples are from the other side; there are not enough ripples to obtain sufficient statistical power to even test the hypothesis; etc.). The sentence should be removed.

      As we have addressed in the public review, most ripples occur synchronously in both hemispheres(3,26). Many studies have used contralateral LFP to infer ripple occurrence on the ipsilateral side(26-29). Moreover, our new data now support the dissociation between synchronous ensembles and ripples with a much larger number of ripples and rigorous statistical testing (Figure 3-figure supplement 3 of the manuscript). These findings support our conclusion that synchronous ensembles are not associated with ripple oscillations.

      Result (line 266)

      “In total, 1052 synchronous ensembles and 174 ripple episodes were detected across these imaging sessions. Consistent with findings from walking animals, few synchronous ensembles occurred during ripples when animals were immobilized in a tube (Figure 3-figure supplement 3A, B). Moreover, no distinguishable ripple oscillations were observed in synchronous events, and the average firing rates during ripple episodes were near zero (Figure 3-figure supplement 3C, D). At the single-cell level, 90% of neurons showed significant negative spiking modulation during ripples, with ripple modulation indexes close to -1, indicating strong suppression of spiking (Figure 4Ai). This suppression extended to subthreshold membrane potentials, as nearly all cells exhibited decreased fluorescence during ripples compared to baseline (Figure 4Bi, Ci). These results demonstrate that spiking activity and subthreshold membrane potentials are robustly suppressed during ripples.”

      (2) L135/Figure 1: panel C and elsewhere: show the same traces after removing (clipping) the spikes. You may be able to see the intracellular theta nicely, which may be very strongly synchronized between neurons and could then be supplemented by ticks (as in conventional raster plots). This will allow a clearer visualization of the spiking and their relations with Vm.

      We have created the plot as suggested (Author response image 2). As demonstrated in our figures (Figure 5 in the manuscript), the subthreshold membrane potentials of individual neurons are strongly correlated and coherent at theta frequency, consistent with the reviewer’s viewpoint.

      Author response image 2.

      Fluorescence traces of 19 simultaneously recorded cells with truncated spikes replaced by dots. Horizontal scale bar: 25 ms; vertical scale bar: -3%.

      (3) Related to the above comment, in general, a much more robust approach with the present dataset may be to derive an estimate of the LFP from the intracellular records. Extracellular theta is related to intracellular theta (approximately the negative), and extracellular ripples co-occur with intracellular high-frequency oscillations. However, because the precise transfer function (TF) between the two is not well established, ground truth data should first be collected. This may be done by voltage imaging of even a single neuron in parallel with an extracellular glass pipette placed in near proximity of the same cell, at the same depth. Such datasets have been collected in the past, so it may be sufficient to contact those authors and derive the TF from existing data. Alternatively, new experiments may be required. It is possible that the TF will not be well defined - in which case there are two options: (1) limit the analyses to the relation between spikes in Vm, or (2) record new datasets with true LOCAL field potentials in every case.

      We thank the reviewer for the insightful suggestion. Establishing a precise TF between intracellular and extracellular recordings is indeed crucial when exact phase information is required to draw conclusions. However, our goal is to understand the occurrence of specific network oscillation states surrounding these synchronous ensembles, rather than pinpointing the precise phase at which they occur. Therefore, we believe that the strong bilateral co-occurrence of both theta and ripple oscillations provides a practical and valid foundation for supporting our objective.

      While the approach suggested by the reviewer is an excellent idea, conducting simultaneous voltage imaging and local LFP recording is currently not feasible due to technical constraints associated with the implanted glass windows. Nevertheless, we recognize the potential value of this approach and plan to incorporate it into future experimental designs, which could provide further insights into the specific oscillatory phases associated with population synchrony.

      (4) L135/Figure 1: panel D and elsewhere: Account for second-order spike train statistics (e.g., bursts). The simplest way to do this is to remove all spikes that are not the first spike in a burst. Otherwise, the zero-lag bin of a pair-wise CCG will be filled with counts that are due e.g., to the first spike of the second neuron co-occurring with the last spike in a burst of the first neuron. In other words, without accounting for bursts, sequential activity may be interpreted as synchrony.

      We thank the reviewer for this insightful comment. As recommended, we have performed the suggested analysis by removing all spikes that are not the first spike in a burst (Supplementary Figure 3). The results demonstrate that, even after removing the subsequent spikes in bursts, the rates of synchronous events remain unchanged compared to the original data, and the sizes of the synchronous ensembles are also unaffected. These findings indicate that our conclusions are robust and not confounded by the presence of later spikes within bursts.

      Result (line 131)

      “The observed population synchrony was not attributable to spikes in complex bursts, as synchronous event rates did not differ significantly with or without the inclusion of later spikes in bursts (Supplementary Figure 3).”
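
      The burst control recommended by the reviewer amounts to a simple filter on each spike train. The sketch below is an illustration only: the response does not state how bursts were defined, so the 15-ms inter-spike-interval threshold and the function name are assumptions.

```python
def first_spikes_only(spike_times, max_isi=0.015):
    """Keep isolated spikes and the first spike of each burst.

    A spike is treated as a later burst spike if it follows the previous
    spike by less than `max_isi` seconds (illustrative threshold, not a
    value taken from the manuscript).
    """
    kept = []
    previous = None
    for t in sorted(spike_times):
        if previous is None or t - previous >= max_isi:
            kept.append(t)
        previous = t
    return kept


print(first_spikes_only([0.100, 0.106, 0.112, 0.300, 0.500, 0.508]))
# [0.1, 0.3, 0.5]
```

      Comparing each spike with the immediately preceding spike, rather than with the last retained spike, ensures that a long burst is collapsed to its first spike regardless of its total duration.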

      (5) L135/Figure 1: panel D and elsewhere: Related to the previous comment: the "grand average" CCG of a single neuron with all the other simultaneously recorded neurons is prone to a peak at zero lag ("synchrony") even if all pairs of neurons have pure mono-synaptic connections (e.g., at a 2 ms time lag). This is because neuron1 (N1) may precede N2, whereas then N3 may precede N2. In such a case, the pooled CCG will have two peaks - at e.g., 2 ms and -2 ms. However, if bursts occur (as is the case in CA1 and Figure 1C), there will also be non-zero counts around zero lag, which will accumulate as well. Together, these will build up to a peak around zero - even without any theta phase locking or any other alternative correlations.

      Please see our reply to comment #6 below.

      (6) L135/Figure 1: panel D and elsewhere: refrain from averaging "grand averages" over neurons. This problem is distinct from the above (where e.g., N2-N1 is averaged with N2-N3). In any case, all visualizations and measures should be derived from individual (pair-wise) CCGs, and not "grand averages"

      We thank the reviewer for the detailed comments and appreciate the opportunity to clarify our methods and analyses related to population synchrony. In response to the suggestion to replace grand average CCGs with pairwise CCGs, we have now included a heatmap to visualize individual pairwise CCGs for all recorded neuronal pairs that meet our inclusion criteria (497 pairs, Author response image 3). The heatmap provides a comprehensive view of the temporal relationships between neuron pairs.

      Author response image 3.

      Color-coded plot of pairwise CCGs for all cell pairs that meet our inclusion criteria.

      While we have chosen to keep the grand-average CCGs, we emphasize that they serve only to summarize the overall temporal scale of the population synchrony. Importantly, our conclusions regarding synchronous ensembles are not based on grand-average CCGs. Instead, we assess population synchrony using a rigorous approach: we compute spike counts across the population in 25-ms sliding windows and compare these counts to those derived from jittered data, in which spike timings are randomly shifted by ±75 ms while the overall spike count distribution is preserved. Synchrony is identified when the original spike counts exceed those from the jittered data by more than 4 standard deviations. This approach accounts for the potential accumulation of zero-lag counts arising from mixed mono-synaptic connections or bursting, as noted by the reviewer. By perturbing spike timings while preserving spike count distributions, our method identifies synchrony beyond what is expected by chance, ensuring robust and artifact-free conclusions.
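      For readers who wish to see the logic of this jitter control spelled out, the following is a minimal sketch only, not the analysis code used for the manuscript. The 25-ms window, ±75-ms jitter and 4-SD threshold are taken from the description above; the 5-ms window step, the number of surrogates, the clipping of jittered spikes to the recording bounds and all function names are assumptions made for illustration.

```python
import bisect
import random
import statistics


def population_counts(trains, t_stop, win=25.0, step=5.0):
    """Pooled spike count in each sliding window [t, t + win); times in ms.

    The 5-ms step is an assumption; only the 25-ms width comes from the text.
    """
    pooled = sorted(t for train in trains for t in train)
    n_windows = int((t_stop - win) / step) + 1
    counts = []
    for i in range(n_windows):
        start = i * step
        lo = bisect.bisect_left(pooled, start)
        hi = bisect.bisect_left(pooled, start + win)
        counts.append(hi - lo)
    return counts


def jitter(trains, t_stop, rng, width=75.0):
    """Shift every spike independently by a uniform offset in [-width, +width].

    Each neuron keeps its spike count; spikes are clipped to [0, t_stop]
    (an assumption about how recording edges are handled).
    """
    return [[min(max(t + rng.uniform(-width, width), 0.0), t_stop) for t in train]
            for train in trains]


def synchronous_windows(trains, t_stop, n_surrogates=200, n_sd=4.0, seed=0):
    """Indices of windows whose count exceeds the jitter mean by > n_sd SDs."""
    rng = random.Random(seed)
    observed = population_counts(trains, t_stop)
    surrogates = [population_counts(jitter(trains, t_stop, rng), t_stop)
                  for _ in range(n_surrogates)]
    flagged = []
    for i, count in enumerate(observed):
        null = [s[i] for s in surrogates]
        threshold = statistics.fmean(null) + n_sd * statistics.pstdev(null)
        if count > threshold:
            flagged.append(i)
    return flagged
```

      Because the jitter spreads each spike over a 150-ms range, co-firing that is tighter than this range is diluted in the surrogates, whereas slow co-variation of firing rates is retained in them and is therefore not flagged.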

      (7) L135/Figure 1: panel D and elsewhere: after deriving measures (peak lag, FWHM, synchrony strength, etc.) from individual pairwise CCGs, show the measures as a function of the spike counts. For a pair of neurons N1-N2, derive the geometric mean spike count (or the mean, or the max). For instance, if there are 500 pairs of neurons, show e.g., pairwise synchrony strength as a function of the spike count geometric mean. While little correlation is expected when the timescale is small (1-2 ms), the "synchrony" effect at a timescale of 20-30 ms is expected to be very strongly related to the spike counts. Because the spike counts may differ between the lower and higher speed "states", many results reported in the present manuscript may be an epiphenomenon of that relationship.

      We thank the reviewer for these valuable comments. In response, we analyzed pairwise synchronization strength as a function of the geometric mean spike count of each neuron pair, as suggested. As shown in Author response image 4, the CCG peak counts in the original data (red dots) increase with the geometric mean spike count, consistent with the expected trend. However, this trend is also captured by the jitter control (black dots), which reflects the level of synchrony expected by chance at a given spike count.

      Importantly, the normalized synchronization strengths - defined as the ratio of CCG peak counts in the original data to the jitter control – are not positively correlated with spike counts and remain significantly greater than 1 across all spike count levels (Author response image 5). This demonstrates synchrony beyond what could be explained by spike count variations alone.

      While we understand the potential influence of state-dependent spike count variations, our jittering approach effectively controls for this by removing chance-level synchrony that could arise from these variations. This ensures that the observed synchrony reflects genuine neuronal interactions rather than an epiphenomenon of spike count variations between states.
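      The normalization described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code behind Author response images 4 and 5: the CCG range (±50 ms), the 5-ms bin size, the number of surrogates, the choice to jitter only one spike train of the pair, and the function names are all hypothetical.

```python
import math
import random


def ccg_counts(ref, target, max_lag=50.0, bin_ms=5.0):
    """Histogram of lags (target - ref) in [-max_lag, max_lag); times in ms."""
    n_bins = int(round(2 * max_lag / bin_ms))
    counts = [0] * n_bins
    for r in ref:
        for t in target:
            lag = t - r
            if -max_lag <= lag < max_lag:
                # min() guards against float rounding at the upper edge.
                counts[min(int((lag + max_lag) // bin_ms), n_bins - 1)] += 1
    return counts


def normalized_strength(ref, target, n_surrogates=100, width=75.0, seed=0):
    """Observed CCG peak count divided by the mean peak count of jittered CCGs."""
    rng = random.Random(seed)
    observed_peak = max(ccg_counts(ref, target))
    null_peaks = []
    for _ in range(n_surrogates):
        jittered = [t + rng.uniform(-width, width) for t in target]
        null_peaks.append(max(ccg_counts(ref, jittered)))
    null_mean = sum(null_peaks) / n_surrogates
    return observed_peak / null_mean if null_mean > 0 else math.inf


def geometric_mean_count(ref, target):
    """Geometric mean of the two neurons' spike counts."""
    return math.sqrt(len(ref) * len(target))
```

      A ratio near 1 means that the pair is no more synchronous than its spike counts alone would predict; plotting this ratio against `geometric_mean_count` corresponds to the comparison described above.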

      Author response image 4.

      Plot of peak spike counts of pairwise CCGs (red) and mean spike counts from jittered data (black) against geometric means of pair spike counts.

      Author response image 5.

      Plot of normalized synchronization strengths against spike count geometric means.

      (8) L135/Figure 1: show all CCGs in a color matrix.

      We have generated a color matrix visualization of all pairwise CCGs, as recommended (Author response image 3). This visualization highlights the consistency of our results across neuron pairs.

      (9) L168/Figure 2: the LFPO is nearly irrelevant - it is from the other hemisphere, and it is unclear whether the depth is the same as in the "deep" (closer to the brain surface) imaging plane used for the voltage recordings.

      As previously explained, the LFPO is relevant because it reveals the occurrence of theta and ripple states, which are highly synchronous across both hemispheres and serve as reliable indicators of network states relevant to our findings.

      (10) L222/Figure 3: The ripple-related analyses are completely irrelevant - ripples are a local phenomenon, and recording from the other hemisphere is completely irrelevant.

      We thank the reviewer for the suggestions. As we have explained in our response to the public review, as well as in our responses to the reviewer’s comments #1 and #3, the occurrences of theta and ripple oscillations are well coordinated across hemispheres. Because our analyses depend only on the occurrences of these oscillations, our conclusions regarding the association of the synchronous ensembles with theta, but not ripple, oscillations are supported by the data.

      (11) L292/Figure 4, panels A-E: please trigger Vm on the same-neuron spikes, not on the "synchrony events". This will already explain most of the observations. Some of this is already shown in the supplementary figures.

      As the reviewer correctly noted, we have already presented data triggered on same-neuron spikes in Figure 5-figure supplement 1C and D. We show the synchrony-triggered LFP and subthreshold Vm in the figure in order to highlight the network dynamics during synchronous events. This approach provides a broader perspective on how neural networks function and interact during periods of synchrony, offering insights beyond individual neuron activity.

      (12) L351/Figure 5, panel C: typo - should read "strength"

      The typo has been corrected.

      (13) L351/Figure 5: show "spatial tuning correlation" vs. inter-soma distance (as in Fig. 4G). This may explain part (if not all) of the observations

      We have followed the reviewer’s suggestion and generated the plot (Author response image 6). Consistent with the literature, the plot demonstrates that the spatial tuning correlations of place cell pairs exhibit little relationship with their inter-soma distances.

      Author response image 6.

      Plot of spatial tuning correlation vs. inter-soma distance (Spearman correlation coefficient=0.06, p=0.54, n=91 pairs).

      (14) L937/Figure S3: panel A: the ripples here appear to be recorded from the top part of the layer, i.e., the electrode is not in the center of the layer. Panel B: add statistical testing.

      We agree with the reviewer that this is possible, as we aimed to place our LFP electrodes in the stratum pyramidale. Regarding panel B of the figure, we verified the quality of LFP recordings by acquiring data from subsequent sessions following the initial imaging sessions. The detection of ripples in the same animals during these later sessions indicates that the absence of ripples during the first sessions is not due to deterioration in LFP recording quality. However, due to the small sample size, the statistical power is insufficient to demonstrate significance (n=5 sessions, p=0.06, Wilcoxon signed-rank test). Nevertheless, our conclusions are not contingent upon achieving statistical significance in this test.

      (15) L944/Figure S4: The "R=1" is very likely to be an outcome of n=1 spike. In other words, estimates of phase are unreliable when the spike count is very low. This is related to the problem referred to in Comment #7 above.

      We understand that phase estimates can be unreliable when the spike counts are low. We now highlight that this effect has been taken into account by a shuffling procedure that assesses the significance of phase modulation, and by excluding neurons with nonsignificant modulation strengths. Neurons with low spike count or inconsistent spike phases are typically excluded due to the non-significant strength of phase modulation.

      Method (line 828)

      “The significance of the modulation strength was tested by shuffling the spike timings and recalculating the modulation strength a thousand times to generate a distribution based on the shuffled spike timings. The original modulation strength was then compared to the distribution, with significance determined if it exceeded the 95% confidence interval of the shuffled values. Significant modulation strengths were plotted and compared across groups.”
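      To make the shuffling procedure concrete, a minimal sketch is given below. It is an illustration under assumptions rather than the manuscript's implementation: modulation strength is taken here to be the mean resultant length of spike phases, shuffled spike timings are drawn uniformly from the recording, significance is read as exceeding the 95th percentile of the shuffled values, and the function names are hypothetical.

```python
import cmath
import math
import random


def resultant_length(phases):
    """Mean resultant length R of a list of phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def modulation_significance(phase_trace, spike_idx, n_shuffles=1000,
                            alpha=0.05, seed=0):
    """Return (R, significant) for spikes at sample indices spike_idx.

    phase_trace holds the oscillation phase at every sample. Each shuffle
    draws the same number of spikes at random sample indices (assumption).
    """
    rng = random.Random(seed)
    n = len(phase_trace)
    observed = resultant_length([phase_trace[i] for i in spike_idx])
    null = sorted(
        resultant_length([phase_trace[rng.randrange(n)] for _ in spike_idx])
        for _ in range(n_shuffles)
    )
    threshold = null[min(n_shuffles - 1,
                         math.ceil((1 - alpha) * n_shuffles) - 1)]
    # Small tolerance so that floating-point noise cannot create significance.
    return observed, observed > threshold + 1e-12
```

      With a single spike, R equals 1 in the original data and in every shuffle, so the neuron does not pass the test; this is how a shuffle-based criterion of this kind handles the low-spike-count case raised in Comment #15.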

      (16) L944/Figure S4: Putting the spike count issue (Comment #15) aside for a moment, the analyses in this figure are actually valid - they are carried out at the single-neuron level, with respect to the local (same-neuron) Vm. These findings provide a key alternative explanation to the observations purported in the main figures: (1) if spiking is locked to intracellular theta (occurring at the peak of Vm); and if (2) intra-cellular (Vm) theta is locked to extracellular theta (antiphase); and if (3) extracellular theta is similar for nearby neurons (the imaged neurons), then synchrony is a necessary outcome. The key question is then whether there is any EXTRA synchrony between the CA1PC - beyond that which necessarily derives from (1)+(2)+(3).

      We acknowledge the reviewer’s perspective. However, the factors (1)+(2)+(3) alone do not account for the synchrony we observed. As the reviewer points out (and as discussed in our response to the public review and in Supplementary Figure 4), theta phase locking does not necessarily imply population synchrony. To demonstrate that population synchrony extends beyond the contribution of (1)+(2)+(3), we performed an analysis where the theta cycles in which neurons spike were randomized, while the theta phases remained unchanged (Supplementary Figure 4). The analysis revealed that randomizing the theta cycles while preserving theta phases significantly reduces population synchrony. This finding indicates that spiking in specific theta cycles plays a major role in driving population synchrony.

      Result (line 358)

      “Correlated intracellular theta and theta-phase locking of the synchronous ensembles raise the question of whether population synchrony among CA1PCs extends beyond synchrony derived from these effects. To address this, we analyzed population synchrony after randomizing the theta cycles during which neurons spiked, while keeping their theta phases unchanged. Supplementary Figure 4 illustrates a significant reduction in synchronous event rates following theta cycle randomization. The finding indicates spiking at specific theta cycles plays a major role in driving population synchrony.”
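      The theta-cycle randomization can be sketched as follows, as an illustration only. It assumes that theta cycles are given as a sorted list of cycle boundaries, that phase is approximated by the fractional position of a spike within its cycle, and that spikes falling outside any detected cycle are left unchanged; these choices and the function name are hypothetical rather than taken from the manuscript.

```python
import bisect
import random


def randomize_theta_cycles(spike_times, cycle_starts, rng):
    """Move each spike to a random theta cycle while keeping its phase.

    cycle_starts is a sorted list of cycle boundaries (n_cycles + 1 values).
    """
    n_cycles = len(cycle_starts) - 1
    surrogate = []
    for t in spike_times:
        c = bisect.bisect_right(cycle_starts, t) - 1
        if not 0 <= c < n_cycles:
            surrogate.append(t)  # outside detected theta: keep as is
            continue
        # Fractional position within the original cycle (proxy for phase).
        frac = (t - cycle_starts[c]) / (cycle_starts[c + 1] - cycle_starts[c])
        new_c = rng.randrange(n_cycles)
        new_len = cycle_starts[new_c + 1] - cycle_starts[new_c]
        surrogate.append(cycle_starts[new_c] + frac * new_len)
    return sorted(surrogate)
```

      Applying this independently to every neuron preserves each cell's spike count and theta-phase preference while destroying the cycle-by-cycle co-occurrence of spikes, so any synchrony that survives is attributable to phase locking alone.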

      (17) L944/Fig. S4: Why 71 neurons in AB and only 59 in CD?

      In the previous version, panels A and B included 71 neurons, as we collected data from 71 cells across 5 mice (see the text below).

      Result (line 93)

      “…in total, 71 cells imaged from 5 fields of view in 5 mice; Figure 1B and Supplementary Figure 1A and 1B).”

      In the current version, we only include neurons with significant modulation strengths, reducing the number of cells from 71 to 65 in panel A and from 71 to 54 in panel B.

      Methods (line 828)

      “The significance of the modulation strength was tested by shuffling the spike timings and recalculating the modulation strength a thousand times to generate a distribution based on the shuffled spike timings. The original modulation strength was then compared to the distribution, with significance determined if it exceeded the 95% confidence interval of the shuffled values. Significant modulation strengths were plotted and compared across groups.”

      Figure 5-figure supplement 1 Figure legend (line 1231)

      “Polar plot comparing subVm theta modulation between spikes participating in synchronous ensembles (sync spikes) and spikes not participating in synchronous ensembles (other spikes) during immobility. Each dot represents the averaged modulation of a cell. Cells with modulation strengths that are not significant are excluded in the plot and in the comparison.”

      For panels C and D, we excluded neurons with four or fewer triggering events from the analysis, which reduced the number of cells from 71 to 59 (see the second text paragraph below).

      Method (line 835)

      “We extracted segments of fluorescence traces using a ±300 ms time window centered on the spike timings. To examine variations in fluorescence waveforms triggered by spikes within and outside synchronous events, we categorized the fluorescence traces based on whether the spikes occurred within or outside these events. Subsequently, we performed pairwise comparisons of the fluorescence values from the same neuron, concentrating on spikes occurring during corresponding behavioral states. Neurons with four or fewer triggering events in any of these categories were omitted from the analysis.”
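      The spike-triggered comparison and the exclusion rule can be sketched as below. This is a simplified illustration, not the code used for the figure: the window is expressed in samples (the ±300 ms window would be converted using the imaging frame rate, which is not assumed here), spikes too close to the edges of the trace are discarded, behavioral state is ignored, and the function names are hypothetical.

```python
def triggered_average(trace, spike_idx, half_window):
    """Average trace segment around each spike; returns (average, n_segments)."""
    segments = [trace[i - half_window:i + half_window + 1]
                for i in spike_idx
                if i - half_window >= 0 and i + half_window < len(trace)]
    if not segments:
        return None, 0
    n = len(segments)
    return [sum(col) / n for col in zip(*segments)], n


def sync_vs_other(trace, spike_idx, in_sync, half_window, min_events=5):
    """Spike-triggered averages for spikes inside and outside synchronous events.

    Returns None when either category has four or fewer usable events,
    mirroring the exclusion criterion quoted above.
    """
    sync = [i for i, flag in zip(spike_idx, in_sync) if flag]
    other = [i for i, flag in zip(spike_idx, in_sync) if not flag]
    sync_avg, n_sync = triggered_average(trace, sync, half_window)
    other_avg, n_other = triggered_average(trace, other, half_window)
    if n_sync < min_events or n_other < min_events:
        return None
    return sync_avg, other_avg
```

      Returning both averages from the same neuron keeps the comparison paired, which matches the within-neuron comparison described in the Methods text.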

      (1) Mizuseki, K. & Buzsaki, G. Preconfigured, skewed distribution of firing rates in the hippocampus and entorhinal cortex. Cell Rep 4, 1010-1021 (2013). https://doi.org:10.1016/j.celrep.2013.07.039

      (2) McHugh, T. J., Blum, K. I., Tsien, J. Z., Tonegawa, S. & Wilson, M. A. Impaired hippocampal representation of space in CA1-specific NMDAR1 knockout mice. Cell 87, 1339-1349 (1996). https://doi.org:10.1016/s0092-8674(00)81828-0

      (3) Buzsaki, G. et al. Hippocampal network patterns of activity in the mouse. Neuroscience 116, 201-211 (2003). https://doi.org:10.1016/s0306-4522(02)00669-3

      (4) Karlsson, M. P. & Frank, L. M. Network dynamics underlying the formation of sparse, informative representations in the hippocampus. J Neurosci 28, 14271-14281 (2008). https://doi.org:10.1523/JNEUROSCI.4261-08.2008

      (5) Dombeck, D. A., Harvey, C. D., Tian, L., Looger, L. L. & Tank, D. W. Functional imaging of hippocampal place cells at cellular resolution during virtual navigation. Nat Neurosci 13, 1433-1440 (2010). https://doi.org:10.1038/nn.2648

      (6) Malvache, A., Reichinnek, S., Villette, V., Haimerl, C. & Cossart, R. Awake hippocampal reactivations project onto orthogonal neuronal assemblies. Science 353, 1280-1283 (2016). https://doi.org:10.1126/science.aaf3319

      (7) Sheffield, M. E. J., Adoff, M. D. & Dombeck, D. A. Increased Prevalence of Calcium Transients across the Dendritic Arbor during Place Field Formation. Neuron 96, 490-504 e495 (2017). https://doi.org:10.1016/j.neuron.2017.09.029

      (8) Adam, Y. et al. Voltage imaging and optogenetics reveal behaviour-dependent changes in hippocampal dynamics. Nature 569, 413-417 (2019). https://doi.org:10.1038/s41586-019-1166-7

      (9) Go, M. A. et al. Place Cells in Head-Fixed Mice Navigating a Floating Real-World Environment. Front Cell Neurosci 15, 618658 (2021). https://doi.org:10.3389/fncel.2021.618658

      (10) Geiller, T. et al. Local circuit amplification of spatial selectivity in the hippocampus. Nature 601, 105-109 (2022). https://doi.org:10.1038/s41586-021-04169-9

      (11) Rolotti, S. V. et al. Local feedback inhibition tightly controls rapid formation of hippocampal place fields. Neuron 110, 783-794 e786 (2022). https://doi.org:10.1016/j.neuron.2021.12.003

      (12) Pettit, N. L., Yap, E. L., Greenberg, M. E. & Harvey, C. D. Fos ensembles encode and shape stable spatial maps in the hippocampus. Nature 609, 327-334 (2022). https://doi.org:10.1038/s41586-022-05113-1

      (13) Hainmueller, T. & Bartos, M. Parallel emergence of stable and dynamic memory engrams in the hippocampus. Nature 558, 292-296 (2018). https://doi.org:10.1038/s41586-018-0191-2

      (14) Gauthier, J. L. & Tank, D. W. A Dedicated Population for Reward Coding in the Hippocampus. Neuron 99, 179-193 e177 (2018). https://doi.org:10.1016/j.neuron.2018.06.008

      (15) Grosmark, A. D., Sparks, F. T., Davis, M. J. & Losonczy, A. Reactivation predicts the consolidation of unbiased long-term cognitive maps. Nat Neurosci 24, 1574-1585 (2021). https://doi.org:10.1038/s41593-021-00920-7

      (16) Farrell, J. S., Hwaun, E., Dudok, B. & Soltesz, I. Neural and behavioural state switching during hippocampal dentate spikes. Nature 628, 590-595 (2024). https://doi.org:10.1038/s41586-024-07192-8

      (17) McHugh, S. B. et al. Offline hippocampal reactivation during dentate spikes supports flexible memory. Neuron 112, 3768-3781 e3768 (2024). https://doi.org:10.1016/j.neuron.2024.08.022

      (18) Gava, G. P. et al. Organizing the coactivity structure of the hippocampus from robust to flexible memory. Science 385, 1120-1127 (2024). https://doi.org:10.1126/science.adk9611

      (19) Galvao, J. et al. Unexpected low-dose toxicity of the universal solvent DMSO. FASEB J 28, 1317-1330 (2014). https://doi.org:10.1096/fj.13-235440

      (20) Yuan, C. et al. Dimethyl sulfoxide damages mitochondrial integrity and membrane potential in cultured astrocytes. PloS one 9, e107447 (2014). https://doi.org:10.1371/journal.pone.0107447

      (21) Modrzynski, J. J., Christensen, J. H. & Brandt, K. K. Evaluation of dimethyl sulfoxide (DMSO) as a co-solvent for toxicity testing of hydrophobic organic compounds. Ecotoxicology 28, 1136-1141 (2019). https://doi.org:10.1007/s10646-019-02107-0

      (22) Hoyberghs, J. et al. DMSO Concentrations up to 1% are Safe to be Used in the Zebrafish Embryo Developmental Toxicity Assay. Front Toxicol 3, 804033 (2021). https://doi.org:10.3389/ftox.2021.804033

      (23) Abdelfattah, A. S. et al. Sensitivity optimization of a rhodopsin-based fluorescent voltage indicator. Neuron (2023). https://doi.org:10.1016/j.neuron.2023.03.009

      (24) Huang, Y. C. et al. Dynamic assemblies of parvalbumin interneurons in brain oscillations. Neuron 112, 2600-2613 e2605 (2024). https://doi.org:10.1016/j.neuron.2024.05.015

      (25) Abdelfattah, A. S. et al. Bright and photostable chemigenetic indicators for extended in vivo voltage imaging. Science 365, 699-704 (2019). https://doi.org:10.1126/science.aav6416

      (26) Szabo, G. G. et al. Ripple-selective GABAergic projection cells in the hippocampus. Neuron 110, 1959-1977 e1959 (2022). https://doi.org:10.1016/j.neuron.2022.04.002

      (27) Dudok, B. et al. Alternating sources of perisomatic inhibition during behavior. Neuron 109, 997-1012 e1019 (2021). https://doi.org:10.1016/j.neuron.2021.01.003

      (28) Terada, S. et al. Adaptive stimulus selection for consolidation in the hippocampus. Nature 601, 240-244 (2022). https://doi.org:10.1038/s41586-021-04118-6

      (29) Geiller, T. et al. Large-Scale 3D Two-Photon Imaging of Molecularly Identified CA1 Interneuron Dynamics in Behaving Mice. Neuron 108, 968-983 e969 (2020). https://doi.org:10.1016/j.neuron.2020.09.013

      (30) Mizuseki, K. & Buzsaki, G. Theta oscillations decrease spike synchrony in the hippocampus and entorhinal cortex. Philos Trans R Soc Lond B Biol Sci 369, 20120530 (2014). https://doi.org:10.1098/rstb.2012.0530

      (31) Csicsvari, J., Hirase, H., Mamiya, A. & Buzsaki, G. Ensemble patterns of hippocampal CA3-CA1 neurons during sharp wave-associated population events. Neuron 28, 585-594 (2000). https://doi.org:10.1016/s0896-6273(00)00135-5

      (32) Harris, K. D., Csicsvari, J., Hirase, H., Dragoi, G. & Buzsaki, G. Organization of cell assemblies in the hippocampus. Nature 424, 552-556 (2003). https://doi.org:10.1038/nature01834

      (33) Yagi, S., Igata, H., Ikegaya, Y. & Sasaki, T. Awake hippocampal synchronous events are incorporated into offline neuronal reactivation. Cell Rep 42, 112871 (2023). https://doi.org:10.1016/j.celrep.2023.112871

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Khan et al. investigated the functional redundancy of the non-canonical L-cysteine synthases of M. tuberculosis, CysM and CysK2, focussing on their role in mitigating the effects of host-derived stress. They found that while deletion mutants of the two synthases (Rv∆cysM, Rv∆cysK2) have similar transcriptomes under standard conditions, their transcriptional response to oxidative stress is distinct. The impact of deleting the synthases also differentially affected the pools of L-cysteine-derived metabolites. They show that the mutants (Rv∆cysM, Rv∆cysK2) have impaired survival in peritoneal macrophages and in a mouse model of infection. Importantly, they show that the survival of the mutants increases when the host is defective in producing reactive oxygen and nitrogen species, linking the phenotype to a defect in combating host-derived stress. Finally, they show that compounds inhibiting L-cysteine synthases reduce the intracellular survival of M. tuberculosis.

      Strengths:

      (1) The distinct transcriptome of the Rv∆cysM and Rv∆cysK2 mutants in the presence of oxidative stress provides solid evidence that these mutants are distinct in their response to oxidative stress, and suggests that they are not functionally redundant.

      (2) The use of macrophages from phox-/- and IFN-γ-/- mice and an iNOS inhibitor for the intracellular survival assays provides solid evidence that the survival defect seen for the Rv∆cysM and Rv∆cysK2 mutants is related to their reduced ability to combat host-derived oxidative and nitrosative stress. This is further supported by the infection studies in phox-/- and IFN-γ-/- mice.

      Weaknesses:

      (1) There are several previous studies looking at the transcriptional response of M. tuberculosis to host-derived stress, however, the authors do not discuss initial RNA-seq data in the context of these studies. Furthermore, while several of the genes in sulfur assimilation and L-cysteine biosynthetic pathway genes are upregulated by more than one stress condition, the data does not support the statement that it is the "most commonly upregulated pathway in Mtb exposed to multiple host-like stresses".

      We have made changes in the manuscript in line with the reviewer’s suggestion.

      “Thus, RNA-Seq data suggest that genes involved in the sulfur assimilation and L-cysteine biosynthetic pathways are upregulated during various host-like stresses in Mtb (Figure S2). Given the importance of sulfur metabolism genes in the in vivo survival of Mtb [1, 2], it is not surprising that these genes are dynamically regulated by diverse environmental cues. Microarray studies have shown upregulation of genes encoding the sulfate transporter upon exposure to hydrogen peroxide and nutrient starvation [3-7]. Similarly, ATP sulfurylase and APS kinase are induced during macrophage infection and by nutrient depletion. Induction of these genes, which coordinate the first few steps of the sulfur assimilation pathway, indicates a probable increase in the biosynthesis of sulfate-containing metabolites that may be crucial against host-inflicted stresses. Furthermore, genes involved in the synthesis of reduced sulfur moieties (cysH, sirA and cysM) are also induced by hydrogen peroxide and nutrient starvation. Sulfur metabolism has been postulated to be important in the transition to latency. This hypothesis is based on the transcriptional upregulation of cysD, cysNC, cysK2, and cysM upon exposure to hypoxia. Multiple transcriptional profiling studies have reported upregulation of the moeZ, mec, cysO and cysM genes when cells were subjected to oxidative and hypoxic stress [1, 6-11], further suggesting an increase in the biosynthesis of reduced metabolites such as cysteine and methionine and of sulfur-containing cell wall glycolipids upon exposure to oxidative stress [12].” We have also modified the sentence to read “significantly upregulated pathway in Mtb exposed to multiple host-like stresses”.

      (2) For the quantification of the metabolites, it isn't clear how the abundance was calculated (e.g., were standards for each metabolite used? How was abundance normalised between samples?), and this information should be included to strengthen the data.

      Thank you for picking this up. We have extended our description of the metabolomics methods. It now reads: “Due to the tendency of M. tuberculosis to form clumps, which significantly skews any estimate of cell number, we normalized samples to protein/peptide concentration using the BCA assay kit (Thermo). Our LC-MS data are therefore expressed as ion counts/mg protein, or as ratios of these values for the same metabolite. This is a standard way to express ion abundance data, as was done previously [13, 14].”

      Furthermore, labelling with L-methionine was performed to determine the rate of synthesis of the L-cysteine-derived metabolites. L-cysteine is produced from L-methionine via the transsulfuration pathway, which is independent of CysM and CysK2. It is therefore difficult to interpret this experiment, as the impact of deleting CysM and CysK2 on the transsulfuration pathway is likely indirect.

      The reviewer may have misunderstood the experiment and the results presented. Labelling was not performed with L-methionine. We used ³⁴S derived from sulfate (SO₄²⁻) to monitor the reductive assimilation of sulfur and its transit from sulfide (S²⁻) to L-methionine via cysteine. We specified in the Materials and Methods that we used sodium sulfate-³⁴S (Merck 718882) as our labelled source of sulfur. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. We are therefore not measuring transsulfuration but the direct synthesis of L-methionine via cysteine, and consequently we are indeed assessing the importance of cysK2 and cysM in this process. We have now added to the results section (page 9) that we employed sodium sulfate-³⁴S (Na₂³⁴SO₄) for labelling, so that other readers will not think we are measuring transsulfuration.

      (3) The ability of L-cysteine to rescue the survival defect of the Rv∆cysM and Rv∆cysK2 mutants in macrophages is interpreted as exogenous L-cysteine being able to compensate for reduced intracellular levels. However, there is no evidence that L-cysteine is being taken up by the mutants and an alternate explanation is that L-cysteine functions as an antioxidant within cells i.e., it reduces intracellular ROS.

      The concentration of L-cysteine used for the peritoneal macrophage survival rescue experiments was titrated to confer no, or only a minimal, survival advantage on wild-type Rv. At this concentration, we therefore believe that any reduction of intracellular ROS by cysteine does not play a major role, since there is no significant difference in the survival of the wild-type Rv strain. Had cysteine reduced intracellular ROS, we would expect increased survival of Rv owing to diminished oxidative stress.

      Furthermore, L-cysteine addition also mitigates the CHP-induced survival defect in vitro [15] and nullifies the observed effect of cysteine synthase inhibitors in vitro [16], suggesting that cysteine or cystine can be transported into Mtb. This has also been shown previously for the AosR mutant strain [15] and for CysH [2], and by the uptake of over 70% of exogenously added [35S] cysteine by a growing culture of Mtb [17].

      The authors sought to investigate the functional redundancy of the non-canonical L-cysteine synthases CysM and CysK2. While their distinct transcriptional response to oxidative stress suggests distinct physiological roles, the study did not explore these differences and therefore provides only preliminary insight into the underlying reasons for this observation. In the context of drug development, this work suggests that while L-cysteine synthase inhibitors do not have high potency for killing intracellular M. tuberculosis, they have the potential to decrease the pathogen's survival in the presence of host-derived stress.

      Reviewer #2 (Public Review):

      Summary:

      The paper examines the role L-cysteine metabolism plays in the biology of Mycobacterium tuberculosis. The authors have preliminary data showing that Mycobacterium tuberculosis has two unique pathways to synthesize cysteine. The data showing new compounds that act synergistically with INH is very interesting.

      Strengths:

      RNAseq data is interesting and important.

      Weaknesses:

      The paper would be strengthened if the authors were to add further detail to their genetic manipulations.

      The authors provide evidence that they have successfully made a cysK2 mutant by recombineering. This data looks promising, but I do not see evidence for the cysM deletion. It is also important to state what sort of complementation was done (multicopy plasmid, integration proficient vector, or repair of the deletion). Since these mutants are the basis for most of the additional studies, these details are essential. It is important to include complementation in mouse studies as unexpected loss of PDIM could have occurred.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section.  

      Reviewer #3 (Public Review):

      In this work, the authors conduct transcriptional profiling experiments with Mtb under various different stress conditions (oxidative, nitrosative, low pH, starvation, and SDS). The Mtb transcriptional responses to these stress conditions are not particularly new, having been reported extensively in the literature over the past ~20 years in various forms. A common theme from the current work is that L-cysteine synthesis genes are seemingly up-regulated by many stresses. Thus, the authors focused on deleting two of the three L-cysteine synthesis genes (cysM and cysK2) in Mtb to better understand the roles of these genes in Mtb physiology.

      The cysM and cysK2 mutants display fitness defects in various media (Sautons media, starvation, oxidative and nitrosative stress) noted by CFU reductions. Transcriptional profiling studies with the cysM and cysK2 mutants revealed that divergent gene signatures are generated in each of these strains under oxidative stress, suggesting that cysM and cysK2 have non-redundant roles in Mtb's oxidative stress response which likely reflects the different substrates used by these enzymes, CysO-L-cysteine and O-phospho-L-serine, respectively. Note that these studies lack genetic complementation and are thus not rigorously controlled for the engineered deletion mutations.

      The authors quantify the levels of sulfur-containing metabolites (methionine, ergothioneine, mycothiol, mycothionine) produced by the mutants following exposure to oxidative stress. Both the cysM and cysK2 mutants produce more methionine, ergothioneine, and mycothionine relative to WT under oxidative stress. Both mutants produce less mycothiol relative to WT under the same condition. These studies lack genetic complementation and thus do not rigorously control for the engineered mutations.

      Next, the mutants were evaluated in infection models to reveal fitness defects associated with oxidative and nitrosative stress in the cysM or cysK2 mutants. In LPS/IFNg activated peritoneal macrophages, the cysM or cysK2 mutants display marked fitness defects which can be rescued with exogenous cysteine added to the cell culture media. Peritoneal macrophages lacking the NADPH oxidase (Phox) or IFNg fail to produce fitness phenotypes in the cysM or cysK2 mutants suggesting that oxidative stress is responsible for the phenotypes. Similarly, chemical inhibition of iNOS partly abrogated the fitness defect of the cysM or cysK2 mutants. Similar studies were conducted in mice lacking IFNg and Phox establishing that cysM or cysK2 mutants have fitness defects in vivo that are dependent on oxidative and nitrosative stress.

      Lastly, the authors use small molecule compounds to inhibit cysteine synthases. It is demonstrated that the compounds display inhibition of Mtb growth in 7H9 ADC media. No evidence is provided to demonstrate that these compounds are specifically inhibiting the cysteine synthases via "on-target inhibition" in the whole Mtb cells. Additionally, it is wrongly stated in the discussion that "combinations of L-cys synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial load inside the host". This statement suggests that the INH + cysteine synthase inhibitor combinations reduce Mtb loads within a host in an infection assay. No data is presented to support this statement.

      We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in whole Mtb cells. However, the inhibitors used in this study have been previously profiled in vitro (https://www.sciencedirect.com/science/article/abs/pii/S0960894X17308405?via%3Dihub). We have modified the sentence to “a combination of L-cysteine synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial survival in vitro”.

      References

      (1) Hatzios, S.K. and C.R. Bertozzi, The regulation of sulfur metabolism in Mycobacterium tuberculosis. PLoS Pathog, 2011. 7(7): p. e1002036.

      (2) Senaratne, R.H., et al., 5'-Adenosinephosphosulphate reductase (CysH) protects Mycobacterium tuberculosis against free radicals during chronic infection phase in mice. Mol Microbiol, 2006. 59(6): p. 1744-53.

      (3) Betts, J.C., et al., Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol, 2002. 43(3): p. 717-31.

      (4) Hampshire, T., et al., Stationary phase gene expression of Mycobacterium tuberculosis following a progressive nutrient depletion: a model for persistent organisms? Tuberculosis (Edinb), 2004. 84(3-4): p. 228-38.

      (5) Schnappinger, D., et al., Transcriptional Adaptation of Mycobacterium tuberculosis within Macrophages: Insights into the Phagosomal Environment. J Exp Med, 2003. 198(5): p. 693-704.

      (6) Voskuil, M.I., et al., The response of mycobacterium tuberculosis to reactive oxygen and nitrogen species. Front Microbiol, 2011. 2: p. 105.

      (7) Voskuil, M.I., K.C. Visconti, and G.K. Schoolnik, Mycobacterium tuberculosis gene expression during adaptation to stationary phase and low-oxygen dormancy. Tuberculosis (Edinb), 2004. 84(3-4): p. 218-27.

      (8) Brunner, K., et al., Profiling of in vitro activities of urea-based inhibitors against cysteine synthases from Mycobacterium tuberculosis. Bioorg Med Chem Lett, 2017. 27(19): p. 4582-4587.

      (9) Manganelli, R., et al., Role of the extracytoplasmic-function sigma factor sigma(H) in Mycobacterium tuberculosis global gene expression. Mol Microbiol, 2002. 45(2): p. 365-74.

      (10) Burns, K.E., et al., Reconstitution of a new cysteine biosynthetic pathway in Mycobacterium tuberculosis. J Am Chem Soc, 2005. 127(33): p. 11602-3.

      (11) Manganelli, R., et al., The Mycobacterium tuberculosis ECF sigma factor sigmaE: role in global gene expression and survival in macrophages. Mol Microbiol, 2001. 41(2): p. 423-37.

      (12) Tyagi, P., et al., Mycobacterium tuberculosis has diminished capacity to counteract redox stress induced by elevated levels of endogenous superoxide. Free Radic Biol Med, 2015. 84: p. 344-354.

      (13) de Carvalho, L.P., et al., Metabolomics of Mycobacterium tuberculosis reveals compartmentalized co-catabolism of carbon substrates. Chem Biol, 2010. 17(10): p. 1122-31.

      (14) Agapova, A., et al., Flexible nitrogen utilisation by the metabolic generalist pathogen Mycobacterium tuberculosis. Elife, 2019. 8.

      (15) Khan, M.Z., et al., Redox homeostasis in Mycobacterium tuberculosis is modulated by a novel actinomycete-specific transcription factor. EMBO J, 2021. 40(14): p. e106111.

      (16) Brunner, K., et al., Inhibitors of the Cysteine Synthase CysM with Antibacterial Potency against Dormant Mycobacterium tuberculosis. J Med Chem, 2016. 59(14): p. 6848-59.

      (17) Wheeler, P.R., et al., Functional demonstration of reverse transsulfuration in the Mycobacterium tuberculosis complex reveals that methionine is the preferred sulfur source for pathogenic Mycobacteria. J Biol Chem, 2005. 280(9): p. 8069-78.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure S1 it would be useful to include the reverse transsulfuration pathway given that it contributes to the L-cysteine pool, and that L-methionine was used for metabolite labelling experiments.

      We agree with the reviewer’s suggestion and have included reverse transsulfuration in Fig S1. Please note that labelling was not performed with L-methionine. We used 34S derived from SO₄²⁻ to monitor the reductive assimilation of sulfur and its transit from S²⁻ to L-methionine, passing through cysteine. We specified in the Materials and Methods that we used sodium sulfate-34S (Merck 718882) as our labelled source of sulfur. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. Therefore, we are not measuring transsulfuration but rather the direct synthesis of L-methionine via cysteine, and consequently we are indeed assessing the importance of cysK2 and cysM in this process. We have now added to the results section (page 9) that we employed (Na34SO4) for labelling, so that other readers will not think we are measuring transsulfuration.

      Author response image 1.

      (2) In Figure S2 it is unclear why the control is included in this figure given that the stress conditions were compared to the control. What is the control being compared to here?

      The heat maps of the controls have been included to show relative gene expression in each of the independent replicates. The normalized counts for the differentially expressed genes are plotted. To better understand the RNA-seq results, we plotted the fold change of differentially expressed genes under the different stress conditions (new figure and table: Figure S3 and Table S2). This allowed us to examine the expression profile of genes in all the stress conditions simultaneously, regardless of whether they were identified as differentially expressed in a given condition. The data revealed that specific clusters of genes are up- and downregulated in oxidative, SDS, and starvation conditions. In comparison, the differences observed in the pH 5.5 and nitrosative conditions were limited (Figure S3 & Table S2).

      (3) In Figure S3 it would be more informative to show fold-enrichment than gene counts in (b) to (f).

      In our opinion, gene counts are more informative when plotting GO enrichments, as the number of genes in each GO category can vary drastically. The significance values are already calculated from the fold enrichment of a category relative to the background, so the p-adj values plotted on the x-axis can serve as a proxy for fold enrichment. Hence, instead of plotting two related variables, plotting the total number of genes that belong to a category usually helps the reader understand the “scale” at which a category is affected.

      (4) Figure 1c standard Sautons is a defined media, and is not nutrient-limiting - the authors should clarify the composition of the media that they used here.

      The composition of the Sautons media used in the study is 0.5 g/L MgSO4·7H2O, 2 g/L citric acid, 1 g/L L-asparagine, 0.3 g/L KCl·H2O, 0.2% glycerol, 0.64 g/L FeCl3, 100 μM NH4Cl, and 0.7 g/L K2HPO4·3H2O. We have modified the sentence in line with the reviewer’s suggestion.

      (5) The authors claim that the distinct transcriptomes for the two mutants indicate that "CysM and CysK2 distinctly modulate 324 and 1104 genes". The effect is likely due to distinct downstream consequences of the deletions, rather than direct regulation by the synthases. This section should be reworded for clarity.

      We have modified the sentence in line with reviewer’s suggestion.

      (6) In Figure 3 it would be useful to express mycothione levels as a percentage of the total mycothiol pool to give an indication of the extent to which the thiol is being oxidised.

      While we appreciate the reviewer’s suggestion, we cannot take ratios of ion counts for two different compounds, as they ionize differently. 100 ion counts of one do NOT equal 100 ion counts of the other.

      (7) Figure 6 is difficult to interpret as the concentrations used in the INH + inhibitor wells are not clear. It would be useful to indicate the concentrations of each compound added next to the wells in the figure.

      We have modified the figure and legends in line with the reviewer’s suggestion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Document the cysM deletion.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section. 

      (2) The oxidative stress CHP is not defined in the figure legend.

      We have modified the legend in line with the reviewer’s suggestion.

      (3) Can we see the structures of the compounds?

      Kindly refer to Fig 6a for the structures of the compounds.

      (4) Fix the genetics and the paper is very interesting.

      I might be missing something. The authors do provide promising complementation data for several of the stresses. Provide evidence for the cysM deletion and complementation and the data will be very compelling. The focus of the paper is important for our understanding of the biology of Mycobacterium tuberculosis.

      Thank you for appreciating our study. The details of CysM knockout and complementation strain generation have been previously published ([15]; Appendix Figure S4 & Methods). CysK2 mutant and complementation strain details are included in the present manuscript (Figure 1b & Methods).

      Reviewer #3 (Recommendations For The Authors):

      The transcriptional profiling studies do not rigorously control for the engineered mutations using genetic complementation.

      The complementation strains used in all in vitro, ex vivo, and in vivo experiments show that the phenotypes associated with the knockouts are gene-specific. We chose not to include complementation strains in the RNA sequencing experiments because of the large number of samples to handle and the associated costs.

      Figure 3. These data are not rigorously controlled without genetic complementation, explain why some data in Figure 3 was generated at 24 hr and other data was generated at 48 hr, remove subbars in 3g. Please provide more clarification on Fig 3e-g because the normalization in these panels makes it appear as if there is little- or no-difference in the levels of 34S incorporation into the thiol metabolites.

      The complementation strains used in all in vitro, ex vivo, and in vivo experiments show that the phenotypes associated with the knockouts are gene-specific. We chose not to include complementation strains in the Figure 3 experiments because of the large number of samples to handle and the associated costs.

      The time points in the given experiment were chosen based on an initial pilot experiment. It is apparent that a longer duration is required to see the phenotypes associated with labelling compared to pool size. The differences observed are statistically significant. 

      Surfactant and SDS stress are used interchangeably in the text, legends, and figures. Please be consistent here.

      We have modified the text in line with reviewer’s suggestion.

      Consider re-wording the 1st paragraph on page 5 to better clarify how Trp, Lys, and His interact with the host immune cells.

      We have modified the text in line with reviewer’s suggestion.

      Cite the literature associated with the sulfur import system in Mtb on page 3 in the 2nd paragraph.

      We have modified the text in line with reviewer’s suggestion.

      The manuscript nicely describes the construction of a cysK2 mutant. It is unclear how the cysM mutant was generated. Please clarify, cite, or add the cysM mutant construction to this manuscript.

      The details of CysM knockout and complementation strain generation have been previously published ([15]; Appendix Figure S4 & Methods). We have included the citation in the methods section of the current manuscript.

      Provide evidence that the small molecules used in Fig 6 are on target and inhibit the cysteine biosynthetic enzymes in whole bacteria. It is unclear how a MIC can be determined with these compounds in 7H9 ADC when deletion mutants grow just fine in this media. Is this because the compounds inhibit multiple cysteine synthesis enzymes and/or enzymatic targets in other pathways? To me, the data suggests that the compounds are hitting multiple enzymes in whole Mtb cells. Does cysteine supplementation reverse the inhibitory profiles with the compounds in Figure 6?

      As mentioned in the text, all the compounds were ineffective in killing Mtb, likely because L-cysteine synthases are not essential during regular growth conditions. Hence, the MICs for the cysteine synthase inhibitors were very high: C1 (0.6 mg/ml), C2 (0.6 mg/ml), and C3 (0.15 mg/ml), as opposed to the standard drug isoniazid, with an MIC of 0.06 µg/ml. We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in Mtb cells. The inhibitors used in this study have been previously profiled in vitro [8]. However, one cannot rule out the hypothesis that these compounds might also have some off-target effects.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Cheong et al. use a synapse-resolution wiring map of the fruit fly nerve cord to comprehensively investigate circuitry between descending neurons (DNs) from the brain and motor neurons (MNs) that enact different behaviours. These neurons were painstakingly identified, categorised, and linked to existing genetic driver lines; this allows the investigation of circuitry to be informed by the extensive literature on how flies walk, fly, and escape from looming stimuli. New motifs and hypotheses of circuit function were presented. This work will be a lasting resource for those studying nerve cord function.

      Strengths:

      The authors present an impressive amount of work in reconstructing and categorising the neurons in the DN to MN pathways. There is always a strong link between the circuitry identified and what is known in the literature, making this an excellent resource for those interested in connectomics analysis or experimental circuits neuroscience. Because of this, there are many testable hypotheses presented with clear predictions, which I expect will result in many follow-up publications. Most MNs were mapped to the individual muscles that they innervate by linking this connectome to pre-existing light microscopy datasets. When combined with past fly brain connectome datasets (Hemibrain, FAFB) or future ones, there is now a tantalising possibility of following neural pathways from sensory inputs to motor neurons and muscle.

      Weaknesses:

      As with all connectome datasets, the sample size is low, limiting statistical analyses. Readers should keep this in mind, but note that this is the current state-of-the-art. Some figures are weakened by relying too much on depictions of wiring diagrams as evidence of circuit function, similarity between neuropils, etc. without additional quantitative justification.

      We thank the reviewer for their helpful comments. We are excited about the release of this densely reconstructed connectome and its potential to facilitate circuit exploration in the VNC. We note that while statistical methods for analyzing complicated networks such as the connectome are still being developed, the wiring diagrams presented are themselves visualizations of quantitative data. We address specific concerns below.

      Reviewer #2 (Public Review):

      Summary:

      In Cheong et al., the authors analyze a new motor system (ventral nerve cord) connectome of Drosophila. Through proofreading, cross-referencing with another female VNC connectome, they define key features of VNC circuits with a focus on descending neurons (DNs), motor neurons (MNs), and local interneuron circuits. They define DN tracts, MNs for limb and wing control, and their nerves (although their sample suffers for a subset of MNs). They establish connectivity between DNs and MNs (minimal). They perform topological analysis of all VNC neurons including interneurons. They focus specifically on identifying core features of flight circuits (control of wings and halteres), leg control circuits with a focus on walking rather than other limbed behaviors (grooming, reaching, etc.), and intermediate circuits like those for escape (GF). They put these features in the context of what is known or has been posited about these various circuits.

      Strengths:

      Some strengths of the manuscript include the matching of new DN and MN types to light microscopy, including the serial homology of leg motor neurons. This is a valuable contribution that will certainly open up future lines of experimental work.

      Also, the analysis of conserved connectivity patterns within each leg neuromere and interconnecting connectivity patterns between neuromeres will be incredibly valuable. The standard leg connectome is very nice.

      Finally, the finding of different connectivity statistics (degrees of feedback) in different neuropils is quite interesting and will stimulate future work aimed at determining its functional significance.

      We thank the reviewer for their constructive feedback, and are optimistic about the utility of the MANC connectome to the Drosophila neurobiology community in dissecting VNC circuit function.

      Weaknesses:

      First, it seems like quite a limitation that the neurotransmitter predictions were based on training data from a fairly small set of cells, none of which were DNs. It's wonderful that the authors did the experimental work to map DN neurotransmitter identity using FISH, and great that the predictions were overall decently accurate for both ACh and Glu, but unfortunate that they were not accurate for GABA. I hope there are plans to retrain the neurotransmitter predictions using all of this additional ground truth experimental data that the authors collected for DNs, in order to provide more accurate neurotransmitter type predictions across more cell types.

      The reviewer makes an excellent suggestion, and collecting further ground truth data and retraining the neurotransmitter classifier is an ongoing research project. 

      Second, the degradation of many motor neurons is unfortunate. Figure 5 Supplement 1 shows that roughly 50% of the leg motor neurons have significantly compromised connectivity data, whereas, for non-leg motor neurons, few seem to be compromised. If that is the correct interpretation of this figure, perhaps a sentence like this that includes some percentages (~50% of leg MNs, ~5% of other MNs) could be added to the main text so that readers can get a sense of the impact more easily.

      Thank you for this suggestion. We have added a line describing the percentage of leg and other MNs affected (L416-417).

      As well, Figure 5 Supplement 1 caption says "Note that MN groups where all members of the group have reconstruction issues may not be flagged" - could the authors comment on how common they think this is based on manual inspection? If it changes the estimate of the percentage of affected leg motor neurons from 50% to 75% for example, this caveat in the current analysis would need to be addressed more directly. Comparing with FANC motor neurons could perhaps be an alternative/additional approach for estimating the number of motor neurons that are compromised.

      We agree that a direct comparison to another dataset, such as FANC, would aid in identifying reconstruction issues. However, a full analysis is not currently possible as only a minority of FANC neurons have been proofread or annotated. We were able to gain some insights into reconstruction quality by looking at T1 motor neurons, where FANC MN reconstruction is more complete. As reported in the submitted manuscript, we were able to confidently match T1 MNs between FANC and MANC for all but one MN (we are missing one ltm MN on the right side of MANC). While some of the MANC neurons had smaller/less dense arbors than FANC, none of them would have been flagged as having reconstruction issues. However, for FANC, we observe that neurons on the right have less dense arbors and fewer reconstructed synapses than neurons on the left.  We have prepared a reviewer figure analyzing the consistency of synapse counts for the T1 (front leg) MNs:

      Author response image 1.

      In these results (MANC on the left, FANC on the right) we compare the number of input synapses on matched motor neurons on the left (LHS) and right hand side (RHS) of each dataset. We see that the MANC distribution is much more symmetric, indicating left and right hand side synapse counts for matched MNs are more similar in MANC. This is likely largely due to the left-right difference in reconstruction completeness in the FANC T1 leg neuropils. The number of synapses per cell type is also more variable in FANC. Overall, we recommend that end users should inspect the morphology and total synapse counts of individual MNs of interest in either dataset as part of any detailed analysis.

      This analysis might benefit from some sort of control for true biological variability in the number of MN synapses between left and right or across segments. I assume the authors chose the threshold of 0.7 because it seemed to do a good job of separating degraded neurons from differences in counts that could just be due to biological variability or reconstruction imperfections, but perhaps there's some way to show this more explicitly. For example, perhaps show how much variability there is in synapse counts across all homologs for one or two specific MN types that are not degraded and are reconstructed extremely well, so any variability in input counts for those neurons is likely to be biologically real. Especially because the identification of serial homologs among motor neurons is a key new contribution of this paper, a more in-depth analysis of similarities and differences in homologous leg MNs across segments could be interesting to the field if the degradation doesn't preclude it.

      We agree that there can be ambiguity in whether variability in synapse counts between left-right homologs of a MN type represents biological variability or technical issues. We have added a comparison of synapse counts of T1 leg MNs in MANC (Left) vs FANC (Right) as noted in the previous point. As the number of connectomes available to us increases, we will have a better idea of how synapse counts of MNs vary within and between animals.
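
      The kind of left-right consistency check discussed above can be sketched as follows. This is a minimal illustration only: the neuron names and counts are hypothetical, and defining the ratio as the smaller count divided by the larger one is an assumption. The 0.7 cut-off is the threshold mentioned by the reviewer; how it was applied in the manuscript is not specified here.

```python
# Hypothetical input-synapse counts for matched (left, right) motor neurons.
# Names and numbers are illustrative only, not values from MANC or FANC.
synapse_counts = {
    "MN_a": (1200, 1100),
    "MN_b": (900, 450),
    "MN_c": (300, 310),
}


def symmetry_ratio(left, right):
    """Return min/max of the two counts (1.0 means perfectly symmetric)."""
    if max(left, right) == 0:
        return 1.0
    return min(left, right) / max(left, right)


def flag_asymmetric(counts, threshold=0.7):
    """List neuron types whose left/right ratio falls below the threshold."""
    return sorted(
        name
        for name, (left, right) in counts.items()
        if symmetry_ratio(left, right) < threshold
    )


# Assumed threshold of 0.7, as quoted in the reviewer comment.
flagged = flag_asymmetric(synapse_counts)
print(flagged)
```

      A ratio of this kind cannot by itself separate biological variability from reconstruction problems, which is why inspecting the morphology of individual MNs remains the recommended step.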

      Fourth, the infomap communities don't seem to be so well controlled/justified. Community detection can be run on any graph - why should I believe that the VNC graph is actually composed of discrete communities? Perhaps this comes from a lack of familiarity with the infomap algorithm, but I imagine most readers will be similarly unfamiliar with it, so more work should be done to demonstrate the degree to which these communities are really communities that connect more within than across communities.

      A priori we expect that there is some degree of functional division between circuits controlling different limbs or motor systems, given current evidence that VNC neuropils and neural hemilineages are relatively specialized in controlling motor output. We have added this explanation to section 2.4.2 (L633-635).

      The Infomap algorithm was chosen out of several directed and undirected community detection methods that we tried, as it defined communities that each had connectivity with narrow and specific motor neuron subclasses. For example, it labeled populations in each of the six leg neuropils as belonging to distinct communities. We think this provides an interesting partitioning of the VNC network that could have biological relevance (which future functional studies should investigate). To the reviewer’s final sentence, we do show intra- vs inter-community connectivity in Fig. 9–supplement 1B. Notably, most communities except several small ones have far more intra-community connectivity than inter-community connectivity. We have added text highlighting this observation (L656-658).

      We do, however, agree with the general point of the reviewer that it is not yet known which community detection methods are ‘optimal’ for use with connectomics data, so we have added further text (L679-683) explaining that community detection in MANC will require further investigation and validation in the future.
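
      As a rough illustration of the intra- versus inter-community comparison mentioned above, the sketch below computes, for each community, the fraction of outgoing synaptic weight that stays within the community. The edge list and community labels are hypothetical, and this particular summary statistic is an assumption rather than the exact quantity plotted in Fig. 9–supplement 1B.

```python
from collections import defaultdict

# Hypothetical directed, synapse-weighted edges: (pre, post, weight).
edges = [
    ("a", "b", 10), ("b", "a", 8), ("a", "c", 2),
    ("c", "d", 12), ("d", "c", 9), ("d", "a", 1),
]
# Hypothetical community labels, e.g. the output of a community detection run.
community = {"a": 0, "b": 0, "c": 1, "d": 1}


def intra_fraction(edges, community):
    """For each community, fraction of its outgoing weight that stays inside."""
    intra = defaultdict(float)
    total = defaultdict(float)
    for pre, post, weight in edges:
        label = community[pre]
        total[label] += weight
        if community[post] == label:
            intra[label] += weight
    return {label: intra[label] / total[label] for label in total}


fractions = intra_fraction(edges, community)
print(fractions)
```

      Values close to 1 indicate that a community connects mostly to itself; values near the fraction expected from community size alone would suggest the partition carries little structure.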

      I think the length of this manuscript reduces its potential for impact, as I suspect the reality is that many people won't read through all 140 pages and 21 main figures of (overall excellent) work and analysis.

      We intend this paper to serve not only as a first look into the organization of descending-to-motor circuits, but also as a resource for future investigations in MANC. The provided detail is intended to serve these purposes.

      Reviewer #1 (Recommendations For The Authors):

      General comments:

      I find that there are too many main figures with too much content in them, as well as too much corresponding text. Much of the initial anatomical identification and description could be summarised in fewer main figures, with more supplementary figures if the authors desired. I think there is a lot of great insight in this paper, particularly in the second half, but I am concerned that the extensive detail in the initial sections may challenge reader engagement through to the later sections of the paper. It would also be useful to have a higher level and shorter discussion.

      Reiterating our response from above, we intend this paper to serve not only as a first look into the organization of descending-to-motor circuits, but also as a resource for future investigations in MANC. The provided detail is intended to serve these purposes.

      There is sometimes an over-reliance on wiring diagrams or complex plots as evidence without further quantification. I will mention several examples below, as well as additional suggestions.

      Specific comments:

      In Figure 2E, how are DNs divided into pair vs population type? This was a very interesting idea, particularly in light of "command-like" neurons vs ensembles of DNs controlling behaviour. However, it is not clear how this distinction is made. This concept is referenced throughout the manuscript, so I think a clear quantitative way of identifying "pair" vs "population" identity for each DN would be very useful. And at the very least, a thorough explanation of how it is done in the current manuscript.

      We have added additional text in the Figure 2 legend to point towards Materials and Methods where the DN grouping (pair vs. population) is explained. These groups were formed based on morphology and further split into types based on connectivity, if needed. However, as the connectome represents a static snapshot of connectivity with no functional data, it remains possible that some DNs that were grouped as populations may act functionally as multiple pairs. Future work should continue to update these annotations.

      In Figure 4, there are some inconsistencies between neurotransmitter predictions and experimental FISH data. Have the authors taken into consideration Lacin et al. 2019 (https://elifesciences.org/articles/43701)? Specifically in that paper, it is stated: "We did not find any cases of neurons using more than one neurotransmitter, but found that the acetylcholine specific gene ChAT is transcribed in many glutamatergic and GABAergic neurons, but these transcripts typically do not leave the nucleus and are not translated." I wonder if this might explain some of the inconsistencies between FISH (mRNA detection) and the neurotransmitter predictions (presumably based on indirect protein structures detected via EM imagery), or the presence of so much co-transmission.

      We agree and have added this possible explanation for apparent co-transmission in the text (L394-397).

      In Figure 8B, the authors state: "We found that individual DN and MN subclasses have direct downstream and upstream partners, respectively, that are relatively hemilineage-restricted (Figure 8B)." While the connectivity patterns highlighted are intriguing, further quantitative analysis could help strengthen this point. The connectivity matrices in Figure 8B are linked to activation phenotypes and hemilineages below. But I don't really know how to interpret "relatively hemilineage-restricted" in light of this plot. How does this connectivity pattern for example compare statistically to a randomly selected set of DNs (maintaining the same group size for example)? Would random DN sets be less hemilineage restricted? Similar quantification would be helpful to support this statement "...with high correspondence between the hemilineages connected to individual DN and MN subclasses that are expected to be functionally related."

      "both upper tectulum DNs (DNut) and wing MNs (MNwm) have significant connectivity with hemilineages 6A, 7B, 2A, 19B, 12A and 3B". What is significant connectivity? Looking at the plot in Figure 8B, why is DNut -> 16B not considered significant? Is there a threshold and if so, what is the justification?

      These plots aim to be descriptive rather than drawing hard quantitative thresholds between ‘significant’ and ‘non-significant’ connectivity. We have revised the text to remove the terms ‘restricted’ and ‘significant’ and to clarify our interpretation (L555-559).

      In Figure 9G-H, this is a very interesting finding, but how do we know that the difference is real? Why not do a statistical test to compare the brain and VNC? Or create a null model network with edge swaps, etc. to compare against.

      Statistical comparison between the brain and VNC may be problematic given differences in generating these connectomes, as well as missing connectivity (only half the brain is imaged) in the hemibrain connectome. Comparison to a null model is possible and for purposes of understanding motif frequency in general has already been done (see for example, Lin et al., 2024, Nature). However, a null or shuffled model is not required for comparing motif frequencies between brain or VNC neuropils as is the point of this particular graph. At present, we simply highlight a qualitative observation that will require future work to investigate.
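
      For readers who wish to try the edge-swap null model raised by the reviewer, a minimal sketch is given below. It is not part of the analysis in the manuscript: the toy graph is hypothetical, and counting reciprocal pairs is used only as a simple stand-in for a motif statistic. The shuffle preserves each node's in- and out-degree.

```python
import random


def count_reciprocal(edges):
    """Count unordered node pairs that are connected in both directions."""
    edge_set = set(edges)
    return sum(1 for u, v in edge_set if u < v and (v, u) in edge_set)


def edge_swap_null(edges, n_swaps, seed=0):
    """Degree-preserving shuffle of a directed graph.

    Two random edges (a -> b, c -> d) become (a -> d, c -> b) unless the
    swap would create a self-loop or a duplicate edge, so every node keeps
    its in-degree and out-degree.
    """
    rng = random.Random(seed)
    edge_list = list(set(edges))
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a duplicate edge
        edge_set.difference_update([(a, b), (c, d)])
        edge_set.update([(a, d), (c, b)])
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


# Hypothetical toy graph with integer node identifiers.
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 0), (0, 2)]
observed = count_reciprocal(toy_edges)
null_counts = [
    count_reciprocal(edge_swap_null(toy_edges, 100, seed=s)) for s in range(20)
]
print(observed, null_counts)
```

      Comparing the observed count against the distribution of shuffled counts gives an empirical sense of whether a motif is over-represented; as noted above, such a comparison does not address differences in how the brain and VNC datasets were generated.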

      Referring to Figure 12 in the main text, "we observe that the power MN upstream network is largely shared among all power MNs and is highly bilateral." Quantifying the fraction of shared upstream neurons from power MNs would make this statement much stronger. Particularly if compared to other non-power MNs. Or potentially using some other network comparison metric.

      This is a good point. We have added cosine similarity to figure 6 for wing/haltere MNs to show the similarity between inputs across these MNs, and added text in section 2.3 (L461-465) and 2.5.3 discussing the cosine similarity (L987-988).
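
      For readers unfamiliar with the metric, cosine similarity between the input profiles of two MNs can be sketched as below. The input vectors are hypothetical synapse counts from a shared, ordered set of upstream neurons; how inputs were thresholded or normalised for the figure is not stated here and is not assumed.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical input-synapse counts from four upstream neurons onto three MNs.
inputs = {
    "MN_1": [40, 10, 0, 5],
    "MN_2": [80, 20, 0, 10],  # same input profile as MN_1, scaled by two
    "MN_3": [0, 0, 30, 0],    # input from a different upstream partner only
}

sim_12 = cosine_similarity(inputs["MN_1"], inputs["MN_2"])
sim_13 = cosine_similarity(inputs["MN_1"], inputs["MN_3"])
print(sim_12, sim_13)
```

      Because the measure ignores overall scale, two MNs that receive the same proportions of input score close to 1 even when one has many more synapses, while MNs with no shared partners score 0.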

      In Figure 13B, "Nearly 50% of these restricted neurons (totalling about 1200 per leg neuropil) have been serially matched across the six neuropils (Figure 13B)". There seems like a disconnect here. In the IR, CR, and BR columns, I see ~2750, ~500, and ~1250 neurons not in a serial set (~4500 total); I see ~1500, ~750, and ~1000 in a serial set (~3250 total). This would mean that ~58% of neurons are not in serial sets, ~42% are in serial sets. Shouldn't the conclusion be the opposite then? That surprisingly most intrinsic neurons are not repeated across leg neuropils. I find this fascinating if true. Perhaps there is some confusion on my part, however.

      We now find that about half of the leg-restricted neurons are serially repeated across the six leg neuropils with similar morphology and connectivity, especially to the downstream leg motor neurons. Since the first submission of this paper, we have identified additional serial homologues while completing the systematic cell typing described in the accompanying paper, Marin et al. 2024. Figure 13B has now been updated to reflect this. In total, 3998 of 7684 restricted neurons (IR, CR, BR) have been assigned to a serial set or serial type. The sentence in the text has been adjusted to report that 52% of these restricted neurons are in serial sets (L1125).
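
      The two percentages in this exchange can be checked with a few lines of arithmetic. The updated counts are those reported in the response; the reviewer's counts are the approximate values read off the original Figure 13B, as quoted in the comment.

```python
# Counts reported in the author response.
in_serial_sets = 3998
restricted_total = 7684
updated_percent = 100 * in_serial_sets / restricted_total

# Approximate counts the reviewer read off the original Figure 13B
# (IR, CR, and BR columns).
reviewer_serial = 1500 + 750 + 1000
reviewer_not_serial = 2750 + 500 + 1250
reviewer_percent = 100 * reviewer_serial / (reviewer_serial + reviewer_not_serial)

print(round(updated_percent), round(reviewer_percent))
```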

      In Figure 13D-E, "the Tect INs are not a homogenous population." Providing additional evidence could strengthen this statement. A connectivity matrix is shown in (D), followed by examples of morphologies in (E). What makes a population homogenous or heterogenous? For example, compared to all possible INs, the Tect IN morphology actually looks quite similar. Are those connectivity matrices in (D) really so different? What would a random selection of neurons look like?

      Our sister paper, Marin et al. (2024), has looked into variation of connectivity across neurons of the entire VNC in much more detail, including clustering methods that include connectivity and other criteria for cell typing. Thus, we have now amended the text to direct the reader to that paper for more detail on variability of connectivity in the Tect INs, which were divided into 5 cell types in Marin et al. (2024) (L1027-1031). In addition, we have replaced our clustering by connectivity in Figure 13 with the cell type clusters from Marin et al. (2024).

      In reference to Figure 13 - Supplement 1, "This standard leg connectome was very similar across legs, but there were small deviations between T1, T2, and T3 legs, as shown in Figure 13-Supplement 1." - what makes a deviation considered small? T1 seems to generally have many more synapses, T2 many less, and T3 a mixture depending on the connection. Also, are there lost connections or new connections? A quantification of these issues would be helpful instead of simply depicting the wiring diagrams.

      The connections that differ are likely due to the reconstruction state of leg MNs. We have now stated this in the main text for clarification (L1143-1145). In the leg neuropils, T2 and T3 left-hand-side MNs have sparser dendritic arbors than those on the right-hand side. Therefore the differences in Figure 13–Supplement 1, which are almost exclusively the connections from the leg-restricted neurons onto leg MNs, seem stronger in T1. Future work, bolstered by additional datasets, will undoubtedly reveal further insight into the comparison of circuits for the different legs.

      In Figure 15 - Supplement 2, "We used effective connectivity to identify leg DNs with similar MN connectivity patterns (Figure 15-Supplement 2). Of previously identified DNs, we found that DNg13 showed a highly similar effective connectivity fingerprint."

      How was this similarity calculated? How do we know these particular DNs have similar effective connectivity? The connectivity matrix depicted is quite complex, with both layer and connectivity scores quantified at each location. A principled way of determining similarity would make this statement much stronger.

      The similarity was calculated simply as the Euclidean distance between the effective connectivity matrices of each DN onto the set of MNs. While this is a straightforward comparison mathematically, effective connectivity calculations (as first introduced in this context by Li et al., 2020, by our collaborators Larry Abbott and Ashok Litwin-Kumar) have not yet been subject to functional validation. We therefore agree with the reviewer that this should not be overinterpreted at this point. Future functional work should explore the hypotheses suggested here and more quantitatively compare the similarity of different DN-MN pathways.
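
The comparison described above can be sketched as follows, assuming each DN's effective connectivity onto the MNs is stored as a small matrix with identical row and column ordering across DNs. The DN names and matrix values are hypothetical and chosen only for illustration; how the effective connectivity values themselves are computed is outside this sketch.

```python
import math


def fingerprint_distance(matrix_a, matrix_b):
    """Euclidean distance between two equally shaped connectivity matrices."""
    flat_a = [value for row in matrix_a for value in row]
    flat_b = [value for row in matrix_b for value in row]
    return math.dist(flat_a, flat_b)  # raises ValueError if the sizes differ


def rank_by_similarity(query, fingerprints):
    """Return (name, distance) pairs for all other DNs, most similar first."""
    distances = [
        (name, fingerprint_distance(fingerprints[query], matrix))
        for name, matrix in fingerprints.items()
        if name != query
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical effective connectivity values onto two MN groups across two layers.
fingerprints = {
    "DN_query": [[0.8, 0.1], [0.0, 0.5]],
    "DN_x": [[0.7, 0.1], [0.0, 0.6]],
    "DN_y": [[0.0, 0.9], [0.4, 0.0]],
}

ranked = rank_by_similarity("DN_query", fingerprints)
```

A smaller distance means a more similar fingerprint, so the first entry of `ranked` is the closest match to the query DN. Euclidean distance is sensitive to overall magnitude, which is one reason such rankings should be read cautiously.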

      Minor notes:

      In Figure 4E, the circles, squares, and triangles in the figure legend are too small. This is also true to some extent in the plot itself.

      We have increased the size of the symbols in the legend and plot.

      In Figure 8E right, the figure legend and x/y axes are not clear to me. Unfortunately, I'm not sure what the plot is showing because of this.

      The right plot in Figure 8E shows the number of DN groups each MN group receives input from, at a threshold of 1% input. As this plot is redundant with the left plot, we have decided to remove it.

      In Figure 8I, it would be interesting to see which neurons are directly downstream of DNs. One can't see layers 2/3/4 with the fan-out expansion of neurons and the y-axis scale.

      We have revised the plot to better show cell composition of individual layers.

      In Figure 19E, it would be helpful to also have a standard y-axis.

      The panel has been revised accordingly.

      Reviewer #2 (Recommendations For The Authors):

      General:

      In the Title, you do not mention DNs or MNs but these are a major focus of this study. The title could be more descriptive of the work.

      Per the reviewer’s comments, we have revised the title to “Transforming descending input into motor output: An analysis of the Drosophila Male Adult Nerve Cord connectome”.

      A glossary would be helpful, where all the paper's abbreviations and their definitions are provided in one place. Perhaps a hierarchical structure would help (for at least part of the glossary), so that terms like NTct, WTct, and HTct could be nested underneath UTct, for example.

      We do include a glossary in the sister paper, Marin et al. (2024) and in this paper have included a short glossary in the first Figure. Please refer to these sources for abbreviation reference.

      Introduction:

      Define 'Premotor'.

      We have defined ‘premotor circuits’ to be ‘circuits that directly or indirectly control motor output’ in lines 45-46.

      It might be worthwhile to start with a broader introduction sentence than the current one that focuses just on the fly, in order to emphasize the impact of MANC as the first complete connectome of a motor circuit in any animal with limbs or wings.

      We have revised the introductory paragraph per the reviewer’s suggestions.

      "Muscles in the leg are not innervated uniformly; indeed, in the T1 legs the number of MNs per muscle varies by as much as an order of magnitude" needs to specify the axis of variability more clearly - the authors probably mean variability across muscles in the leg (not variability across individuals for example) but I think the current sentence is a bit ambiguous in that respect.

      We have reworded this sentence to clarify this point (L132-133).

      Line 182 end of paragraph: It would be useful to point out explicitly what makes the MANC project valuable in the context of a similar FANC project - for example, that the MANC connectome is more complete, is a male (so interesting for anyone interested in sexual dimorphism), and gives the field an n=2 for VNC connectome datasets.

      We agree, and have added a sentence describing the benefits of the MANC connectome on L209-212.

      Line 213: A brief phrase or sentence of context could be provided to help unaware readers understand that 42% of synaptic connectivity being captured is in the same sort of range as previous datasets like the hemibrain and likely leads to the vast majority of important cell-cell connections being identified (perhaps cite Buhmann et al 2021 Nature Methods which does an analysis of this), and therefore is a reason to think highly of this dataset's quality and its potential for impact on the field. The sentence at the end of this paragraph doesn't quite do it for me.

      We have added the comparison of MANC synapse completeness to that of the Hemibrain, and revised the ending sentence in L234-237.

      Line 271: Clarify what happened to the remaining 15% of DNs that weren't able to be assigned to a tract. They travelled outside the tracts, or data quality issues prevented assignment, or something else?

      Indeed, some DNs could not be assigned to a tract as they traveled outside of all axon tracts and did not bundle with other DNs. We have added this explanation to the text (L300-301).

      Figure 1:

      The pie chart "DN postsynaptic partners by neuron class" is a bit hard to interpret without having another pie chart next to it showing "Neurons in MANC by neuron class". I know these numbers are written on the schematic but it would be nice to be able to easily tell which cell classes are overrepresented or underrepresented in the set of postsynaptic partners of DNs. e.g. It's obvious that ANs are overrepresented and DNs are underrepresented in the set of postsynaptic partners of DNs, but it would be nice if readers didn't have to do any mental math to figure out if INs or MNs are under/overrepresented.

      We agree and have added a pie chart of the neuron class composition of the entire VNC to Figure 1.

      "35.9% of leg MNs are matched to FANC" Why is this number so low? Because FANC motor neurons were only identified in T1, so the remaining 2/3rds of leg MNs in MANC weren't matched? How successful was matching for the neurons where it was actually attempted?

      For this work, we only matched the T1 neurons across the two datasets. This was both a way of checking that we found everything in these segments and a way of being more sure of muscle target assignments, as our collaborators in the FANC dataset had generated extensive light-level data to match motor neurons with their target leg muscles. The T2 and T3 MNs were not fully proofread or identified in FANC, precluding further analysis and leading to the 35.9% matched number. We hope to be able to compare these datasets more thoroughly in the future, and have matched all the premotor leg-restricted intrinsic neurons of our standard connectome to FANC. We report on their stereotypy in our latest preprint, Stürner, Brooks et al. 2024.

      Figure 2:

      Figure 2A: Perhaps darken the color of the MTD-III skeletons. Currently, they're so light it's hard to see, and this is one of the most interesting tracts because the claim is that it's a new tract.

      We take the reviewer’s point, however, the color scheme used for the tracts in Figure 2 is coordinated between multiple figures and figure panels, and thus we would prefer to keep it as is. If readers would like to examine DNs of a particular tract, we encourage them to retrieve said DNs using the tract annotations in NeuPrint.

      Figure 2 supplement 1: It's not clear to me what I should be getting out of seeing the right side DNs as well. If you want readers to be able to visually compare the left and right side morphologies and appreciate the high degree of symmetry, you may want to put the left and right side DN panels side-by-side. Perhaps do that (show both the left and right side DNs) for one or two tracts in the main Fig2, and then leave out the remaining panels - or if you want to include the remaining panels, explain more clearly what readers are supposed to learn from seeing them.

      We agree and have now removed Figure 2 supplement 1.

      Figure 2C caption: Instead of "DN primary neurites" I think the authors probably mean "longest single branch of each DN" or something along those lines. I think "primary neurite" is usually used to refer to the thick non-synaptic branch coming out of a neuron's soma, which can't be how it's being used here.

      We agree and have changed all references to ‘primary neurite’ for DNs to ‘longest neurite’.

      Figure 2D+E: Perhaps add an overall % of neurons of each class to the legend. I ask because I would be very interested to know what % of all DNs exist as single pairs versus as populations, and I imagine that could be a number that is quoted a fair amount by others in the field when talking about DNs.

      We agree and have added the overall percentage of each neuron class to the results (L275-276) and Figure 2 legend.

      Figure 3:

      UTct.IntTct neurons are by far the largest class of DNxn neurons, so would it be worth calling these the DNxt class (DN projecting to some combination of tectulum neuropils), to mirror the DNxl class? I would vote for doing that.

      Thanks for the suggestion. However, the subclass naming scheme for DNs was coordinated between multiple groups of people working on MANC reconstruction and annotation. As making changes to subclasses would impact many analyses that have already been completed for existing work, we will refrain from doing so.

      Figure 3G feels a bit out of place in this figure and under-explained

      We have clarified in the text our citations to Figure 3G to better explain our interpretation of this data.

      Figure 4

      "DNp20 has few vesicles and may be electrically coupled": If I'm correct that DNp20 is also known as DNOVS1 and is the second largest diameter axon in the neck after the giant fiber, then yes, Suver et al. 2016 J Neurosci show that this DN is gap junction coupled to neck motor neurons (see their Fig 2F). This neuron (along with the giant fiber) is enough of an outlier that it might be more representative to show a different, more canonical DN that has a low prediction probability.

      The reviewer is right that DNp20 is also known as DNOVS1 with known gap junction coupling.  We now clarify in the text (L366) how we think that could lead to a lower neurotransmitter prediction score, which is what we were trying to illustrate.

      Figure 4E: It looks like only a single DN has more inputs (~11000) than outputs (~9000), is that right? It could be interesting to dedicate some panels and text to the connectivity profile of that one unique neuron.

      Yes, that is correct: there is just one pair of DNs, DNxn166, that receives more input than it gives output (the two triangles lie on top of each other). We think that the other DN pair in that same box (more variable in total synapse number, and therefore the triangles are further apart) also receives an unusually high amount of input versus output. The morphology of these two types is shown in Figure 4F; both have fine processes that look more like dendrites, especially when compared to other DNs such as the ones in 4G. Unfortunately, neither of these two types has been matched to light microscopy images, so we cannot say whether they have the same type of morphology in the brain, or further explore their brain connectivity, at this time.

      Figure 4E: "black rectangle ... gray rectangle" don't look different shades to me. It's obvious which is which based on where they are in the graph but if you want to color code this, pick more separate colors. Or code it with something other than colors.

      We have made the rectangle in Figure 4E a lighter shade of grey and added labels to refer to the panels D, F and G. The figure legend now also describes more clearly that we are plotting every DN as a single shape and exactly how many DN types are included in those rectangles to avoid confusion.

      Figure 5:

      "subclass is their two-letter muscle anatomical category" should be explained better, I'm not sure what "muscle anatomical category" means.

      We have changed the wording in the Figure 5 legend to better clarify that MN subclasses are the broad muscle category that they innervate (e.g. legs, wings).

      Figure 7:

      Leg MN identification and serial homology.

      Why are there no tarsus reductor (tarm1 and tarm2) motor neurons? Do we not know their anatomy from light microscopy well enough, perhaps? Were these MNs identified in FANC? Is it reasonable to guess that the remaining small number of unidentified T1 leg motor neurons in MANC would control these muscles? I think Marta Moita's lab has some ongoing projects on these muscles (see Twitter), so if more LM data is needed perhaps it will come from them.

      We now know that the small number of unidentified T1 leg motor neurons (a T1 pair with a serial T2 pair, serial set 17664) are not in fact MNs. A new and unpublished dataset (Janelia whole male CNS volume, the optic lobe from which has been published as Nern et al., 2025) shows they have axons within the VNC. The MN annotation for these neurons has been removed and they now have the type name INXXX471. Thus, we have no T1 leg MNs without a muscle target annotated. Our muscle target annotation comes from matching to the FANC dataset that has also not annotated tarsus reductor MNs. We suspect that the tarsus reductor MNs are hard to distinguish from the tarsus depressor MNs of which there are 5 per side and segment.

      It seems there are a few more leg motor neurons in MANC vs FANC. Any indication of which muscles they control?

      See above.

      -Figure 7E: A qualitative comparison between the cosine similarity results here and from FANC could be useful. What generally is the same versus different? Any indication of male/female differences?

      We observe no differences in the cosine similarity of T1 leg MNs between MANC and FANC and only very minor differences between T1, T2 and T3, as shown in Figure 7. In our most recent work, now on bioRxiv (Stürner, Brooks et al., 2024), we were able to find all intrinsic leg serial sets that we included in our standard leg premotor circuit here in the FANC dataset. We do not see any differences between them in terms of morphology, and while we have several cases in which we are still missing 1 of the 6 neurons in a serial set in FANC, we see similar connectivity when comparing small circuits. We have also found almost all neurons interconnecting the legs, with some very interesting exceptions, mainly coming from the abdomen, that we believe are male specific. These male-specific neurons can also be found in this preprint (Stürner, Brooks et al., 2024).

      Figure 8

      Figure 8A: Why are ~1/3rd of the wing and leg motor neurons considered populations instead of pairs? I thought essentially all wing and leg motor neurons have unique morphologies.

      Pair versus population status is assigned based on MN morphology and connectivity. For the wing MNs, many sets of DVMns and DLMns have near-identical morphology and connectivity, are not easily distinguishable in the VNC, and are therefore categorized as a ‘population’. For the leg MNs, there are ‘true’ population MN types that provide multiple innervation of the same muscle.

      The text states "up to a maximum of 20% [traversal probability] (corresponding to a synapse input fraction of 1)" but I interpret the bottom of Figure 8G to have flipped values, where a synapse input fraction of 0.2 yields a traversal probability of 1. Is there a mistake here or have I misunderstood?

      Thank you for pointing this discrepancy out. The text description was indeed flipped, and we have corrected this error.

      Caption for J says "Layers without neurons are omitted". How is it possible to have a layer without neurons?? Something about how the traversal is done doesn't seem to be explained clearly enough. If it's really possible to have a layer without neurons, I think the approach might need to be revisited as this seems quite strange.

      Here, ‘layer’ should be viewed as a nonlinear measure of indirect connectivity combining path length and synaptic weights. Layers without neurons are possible due to the details of the calculation: layer position is assigned probabilistically by the downstream synapse connectivity of the source neurons, and the probability is scaled up to 1 at an input synapse fraction of 0.2. Neuron-to-neuron connectivity with an input synapse fraction of >=0.2 is very rare in the VNC connectome, and thus neurons strictly assigned to layer 2 downstream of each DN type are similarly rare. We have updated the legend of Figure 8 to better explain this.
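
The scaling described above can be sketched as follows, assuming the traversal probability grows linearly with input synapse fraction until it saturates. Only the saturation point of 0.2 is taken from the text; the linear form, the function name, and the parameter names are assumptions made for illustration.

```python
def traversal_probability(input_fraction, saturation=0.2):
    """Probability of traversing a connection given its input synapse fraction.

    Assumes linear scaling up to `saturation`, above which the probability is 1.
    """
    if not 0.0 <= input_fraction <= 1.0:
        raise ValueError("input_fraction must lie between 0 and 1")
    return min(1.0, input_fraction / saturation)


weak = traversal_probability(0.01)       # a connection supplying 1% of input
moderate = traversal_probability(0.1)    # a connection supplying 10% of input
saturated = traversal_probability(0.2)   # at the saturation point
very_strong = traversal_probability(0.5) # beyond saturation, capped at 1
```

Under this assumption a connection is guaranteed to be traversed in a single step only when it supplies at least 20% of the target's input. Since such connections are rare, few neurons would be placed deterministically in the layer immediately downstream of a source.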

      Section 2.6

      "flies have been shown to walk normally without proprioceptive feedback, suggesting that inter- and intra-leg coordination is not strictly dependent on sensory feedback loops from the legs" is quite a drastic overinterpretation of that paper's results. The ablation there was not complete (some subtypes of sensory neurons were not perturbed), and the perturbed flies certainly walked with some defects. This statement certainly should be removed or significantly softened.

      Thank you for pointing this detail out. The term ‘normally’ has been removed from this sentence to soften the statement.

      Figure 13, Standard leg connectome

      Unfortunately, the motor neurons controlling the tarsus could not be included here, I suppose due to the difficulty in identifying the T2 and T3 homologs for these motor neurons. This should be mentioned in the text. This version of the standard leg connectome is without a doubt still an incredibly valuable discovery, but readers should be made aware that this version of the standard leg connectome does in fact lack the motor neurons for one joint.

      The MNs controlling the tarsus could not be matched with high confidence. We have added a sentence pointing this out when the leg circuit is introduced (L1141-1142).

      The focus here is on locomotion in the absence of other behaviors, whereas the legs are also responsible for grooming, reaching, boxing, etc. How should we consider the leg connectome in light of this?

      This is a very good point, and we have indeed found known grooming neurons that target our leg premotor circuit (L1158-1161). We’ve now added this observation to the Discussion (L1949-1951).

      Minor points

      L84 - re: Descending neurons work together - cite Braun et al., bioRxiv 2023; cite Yang HH bioRxiv 2023 .

      We agree that these papers are relevant to the function of DNs in combination, and have added them to the introduction (L83-84, 86-87).

      L193 - "intrepid" is overly florid language; similar for L1507 "enigmatic".

      We have replaced these words with suitable synonyms.

      L273 - The acronym "ITD" is not explained. Please check all other acronyms. Related, it would be good to include a Table or Box with all acronyms for the reader.

      We have added the full name of the ITD to the text. A glossary is available in Figure 1, and a full glossary of MANC terms is available in Table 1 of our sister paper, Marin et al. 2024.

      -L514, you state that hemilineages 6A and 6B unexpectedly produce uncoordinated leg movements (flight-related was expected). However, Harris didn't study animals in tethered flight but headless on the ground.

      The experimental setup of Harris et al. was capable of assessing flight-like motor output even if not true flight, as seen in the predominantly wing movement phenotypes of activating hemilineages 7B, 11A/B and 2A. We now also note that hemilineage annotation in Marin et al., 2024, shows that the 6B hemilineage has some projections into the leg neuropils, in support of a leg motor role in addition to an upper tectular role (L570-571).

      L1425 - "the TTM" is repeated twice.

      This sentence addresses both the TTM and its MN (TTMn). We have revised this sentence to improve clarity by expanding the full name of TTM in that paragraph and leaving TTMn abbreviated.

      L1728 - Ascending neuron projections to the brain - cite Chen et al., Nat Neuro 2023.

      We agree that Chen et al. 2023 is relevant to the discussion of AN function, and have added this citation (L1836-1838).

      L1817, It is a good idea to compare with previous predictions for circuit control. But these originate from non-Drosophila work as well. Please cite and consider the original models from Buschges, Cruse, Holmes, and others.

      Thanks for the suggestion. We now cite the non-Drosophila literature as well. (L1971)

      L1827, how precisely should these "theories" be updated? Be explicit.

      In the preceding sentences, we summarize what differs from one of the suggested models. We have now additionally added examples to the sentence (L1942-1945) to suggest that theoretical leg circuits need to account for the posterior-to-anterior as well as anterior-to-posterior connections between leg neuropils, as well as the relative lack of connectivity between the left and right mesothoracic leg neuropils.

      L1831, include a discussion about another alternative which is through mechanical coupling and sensory feedback.

      We agree that leg sensory input likely contributes to leg locomotor circuits. We have added a sentence pointing out that annotations of sensory neurons in MANC are available through work in a companion paper (Marin et al. 2024), and that future work is necessary to examine the contribution of sensory input to leg motor circuits (L1954-1956).

      Methods

      https://flyconnectome.github.io/malevnc/ link doesn't work.

      We have updated the link.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents valuable findings on the role of RIPK1 in maintaining liver homeostasis under metabolic stress. Strengths include the intriguing findings that RIPK1 deficiency sensitizes the liver to acute liver injury and apoptosis, but because the conclusions require additional experimental support, the evidence is incomplete.

      We are truly grateful, and wish to express our sincere acknowledgement to the reviewer and the editor for the time and effort spent in reviewing our manuscript. We highly appreciate the thorough and constructive comments, which have greatly improved our manuscript. We have conducted new experiments to address the reviewer’s concerns. We also carefully checked and changed our manuscript according to the constructive suggestions of the reviewer. We hope we have adequately addressed all the concerns. In the revised manuscript, changes are highlighted in yellow. Please find the detailed point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study presents an investigation into the physiological functions of RIPK1 within the context of liver physiology, particularly during short-term fasting. Through the use of hepatocyte-specific Ripk1-deficient mice (Ripk1Δhep), the authors embarked on an examination of the consequences of Ripk1 deficiency in hepatocytes under fasting conditions. They discovered that the absence of RIPK1 sensitized the liver to acute injury and hepatocyte apoptosis during fasting, a finding of significant interest given the crucial role of the liver in metabolic adaptation. Employing a combination of transcriptomic profiling and single-cell RNA sequencing techniques, the authors uncovered intricate molecular mechanisms underlying the exacerbated proinflammatory response observed in Ripk1Δhep mice during fasting. While the investigation offers valuable insights into the consequences of Ripk1 deficiency in hepatocytes during fasting conditions, there appears to be a primarily descriptive nature to the study with a lack of clear connection between the experiments. Thus, a stronger focus is warranted, particularly on understanding the dialogue between hepatocytes and macrophages. Moreover, the data would benefit from reinforcement through additional experiments such as Western blotting, flow cytometry, and rescue experiments, which would offer a more quantitative aspect to the findings. By incorporating these enhancements, the study could achieve a more comprehensive understanding of the underlying mechanisms and ultimately strengthen the overall impact of the research.

      We thank the reviewer for the encouraging comments and helpful suggestions. We agree with the reviewer that additional experiments could reinforce our findings. Therefore, we conducted additional experiments including flow cytometry, western blotting, and using kinase-dead mutant mice to further investigate the underlying mechanisms. We carefully addressed every comment by the reviewer as indicated below.

      Detailed major concerns:

      (1) Related to Figure 1.

      It is imperative to ensure consistency in the number of animals analyzed across the different graphs. The current resolution of the images appears to be low, resulting in unsharp visuals that hinder the interpretation of data beyond the presence of "white dots". To address this issue, it is recommended to enhance the resolution of the images and consider incorporating zoom-in features to facilitate a clearer visualization of the observed differences. Moreover, it would be beneficial to include a complete WB analysis for the cell death pathways analyzed. These adjustments will significantly improve the clarity and interpretability of Figure 1.

      Thanks very much for the constructive advice. We carefully checked the number of animals and made sure that the animal numbers were consistent across the different figures. We further updated the figures by incorporating zoom-in features in the updated Figure 1, and the resolution of the figures was greatly improved. Western blot analyses were also included in the updated Supplementary Figure 1.

      (2) Related to Figure 2.

      It is essential to ensure consistency in the number of animals analyzed across the different graphs, as indicated by n=6 in the figure legend (similar to Figure 1). Additionally, it is crucial to distinguish between male and female subjects in the dot plots to assess any potential gender-based differences, which should be consistent throughout the paper. To achieve this, the dots plot should be harmonized to clearly differentiate between males and females and investigate if there are any disparities between the genders. Moreover, it is imperative to correlate hepatic inflammation with the activation of Kupffer cells, infiltrating monocytes, and/or hepatic stellate cells (HSCs). Therefore, conducting flow cytometry would be instrumental in achieving this correlation. Additionally, the staining for Ki67 appears to be non-specific, showing a granular pattern reminiscent of bile crystals rather than the expected nuclear staining of hepatocytes or immune cells. It is crucial to ensure specific staining for Ki67, and conducting in vitro experiments on primary hepatocytes could further elucidate the proliferation process. These experiments are relatively straightforward to implement and would provide valuable insights into the mechanisms underlying hepatic inflammation and proliferation.

      Thanks very much for the helpful advice. First, we corrected the number of animals analyzed in the different graphs and made sure that the number of animals listed in each figure legend was consistent with the graphs in all figures. Second, to distinguish the results between male and female mice, blue represents male mice, pink represents female mice, and green represents RIPK1 kinase-inactivated mice. The majority of results were obtained from male mice, and our results indicated that there was no difference between male and female mice herein.

      The percentages of immune cell subpopulations isolated from mouse liver tissue were determined. The results were consistent with the single-cell analysis in showing that a greater number of macrophages were recruited into the liver tissue of Ripk1<sup>Δhep</sup> mice upon 12-hour fasting (updated Figure 4F&G).

      To confirm the Ki67 results, we first measured the transcriptional expression of Ki67 using real-time qPCR, and the results were consistent with the protein expression measured by immunohistochemical analysis. The percentage of Ki67<sup>+</sup> cells among liver cells was also measured, and there were significantly more Ki67<sup>+</sup> cells in Ripk1<sup>Δhep</sup> mouse liver than in WT control mouse liver upon 12-hour fasting. Taken together, our transcriptional analysis, immunohistochemical analysis, and flow cytometry data indicated that Ki67 expression was higher in Ripk1<sup>Δhep</sup> mice than in Ripk1<sup>fl/fl</sup> mice (updated Figure 2).

      (3) Related to Figure 3 & related to Figure 4.

      The immunofluorescence data presented are not entirely convincing and are insufficient to conclusively demonstrate the recruitment of monocytes. Previous suggestions for flow cytometry studies remain pertinent and are indeed necessary to bolster the robustness of the data and conclusions. Conducting flow cytometry analyses would provide more accurate and quantitative assessments of monocyte recruitment, ensuring the reliability of the findings and strengthening the overall conclusions of the study. Regarding the single-cell RNA sequencing analysis presented in the manuscript, it's worth questioning its relevance and depth of information provided. While it successfully identifies a quantitative difference in the cellular composition of the liver between control and knockout mice, it may fall short in elucidating the intricate interactions between different cell populations, which are crucial for understanding the underlying mechanisms of hepatic inflammation. Therefore, I propose considering alternative bioinformatic analyses, such as CellPhone-CellChat, which could potentially provide a more comprehensive understanding of the cellular dynamics and interactions within the liver microenvironment. By examining the dialogue between different cell clusters, these analyses could offer deeper insights into the functional consequences of Ripk1 deficiency in hepatocytes and its impact on hepatic inflammation during fasting.

      Thanks very much for the constructive suggestion. We agree with the reviewer that flow cytometry analyses would provide accurate and quantitative assessments of monocyte recruitment, ensuring the reliability of the findings. Following this advice, both WT and Ripk1<sup>Δhep</sup> mice were fasted for 12 hours, and single hepatic cells were then isolated and analyzed by flow cytometry. As indicated in updated Figure 4F&G, the percentage of F4/80<sup>+</sup>CD11b<sup>+</sup> cells was significantly higher in Ripk1<sup>Δhep</sup> mice than in WT control mice, confirming that more monocytes were recruited into the liver.

      Additionally, we performed CellChat analysis on the single-cell transcriptomic data. As shown in updated Figures 4H-J, both the number of ligand-receptor pairs and the interaction strength among the eight cell types were significantly increased in Ripk1<sup>Δhep</sup> mice, particularly the interactions between macrophages and other cell types. Network analysis indicated that inflammation and proliferation signals were amplified in Ripk1<sup>Δhep</sup> mice. Consistent with the bulk RNA sequencing data, SAA signaling was upregulated in the hepatocytes of Ripk1<sup>Δhep</sup> mice (updated Figure 4K). SAA has been found to play a role in regulating immune responses and tumor development. Based on these findings, we speculate that fasting-induced liver injury in RIPK1 knockout mice may exacerbate the inflammatory response in liver tissue through enhanced SAA signaling. The above data analysis and interpretation are included in updated Figures 4 and S4 and in lines 421-443.

      (4) Related to Figure 5.

      What additional insights do the data from Figure 5 provide compared to the study published in Nat Comms, which demonstrated that RIPK1 regulates starvation resistance by modulating aspartate catabolism (PMID: 34686667)?

      Thank you very much for your constructive suggestion. As noted by the reviewer, this study (PMID: 34686667) primarily focuses on metabolomic analyses of Ripk1<sup>-/-</sup> neonatal mouse brain tissue and Ripk1<sup>-/-</sup> MEF cells. The authors propose that Ripk1 regulates starvation resistance by modulating aspartate catabolism.

      In our study, the global metabolic changes induced by fasting were monitored. Fasting-induced lipolysis in peripheral adipose tissue leads to hepatic lipid accumulation, and excessive deposition of free fatty acids has been shown to induce endoplasmic reticulum (ER) stress in the liver. Data from Figure 5 demonstrate that administering the ER stress inhibitor 4-PBA effectively mitigated fasting-induced liver injury and inflammatory responses in Ripk1<sup>Δhep</sup> mice. Our findings suggest that ER stress plays a critical role in fasting-induced liver injury and inflammation in Ripk1<sup>Δhep</sup> mice.

      (5) Related to Figure 6.

      The data presented in Figure 7 are complementary and do not introduce new mechanistic insights.

      Thank you very much for your insightful suggestion. As you mentioned, the AAV-TBG-Cre-mediated liver-specific RIPK1 knockout mice offer complementary validation of the results obtained from Ripk1<sup>Δhep</sup> mice. Moreover, the TBG promoter is active exclusively in mature hepatocytes, whereas the ALB promoter is active not only in mature hepatocytes but also in precursor cells and cholangiocytes. Therefore, we think that the inclusion of AAV-TBG-Cre further strengthens our finding that RIPK1 in hepatocytes is responsible for fasting-induced liver injury and inflammatory responses.

      (6) Related to Figure 7.

      The data from Figure 7 suggest that RIPK1 in hepatocytes is responsible for the observed damage. However, it has been previously demonstrated that inhibition of RIPK1 activity in macrophages protects against the development of MASLD (PMID: 33208891). One possible explanation for these findings could be that the overreaction of macrophages to fasting, coupled with the absence of RIPK1 in hepatocytes (an indirect effect), contributes to the observed damage. Considering this, complementing hepatocytes with a kinase-dead version of RIPK1 could be a valuable approach to further refine the molecular aspect of the study. This would allow for a more precise investigation into the specific role of RIPK1's scaffolding or kinase function in response to starvation in hepatocytes. Such experiments could provide additional insights into the mechanisms underlying the observed effects and help delineate the contributions of RIPK1 in different cell types to metabolic stress responses.

      Thank you very much for the constructive suggestion. We fully agree with the reviewer that RIPK1 kinase-inactive mutant mice allow the scaffolding and kinase functions of RIPK1 in the hepatocyte response to starvation to be distinguished. In accordance with this advice, we established a 12-hour fasting model using Ripk1<sup>WT/WT</sup> and Ripk1<sup>K45A/K45A</sup> mice, a previously established line in which the loss of RIPK1 kinase activity has been confirmed. As demonstrated in updated Supplementary Figure 2, these mice did not show significant liver damage or inflammatory responses after 12 hours of fasting. These findings suggest that the liver damage and inflammatory response induced by fasting in Ripk1<sup>Δhep</sup> mice may not depend on the kinase activity of RIPK1.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. analyzed the functional role of hepatocyte RIPK1 during metabolic stress, particularly its scaffold function rather than kinase function. They show that Ripk1 knockout sensitizes the liver to cell death and inflammation in response to short-term fasting, a condition that would not induce obvious abnormality in wild-type mice.

      Strengths:

      The findings are based on a knockout mouse model and supported by bulk RNA-seq and scRNA-seq. The work consolidates the complex role of RIPK1 in metabolic stress.

      Weaknesses:

      However, the findings are not novel enough because the pro-survival role of RIPK1 scaffold is well-established and several similar pieces of research already exist. Moreover, the mechanism is not very clear and needs additional experiments.

      We thank the reviewer for the encouraging comments and helpful suggestions. Here we conducted additional experiments including flow cytometry, western blotting, and using kinase-dead mutant mice to further investigate the underlying mechanisms. We carefully addressed every comment by the reviewer as indicated below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (7) I recommend that the authors consider reassessing their results, particularly with regards to elucidating the dialogue between macrophages and hepatocytes, as this could further strengthen the study's conclusions.

      Thank you very much for your constructive suggestion. We conducted additional experiments, including flow cytometry and western blotting, to reassess our findings. Furthermore, to clarify the interactions between cells, we employed CellChat for a more in-depth analysis of the single-cell sequencing results. In the revised manuscript, changes are highlighted in yellow. In this study, we demonstrated that the specific deletion of RIPK1 in hepatocytes exacerbated the liver's vulnerability to metabolic disturbances, such as short-term fasting and high-fat diet feeding, resulting in increased liver damage, apoptosis, inflammation, and compensatory proliferation. The data indicate that fasting-induced liver injury in mice lacking RIPK1 in hepatic parenchymal cells may exacerbate the inflammatory response in liver tissue through enhanced SAA signaling. In summary, we revealed a novel physiological role of RIPK1 as a scaffold in maintaining liver homeostasis during fasting and other nutritional disturbances.

      (8) It would be beneficial for the authors to address the minor weaknesses identified in the study, such as ensuring consistency in the number of animals analyzed across different graphs and enhancing the resolution of images to improve data clarity.

      Thank you for the suggestion. In the revised manuscript, we have addressed these minor weaknesses: we checked the consistency of the number of animals across the different graphs and enhanced the resolution of all images.

      (9) I encourage the authors to incorporate additional experiments, such as Western blotting and flow cytometry, to provide a more quantitative assessment of the observed effects and enhance the robustness of their conclusions.

      Thank you for your insightful suggestion. We completely agree with the reviewer that incorporating flow cytometry and western blotting would strengthen the robustness of our conclusions. We conducted flow cytometry analysis and western blotting, and the results are shown in updated Supplementary Figure 1, Figure 2, Figure 4 and Supplementary Figure 4.

      (10) Furthermore, the authors may consider conducting complementary experiments, such as rescue experiments involving complementing hepatocytes with a kinase-dead version of RIPK1, to further refine the molecular aspect of the study and elucidate the specific roles of RIPK1's scaffolding or kinase function in response to starvation.

      Thank you very much for your constructive suggestion. As shown in updated Supplementary Figure 2, we conducted fasting experiments using RIPK1 kinase-dead mice. These findings suggest that the liver damage and inflammatory response induced by fasting in Ripk1<sup>Δhep</sup> mice may not depend on the kinase activity of RIPK1.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      (11) What is the upstream signal for RIPK1? The study investigated the change induced by short-term fasting, which is a metabolic stress. Although RIPK1 knockout promotes cell death and inflammation, how it is involved in this condition is unclear. RIPK1 has never been reported as a metabolic sensor, and its function is typically downstream of TNFR1 as well as other death receptors such as Fas, TRAIL-R1, and TRAIL-R2. Thus, it's probable that metabolic stress induces the expression and secretion of some ligand of the above receptors. Although TNFα expression is upregulated at both the mRNA and protein levels, it cannot be concluded that TNFα is the upstream signal for RIPK1, because a difference in expression does not always imply a functional role. In addition, a recent study, which is also reference 33, reports that knockout of TNFR1/2 does not protect against 18 h liver ischemia, a condition that is similar to the present study. Therefore, the link between the metabolic fluctuation and RIPK1 function is elusive and should be addressed. The expression difference analysis should be extended to other relevant ligands. A functional study using neutralizing antibodies in RIPK1ΔHep mice is encouraged. At least, this should be discussed in the discussion section.

      Thank you very much for your insightful comments. The upstream signals of RIPK1 remain a significant area of scientific inquiry. Fasting, as one of the main causes of metabolic stress, is known to trigger a series of physiological changes, including but not limited to decreased blood glucose levels, hepatic glycogen depletion, increased production of hepatic glucose and ketone bodies, adipose tissue lipolysis, and the influx and accumulation of free fatty acids in the liver. It is well established that the elevated lipid influx and hepatic accumulation during fasting may cause lipotoxic stress in the liver. To investigate whether the elevated free fatty acid influx might act as the signal that induces cytotoxicity, we isolated primary hepatocytes but observed that a significant number of cells underwent spontaneous death during the isolation and perfusion processes. To address this question, we utilized CRISPR-Cas9 technology to generate Ripk1<sup>-/-</sup> AML12 cells, as illustrated in Author response image 1A.

      To mimic hepatic lipid accumulation induced by short-term fasting, we treated the cells with palmitic acid (PA) or oleic acid (OA) for 12 hours in vitro. Our results indicated a significant increase in cell death among Ripk1<sup>-/-</sup> AML12 cells after PA treatment compared to WT control cells (Author response image 1B). As shown in Author response image 1C, we also observed a marked increase in caspase-3 activity in Ripk1<sup>-/-</sup> AML12 cells following PA treatment.

      Collectively, our results highlight the crucial role of RIPK1 in hepatocytes in maintaining the liver's adaptive capacity to counteract lipotoxicity induced by metabolic stress. These in vitro results were not included in the manuscript; however, we addressed them in the discussion section (lines 593-597). If the reviewer so suggests, we would be glad to incorporate them into the manuscript.

      Author response image 1.

      (12) What is the exact relationship between ER stress and RIPK1? In Figure 5A and Figure 6B, Ripk1 knockout only slightly promotes the expression of ER stress markers. The evidence of RIPK1 leading to ER stress is limited in the literature and poorly supported in this study. Also in reference 33, the hypothesis is proposed that ER stress leads to death receptor upregulation and activation, which induces RIPK1 activation. Although the ER stress inhibitor showed good efficacy in rescue experiments, it could not determine whether RIPK1 deficiency leads to ER stress-associated phenotype or ER stress leads to death receptor activation and RIPK1 deficiency-associated phenotype. If RIPK1 deficiency leads to ER stress, the possible mechanism should be investigated.

      Thank you very much for your insightful comments. As the reviewer noted, the specific relationship between endoplasmic reticulum (ER) stress and RIPK1 remains unclear. However, our data, along with findings from other studies (Piccolis M et al., Mol Cell. 2019; Geng Y et al., Hepatol Int. 2021), suggest that fasting-induced lipolysis in peripheral adipose tissue leads to hepatic lipid accumulation. Additionally, excessive deposition of free fatty acids has been shown to induce ER stress in the liver. One possible explanation is that ER stress may trigger the upregulation and activation of death receptors, and the scaffold function of RIPK1 may play a protective and checkpoint role in this process. ER stress during fasting might therefore lie upstream of RIPK1. This could help explain why short-term fasting results in liver damage in Ripk1<sup>Δhep</sup> mice while control mice remain unaffected. Moreover, the inhibition of ER stress using 4-PBA can effectively alleviate this damage.

      Minor:  

      (13) The study starts directly from functional experiments. However, it should be firstly explored whether RIPK1 expression or activation is modulated in wild-type mice.

      Thank you very much for your insightful observation. Previous studies showed that RIPK1 deficiency in hepatocytes does not impact the growth and development of mice, indicating that RIPK1 is dispensable for proper liver development and homeostasis (Filliol A et al., Cell Death Dis. 2016). Furthermore, we did not observe any changes in RIPK1 levels in wild-type mice induced by fasting across different experimental batches. In our bulk transcriptomic analysis, the expression of RIPK1 was not changed before and after 12-hour fasting in Ripk1<sup>fl/fl</sup> mice. Therefore, we focused our attention on the function of RIPK1 and started our study directly with functional experiments.

      (14) Knockout of RIPK1 deprived both its scaffold function and kinase function. It is encouraged to explore whether blocking RIPK1 kinase activity influences the outcome of metabolic stress.

      Thank you for your insightful suggestion. To investigate the role of RIPK1 kinase activity in the response to metabolic stress, we added fasting experiments using RIPK1 kinase-inactive mice (updated Supplementary Figure 2). These experiments show that blocking RIPK1 kinase activity does not affect the outcome of metabolic stress.

      (15) In Figure 1, the number of TUNEL+ cells is about 2 times of c-casp3. What is the possible reason?

      Thank you for your careful reading. Indeed, the number of TUNEL<sup>+</sup> cells in Figure 1 is about twice that of cleaved-caspase-3<sup>+</sup> cells. There are two possible reasons. First, we speculate that this discrepancy may be attributed to the higher sensitivity of the TUNEL assay compared to the cleaved-caspase-3 assay. Second, the TUNEL assay detects DNA fragmentation, indicating that these cells are in a pre-apoptotic state or poised to undergo apoptosis. In contrast, cleaved-caspase-3 staining specifically identifies cells that have already committed to the apoptotic pathway. The TUNEL assay can detect all types of apoptosis, and the mechanisms of apoptosis may involve more than just cleaved caspase-3.

      (16) Infiltrated innate immune cells could lead to hepatocyte death. Is the hepatocyte death in this study partially caused by immune cells?

      Many thanks for the advice. As outlined in the response to the 11th comment from the second reviewer, our findings indicate that metabolic stress induced by short-term fasting is the primary cause of hepatocyte death. Additionally, we demonstrate that infiltrated innate immune cells may also play a partial role in hepatocyte death through subsequent cascade reactions.

      (17) Could the in vivo results be consolidated by in vitro experiments on primary mouse hepatocytes? This would be helpful to answer question 4.

      Thank you for your helpful comments. As described in the response to the 11th comment by the second reviewer, we attempted to conduct in vitro experiments using primary hepatocytes. However, during the isolation and perfusion processes, we observed that a significant number of cells underwent spontaneous death. To address this issue, we utilized CRISPR-Cas9 technology to generate Ripk1<sup>-/-</sup> AML12 cells, which showed a significant increase in cell death after palmitic acid (PA) treatment compared to WT control cells. We also observed a marked increase in caspase-3 activity in Ripk1<sup>-/-</sup> AML12 cells following PA treatment.

      (18) RIPK1 scaffold function is associated with NF-kB signal. Is NF-kB signal transduction influenced by Ripk1 deficiency? If so, to what extent does it contribute to the observed phenotype? If not, what is the direct downstream effect of Ripk1 deficiency?

      Thank you very much for your insightful perspective. As reported by Clucas J et al., RIPK1 serves as a scaffold for downstream NF-κB signaling through the ubiquitin chains generated by its ubiquitination (Clucas J et al., Nat Rev Mol Cell Biol. 2023). The deficiency of RIPK1 in hepatic parenchymal cells can disrupt NF-κB signaling and impair its pro-survival functions, resulting in increased cell death in response to stress. Our current findings suggest that the RIPK1-NF-κB axis serves as a crucial scaffold platform essential for the liver's adaptation to metabolic fluctuations. Any inappropriate inactivation or deletion of components within this scaffold disrupts the delicate balance between cell death, inflammation, and normal function, making the liver susceptible to metabolic changes, ultimately leading to liver damage, hepatic inflammation, and compensatory proliferation.

      (19) In Figure 6B, the 'RIP' should be changed to 'RIPK1'.

      Thank you for your careful observation. We have corrected "RIP" to "RIPK1" in updated Figure 6B.

      (20) For Western blot results, the blot height should be at least the lane width to reveal additional signals and the molecular weight as well as unspecific signals should be denoted.

      Thank you for your valuable advice. We appreciate your suggestions regarding the western blot results. We went through the previous western blot results and did not find any additional nonspecific signals. We added the molecular weights in updated Figure 5, Figure 6 and Supplementary Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary:

      In this manuscript, Fister et al. investigate how amputation and burn wounds affect sensory axonal damage and regeneration in a zebrafish model system. The authors discovered that burn injury results in increased peripheral axon damage and impaired regeneration. Convincing experiments show altered axonal morphology and increased Ca2+ fluxes as a result of burn damage. Further experimental proof supports that early removal of the burnt tissue by amputation rescues axonal damage. Burn damage was also shown to markedly increase keratinocyte migration and increase localized ROS production as measured by the dye Pfbsf. These responses could be inhibited by Arp 2/3 inhibition and isotonic treatment.

      Strengths: 

      The authors use state-of-the-art methods to study and compare transection and burn-induced tissue damage. Multiple experimental approaches (morphology, Ca2+ fluxing, cell membrane labeling) confirm axonal damage and impaired regeneration time. Furthermore, the results are also accompanied by functional response tests of touch sensitivity. This is the first study to extend the role of tissue-damage-related osmotic exposure beyond wound closure and leukocyte migration to a novel layer of pathology: axonal damage and regeneration. 

      Weaknesses: 

      The conclusions of the paper claiming a link between burn-induced epithelial cell migration, spatial redox signaling, and sensory axon regeneration are mainly based on correlative observations. Arp 2/3 inhibition impairs cell migration but has no significant effect on axon regeneration and restoration of touch sensitivity. 

      We agree with the reviewer. We have tried many experiments to address this question. The data show that Arp 2/3 inhibition with CK666 is an effective way to inhibit initial keratinocyte migration. However, later migration still proceeds. Interestingly, inhibiting only the early migration is sufficient to restore localized ROS production in the wound area in the first hour post-burn, even though it is not sufficient to prevent ROS accumulation over time. There is also a trend toward improved sensory neuron function late after this early treatment, but it is not statistically significant. We think it is likely that both migration and tissue-scale ROS influence the regeneration defect of sensory neurons after burn. The data using isotonic solution support this conclusion. We have tried many other ways to limit keratinocyte migration, including depletion of talin and expression of a dominant-negative Rac in basal epithelial cells, but these treatments were not compatible with survival of the fish after burn.

      Pharmacological or genetic approaches should be used to prove the role of ROS production by directly targeting the known H2O2 source in the system: DUOX. 

      We agree that pharmacologic or genetic approaches to directly manipulate ROS production would provide substantial support to the hypothesis that ROS, along with keratinocyte migration, is a main factor contributing to poor burn outcomes. To address this, we first tried using a morpholino to deplete DUOX. However, the combination of DUOX morpholino and burn injury was lethal to larvae. We also used pharmacologic inhibition of ROS production using DPI (Diphenyleneiodonium). With this treatment, ROS is inhibited for only the first hour post-burn, as treatment is lethal for longer periods of time. Burned larvae have marginally improved axon density and touch sensitivity, suggesting the importance of ROS in burn outcomes; however, the improvement was not statistically significant. It is likely that an increased effect would be observed with longer treatment, but treatment for more than 1 hour was toxic. We have added a supplemental figure with this new DPI data.

      While the authors provide clear and compelling proof that osmotic responses lie at the heart of the burn-induced axonal damage responses, they did not consider the option of further exploring any biology related to osmotic cell swelling. Could osmotic ATP release maybe play a role through excitotoxicity? Could cPLA2 activation-dependent eicosanoid production relate to the process? Pharmacological tests using purinergic receptor inhibition or blockage of eicosanoid production could answer these questions. 

      We agree that the role of osmotic cell swelling in the burn response is an interesting avenue for future study. However, we make use of isotonic treatment in this study specifically for its effect on keratinocyte migration and broad-scale wound healing. As a result, we feel that pursuing the biology of this swelling phenomenon is outside the scope of this paper.

      The authors provide elegant experiments showing that early removal of the burnt tissue can rescue damage-induced axonal damage, which could also be interpreted in an osmotic manner: tail fin transections could close faster than burn wounds, allowing for lower hypotonic exposure time. Axonal damage and slow regeneration in tail fin burn wounds could be a direct consequence of extended exposure time to hypotonic water. 

      We have done experiments using FM dye to test how long it takes burn and transection wounds to close (shown below). In these experiments, dye entry into wounded tissue is used as a readout of wound closure. Dye is only able to enter wounded tissue when the epithelial barrier is disrupted. Our data reveal that transections take approximately 10 minutes to fully close, while burns take approximately 20 minutes to close.

      Author response image 1.

      To test if this difference in wound closure time would have an effect on axon outcomes, we repeated, but slightly modified, the dual-wound experiment. We increased the amount of time the burn condition was exposed to hypotonic conditions by 10 additional minutes (by transecting burned tissue at 15 minutes post burn, shortly before closure) and compared axon outcomes to the 5 mpw control transection. These results show there was no difference in axon regeneration or function when secondary transection was performed at 5 or 15 minutes post burn, suggesting that increased exposure to hypotonic solution is not the reason for defects in axon outcomes after burn injury.

      Author response image 2.

      Reviewer #2 (Public Review): 

      This is an interesting study in which the authors show that a thermal injury leads to extensive sensory axon damage and impaired regrowth compared to a mechanical transection injury. This correlates with increased keratinocyte migration. That migration is inhibited by CK666 drug treatment and isotonic medium. Both restrict ROS signalling to the wound edge. In addition, the isotonic medium also rescues the regrowth of sensory axons and recovery of sensory function. The findings may have implications for understanding non-optimal re-innervation of burn wounds in mammals. 

      The interpretation of results is generally cautious and controls are robust. 

      Here are some suggestions for additional discussion: 

      The study compares burn injury which produces a diffuse injury to a mechanical cut injury which produces focal damage. It would help the reader to give a definition of wound edge in the burn situation. Is the thermally injured tissue completely dead and is resorbed or do axons have to grow into damaged tissue? The two-cut model suggests the latter. Also giving timescales would help, e.g. when do axons grow in relation to keratinocyte movement? An introductory cartoon might help. 

      We thank the reviewer for these insightful comments and questions. The burn wound is defined as the area that is directly damaged as a result of increased heat (labeled by FM dye entry), and the burn wound edge as the first line of healthy cells adjacent to the burned cells. These definitions have been added to the text to clarify the areas referenced. Recent experiments lead us to believe the wound area is composed almost completely of dead cells, but we are currently working to discover the fate of these dead cells as well as the wound adjacent cells that migrate to the wound edge after burn. As a result, we do not know whether axons grow into damaged tissue or if the damaged tissue is extruded, but we do see growth cone formation within a few hours after wounding suggesting the axons are actively trying to regenerate after a burn.

      Could treatment with CK666 or isotonic solution influence sensory axons directly, or through other non-keratinocyte cell types, such as immune cells? 

      We have done experiments looking at the density of caudal fin innervation in CK666, isotonic, or DPI treated fins. The axon density is unchanged in all these treatments compared to control treated larvae, so we do not believe these treatments affect axon health homeostatically. These data have been added to supplemental figure 3. Additionally, one of the benefits of the larval zebrafish burn model is the simplicity of the system – the epidermis is primarily composed of sensory axons, mesenchymal cells and keratinocytes. The burn environment is proinflammatory so it does promote immune cell recruitment, but we do not believe the immune cells are interacting directly with sensory axons besides clearing axonal debris. Previous papers by our lab have shown that peak immune cell recruitment occurs at 6 hpw, but they localize to the damaged tissue in the burn area and not the wound edge.

      Reviewer #3 (Public Review): 

      Fister and colleagues use regeneration of the larval zebrafish caudal fin to compare the effects of two modes of tissue damage-transection and burn-on cutaneous sensory axon regeneration. The authors found that restoration of sensory axon density and function is delayed following burn injury compared to transection. 

      The authors hypothesized that thermal injury triggers signals within the wound microenvironment that impair sensory neuron regeneration. The authors identify differences in the responses of epithelial keratinocytes to the two modes of injury: keratinocytes migrate in response to burn but not transection. Inhibiting keratinocyte migration with the small-molecule inhibitor of Arp2/3 (CK666) resulted in decreased production of reactive oxygen species (ROS) at early, but not late, time points. Preventing keratinocyte migration by wounding in isotonic media resulted in increased sensory function 24 hours after burn. 

      Strengths of the study include the beautiful imaging and rigorous statistical approaches used by the authors. The ability to assess both axon density and axon function during regeneration is quite powerful. The touch assay adds a unique component to the paper and strengthens the argument that burns are more damaging to sensory structures and that different treatments help to ameliorate this. 

      A weakness of the study is the lack of genetic and cell-autonomous manipulations. Additional comparisons between transection and burns, in particular with manipulations that specifically modulate ROS generation or cell migration without potentially confounding effects on other cell types or processes would help to strengthen the manuscript.

      The use of genetic and cell-autonomous approaches would strengthen our study; however, we were unable to do this because these approaches were lethal. Basal epithelial migration is necessary for embryonic development. We attempted to circumvent this by generating larvae transiently expressing a dominant-negative form of Rac, a protein crucial to the migratory process. The chimeric expression of the dominant-negative Rac was either damaging to the larvae or the mosaicism was too low to observe any effects on the migration phenotype.

      We also attempted a genetic approach to manipulate ROS production, as discussed above. We found that the DUOX morpholino was lethal to burned larvae. Finally, we attempted pharmacological inhibition of ROS production using the inhibitor DPI (Diphenyleneiodonium). With this treatment, burned larvae have marginally improved axon density and touch sensitivity, suggesting that dampening ROS may improve outcome. The DPI data have been added to the manuscript.

      In terms of framing their results, the authors refer to "sensory neurons" and "sensory axons" throughout the text - it should be made clear what type of neuron(s)/axon(s) are being visualized/assayed. Along these lines, a broader discussion of how burn injuries affect sensory function in other systems - and how the authors' results might inform our understanding of these injury responses - would be beneficial to the reader. 

      In summary, the authors have established a tractable vertebrate system to investigate different sensory axon wound healing outcomes in vivo that may ultimately allow for the identification of improved treatment strategies for human burn patients. Although the study implicates differences in keratinocyte migration and associated ROS production in sensory axon wound healing outcomes, the links between these processes could be more rigorously established. 

      The inconsistency between “neuron” and “axon” has been noted and the text has been corrected accordingly. “Neuron” is used when referring to the cell as a whole, while “axon” is used when referring to the sensory processes in the caudal fin. We added information about burn in the introduction as suggested: “While epithelial tissue is well adapted to repair from mechanical damage, burn wounds heal poorly. Thermal injury results in chronic pain and lack of sensation in the affected tissue, suggesting that an abnormal sensory neuron response contributes to burn wound pathophysiology.”

      We thank the reviewers for their comments.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      Suggested experiments: 

      (1) ROS measurements with the dye Pfbsf should be validated with more established ROS probes such as HyPer. 

      Pfbsf has been used previously as a readout of ROS production, and its use is documented in zebrafish (Maeda et al., Angew Chem Int Ed Engl, 2004, and Niethammer et al, Nature, 2009). These sources have been added as references when introducing Pfbsf to provide context for its use. The probe was validated and compared to HyPer in Niethammer’s 2009 paper. In our hands, we have used both probes and have similar results with tail transection.

      (2) To better support claims on ROS and H2O2 playing a central role in mediating axonal damage, the authors should consider pharmacological approaches such as rescue experiments with H2O2 and experiments using inhibitors such as DPI or apocynin. 

      While the above reagents and drugs have limitations and non-specific side effects, more convincing proof could result from genetic approaches including experiments on DUOX knockdown or knockout lines. 

      To further dissect the role of ROS in the burn response, we conducted experiments using DPI, a potent ROS inhibitor that is well-documented in the literature. We found that treatment with 20 µM DPI (1 hour pretreatment, 1 hour post-burn) marginally improved axon density when quantified at 24 hpw. Any higher dose, when combined with a burn, proved to be lethal. Longer treatment with DPI was also not tolerated.

      In addition to experiments with DPI, we attempted to burn larvae that were injected with DUOX morpholino. The combined use of burn and DUOX MO was lethal. We have dampened the conclusions and include the new data with the DPI in the revised manuscript.

      Minor corrections: 

      (1) A phrase/expression in the abstract is confusing: isotonic treatment does not "induce osmotic regulation". Cells exposed to hypo- or hypertonicity will respond by regulatory volume decrease or increase, respectively. Isotonic treatment maintains homeostasis. 

      We appreciate this point and agree with the distinction. Revisions have been made in the text accordingly.

      (2) Figures 4E and 5E would be better to show as an average of multiple experiments with statistical significance. 

      The purpose of figures 4E and 5E is to demonstrate changes in fluorescence intensity and localization of ROS using the representative time series shown in 4D and 5D. The figure legend has been updated accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      Figure 3D How can one distinguish between the two cellular elements that randomly meet or that there is actual coordination? Can the interactions be quantified? It is also unclear what the authors mean by "sensory neuron movement". The authors show that the neuronal cell bodies stay in their position, so only the axons change position. Do they do this by growth, i.e. the neuronal growth cones follow the keratinocytes or do keratinocytes displace the axon shafts? 

      We have included supplemental movies that address this question in the new uploaded document. Figure 3D is comprised of still images taken from supplemental movie 2, which is a timelapse of keratinocytes/axons moving together after a burn injury. This movie clearly shows keratinocytes and their ensheathed axons moving simultaneously, so keratinocytes are mechanically pulling sensory axon shafts with them. We have revised the text to say axon movement, not sensory neuron movement.

      Over the time course of axonal movement (1 hour post-burn), it is not possible that neuronal growth cones contribute to movement, as this is too slow – previous work by other labs has shown that it takes several hours for axons to fully regenerate into amputated tissue, with movement not even noticeable until about 3 hours post-wound (Rieger and Sagasti, PLOS Biology, 2011).

      Regarding the second point, “neuron” vs. “axon” is an inconsistency in the text that has been corrected. “Neuron” is used when referring to the cell as a whole, “axon” is used when referring to the processes that innervate the caudal fin. The axons are physically pulled along with keratinocytes as they migrate after burn application. From our observations, growth cones appear closer to the wound site after the movement has stopped.

      Figure 4G It is surprising that the visual differences in the distribution of values are not statistically significant. 

      The distribution of values in 4G was wide, which is why there is no statistically significant difference; we were also surprised by this result. All statistics were done with a statistician and applied rigorous criteria for significance.

      Figure 4H The images seem to show a difference, whereas the quantification does not. I suggest choosing more representative images. 

      Figure 4H has been updated to include a more representative image of axon patterning with CK666 treatment.

      Figure 6A The text states that axon damage in the control and isotonic condition is comparable, yet in the image, it appears that the damage in the isotonic treatment at 0 hpw is more distal. 

      This is a good observation that we consistently see in isotonic-treated fish after burn. Axon damage localizes more proximally in isotonic-treated samples because the keratinocytes distal to the notochord are likely dead, and the axons innervating those cells are likely immediately destroyed upon burn application. As a result, the distal axons are not present to express GCaMP. We believe isotonic treatment allows keratinocytes to live slightly longer, so axon damage is therefore prevented for longer. This is also the focus of continuing work to further understand the burn microenvironment.

      Finally, the materials section could mention bias mitigation measures, e.g. withholding the treatment condition from the experimenter in the touch test. 

      We minimized bias in experiments whenever possible, and the conservative statistical measures that were applied to our data further reduce the likelihood of false significance.

      Reviewer #3 (Recommendations For The Authors): 

      - Line numbers would have facilitated reviewer feedback. 

      - Supplementary movies were missing in the submission. 

      The lack of supplementary movies upon submission was a mistake and the movies have been uploaded along with the revised manuscript.

      Introduction: 

      - Pg. 3: "In response to tissue damage, sensory neurons undergo rapid and localized axonal degeneration 4,5." Not sure reference 4 (Reyes et al) is appropriate here as this study was not in the context of tissue damage. 

      We have revised this section as suggested by the reviewer.

      Results: 

      - The expected expression pattern/localization of several transgenes was unclear. Please clearly state what cell type(s) each should label. For example, pg. 5 - "We next sought to further investigate sensory neuron function in burned tissue. For this, we assessed wound-induced axonal damage using zebrafish larvae that express the calcium probe GCaMP." Where is GCaMP expressed? 

      The manuscript has been updated to include expression patterns for the included transgenes – in this mentioned case, GCaMP is expressed in neurons under the pan-neuronal Elavl3 promoter.

      - Introducing the GCaMP labeling could use some clarification. Pg. 5 - "As shown previously by other groups, GCaMP labels degenerating neurons in real time35." This is confusing. Do the authors mean that GCaMP increases immediately prior to Wallerian degeneration as shown by Vargas et al. (PMID: 26558774)? 

      Sustained elevated calcium levels are associated with axon damage. Previous work from other labs has shown that calcium influx follows axon injury (Ziv and Spira, EJN 1993, Adalbert et al., Neuroscience 2012). In these experiments, GCaMP-positive puncta indicate axon damage. We have revised the manuscript to address this critique.

      The Elavl3-GCaMP5 transgenic line will label when calcium levels increase in neurons. However, given the parameters used for imaging in our study (20x magnification, 100 ms exposure, and collection speed every 30 seconds for timelapses), we believe that only sufficiently large increases in calcium that are indicative of cell damage, and not physiological function, are being visualized.

      - Figure 1E - Are these panels images of the same fish? Please specify in the legend. 

      Figure 1E is comprised of one transected and one burned larva each, live-imaged over the course of six hours. The legend has been updated to include this information.

      - Figure 1F - How was the damage area measured? Consider doing this measurement over time to match Figure 1E. 

      Axon damage area measurements were performed similar to axon density measurements – maximum intensity z-projected confocal images of the caudal fin were generated using FIJI. For all experiments, the caudal fin area posterior to the notochord was outlined using the Polygon tool and measured to obtain a total surface area ROI. Axon fragments inside the outlined area were manually thresholded so all fragments posterior to the notochord were labeled and no saturated pixels were present, and an area measurement of these thresholded pixels was taken. We have added a section describing these measurements in the Methods section under “Axon damage quantification.”
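      As an illustration only, the area measurement described above can be sketched in Python. This is not the FIJI workflow used in the study: the function name `damage_area`, the boolean ROI mask, the intensity threshold, and the `pixel_area` calibration are all assumptions introduced for the example, and the toy image values are made up.

```python
def damage_area(image, roi_mask, threshold, pixel_area=1.0):
    """Return (roi_area, damaged_area) in the units of pixel_area.

    image      -- 2D list of pixel intensities (e.g. a max-intensity projection)
    roi_mask   -- 2D list of booleans, True inside the outlined fin region
    threshold  -- hypothetical manual intensity threshold for axon fragments
    pixel_area -- hypothetical calibration: area covered by one pixel
    """
    roi_pixels = 0
    damaged_pixels = 0
    for image_row, mask_row in zip(image, roi_mask):
        for value, inside in zip(image_row, mask_row):
            if not inside:
                continue  # ignore everything outside the outlined ROI
            roi_pixels += 1
            if value >= threshold:
                damaged_pixels += 1
    return roi_pixels * pixel_area, damaged_pixels * pixel_area


# Toy 3x4 image; the middle two columns stand in for the outlined region.
image = [
    [0, 10, 200, 0],
    [0, 180, 30, 0],
    [0, 20, 220, 0],
]
roi_mask = [
    [False, True, True, False],
    [False, True, True, False],
    [False, True, True, False],
]
roi_area, damaged = damage_area(image, roi_mask, threshold=100, pixel_area=0.25)
```

      Counting thresholded pixels only inside the mask mirrors the idea of restricting the measurement to the region posterior to the notochord; the threshold itself remains a manual choice.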

      - Pg. 5 - When introducing the ngn1 MO - please state the expected phenotype and cite the appropriate background literature. 

      The ngn1 morpholino was cited in the Methods section with the appropriate literature (Cornell and Eisen, Development, 2002), from which we got the morpholino sequence. We thank the reviewer for pointing out the need for more introduction and clarification in the main text, so the ngn1 morpholino has been discussed in greater depth and cited in the main text as well using the same citation.

      - The two-wound model is an elegant approach but could be more clearly described in the main text. 

      An improved explanation of the two-wound experiment has been added to the text.

      - For Figure 3, it would be helpful to have a schematic of the anatomy illustrating the relative positions of axons and epidermal cell types. 

      - Figure 3C - should an additional control here be transected? Given that the krt4:lifeact transgene labels both layers of the epidermis, how were the superficial and basal keratinocytes separated? Interpretation of this section should be carefully worded. The authors state that "...suggesting that the superficial keratinocytes are being pulled by the motile basal keratinocytes" (pg.7 ) but isn't another possibility that the superficial cells are stationary? 

      It is correct that the krt4:lifeact transgene labels both layers of keratinocytes, which together span 20-30 microns. These layers were separated from the same z-stack collected by confocal imaging. The first z-slice and last z-slice of the same stack were separated using FIJI and pseudocolored to appear as different colors. This clarification has been added to the Methods.

      Prior observations with the krt4:lifeact and krt4:utrch (figure 3A) transgenic lines reveal that both keratinocyte layers will move distally after burn application.

      - Pg. 7 - "The axons of sensory neurons are ensheathed within actin-rich channels running through basal keratinocytes 50,51." ref 51 is a C. elegans paper which does not have basal keratinocytes.

      This was in error. The correct reference has replaced reference 51 (O’Brien, J Comp. Neurol., 2012), in which electron microscopy is used to document the development of two layers of epithelial cells that also ensheath sensory neurons in a protective manner similar to glial cells in the central nervous system.

      - Figures S1E and F - the authors state that RB and DRG soma don't move. However, it was unclear from the figure panels and legend whether the authors imaged neurons that actually innervate the caudal fin (rather than some other region of the animal). Please clarify. For comparison, Fig S1F needs a pre-injury image to be meaningful. 

      The imaged cell bodies were those in the posterior trunk region, which are responsible for innervating the posterior sections of the fish including the caudal fin. From our observations, there was no movement of neuronal cell bodies after the burn.

      - Figure 5 title - can the authors clarify what aspect of this figure relates to "sustained epidermal damage" 

      The figure 5 title has been updated in response to the reviewer comments.

      - Figure 6 - is touch sensitivity really "restored" as the authors suggest? Alternatively, sensitivity may never be lost in isotonic treatment. Or the loss may be delayed? 

      We have modified the text accordingly by updating our phrasing – “restored” has been replaced with “improved” to indicate benefit over time.

      - Can the authors further disentangle the effects of keratinocyte migration, ROS, and isotonic treatment on axon regeneration? For example, would the addition of CK666 to the Isotonic +1 hpw treatment improve axon regeneration? Can the authors directly manipulate ROS signaling (e.g., through exogenous addition of H2O2 or duox1 MO) to alter regeneration outcomes in their wounding assays? 

      See the comments above.

      - Figure 6 title - consider removing or clarifying the word "excessive" here 

      The title has been revised according to the reviewer suggestion.

      - hpw vs hpb were used inconsistently throughout the text 

      The manuscript has been revised to use “hpw” when referring to the timeframe after injury application.

      Methods: 

      - Zebrafish transgenics are missing allele names 

      References: 

      - Many mistakes were noted in this section e.g., journal names missing, wrong authors, typos, DOIs misformatted 

      The references section has been corrected to use formatting consistent with APA citation and eLife preferred guidelines.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The authors of the study investigated the generalization capabilities of a deep learning brain age model across different age groups within the Singaporean population, encompassing both elderly individuals aged 55 to 88 years and children aged 4 to 11 years. The model, originally trained on a dataset primarily consisting of Caucasian adults, demonstrated a varying degree of adaptability across these age groups. For the elderly, the authors observed that the model could be applied with minimal modifications, whereas for children, significant fine-tuning was necessary to achieve accurate predictions. Through their analysis, the authors established a correlation between changes in the brain age gap and future executive function performance across both demographics. Additionally, they identified distinct neuroanatomical predictors for brain age in each group: lateral ventricles and frontal areas were key in elderly participants, while white matter and posterior brain regions played a crucial role in children. These findings underscore the authors' conclusion that brain age models hold the potential for generalization across diverse populations, further emphasizing the significance of brain age progression as an indicator of cognitive development and aging processes.

      Strengths: 

      (1) The study tackles a crucial research gap by exploring the adaptability of a brain age model across Asian demographics (Chinese, Malay, and Indian Singaporeans), enriching our knowledge of brain aging beyond Western populations.

      (2) It uncovers distinct anatomical predictors of brain aging between elderly and younger individuals, highlighting a significant finding in the understanding of age-related changes and ethnic differences.

      Weaknesses: 

      (1) Clarity in describing the fine-tuning process is essential for improved comprehension.

      (2) The analysis often limits its findings to p-values, omitting the effect sizes crucial for understanding the relationship with cognition.

      (3) Employing a predictive framework for cognition using brain age could offer more insight than mere statistical correlations.

      (4) Expanding the study's scope to evaluate the model's generalisability to unseen Caucasian samples is vital for establishing a comparative baseline.

      In summary, this paper underscores the critical need to include diverse ethnicities in model testing and estimation.

      Reviewer #1 (Recommendations for the authors): 

      Comment #1 - Fine-Tuning Process Clarity: Enhanced clarity in the fine-tuning process documentation is crucial for understanding how models are adapted to new datasets. This involves explaining parameter adjustments and choices, which facilitates replication and application in further research.

      We thank Reviewer #1 for this pertinent point. As advised, we have added a Supplementary Methods section with more details on the finetuning process. This includes the addition of Supplementary Figure S6, which shows examples of learning curves that helped inform our parameter adjustments and choices. We have added a reference to this section in Section 5.2 of the Methods.

      Comment #2 - Effect Sizes Reporting: The emphasis on reporting effect sizes alongside p-values addresses the need to quantify the strength of observed effects, particularly the relationship between brain age and cognition. Effect sizes provide insights into the practical significance of findings, crucial for clinical and practical applications.

      We thank Reviewer #1 for raising this important comment. As suggested, we have added standardized regression coefficients (as measures of effect size) alongside p-values in Figures 3 – 4, Supplementary Figures S2 – S4, Supplementary Tables S4 – S15, and the text of Sections 2.2 – 2.3 of the Results. We have additionally added 95% confidence intervals to Supplementary Tables S4 – S15.

      Comment #3 - Predictive Framework for Cognition: Adopting a predictive framework for cognition using brain age moves the research from mere correlation to actionable prediction, offering potentials based on predictive analytics.

      We thank Reviewer #1 for this insightful suggestion. Adopting a predictive framework would certainly be a useful and exciting avenue for the application of brain age. However, we note that the current study was primarily interested in the generalizability and interpretability of brain age in Asian children and older adults, as well as the added value of longitudinal measures of brain age. Thus, we believe our correlation-based analysis effectively demonstrated that deviations of brain age from chronological age were not merely random errors, but were informative of cognition. Furthermore, ongoing changes to these deviations were informative of future cognition. This helps to establish the brain age gap as a biomarker for aging, independent of chronological age. Additionally, we expect that the accurate prediction of future cognition would require a multitude of factors, in addition to T1-based brain age, as well as a large sample size to train and test. We believe such a dataset would be a promising avenue for future work, but it is outside the scope of the current study.

      Nonetheless, we were able to conduct a preliminary analysis using the current longitudinal data from SLABS and GUSTO. We extracted the same variables used in the original analyses of future cognition, corresponding to Figures 3D and 4B in the main text. To implement a predictive framework, we split the data into 10 stratified cross-validation folds. We also used kernel ridge regression (KRR) as the predictive model, as it has previously shown promising performance in behavioral and cognitive prediction [1]. We used a cosine kernel and nested 5-fold cross-validation to pick the optimal regularization strength (alpha).
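      For readers unfamiliar with the approach, a minimal sketch of kernel ridge regression with a cosine kernel, with the regularization strength chosen by an inner cross-validation loop, is given below. It is a plain-Python illustration under stated assumptions, not the pipeline used for these analyses: the helper names, the interleaved (non-stratified) fold assignment, the candidate alpha grid, and the toy data are all hypothetical.

```python
import math

def cosine_kernel(a, b):
    """Cosine similarity between two feature vectors (assumes non-zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def krr_fit(X, y, alpha):
    """Dual coefficients solving (K + alpha * I) coef = y."""
    K = [[cosine_kernel(a, b) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += alpha
    return solve(K, y)

def krr_predict(X_train, coef, X_new):
    """Prediction is a kernel-weighted sum over the training samples."""
    return [sum(c * cosine_kernel(x, xt) for c, xt in zip(coef, X_train))
            for x in X_new]

def choose_alpha(X, y, alphas, n_folds=5):
    """Inner loop: pick the alpha with the lowest cross-validated squared error."""
    best_alpha, best_err = None, float("inf")
    for alpha in alphas:
        err = 0.0
        for fold in range(n_folds):
            test_idx = [i for i in range(len(X)) if i % n_folds == fold]
            train_idx = [i for i in range(len(X)) if i % n_folds != fold]
            X_tr = [X[i] for i in train_idx]
            coef = krr_fit(X_tr, [y[i] for i in train_idx], alpha)
            preds = krr_predict(X_tr, coef, [X[i] for i in test_idx])
            err += sum((p - y[i]) ** 2 for p, i in zip(preds, test_idx))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha

# Toy data standing in for the training portion of one outer fold.
X = [[1.0, 0.2 * i] for i in range(10)]
y = [0.5 * i for i in range(10)]
best_alpha = choose_alpha(X, y, alphas=[0.01, 0.1, 1.0])
coef = krr_fit(X, y, best_alpha)
train_pred = krr_predict(X, coef, X)
```

      In a nested scheme, `choose_alpha` would be run on the training portion of each outer fold only, so that the held-out fold never influences the choice of alpha.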

      To investigate the added value of BAG and longitudinal changes in BAG, we compared 3 predictive models for each cognitive domain. The baseline model consisted of the demographic covariates used in the original analyses (i.e. chronological age, sex, and years of education for older adults). A second model combined demographics with baseline BAG, and the third model incorporated demographics, baseline BAG, and the (early) annual rate of change in BAG. Predictions were extracted from each test fold, and performance was measured by the correlation between test predictions and actual values of future cognition (or change in cognition). Models were statistically compared using the corrected resampled t-test for machine learning models [1], [2], [3]. The Benjamini-Hochberg procedure was used to correct for multiple comparisons.
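      The two statistical steps named above can be sketched as follows. The snippet is a hedged illustration rather than the code used here: it computes only the corrected resampled t statistic of Nadeau and Bengio (the p-value would then come from a t distribution with k - 1 degrees of freedom, which is omitted), and the function names and example numbers are hypothetical.

```python
import math

def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t statistic for per-fold performance differences.

    diffs   -- difference in performance between two models, one value per fold
    n_train -- number of training samples per fold
    n_test  -- number of test samples per fold
    The variance is inflated by n_test / n_train to account for the overlap
    between training sets across folds.
    """
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)
    return mean / math.sqrt((1.0 / k + n_test / n_train) * var)

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-fold differences in correlation between two models.
diffs = [0.02, 0.04, 0.03, 0.05, 0.01]
t_stat = corrected_resampled_t(diffs, n_train=90, n_test=10)

# Hypothetical raw p-values for four cognitive outcomes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      The correction makes the statistic smaller in magnitude than a naive paired t statistic on the same differences, which is the intended conservative behavior.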

      Author response image 1 shows the prediction results for SLABS and GUSTO. Notably, adding the early change in BAG significantly improves the prediction of future change in executive function in SLABS. There is also an improvement in predicting the future inhibition score in GUSTO, but this is not significant after multiple comparison correction. Encouragingly, these are the same domains that showed significant associations with the change in BAG in the original analyses. This suggests that longitudinal brain age continues to contribute information, independent of baseline factors, in a predictive framework. We hope that future work can expand on this analysis with, for instance, larger sample sizes, more varied and informative predictors, and state-of-the-art prediction methods, in order to establish actionable predictions of future cognition.

      Author response image 1.

      Predictive framework for cognition similarly suggests value of longitudinal change in BAG. Prediction performance (Pearson's correlation) of KRR across future cognitive outcomes. Each boxplot shows the distribution of performance over cross-validation folds. Model performances are statistically compared for each outcome. Significant outcomes from the original analyses are bolded. (A) Results for SLABS using the early change in BAG and future change in cognitive scores (non-overlapping). Early change in BAG again shows benefit for predicting future change in executive function. (B) Results for GUSTO using the early change in BAG (from 4.5-7.5 years old) and future cognitive score (at 8.5 years old). Early change in BAG again shows benefit for predicting future inhibition, but it is not significant after multiple comparison correction. Key - **: p < 0.01; * (ns): p < 0.05 but p<sub>corr</sub> > 0.05 after multiple comparison correction; ns: p > 0.05

      Comment #4 - Generalizability to Unseen Caucasian Samples: Evaluating the model's performance on unseen (longitudinal) Caucasian samples is important for benchmarking.

      We thank Reviewer #1 for this important comment. We agree that generalizability should be benchmarked against performance on unseen Caucasian samples. In the SFCN model paper [4], they conducted an out-of-sample test on unseen Caucasian samples from ages 13 to 95. In this age range, they reported a high correlation (r = 0.975) and low MAE (MAE = 3.90). This favorable generalization performance was verified in adults by independent evaluations [5], [6]. This is also in line with what we observed in Asian older adults, taking into account the different age ranges and sample sizes involved [7].

      However, this also highlights the difficulty in evaluating on younger ages in the range of GUSTO (4.5 – 10.5 years old). Most accessible developmental datasets (e.g. HBN, PING) were already included in model training, preventing an unbiased evaluation on these samples. Datasets such as PNC and ABCD were not included in training, but they primarily consist of an older age range than GUSTO. Holm et al. [8] previously tested the SFCN model in ABCD and reported satisfactory performance (low MAE) from 9 – 13 years old. However, to the best of our knowledge, there are no reported generalization results (for any ethnicity) from 4.5 – 7.5 years old, which is where we found the most performance degradation in GUSTO. We are also not aware of any datasets in this age range we could access to test this, unfortunately, but it would be an important area for future work.

      While benchmarking in Caucasian children is difficult, we were able to conduct a preliminary analysis with older adults using the ADNI dataset (which was not included in the model training [4]). We selected a longitudinal subset with cognitive data available and no dementia at baseline (N = 137). We used composite cognitive scores covering memory, executive function, language, and visuospatial function [9], [10], [11]. We followed the same methodology (e.g. preprocessing, finetuning, statistical analysis) as the main analyses on EDIS, SLABS, and GUSTO. To maximize the data available, we tested associations with future cognition (taken at the last available time point), similar to GUSTO. We again included chronological age, sex, and years of education as demographic covariates.

      Author response image 2 shows the brain age predictions for the pretrained and finetuned models on ADNI. Similar to Singaporean older adults, the pretrained model performs well, producing a high correlation (r = 0.8053; compared to r = 0.7389 for EDIS and r = 0.8136 for SLABS) and somewhat low MAE (MAE = 4.9735; compared to MAE = 3.9895 for EDIS and MAE = 3.4668 for SLABS). After finetuning, the MAE improves (MAE = 3.6837; compared to MAE = 3.3232 for EDIS and MAE = 3.2653 for SLABS) with a similar correlation (r = 0.7854; compared to r = 0.7445 for EDIS and r = 0.8138 for SLABS). This suggests that generalization to unseen Singaporean older adults is in line with the generalization to unseen Caucasian older adults.
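      For clarity, the two performance metrics quoted above, together with the brain age gap (taken here as predicted age minus chronological age), can be computed as in the sketch below. The function names and the toy ages are hypothetical and are not drawn from ADNI, EDIS, or SLABS.

```python
import math

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and chronological age."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def brain_age_gap(predicted, actual):
    """BAG per participant: positive values mean the brain looks older."""
    return [p - a for p, a in zip(predicted, actual)]

# Made-up ages (years) for five participants.
chronological = [60, 65, 70, 75, 80]
predicted = [62, 64, 73, 74, 85]

mae = mean_absolute_error(predicted, chronological)
r = pearson_r(predicted, chronological)
bag = brain_age_gap(predicted, chronological)
```

      Because the correlation depends on the age range of the sample while MAE does not, the two metrics are best read together when comparing cohorts.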

      Author response image 2. 

      Brain age predictions on unseen Caucasian sample of older adults. Predictions from the A) pretrained and B) finetuned brain age models on ADNI participants. Compare to Figure 2 of the main text.

      For the associations with future cognition, we again find that baseline BAG does not associate with future cognition (Author response tables 1 and 2). However, encouragingly, we find that the early annual rate of change in BAG does associate with future memory, which is significant after multiple comparison correction for the finetuned model (Author response tables 3 and 4). This suggests a degree of replicability to the original results, but interestingly, in a different domain (memory vs. executive function). In contrast to SLABS, which consists of healthy older adults recruited from the community, ADNI consists of participants at risk of AD recruited from memory clinics. Thus, this difference in domain could be due to factors such as a stronger signal for memory in the testing battery or greater variations in memory function and decline. However, it could also reflect other population differences between ADNI and SLABS. This is an intriguing area for future study, ideally with larger sample sizes and more diverse populations included.

      Author response table 1.

      Linear relationship between pretrained baseline BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      Author response table 2. 

      Linear relationship between finetuned baseline BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      Author response table 3.

      Linear relationship between pretrained change in BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      Author response table 4. 

      Linear relationship between finetuned change in BAG and future cognitive score in ADNI. Compare to Supplementary Tables S4 – S15 of the original text.

      References

      (1) L. Q. R. Ooi et al., “Comparison of individualized behavioral predictions across anatomical, diffusion and functional connectivity MRI,” NeuroImage, vol. 263, p. 119636, Nov. 2022, doi: 10.1016/j.neuroimage.2022.119636.

      (2) C. Nadeau and Y. Bengio, “Inference for the Generalization Error,” Mach. Learn., vol. 52, no. 3, pp. 239–281, Sep. 2003, doi: 10.1023/A:1024068626366.

      (3) R. R. Bouckaert and E. Frank, “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms,” in Advances in Knowledge Discovery and Data Mining, H. Dai, R. Srikant, and C. Zhang, Eds., Berlin, Heidelberg: Springer, 2004, pp. 3–12. doi: 10.1007/978-3-540-24775-3_3.

      (4) E. H. Leonardsen et al., “Deep neural networks learn general and clinically relevant representations of the ageing brain,” NeuroImage, vol. 256, p. 119210, Aug. 2022, doi: 10.1016/j.neuroimage.2022.119210.

      (5) R. P. Dörfel et al., “Prediction of brain age using structural magnetic resonance imaging: A comparison of accuracy and test-retest reliability of publicly available software packages,” Neuroscience, preprint, Jan. 2023. doi: 10.1101/2023.01.26.525514.

      (6) J. L. Hanson, D. J. Adkins, E. Bacas, and P. Zhou, “Examining the reliability of brain age algorithms under varying degrees of participant motion,” Brain Inform., vol. 11, no. 1, p. 9, Apr. 2024, doi: 10.1186/s40708-024-00223-0.

      (7) A.-M. G. de Lange et al., “Mind the gap: Performance metric evaluation in brain-age prediction,” Hum. Brain Mapp., vol. 43, no. 10, pp. 3113–3129, Jul. 2022, doi: 10.1002/hbm.25837.

      (8) M. C. Holm et al., “Linking brain maturation and puberty during early adolescence using longitudinal brain age prediction in the ABCD cohort,” Dev. Cogn. Neurosci., vol. 60, p. 101220, Feb. 2023, doi: 10.1016/j.dcn.2023.101220.

      (9) P. K. Crane et al., “Development and assessment of a composite score for memory in the Alzheimer’s Disease Neuroimaging Initiative (ADNI),” Brain Imaging Behav., vol. 6, no. 4, pp. 502–516, Dec. 2012, doi: 10.1007/s11682-012-9186-z.

      (10) L. E. Gibbons et al., “A composite score for executive functioning, validated in Alzheimer’s Disease Neuroimaging Initiative (ADNI) participants with baseline mild cognitive impairment,” Brain Imaging Behav., vol. 6, no. 4, pp. 517–527, Dec. 2012, doi: 10.1007/s11682-012-9176-1.

      (11) S.-E. Choi et al., “Development and validation of language and visuospatial composite scores in ADNI,” Alzheimers Dement. Transl. Res. Clin. Interv., vol. 6, no. 1, p. e12072, 2020, doi: 10.1002/trc2.12072.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript reports the investigation of PriC activity during DNA replication initiation in Escherichia coli. It is reported that PriC is necessary for the growth and control of DNA replication initiation under diverse conditions where helicase loading is perturbed at the chromosome origin oriC. A model is proposed where PriC loads helicase onto ssDNA at the open complex formed by DnaA at oriC. Reconstituted helicase loading assays in vitro support the model. The manuscript is well-written and has a logical narrative.

      Thank you for understanding this study.

      Major Questions/Comments:

      An important observation here is that a ΔpriC mutant alone displays under-replication, suggesting that this helicase loading pathway is physiologically relevant. Has this PriC phenotype been reported previously? If not, would it be possible to confirm this result using an independent experimental approach (e.g. marker frequency analysis or fluorescent reporter-operator systems)?

      We thank Reviewer 1 for this comment. This study provides the first direct evidence for PriC’s role in the initiation of chromosome replication. Because the change in the oriC copy number of ∆priC cells under non-stressed conditions is only slight, the resolution of the suggested methods is not high enough to distinguish the oriC copy numbers of priC<sup>+</sup> (WT) and ∆priC cells. Thus, to corroborate the ∆priC phenotype, we additionally used flow cytometry to analyze priC<sup>+</sup> and ∆priC cells growing under various nutritional and thermal conditions.

      As shown in Figure 2-figure supplement 1 of the revised version, the fraction of cells with non-2<sup>n</sup> oriC copies was slightly higher in ∆priC cells than in priC<sup>+</sup> cells. Furthermore, when grown in M9 minimal medium at 37˚C, ∆priC mutant cells exhibited slightly reduced ori/mass values. These results support the idea that inhibition of replication initiation occurs at low frequency even in the WT dnaA and dnaC background, and that PriC function is necessary to ensure normal replication initiation. The related descriptions have been revised accordingly.
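      The non-2<sup>n</sup> measure mentioned above can be illustrated with a minimal sketch. In run-out flow cytometry, cells that initiate all origins synchronously end up with 1, 2, 4, 8, ... chromosome equivalents, so cells with other copy numbers indicate asynchronous or failed initiation. The function names and the cell counts below are hypothetical and are not taken from the manuscript.

```python
def is_power_of_two(n):
    """Return True for 1, 2, 4, 8, ... (synchronous initiation outcomes)."""
    return n > 0 and (n & (n - 1)) == 0


def non_2n_fraction(cell_counts):
    """Fraction of cells whose oriC copy number is not a power of two.

    cell_counts maps an oriC copy number to the number of cells
    observed with that copy number.
    """
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("no cells counted")
    asynchronous = sum(
        count for copies, count in cell_counts.items()
        if not is_power_of_two(copies)
    )
    return asynchronous / total


# Hypothetical histogram (illustration only, not data from the study).
example_counts = {1: 50, 2: 600, 3: 40, 4: 300, 5: 5, 6: 5}
print(non_2n_fraction(example_counts))
```

      In this made-up histogram, the cells with 3, 5, or 6 origins are the ones counted as non-2<sup>n</sup>.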

      Is PriA necessary for the observed PriC activity at oriC? Is there evidence that PriC functions independently of PriA in vivo?

      As described in the Introduction of the original manuscript, PriA is a 3’-to-5’ helicase that specifically binds forked DNA bearing the 3’-end of the nascent DNA strand. Thus, the structural specificity for target DNA differs fundamentally between PriA and PriC. Consistent with this, our in vitro data indicate that PriC alone is sufficient to rescue abortive helicase loading at oriC (Figure 7), indicating that PriA is in principle unnecessary for PriC activity at oriC. In addition, as described in the Introduction, PriC can interact with ssDNA to reload DnaB (Figure 1E). Nevertheless, the possibility that PriA participates in the PriC-dependent rescue of DnaB loading at oriC in vivo cannot be completely excluded. Elucidating this possibility is beyond the scope of the present study and should be addressed in the future. An additional explanation has been included in the Discussion of the revised version.

      Is PriC helicase loading activity in vivo at the origin direct (the genetic analysis leaves other possibilities tenable)? Could PriC enrichment at oriC be detected using chromatin immunoprecipitation?

      These are advanced questions about the genomic dynamics of PriC. Given that PriC facilitates DnaB reloading at stalled replication forks (Figure 1E) (Heller and Marians, Mol Cell., 2005; Wessel et al., J Biol Chem, 2013; Wessel et al., J Biol Chem, 2016), PriC might interact with the whole genome, and its localization might not necessarily show a preference for oriC in growing cells. These questions are interesting but beyond the scope of the present study and should be addressed in a future study.

      Reviewer #2 (Public review):

      This is a great paper. Yoshida et al. convincingly show that DnaA does not exclusively do loading of the replicative helicase at the E. coli oriC, but that PriC can also perform this function. Importantly, PriC seems to contribute to helicase loading even in wt cells albeit to a much lesser degree than DnaA. On the other hand, PriC takes a larger role in helicase loading during aberrant initiation, i.e. when the origin sequence is truncated or when the properties of initiation proteins are suboptimal. Here highlighted by mutations in dnaA or dnaC.

      This is a major finding because it clearly demonstrates that the two roles of DnaA in the initiation process can be separated into initially forming an open complex at the DUE region by binding/nucleation onto DnaA-boxes and second by loading of the helicase. Whereas these two functions are normally assumed to be coupled, the present data clearly show that they can be separated and that PriC can perform at least part of the helicase loading provided that an area of duplex opening is formed by DnaA. This puts into question the interpretation of a large body of previous work on mutagenesis of oriC and dnaA to find a minimal oriC/DnaA complex in many bacteria. In other words, mutants in which oriC is truncated/mutated may support the initiation of replication and cell viability only in the presence of PriC. Such mutants are capable of generating single-strand openings but may fail to load the helicase in the absence of PriC. Similarly, dnaA mutants may generate an aberrant complex on oriC that trigger strand opening but are incapable of loading DnaB unless PriC is present.

      We would like to thank Reviewer #2 for the very positive comments about our work.

      In the present work, the sequence of experiments presented is logical and the manuscript is clearly written and easy to follow. The very last part regarding PriC in cSDR replication does not add much to the story and may be omitted.

      Given that the role of PriC in stimulating cSDR was unclear, we believe that our finding that PriC has little or no role in cSDR, despite being a negative result, is valuable for the general readership of eLife. To further assess the impact of PriC on cSDR, and as recommended by Referee #1, we carried out chromosome loci copy-number analysis by whole-genome sequencing. As shown in Figure 8-supplement 1 of the revised version, the results support our conclusion from the original version.

      Reviewer #3 (Public review):

      Summary:

      At the abandoned replication fork, loading of DnaB helicase requires assistance from PriABC, repA, and other protein partners, but it does not require replication initiator protein, DnaA. In contrast, nucleotide-dependent DnaA binding at the specific functional elements is fundamental for helicase loading, leading to the DUE region's opening. However, the authors questioned in this study that in case of impeding replication at the bacterial chromosomal origins, oriC, a strategy similar to an abandoned replication fork for loading DnaB via bypassing the DnaA interaction step could be functional. The study by Yoshida et al. suggests that PriC could promote DnaB helicase loading on the chromosomal oriC ssDNA without interacting with the DnaA protein. However, the conclusions drawn from the primarily qualitative data presented in the study could be slightly overwhelming and need supportive evidence.

      Thank you for your understanding and careful comments.

      Strengths:

      Understanding the mechanism of how DNA replication restarts via reloading the replisomes onto abandoned DNA replication forks is crucial. Notably, this knowledge becomes crucial to understanding how bacterial cells maintain DNA replication from a stalled replication fork when challenging or non-permissive conditions prevail. This critical study combines experiments to address a fundamental question of how DnaB helicase loading could occur when replication initiation impedes at the chromosomal origin, leading to replication restart.

      Thank you for your understanding.

      Weaknesses:

      The term colony formation used for a spotting assay could be misleading for apparent reasons. Both assess cell viability and growth; while colony formation is quantitative, spotting is qualitative. Particularly in this study, where differences appear minor but draw significant conclusions, the colony formation assays representing growth versus moderate or severe inhibition are a more precise measure of viability.

      We used serial dilutions of the cell culture for the spotting assay, and thus this assay should be referred to as semi-quantitative rather than simply qualitative. For a more quantitative assessment of viability, we analyzed the growth rates of cells and the chromosome replication activity using flow cytometry.

      Figure 2

      The reduced number of two oriC copies per cell in the dnaA46priC-deficient strain was considered moderate inhibition. When combined with the data suggested by the dnaAC2priC-deficient strain containing two origins in cells with or without PriC (indicating no inhibition)-the conclusion was drawn that PriC rescue blocked replication via assisting DnaC-dependent DnaB loading step at oriC ssDNA.

      The results provided by Saifi B, Ferat JL. PLoS One. 2012;7(3):e33613 suggest the idea that in an asynchronous DnaA46 ts culture, the rate by which dividing cells start accumulating arrested replication forks might differ (indicated by the two subpopulations, one with single oriC and the other with two oriC). DnaA46 protein has significantly reduced ATP binding at 42°C, and growing the strain at 42°C for 40-80 minutes before releasing them at 30°C for 5 minutes has the probability that the two subpopulations may have differences in the active ATP-DnaA. The above could be why only 50% of cells contain two oriC. Releasing cells for more time before adding rifampicin and cephalexin could increase the number of cells with two oriCs. In contrast, DnaC2 cells have inactive helicase loader at 42°C but intact DnaA-ATP population (WT-DnaA at 42 or 30°C should not differ in ATP-binding). Once released at 30°C, the reduced but active DnaC population could assist in loading DnaB to DnaA, engaged in normal replication initiation, and thus should appear with two oriC in a PriC-independent manner.

      This is a question about dnaA46 ΔpriC mutant cells. Inhibition of the replication forks causes inhibition of the RIDA (the DNA-clamp complex-dependent DnaA-ATP hydrolysis) system, resulting in an increase in ATP-DnaA molecules (Kurokawa et al. (1999) EMBO J.). Thus, if ΔpriC inhibited the replication forks significantly, the ATP-DnaA level should increase and initiation should be stimulated. However, the results of Figure 2BC show the opposite, indicating inhibition of initiation by ΔpriC. Thus, we infer that the inhibition of initiation in the ΔpriC cells is not related to possible changes in the ATP-DnaA level. Even if the ATP-DnaA levels differ between subpopulations of dnaA46 cells, the ΔpriC mutation should not affect the ATP-DnaA levels significantly. Thus, we infer that even in dnaA46 ΔpriC mutant cells, the ΔpriC mutation directly affects initiation mechanisms, rather than acting indirectly through the ATP-DnaA levels.

      Broadly, the evidence provided by the authors may support the primary hypothesis. Still, it could call for an alternative hypothesis: PriC involvement in stabilizing the DnaA-DnaB complex (this possibility could exist here). To prove that the conclusions made from the set of experiments in Figures 2 and 3, which laid the foundations for supporting the primary hypothesis, require insights using on/off rates of DnaB loading onto DnaA and the stability of the complexes in the presence or absence of PriC, I have a few other reasons to consider the latter arguments.

      This is a very careful consideration. However, we infer that stabilization of the DnaA-DnaB interaction by PriC, even if present, does not always result in stimulation of DnaB loading to oriC. Given that interactions between DnaA and DnaB during DnaB loading to oriC are highly dynamic and complicated with multiple steps, stabilization of the DnaA-DnaB interaction by PriC, even if it occurs, has a considerable risk of inhibiting the DnaB loading by constructing abortive complexes. In addition, DnaA-DiaA binding is very tight and stable (Keyamura et al., 2007, 2009). Even if WT DnaA and WT DnaB are present, PriC can rescue the initiation defects of oriC mutants. Based on these facts and the known characteristics of PriC as explained in Introduction, it is more reasonable to infer that PriC provides a bypass of DnaB loading even at oriC, as proposed for the mechanism at the stalled replication fork. However, we cannot completely rule out the indicated possibility and these explanations are included in the revised version.

      Figure 3

      One should consider the fact that dnaA46 is present in these cells. Overexpressing pdnaAFH could produce mixed multimers containing subunits of DnaA46 (reduced ATP binding) and DnaAFH (reduced DnaB binding). Both have intact DnaA-DnaA oligomerization ability. The cooperativity between the two functions by a subpopulation of two DnaA variants may compensate for the individual deficiencies, making a population of an active protein, which in the presence of PriC could lead to the promotion of the stable DnaA: DnaBC complexes, able to initiate replication. In light of the results presented in Hayashi et al., J Biol Chem. 2020 Aug 7;295(32):11131-11143, where the mutant DnaBL160A was shown to be impaired in DnaA binding but contained an active helicase function and was still inhibited for growth, how could one explain the hypothesis presented in this manuscript? If PriC-assisted helicase loading could bypass DnaA interaction, then how should growth inhibition in a strain carrying DnaBL160A be described? However, seeing the results in light of the alternative possibility that PriC assists in stabilizing the DnaA: DnaBC complex is more compatible with the previously published data.

      Unfortunately, this comment contains a crucial misunderstanding about the growth of cells bearing DnaB L160A. Hayashi et al. reported that the dnaB(Ts) cells bearing the dnaB L160A allele grew slowly and formed colonies even at 42°C. This feature is similar to the growth of dnaA46 cells bearing the dnaA F46A H136A allele (Figure 2). Thus, the results for dnaB L160A cells are consistent with our model and support the idea that PriC partially rescues the growth inhibition of cells bearing the DnaB L160A allele by bypassing the strict requirement for the DnaA-DnaB interaction. Nevertheless, we have to be careful about the possibility that DnaB L160A could affect the interaction with PriC, which we are going to investigate for a future paper.

      As suggested, if mixed complexes of DnaA46 and DnaA F46A H136A proteins are formed, those might retain partial activities in oriC unwinding and DnaB interaction, although those cells are inviable at 42°C without PriC. It is noteworthy that in the specific oriC mutants that are impaired in DnaB loading (e.g., Left-oriC), PriC effectively rescues initiation and cell growth. In these cells, both DnaA and DnaB are intact. Thus, the idea that only mutant DnaA (or DnaB) protein is stimulated specifically via PriC interaction is invalid. Even in cells bearing wild-type oriC, DnaA and DnaB, a contribution of PriC to initiation is detected.

      In addition, as described in the above response, given that interactions between DnaA and DnaB during DnaB loading to oriC are very dynamic and complicated with multiple steps, stabilization of the DnaA-DnaB interaction by PriC, even if present, would not simply result in stimulation of DnaB loading to oriC; rather, we think it would probably inhibit DnaB loading by constructing abortive complexes. Based on the known characteristics of PriC as explained in the Introduction, it is more reasonable to infer that PriC provides a bypass of DnaB loading even at oriC, as proposed for the mechanism at the stalled replication fork.

      However, we cannot completely rule out the indicated possibility and this explanation has been described in the revised version as noted in response to the above question.

      Figure 4

      Overexpression of DiaA could contribute to removing a higher number of DnaA populations. This could be more aggravated in the absence of PriC (DiaA could titrate out more DnaA)-the complex formed between DnaA: DnaBC is not stable, therefore reduced DUE opening and replication initiation leading to growth inhibition (Fig. 4A ∆priC-pNA135). Figure 7C: Again, in the absence of PriC, the reduced stability of DnaA: DnaBC complex leaves more DnaA to titrate out by DiaA, and thus less Form I*. However, adding PriC stabilizes the DnaA: DnaBC hetero-complexes, with reduced DnaA titration by DiaA, producing additional Form I*. Adding a panel with DnaBL160A that does not interact with DnaA but contains helicase activity could be helpful. Would the inclusion of PriC increase the ability of mutant helicase to produce additional Form I*?

      Unfortunately, the proposed idea is biased in that it disregards the fact that DiaA effectively stimulates the assembly of DnaA molecules at oriC. As oriC contains multiple DnaA boxes and multiple DnaA molecules are recruited there, DiaA will efficiently facilitate the assembly of DnaA molecules on oriC. Even DnaA molecules in DnaA-DiaA complexes can efficiently bind to oriC. This is consistent with in vitro experiments showing that higher levels of DiaA stimulate assembly of DnaA molecules and oriC unwinding (i.e., DUE opening), and that even excessive levels of DiaA do not inhibit those reactions (Keyamura et al., J. Biol. Chem. (2009) 284, 25038-25050). However, as shown in Figure 9, DiaA binds tightly to the specific site of DnaA that is the same as the DnaB L160-binding site, which inhibits DnaA-DnaB binding (ibid). These findings are consistent with in vivo experiments and with the idea that an excessive DiaA level inhibits the interaction and loading of DnaB by the DnaA-oriC complexes, but not oriC unwinding (i.e., DUE opening), in vivo. Also, as mentioned above, we do not consider that stabilization of the DnaA-DnaBC complex simply results in stimulation of DnaB loading to oriC. Based on the known characteristics of PriC, it is more reasonable to infer that PriC provides a bypass of DnaB loading even at oriC, as proposed for the mechanism at the stalled replication fork (Figure 1E), as described in the above response.

      As for DnaB L160A, as mentioned above, we are currently investigating the interaction modes between DnaB and PriC. While investigating DnaB L160A could further support our model, we believe its contribution to the present manuscript would be incremental. In addition, there is a possibility that DnaB L160A could affect the interaction with PriC. Thus, analysis of DnaB mutants in this PriC rescue mechanism should be addressed in a future study.

      Figure 5

      The interpretation is that colony formation of the Left-oriC ∆priC double mutant was markedly compromised at 37˚C (Figure 5B), and the growth defects of the Left-oriC mutant at 25°C and 30°C were aggravated. However, prima facie, the relative differences in the growth of cells containing and lacking PriC are similar. Quantitative colony-forming data are required to support these claims. Otherwise, it is slightly confusing.

      The indicated concern arose from a typing error on our part, which omitted ∆priC. In the revised manuscript, we have amended the text as follows: the cell growth of the Left-oriC ∆priC double mutant was markedly compromised at 37˚C and moderately reduced at 25°C and 30°C (Figure 5B).

      A minor suggestion is to include cells expressing PriC using plasmid DNA to show that adding PriC should reverse the growth defect of dnaA46 and dnaC2 strains at non-permissive temperatures. The same should be added at other appropriate places.

      Even in the presence of PriC, unwinding of oriC and DnaB helicase loading to the unwound oriC require DnaA and DnaC activities, as indicated by previous studies (for a review, see Windgassen et al., (2018) Nucleic Acids Res. 46, 504-519). Thus, dnaA46 cells and dnaC2 cells bearing pBR322-priC cannot grow at 42°C and 37°C (as follows). These are reasonable results. However, at semi-permissive temperatures (37°C for dnaA46 and 35°C for dnaC2), slight stimulation of cell growth by pBR322-priC might be barely observed (Figure 2-supplement 1 of the revised version). These results suggest that the intrinsic level of PriC is functionally nearly sufficient. This explanation has been included in the revised version.

      Author response image 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 38. "in assembly of the replisome".

      Corrected.

      Line 137. "specifically" rather than specificity.

      Corrected.

      Line 139. "at" rather than by.

      Corrected.

      The DnaA46 protein variant contains two amino acid substitutions (A184V and H252Y) within the AAA+ motif. H136 appears to reside adjacent to A184 in structure. Is A184V mutation causative?

      The DnaA H136A and A184V alleles are responsible for different defects. Indeed, the DnaA A184V variant is thermolabile and defective in ATP binding whereas the H136A variant retains ATP binding but impairs DnaB loading (Carr and Kaguni, Mol. Microbiol., 1996; Sakiyama et al., Front. Microbiol., 2018). These observations strongly support the view that the phenotype of the DnaA H136A allele is independent of that of the DnaA A184V allele.

      Figure 2A. Regarding the dnaA46 allele grown at 37°C.

      Individual colonies cannot be resolved. Is an image from a later time-point available?

      We have replaced the original image with one from another replicate that provides better resolution. Please see Figure 2A in the revised version.

      Figure 2C. Quantification of the number of cells with more than one chromosome equivalent in the dnaC2 ΔpriC strain. The plot from flow cytometry appears to show >20% of cells with only 1 genome. Are these numbers correct?

      Thank you for this careful comment. We quantified the peaks more strictly, but the percentages were not substantially changed. To improve the resolution of the DNA profiles, we have changed the range of the x-axis in panels B and C of Figure 2 in the revised version.

      Figure 3. Are both F46A and H136A mutations in the plasmid-encoded dnaA necessary?

      Yes. The related explanation is included in the Discussion section (the third paragraph) of the original manuscript. As described there, dnaA46 cells expressing the DnaA H136A single mutant exhibited severe defects in cell growth even in the presence of PriC (Sakiyama et al., 2018). The His136 residue is located within the weak, secondary DnaB interaction region in DnaA, and is crucial for DnaB loading onto oriC ssDNA. Given domain I in DnaA H136A can stably tether DnaB-DnaC complexes to DnaA complexes on oriC (Sakiyama et al., 2018), we infer that oriC-DnaA complexes including DnaA H136A stably bind DnaB via DnaA domain I as an abortive complex, which inhibits functional interaction between PriC and DnaB as well as DnaB loading to oriC DNA.

      As for the DnaA F46A mutant, our previous studies show that DnaA F46A has limited residual activity in vivo (unlike in vitro) and allows slow growth of cells. As stable DnaA-DnaB binding is partially impaired in vivo in DnaA F46A, this feature is consistent with the above ideas. Thus, both F46A and H136A mutations are required for more severe inhibition of DnaB loading. This is additionally described in the revised Discussion.

      Figure 3. Is the DnaA variant carrying F46A and H136A substitutions stably expressed in vivo?

      We have performed western blotting, demonstrating that the DnaA variant carrying F46A and H136A substitutions is stable in vivo. In the revised version, we have added new data to Figure 3-figure supplement 1 and relevant description to the main text as follows:

      Western blotting demonstrated that the expression levels were comparable between WT DnaA and DnaA F46A H136A double mutant (Figure 3-figure supplement 1).

      Figure 5A. Should the dashed line extending down from I2 reach the R4Tma construct?

      We have amended the indicated line appropriately.

      Figure 6C. It was surprising that the strain combining the subATL mutant with ΔpriC displayed a pronounced under-initiation profile by flow cytometry, and yet there was no growth defect observed (see Figure 6B). This seems to contrast with results using the R4Tma origin, where the ΔpriC mutant produced a relatively modest change to the flow cytometry profile, and yet growth was perturbed (Figure 5C-D). How might these observations be interpreted? Is the absolute frequency of DNA replication initiation critical?

      Please note that, in E. coli, initiation activity correlates closely with the number of oriC copies per cell mass (ori/mass), rather than with the apparent DNA profiles measured by flow cytometry. When cells were grown in LB at 30˚C, the mean ori/mass values were as follows: 0.34 for R4Tma ∆priC, 0.51 for R4Tma, 0.82 for DATL ∆priC, 0.99 for DATL (Figures 5 & 6 in the original manuscript). These values closely correspond to the cell growth ability shown in Figure 5C in the original manuscript.
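      As a small worked illustration of the comparison above, the following sketch computes how much of the ori/mass value is retained without PriC in each origin background, using only the four mean values quoted in this response. It assumes that "priC" in the strain names denotes the ∆priC derivative; the dictionary layout and function name are hypothetical.

```python
# Mean ori/mass values quoted in the response (LB, 30 degrees C).
# Keys are (origin background, priC status); the labels are assumptions.
ORI_PER_MASS = {
    ("R4Tma", "priC+"): 0.51,
    ("R4Tma", "delta-priC"): 0.34,
    ("DATL", "priC+"): 0.99,
    ("DATL", "delta-priC"): 0.82,
}


def retained_without_pric(background):
    """Ratio of ori/mass without PriC to ori/mass with PriC."""
    without = ORI_PER_MASS[(background, "delta-priC")]
    with_pric = ORI_PER_MASS[(background, "priC+")]
    return without / with_pric


for background in ("R4Tma", "DATL"):
    print(f"{background}: {retained_without_pric(background):.2f}")
```

      On these numbers, the R4Tma background retains about two thirds of its ori/mass value without PriC, whereas the DATL background retains a little over four fifths, in line with the stronger growth dependence on PriC described for R4Tma.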

      In the revised manuscript, we have cited appropriate references for introduction of the ori/mass values as follows.

      To estimate the number of oriC copies per unit cell mass (ori/mass) as a proxy for initiation activity (Sakiyama et al., 2017, 2022),

      Line 295. Reference for Form I* assay should cite the original publication.

      Done. The following paper is additionally cited.

      Baker, T. A., Sekimizu, K., Funnell, B. E., and Kornberg, A. (1986). Extensive unwinding of the plasmid template during staged enzymatic initiation of DNA replication from the origin of the Escherichia coli chromosome. Cell 45, 53–64. doi: 10.1016/0092-8674(86)90537-4

      Reviewer #2 (Recommendations for the authors):

      The partial complementation of the dnaC2 strain by PriC seems quite straightforward since this particular mutation leads to initiation arrest at the open complex stage and this sets the stage for PriC to load the helicase. The situation is somewhat different for dnaA46. Why is this mutation partly complemented by PriC at 37C? DnaA46 binds neither ATP nor ADP, yet it functions in initiation at permissive temperature. At nonpermissive temperature, it binds oriC as well but does not lead to initiation. Does the present data imply that the true initiation defect of DnaA46 lies in helicase loading? The authors need to comment on this in the text.

      Given the thermolabile propensity of the DnaA46 protein, it is presumable that DnaA46 protein becomes partially denatured at the sub-permissive temperature of 37˚C. This partial denaturation should impair both origin unwinding and helicase loading, though not to the extent that cell viability is lost. The priC deletion should further exacerbate helicase loading defects by inhibiting the bypass mechanism, resulting in the lethality of dnaA46 cells at this temperature. This explanation is included in the revised Discussion section.

      Relating to the above. In Figure 3 it is shown that the pFH plasmid partly complements dnaA46 in a PriC-dependent manner. Again, it would be nice to know the nature of the DnaA46 protein defect. It would be interesting to see how a pING1-dnaA46 plasmid performs in the experiment presented in Figure 3.

      A previous paper showed that multicopy supply of DnaA46 can suppress temperature sensitivity of the dnaA46 cells (Rao and Kuzminov, G3, 2022). This is reasonable in that DnaA46 has a rapid degradation rate unlike wild-type DnaA. As DnaA46 preserves the intact sequences in DnaB binding sites such as G21, F46 and H136, the suppression would not depend on PriC but would be due to the dosage effect.

      Figure 8 B: The authors should either remove the data or show a genome coverage: it is not clear that yapB is a good reference. A genome coverage would be nice, and show whether initiation can occur at oriC even if it is not the major place of initiation in a rnhA mutant.

      As suggested, we carried out chromosome loci copy-number analysis by whole-genome sequencing to assess the impact of PriC on cSDR. The new data are shown in Figure 8-supplement 1, with relevant descriptions in the main text of the revised version as shown below. Briefly, the results of the chromosome loci copy-number analysis are consistent with those of real-time qPCR (Figure 8B). Given that the role of PriC in stimulating cSDR was unclear, we believe that our finding that PriC has little or no role in cSDR, despite being a negative result, is valuable for the general readership of eLife.

      Line 38-39: .....resulting in replisome assembly.

      Corrected.

      Line 48: Something is wrong with the Michel reference. Also in the reference list.

      Corrected

      Line 156: replace retarded with reduced.

      Corrected.

      Line 171 and elsewhere: WT priC cells is somewhat misleading. Isn't this simply PriC+ cells?

      Yes. We have revised the wording to “priC<sup>+</sup>” for clarity.

      Line 349-350: "the oriC copy number ratio of the dnaA46 ΔpriC double mutant was lower than that of the dnaA46 single mutant....". This holds only provided that the growth rate of the strains is the same.

      These strains exhibited similar growth rates. This is included in the Results section of the revised manuscript as follows: At the permissive temperature, despite having similar growth rates, the oriC copy number ratio of the dnaA46 ∆priC double mutant strain was lower than that of the dnaA46 single mutant.

      Reviewer #3 (Recommendations for the authors):

      I would suggest improved or additional experiments, data, or analyses.

      The revised version includes improved or additional experiments, data, or analyses.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors describe a massively parallel reporter assay (MPRA) screen aimed at identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      • The main issue remains that it appears that the screen has largely failed, and the reasons for that remain unclear, which makes it difficult to interpret how useful the resulting data are. The authors mention batch effects as a potential contributor. The authors start with a library that includes ~6,000 variants, which makes it a medium-size MPRA. But then, only 483 pairs of WT/mutated UTRs yield high confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Fig. 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically-relevant associations.

      • From the variants that had an effect, the authors go on to carry out some protein-level validations, and see some changes, but it is not clear if those changes are in the same direction as was observed in the screen. In their rebuttal the authors explain that they largely cannot infer directionality of changes from the screen, which further limits its utility.

      • It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozen translation-shifting variants.

      We recognize that RNA distribution within polysomes is inherently less stable than the associated protein components. This instability has been noted in previous studies, including those cited by the reviewer, which used RNA from bulk polysomes to infer the translatome without fractionation. Acknowledging this limitation, we purposely adopted a conservative strategy: (i) performing gross fractionation of polysomes, and (ii) collaborating with biostatisticians at the Institute of Statistical Science, Academia Sinica, to design a conservative yet optimized analysis pipeline that minimized batch effects.

      This approach proved robust: representative cases in Fig. 2B clearly demonstrate distinct distributions of reference and alternative alleles. To our high-confidence dataset, we applied a well-established statistical framework specifically designed to accommodate multiple influencing factors in relatively small datasets (Elements of Statistical Learning by Hastie, Tibshirani, and Friedman). We further conducted sensitivity analyses to select an optimal QC cutoff across a range of stringencies, ensuring maximal reliability of our results. We have therefore successfully shortlisted UTR variants that have a strong effect on translation.

      Building upon these conservative measures, we developed a predictive model for translation effects of UTR variants. Importantly, this model was validated not only with our internal test dataset but also with independent external datasets. In addition, the sequence features identified by the model were validated through reporter assays and in vivo CRISPR editing. These external and functional validations establish the generalizability and robustness of our approach.

      A more detailed analysis of the directionality of changes in translation efficiency is under active investigation. These results will be reported in a separate manuscript currently in preparation.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      (1) The main issue is that it appears that the screen has largely failed, yet the reasons for that are unclear, which makes it difficult to interpret. The authors start with a library that includes approximately 6,000 variants, which makes it a medium-sized MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Figure 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      To ensure that our final results are technically and statistically sound, we applied stringent selection criteria and cutoffs in our analysis workflow. First, from our RNA-seq dataset, we retained only the UTRs with at least 20 reads in a polysome profile across all three replicate experiments. Second, in the subsequent main analysis using a negative binomial generalized linear model (GLM), we excluded the UTRs that displayed a batch effect, i.e., those whose batch-related main effect and interaction were significant. We believe these measures safeguard the retained observations (UTRs) against the potentially high variation of our massively parallel translation assays and thus give high confidence in our results.
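
      The read-depth cutoff described above can be sketched as follows. This is only an illustration, not the pipeline used in the study: the data layout, the function names, and the reading of "at least 20 reads in a polysome profile" as "at least 20 reads summed over the three fractions in every replicate" are all assumptions.

```python
from typing import Dict, List

# Hypothetical layout: counts[utr_id] is a list of replicates, and each
# replicate is a list of read counts [monosome, light, heavy].
Counts = Dict[str, List[List[int]]]

MIN_READS = 20  # cutoff stated in the response


def passes_read_filter(replicates: List[List[int]], min_reads: int = MIN_READS) -> bool:
    """True if every replicate has at least `min_reads` reads summed over fractions."""
    return all(sum(fractions) >= min_reads for fractions in replicates)


def filter_utrs(counts: Counts) -> Counts:
    """Keep only the UTRs that pass the read-depth filter in all replicates."""
    return {utr: reps for utr, reps in counts.items() if passes_read_filter(reps)}


# Made-up counts for two UTRs across three replicates.
example: Counts = {
    "UTR_A": [[30, 25, 40], [22, 18, 35], [28, 21, 33]],
    "UTR_B": [[5, 3, 2], [40, 35, 50], [38, 29, 41]],  # first replicate is too shallow
}
kept = filter_utrs(example)
```

      Requiring the cutoff in every replicate, rather than on the pooled total, is the stricter of the two plausible readings; the batch-effect exclusion in the GLM step is a separate filter and is not shown here.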

      Regarding the interpretation of Figure 2B, since we aimed to identify the UTRs whose interaction term between genotype and fraction is significant in our generalized linear model, it is statistically conventional to double-check the interaction of the two variables using such a graph. For instance, in the top left panel of Figure 2B (5'UTR of ANK2:c.-39G>T), the read counts of WT samples consistently decreased from Mono to Light, whereas the read counts of mutant samples were roughly the same in the two fractions; that is, the trend differs between WT and mutant. The distinct distribution patterns of the two genotypes across the three fractions in Figure 2B therefore offer readers a convincing visual supplement to our GLM statistics.

      In contrast to Figure 2B, the graphs of nonsignificant UTRs (shown below) reveal that the trends between the two genotypes are similar across the 'Mono and Light' and 'Light and Heavy' polysome fractions. Importantly, our analysis remains unaffected by differential expression levels between WT and mutant, as it specifically distinguishes polysome profiles with different distributions. This consistent trend further supports the lack of interaction between genotype and polysome fractions for these UTRs.

      Author response image 1.

      Examples of non-significant UTR pairs in massively parallel polysome profiling assays.

      (2) From the variants that had an effect, the authors go on to carry out some protein-level validations and see some changes, but it is not clear if those changes are in the same direction as observed in the screen.

      To infer the directionality of translation efficiency from polysome profiles, a common approach involves pooling polysome fractions and comparing them with free or monosome fractions to identify 'translating' fractions. However, this method has two major potential pitfalls: (i) it sacrifices resolution and does not account for potential bias toward light or heavy polysomes, and (ii) it fails to account for discrepancies between polysome load and actual protein output (as discussed in https://doi.org/10.1016/j.celrep.2024.114098 and https://doi.org/10.1038/s41598-019-47424-w). Therefore, our analysis focused on the changes within polysome profiles themselves. 'Significant' candidates were identified based on a significant interaction between genotype and polysome distribution using a negative binomial generalized linear model, without presupposing the direction of change on protein output. 

      (3) The authors follow up on specific motifs and specific RBPs predicted to bind them, but it is unclear how many of the hits in the screen actually have these motifs, or how significant motifs can arise from such a small sample size.

      We calculated the Δmotif enrichment in significant UTRs versus nonsignificant UTRs using Fisher’s exact test. For example, the enrichment of the Δ‘AGGG’ motif in 3’ UTRs is shown below:

      Author response table 1.

      This test yields a P-value of 0.004167 by Fisher’s exact test. The P-values and Odds ratios of Δmotifs in relation to polysome shifting are included in Supplementary Table S4, and we will update the detailed motif information in the revised Supplementary Table S4.
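
      For readers who want to reproduce this kind of enrichment test, a two-sided Fisher's exact test on a 2×2 table can be sketched with the standard library alone. The function name and the example counts below are illustrative assumptions; they are not the counts in Author response table 1.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: significant / nonsignificant UTRs.
    Columns: motif changed / motif unchanged.
    Returns (sample odds ratio, p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Made-up counts for illustration only.
odds, p = fisher_exact_two_sided(8, 19, 40, 416)
```

      The small relative tolerance in the comparison guards against floating-point ties between equally probable tables.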

      (4) It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozen translation-shifting variants.

      We understand the concern regarding the relatively small number of translation-shifting variants compared to the large number of features. To address this, we employed LASSO regression, which, according to The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, is particularly suitable for datasets where the number of features p is much larger than the number of samples N. LASSO effectively performs feature selection by shrinking less important coefficients to zero, allowing us to build a robust and generalizable model despite the limited number of variants.
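
      How LASSO drives coefficients exactly to zero when p exceeds N can be illustrated with a minimal coordinate-descent sketch. This is a generic textbook formulation, minimizing (1/2N)·||y − Xβ||² + λ·||β||₁, and is not the authors' model; the toy data, the penalty value, and the function names are assumptions.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/2N)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data with more features (p = 6) than samples (N = 4);
# the response depends only on the first feature (y = 2 * x0).
X = [
    [1, 0, 1, 0, 0, 1],
    [-1, 1, 0, 0, 1, 0],
    [1, -1, 0, 1, 0, 0],
    [-1, 0, -1, -1, -1, -1],
]
y = [2, -2, 2, -2]
beta = lasso_coordinate_descent(X, y, lam=0.1)
```

      In this toy case only the first coefficient survives the penalty, and a sufficiently large penalty removes every feature. In practice the penalty would be chosen by cross-validation.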

      (5) The lack of meaningful validation experiments altering the SNPs in the endogenous loci by genome editing limits the impact of the results.

      Following the reviewer’s suggestion, we assessed the endogenous mutant effect by generating CRISPR knock-in clones carrying the IRF6:c.-4609G>A variant. We showed that this G>A variant generates a deleterious upstream open reading frame, which dramatically reduced protein expression of the main open reading frame (Fig. 7B-D). The genome editing further demonstrated that the G>A variant reduced endogenous IRF6 protein expression to 23% or 44% in two independent clones. We have incorporated the genome editing results in the revised main text and the new Figure 7E&F:

      “To further validate the endogenous effect of the novel upstream ATG (uATG), we generated CRISPR knock-in clones carrying the IRF6:c.-4609G>A variant and examined its impact on gene expression. The introduction of the uATG reduced RNA levels to 88% and 37% of the wild-type in two independent clones (Fig. 7E), and protein levels to 44% and 23%, respectively (Fig. 7F), resulting in an overall reduction of translation efficiency to 50–62%.” (p.18)
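
      The 50–62% range follows from the quoted RNA and protein levels if translation efficiency is taken as relative protein level divided by relative RNA level. That definition, and the pairing of the two clones implied by "respectively", are assumptions made for this short check.

```python
def translation_efficiency(protein_rel: float, rna_rel: float) -> float:
    """Relative translation efficiency: protein output per unit of RNA (wild-type = 1.0)."""
    return protein_rel / rna_rel


# (relative RNA level, relative protein level) per clone, from the quoted passage.
clones = {
    "clone 1": (0.88, 0.44),
    "clone 2": (0.37, 0.23),
}

te = {
    name: translation_efficiency(protein, rna)
    for name, (rna, protein) in clones.items()
}
```

      Under these assumptions, clone 1 gives 0.44 / 0.88 = 0.50 and clone 2 gives 0.23 / 0.37 ≈ 0.62, matching the reported range.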

      Reviewer #2 (Public Review):

      Summary:

      In their paper "Massively Parallel Polyribosome Profiling Reveals Translation Defects of Human Disease-Relevant UTR Mutations" the authors use massively parallel polysome profiling to determine the effects of 5' and 3' UTR SNPs (from dbSNP/ClinVar) on translational output. They show that some UTR SNPs cause a change in the polysome profile with respect to the wild-type and that pathogenic SNPs are enriched in the polysome-shifting group. They validate that some changes in polysome profiles are predictive of differences in translational output using transiently expressed luciferase reporters. Additionally, they identify sequence motifs enriched in the polysome-shifting group. They show that 2 enriched 5' UTR motifs increase the translation of a luciferase reporter in a protein-dependent manner, highlighting the use of their method to identify translational control elements.

      Strengths:

      This is a useful method and approach, as UTR variants have been more difficult to study than coding variants. Additionally, their evidence that pathogenic mutations are more likely to cause changes in polysome association is well supported.

      Weaknesses:

      The authors acknowledge that they "did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency, as the direction of the shift was not readily evident. Additionally, sedimentation in the sucrose gradient may have been partially affected by heavy particles other than ribosomes." However, shifted polysome distribution is used as a category for many downstream analyses. Without further clarity or subdivision, it is very difficult to interpret the results (for example in Figure 5A, is it surprising that the polysome shifting mutants decrease structure? Are the polysome "shifts" towards the untranslated or heavy fractions?)

      Our approach, combining polysome fractionation of the UTR library with negative binomial generalized linear model (GLM) analysis of RNA-seq data, systematically identifies variants that affect translational efficiency. The GLM model is specifically designed to detect UTR pairs with significant interactions between genotype and polysome fractions, relying solely on changes in polysome profiles to identify variants that disrupt translation. Consequently, our analytical method does not determine the direction of translation alteration.

      Following the massively parallel polysome profiling, we sought to understand how these polysome-shifting variants influence the translation process. To do this, we examined their effects on RNA characteristics related to translation, such as RBP binding and RNA structure. In Figure 5A, we observed a notable trend in significant hits within 5’ UTRs—they tend to increase ΔG (weaker folding energy) in response to changes in polysome profiles, regardless of whether protein production increases or decreases (Fig. 3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 3A - the claim that 5'UTR variants had a stronger effect than 3'UTR is based on the two UTRs with the strongest effect. It is unclear how these differences between 5' and 3'UTRs are significant.

      We carried out a Wilcoxon rank-sum test to compare the mut/WT fold change in translation efficiency between the 3’ and 5’ UTR variants. The results showed that the 5’ UTR variants exhibited a greater change in translation efficiency. We have inserted this result in the revised Figure 3C and refer to this figure in the main text: “Furthermore, we observed that 5’ UTR variants had a greater impact on translation activity relative to 3’ UTR variants (Fig. 3C).” (p. 12)

      (2) Figures 2B and S1, S2 - what is the meaning of less signal for a light chain and a similar signal for a heavy chain? How can this situation, while being a significant difference between the profiles, lead to a biologically relevant difference in eventual protein output?

      Taking 3’UTR ACADSB:c.*4177G>A (bottom-left panel in Figure 2B) as an example: WT transcripts have less read count (in the unit of log(CPM)) compared with the transcripts carrying the mutant UTR in the light polysome-containing fraction, whereas the read counts of the two genotypes are approximately the same in the heavy polysome-containing fraction.

      In line with our reply to Reviewer 1’s major comment 1, we aimed to identify the UTRs whose interaction term of genotype and fractions is significant in our generalized linear model (GLM). That is, the UTR pairs whose WT and mutant have different trends across the fractions (Mono to Light & Light to Heavy) are our targets. In Figure 2B, 3’UTR ACADSB:c.*4177G>A is a perfect example of our significant hits, as it displays the clear distinction of the trends of the two genotypes across three fractions.

      It is widely known that the alteration of polysome profiling distribution indicates the change of translational efficiency. Our GLM model helped us identify the UTR pairs whose WT and mutant have different polysome profiling patterns and thus likely have distinct translational efficiency. Nevertheless, since we only had limited polysome fractions in our experiments, we further validated our significant hits and confirmed the direction of regulation using luciferase reporter assay.

      (3) The paragraph starting with "Even with the high confidence dataset, we did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency" is confusing. The whole premise of the screen used by the authors is that polysome profiling is a useful proxy for estimating levels of translation, so claiming that it doesn't necessarily measure translation is counterintuitive.

      In line with our reply to the last question, our goal is to use the alteration of polysome profiling patterns as a proxy for the change of translational efficiency. However, due to the limited number of fractions in our experiment, we could not directly infer the direction of regulation, i.e. increase or decrease of translational efficiency, of the statistically significant variants. That is why we refrained from making any conclusion about the direction of the regulation for the significant hits and proceed to validate them using luciferase reporter assay.

      (4) Figure S5A - this is normalized to the nucleotide distribution in 5' or 3'UTRs? Is this statistic being applied to 27 SNPs in 3'UTRs?

      To identify sequence features associated with altered polysome association, we systematically analyzed both significant and nonsignificant UTRs for nucleotide and motif-level changes. Fisher’s exact test was employed to evaluate whether specific nucleotide or motif alterations were enriched or depleted in polysome-shifting UTRs, compared to nonsignificant UTR pairs. For example, in the case of nucleotide C (see table below; also Table S4 and new Fig. S6A), only four significant 3’ UTRs involved a change in C, resulting in a significant depletion of this nucleotide change among polysome-shifting 3’ UTRs (odds ratio = 0.22, p = 0.0069). Expanding this approach to all 1-7 nt motifs, we identified multiple motif and nucleotide changes that were significantly associated with altered polysome association.

      Author response table 2.

      (5) "uATG in the 5' UTR was not identified by the model as a widespread feature explaining polysome shifting". Is this because of the method of ribosome profiling or because of the sequences in the library? Can having more sequences in the library specifically looking at 5'UTR give more power for such an effect to emerge?

      Our assay design accounted for the presence of upstream ATG codons and the strength of adjacent Kozak sequences. However, additional factors known to influence the function of upstream open reading frames (uORFs), such as the reading frame of the uORF relative to the main coding sequence and the use of non-ATG initiation codons, were not systematically included. As a result, the current assay may have limited sensitivity in detecting uORF-related regulatory effects. A dedicated design specifically tailored to uORF variants is likely to enhance the detection power and better capture their contribution to translational control.

      (6) Figure 7B - it is not clear whether the luciferase reporter and the GFP reporter in the library function in a similar manner; is it creating an out-of-frame or an in-frame uORF? Also, it is not clear if the differences are statistically significant.

      In the MPRA library, the IRF6 uORF is out of frame relative to the GFP coding sequence. To directly assess its translational impact, we employed a luciferase reporter assay by fusing luciferase downstream of the IRF6 uORF. These constructs revealed a significant reduction in protein production, as shown in Figures 3 and 7B–F. Although the clinically relevant IRF6 uORF is out-of-frame with the main ORF, we engineered an in-frame uORF variant to validate translation initiation at the upstream ATG (uATG) (Fig. 7B-D). The in-frame construct confirmed uATG usage and led to a significant reduction in luciferase protein expression. Together, these results support the conclusion that the IRF6:c.-4609G>A variant gives rise to an active uORF that suppresses translation of the main ORF.

      Reviewer #2 (Recommendations For The Authors):

      (1) It would be helpful for the authors to subcategorize their data in ways that they consider meaningful and interpretable (e.g. shifts from all monosome to heavy, all heavy to monosome/free, etc.) Relatedly, what do the authors think the functional meaning is when a given transcript has high mono/heavy occupancy but low light occupancy (like what is shown in Figure 2B for ANK2) in the polysome profiling experiment? It is not apparent why a transcript with a high ribosome occupancy (heavy) would also have light occupancy (light).

      From the amplicon sequencing data, we obtained read counts for each UTR variant across the monosome, light, and heavy polysome fractions. Notably, this approach does not preserve the original relative abundance of transcripts among the three fractions. That is, despite a greater abundance of mRNAs in the heavy polysome fraction, comparable numbers of sequencing reads were recovered from the monosome and light fractions. As a result, this method is not suitable for interpreting the global directionality of translational shifts but is well-suited for detecting relative differences in polysome association. Therefore, our experimental and analytical design—combining targeted amplicon sequencing with generalized linear modeling (GLM)—was optimized to identify UTR variants that alter polysome association, independently of absolute transcript abundance in each fraction.

      (2) The method put forward in Figure 2 would be more convincing if there was data showing reproducibility in the massively parallel reporter assay. Perhaps the mut/WT ratio for all transcripts can be plotted against each other and a statistical test of correlation can be performed.

      Thank you for pointing this out. To demonstrate the reproducibility of our massively parallel reporter assay, we have plotted scatter plots of the ratios of all transcripts (summing the monosome, light, and heavy fractions) across different batches using our high-confidence dataset. We calculated the Pearson correlation coefficients and corresponding p-values for these comparisons. The results show strong correlation between each batch, supporting the reproducibility of our assay. We have incorporated this analysis in the main text as well as Supplemental Figure 3: “Pearson correlation analysis revealed R coefficients ranging from 0.59 to 0.71 for the mut-to-WT transcript ratios across three independent experiments (Supplemental Fig. 3).”
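
      A batch-to-batch comparison of this kind can be sketched as below. The Pearson coefficient is computed from its definition; the ratio values and variable names are invented for illustration and are not the study's data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up log2(mut/WT) transcript ratios for the same five UTR pairs in two batches.
batch1 = [0.10, -0.35, 0.80, 0.05, -0.60]
batch2 = [0.20, -0.10, 0.55, -0.15, -0.70]
r_12 = pearson_r(batch1, batch2)
```

      With three batches, the same function would be applied to each of the three pairwise comparisons.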

      (3) The dots in Figure 2B indicate separate experiments, but the y-axis is log(counts). Values could be normalized (perhaps a ratio of mut/WT) for comparison between experiments.

      We aimed to compare UTR distribution across polysome fractions and recognized the importance of presenting the distribution patterns for both genotypes. This approach allows us to more clearly illustrate the differences or similarities in polysome association between the two genotypes.

      (4) When describing the 5' UTRs used for the validation experiments in Figure 3, more information about the 5' UTR sequence used is necessary. It is not clear how much or what part of the 5' UTRs were removed, or why this was necessary considering the same experiment was conducted using full-length UTRs.

      In the initial library design, technical limitations of bulk oligonucleotide synthesis constrained the UTRs to 155 nucleotides, comprising 115-nt of endogenous human UTR sequence flanked by 20-nt priming sites on both ends. Variants were centered at the 58th nucleotide within the 115-nt UTR sequence. When one flanking region of the native UTR was shorter than 57 nt, the variant was shifted accordingly toward the shorter arm to maintain the 115-nt UTR length (Fig. 2A).
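
      The placement rule described above can be expressed as a small helper. This is a sketch under stated assumptions: positions are 0-based, the native UTR is at least 115 nt long (the response does not say how shorter UTRs were handled), and the function name is hypothetical.

```python
WINDOW = 115  # length of the native UTR segment in each oligo
FLANK = 57    # nucleotides on each side when the variant sits at the 58th position


def variant_window(utr_len: int, variant_pos: int):
    """Return (start, end, offset) of the 115-nt window around a variant.

    `variant_pos` is 0-based within the native UTR; `end` is exclusive;
    `offset` is the 0-based position of the variant inside the window.
    The window is centered on the variant when both flanks allow it, and
    otherwise slides so that it stays inside the UTR, which moves the
    variant toward the shorter arm.
    """
    if utr_len < WINDOW:
        raise ValueError("UTRs shorter than the window are not handled in this sketch")
    if not 0 <= variant_pos < utr_len:
        raise ValueError("variant position outside the UTR")
    start = variant_pos - FLANK
    start = max(0, min(start, utr_len - WINDOW))  # clamp the window into the UTR
    return start, start + WINDOW, variant_pos - start


centered = variant_window(300, 150)  # both flanks long enough
near_5p = variant_window(300, 10)    # short upstream flank
near_3p = variant_window(300, 295)   # short downstream flank
```

      An offset of 57 corresponds to the 58th nucleotide in 1-based numbering; the two edge cases show the variant moving off-center while the window length stays at 115 nt.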

      Given that endogenous UTRs in the human genome are often longer than 155 nt, we further evaluated the functional consequences of variants within full-length UTR sequences (Fig. 3B). While the mutant effects observed in the library setting were largely recapitulated, their magnitude was diminished in the full-length context, likely due to the increased sequence and structural complexity.

      To clarify the experimental design related to Figure 3, we modified the text as follows: “The variants significantly altering the polysome profile were then individually validated by means of high-sensitivity luciferase reporter assays (Fig. 3A). To that end, we resynthesized both the variant and corresponding wildtype alleles in the same library format - 115-nt native UTR segments centered on the variant and flanked by 20-nt priming sites. These UTRs were then cloned upstream (5’) or downstream (3’) of the firefly luciferase coding sequence, depending on their genomic location.” (p. 11)

      (5) The conclusions from inserting RBP-binding motifs into 5' UTRs and assaying translational output (Figure 4) would be strengthened by including luciferase reporters containing endogenous 5' UTRs containing these motifs, and versions where the motifs are disrupted.

      Several variants that altered translation efficiency were validated in their native sequence contexts, including 5’ UTR variants in DMD and NF1 that affect SRSF1/2 binding sites, as well as a 3’ UTR variant in AL049650.1 that impacts a KHSRP binding site (Fig. 3 and Supplemental Figs. S1 & S2). To address the functional relevance of these variants within their native regulatory landscapes, we have incorporated the following clarification into the text (p. 13): “This observation is consistent with additional findings where variants that create or disrupt specific RBP binding sites—such as SRSF1/2 (e.g., in DMD and NF1; Fig. 2 and Supplementary Fig. S4) and KHSRP (e.g., in AL049650.1; Fig. 2 and Supplementary Figs. S4 & S5)—led to significant changes in translation efficiency within their native UTR contexts.”

      (6) Figure 5C shows that 5' UTR SNPs that form an uAUG are associated with greater structural changes, but this does not "indicate" that "structure‐modifying UTR variants may control primary ORF translation partly by interfering with translation initiation from a uORF." The data presented in Figure 5 and luciferase/polysome data presented previously do not distinguish whether translation is occurring at an uAUG or canonical AUG. The statement quoted above is speculative and it should be clear that it is a hypothesis generated by the data and is not conclusive.

      We appreciate the reviewer’s suggestion. We have therefore modified our text to: “Therefore, while changes in uATG may not be common explanatory factors for polysome-shifting mutations, our results suggest that structure-modifying UTR variants may control primary ORF translation partly by interfering with translation initiation from a uORF.” (p. 14)

      Minor points/questions

      (1) The authors should clarify whether during library construction for massively parallel polysome profiling the 3' UTR constructs contain a common 5' UTR? Likewise, do the 5' UTR constructs contain a common 3' UTR? Perhaps the lack of a 5' UTR in the 3' UTR constructs, which is implied by Figure 2A, would influence differences seen between 3' UTR pairs (and likewise for 5' UTR pairs).

      There are short common 5’ UTRs appended to the 3’ UTR library, and likewise, a common short 3’ UTR is included in the 5’ UTR library. The common 5’ UTR comprises partial sequences from the CMV promoter and the plasmid backbone of pEGFP-N1 vector. The common 3’ UTR includes sequences from the pEGFP-N1 backbone and a short polyadenylation signal from HBA1 (hemoglobin subunit alpha 1). While we cannot entirely rule out potential crosstalk between 5’ and 3’ UTRs, the design ensures that all constructs are compared in a controlled and consistent context, enabling valid pairwise comparisons between variant and wildtype alleles.

      To clarify the library design, we have revised the main text to include this explanation: 

      “The entire library of UTR oligonucleotides (UTR library) was subsequently ligated upstream or downstream of an enhanced GFP (EGFP) coding region, along with a CMV promoter and a common UTR sequence on the opposite end. Cells transfected with the UTR library were treated with cycloheximide 14 hours post transfection and then subjected to polysome fractionation (see Methods).” (p.11) 

      “The variants significantly altering the polysome profile were then individually validated through high-sensitivity luciferase reporter assays (Fig. 3A). To this end, we resynthesized both the variant and corresponding wildtype alleles in the same library format - 115-nt native UTR segments centered on the variant and flanked by 20-nt priming sites. These UTRs were then cloned upstream (5’) or downstream (3’) of the firefly luciferase coding sequence, depending on their genomic location. As in the initial library design, the test UTR segment differs only by one nucleotide, while a shared short UTR fragment is present on the opposite end of the coding sequence to ensure consistency across constructs (Fig. 2A).” (p. 12)

      (2) The lines connecting the polysome distribution points make the plots appear busy and difficult to read, the data would be easier to interpret if they were removed.

      We employed a generalized linear model (GLM) to identify the variants that altered the polysome association of the corresponding transcripts. Statistically speaking, we were looking for variants that led to a significant interaction between genotype and polysome fraction. Displaying the lines as they are in our plots therefore offers readers a convincing visualization of the interaction: the lines from the WT and Mut groups are not parallel, which indicates an interaction between genotype and polysome fraction. Moreover, showing the lines from three batches of experiments also helps us ascertain the reproducibility of our experiments. Taken together, the lines make our plots more informative.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to both reviewers for taking the time to review our manuscript and data in great detail. We thank you for the fair assessment of our work, the helpful feedback, and for recognizing the value of our work. We have done our best to address your concerns below:

      eLife assessment

      This work reports a valuable finding on glucocorticoid signaling in male and female germ cells in mice, pointing out sexual dimorphism in transcriptomic responsiveness. While the evidence supporting the claims is generally solid, additional assessments would be required to fully confirm an inert GR signaling despite the presence of GR in the female germline and GR-mediated alternative splicing in response to dexamethasone treatment in the male germline. The work may interest basic researchers and physician-scientists working on reproduction and

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cincotta et al. set out to investigate the presence of glucocorticoid receptors in the male and female embryonic germline. They further investigate the impact of tissue-specific genetically induced receptor absence and/or systemic receptor activation on fertility and RNA regulation. They are motivated by several lines of research that report inter- and transgenerational effects of stress and/or glucocorticoid receptor activation and suggest that their findings provide an explanatory mechanism to mechanistically back parental stress hormone exposure-induced phenotypes in the offspring.

      Strengths:

      A chronological immunofluorescent assessment of GR in fetal and early life oocyte and sperm development.

      RNA seq data that reveal novel cell type specific isoforms validated by q-RT PCR E15.5 in the oocyte.

      2 alternative approaches to knock out GR to study transcriptional outcomes. Oocytes: systemic GR KO (E17.5) with low input 3-tag seq and germline-specific GR KO (E15.5) on fetal oocyte expression via 10X single cell seq and 3-cap sequencing on sorted KO versus WT oocytes both indicating little impact on polyadenylated RNAs

      2 alternative approaches to assess the effect of GR activation in vivo (systemic) and ex vivo (ovary culture): here the RNA seq did show again some changes in germ cells and many in the soma.

      They exclude oocyte-specific GR signaling inhibition via beta isoforms.

      Perinatal male germline shows differential splicing regulation in response to systemic Dex administration, results were backed up with q-PCR analysis of splicing factors.

      Weaknesses:

      COMMENT #1: The presence of a protein cannot be entirely excluded based on IF data

      We agree that very low levels of GR could escape the detection by IF and confocal imaging. We feel that our IF data do match transcript data in our validation studies of the GR KO using (1) qRT-PCR on fetal ovary in Fig 2E and (2) scRNA-seq in germ cells and ovarian soma in Fig S2B.

      COMMENT #2: (staining of spermatids is referred to but not shown).

      You are correct that this statement was based on a morphological identification of spermatids using DAPI morphology. We have performed a co-stain for GR with the spermatocyte marker SYCP3, and the spermatid/spermatozoa marker PNA (Peanut Agglutinin; from Arachis hypogaea) in adult testis tissue. We have updated Figure 4D to reflect this change, as well as the corresponding text in the Results section.

      COMMENT #3: The authors do not consider post-transcriptional level a) modifications also triggered by GR activation b) non-coding RNAs (not assessed by seq).

      We thank the reviewer for raising this very important point about potential post-transcriptional (non-genomic) effects of GR in the fetal oocyte. We agree that while our RNA-seq results show only a minimal transcriptional response, we cannot rule out a non-canonical signaling function of GR, such as the regulation of cellular kinases (reviewed in ref. 1), or the regulation of non-coding RNAs at the post-transcriptional level, and we have amended the discussion to include a sentence on this point. However, while we fully acknowledge the possibility of GR regulating non-genomic cellular signaling, we chose not to explore this option further based on the lack of any overall functional effect on meiotic progression when GR signaling was perturbed, either by KO (Figure 2D) or by dex-mediated activation (Figure S3C).

      COMMENT #4: Sequencing techniques used are not total RNA but either are focused on all polyA transcripts (10x) or only assess the 3' prime end and hence are not ideal to study splicing

      We thank the reviewer for raising this concern; however, this statement is not correct, and we have clarified this point in the Results section to explain how the sequencing libraries of the male germ cell RNA-seq were prepared. We agree that certain sequencing techniques (such as 3’ Tag-Seq) that generate sequencing libraries from a limited portion of an entire transcript molecule are not appropriate for analysis of differential splicing. This was not the case, however, for the RNA-seq libraries prepared on our male germ cells treated with dexamethasone. These libraries were constructed using full-length transcripts that were reverse transcribed using random hexamer priming, thus providing sequencing coverage across the full transcript length. As a result, this type of library prep technique should be sufficient for capturing differential splicing events along the length of the transcript. We do, however, point out that these libraries were constructed on polyA-enriched transcripts. Thus, while we obtained full-length transcript coverage for these polyA transcripts, any differential splicing taking place in non-polyadenylated RNA species was not captured. While we are excited about the possibility of exploring GR-mediated splicing regulation of other RNA species in the future, we chose to focus the scope of our current study on polyA mRNA molecules specifically.

      COMMENT #5: The number of replicates in the low input seq is very low and hence this might be underpowered

      While the number of replicates (n=3-4 per condition) is sufficient for performing statistical analysis of a standard RNA-seq experiment, we do acknowledge and agree with the reviewer that low numbers of FACS-sorted germ cells from individual embryos combined with the low input 3’ Tag-Seq technique could have led to higher sample variability than desired. Given that we validated our bulk RNA-seq analysis of GR knockout ovaries using an orthogonal single-cell RNA-seq approach, we feel that our conclusions regarding a lack of transcriptional changes upon GR deletion remain valid.

      COMMENT #6: Since Dex treatment showed some (modest) changes in oocyte RNA - effects of GR depletion might only become apparent upon Dex treatment as an interaction.

      We may be missing the nuance of this point, but our interpretation of an effect that is seen only when the KO is treated with Dex would be that the mechanism would not be autonomous in germ cells but indirect or off-target.

      COMMENT #7: Effects in oocytes following systemic Dex might be indirect due to GR activation in the soma.

      As both the oocytes and ovarian soma express GR during the window of dex administration, we agree that it is possible that the few modest changes seen in the oocyte transcriptome are the result of indirect effects following robust GR signaling in the somatic compartment. However, given that these modest oocyte transcript changes in response to dex treatment did not significantly alter the ability of oocytes to progress through meiosis, we chose not to explore this mechanism further.

      COMMENT #8: Even though ex vivo culture of ovaries shows GR translocation to the nucleus it is not sure whether the in vivo systemic administration does the same.

      AND

      The conclusion that fetal oocytes are resistant to GR manipulation is very strong, given that "only" poly A sequencing and few replicates of 3-prime sequencing have been analyzed and information is lacking on whether GR is activated in germ cells in the systemically dex-injected animals.

      If we understand correctly, the first part refers to a technical limitation and the second part takes issue with our interpretation of the data. For the former, we appreciate this astute insight on the conundrum of detecting a response to systemic dex in fetal oocytes, which is generally monitored by nuclear translocation of GR. As shown in Figure 1A and 1B, GR localization is overwhelmingly nuclear in fetal oocytes of WT animals at E13.5 without addition of any dex. We could not, therefore, use GR translocation as a proxy for activation in response to dex treatment. We instead used ex vivo organ culture to monitor localization changes, as we were able to maintain fetal ovaries ex vivo in hormone-depleted and ligand negative conditions. As shown in Fig. 3, these defined culture conditions elicited a shift of GR to the cytoplasm of fetal oocytes. This led us to conclude that GR is capable of translocating between nucleus and cytoplasm in fetal oocytes, and we were able to counteract this loss in nuclear localization by providing dex ligand in the media.

      We feel that our conclusion that oocytes are resistant to manipulation of glucocorticoid signaling despite their possession of the receptor and capacity for nuclear translocation is substantiated by multiple results: meiotic phenotyping, bulk RNA-seq and scRNA-seq analysis of both GR KO and dex dosed mice. Our basis for testing the timing and fidelity of meiotic prophase I was the coincident onset of GR expression in female germ cells at E13, and the disappearance of GR in neonatal oocytes as they enter meiotic arrest. The lack of transcriptional changes observed in oocytes in response to dex has made it even more challenging to demonstrate a bona fide “activation” of GR. Observation of a dose-dependent induction of the canonical GR response gene Fkbp5 in the somatic cells of the fetal ovary (Figure S3A and 3A) affirmed that dex traverses the placenta. We agree with the reviewer that it remains possible that dex or GR KO could lead to changes in epigenetic marks or small RNAs in oocytes, and have mentioned these possibilities in the discussion, but we note that even epigenetic perturbations during oocyte development such as the loss of Tet1 or Dnmt1 result in measurable changes in the transcriptome and the timing of meiotic prophase (2–4).

      COMMENT #9: This work is a good reference point for researchers interested in glucocorticoid hormone signaling, fertility, and RNA splicing. It might spark further studies on germline-specific GR functions and the impact of GR activation on alternative splicing. While the study provides a characterization of GR and some aspects of GR perturbation, and the negative findings in this study do help to rule out a range of specific roles of GR in the germline, there is still a range of other potential unexplored options. The introduction of the study alludes to implications for intergenerational effects via epigenetic modifications in the germline; however, it does not mention that the indirect effects of reproductive tissue GR signaling on the germline have indeed already been described in the context of intergenerational effects of stress.

      The reviewer raises an excellent point that we have not made sufficient distinction in our manuscript between prior studies of gestational stress and preconception stress and the light that our work may shed on those findings. We have revised the introduction to clarify this difference, and added reference to an outstanding study that identifies glucocorticoid-induced changes to microRNA cargo of extracellular vesicles shed by epididymal epithelial cells that, when transferred to mature sperm, can induce changes in the HPA axis and brain of offspring (5). Interestingly, this GR-mediated effect in the epididymal epithelial cells concurs with our observation in the adult testis that GR can be detected only in cKit+ spermatogonia but not in subsequent stages of spermatids.

      COMMENT #10: Also, the study does not assess epigenetic modifications.

      We agree with the reviewer that exploring the role of GR in regulating epigenetic modifications within the germline is an area of extreme interest given the potential links between stress and transgenerational epigenetic inheritance. As this is a broader topic that requires a more thorough and comprehensive set of experiments, we have intentionally chosen to keep this work separate from the current study, and hope to expand upon this topic in the future.

      COMMENT #11: The conclusion that the persistence of a phenotype for up to three generations suggests that stress can induce lasting epigenetic changes in the germline is misleading. For the reader who is unfamiliar with the field, it is important to define much more precisely what is referred to as "a phenotype". Furthermore, this statement evokes the impression that the very same epigenetic changes in the germline have been observed across multiple generations.

      We see how this may be misleading, and we have amended the text of the introduction and discussion accordingly to avoid the use of the term “phenotype”.

      COMMENT #12: The evidence of the presence of GR in the germline is also somewhat limited - since other studies using sequencing have detected GR in the mature oocyte and sperm.

      As described above in response to Comment #2, we have included immunostaining of adult testis in a revised Figure 4D and shown that we detect GR in PLZF+ and cKIT+ spermatogonia. We also show low/minimal expression in some (SYCP3+) early meiotic spermatocytes, but not in (Lectin+) spermatids. We are not aware of any studies that have shown expression of GR protein in the mature oocyte.

      COMMENT #13: The discussion ends again on the implications of sex-specific differences of GR signaling in the context of stress-induced epigenetic inheritance. It states that the observed differences might relate to the fact that there is more evidence for paternal lineage findings, without considering that maternal lineage studies in epigenetic inheritance are generally less prevalent due to some practical factors - such as more laborious study design making use of cross-fostering or embryo transfer.

      We thank the reviewer for this valid point, and we have amended the discussion section.

      Reviewer #2 (Public Review):

      Summary:

      There is increasing evidence in the literature that rodent models of stress can produce phenotypes that persist through multiple generations. Nevertheless, the mechanism(s) by which stress exposure produces phenotypes are unknown in the directly affected individual as well as in subsequent offspring that did not directly experience stress. Moreover, it has also been shown that glucocorticoid stress hormones can recapitulate the effects of programmed stress. In this manuscript, the authors test the compelling hypothesis that glucocorticoid receptor (GR)-signaling is responsible for the transmission of phenotypes across generations. As a first step, the investigators test for a role of GR in the male and female germline. Using knockouts and GR agonists, they show that although germ cells in male and female mice have GR that appears to localize to the nucleus when stimulated, oocytes are resistant to changes in GR levels. In contrast, the male germline exhibits changes in splicing but no overt changes in fertility.

      Strengths:

      Although many of the results in this manuscript are negative, this is a careful and timely study that informs additional work to address mechanisms of transmission of stress phenotypes across generations and suggests a sexually dimorphic response to glucocorticoids in the germline. The work presented here is well-done and rigorous and the discussion of the data is thoughtful. Overall, this is an important contribution to the literature.

      Reviewer #1 (Recommendations For The Authors):

      RECOMMENDATION #1: To assess whether in females the systemic Dex administration directly activates GR in oocytes it would be great to assess GR activation following Dex administration, and ideally to see the effects abolished when Dex is administered to germline-specific KO animals.

      In regard to the recommendation to assess GR activation in response to systemic dex administration, we refer the reviewer back to our response in Comment #8 highlighting the difficulties defining and measuring GR activation in the germline.

      This therefore has made it difficult to assess whether any of the modest effects seen in response to dex are abolished in our germline-specific KO animals. While repeating our RNA-seq experiment in dex-dosed germline KO animals would address whether the ~60 genes induced in oocytes are the result of oocyte-intrinsic GR activity, we have decided not to explore this mechanism further due to the overall lack of a functional effect on meiotic progression in response to dex (Figure S3C).

      RECOMMENDATION #2: To further strengthen the link between GR and alternative splicing it would be great to see the dex administration experiment repeated in germline specific GR KO's.

      While we understand the reviewer’s suggestion to explore whether deletion of GR in the spermatogonia is sufficient to abrogate the dex-mediated decreases in splice factor expression, we chose not to explore the details of this mechanism given that deletion of GR in the male germline does not impair fertility (Figure 6).

      RECOMMENDATION #3: I am wondering how much a given reduction in one of the splicing factors indeed affects splicing events. Can the authors relate this to literature, or maybe an in vitro experiment can be done to see whether the level of differential splicing events detected is in a range that can be expected in the case of the magnitude of splicing factor reduction?

      It has been shown in many instances in the literature that a full genetic deletion of a single splice factor leads to impairments in spermatogenesis, and ultimately infertility (6–16). We suspect that dex treatment leads to fewer differential splicing events than a full splice factor deletion, given that dex treatment causes a broader decrease in splice factor expression without entirely abolishing any single splice factor. We have amended the discussion section to include this point. While we share the reviewer’s curiosity to compare the effects of dex vs genetic deletion of splicing machinery on the overall magnitude of differential splicing events, we unfortunately do not have access to mice with a floxed splice factor at this time. While we have considered knocking out one or more splice factors in an ex vivo cultured testis to compare alongside dex treatment, our efforts to date have proven unsuccessful due to high cell death upon culture of the postnatal testis for more than 24 hours.

      RECOMMENDATION #4: It is unclear from the methods whether in germline-specific KO's also the controls received tamoxifen.

      We thank the reviewer for catching this missing piece of information. All control embryos that were assessed received an equivalent dose of tamoxifen to the germline-specific KO embryos. The only difference between cKOs and controls was the presence of the Cre transgene. We have updated the Materials and Methods 3’ Tag-Seq sample preparation section to include the sentence: “Both GRcKO/cKO and control GRflox/flox embryos were collected from tamoxifen-injected dams, and thus were equally exposed to tamoxifen in utero”.

      Reviewer #2 (Recommendations For The Authors):

      I just have only a few comments/questions.

      RECOMMENDATION #5: It is somewhat surprising that GR is expressed in female germ cells, yet there doesn't seem to be a requirement. Is there any indication of what it does? Is the long-term stability of the germline compromised?

      We thank the reviewer for these questions, and we agree that it was quite surprising to find a lack of GR function in the female germline despite its robust expression. The question of whether loss of GR affects the long-term stability of the female germline is interesting, given that similar work in GR KO zebrafish has shown impairments to female reproductive capacity, yet only upon aging (17–19).

      While we have shared interest in this question, technical limitations thus far have prevented us from properly assessing the effect of GR loss in aged females. Homozygous deletion of GR results in embryonic lethality at approximately E17.5. Conditional deletion of GR using Oct4-CreERT2 with a single dose of tamoxifen (2.5 mg / 20g mouse) at E9.5 results in complete deletion of GR by E10.5, although dams consistently suffer from dystocia and are no longer able to deliver viable pups. While using the more active tamoxifen metabolite (4OHT) at 0.1 mg / 20g has allowed for successful delivery, the resulting deletion rate is very poor (see qPCR results in panel below, left). While using half the dose of standard tamoxifen (1.25 mg / 20g mouse) at E9.5 has on rare occasions led to a successful delivery, the resulting recombination efficiency is insufficient (Author response image 1 right panel).

      Author response image 1.

      While a Blimp1-Cre conditional KO model was used to assess male fertility on GR deletion, we believe this model may not be ideal for studying fertility in the context of aging. While Blimp1-Cre is highly specific to the germ cells within the gonad, there are many cell types outside of the gonad that express Blimp1, including the skin and certain cells of the immune system. It is unclear, particularly over the course of aging, whether any effects on fertility seen would be due to an oocyte-intrinsic effect, or the result of GR loss elsewhere in the body. While we hope to explore the role of GR in the aging oocyte further using alternative Cre models in the future, this is currently outside the scope of this work.

      RECOMMENDATION #6: Figure 5b: what is the left part of that panel? Is it the same volcano plot for germ cells as shown in part a but with splicing factors?

      We apologize if this panel was unclear. Yes, the left panel of Figure 5B is in fact the same volcano plot in 5A, labeled with splicing factors instead of top genes. We have edited Figure 5B and corresponding figure legend to clarify this.

      References:

      1. Oakley, R.H., and Cidlowski, J.A. (2013). The biology of the glucocorticoid receptor: New signaling mechanisms in health and disease. J. Allergy Clin. Immunol. 132, 1033–1044. 10.1016/j.jaci.2013.09.007.

      2. Hargan-Calvopina, J., Taylor, S., Cook, H., Hu, Z., Lee, S.A., Yen, M.-R., Chiang, Y.-S., Chen, P.-Y., and Clark, A.T. (2016). Stage-Specific Demethylation in Primordial Germ Cells Safeguards against Precocious Differentiation. Dev. Cell 39, 75–86. 10.1016/j.devcel.2016.07.019.

      3. Hill, P.W.S., Leitch, H.G., Requena, C.E., Sun, Z., Amouroux, R., Roman-Trufero, M., Borkowska, M., Terragni, J., Vaisvila, R., Linnett, S., et al. (2018). Epigenetic reprogramming enables the transition from primordial germ cell to gonocyte. Nature 555, 392–396. 10.1038/nature25964.

      4. Eymery, A., Liu, Z., Ozonov, E.A., Stadler, M.B., and Peters, A.H.F.M. (2016). The methyltransferase Setdb1 is essential for meiosis and mitosis in mouse oocytes and early embryos. Development 143, 2767–2779. 10.1242/dev.132746.

      5. Chan, J.C., Morgan, C.P., Leu, N.A., Shetty, A., Cisse, Y.M., Nugent, B.M., Morrison, K.E., Jašarević, E., Huang, W., Kanyuch, N., et al. (2020). Reproductive tract extracellular vesicles are sufficient to transmit intergenerational stress and program neurodevelopment. Nat Commun 11, 1499. 10.1038/s41467-020-15305-w.

      6. Kuroda, M., Sok, J., Webb, L., Baechtold, H., Urano, F., Yin, Y., Chung, P., Rooij, D.G. de, Akhmedov, A., Ashley, T., et al. (2000). Male sterility and enhanced radiation sensitivity in TLS−/− mice. Embo J 19, 453–462. 10.1093/emboj/19.3.453.

      7. Liu, W., Wang, F., Xu, Q., Shi, J., Zhang, X., Lu, X., Zhao, Z.-A., Gao, Z., Ma, H., Duan, E., et al. (2017). BCAS2 is involved in alternative mRNA splicing in spermatogonia and the transition to meiosis. Nat Commun 8, 14182. 10.1038/ncomms14182.

      8. Li, H., Watford, W., Li, C., Parmelee, A., Bryant, M.A., Deng, C., O’Shea, J., and Lee, S.B. (2007). Ewing sarcoma gene EWS is essential for meiosis and B lymphocyte development. J Clin Invest 117, 1314–1323. 10.1172/jci31222.

      9. O’Bryan, M.K., Clark, B.J., McLaughlin, E.A., D’Sylva, R.J., O’Donnell, L., Wilce, J.A., Sutherland, J., O’Connor, A.E., Whittle, B., Goodnow, C.C., et al. (2013). RBM5 Is a Male Germ Cell Splicing Factor and Is Required for Spermatid Differentiation and Male Fertility. Plos Genet 9, e1003628. 10.1371/journal.pgen.1003628.

      10. Zagore, L.L., Grabinski, S.E., Sweet, T.J., Hannigan, M.M., Sramkoski, R.M., Li, Q., and Licatalosi, D.D. (2015). RNA Binding Protein Ptbp2 Is Essential for Male Germ Cell Development. Mol Cell Biol 35, 4030–4042. 10.1128/mcb.00676-15.

      11. Xu, K., Yang, Y., Feng, G.-H., Sun, B.-F., Chen, J.-Q., Li, Y.-F., Chen, Y.-S., Zhang, X.-X., Wang, C.-X., Jiang, L.-Y., et al. (2017). Mettl3-mediated m6A regulates spermatogonial differentiation and meiosis initiation. Cell Res 27, 1100–1114. 10.1038/cr.2017.100.

      12. Horiuchi, K., Perez-Cerezales, S., Papasaikas, P., Ramos-Ibeas, P., López-Cardona, A.P., Laguna-Barraza, R., Balvís, N.F., Pericuesta, E., Fernández-González, R., Planells, B., et al. (2018). Impaired Spermatogenesis, Muscle, and Erythrocyte Function in U12 Intron Splicing-Defective Zrsr1 Mutant Mice. Cell Reports 23, 143–155. 10.1016/j.celrep.2018.03.028.

      13. Ehrmann, I., Crichton, J.H., Gazzara, M.R., James, K., Liu, Y., Grellscheid, S.N., Curk, T., Rooij, D. de, Steyn, J.S., Cockell, S., et al. (2019). An ancient germ cell-specific RNA-binding protein protects the germline from cryptic splice site poisoning. Elife 8, e39304. 10.7554/elife.39304.

      14. Legrand, J.M.D., Chan, A.-L., La, H.M., Rossello, F.J., Änkö, M.-L., Fuller-Pace, F.V., and Hobbs, R.M. (2019). DDX5 plays essential transcriptional and post-transcriptional roles in the maintenance and function of spermatogonia. Nat Commun 10, 2278. 10.1038/s41467-019-09972-7.

      15. Yuan, S., Feng, S., Li, J., Wen, H., Liu, K., Gui, Y., Wen, Y., and Wang, X. (2021). hnRNPH1 recruits PTBP2 and SRSF3 to cooperatively modulate alternative pre-mRNA splicing in germ cells and is essential for spermatogenesis and oogenesis. 10.21203/rs.3.rs-1060705/v1.

      16. Wu, R., Zhan, J., Zheng, B., Chen, Z., Li, J., Li, C., Liu, R., Zhang, X., Huang, X., and Luo, M. (2021). SYMPK Is Required for Meiosis and Involved in Alternative Splicing in Male Germ Cells. Frontiers Cell Dev Biology 9, 715733. 10.3389/fcell.2021.715733.

      17. Maradonna, F., Gioacchini, G., Notarstefano, V., Fontana, C.M., Citton, F., Valle, L.D., Giorgini, E., and Carnevali, O. (2020). Knockout of the Glucocorticoid Receptor Impairs Reproduction in Female Zebrafish. Int J Mol Sci 21, 9073. 10.3390/ijms21239073.

      18. Facchinello, N., Skobo, T., Meneghetti, G., Colletti, E., Dinarello, A., Tiso, N., Costa, R., Gioacchini, G., Carnevali, O., Argenton, F., et al. (2017). nr3c1 null mutant zebrafish are viable and reveal DNA-binding-independent activities of the glucocorticoid receptor. Sci Rep-uk 7, 4371. 10.1038/s41598-017-04535-6.

      19. Faught, E., Santos, H.B., and Vijayan, M.M. (2020). Loss of the glucocorticoid receptor causes accelerated ovarian ageing in zebrafish. Proc Royal Soc B 287, 20202190. 10.1098/rspb.2020.2190.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents some valuable information regarding the molecular mechanisms controlling the regeneration of pancreatic beta cells following induced cell ablation. However, the study lacks the critical lineage tracing result to support the conclusion about the origin of the regenerated beta cells. The results of the pharmacological manipulation of CaN signaling are also incomplete. In particular, these manipulations are not cell-specific, making them difficult to interpret, and thus a genetic approach is recommended.

      Public Reviews:

      Reviewer #1 (Public Review):

      Induction of beta cell regeneration is a promising approach for the treatment of diabetes. In this study, Massoz et al. identified calcineurin (CaN) as a new potential modulator of beta cell regeneration, using zebrafish as a model. They also showed that CaN works together with Notch signaling to promote beta cell regeneration. Overall, the paper is well organized and technically sound. However, some of the evidence seems too weak to support the conclusions.

      Reviewer #2 (Public Review):

      This work started with transcriptomic profiling of ductal cells to identify the upregulation of calcineurin in the zebrafish after beta-cell ablation. By suppressing calcineurin with its chemical inhibitor cyclosporin A and expressing a constitutively active form of calcineurin ubiquitously or specifically in ductal cells, the authors found that inhibited calcineurin activity promoted beta-cell regeneration transiently while ectopic calcineurin activity hindered beta-cell regeneration in the pancreatic tail. They also showed similar effects in the basal state but only when it was within a particular permissive window of Notch activity. To further investigate the roles of calcineurin in the ductal cells, the authors demonstrated that calcineurin inhibition additionally induced the proliferation of the ductal cells in the regenerative context or under a limited level of Notch activity. Interestingly, the enhanced proliferation was followed by a depletion of ductal cells, suggesting that calcineurin inhibition would exhaust the ductal cells. Based on the data, the authors proposed a very attractive and intriguing model of the role of calcineurin in maintaining the balance of the progenitor proliferation and the endocrine differentiation. However, the conclusions of this paper are only partially supported by the data as some evidence from the data remains suggestive.

      (1) In the transcriptomic profiling, genes differentially regulated in the ablated adults could be solely due to the chemical effects of metronidazole instead of the beta-cell ablation. A control group without ins:NTR-mCherry but treated with metronidazole is necessary to exclude the side effects of metronidazole.

      We believe that it is unlikely that the differential regulation observed is due to metronidazole rather than the beta cell loss. This experimental strategy has proven successful in well-published studies to identify regulators of beta cell regeneration in zebrafish larvae. Importantly, the candidates identified in these studies were subsequently functionally validated in mammalian models (Lu et al. 2016, Karampelias 2021). Moreover, in our study, we also used another chemical compound, nifurpirinol (Bergemann et al., 2018), to ablate the beta cells. Regardless of whether we employed metronidazole or nifurpirinol for beta cell ablation, our results consistently indicate a notable involvement of calcineurin. Of note, nifurpirinol is commonly used in fishkeeping, with no reported toxicity to the global health of the fish.

      (2) Although it has been shown that the pancreatic duct is a major source of the secondary islets in the pancreatic tail in previous studies, there is no direct evidence showing the cyclosporin A-induced cells share the source in this manuscript. Without any proper lineage tracing work, the origin of those cyclosporin A-induced cells cannot be concluded.

      Our experimental setting is similar to the one described in Ninov et al. 2013, where lineage tracing experiments demonstrate an increase in beta cell formation in the pancreatic tail originating from the pancreatic ducts. In our study, we performed the same experiment with the addition of CsA and showed more ductal cell proliferation (Figure 5G) followed by a 19% increase of beta cell regeneration compared to nonregenerative conditions (Figure 2B). It is unlikely that the additional 19% of regenerated beta cells under CaN inhibition come from a source other than that of the first 68%.

      On the other hand, the acinar cells cannot be considered as another source of regenerated beta cells, as they are not able to form beta cells unless they are artificially reprogrammed (Maddison et al., 2012). Therefore, the only other potential source of regenerated beta cells is the endocrine compartment. However, at the stage where we performed beta cell ablation, there are no endocrine cells in the pancreatic tail. Moreover, there is no evidence that secondary islets could come from the principal islet; they are tightly associated with the ducts and differentiate from ductal cells (Mi et al., 2023).

      Importantly, we demonstrated that overexpression of CaN specifically in the pancreatic ducts prevents beta cell regeneration. The CaN effect is therefore intrinsic to the ducts. Moreover, we showed that CsA increases beta cell formation when Notch signalling is repressed. Given that Notch signalling is known to act on the ductal cell population, this again strongly suggests that CsA exacerbates beta cell formation from the ducts.

      Taken together, this evidence strongly supports the notion that the cyclosporin-induced beta cells originate from the ductal cells.

      (3) It is interesting to see an increase of beta cells in the primary islet after cyclosporin A treatment (Supplemental Fig 2B). However, it remains unclear if their formation shares the same mechanism with the newly formed beta cells in the pancreatic tail.

      There are indeed several sources of beta cell regeneration in the primary islet. However, a recent study showed that the contribution of alpha cells to regeneration is minor and that the main contributors are ductal and sst1.1 cells (Mi et al., 2023). In our previous publication, we indeed showed that a major source of beta cells in the principal islet is the delta 1.1 cell population. Those sst1.1 cells begin to express insulin and are therefore named ‘bihormonal’ (Carril et al., 2022). We tested whether this population is impacted by CsA treatment, and we show below that CsA does not affect bihormonal cell formation (Figure 2D supplemental). These new results suggest that the CsA-mediated increase of beta cells in the principal islet arises from the ductal cells, as observed in the tail. These results were added to the manuscript as Figure 2D supplemental.

      Author response image 1.

      Tg (sst1.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Larvae were then treated with CsA 1µM (or DMSO as control) from 4 to 6 dpf, prior to fixation and analysis of bihormonal cells in the principal islet at 6dpf.

      (4) The conclusion of the effect of cyclosporin A on the endocrine progenitors (Line 175) is not convincing because the data cannot distinguish the endocrine progenitors from the insulin-expressing cells. Indeed, Figure 2E shows that neurod1+ cells are fewer than ins+ cells (Figure 2D) in the pancreatic tail at 10 dpt, suggesting that all or at least the majority of neurod1+ cells are already ins+.

      The neurod1+ cell population indeed includes both endocrine progenitor cells and differentiated endocrine cells. However, we would like to point out that the timing of the analysis is essential to reach our conclusion. When we treat with CsA, we show an increase of neurod1+ cells already at 4dpt. At this time point, no hormone-producing cells can yet be detected (Figure 2E). Those additional neurod1+ cells are therefore endocrine progenitors and not beta cells. This result shows that CaN inhibition induces pro-endocrine cell formation in regenerative conditions.

      At 10dpt, the neurod1+ cell population includes beta cells as well as endocrine progenitor cells. We agree that the way the data are presented in Figures 2D and 2E can be confusing. These two figures come from two separate experiments; the number of beta cells in Figure 2D therefore cannot be compared to the number of neurod1+ cells in Figure 2E. Indeed, the efficiency and rate of regeneration can vary from one experiment to another, independently of calcineurin. To clarify, we added the number of beta cells regenerated in the experiment of Figure 2E (see Author response image 2, in red). As you can see, regeneration was a bit slower than usual in this experiment.

      Author response image 2.

      Tg (neurod1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Larvae were then treated with CsA 1µM (or DMSO as control) from 4 to 6 dpf, prior to fixation and analysis of GFP+ cells (in grey, pink, dark grey and green) and, for the ablated + CsA condition, mCherry+ cells (in red) from 2 to 10 dpf.

      (5) Figure 5D shows a significant loss of nkx6.1+ cells in the combined treatment group but there is no direct evidence showing this was a result of differentiation as the authors suggested. This cell loss also outnumbered the increase in ins+ cells (Figure 4D). The cell fates of these lost cells are still undetermined, and the authors did not demonstrate if apoptosis could be a reason of the cell loss.

      First, as the graphs show, we encountered very high variability between individuals within the same condition. We decided to show this variability by presenting the raw data. This high variability could partially explain the differences that you point out. Moreover, we would like to point out that, independently of CaN inhibition, the loss of progenitors (nkx6.1+ cells) outnumbers the gain of beta cells. Indeed, on average there is a loss of 29% (41 GFP+ cells) of the nkx6.1+ cells and a gain of only 6 beta cells after Notch inhibitory treatment, the other progenitor cells having differentiated into other endocrine cell types (pro-endocrine, alpha, delta). In the combined treatment (Notch and CaN inhibitors), we decreased the number of progenitor cells by 50%, i.e. 21% (20 cells) more than without the CaN inhibitor. However, we increased the number of regenerated beta cells two-fold (from 6 to 12 cells). In brief, the substantial loss of progenitor cells could be explained by precocious differentiation into pro-endocrine and endocrine cell types. It is therefore expected that the number of regenerated beta cells does not match the number of progenitor cells lost, in the presence or absence of CaN inhibition.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The evidence that proliferating ductal cells differentiate into beta cells is weak. The authors should use lineage tracing, or immunostaining for other marker genes, to confirm this.

      The experiment in Figure 5A-D is a short-term tracing experiment and should have been presented as such in the manuscript. After LY411575 (Notch inhibitor) and CsA treatments at 3dpf, we exposed the larvae to EdU at 4dpf for 8 hours (Figure 5A). We showed that EdU is incorporated in dividing ductal cells at 4dpf (Figure 5C) and that 2 days later there are newly formed beta cells that are EdU+ (see Author response image 3). To reinforce our conclusion, the image below will be added to the manuscript.

      Author response image 3.

      Tg (nkx6.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with both CsA 1µM and LY411575 5µM. At 4dpf, the larvae were exposed to EdU 4mM for 8 hours, before analysis at 6 dpf.

      (2) To inhibit the CaN and Notch pathways, the authors used only pharmacological approaches; genetic approaches should be used to obtain stronger evidence.

      We employed two distinct inhibitors specifically targeting calcineurin (CsA and FK506) for CaN inhibition. While these inhibitors have distinct chemical structures and potential non-specific effects, they both yield the same result of increased beta cell formation under Notch repression (see Figure 4D and Figure 4B in the supplementary data). This convergence of outcomes strongly suggests that the observed effect is primarily attributable to the specific inhibition of calcineurin.

      Furthermore, we complemented our inhibitor-based approach with a genetic strategy involving CaN overexpression (see Figure 3). Notably, the overactivation of CaN resulted in a reduction of beta cell regeneration. Given that this genetic approach generated an effect contrary to that achieved with the inhibitors, it provides robust support for our model, which postulates that calcineurin plays a critical role in the regulation of beta cell regeneration (see Figure 3, panels C-E).

      As for Notch inhibition, previously published data from our laboratory compared the effects of a Notch inhibitor (LY411575) and genetic approaches (mib mutant and transgenic line) on pro-endocrine cell (ascl1b+) and ductal cell (nkx6.1+) formation. This study showed that both the Notch inhibitor (LY411575) and Notch repression using genetic approaches produce the same effect: an induction of pro-endocrine cell formation. Since the specificity of this inhibitor has been validated (Ghaye et al., 2015), we did not consider a genetic approach necessary.

      (3) The most enriched pathways among the up-regulated genes were DNA replication and cell cycle, which suggested that these genes are more important for the duct cell proliferation, how is Calcineurin related to these pathways, such as regulating the genes important for proliferation?

      The transcriptomic data presented in this manuscript suggest that the ductal cells undergo a strong proliferative response after beta cell ablation. This is in accordance with our experimental data showing activation of ductal proliferation after beta cell ablation (Ghaye et al., 2015) and with data from this manuscript (Figure 1 I-J).

      Calcineurin is a well-known regulator of the cell cycle, and can either promote or repress the cell cycle depending on the cell type. For example, stressing the cell provokes an entry of calcium and subsequently CaN activation, which results in cell cycle arrest (Leech et al. 2020). Nevertheless, depending on the cell type, CaN can be either necessary for or deleterious to cell proliferation (Goshima et al. 2019; Masaki and Shimada 2022). The intriguing dual role of CaN in the cell cycle is well illustrated in β cell regeneration. While CaN should be repressed to enable ductal progenitor amplification and subsequent endocrine differentiation, CaN is then necessary for β cell function and replication (Dai et al. 2017; Heit et al. 2006). Moreover, CaN is related to cellular senescence, and CaN function is important for proper fin regeneration in zebrafish.

      (4) It is hard to understand why they pick up the pathway of cellular senescence signature for the duct cell progenitor neogenesis? Moreover, among these senescence genes, many genes are cell cycle regulators.

      In response to beta cell ablation, the ductal cells undergo a strong proliferative response, as shown in our previous data (Ghaye et al., 2015). It was therefore not surprising that many differentially expressed genes are cell cycle regulators. On the other hand, the cellular senescence signature was surprising. Indeed, senescence is usually associated with cell cycle arrest and aging. However, recent studies showed that cellular senescence is required for proper development and regeneration. We therefore wanted to investigate this pathway and more particularly the function of calcineurin, which can either promote or repress the cell cycle in different cell types (see comment above).

      (5) The RNA-seq data obtained from adult fish, while the authors use larvae to explore the CaN functions, it may have different conclusion using adult fish. Moreover, it is unclear whether the CaN increased when the beta cell ablated in young larvae.

      We decided to first perform functional experiments in larvae, as this model enables the quantification of beta cell regeneration from the ducts in the pancreatic tail. However, to validate our results at non-developmental stages, we performed experiments in juveniles (2 months old) and adults. CsA treatments in juvenile zebrafish recapitulated the results obtained in larvae (Figure 2B and Figure 6A-C). Moreover, we showed that CaN overactivation delayed glycemia recovery after ablation in adults (Figure 6D-E), which is in accordance with impaired regeneration. Altogether, these results strongly suggest that CaN acts as a regulator of beta cell regeneration in both the juvenile/adult and larval stages.

      Concerning the expression of CaN in zebrafish larvae, we tried to detect the level of CaN in the different experimental conditions by in situ hybridization. However, we were not able to detect it using this technique. We also tried immunostaining with an anti-phospho-nfatc3 ser165 polyclonal antibody (Invitrogen), but this antibody does not seem to work in zebrafish. Finally, we tried to sort ductal cells at the larval stage to perform a transcriptomic analysis, but we were unable to collect enough ductal cells to proceed further. Indeed, our staining experiment showed that there are only around 150 ductal cells (nkx6.1+, Figure 5D) at this stage.

      (6) The beta cell regeneration in the young larvae usually recovers within ~ 5 days in the principal islet. Please also show the beta cell number (PI) during the beta cell recovery after ablation.

      We did show beta cell regeneration in the principal islet in Figure 2A-B supplemental. While new beta cells appear quickly in this islet (Carril, Massoz, Dupont et al., 2023), the principal islet has not yet fully recovered at 5dpt.

      (7) Since the studies did not show the CaN level in Fig.3, it is hard to know that the CaN is exactly expressed.

      In Figure 3B, using Tg(hsp70:GFP-CaNCA), it is indeed not possible to see CaN expression at 10 dpt, as the heat shocks induce CaNCA overexpression only transiently. However, the transient expression was detected in live larvae shortly after the heat shocks. On the other hand, in the transgenic line Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4), GFP-CaNCA is continuously expressed, allowing us to show CaNCA expression in the pancreatic ducts (Figure 3).

      (8) In Fig.6 D and 6E, did these drug treatments change the glucose level in non-ablated fish?

      As you can see below, the CaN inhibitor CsA does not affect the glycemia of the fish in non-regenerative conditions.

      Author response image 4.

      Glycemia of non-ablated fish, 3 days after drug treatment.

      (9) The logic of writing in Results is very hard to understand.

      We proofread the paper in an effort to clarify it.

      Minor concerns,

      (1) Make a scheme for ablation and RNA-seq, and indicate the age of the fish used in Fig. 1.

      We added the scheme in Figure 1 supplemental.

      (2) In Fig. 1G, two arrows indicated mCherry+ cells is hard to see in the non-ablated fish.

      One arrow was indeed misplaced; we moved the arrow and tried to improve the intensity of the red signal. However, these few cells are indeed small and can be difficult to see.

      (3) In Fig.6, it is hard to know that the arrows indicated islets are small islets (up to 5 cells), how they compared with big islets and defined as small islet. Moreover, some of these islets are almost invisible.

      We now show a close up of a portion of the pancreatic tail and show the beta cells with arrows only in this picture, to enhance clarity.

      Reviewer #2 (Recommendations For The Authors):

      (1) This manuscript needs more proofreading and polishing to increase its readability.

      We proofread the manuscript and changed some paragraphs for more clarity.

      (2) The extensive use of words like "modulate" or "regulate" sometimes makes the text ambiguous as the effect is not stated directly and clearly.

      We rewrote some parts of the text and tried to avoid using “regulate” as often.

      However, as we used both repression and over-activation of CaN, we still use words such as “regulate” when stating general conclusions on the function of CaN.

      (3) The list of individual differentially regulated genes after the beta-cell ablation in the RNAseq seems missing. This list could be interesting and helpful for other researchers.

      We added it.

      (4) In Figure 1D, "modulated" genes are shown but were they all upregulated like those in Figure 1A? The modulation should be indicated more clearly (e.g. up- or down-regulated) in the figure. The authors can use different colours to illustrate that.

      Done.

      (5) Is Figure 2D showing the same data extracted from Figure 2B? Does Figure 2D add any information to the data?

      No, it does not add data. We added Figure 2D for a better visualisation of the increase at 10dpt.

      (6) In the y-axis of Figure 3E, it should be "mCherry".

      It already is. We checked all the axes again to be sure they are correct.

      (7) Line 219, "Figure 4E supplemental" instead of "Figure 4D supplemental"

      Done.

      (8) Line 266, "ablated juveniles" instead of "ablated larvae"

      Done. Thank you for noticing these mistakes.

      (9) In Figure 6A, many mCherry+ cells are hardly visible and there are some greyish white signals in the images that are supposed to show the mCherry channel only. What are those grey signals?

      There is no grey channel in the picture. We improved the overall quality of these pictures and now show close-ups to improve the figure.

      (10) In Figure 6D and 6E, CaNCA overexpression had a significant effect on the glycemia. But did the overexpression affect the beta cell formation or regeneration?

      We showed that CaNCA overexpression did not affect beta cell formation in the absence of regeneration in the larvae (Figure 3E). Moreover, it does not affect the glycemia of the fish in non-regenerative conditions (Author response image 5). As for regenerative conditions, CaN overexpression decreased regeneration in the larvae (Figure 3E).

      Author response image 5.

      Glycemia of Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4) fish, overexpressing CaNCA, compared to control fish, in non-regenerative conditions.

      (11) The role of calcineurin seems transient (e.g. Figure 2B and 4E) and does not play a significant role in long term. It would be interesting to see if long-term/repeated treatments of calcineurin inhibitors and overexpression/knockout of important members of calcineurin signaling would affect the pool of progenitors in long term.

      We were also interested in the long-term consequences of CaN overexpression. Our overexpression tool Tg(UAS:CaNCA) allows us to address this question, as CaN is permanently overexpressed. We assessed the structure of the ducts and the number of beta cells in transgenic larvae and did not see any defects of the ducts, whether in a regenerative context or not. On the other hand, we showed in this manuscript that the effect of CaN is specific to regenerative conditions. As a consequence, it is not likely that repeated treatments long after the ablation would continue to affect beta cell formation and the progenitor pool.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      The study could also valuably explore what kinds of genes experienced what forms of expression evolution. A brief description of GO terms frequently represented in genes which showed strong patterns of expression evolution might be suggestive of which selective pressures led to the changes in expression in the C. bursa-pastoris lineage, and to what extent they related to adaptation to polyploidization (e.g. cell-cycle regulators), compensating for the initial pollen and seed inviability or adapting to selfing (endosperm- or pollen-specific genes), or adaptation to abiotic conditions.

      We did not include a gene ontology (GO) analysis in the first place as we did not have a clear expectation of which GO terms would be enriched in the genes that are differentially expressed between resynthesized and natural allotetraploids. Even if we only consider adaptive changes, the modifications could occur in various aspects, such as stabilizing meiosis, adapting to the new cell size, reducing hybrid incompatibility and adapting to self-fertilization. Each of these modifications involves numerous biological processes and molecular functions. As we could make post-hoc stories for too many GO terms, extrapolating at this stage would have limited value and could be misleading.

      Nonetheless, we are not the only study that compared newly resynthesized and established allopolyploids. GO terms that were repeatedly revealed by this type of exploratory analysis may give a hint for future studies. For this reason, now we have reported the results of a simple GO analysis.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      The majority of concerns from reviewers and the reviewing editor are in regards to the presentation of the manuscript; that the framing of the manuscript does not help the general reader understand how this work advances our knowledge of allopolyploid evolution in the broad sense. The manuscript may be challenging to read for those who aren't familiar with the study system or the genetic basis of polyploidy/gene expression regulation. Further, it is difficult to understand from the introduction how this work is novel compared to the recently published work from Duan et al and compared to other systems. Because eLife is a journal that caters to a broad readership, re-writing the introduction to bring home the novelty for the reader will be key.

      Additionally, the writing is quite technical and contains many short-hands and acronyms that can be difficult to keep straight. Revising the full text for clarity (and additionally not using acronyms) would help highlight the findings for a larger audience.

      Reviewer #1 (Recommendations For The Authors):

      Most of my suggestions on this interesting and well-written study are minor changes to clarify the writing and the statistical approaches.

      The use of abbreviations throughout for both transcriptional phenomena and lines is logical because of word limits, but for me as a reader, it really added to the cognitive burden. Even though writing out "homoeolog expression bias" or "hybridization-first" every time would add length, I would find it easier to follow and suspect others would too.

      Thank you for this suggestion. Indeed, using fewer uncommon acronyms or short-hands should increase the readability of the text for a broader audience. Now in most places, we refer to “Sd/Sh” and “Cbp” as “resynthesized allotetraploids” and “natural allotetraploids”, respectively. We have also replaced most occurrences of the acronyms for transcriptional phenomena (ELD, HEB and TRE) with full phrases, unless there are extra attributes before them (such as “Cg-/Co-ELD” and “relic/Cbp-specific ELD”).

      It would be helpful to include complete sample sizes to either a slightly modified Figure 1 or the beginning of the methods, just to reduce mental arithmetic ("Each of the five groups was represented by six "lines", and each line had six individuals" so there were 180 total plants, of which 167 were phenotyped - presumably the other 13 died? - and 30 were sequenced).

      The number 167 only applied to floral morphological traits (“Floral morphological traits were measured for all five groups on 167 plants…”), and the exact total sample size differed for other traits. Now the total sample sizes of the other traits have also been added to the beginning of the second paragraph of the methods.

      For this study, 180 seedlings were transplanted from Petri dishes to soil, but 8 seedlings died right after transplanting, seemingly because of mechanical damage and insufficient moistening. Later phenotyping (2020.02-2020.05) was also disrupted by the COVID-19 pandemic, and some individuals were not measured as we missed the right life stages. Specifically, 5 individuals were missing for floral morphological traits (sepal width, sepal length, petal width, petal length, pistil width, pistil length, and stamen length), 30 for pollen traits, 1 for stem length, and 2 for flowering time. As for seed traits, we only measured individuals with more than ten fruits, so apart from the reasons mentioned above, individuals that were self-incompatible and had insufficient hand-pollination were also excluded. We spotted another mistake during the revision: two individuals with floral morphological measurements had no positional information (tray ID). These measurements were likely mis-sampled or mislabeled, and were therefore excluded from analysis. We assumed that most of these missing values resulted from random technical mistakes and were not directly related to the measured traits.

      In general, the methods did a thorough job of describing the genomics approaches but could have used more detail for the plant growth (were plants randomized in the growth chamber, can you rule out block/position effects) and basic statistics (what statistical software was used to perform which tests comparing groups in each section, after the categories were identified).

      When describing the methods, mention whether the plants; this should be straightforward as a linear model with position as a covariate.

      Data used in the present study and a previously published work (Duan et al., 2023) were different subsets of a single experiment. For this reason, we spent fewer words describing shared methods in this manuscript but tried to summarize the methods that were essential for understanding the current paper. But as you have pointed out, we did miss many important details that should have been kept. Now we have added some description and a table (Supplementary file 1) in the “Plant material” section to explain the randomization, and added more information about the software used for performing statistical tests in the “Phenotyping” section.

      Although we did not mention it in the present manuscript, we used a randomized block design for the experiment (Author response image 1).

      Author response image 1.

      Plant positions inside the growth chamber. Plants used in the present study and in Duan et al. (2023) were different subsets of a single experiment. The entire experiment had eight plant groups: the five plant groups used in the present study (diploid C. orientalis (Co2), diploid C. grandiflora (Cg2), “whole-genome-duplication-first” (Sd) and “hybridization-first” (Sh) resynthesized allotetraploids, and natural allotetraploid C. bursa-pastoris (Cbp)), as well as three plant groups that were only used in Duan et al. (2023): tetraploid C. orientalis (Co4), tetraploid C. grandiflora (Cg4) and diploid hybrids (F). Each of the eight plant groups had six lines and each line was represented by six plants, resulting in 288 plants (8 groups x 6 lines x 6 individuals = 288 plants). The 288 plants were grown in 36 trays placed on six shelves inside the same growth chamber. Each tray had exactly one plant from each of the eight groups, and the positions of the eight plants within each tray (A-H) were randomized with the random.shuffle() method in Python (Supplementary file 1). The positions of the 36 trays inside the growth room (1-36) were also random, and the positions of all trays were shuffled once more 28 days after germination (randomized with RAND() and sorting in Microsoft Excel). (a) Plant distribution; (b) An example of one tray; (c) A view inside the growth chamber, showing the six benches.
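
      The within-tray randomization described in the caption above can be sketched as follows. This is only an illustrative reconstruction, not the script actually used: the seed values, the function names and the use of Python for the tray-level shuffle (which the caption says was done with RAND() in Excel) are assumptions, and the assignment of individual lines and plants to trays is not modelled.

```python
import random

# Group labels follow the caption above; seeds and function names are
# assumptions added here for reproducibility and illustration.
GROUPS = ["Co2", "Cg2", "Sd", "Sh", "Cbp", "Co4", "Cg4", "F"]
POSITIONS = list("ABCDEFGH")  # eight positions within a tray
N_TRAYS = 36                  # 288 plants / 8 plants per tray


def randomize_within_trays(seed=0):
    """Return {tray_id: {position: group}} with one plant per group in every tray."""
    rng = random.Random(seed)
    layout = {}
    for tray_id in range(1, N_TRAYS + 1):
        order = GROUPS[:]   # copy, so every tray starts from the same list
        rng.shuffle(order)  # same algorithm as random.shuffle(), but seeded
        layout[tray_id] = dict(zip(POSITIONS, order))
    return layout


def randomize_tray_slots(seed=1):
    """Return {tray_id: chamber_slot}, a random permutation of the 36 slots."""
    rng = random.Random(seed)
    slots = list(range(1, N_TRAYS + 1))
    rng.shuffle(slots)
    return dict(zip(range(1, N_TRAYS + 1), slots))


layout = randomize_within_trays()
slots = randomize_tray_slots()
n_plants = sum(len(tray) for tray in layout.values())
print(n_plants)  # 288
```

      Because every tray is a complete block containing all eight groups, any tray-level effect is shared equally by the groups being compared.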

      With the randomized block design and one round of shuffling, positional effects are very unlikely to bias the comparison among the five plant groups. The main risk of not adding positions to the statistical model is increasing error variance and decreasing the statistical power for detecting a group effect. As we had already observed significant among-group variation in all phenotypic traits (p-value <2.2e-16 for group effect in most tests), further increasing statistical power is not our primary concern. In addition, during the experiment we did not notice obvious differences in plant growth related to positions. Although we could have added more variables to account for potential positional effects (tray ID, shelf ID, positions in a tray etc.), adding variables with little effect may reduce statistical power due to the loss of degrees of freedom.

      Due to one round of random shuffling, positions cannot be easily added as a single continuous variable. Now we have redone all the statistical tests on phenotypic traits and included tray ID as a categorical factor (Figure 2-Source Data 1). In general, the results were similar to the models without tray ID. The F-values of the group effect were only slightly changed, and p-values were almost unchanged in most cases (still < 2.2e-16). The tray effect (df=35) was not significant in most tests and was only significant for petal length (p-value=0.0111), sepal length (p-value=0.0242) and the number of seeds in ten fruits (p-value=0.0367). As expected, positions (tray ID) had limited effect on phenotypic traits.
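
      The 35 degrees of freedom reported for the tray effect follow from treating the 36 trays as levels of a categorical factor. A minimal sketch of the bookkeeping for an additive model (trait ~ group + tray) is given below; the function name and the observation count of 160 are hypothetical, and the calculation assumes that every group and every tray is represented in the data.

```python
def additive_model_df(n_obs, n_groups, n_trays):
    """Degrees of freedom for an additive model with an intercept,
    a group factor and a tray factor (no interaction term)."""
    df_group = n_groups - 1  # 5 plant groups -> 4
    df_tray = n_trays - 1    # 36 trays -> 35
    df_residual = n_obs - 1 - df_group - df_tray
    return {"group": df_group, "tray": df_tray, "residual": df_residual}


# n_obs=160 is a hypothetical sample size used only for illustration.
df = additive_model_df(n_obs=160, n_groups=5, n_trays=36)
print(df["tray"])      # 35
print(df["residual"])  # 120
```

      Each additional categorical factor removes degrees of freedom from the residual term, which is the cost the response above refers to when it mentions adding variables with little effect.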

      Figure 2 - I assume the numbers at the top indicate sample sizes but perhaps add this to the figure caption.

      Statistical power depends on both the total sample size and the sample size of each group, especially the group with the fewest observations. We lost a different number of measurements for each phenotypic trait, and for pollen traits the loss was notable, so we chose to show sample sizes above each group to increase transparency. Since we had five different sets of sample sizes (for floral morphological traits, stem length, days to flowering, pollen traits and seed traits, respectively), it would be cumbersome to introduce all 25 numbers in the figure caption and could be hard for readers to match the sample sizes with the results. For this reason, we would like to keep the sample sizes in the figure, and we have now modified the legend to clarify that the numbers above groups are sample sizes.

      ’The trend has been observed in a wide range of organisms, including ...’ - perhaps group Brassica and Raphanobrassica into one clause in the sentence, since separating them out undermines the diversity somewhat.

      Indeed, it is very strange to put “cotton” between two representatives from Brassicaceae. Now the sentence is changed to “… including Brassica (Wu et al., 2018; Li et al., 2020; Wei et al., 2021) and Raphanobrassica (Ye et al., 2016), cotton (Yoo et al., 2013)…”

      The diagrams under the graph in Figure 4B are particularly helpful for understanding the expression patterns under consideration! I appreciated them a lot!

      Thank you for the comment. We also feel the direction of expression level dominance is convoluted and hard to remember, so we adopted the convention of showing the directions with diagrams.

      Reviewer #2 (Recommendations For The Authors):

      The science is very interesting and thorough, so my comments are mostly meant to improve the clarity of the manuscript text:

      • I found it challenging to remember the acronyms for the different gene expression phenomena and had to consistently cross-reference different parts of the manuscript to remind myself. I think using the full phrase once or twice at the start of a paragraph to remind readers what the acronym stands for could improve readability.

      Thank you for this reasonable suggestion. Now we have replaced most occurrences of the acronyms with the full phrases.

      • There are some technical terms, such as "homoeologous synapsis" and "disomic inheritance", which I think are under-defined in the current text.

      Indeed, these terms were not well defined before being used in the manuscript. Now we have added a brief explanation for each term.

      • "Under the joint action of these forces, allopolyploid subgenomes are further coordinated and degenerated, and subgenomes are often biasedly fractionated" This sentence has some unclear terminology. Does "coordinated" mean co-adapted, co-inherited, or something else? Is "biasedly fractionated" referring to biased inheritance or evolution of one of the parental subgenomes?

      We apologize for not using accurate terms. With “coordinated” we meant to emphasize that the evolution of both homoeologs depends on selection on the total expression of both homoeologs, and on both relative and absolute dosages, which may have shifted away from their optima after allopolyploidization. “Co-evolved” or “co-adapted” might be a better word.

      But the term “biased fractionation” has been commonly used to refer to the phenomenon in which genes from one subgenome of polyploids are preferentially retained during diploidization (Woodhouse et al., 2014; Wendel, 2015). Instead of inventing a new term, we prefer to keep the same term for consistency, so readers can link our findings with the numerous studies in this field. Now the sentence is changed to “Under the joint action of these forces, allopolyploid subgenomes are further co-adapted and degenerated, and subgenomes are often biasedly retained, termed biased fractionation”.

      • There are a series of paragraphs in the results, starting with "Resynthesized allotetraploids and the natural Cbp had distinct floral morphologies", which consistently reference Figure 1 where they should be referencing Figure 2.

      Thank you for spotting this mistake! Now the numbers have been corrected.

      • ‘The number of pollen grains per flower decreased in natural Cbp’ this wording implies it's the effect of some experimental treatment on Cbp, rather than just measured natural variation.

      Yes, it is not scientifically precise to say this in the Results section, especially when describing details of the results. We meant that, assuming resynthesized allopolyploids are a good approximation of the initial state of natural allotetraploid C. bursa-pastoris, our results indicate that the number of pollen grains has decreased in natural C. bursa-pastoris. But this is an implication rather than an observation, so the sentence is better rewritten as “Natural allotetraploids had fewer pollen grains per flower.”

      • ‘The percentage of genes showing complete ELD was altogether limited but doubled between resynthesized allotetraploid groups and natural allotetraploids’ for clarity, I would suggest revising this to something like "doubled in natural allotetraploids relative to resynthesized allotetraploids

      Thank you for the suggestion. The sentence has been revised as suggested.

      • I'm not sure I understand what the difference is between expression-level dominance and homeolog expression bias. It seems to me like the former falls under the umbrella of the latter.

      Expression-level dominance and homeolog expression bias are easily confused, but they are conceptually independent. One gene could have expression-level dominance without any homeolog expression bias, or strong homeolog expression bias without any expression-level dominance. The concepts were well explained in Grover et al., (2012) with nice figures.

      Expression level dominance compares the total expression level of both homoeologs in allopolyploids with the expression of the same gene in parental species, and judges whether the total expression level in allopolyploids is only similar to one of the parental species. The contributions from different homoeologs are not distinguished.

      Homoeolog expression bias, by contrast, compares the relative expression levels of the two homoeologs in allopolyploids, with no implication for the total expression of both homoeologs.

      Let the expression level of one gene in parental species X and Y be e(X) and e(Y), respectively. And let the expression level of x homoeolog (from species X) and y homoeolog (from species Y) in allopolyploids be e(x) and e(y), respectively.

      Then a (complete) expression level dominance toward species X means: e(x)+e(y)=e(X) and e(x)+e(y)≠e(Y);

      A homoeolog expression bias toward species X, by contrast, means: e(x) > e(y), or e(x)/e(y) > e(X)/e(Y), depending on the definition used in a given study.
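      The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the manuscript: the function names, the example values, and the relative tolerance `rel_tol` (standing in for the "=" and "≠" of the definitions, which in practice would be decided by a statistical test) are all hypothetical.

```python
from math import isclose


def expression_level_dominance(e_x, e_y, e_X, e_Y, rel_tol=0.1):
    """Return 'X', 'Y' or None for (complete) expression-level dominance.

    The total homoeolog expression e(x) + e(y) is compared with each
    parental level. rel_tol is an assumed stand-in for a statistical test.
    """
    total = e_x + e_y
    like_X = isclose(total, e_X, rel_tol=rel_tol)
    like_Y = isclose(total, e_Y, rel_tol=rel_tol)
    if like_X and not like_Y:
        return "X"
    if like_Y and not like_X:
        return "Y"
    return None


def homoeolog_expression_bias(e_x, e_y, e_X=None, e_Y=None):
    """Return 'X', 'Y' or None for homoeolog expression bias.

    Without parental values: compare e(x) with e(y) directly.
    With parental values: compare e(x)/e(y) with e(X)/e(Y), written as a
    cross-product so that a zero expression level does not divide by zero.
    """
    if e_X is None or e_Y is None:
        lhs, rhs = e_x, e_y
    else:
        lhs, rhs = e_x * e_Y, e_X * e_y
    if lhs > rhs:
        return "X"
    if lhs < rhs:
        return "Y"
    return None


# Hypothetical gene: total expression matches parent X, yet the y homoeolog
# contributes most of it, so the two measures point in opposite directions.
eld = expression_level_dominance(e_x=30, e_y=70, e_X=100, e_Y=50)
heb = homoeolog_expression_bias(e_x=30, e_y=70)
print(eld, heb)
```

      In this made-up case the total expression resembles parent X while the y homoeolog is the more highly expressed copy, which is the kind of situation in which the two concepts diverge.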

      Both expression-level dominance and homeolog expression bias have been widely studied in allopolyploids (Combes et al., 2013; Li et al., 2014; Yoo et al., 2014; Hu & Wendel, 2019). As the two phenomena could be in opposite directions, and may be caused by different mechanisms, we think adopting the definitions in Grover et al., (2012) and distinguishing the two concepts would facilitate communication.

      • Is it possible to split up the results in Figure 7 to show which of the two homeologs was lost (i.e. orientalis vs. grandiflora)? Or at least clarify in the legend that these scenarios are pooled together in the figure?

      Using acronyms without explanation may have made the figure titles hard to understand, but in the original Figure 7 the losses of the two homoeologs were shown separately. Figure 7a,c showed the loss of the C. orientalis-homoeolog (“co-expession loss”), and Figure 7b,d showed the loss of the C. grandiflora-homoeolog (“cg-expession loss”). The legends have now been modified to explain the figure.

      • The paragraph starting with "The extant diploid species" is too long, should probably be split into two paragraphs and edited for clarity.

      The whole paragraph was used to explain why the resynthesized allotetraploids could be a realistic approximation of the early stage of C. bursa-pastoris with two arguments:

      1) The further divergence between C. grandiflora and C. orientalis after the formation of C. bursa-pastoris should be small compared to the total divergence between the two parental species; 2) The mating systems of the real parental populations were most likely the same as today. The two arguments have now been separated into two paragraphs, and the second paragraph has been shortened.

      • On the other hand, the number of seeds per fruit" implies this is evidence for an alternative hypothesis, when I think it's really just more support for the same idea.

      “On the other hand” was used to contrast the reduced number of pollen grains and the increased number of seeds in natural allotetraploids. As both changes are typical of the selfing syndrome, the two indeed support the same idea. We replaced “On the other hand” with “Moreover”.

      • ‘has become self-compatible before the formation" "has become" should be "became".

      The tense of the word has been changed.

      • If natural C. bursa-pastoris indeed originated from the hybridization between C. grandiflora-like outcrossing plants and C. orientalis-like self-fertilizing plants, the selfing syndrome in C. bursa-pastoris does not reflect the instant dominance effect of the C. orientalis alleles, but evolved afterward.’ This sentence should be closer to the end of the paragraph, after the main morphological results are summarized.

      Thank you for the suggestion. The paragraph is indeed more coherent after moving the conclusion sentence.

      References

      Combes, M.C., Dereeper, A., Severac, D., Bertrand, B. & Lashermes, P. (2013) Contribution of subgenomes to the transcriptome and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures. New Phytologist, 200, 251–260.

      Grover, C.E., Gallagher, J.P., Szadkowski, E.P., Yoo, M.J., Flagel, L.E. & Wendel, J.F. (2012) Homoeolog expression bias and expression level dominance in allopolyploids. New Phytologist, 196, 966–971.

      Hu, G. & Wendel, J.F. (2019) Cis – trans controls and regulatory novelty accompanying allopolyploidization. New Phytologist, 221, 1691–1700.

      Li, A., Liu, D., Wu, J., Zhao, X., Hao, M., Geng, S., et al. (2014) mRNA and Small RNA Transcriptomes Reveal Insights into Dynamic Homoeolog Regulation of Allopolyploid Heterosis in Nascent Hexaploid Wheat. The Plant Cell, 26, 1878–1900.

      Wendel, J.F. (2015) The wondrous cycles of polyploidy in plants. American Journal of Botany, 102, 1753–1756.

      Woodhouse, M.R., Cheng, F., Pires, J.C., Lisch, D., Freeling, M. & Wang, X. (2014) Origin, inheritance, and gene regulatory consequences of genome dominance in polyploids. Proceedings of the National Academy of Sciences of the United States of America, 111, 5283–5288.

      Yoo, M.J., Liu, X., Pires, J.C., Soltis, P.S. & Soltis, D.E. (2014) Nonadditive Gene Expression in Polyploids. Annual Review of Genetics, 48, 485–517. https://doi.org/10.1146/annurev-genet-120213-092159

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful comments. The main issue raised by the reviewers was that because E6AP depletion reduced checkpoint signaling via MASTL upregulation, this pathway is likely to be involved also in DNA damage checkpoint activation, in addition to checkpoint recovery. Hence, the proposed “timer”-like model is not fully supported. However, it is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      We have also addressed other concerns raised by the reviewers, as explained in the point-to-point responses below. With the addition of new modifications and data, we believe the revised manuscript is complete and conclusive.

      Reviewer #1 (Public Review):

      In principle a very interesting story, in which the authors attempt to shed light on an intriguing and poorly understood phenomenon; the link between damage repair and cell cycle re-entry once a cell has suffered from DNA damage. The issue is highly relevant to our understanding of how genome stability is maintained or compromised when our genome is damaged. The authors present the intriguing conclusion that this is based on a timer, implying that the outcome of a damaging insult is somewhat of a lottery; if a cell can fix the damage within the allocated time provided by the "timer" it will maintain stability, if not then stability is compromised. If this conclusion can be supported by solid data, the paper would make a very important contribution to the field.

      However, the story in its present form suffers from a number of major gaps that will need to be addressed before we can conclude that MASTL is the "timer" that is proposed here. The primary concern being that altered MASTL regulation seems to be doing much more than simply acting as a timer in control of recovery after DNA damage. There is data presented to suggest that MASTL directly controls checkpoint activation, which is very different from acting as a timer. The authors conclude on page 8 "E6AP promoted DNA damage checkpoint signaling by counteracting MASTL", but in the abstract the conclusion is "E6AP depletion promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner". These 2 conclusions are definitely not in alignment. Do E6AP/MASTL control checkpoint signaling or do they control recovery, which is it?

      Also, there is data presented that suggest that MASTL does more than just controlling mitotic entry after DNA damage, while the conclusions of the paper are entirely based on the assumption that MASTL merely acts as a driver of mitotic entry, with E6AP in control of its levels. This issue will need to be resolved.

      We thank the reviewer for his/her insightful comments. The main issue raised by the reviewers was that because E6AP depletion reduced checkpoint signaling via MASTL upregulation, this pathway is likely to be involved also in DNA damage checkpoint activation, in addition to checkpoint recovery. Hence, the proposed “timer”-like model is not fully supported. However, it is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      As suggested by the reviewer, we have rephrased the statement in the abstract to “E6AP depletion reduced DNA damage signaling, and promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner”.

      As a mitotic kinase, MASTL promotes mitotic entry and progression. This is well in line with our findings that DNA damage-induced MASTL upregulation promotes cell cycle re-entry into mitosis. MASTL upregulation could also inhibit DNA damage signaling. This kind of inhibitory feedback modulation of DNA damage signaling by mitotic kinases (e.g., PLK1, CDK) has been implicated in previous studies (reviewed in Cell & Bioscience volume 3, Article number: 20 (2013)). In the revised manuscript, we have included more discussion of this aspect of checkpoint regulation.

      Finally, the authors have shown some very compelling data on the phosphorylation of E6AP by ATM/ATR, and its role in the DNA damage response. But the time resolution of these effects in relation to arrest and recovery have not been addressed.

      Detailed time point information is now added in the figure legends for E6AP phosphorylation data. We were able to observe this event during early stages (e.g., 1 hr, or 2-4 hr) of the DNA damage response, prior to significant MASTL protein accumulation.

      Reviewer #2 (Public Review):

      This is an interesting study from Aimin Peng's laboratory that builds on previous work by the PI implicating Greatwall Kinase (the mammalian gene is called MASTL) in checkpoint recovery.

      The main claims of this study are:

      1) Greatwall stability is regulated by the E6-AP ubiquitin ligase and this is inhibited following DNA damage in an ATM dependent manner.

      2) Greatwall directly interacts with E6-AP and this interaction is suppressed by ATM dependent phosphorylation of E6-AP on S218

      3) E6-AP mediates Greatwall stability directly via ubiquitylation

      4) E6-AP knock out cells show reduced ATM/ATR activation and quicker checkpoint recovery following ETO and HU treatment

      5) Greatwall mediated checkpoint recovery via increased phosphorylation of Cdk substrates

      In my opinion, there are several interesting findings presented here but the overall model for a role of the E6-AP -Greatwall axis is not fully supported by the current data and will require further work. Moreover, there are a number of technical issues making it difficult to assess and interpret the presented data.

      Major points:

      1) The notion that Greatwall is indeed required for checkpoint recovery hinges on two experiments shown in Figures 5A and B where Greatwall depletion blocks the accumulation of HeLa cells in mitosis following recovery from ETO treatment and in G2/M following release from HU. An alternative possibility to the direct involvement of Greatwall in checkpoint recovery could be that Greatwall in HeLa cells is required for S-phase progression (as for example Charrasse et al. suggested). A simple control would be to monitor the accumulation of mitotic cells by microscopy or FACS following Greatwall depletion without any further checkpoint activation.

      We thank the reviewer for his/her insightful comments.

      Charrasse et al. showed that ENSA knockout prolonged, but did not stop, S-phase progression. In our experiments, MASTL (partial) knockdown did not significantly impact HeLa cell proliferation in the absence of DNA damage (Fig. 5, supplemental 1A). The reported role of MASTL in checkpoint recovery was consistently seen in response to various drugs, including etoposide, which typically induces G2 arrest. Thus, we do not believe a prolonged S-phase accounts for the checkpoint recovery phenotype.

      2) The changes in protein levels of Greatwall and the effects of E6-AP on Greatwall stability are rather subtle and depend mostly on a qualitative assessment of western blots. Where quantifications have been made (Figures 2D and 4F) the loading control and the starting conditions for Greatwall (0 timepoints in the right panel) appear saturated making precise quantification impossible. I would argue that the authors should at least quantify the immuno-blots that led them to conclude on changes in Greatwall levels and make sure that the exposure times used are in the dynamic range of the camera (or film). A more precise experiment would be to use the exogenously expressed CFP-Greatwall that is described in Figure 6 and measure the acute changes in protein levels using quantitative fluorescence microscopy in live cells. This is, in my opinion, a lot more trustworthy than quantitative immuno-blots.

      I also note here that most experiments linking Greatwall levels to E6-AP were done using siRNA, while the E6-AP ko cells would be a more reliable background for these experiments, especially with reconstituted controls.

      DNA damage-induced MASTL upregulation was observed in various cell lines and after different treatments. To further strengthen this point, as suggested by the reviewer, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). Quantification of immunoblots for MASTL upregulation was also added in Fig. 1, supplemental 1E. The effects of E6AP depletion were consistently shown for both siRNA and stable KO.

      3) This study has no data linking the effects of Greatwall to its canonical target PP2A:B55. The model shown in Figure 9 is therefore highly speculative. The possibility that Greatwall could act independently of PP2A:B55 should at least be considered in the discussion given the lack of experimental evidence.

      The role of MASTL in promoting cell cycle progression via suppressing PP2A/B55 has been well established. As suggested by the reviewer, we have included discussions to acknowledge that “The role of MASTL upregulation in promoting checkpoint recovery and cell cycle progression can be attributed to inhibition of PP2A/B55, although the potential involvement of additional mechanisms is not excluded”.

      4) The major effect of E6-AP depletion on the checkpoint appears to be a striking reduction in ATM/ATR activation, suggesting that this ubiquitin ligase is involved in checkpoint activation rather than recovery. It is not clear if this phenotype is dependent on Greatwall. If so it would be hard to reconcile with the default model that E6-AP acts via the destabilisation of Greatwall. In the permanent absence of E6-AP, increased Greatwall levels should inactivate B55:PP2A. How would this lead to a decrease in ATM/ATR activation? This is unlikely, and indeed Figure 5E shows that the reduction of MASTL in parallel to E6-AP does not result in elevated levels of ATR/ATM activation. Conversely, the S215A E6-AP mutant does have a strong rescue impact on ATR/ATM (Figure 8D).

      We do not propose that PP2A/B55 directly dephosphorylates ATM/ATR-mediated phosphorylation. In fact, PP2A/B55 dephosphorylates and inactivates mitotic kinases and substrates which can feedback inhibit DNA damage checkpoint signaling (as previously shown for PLK1 and CDK). We included a discussion of this point in the revised manuscript.

      The point regarding checkpoint activation vs recovery is addressed below (point 5).

      5) In summary, I do not think that the presented experiments clearly dissect the involvement of E6-AP and Greatwall in checkpoint activation and recovery. E6-AP depletion has a strong effect on checkpoint activation while Greatwall depletion is likely to have various checkpoint-independent effects on cell cycle progression.

      It is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Li et al. describe the contribution of the ATM-E6AP-MASTL pathway in recovery from DNA damage. Different types of DNA damage trigger an increase in protein levels of mitotic kinase MASTL, also called Greatwall, caused by increased protein stability. The authors identify E3 ligase E6AP to regulate MASTL protein levels. Depletion or knockout of E6AP increases MASTL protein levels, whereas overexpression of E6AP leads to lower MASTL levels. E6AP and MASTL were suggested to interact in conditions without damage and this interaction is abrogated after DNA damage. E6AP was shown to be phosphorylated upon DNA damage on Ser218 and a phosphomimicking mutant does not interact with MASTL. Stabilization of MASTL was hypothesized to be important for recovery of the cell cycle/mitosis after DNA damage.

      The identification of this novel pathway involving ATM and E6AP in MASTL regulation in the DNA damage response is interesting. However, it is surprising that the authors state that not a lot is known about DNA damage recovery while Greatwall and MASTL have been described to be involved in DNA damage (checkpoint) recovery. In addition, PP2A, a phosphatase downstream of MASTL, is a known mediator of checkpoint recovery, in addition to other proteins like Plk1 and Claspin. Although some of the publications regarding these known mediators of DNA damage recovery are mentioned, the discussion regarding their relationship to the data in this manuscript is very limited.

      We thank the reviewer for his/her insightful comments. As suggested, the previously reported role of PLK1 and other cell cycle kinases in DNA damage checkpoint recovery is discussed in more detail in the revised manuscript. As for PP2A/B55, we do not think it promotes checkpoint recovery, e.g., by dephosphorylating ATM/ATR or their substrates. Instead, this phosphatase dephosphorylates cell cycle kinases or their substrates, such as CDK1 or PLK1.

      The regulation of MASTL stability by E6AP is novel, although the data regarding this regulation and the interaction are not entirely convincing. In addition, several experiments presented in this paper suggest that E6AP is (additionally) involved in checkpoint signalling/activation, whereas the activation of the G2 DNA damage checkpoint was described to be independent of MASTL. Has E6AP multiple functions in the DNA damage response or is ATM-E6AP-MASTL regulation not as straightforward as presented here?

      Altogether, in my opinion, not all conclusions of the manuscript are fully supported by the data.

      We showed that E6AP depletion reduced checkpoint signaling via MASTL upregulation, so this pathway is likely to be involved also in DNA damage checkpoint activation, in addition to checkpoint recovery. However, it is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      In principle a very interesting story, that attempts to shed light on an intriguing and poorly understood phenomenon; the link between damage repair and cell cycle re-entry once a cell has suffered from DNA damage. The issue is highly relevant to our understanding of how genome stability is maintained or compromised when our genome is damaged. The authors present the intriguing conclusion that this is based on a timer, implying that the outcome of a damaging insult is somewhat of a lottery; if a cell can fix the damage within the allocated time it will maintain stability, if not then stability is compromised. However, the story in its present form suffers from a number of major gaps that will need to be addressed

      Major point:

      My primary concern regarding the main conclusion is that altered MASTL regulation seems to be doing much more than simply promoting more rapid recovery after DNA damage. This concern comes from the following gaps that I noted whilst reading the paper:

      • Knock out of E6AP, is leading to a dramatic inhibition of ATM/ATR activation after damage (Fig.5C,D,E), this is (partially) rescued by co-depletion of MASTL (Fig5E). The authors will have to show that the primary effect of altered MASTL regulation is improved recovery, rather than reduced checkpoint activation. In other words, is initial checkpoint activation in cells that have lost E6AP normal, or do these cells fail to mount a proper checkpoint response? If the latter is true, that could completely alter the take home-message of this paper, because it could mean that E6AP/MASTL do not act as a "timer", but as a "tuner" to set checkpoint strength at the start of the DNA damage response. The authors themselves conclude on page 8 "E6AP promoted DNA damage checkpoint signaling by counteracting MASTL", but in the abstract the conclusion is "E6AP depletion promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner". These 2 conclusions are definitely not in alignment, do E6AP/MASTL control checkpoint signaling or do they control recovery?

      The expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript. We have also clarified the statement indicated by the reviewer.

      • MASTL KD has a rather unexpected effect on cell cycle progression after HU synchronization (Fig.5B). It seems that the MASTL KD cells fail to exit from the HU-imposed G1/S arrest, an effect that is not rescued in the E6AP knock-outs. Inversely, E6AP knock-outs seem to more readily exit from the HU-imposed arrest, an effect that is completely lost after knock-down of MASTL. How do the authors interpret these results? Their conclusions are entirely based on a role for MASTL as a driver of mitotic entry, with E6AP in control of its levels, but this experiment suggests that MASTL and E6AP are controlling very different aspects of cell cycle control in their system.

      As the reviewer pointed out, our data on checkpoint signaling and cell cycle progression suggested that MASTL upregulation could also inhibit DNA damage signaling, in addition to promoting cell cycle progression. This kind of inhibitory feedback modulation of DNA damage signaling by mitotic kinases (e.g., PLK1, CDK) has been implicated in previous studies (reviewed in Cell & Bioscience volume 3, Article number: 20 (2013)). In the revised manuscript, we have included a discussion of this aspect of checkpoint regulation.

      • It is not possible to evaluate the validity of the conclusions that are based on Figure 6. We need to know how long the cells were treated with HU to disrupt the interaction between E6AP and MASTL. Is the timing of this in the range of the timing of MASTL increase after damage? A time course experiment is required here.

      • The data obtained on E6AP-S218 phosphorylation and with the S218A mutant during damage and recovery look very promising. But again, the release from HU is confusing me as to what to conclude from them. Also, the authors should show how S218A expression affects MASTL levels (before and after damage). Also, a time course of ATM/ATR activation is required to decide if initial or late ATM/ATR signaling is affected.

      Detailed time point information is now added in the figure legends for E6AP phosphorylation and E6AP-MASTL dissociation data. We were able to observe these events during early stages (e.g., 1 hr, or 2-4 hr) of the DNA damage response, prior to significant MASTL protein accumulation.

      • The conclusion that "and was not likely to be caused by the completion of DNA repair, as judged by the phosphorylation of replication protein A" (page 5) is based on western blots that represent the average across the entire population. It is possible that MASTL expression is still low in the cells that have not completed repair, while its increase on blots comes from a subset of cells where repair is complete. The authors should perform immunofluorescence so that expression levels of MASTL can be directly compared to levels of phospho-RPA in individual cells. In fact, the manuscript could benefit a lot from a more in-depth single-cell (microscopy)-based analysis of the relations over time between ATM/ATR activation, E6AP phosphorylation, MASTL stabilization versus the checkpoint arrest and subsequent recovery.

      Time point analyses were provided for DNA damage-induced RPA phosphorylation and ATM/ATR substrate phosphorylation (Fig. 1). These data showed MASTL accumulation in the presence of active DNA damage checkpoint signaling. To further strengthen this point, as suggested by the reviewer, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). IF data showed MASTL upregulation in correlation with ATM/ATR activation.

      Minor points:

      It's not "ionized radiation", but "ionizing radiation" (page 5)

      We have made the correction as pointed out by the reviewer.

      Expression levels of MASTL should be quantified over time after DNA damage. In some of the experiments the increase seems to plateau relatively quick (HU treatment, fig 1B, 1-2 hours), while in others the levels continue to increase over longer periods (HU treatment, fig 1D, 6 hours). This is relevant to the timer function of MASTL that is proposed here.

      The kinetics of MASTL upregulation is generally consistent among all cell lines. As suggested, quantification of immunoblots is provided (Fig. 1, supplemental 1E); additional quantification of IF signals is also included (Fig. 2, supplemental 1 A-C).

      The experiment executed with caffeine (page 5) should be repeated with more selective/potent ATM/ATR inhibitors that are commercially available.

      A specific ATM inhibitor was used to confirm the caffeine result in Fig. 7, supplemental 1B&C.

      "a potential binding pattern" (page 6) should be "a potential binding partner"

      We have made the correction as pointed out by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      1) All western blots require size markers. The FACS plots shown do not have any axis labels.

      We have included size markers for blots, at the first appearance of each antibody. Labels have been added to the FACS plots.

      2) The quantification of mitotic cells does not indicate how many cells were counted and if this was done by eye or using software.

      The missing experimental information is included in the figure legends, as suggested.

      3) The western blots demonstrating ubiquitylation of Greatwall (Figure 4D) are of very poor quality and impossible to interpret.

      The ubiquitination of MASTL did not show clear ladders, possibly due to its relative protein size.

      Reviewer #3 (Recommendations For The Authors):

      Specific suggestions to improve the manuscript:

      1) Include literature regarding known mediators of DNA damage checkpoint recovery, including MASTL/Greatwall and PP2A, in the manuscript and discuss the observations from this manuscript in relationship with the literature.

      Related literature is included in the discussion.

      2) The increase in MASTL protein levels upon DNA damage are not always clear, for example Fig. 1A. The same for MASTL stability after DNA damage, such as in Fig. 2C. Quantification of the westerns would help demonstrating a significant effect.

      As suggested by the reviewer, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). Quantification of immunoblots for MASTL upregulation was also added in Fig. 1, supplemental 1E.

      3) The E6AP-MASTL in vitro interaction studies shown in Fig. 3 raise doubts. First, beads only are used as negative control, whereas MBP only-beads are a better control. The westerns in top panels of 3B (MASTL), 3C (GST-MASTL) and 3D (MASTL) should be improved. In addition, in Fig. 3C, different GST-MASTL fragments are used in an MBP-E6AP pull down, but the GST-MASTL input does not show any specific band to demonstrate that these fragments are correct. The same for the GFP-E6AP fragments in Fig. 3 Suppl. 1C. The input does not show any proteins, there is no N fragment present in the IP and the size of the fragment N3 in the IP GFP does not seem correct.

      Altogether, it makes me doubt that the interaction between E6AP and MASTL is direct. Better data with appropriate controls should show whether the interaction is direct or mediated via another protein.

      Purified proteins used for the in vitro interaction had significant degradation, causing many bands in the input. We included a lighter exposure of the input here as Author response image 1. MBP alone did not bind MASTL, as both M and C segments of MASTL were MBP-tagged, and did not pull down MASTL. We agree with the reviewer that our direct interaction data showed rather weak MASTL/E6AP interaction, suggesting the interaction is dynamic or possibly mediated by additional binding proteins. We have included the following statement in the revised manuscript: “Taken together, our data characterized MASTL-E6AP association which was likely mediated via direct protein interaction, although the potential involvement of additional binding partners was not excluded.”

      Author response image 1.

      4) Fig. 4B. Overexpression of HA-E6AP results in a decrease in MASTL protein levels. Can this effect be rescued by treatment with proteasome inhibitor MG132?

      As expected, MG132 stabilized MASTL, with or without E6AP overexpression. We have added this new data in Fig. 4, supplemental 1B.

      5) Fig. 4G. MASTL interacts with HA-ubiquitin in WT, but not E6AP KO cells. These cells are treated with MG132, so if E6AP really ubiquitinates MASTL, I would expect MASTL to be polyubiquitinated. However, the "interaction signal" does not show polyubiquitination. In fact, this band actually runs lower than MASTL in input samples, which even could be an artifact. Please explain.

      The ubiquitination of MASTL did not show clear ladders, possibly due to its relative protein size. As the reviewer noted, the band position in the HA-Ub IP lanes seemed slightly shifted, compared to the input. We have noticed in many experiments that bands in the IP lanes did not perfectly align with the input lanes.

      6) The DNA damage recovery experiments measuring mitotic index after washing off etoposide (Fig. 5A and Fig. 8A): What are the time points taken? And importantly, why are there no error bars on these intermediate time points, but only on the 4 hour time point?

      As suggested, time point information and additional error bars are included.

      7) Fig. 5E. According to the authors, depletion of MASTL rescues the effect of KO of E6AP. However, no increase in pATM/ATR substrate signal is seen upon etoposide treatment in these samples so I am not convinced this experiment demonstrates a rescue.

      The rescue was evident, especially for many high molecular weight bands which were more effectively detected by this phospho-specific antibody.

      8) Fig. 5C and 8D strongly suggest that E6AP is involved in checkpoint activation. How do these data relate to DNA damage recovery? Is the recovery in E6AP KO cells faster as a consequence of reduced checkpoint signaling or is the recovery effect really specific by stabilization of MASTL? These data should be explained, also taken the data from Wong et al. (Sci. Rep. 2016) into account, that demonstrate that G2 checkpoint activation is independent of MASTL.

      The expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP-depletion) in suppressing DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      9) The model presented in Fig. 9 is puzzling because there does not seem to be a difference between phosphorylation of E6AP and the interaction with MASTL on early versus late times after DNA damage. And this exactly is what is missing in the manuscript: A more detailed evaluation of the timing of E6APSer218 phosphorylation and the E6AP-MASTL interaction in response to DNA damage.

      More clarification is given to explain this model in the figure legend of Fig. 9.

      Time point analyses were provided for DNA damage-induced RPA phosphorylation and ATM/ATR substrate phosphorylation (Fig. 1). These data showed MASTL accumulation in the presence of active DNA damage checkpoint signaling. To further strengthen this point, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). IF data showed MASTL upregulation in correlation with ATM/ATR activation. Time point information was also added for Ser-218 phosphorylation and MASTL-ENSA dissociation which were observed in early stages of the DNA damage response (1 hr, or 2-4 hr).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The authors sought to examine the associations between child age, reports of parent-child relationship quality, and neural activity patterns while children (and also their parents) watched a movie clip. Major methodological strengths include the sample of 3-8 year-old children in China (rare in fMRI research for both age range and non-Western samples), use of a movie clip previously demonstrated to capture theory of mind constructs at the neural level, measurement of caregiver-child neural synchrony, and assessment of neural maturity. Results provide important new information about parent-child neural synchronization during this movie and associations with reports of parent-child relationship quality. The work is a notable advance in understanding the link between the caregiving context and the neural construction of theory of mind networks in the developing brain.

      We are grateful for the reviewer’s generous and thoughtful summary of our work. We particularly appreciate the recognition of the methodological strengths—including the rare developmental sample, culturally diverse context, and use of naturalistic, theory of mind-relevant stimuli—as well as the importance of integrating neural synchrony and relational variables. The reviewer’s comments affirm the core motivation behind this study: to advance our understanding of how the caregiving environment shapes the neurodevelopment of social cognition in early childhood. We have taken all specific suggestions seriously and hope the revised manuscript more clearly communicates these contributions.

      We appreciate that the authors wanted to show support for a mediational mechanism. However, we suggest that the authors drop the structural equation modeling because the data are cross-sectional so mediation is not appropriate. Other issues include the weak justification of including the parent-child neural synchronization as part of parenting.... it could just as easily be a mechanism of change or driven by the child rather than a component of parenting behavior. The paper would be strengthened by looking at associations between selected variables of interest that are MOST relevant to the imaging task in a regression type of model. Furthermore, the authors need to be more explicit about corrections for multiple comparisons throughout the manuscript; some of the associations are fairly weak so claims may need to be tempered if they don't survive correction.

      We thank the reviewer for the feedback on the use of SEM in our study. We recognize the limitations of using SEM to infer mediation with cross-sectional data and acknowledge that longitudinal designs are better suited for such analyses. However, our goal was not to establish causality but to explore potential pathways linking parenting, personal traits, and Theory of Mind (ToM) behavior to social cognition outcomes. SEM allowed us to simultaneously examine the relationships among these latent constructs, providing a cohesive framework for understanding the interplay of these factors. That said, we understand the reviewer’s concern and have revised the manuscript to de-emphasize causal interpretations of the SEM findings.

      We thank the reviewer for raising the issue of correction for multiple comparisons. We confirm that all correlation analyses reported in the manuscript have been corrected for multiple comparisons using the False Discovery Rate (FDR) procedure. In the revised manuscript, we now explicitly indicate FDR correction for all relevant p-values to ensure clarity and transparency. Where this information was previously missing, we have corrected the oversight and clearly labeled the results as FDR-corrected or uncorrected where appropriate. Additionally, we have carefully reviewed our interpretation of all reported associations. For any results that were close to the significance threshold, we have tempered our claims and now describe them as marginally significant associations to avoid overstating our findings.

      The corresponding changes have been made in the Discussion section of the revised manuscript.
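
      For readers who want to see what an FDR correction does to a set of p-values, the following is a minimal sketch, assuming the Benjamini-Hochberg step-up variant (the response above says only "FDR procedure", so the specific variant is an assumption). The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the manuscript.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg FDR adjustment (assumed variant, see lead-in).

    Returns (adjusted_p, reject), both in the original order of p_values.
    """
    m = len(p_values)
    if m == 0:
        return [], []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    reject = [q <= alpha for q in adjusted]
    return adjusted, reject


# Hypothetical p-values from four correlation tests.
adjusted, reject = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      An association would be labeled "FDR-corrected significant" only if its adjusted value stays at or below the chosen alpha; raw p-values just under 0.05 can fail this when many tests are run together.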

      Reverse correlation analysis is sensible given what prior developmental fMRI studies have done. But reverse correlation analysis may be more prone to overfitting and noise, and lacks sensitivity to multivariate patterns. Might inter-subject correlation be useful for *within* the child group? This would minimize noise and allow for non-linear patterns to emerge.

      We appreciate the reviewer’s thoughtful suggestion regarding potential limitations of reverse correlation analysis. While we agree that inter-subject correlation (ISC) within the child group may be useful in other contexts, our primary goal in using reverse correlation was not to identify temporally distributed or multivariate response patterns, but rather to isolate specific events within the naturalistic stimulus that reliably evoke Theory of Mind (ToM) and Social Pain-related responses in adults—who possess more stable and mature neural signatures. These adult-derived events serve as anchors for subsequent developmental comparisons and provide a principled way to define timepoints of interest that are behaviorally and theoretically meaningful.

      Using reverse correlation in adults allows us to identify canonical ToM and Social Pain events in a data-driven yet hypothesis-informed manner. We then examine how children’s neural responses to these same events vary with age, neural maturity, and dyadic synchrony. This approach is consistent with prior work in developmental social neuroscience (e.g., Richardson et al., 2018) and offers a valid framework for identifying interpretable social-cognitive events in naturalistic stimuli.

      We have now clarified the rationale for using adult-based reverse correlation in the revised manuscript and explicitly stated its advantages for identifying targeted ToM and Social Pain content in the stimulus.

      The corresponding changes have been made on page 17 of the revised manuscript.

      “We employed reverse correlation analysis in adults to identify discrete events within the movie that elicited reliable neural responses across participants in ToM and SPM networks.

      The events of adults were chosen for this analysis due to the relative stability and maturity of their social brain responses, allowing for robust detection of canonical ToM and social pain-related moments. These events, once identified, served as stimulus-locked timepoints for subsequent analyses in the child cohort. This approach enables us to examine how children's responses to well-characterized, socially meaningful events vary with age and parent-child dyadic dynamics.”
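
      The event-detection logic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes each adult's network timecourse is z-scored, that a one-sample t statistic is computed across adults at every timepoint, and that an "event" is a run of at least two consecutive timepoints exceeding a caller-supplied critical value. The function names, the `t_crit` and `min_len` parameters, and the toy data are all hypothetical.

```python
import math
from statistics import mean, stdev


def zscore(series):
    """Standardize one subject's timecourse (assumes it is not constant)."""
    mu, sd = mean(series), stdev(series)
    return [(x - mu) / sd for x in series]


def one_sample_t(values):
    """t statistic for the null hypothesis that the mean is zero."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return math.copysign(math.inf, mu) if mu != 0 else 0.0
    return mu / (sd / math.sqrt(len(values)))


def reverse_correlation_events(timecourses, t_crit, min_len=2):
    """Find runs of timepoints with reliably elevated group response.

    timecourses: one list per adult, all of equal length.
    t_crit: one-tailed critical t value chosen by the caller (assumption).
    Returns (start, end) index pairs with end exclusive.
    """
    z = [zscore(tc) for tc in timecourses]
    n_timepoints = len(z[0])
    significant = [
        one_sample_t([subject[t] for subject in z]) > t_crit
        for t in range(n_timepoints)
    ]
    events, start = [], None
    # A trailing False closes any run that reaches the last timepoint.
    for t, flag in enumerate(significant + [False]):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_len:
                events.append((start, t))
            start = None
    return events


# Toy data: four adults who all respond at timepoints 3 and 4.
adults = [
    [0, 1, 0, 5, 6, 1, 0, 1],
    [1, 0, 1, 6, 5, 0, 1, 0],
    [0, 0, 1, 5, 5, 1, 1, 0],
    [1, 1, 0, 6, 6, 0, 0, 1],
]
events = reverse_correlation_events(adults, t_crit=2.353)
```

      The resulting index ranges would then serve as the stimulus-locked windows in which children's responses are examined.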

      No learning effects or temporal lagged effects are tested in the current study, so the results do not support the authors' conclusions that the data speak to Bandura's social learning theory. The authors do mention theories of biobehavioral synchrony in the introduction but do not discuss this framework in the discussion (which is most directly relevant to the data). The data can also speak to other neurodevelopmental theories of development (e.g., neuroconstructivist approaches), but the authors do not discuss them. The manuscript would benefit from significantly revising the framework to focus more on biobehavioral synchrony data and other neurodevelopmental approaches given the prior work done in this area rather than a social psychology framework that is not directly evaluated.

      We appreciate the reviewer’s thoughtful and constructive feedback. We agree that the current study does not directly test mechanisms central to Bandura’s social learning theory, such as observational learning over time or behavioral modeling. In light of this, we have significantly revised the theoretical framing of the manuscript to focus more directly on the biobehavioral synchrony framework, which more accurately reflects the dyadic neural measures employed in this study and is better supported by our findings.

      Specifically, we have expanded the Discussion to contextualize our findings in terms of biobehavioral synchrony, emphasizing how inter-subject neural synchronization may reflect coordinated parent-child engagement and emotional attunement. We have also incorporated insights from neurodevelopmental and neuroconstructivist models, acknowledging that social cognitive development is shaped by dynamic interactions between neural maturation and environmental input over time.

      Although we continue to briefly reference Bandura’s theory to situate our findings within broader social-cognitive frameworks, we have clearly delineated the boundaries of what our data can support and have tempered previous claims. These changes are intended to better align our conceptual framing with the empirical evidence and relevant theoretical models.

      The corresponding changes have been made on pages 11-12 of the revised manuscript.

      “Insights into mechanisms of Neuroconstructivist Perspectives and Bandura’s social learning theory

      Our findings align with a neuroconstructivist perspective, which conceptualizes brain development as an emergent outcome of reciprocal interactions between biological constraints and context-specific environmental inputs. Rather than presuming fixed traits or linear maturation, this perspective highlights how neural circuits adaptively organize in response to experience, gradually supporting increasingly complex cognitive functions [49]. It offers a particularly powerful lens for understanding how early caregiving environments modulate the maturation of social brain networks.

      Building on this framework, the present study reveals that moment-to-moment neural synchrony between parent and child, especially during emotionally salient or socially meaningful moments, is associated with enhanced Theory of Mind performance and reduced dyadic conflict. This suggests that beyond age-dependent neural maturation, dyadic neural coupling may serve as a relational signal, embedding real-time interpersonal dynamics into the child’s developing neural architecture [1]. Our data demonstrate that children’s brains are not merely passively maturing, but are also shaped by the relational texture of their lived experiences, particularly interactions characterized by emotional engagement and joint attention. Importantly, this adds a new dimension to neuroconstructivist theory: it is not simply whether the environment shapes development, but how the quality of interpersonal input dynamically calibrates neural specialization. Interpersonal variation leaves detectable signatures in the brain, and our use of neural synchrony as a dyadic metric illustrates one potential pathway through which caregiving relationships exert formative influence on the developing social brain.

      The contribution of this work lies not in reiterating the interplay of nature and nurture, but in specifying the mechanistic role of interpersonal neural alignment as a real-time, context-sensitive developmental input. Neural synchrony between parent and child may function as a form of relationally grounded, temporally structured experience that tunes the child’s social brain toward contextually relevant signals. Unlike generalized enrichment, this form of neural alignment is inherently personalized and contingent—features that may be especially potent in shaping social cognitive circuits during early childhood.

      Although our study was not designed to directly examine learning mechanisms such as imitation or reinforcement, the findings can be viewed as broadly consistent with social learning theory. Bandura's theory posits that human behavior is shaped by observational learning and modeling from others in one's environment [2-4]. According to Bandura, children acquire social cognitive skills by observing and interacting with their parents and other significant figures in their environment. This dynamic interplay shapes their ability to understand and predict the behavior of others, which is crucial for the development of ToM and other social competencies.”

      References

      (1) Hughes, C. et al. Origins of individual differences in theory of mind: From nature to nurture? Child development 76, 356-370 (2005).

      (2) Koole, S. L. & Tschacher, W. Synchrony in psychotherapy: A review and an integrative framework for the therapeutic alliance. Frontiers in psychology 7, 862 (2016).

      (3) Liu, D., Wellman, H. M., Tardif, T. & Sabbagh, M. A. Theory of mind development in Chinese children: a meta-analysis of false-belief understanding across cultures and languages. Developmental Psychology 44, 523 (2008).

      (4) Frith, U. & Frith, C. D. Development and neurophysiology of mentalizing. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 358, 459-473 (2003).

      The significance and impact of the findings would be clearer if the authors more clearly situated the findings in the context of (a) other movie and theory of mind fMRI task data during development; and (b) existing data on parent-child neural synchrony (often uses fNIRS or EEG). What principles of brain and social cognition development do these data speak to? What is new?

      We thank the reviewer for this thoughtful comment. In response, we have revised the Discussion section to more clearly situate our findings within two key literatures: (a) fMRI studies examining Theory of Mind using movie-based and traditional task paradigms across development, and (b) research on parent-child neural synchrony. We now articulate more explicitly how our findings advance current understanding of the neural architecture of social cognition in childhood, and how they contribute new insights into the relational processes shaping brain function. These revisions clarify the conceptual and empirical novelty of our study, particularly in its use of naturalistic fMRI, simultaneous child-parent dyads, and integration of neural maturity with interpersonal synchrony.

      The corresponding changes have been made on page 12 of the revised manuscript.

      “Our findings contribute to and extend prior research using fMRI paradigms to investigate ToM development in children. Previous work has shown that these networks become increasingly specialized and differentiated throughout childhood [1-3]. The current study extends these findings by demonstrating that the development of social brain networks is a gradual process that continues beyond the preschool years and is related to children's chronological age. This finding is consistent with behavioral research indicating that ToM and social abilities continue to develop and refine throughout middle childhood and adolescence [4]. Importantly, we move beyond prior work by combining reverse correlation with naturalistic stimuli to isolate discrete, behaviorally meaningful events (e.g., mental state attribution, social rejection) and relate children’s brain responses to adult patterns and social outcomes. This event-level analysis in a dyadic context offers greater ecological and interpretive precision than traditional block or condition-based designs. Our study provides novel evidence for the neural underpinnings of this protracted development, suggesting that the functional maturation of social brain networks may support the continued acquisition and refinement of social cognitive skills.

      In parallel, our study builds on and extends a growing body of work on parent-child neural synchrony, much of which has relied on fNIRS or EEG hyperscanning to demonstrate interpersonal alignment during communication, shared attention, or cooperative tasks [5-7]. While these modalities offer fine temporal resolution, they are limited in spatial precision and typically focus on surface-level cortical regions such as the prefrontal cortex. By contrast, our naturalistic fMRI approach enables the examination of deep and distributed brain networks—specifically those supporting social cognition—within child-parent dyads during emotionally and cognitively rich scenarios. Intriguingly, we found that neural synchronization during movie viewing was higher in child-mother dyads compared to child-stranger dyads.”

      References

      (1) Jacoby, N., Bruneau, E., Koster-Hale, J. & Saxe, R. Localizing Pain Matrix and Theory of Mind networks with both verbal and non-verbal stimuli. Neuroimage 126, 39-48 (2016).

      Astington, J. W. & Jenkins, J. M. A longitudinal study of the relation between language and theory-of-mind development. Developmental Psychology 35, 1311 (1999).

      (2) Carter, E. J. & Pelphrey, K. A. School-aged children exhibit domain-specific responses to biological motion. Social Neuroscience 1, 396-411 (2006).

      (3) Cantlon, J. F., Pinel, P., Dehaene, S. & Pelphrey, K. A. Cortical representations of symbols, objects, and faces are pruned back during early childhood. Cerebral Cortex 21, 191-199 (2011).

      (4) Im-Bolter, N., Agostino, A. & Owens-Jaffray, K. Theory of mind in middle childhood and early adolescence: Different from before? Journal of experimental child psychology 149, 98-115 (2016).

      (5) Deng, X. et al. Parental involvement affects parent-adolescents brain-to-brain synchrony when experiencing different emotions together: an EEG-based hyperscanning study. Behavioural brain research 458, 114734 (2024).

      (6) Miller, J. G. et al. Inter-brain synchrony in mother-child dyads during cooperation: an fNIRS hyperscanning study. Neuropsychologia 124, 117-124 (2019).

      (7) Nguyen, T., Bánki, A., Markova, G. & Hoehl, S. Studying parent-child interaction with hyperscanning. Progress in brain research 254, 1-24 (2020).

      There is little discussion about the study limitations, considerations about the generalizability of the findings, and important next steps and future directions. What can the data tell us, and what can it NOT tell us?

      We appreciate the reviewer’s recommendation to elaborate on the study’s limitations, generalizability, and future directions. In response, we have added a dedicated section to the Discussion that critically addresses these considerations. We acknowledge the cross-sectional nature of the study, the modest sample size, and the use of a single stimulus context as key limitations. We also clarify the inferences that can be drawn from our data and what remains speculative. Finally, we outline specific future research directions.

      The corresponding changes have been made on pages 13-14 of the revised manuscript.

      “While leveraging a naturalistic movie-viewing paradigm allowed us to study children's spontaneous neural responses during a semi-structured yet engaging task, dedicated experimental designs are still needed to make stronger inferences about the cognitive processes involved. Additionally, our region-of-interest approach precluded examination of whole-brain networks; future work could explore developmental changes in broader functional circuits. The cross-sectional nature of our study is a further limitation, as it cannot definitively establish the causal directions of the observed relationships. Longitudinal designs tracking children's brain development and social cognitive abilities over time would help clarify whether early parenting impacts later neural maturation and behavioral outcomes, or vice versa. Our sample was restricted to mother-child dyads, leaving open questions about potential differences in father-child relationships and gender effects on parenting neurobiology. Larger and more diverse samples would enhance the generalizability of the findings.

      Several future directions emerge from this research. First, combining naturalistic neuroimaging with structured cognitive tasks could elucidate the specific mental processes underlying children's neural responses during movie viewing. Examining how these processes relate to real-world social behavior would further bridge neurocognitive function and ecological validity. Longitudinal studies beginning in infancy could chart the developmental trajectories of parent-child neural synchrony and their impact on long-term social outcomes. Such work could also explore sensitive periods when parenting may be most influential on social brain maturation. Finally, expanding this multimodal approach to clinical populations like autism could yield insights into atypical social cognitive development and inform tailored intervention strategies targeting parent-child relationships and neural plasticity.”

      To evaluate associations between child neural activity patterns during the movie AND parent-child synchronization patterns AND other variables such as parent-child communication and theory of mind behavior, it seems like a robust approach could be to examine whether similar synchronization patterns are associated with similar scores on different variables. Would allow for non-linear and multivariate associations.

      We greatly appreciate the reviewer’s thoughtful suggestion regarding the use of similarity-based or multivariate analyses to assess whether dyads with similar neural synchronization profiles also exhibit similar scores on behavioral or relational variables. We agree that this type of analysis—such as representational similarity analysis (RSA) or inter-subject pattern similarity—offers a powerful framework for capturing non-linear and multivariate associations, and could provide deeper insights into shared neurobehavioral patterns across participants. However, the analytic logic of similarity-based approaches typically requires the availability of comparable measures across individuals or dyads (e.g., child A and child B must both have measures of brain activity, behavior, and environment). In the present study, our focus was on the child as the behavioral and developmental target, and we did not collect parallel behavioral or cognitive variables from the parent side (e.g., adult Theory of Mind ability, emotional traits, parenting style questionnaires beyond dyadic reports). As a result, it was not feasible to construct pairwise similarity matrices across dyads that include both neural synchrony and matched behavioral dimensions from both individuals.

      Instead, our study was designed to examine how child-level outcomes (e.g., Theory of Mind performance, social functioning) are associated with (a) the child’s neural responses to specific social events, and (b) the degree of neural synchronization with their mother, as a marker of relational engagement. The analytical emphasis, therefore, remained on within-child variation, modulated by the quality of the parent-child interaction.
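
      To make the reviewer's suggestion concrete, the following is a minimal sketch of a similarity-based (Mantel-style) test, which asks whether dyads that are close in one measure are also close in another. It was not run in this study; the choice of absolute difference as the distance, the permutation scheme, and all names (`pearson_r`, `mantel_test`) and example values are assumptions made for illustration only.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(a, b, n_perm=1000, seed=0):
    """Correlate pairwise distances in a with pairwise distances in b.

    a, b: one score per dyad (e.g. synchrony and a behavioral score).
    Returns (observed_r, permutation_p). The p-value is one-sided and
    obtained by shuffling dyad labels of b.
    """
    n = len(a)
    pairs = list(combinations(range(n), 2))
    dist_a = [abs(a[i] - a[j]) for i, j in pairs]
    dist_b = [abs(b[i] - b[j]) for i, j in pairs]
    observed = pearson_r(dist_a, dist_b)

    rng = random.Random(seed)
    labels = list(range(n))
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        permuted = [abs(b[labels[i]] - b[labels[j]]) for i, j in pairs]
        if pearson_r(dist_a, permuted) >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical scores for five dyads.
synchrony = [1.0, 2.0, 3.0, 4.0, 5.0]
behavior = [2.0, 4.0, 6.0, 8.0, 10.0]
r_obs, p_perm = mantel_test(synchrony, behavior, n_perm=200)
```

      Because the test operates on pairwise distances rather than raw scores, it can pick up non-linear correspondences, but it requires comparable measures for every dyad, which is the constraint noted in the response above.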

      Were there associations between parent-child neural synchronization and child age? What was the association between neural maturity and parent-child neural synchronization?

      We thank the reviewer for raising this important point regarding associations between parent-child neural synchronization (ISS), child age, and neural maturity.

      As reported in the original manuscript, we did not observe significant correlations between parent-child ISS and child age for either the Theory of Mind (ToM) or Social Pain Matrix (SPM) networks (all ps > 0.1). In an additional analysis, we also found no significant correlation between ISS and neural maturity (Author response image 1; r = 0.2503, p = 0.1533).

      These findings indicate that parent-child neural synchronization in this naturalistic viewing context is not simply explained by age-related maturation or children's neural maturity level. Instead, ISS may predominantly reflect real-time interpersonal engagement or relational dynamics rather than individual developmental trajectories or neural maturity.

      Author response image 1.

      Scatterplot showing the association between parent-child inter-subject synchronization (ISS) and neural maturity, averaged across the Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Each point represents one dyad. No significant correlation was observed between ISS and neural maturity (r = 0.2503, p = 0.1533), suggesting that interpersonal neural synchronization and individual neural maturation may reflect dissociable aspects of social brain development.
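
      The two quantities compared in this scatterplot can be illustrated with a short sketch. It assumes that dyadic synchrony is the Pearson correlation between the child's and the parent's network timecourses, and that the across-dyad association is an ordinary Pearson correlation whose t statistic is r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. These are common conventions, not details confirmed by the text above; the function names and toy timecourses are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dyad_synchrony(child_timecourse, partner_timecourse):
    """Synchrony of one dyad, assumed here to be a timecourse correlation."""
    return pearson_r(child_timecourse, partner_timecourse)


def r_to_t(r, n):
    """t statistic for a Pearson r computed over n dyads (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy timecourses: the child tracks the mother but not the stranger.
child = [1.0, 2.0, 3.0, 4.0]
mother = [2.0, 3.0, 4.0, 6.0]
stranger = [4.0, 1.0, 3.0, 2.0]
sync_mother = dyad_synchrony(child, mother)
sync_stranger = dyad_synchrony(child, stranger)
```

      The same `pearson_r` function applied across dyads, with one synchrony value and one neural maturity value per dyad, gives the kind of r reported in the caption; `r_to_t` converts it to a test statistic that can be looked up against a t distribution.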

      The rationale for splitting the ages into 3 groups is unclear and creates small groups that could be more prone to spurious associations. Why not look at age continuously?

      We thank the reviewer for raising this important point. We fully agree that analyzing age as a continuous variable is statistically more robust and minimizes concerns about spurious associations due to arbitrary groupings.

      To clarify, all primary statistical models—including correlational analyses—treated age as a continuous variable, and our core developmental inferences are based on these continuous-age findings.

      In addition to these analyses, we included age group comparisons as a supplementary approach, guided by both theoretical considerations and visual inspection of the data. Specifically, we aimed to explore whether functional differentiation between social brain networks (e.g., ToM and SPM) might begin to emerge non-linearly or earlier than expected, particularly in the youngest children. Such early neural divergence may not be well-captured by linear trends alone. The grouped analysis allowed us to illustrate that network differentiation was already observable in children under age 5, suggesting that certain aspects of social brain organization may emerge earlier than classically assumed.

      We have now clarified this rationale in the revised manuscript and emphasized that the group-based analysis was used solely to highlight developmental shifts that may not follow a linear pattern, and not for formal hypothesis testing.

      The corresponding changes have been made on page 9 of the revised manuscript.

      “While our primary analyses treated age as a continuous variable, we also performed exploratory group-based comparisons to probe for potential non-linear developmental shifts in social brain network organization. This approach revealed that the differentiation between ToM and SPM networks was already present in the youngest group (ages 3–4), suggesting that early neural specialization may begin prior to the age at which ToM behavior is reliably observed. These group-level observations provide complementary evidence to the continuous analyses and may inform future work examining sensitive periods or early markers of social brain development.”

      Tables would be improved if they were more professionally formatted (e.g., names of the variables rather than variable abbreviation codes).

      We appreciate the reviewer’s suggestion to improve the clarity and professionalism of our tables. In the revised manuscript, we have reformatted all tables to include full variable names rather than abbreviations or coded labels, and we ensured consistency in terminology across the manuscript text, tables, and figure legends. We have also added explanatory footnotes where needed to clarify any derived or composite measures. We hope these revisions improve the accessibility and readability of the results for a broader audience.

      Reviewer #2:

      Summary:

      This study investigates the impact of mother-child neural synchronization and the quality of parent-child relationships on the development of Theory of Mind (ToM) and social cognition. Utilizing a naturalistic fMRI movie-viewing paradigm, the authors analyzed inter-subject neural synchronization in mother-child dyads and explored the connections between neural maturity, parental caregiving, and social cognitive outcomes. The findings indicate age-related maturation in ToM and social pain networks, emphasizing the importance of dyadic interactions in shaping ToM performance and social skills, thereby enhancing our understanding of the environmental and intrinsic influences on social cognition.

      Strengths:

      This research addresses a significant question in developmental neuroscience, by linking social brain development with children's behaviors and parenting. It also uses a robust methodology by incorporating neural synchrony measures, naturalistic stimuli, and a substantial sample of mother-child dyads to enhance its ecological validity. Furthermore, the SEM approach provides a nuanced understanding of the developmental pathways associated with Theory of Mind (ToM).

      We appreciate the reviewer's positive evaluation and valuable comments. We have revised the manuscript thoroughly to address the concerns raised and provide a point-by-point response to each issue below. We believe the revised manuscript is significantly improved.

      Upon reviewing the introduction, I feel that the first goal - developmental changes of the social brain and its relation to age - seems somewhat distinct from the other two goals and the main research question of the manuscript. The authors might consider revising this section to enhance the overall coherence of the manuscript. Additionally, the introduction lacks a clear background and rationale for the importance of examining age-related changes in the social brain.

      We thank the reviewer for this thoughtful observation. In response, we have revised the Introduction to better integrate the developmental aspect of the social brain with the broader research aims. We now explicitly link age-related changes in social brain organization to the emergence of social cognitive abilities and highlight why early childhood (ages 3–8) represents a particularly formative period. This revision clarifies that our first aim—examining functional specialization and neural maturity in Theory of Mind (ToM) and Social Pain Matrix (SPM) networks—serves as a developmental foundation for understanding how dyadic influences, such as neural synchrony and caregiving quality, shape children’s social cognition.

      We have also improved the rationale for examining age-related change, drawing on key literature in developmental neuroscience to show how the early emergence and specialization of social brain networks provide a necessary context for interpreting interpersonal neural dynamics.

      The corresponding changes have been made on page 3 of the revised manuscript.

      “These findings suggest that the development of specialized brain regions for reasoning about others' mental states and physical sensations is a gradual process that continues throughout childhood.

      Understanding how these networks differentiate with age is essential not only for mapping typical brain development, but also for contextualizing the role of environmental influences. By establishing normative patterns of neural maturity and differentiation, we can better interpret how relational experiences—such as caregiver-child synchrony and parenting quality—modulate these trajectories. Thus, our first goal provides a developmental anchor that grounds our investigation of interpersonal and environmental contributions to social brain function.”

      The manuscript uses both "mother-child" and "parent-child" terminology. Does this imply that only mothers participated in the fMRI scans while fathers completed the questionnaires? If so, have the authors considered the potential impact of parental roles (father vs. mother)?

      We thank the reviewer for raising this important point regarding terminology and parental roles. To clarify, all participating caregivers in the current study were biological mothers, and all behavioral questionnaires were also completed by these same mothers. No fathers were included in this study. We have revised the manuscript throughout to consistently use the term “mother-child” when referring to the specific dyads in our sample.

      We also appreciate the opportunity to elaborate on the rationale for including only mothers. Prior research has shown that maternal and paternal influences on child development are not interchangeable, and that the neural correlates of caregiving behaviors differ between mothers and fathers. For example, studies have demonstrated distinct patterns of brain activation during social and emotional processing in mothers versus fathers (Abraham et al., 2014; Swain et al., 2014). Given these differences, we deliberately focused on mother-child dyads to maintain neurobiological consistency in our analysis and reduce variance associated with heterogeneous caregiving roles. We now clarify this rationale in the revised Methods and Discussion sections.

      The corresponding changes have been made on page 14 of the revised manuscript.

      “We chose to focus exclusively on mother-child dyads in this study based on prior evidence suggesting distinct neural and behavioral caregiving profiles between mothers and fathers [1-2], allowing us to maintain role consistency and reduce variability in dyadic interactions.

      Our sample was restricted to mother-child dyads, leaving open questions about potential differences in father-child relationships and gender effects on parenting neurobiology [1]. Larger and more diverse samples would enhance the generalizability of the findings.”

      Reference:

      (1) Swain, J. E. et al. Approaching the biology of human parental attachment: Brain imaging, oxytocin and coordinated assessments of mothers and fathers. Brain research 1580, 78-101 (2014).

      (2) Abraham, E. et al. Father's brain is sensitive to childcare experiences. Proceedings of the National Academy of Sciences 111, 9792-9797 (2014).

      There is inconsistent usage of the terms ISC and ISS in the text and figures, both of which appear to refer to synchronization derived from correlation analysis. It would be beneficial to maintain consistency throughout the manuscript.

      We thank the reviewer for highlighting the inconsistent use of “ISC” and “ISS” in the original manuscript. We agree that clarity and consistency in terminology are essential. In response, we have revised the manuscript to consistently use “ISS” (inter-subject synchronization) throughout the text, figures, tables, and legends.
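
      As the reviewer notes, both terms refer to synchronization derived from correlation analysis. A minimal sketch of that idea follows, assuming ISS is the Pearson correlation between two viewers' ROI-averaged time courses. The function name `pearson_r` and all time-course values are hypothetical illustrations, not the authors' pipeline or data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length time courses."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("time courses must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant time course has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ROI-averaged time courses (arbitrary units), NOT study data.
child = [0.1, 0.4, 0.9, 0.7, 0.2, -0.3, -0.6, -0.1]
mother = [0.0, 0.5, 0.8, 0.6, 0.3, -0.2, -0.7, -0.2]
stranger = [0.3, -0.2, 0.1, -0.4, 0.5, 0.0, 0.2, -0.1]

iss_mother = pearson_r(child, mother)      # child paired with own mother
iss_stranger = pearson_r(child, stranger)  # child paired with an unrelated adult
print(f"child-mother ISS:   {iss_mother:.2f}")
print(f"child-stranger ISS: {iss_stranger:.2f}")
```

      In this toy example the child-mother value exceeds the child-stranger value, which only illustrates the direction of the contrast described in the manuscript.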

      Of the 50 dyads, 16 were excluded due to data quality issues, which constitutes a significant proportion. It would be helpful to know whether these excluded dyads exhibited any distinctive characteristics. Providing information on demographic or behavioral differences between the excluded and included dyads, such as Theory of Mind (ToM) performance and age range, would enhance the assessment of the findings' generalizability.

      We thank the reviewer for this important observation. We agree that understanding the characteristics of excluded participants is essential for assessing the generalizability of the findings.

      In response, we conducted comparative analyses between included and excluded dyads (N = 34 included; N = 16 excluded) on key demographic and behavioral variables, including child age, gender, and Theory of Mind (ToM) performance. These analyses revealed no significant differences between groups on any of these measures (ps > 0.1), suggesting that data exclusion due to quality issues (e.g., excessive motion, incomplete scans) did not introduce systematic bias.

      We have now added this information to the Results and Methods sections of the manuscript.

      The corresponding changes have been made on pages 6 and 17 of the revised manuscript.

      “Of the 50 initial mother-child dyads recruited, 16 were excluded due to excessive head motion (n = 11), incomplete scan sessions (n = 3), or technical issues during data acquisition (n = 2). The final sample consisted of 34 dyads. To assess potential bias introduced by data exclusion, we compared included and excluded dyads on child age, gender, and Theory of Mind performance. No significant differences were found across these variables (all ps > 0.1), suggesting that the analytic sample was demographically representative of the full cohort.

      Comparison between included and excluded dyads revealed no significant differences in child age (t = 1.23, p = 0.24), ToM scores (t = -0.54, p = 0.59), or sex distribution (χ² < 0.01, p = 0.98), indicating that data exclusion did not bias the sample in a systematic way.”
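
      The kind of included-versus-excluded comparison reported above can be sketched as follows. This is an illustration under stated assumptions: the response does not say which t-test variant was used, so Welch's unequal-variance form is assumed here; the helper names `welch_t` and `chi2_2x2` are hypothetical; and every number in the example is invented, not study data. Only the group sizes (34 and 16) come from the text. The t-test p-value is omitted because the Python standard library has no t-distribution function.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (chi2, p), where p uses the exact 1-df survival function.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return chi2, p

# Hypothetical child ages in years (NOT study data).
included_age = [3.2, 4.1, 5.0, 5.8, 6.4, 7.1, 7.9]
excluded_age = [3.0, 3.9, 4.6, 5.5, 6.8]
t, df = welch_t(included_age, excluded_age)
print(f"age: t = {t:.2f}, df = {df:.1f}")

# Hypothetical sex counts: rows = included/excluded, columns = girls/boys.
chi2, p = chi2_2x2([[18, 16], [8, 8]])
print(f"sex: chi2 = {chi2:.3f}, p = {p:.2f}")
```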

      The article does not adhere to the standard practice of using a resting state as a baseline for subtracting from task synchronization. Is there a rationale for this approach? Not controlling for a baseline may lead to issues, such as whether resting state synchronization already differs between subjects with varying characteristics.

      We thank the reviewer for raising this important methodological point. We agree that controlling for baseline synchronization, such as using a resting-state scan as a comparison, can help disambiguate whether task-induced synchrony reflects genuine stimulus-driven coupling or baseline differences across individuals or dyads.

      In the present study, we focused on inter-subject synchronization (ISS) during naturalistic movie viewing, a task condition that has been widely used in previous developmental and social neuroscience research to assess shared neural engagement. We did not include a resting-state scan in the current protocol due to time constraints and the young age of our participants (ages 3–8), as longer scanning sessions often result in increased motion and reduced data quality in pediatric populations. Moreover, many prior studies using ISS in naturalistic paradigms have similarly focused on task-driven synchrony without subtracting a resting baseline (e.g., Hasson et al., 2004; Nguyen et al., 2020; Reindl et al., 2018).

      That said, we acknowledge that baseline neural synchrony across dyads may vary depending on individual or relational characteristics (e.g., temperament, arousal, attentional style), and this remains an important question for future research. In the revised Discussion, we now explicitly note the absence of a resting-state baseline as a limitation and highlight the need for future studies to examine how resting and task-based ISS may interact, particularly in the context of child-caregiver dyads.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “Another limitation of the current design is the lack of a resting-state baseline for inter-subject synchronization. While our focus was on synchronization during naturalistic social processing, we cannot determine whether individual differences in ISS reflect purely task-induced coupling or are partially shaped by trait-level synchrony present at rest. Including both resting and task conditions in future work would allow for stronger inferences about stimulus-specific versus baseline-driven synchronization, especially in relation to interpersonal factors such as relationship quality or social responsiveness.”

      The title of the manuscript suggests a direct influence of mother-child interactions on children's social brain and theory of mind. However, the use of structural equation modeling (SEM) may not fully establish causal relationships. It is possible that the development of children's social brain and ToM also enhances mother-child neural synchronization. The authors should address this alternative hypothesis of the potential bidirectional relationship in the discussion and exercise caution regarding terms that imply causality in the title and throughout the manuscript.

      We appreciate the reviewer’s careful attention to issues of causality in our manuscript. We agree that our cross-sectional design limits causal inference, and that the use of structural equation modeling (SEM) in this context does not allow for conclusions about directional or mechanistic pathways. In response, we have revised the Discussion to explicitly acknowledge these limitations, and now include an expanded section on the potential for bidirectional or co-constructed processes, consistent with neuroconstructivist frameworks.

      We have also tempered the interpretation of our SEM findings, avoiding causal language throughout the manuscript and clarifying that our analyses are exploratory and associational in nature. We hope that these changes provide a more cautious and developmentally grounded interpretation of the data.

      With regard to the title, we respectfully chose to retain the original wording, as we believe it captures the thematic focus and central research question of the paper—namely, the potential role of mother-child interaction in the development of children’s social brain and Theory of Mind. While we understand the reviewer’s concern, we note that the interpretation of this phrasing is contextualized within the manuscript, which now includes clear qualifications regarding the limits of causal inference. We have taken care to ensure that no claims of unidirectional causality are made in the body of the paper.

      The corresponding changes have been made on pages 11-12 of the revised manuscript.

      “Our findings align with a neuroconstructivist perspective, which conceptualizes brain development as an emergent outcome of reciprocal interactions between biological constraints and context-specific environmental inputs. Rather than presuming fixed traits or linear maturation, this perspective highlights how neural circuits adaptively organize in response to experience, gradually supporting increasingly complex cognitive functions [54]. It offers a particularly powerful lens for understanding how early caregiving environments modulate the maturation of social brain networks.

      Building on this framework, the present study reveals that moment-to-moment neural synchrony between parent and child, especially during emotionally salient or socially meaningful moments, is associated with enhanced Theory of Mind performance and reduced dyadic conflict. This suggests that beyond age-dependent neural maturation, dyadic neural coupling may serve as a relational signal, embedding real-time interpersonal dynamics into the child’s developing neural architecture. Our data demonstrate that children’s brains are not merely passively maturing, but are also shaped by the relational texture of their lived experiences—particularly interactions characterized by emotional engagement and joint attention. Importantly, this adds a new dimension to neuroconstructivist theory: it is not simply whether the environment shapes development, but how the quality of interpersonal input dynamically calibrates neural specialization. Interpersonal variation leaves detectable signatures in the brain, and our use of neural synchrony as a dyadic metric illustrates one potential pathway through which caregiving relationships exert formative influence on the developing social brain.

      The contribution of this work lies not in reiterating the interplay of nature and nurture, but in specifying the mechanistic role of interpersonal neural alignment as a real-time, context-sensitive developmental input. Neural synchrony between parent and child may function as a form of relationally grounded, temporally structured experience that tunes the child’s social brain toward contextually relevant signals. Unlike generalized enrichment, this form of neural alignment is inherently personalized and contingent—features that may be especially potent in shaping social cognitive circuits during early childhood.

      The cross-sectional nature of our study is a further limitation, as it cannot definitively establish the causal directions of the observed relationships. Longitudinal designs tracking children's brain development and social cognitive abilities over time would help clarify whether early parenting impacts later neural maturation and behavioral outcomes, or vice versa.”

      I would appreciate more details about the 14 Theory of Mind (ToM) tasks, which could be included in supplemental materials. The authors score them on a scale from 0 to 14 (each task 1 point); however, the tasks likely vary in difficulty and should carry different weights in the total score (for example, the test and the control questions should have different weights). Many studies have utilized the seven tasks according to Wellman and Liu (2004), categorizing them into "basic ToM" and "advanced ToM." Different components of ToM could influence the findings of the current study, which should be further examined by a more in-depth analysis.

      We thank the reviewer for raising this important point regarding the structure and scoring of the Theory of Mind (ToM) tasks. We will provide a detailed description of all 14 tasks in the Supplemental Materials, including their content, targeted mental state concepts (e.g., beliefs, desires, intentions), and design features (e.g., test/control items, task format).

      We fully agree that ToM tasks differ in complexity, and in principle, a weighted or component-based scoring approach (e.g., distinguishing basic and advanced ToM) could offer greater interpretive value. However, in our study design, tasks were administered in a fixed sequence from lower to higher difficulty, and testing was terminated if the child was unable to successfully complete three consecutive tasks. This approach is developmentally appropriate for younger children but results in non-random missingness for more advanced tasks—particularly among children at the lower end of the age range (3–4 years).

      Given this adaptive task structure, re-scoring using weighted or subscale-based approaches would introduce systematic bias, as children who struggled with early items were not administered more complex ones. As a result, a full breakdown by task type (e.g., basic vs. advanced ToM) would only reflect a restricted subsample and would not be comparable across the full cohort. For this reason, we retained the unit-weighted total ToM score as the most developmentally valid and comparable metric across participants.
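
      The discontinue rule described above, and the missingness it creates, can be sketched as follows. This is an illustration only: the function name `administer` and the response pattern are hypothetical, and the sketch assumes one point per passed task with testing stopped at the third consecutive failure.

```python
def administer(responses, stop_after=3):
    """Score a fixed-order battery with a discontinue rule.

    responses: booleans in administration order (True = task passed),
               i.e. what the child WOULD do on each task if given it.
    Returns (unit_weighted_total, n_tasks_administered).
    """
    score = 0
    consecutive_failures = 0
    administered = 0
    for passed in responses:
        administered += 1
        if passed:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == stop_after:
                break  # later (harder) tasks are never presented
    return score, administered

# Hypothetical child: fails tasks 5-7 in a row, so testing stops at task 7.
# The pass on task 8 is never observed because the task is never given.
would_pass = [True, True, False, True, False, False, False, True] + [False] * 6
print(administer(would_pass))  # (3, 7)
```

      Because tasks after the stopping point are never administered, any subscale built from the later, harder items would be defined only for children who reached them, which is the non-random missingness referred to above.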

      Reviewer #3:

      Summary:

      The article explores the role of mother-child interactions in the development of children's social cognition, focusing on Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Using a naturalistic fMRI paradigm involving movie viewing, the study examines relationships among children's neural development, mother-child neural synchronization, and interaction quality. The authors identified a developmental pattern in these networks, showing that they become more functionally distinct with age. Additionally, they found stronger neural synchronization between child-mother pairs compared to child-stranger pairs, with this synchronization and neural maturation of the networks associated with the mother-child relationship and parenting quality.

      Strengths:

      This is a well-written paper, and using dyadic fMRI and naturalistic stimuli enhances its ecological validity, providing valuable insights into the dynamic interplay between brain development and social interactions. However, I have some concerns regarding the analysis and interpretation of the findings. I have outlined these concerns below in the order they appear in the manuscript, which I hope will be helpful for the revision.

      We appreciate the reviewer’s thoughtful and constructive summary of the manuscript. The concerns raised regarding aspects of the analysis and interpretation have been carefully considered. Detailed point-by-point responses are provided below, along with descriptions of the corresponding revisions made to improve the clarity, precision, and interpretive caution of the manuscript.

      Given the importance of social cognition in this study, please cite a foundational empirical or review paper on social cognition to support its definition. The current first citation is primarily related to ASD research, which may not fully capture the broader context of social cognition development.

      We thank the reviewer for this helpful suggestion. We agree that a broader, foundational reference is more appropriate for introducing the concept of social cognition. In response, we have revised the Introduction to include a widely cited theoretical or review paper on social cognition to provide a more general developmental context.

      The corresponding changes have been made on page 3 of the revised manuscript.

      “Social cognition, defined as the ability to interpret and predict others' behavior based on their beliefs and intentions and to interact in complex social environments and relationships, is a crucial aspect of human development [1-2].”

      (1) Adolphs, R. The social brain: neural basis of social knowledge. Annual review of psychology 60, 693-716 (2009).

      (2) Frith, C. D. & Frith, U. Mechanisms of social cognition. Annual review of psychology 63, 287-313 (2012).

      It is standard practice to report the final sample size in the Abstract and Introduction, rather than the initial recruited sample, as high attrition rates are common in pediatric studies. For example, this study recruited 50 mother-child dyads, and only 34 remained after quality control. This information is crucial for interpreting the results and conclusions. I recommend reporting the final sample size in the abstract and introduction but specifying in the Methods that an additional 16 mother-child dyads were initially recruited or that 50 dyads were originally collected.

      We thank the reviewer for this helpful recommendation. In the original version of the manuscript, the Abstract and Introduction referenced the total number of dyads recruited (N = 50). In line with standard reporting practices and to ensure clarity regarding the analytic sample, we have now revised both the Abstract and Introduction to report the final sample size (N = 34). The full recruitment and exclusion details—including the number of dyads removed due to excessive motion or technical issues—are now clearly described in the Methods section.

      The corresponding changes have been made on pages 1 and 4 of the revised manuscript.

      In the "Neural maturity reflects the development of the social brain" section, the authors report the across-network correlation for adults, finding a negative correlation between ToM and SPM. However, the cross-network correlations for the three child groups are not reported. The statement that "the two networks were already functionally distinct in the youngest group of children we tested" is based solely on within-network positive correlations, which does not fully demonstrate functional distinctness. Including cross-network correlations for the child groups would strengthen this conclusion.

      We thank the reviewer for this insightful comment. We agree that within-network correlations alone do not fully establish functional distinctness, particularly in early development. To more directly test whether the ToM and SPM networks were already differentiated in children, we have now included the cross-network correlations between the two networks for each of the three age groups in the revised manuscript. These findings support and strengthen our original claim that the ToM and SPM networks are functionally dissociable even in early childhood, and we have revised the relevant Results sections accordingly to reflect this.

      The corresponding changes have been made on page 7 of the revised manuscript.

      “In children, each network also exhibited positive correlations within-network and negative correlations across networks (within-ToM correlation M(s.e.) = 0.31(0.04); within-SPM correlation M(s.e.) = 0.29(0.04); across-network M(s.e.) = −0.09(0.02)).

      In the Pre-junior group only (3-4-year-old children, n = 12), both ToM and SPM networks had positive within-network correlations and a negative across-network correlation (within-ToM correlation M(s.e.) = 0.29(0.06); within-SPM correlation M(s.e.) = 0.23(0.05); across-network M(s.e.) = −0.05(0.02)).”
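
      The within-network and across-network quantities reported above can be illustrated with a minimal sketch, assuming each is the mean pairwise Pearson correlation between ROI time courses (within one network, or between ROIs of different networks). The helper names and the toy time courses are hypothetical and deliberately exaggerated, so they do not reproduce the authors' procedure or the reported values.

```python
import math
from itertools import combinations, product

def pearson_r(x, y):
    """Pearson correlation between two equal-length time courses."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_within(rois):
    """Mean correlation over all distinct ROI pairs inside one network."""
    pairs = list(combinations(rois, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

def mean_across(rois_a, rois_b):
    """Mean correlation over all ROI pairs spanning two networks."""
    pairs = list(product(rois_a, rois_b))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

# Hypothetical ROI time courses for one child (NOT study data).
tom = [[1, 2, 3, 2, 1, 0], [1, 3, 4, 2, 1, 0], [0, 2, 3, 3, 1, 1]]
spm = [[3, 2, 0, 1, 2, 3], [4, 2, 1, 1, 3, 3], [3, 1, 1, 0, 2, 4]]

print(f"within-ToM:     {mean_within(tom):.2f}")
print(f"within-SPM:     {mean_within(spm):.2f}")
print(f"across-network: {mean_across(tom, spm):.2f}")
```

      Functional distinctness in this sense requires both signs together: positive within-network means and a negative across-network mean, which is why the reviewer asked for the across-network values in each child group.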

      The ROIs for the ToM and SPM networks are defined based on previous literature, applying the same ROIs across all age groups. While I understand this is a common approach, it's important to note that this assumption may not fully hold, as network architecture can evolve with age. The functional ROIs or components of a network might shift, with regions potentially joining or exiting a network or changing in size as children develop. For instance, Mark H. Johnson's interactive specialization theory suggests that network composition may adapt over developmental stages. Although the authors follow the approach of Richardson et al. (2018), it would be beneficial to discuss this limitation in the Discussion. An alternative approach would be to apply data-driven analysis to justify the selection of the ROIs for the two networks.

      We thank the reviewer for this thoughtful and theoretically grounded comment. In our study, we followed the approach of Richardson et al. (2018), using a priori ROIs defined from adult meta-analyses and ToM/SPM task studies. This approach facilitates comparison with prior work and provides anatomical consistency across participants. However, we fully agree that applying adult-defined ROIs to pediatric populations involves important assumptions about the stability of network architecture across development, which may not fully hold in early childhood.

      We have now addressed this limitation more explicitly in the revised Discussion, emphasizing that the fixed-ROI approach may not capture the dynamic reorganization of social brain networks during development.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “Moreover, the ROIs used to define the ToM and SPM networks were based on meta-analyses and task studies primarily conducted with adults. While this approach promotes comparability with existing literature, it assumes that the spatial organization of these networks is stable across age groups. However, theories of interactive specialization suggest that the composition and boundaries of functional networks may undergo reorganization during development, with regions potentially entering or exiting networks based on experience and maturational processes. As a result, the current analysis may not fully capture age-specific functional architecture, particularly in younger children. Future studies using data-driven or age-appropriate parcellation methods could provide more precise characterizations of how social brain networks are constructed and differentiated throughout childhood.”

      The current sample size (N = 34 dyads) is a limitation, particularly given the use of SEM, which generally requires larger samples for stable results. Although the model fit appears adequate, this does not guarantee reliability with the current sample size. I suggest discussing this limitation in more detail in the Discussion.

      We thank the reviewer for highlighting the limitations of applying structural equation modeling (SEM) with a relatively modest sample size. We agree that SEM generally benefits from larger samples to ensure model stability and parameter reliability, and that satisfactory model fit does not guarantee robustness in small-sample contexts.

      In the revised Discussion, we now more clearly acknowledge that the use of SEM in the current study is exploratory in nature, and that all results should be interpreted with caution due to potential sample size-related constraints. The model was constructed to provide an integrated view of the observed associations rather than to establish definitive pathways. We have also added a note that future research with larger samples and longitudinal designs will be needed to validate and extend the proposed model.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “In addition, the modest sample size (N = 34 dyads) presents limitations for the application of structural equation modeling (SEM), which typically requires larger samples for stable estimation and generalizable inferences. While the model fit was acceptable, the results should be interpreted as exploratory and hypothesis-generating, rather than confirmatory. Future studies with larger, independent samples will be important for validating the structure and directionality of the proposed relationships.”

      Based on the above comment, I believe that conclusions regarding the relationship between social network development, parenting, and support for Bandura's theory should be tempered. The current conclusions may be too strong given the study's limitations.

      We thank the reviewer for this important and balanced observation. We agree that the conclusions drawn from the current study should reflect the exploratory nature of the analyses, as well as the methodological limitations, including the modest sample size and cross-sectional design.

      In response, we have revised the Conclusion section to use more cautious, associative language when describing the observed relationships among social brain development, parenting factors, and Theory of Mind outcomes. In particular, we have tempered statements regarding support for Bandura’s social learning theory, clarifying that while our findings are consistent with social learning frameworks, the data do not allow for direct tests of modeling or observational learning mechanisms.

      We hope these revisions help clarify the scope of the findings and improve the conceptual rigor of the manuscript.

      The corresponding changes have been made on page 14 of the revised manuscript.

      “Our study provides novel evidence that children's social cognitive development may be shaped by the intricate interplay between environmental influences, such as parenting, and biological factors, such as neural maturation. Our findings contribute to a growing understanding of the factors associated with social cognitive development and suggest the potential importance of parenting in this process. Specifically, the study points to the possible role of the parent-child relationship in supporting the development of social brain circuitry and highlights the relevance of family-based approaches for addressing social difficulties. The observed neural synchronization between parent and child, which was associated with relationship quality, underscores the potential significance of positive parental engagement in fostering social cognitive skills. Future longitudinal and clinical research can build on this multimodal approach to further clarify the neurobehavioral mechanisms underlying social cognitive development. Such research may help inform more effective strategies for promoting healthy social functioning and mitigating social deficits through targeted family-based interventions.”

      The SPM (pain) network is associated with empathic abilities, also an important aspect of social skills. It would be relevant to explore whether (or explain why) SPM development and child-mother synchronization are (or are not) related to parenting and the parent-child relationship.

      We thank the reviewer for this thoughtful and important comment regarding the role of the Social Pain Matrix (SPM) network in social cognition and empathy. We agree that this network represents a critical component of social-cognitive development and is theoretically linked to affective processing and interpersonal understanding.

      We would like to clarify that in our existing analyses, which were already included in the original submission and detailed in the Supplemental Results, SPM network measures showed significant associations with behavioral outcomes similar to those of the ToM network. These outcomes included children's performance on ToM tasks as well as broader measures of social functioning. We have added more discussion in the supplementary results.

      “To further investigate the specificity of our findings, we conducted additional control analyses focusing on the individual components of the social brain networks examined in our study: the Theory of Mind (ToM) and Social Pain Matrix (SPM) networks.

      When analyzing these networks separately, we found significant correlations between neural maturity and age, as well as between inter-subject synchronization (ISS) and parent-child relationship quality for both the ToM and SPM networks individually (Fig. S1). Specifically, neural maturity within each network was positively correlated with age, indicating that both networks undergo maturation during childhood. Similarly, ISS within each network was negatively correlated with parent-child conflict scores, suggesting that both networks contribute to the observed relationship between neural synchrony and parent-child relationship quality.

      These results highlight the importance of considering the social brain as an integrated system, where the ToM and SPM networks work in concert to support social cognitive development. While each network shows age-related maturation and sensitivity to parent-child relationship quality, their combined functioning appears to be crucial for predicting broader social cognitive outcomes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, we discovered several erroneous duplicate values in our source data sets from figures S1, 2, 4, and 8, due to mistakes from MATLAB analysis. We have re-analyzed the data and corrected these errors; since limited values in each data set changed, the results were unaffected. The changes are reflected in updated figures and source data.

      Overall, the reviewers gave a positive assessment of our work, but had reservations about:

      (1) Specifics of the iGluSnFR data and analysis

      (2) Overstatement/oversimplification of the importance of syt7 and Doc2

      (3) The strength and interpretation of the EM data

      (4) The relevance and parametrization of the modeling data

      (1) We have clarified aspects of the iGluSnFR data and analysis in the point-by-point response, as well as in the manuscript.

      (2) We have toned down our statements about the role of syt7 and Doc2 throughout, and emphasized that the DKO data are conclusive and reveal that there must be additional Ca2+ sensors for AR. We have also added to the discussion, noting syt3 as a strong candidate to perform a function analogous to syt7 (to regulate docking), along with another protein (or proteins) performing a role similar to Doc2 (directly in fusion) that has not been identified as a candidate in the field yet.

      (3) We feel the EM data are consistent with the model as much as they could be, and while a sequence of events can only be inferred from time-resolved EM, we believe our work falls in the scope of reasonable interpretation. However, upon reexamining the terminology of ‘feeding’ and related discussion, we realized this could be misleading, so these sections have been revised.

      (4) We have improved the description and interpretation of the model in the manuscript and provide a detailed rationale of our approach in the point-by-point-response.

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) It is surprising the optical GluSnFR approach reports so much asynchronous release in control hippocampal neurons after single stimuli (36% of release). This seems much higher than what is observed at most synapses, where asynchronous release is usually less than 5% of the initial response to the first evoked stimuli. Any thoughts on why the GluSnFR approach reports such a high level of asynchronous release? Could the optical approach be slower in activation kinetics in some cases, which artificially elevates the asynchronous aspect of fusion? This seems to be the case, given electrophysiology recordings in Figure 3 show the asynchronous release component as ~10% in controls at the 1st stimuli (panel C).

      The reported proportion of asynchronous release from cultured hippocampal neurons varies, contingent upon a range of factors (calcium concentration, how asynchronous release is quantified, etc). However, we would argue that there is considerable evidence for a higher percentage of asynchronous release (more than the <5% indicated by the referee) at synapses in the hippocampus. In our previous work on Doc2 using electrophysiology in cultured hippocampal neurons (Yao et al., 2011, Cell), it was noted that there is an approximate 25% incidence of asynchronous release after a single action potential. Furthermore, Hagler and Goda also reported a 26% ratio of asynchronous neurotransmitter release, also from cultured hippocampal neurons (Hagler and Goda, 2001, J Neurophysiol.).

      We also point out that another study using iGluSnFR to measure synchronous/asynchronous release ratios, with more sophisticated stimulation, imaging, and analysis procedures than ours, found an average ratio of synchronous to asynchronous release that is in-line with our values, with considerable variability among individual boutons (Mendonça et al., 2022; 25% asynchronous release after a single action potential). We feel that iGluSnFR is actually the superior approach (barring specialized e-phys preparations that can measure quantal events at individual small synapses; please see Miki et al., 2018), as it directly measures the timing of individual release events at individual boutons. By comparison, in most electrophysiology experiments there is a large peak of synchronous release from many synapses. iGluSnFR also bypasses postsynaptic considerations such as receptor kinetics and desensitization, or asynchronous release being poorly aligned to AMPA receptors, per a recent study of ours (Li et al., 2021), and a study showing 25% of asynchronous release occurs outside the active zone (Malagon et al., 2023). All these factors could obscure asynchronous release or otherwise make it difficult to measure by electrophysiology. To our knowledge, the approach in Miki et al., 2018 best bypasses these limitations, though the data in that study are from exceptionally fast and synchronous cerebellar synapses, and so cannot be directly compared to our findings. Thus, it is possible that iGluSnFR can report more asynchronous release than electrophysiological recordings, but this may actually reflect real biology.

      This being said, after considering the reviewer’s points we realized that our analysis method likely underestimates the total amount of synchronous release when using the high-affinity sensor (Figure 1). We quantify release by ‘events’ (that is, peaks), which does not take into account multiquantal peaks resulting from near-simultaneous multivesicular release. We have previously determined by quantal analysis that most synchronous peaks after a single action potential are multiquantal, while for asynchronous release there are still multiquantal events but they are in the minority (Vevea et al., 2021; Mendonça et al., 2022). So, in our data sets, the total amount of synchronous release is underestimated more so than asynchronous release. Thus, 37% asynchronous release is probably an overestimate, which explains the 12% difference compared to Mendonça et al., 2022, who used sophisticated quantal analysis (though that study also was performed at room temperature, which could also cause differences). We have now pointed this out in the text:

      “This ratio of synchronous to asynchronous release is likely an underestimate, since our analysis only counts the number of peaks (‘events’) and does not take into account multiquantal peaks resulting from near-simultaneous multivesicular release. We have previously determined by quantal analysis that most synchronous peaks are multiquantal after a single action potential, while for AR there are still multiquantal events but they are in the minority (Vevea et al., 2021). So, in our measurements, the total amount of synchronous release is underestimated; sophisticated quantal analysis using the A184V iGluSnFR recently found the percentage of total release that is AR to be ~25%, with otherwise similar results to ours (Mendonça et al., 2022). Nonetheless, this approach faithfully distinguishes synchronous from asynchronous release…”
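The counting bias described in this passage can be illustrated with a minimal sketch. The peak list, the quanta-per-peak values, and the function name `async_fraction` are hypothetical and chosen only to show the direction of the effect; they are not measured data and not the analysis code used in the study.

```python
# Illustrative only: quanta-per-peak values below are hypothetical, not measured data.

def async_fraction(peaks, by_quanta=False):
    """Fraction of release classified as asynchronous.

    peaks: list of (is_async, quanta) tuples, one per detected peak.
    by_quanta=False counts each peak once (peak-count analysis);
    by_quanta=True weights each peak by its quantal content.
    """
    weight = (lambda q: q) if by_quanta else (lambda q: 1)
    total = sum(weight(q) for _, q in peaks)
    asynchronous = sum(weight(q) for is_async, q in peaks if is_async)
    return asynchronous / total

# Hypothetical population: 7 synchronous peaks of 2 quanta each,
# 3 asynchronous peaks of 1 quantum each.
peaks = [(False, 2)] * 7 + [(True, 1)] * 3

by_peaks = async_fraction(peaks)                    # 3 / 10
by_quanta = async_fraction(peaks, by_quanta=True)   # 3 / 17
```

Whenever synchronous peaks carry more quanta than asynchronous peaks, counting peaks yields a larger asynchronous fraction than weighting by quantal content, which is the direction of bias the response describes for the high-affinity sensor.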

      However, while this method underestimates total synchronous release, it does not misclassify synchronous events as asynchronous because of kinetics. Even the slower iGluSnFR variant does not have a rise time that would misrepresent a synchronous event as asynchronous (Marvin et al., 2018). Mendonça et al (2022) note that averaged iGluSnFR traces for the A184V are biphasic, with the transition from fast to slow component occurring around 10 ms. These authors also determined that the temporal resolution of glutamate imaging is actually limited by the frame rate, not the biosensor, and based on simulations found that detection time was biased in their data to be about 1 ms earlier than the actual timing of release events.

      The reviewer’s final point about Figure 3 is a misunderstanding, as these are data from iGluSnFR, not electrophysiology. The asynchronous proportion in these experiments is ~10% because, as noted in the manuscript, we used a faster, lower-affinity variant of iGluSnFR in train stimulation experiments (Figure 2). In contrast to the high-affinity sensor, as explained above, in our analysis this variant would be expected to underestimate the amount of asynchronous release because it fails to detect many uniquantal release events (presumably those further from the focal plane, with too little fluorescence to reach our detection threshold) as evidenced by the fact that the apparent mini rate is much lower as measured by this sensor compared to higher-affinity variants. Since synchronous peaks are mostly multiquantal after a single action potential, while asynchronous peaks are mostly uniquantal, a fraction of release going undetected results in mostly smaller synchronous peaks, which are counted the same in our analysis while many asynchronous peaks are missed entirely. We have added a bit more clarification in the text to avoid confusion on this point:

      “This sensor underestimates the fraction of AR (~10% of total release for a single action potential) as compared to the A184V variant used above that overestimates the fraction of AR (~35% of total release for a single action potential). This is because it is less sensitive and misses many uniquantal events; as discussed above, our analysis quantifies release by number of peaks, and most synchronous peaks are multiquantal after a single action potential, while most AR peaks are uniquantal (Vevea et al., 2021). Still, the S72A variant reported the same phenotypes as the A184V variant after the first action potential (Fig. 3B, C).”

      As discussed above, we think the synchronous-to-asynchronous ratio is actually harder to determine with electrophysiology, and the preparations are different (acute slice vs dissociated culture); still, our electrophysiological measurements are in line with the iGluSnFR data: 29% for Figure 2 and 26% from the first action potential of Figure 4. These values also agree with the findings from Yao et al. (2011) and Hagler and Goda (2001), discussed above.

      Finally, the ultimate goal of our study was to measure the effects of deleting Doc2 and syt7 on synchronous and asynchronous release, not to measure the exact ratio between the two. If iGluSnFR greatly misreported synchronous events as asynchronous, we would expect the results from the knockouts to diverge between our imaging and electrophysiology data, which they do not. We have also previously applied this approach to syt1 knockouts, showing the characteristic desynchronization of release (Vevea et al., 2020). Furthermore, the high-affinity and low-affinity iGluSnFR variants, which as discussed above in our analysis overestimate and underestimate the fraction of release that is asynchronous, respectively, both reported the same phenotypes.

      (2) In the acute hippocampal physiology traces, it looks like the effect on cumulative release in Doc2A mutants only appears around ~40 msec after stimulation. This is a relatively late phase of asynchronous release. Any reason this effect does not show up sooner, where most asynchronous fusion events occur, or is this due to some technical aspects of the physiology clamp that masks earlier components?

      The reviewer is correct, although the curves actually diverge at around 30 ms (see image below). This can be attributed to the fact that the EPSCs in our recordings are broad, probably because of the large number of different synaptic inputs captured in our stimulation and recording paradigm (note that the currents are also quite large), resulting in a broad spread in the timing of release. That is to say, synchronous release is likely still occurring fairly late into the trace, obscuring any changes in asynchronous release earlier than 30 ms. This is not related to Doc2 specifically, as the EGTA charge transfer curve also diverges from the control curve at the same time. This EGTA control gives us confidence that our broad EPSCs still faithfully report synchronous and asynchronous release, even if the exact timing is spread out to some extent.

      Author response image 1.

      (3) How do the authors treat multi-vesicular release in their synchronous/asynchronous quantification? It was not clear from the methods section. Many of the optical traces show dual peaks - are those that occur in the 10 ms bin assigned to synchronous and those outside to asynchronous? Are the authors measuring the area of the response or just the peak amplitude for the measurements? The methods seem to indicate peak amplitude, but asynchronous is better quantified with area measurements for electrophysiology.

      This is an excellent point by the reviewer, and in the Methods we now explicitly state how we treat multivesicular release/multiple peaks in our analysis. Release timing is assigned based on peak timing, including when there are multiple peaks at the same bouton.

      “Timing of release was determined based on the frame in which the signal peaked, including for dual peaks in the case of synchronous and asynchronous release at the same bouton.”
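A minimal sketch of timing assignment by peak frame is given here for illustration. It assumes the 100 Hz frame rate stated in this response, a 10 ms synchronous window (following the reviewer's description of the binning), a stimulus aligned to a frame boundary, and a hypothetical detection threshold. The function name `classify_peaks` and the example trace are invented for the example and do not reproduce the actual analysis pipeline.

```python
FRAME_INTERVAL_MS = 10.0   # 100 Hz imaging, as stated in the response
SYNC_WINDOW_MS = 10.0      # assumed cutoff: peaks within 10 ms count as synchronous

def classify_peaks(trace, stim_frame, threshold):
    """Return (latency_ms, label) for each local maximum after the stimulus.

    trace: fluorescence values (e.g. dF/F), one per frame.
    stim_frame: index of the frame at which the stimulus was delivered.
    threshold: hypothetical detection threshold in the same units as trace.
    """
    results = []
    for i in range(stim_frame + 1, len(trace) - 1):
        is_peak = trace[i] > trace[i - 1] and trace[i] >= trace[i + 1]
        if is_peak and trace[i] >= threshold:
            latency = (i - stim_frame) * FRAME_INTERVAL_MS
            label = "synchronous" if latency <= SYNC_WINDOW_MS else "asynchronous"
            results.append((latency, label))
    return results

# Hypothetical trace with a dual peak at one bouton:
# one peak 10 ms and one peak 40 ms after the stimulus.
trace = [0.0, 0.0, 0.9, 0.4, 0.3, 0.7, 0.2, 0.0]
events = classify_peaks(trace, stim_frame=1, threshold=0.5)
```

In this sketch each peak is labeled independently, so a bouton showing two peaks after the same stimulus contributes one synchronous and one asynchronous event.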

      Regarding the comparison to area measurements for electrophysiology, we agree with the reviewer, which is why we used such an approach for our electrophysiological data. However, a key advantage of iGluSnFR is the ability to resolve individual quantal events (or, as is often the case for synchronous release, simultaneous multiquantal events), so temporal binning of the peaks is the appropriate analysis approach regarding these data. This is comparable to the analysis used for electrophysiology recordings of responses from single small synapses, which also detects individual quantal events, where release timing is calculated as the latency between the stimulus and the beginning of each EPSC (Miki et al., 2018).

      This leaves the general concern that multiple vesicle fusions at the same bouton that occur milliseconds apart could blur together and make it more difficult to accurately determine release timing, particularly with the slower sensor used in the single-stim experiments in Figure 1. We believe this is not a major concern, since we also performed experiments with the much faster sensor, S72A, which can resolve peaks from 100 Hz stimulation (Marvin et al., 2018). Furthermore, while the peak-calling method we used is crude by comparison, the synchronous/asynchronous ratio we report is similar to that of Mendonça et al. (2022), who used a higher frame rate and deconvolution to produce more easily distinguishable quanta when synchronous and asynchronous release occur at the same bouton after the same action potential.

      (4) It would be relevant to show that calcium binding mutations in Syt7 do not support SV docking/capture in the current assays, given some evidence for Syt7 calcium-independent activities has been reported in the field.

      To our knowledge, when using the correct mutations to block calcium binding, none of the reported syt7 knockout phenotypes (including those reported by our laboratory in Liu et al., 2014) have ever been rescued. However, this does not formally rule out a calcium-independent role in transient docking. For the EM data, we originally considered including rescue experiments with normal and non-calcium-binding mutants of both syt7 and Doc2 in our study. However, our EM approach is spectacularly expensive and labor-intensive, and such experiments would as much as triple the amount of EM work in the study. We plan on doing such experiments, and there is a great deal of additional structure-function work to be done on both these proteins. We feel that reassessing the calcium binding mutants with iGluSnFR and zap-and-freeze falls into the scope of this future work. For now, this is a limitation of the current study.

      (5) The authors are not consistent in how they describe the role of the two proteins in asynchronous release, with the reader often drawing the impression that these two proteins solely mediate this aspect of SV fusion. As the authors note, some synapses do not require Syt7 or Doc2 for SV release, indicating different asynchronous sensors or molecular components at distinct brain synapses. Indeed, asynchronous release is only reduced, not eliminated, in the double mutants the authors report, so other components are at play even in these hippocampal synapses. The authors should be more consistent in noting this in their text, as the wording can be confusing as noted below:

      "Together, these data further indicated that AR after single action potentials is driven by Doc2α, but not syt7, in excitatory mouse hippocampal synapses."

      "after a single action potential, Doc2α accounts for 54-67% of AR at hippocampal excitatory synapses, whereas deleting syt7 has no effect."

      "This, along with our finding that syt7/Doc2a DKOs still had remaining AR, raises the possibility that there are other unidentified calcium sensors for AR."

      We have made adjustments throughout to not overstate the role of syt7 and Doc2, including at the locations the reviewer points out. This is an important point from the reviewer, and not just to avoid misleading readers. It is itself interesting; in the original manuscript we should have emphasized, far more than we did, that the DKO experiments strongly point to as-yet-unidentified proteins being involved in asynchronous release. This has been rectified in the revised text: we now emphasize that another calcium sensor for asynchronous release is likely present at all relevant points in the manuscript.

      (6) Given the authors' data, I don't think it's fair to say "raises the possibility" of other AR sensors, as almost 50% of AR remained in the Doc2A mutant in some of the experimental approaches. Clearly, other AR calcium sensors or molecular components are required, so better to just state that in the 1st paragraph of the discussion with something like: "Given syt7/Doc2a DKOs still had remaining AR, further work should explore the diversity of synaptic Ca2+ sensors and how they contribute to heterogeneity in synaptic transmission throughout the brain."

      We agree; this was poor phrasing on our part. We meant to imply that there may be proteins that have not even been considered, because it is also technically possible that the remaining asynchronous release is supported by the known machinery (i.e., syt1). We have changed “raises the possibility” to “indicates”.

      Minor points:

      (1) Remove "on" from the abstract sentence "Consequently, both synchronous and asynchronous release depress from the second pulse on during repetitive activity".

      We have changed “on” to “onward” to reduce ambiguity.

      (2) Shouldn't syt7 be Syt7 and syt1 be Syt1 when referring to the proteins?

      To our knowledge there is not a hard-and-fast convention for non-acronym mouse protein abbreviations. The technically correct full name is lowercase, so we find it reasonable to use lowercase for the abbreviation.

      (3) Both calcium and Ca2+ are used in the manuscript - better to stick to one term throughout.

      We thank the referee for catching this error; we now use only “Ca2+” throughout our study.

      Reviewer #2 (Recommendations For The Authors):

      (1) While the GluSnFR experiments appear to be well done, what is striking is the relatively small and "jagged" fluorescent responses. Are the authors concerned that they are missing many fast (with peaks occurring within 10 ms) synchronous events and incorrectly identifying them as asynchronous? If this is not a concern, why not?

      With respect to the small raw responses, this is the nature of measuring individual quanta from individual boutons while imaging at 100 Hz, even with the excellent signal-to-noise ratio of the iGluSnFR variants we used.

      As far as kinetics, as noted in the response to Reviewer 1 point #1, even the slower iGluSnFR variant has a rise time fast enough that it cannot misrepresent a synchronous event as asynchronous (Marvin et al., 2018). This threshold for iGluSnFR has been used by others: see Mendonça et al., 2022, who note that averaged iGluSnFR traces are biphasic, with the transition from fast to slow component occurring around 10 ms. The ‘jaggedness’ is in large part due to the frame rate (100 Hz); Mendonça et al., 2022 used 250 Hz and deconvolution to produce smoother, cleaner traces, but still achieved similar results to us.

      Finally, we reiterate what we wrote in response to Reviewer 1 point #1: “the ultimate goal of our study was to measure the effects of deleting Doc2 and syt7 on synchronous and asynchronous release, not to measure the exact ratio between the two. If iGluSnFR misreported synchronous events as asynchronous, we would expect the results from the knockouts to diverge between those data and our electrophysiology data, which they do not. We have also previously applied this approach to syt1 knockouts, showing the characteristic desynchronization of release (Vevea et al., 2020). Also, the phenotypes reported by the faster and slower iGluSnFR variants were identical.”

      (2) On page 6, I'm not sure I would agree that short-term plasticity is "so catastrophically disrupted". It is probably enough to say that plasticity is disrupted in the ko.

      We argue that syt7 knockout causes the most severe phenotype specific to short-term plasticity so far described (that is, without affecting initial release probability), but we have changed “catastrophically” to “strongly”.

      (3) Differences in the post-stim number of "docked" vesicles between conditions are, in absolute numbers, very small. For example, it seems that the number of docked vesicles goes from ~ 2.2 prior to stimulation, to ~ 1.5 in the first 5 ms window following stimulation. While this number may be statistically significant, I worry about bias and sampling errors. It is comforting that images are randomized prior to analysis. Nevertheless, the differences are very small and this should be explicitly acknowledged.

      This ~40% decrease in the number of docked vesicles in dissociated cultured hippocampal neurons has been consistent throughout all our studies using flash-and-freeze and zap-and-freeze electron microscopy (Watanabe et al., 2013; Kusick et al., 2020; Li et al., 2021), as well as those of other labs (Chang et al., 2018). Statistically, 40% is far above the detection limit for differences between samples with 200-300 synapses quantified per condition and an average of ~2 docked vesicles per image. The low absolute number of docked vesicles per synaptic profile (since the 40 nm section only captures a portion of the active zone, which contains an average of 12 docked vesicles in total; Kusick et al., 2020) is not relevant except that it does reduce the statistical power to detect differences, but this is compensated for by the huge number of images we capture and annotate per sample. We are able to detect differences in fusion and endocytic pits (albeit with much less precision and sensitivity), such as the Doc2 phenotype in this study, even though these events are an order of magnitude rarer than docked vesicles. Biologically, in our view, a 40% reduction in all docked vesicles across all synapses, considering that the majority of synapses do not have even 1 vesicle fusion, after only a single action potential, is substantial. We have even been puzzled why there is such a large decrease, but as stated above this result has been consistent for a decade of using this approach. For comparison to the magnitude of baseline docking changes in mutants, this 40% is similar to the effect of deleting synaptotagmin 1 (Imig et al., 2014; Chang et al., 2018; note in Imig et al., considered a gold standard in the field, the average number of docked vesicles per tomogram is ~10, but there are fewer than 25 tomograms per sample, so the actual amount of sampling in our data set is slightly greater).
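As a rough back-of-the-envelope illustration of the sampling argument, the sketch below computes a two-sample z-score under the assumption that docked-vesicle counts per profile are Poisson-distributed (variance equal to the mean). This assumption, the function name `poisson_two_sample_z`, and the use of the reviewer's approximate values (~2.2 and ~1.5) with 200 profiles per condition are illustrative choices, not the statistical test used in the study.

```python
import math

def poisson_two_sample_z(mean_a, mean_b, n_a, n_b):
    """z-score for the difference of two sample means, assuming independent
    samples of Poisson-distributed counts (variance equal to the mean)."""
    standard_error = math.sqrt(mean_a / n_a + mean_b / n_b)
    return (mean_a - mean_b) / standard_error

# ~2.2 docked vesicles per profile before stimulation vs ~1.5 at 5 ms
# (the reviewer's approximate figures), with 200 profiles per condition
# (the lower end of the stated 200-300 range).
z = poisson_two_sample_z(2.2, 1.5, 200, 200)
```

Under these assumptions the difference is roughly five standard errors, which is consistent with the point that a small absolute difference can still be well resolved when several hundred profiles are quantified per condition.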

      (4) The related point is that how can one know about the "transient" nature of vesicle docking when the analysis is performed on completely different sections from different cells? Moreover, what does it mean that the docked granules have recovered or not recovered (abstract)? This should be explained in more detail.

      This is a fundamental difficulty of interpreting time-resolved electron microscopy data. We cannot observe a sequence of events at any given synapse, but only try to measure each time point as accurately as we can and interpret the data.

      By ‘recovery’ we simply mean that the number of docked vesicles at a given time point after stimulation is similar to the no-stimulation baseline. We have replaced ‘recovery’ in the abstract with ‘replenishment’ to avoid confusion.

      We now realize that in the context of this study the term ‘transient docking’ is confusing, since we only measured out to 14 ms in this study. In experiments with samples frozen at 5 ms, 14 ms, 100 ms, 1 s, and 10 s, the return to baseline at 14 ms appears temporary, since samples frozen at 100 ms have a similar reduction of docked vesicles as those at 5 ms (Kusick et al., 2020). The number of vesicles again returns to baseline at 10 s, so we used the term ‘transient docking’ to distinguish the recovery at 14 ms from the slower and presumably permanent return to baseline that takes 10 s. The apparently temporary nature of this process is why we believe it contributes to facilitation, which likewise peaks soon after stimulation and decays over the course of ~100 ms.

      To make the transient docking terminology less confusing, we have removed the word ‘transiently’ from the title and added a clarification of what transient docking is when it is first mentioned:

      “vesicles can dock within 15 ms of an action potential to replenish vacated release sites and undock over the next 100 ms”

      As noted by the reviewer, such a sequence of events, where vesicles dock within 14 ms, then undock over the course of 100 ms, then dock again over the course of 10 s, is an inference, but is based on predictions from electrophysiological data and modeling (see Silva, Tran, and Marty, 2021 for review; those authors use the term ‘calcium-dependent docking’ but this refers to the same process), and as yet there is no way to directly observe vesicle dynamics at synapses down to nanometer resolution in live cells.

      On the reviewer’s recommendation we have removed references to syt7 ‘feeding’ vesicles from the abstract and the beginning of the “physiological relevance” section of the discussion. This phrasing could imply a direct molecular pipeline between syt7 and syt1/Doc2, which is a misrepresentation of our actual model that syt7 simply helps recruit docked vesicles.

      “These findings result in a new model whereby syt7 drives activity-dependent docking, thus providing synaptic vesicles for synchronous (syt1) and asynchronous (Doc2 and other unidentified sensors) release during ongoing transmission.”

      “In the case of paired-pulse facilitation it can supply docked vesicles for syt1-mediated synchronous release to enhance signaling; it likely functions in the same manner to reduce synaptic depression during train stimulation. In the case of AR, syt7-mediated docked vesicles can be used by Doc2α, which then directly triggers this slow mode of transmission.”

      (5) In this study, docking is phenomenologically defined and, therefore, arbitrary; vesicles are defined as docked if there is no space between them and the plasma membrane. What happens if the definition is broadened to include some small distance between the respective membranes? Does the timecourse of "recovery" change?

      We always quantify at least all vesicles within 100 nm of the active zone; these data are shown in Figure S6D. We show only docking in the main figures because, consistent with our previous work and as stated in the text, we found no change in the number of vesicles at any distance from the plasma membrane at the active zone after stimulation, nor did we find any difference in the mutants. In our previous work on syt7 (Vevea et al., 2021) we quantified all the vesicles within the synapse and also found no differences after stimulation or in the KO further from the active zone.

      The reviewer is correct that the term ‘docking’ at synapses is often used quite arbitrarily; even among morphological studies the definition is inconsistent. We consider our strict docking definition that we explain in the manuscript (in high-pressure-frozen and freeze-substituted samples) of no visible distance between membranes to be less arbitrary, since only the number of these attached vesicles decreases after stimulation (Watanabe et al., 2013, Kusick et al., 2020, Li et al., 2021, this study) and in SNARE knockouts (Imig et al., 2014). Broadening the definition, as is done in some other studies (for example Chang et al., 2018), retains the effect, since the majority of vesicles within 10 nm are at ~0 nm, but again all that is actually changing is the number of vesicles at ~0 nm.

      (6) My overall impression is that this model is not adding much to the story. Specifically, the model was not fit to any data and has a huge number of states and free parameters given the dynamics that it is trying to capture (ie I think this is overkill). Many of the free parameters were arbitrarily constrained with little to no justification and there was minimal parameter space exploration, in part because the model wasn't being quantitatively constrained to any data. While advertised to be a 3-state model, there is a combinatorial explosion of substates by distinguishing between levels of calcium occupancy simultaneously in three separate calcium sensors so that one ends up with 9 empty states, 9 tethered states, and 45 docked states for a total of 63 distinguishable states. At 63 states and 21 free parameters, one could of course model just about any dynamics imaginable. But the relatively simple dynamics of AR and its perturbation by removal of Doc2 and Syt7 can likely be captured with far fewer states and parameters (such as Neher's recent proposal). Specifically, starting with the Neher ES-LS-TS model along with adding a transient labile docked state affected by Syt7 and Doc2 (TSL in Neher nomenclature), I wonder if the authors could more or less capture what they are observing during stimulus trains. The advantage of a minimal model is that readers don't have to struggle with fairly elaborate systems of differential equations and parameter plots to get a feel for what's going on. Especially since the point of this model is to develop intuition rather than to capture with physical accuracy exactly what is transpiring at a docked vesicle (which would require many more details excluded from the current model).

      We would like to thank the reviewer for pointing out unclarities and mistakes in the description of the model. We have worked on improving on these points. We now more elaborately explain why we have made certain assumptions and what decisions we have made to constrain the parameter values in the model. As the reviewer points out other models might also work in explaining the dynamics of the experimental data presented in this paper. Thus, we agree that it is unlikely that this theory and model implementation is the only one that can account for the observations. With this model we aimed to investigate whether the theory proposed based on the experimental data could indeed reproduce the dynamics that are observed experimentally. In the section below we will briefly explain why we made different decisions in constructing the model to comment on the reviewer’s concerns. We will also discuss more precisely what adjustments we have made to the model’s description to improve its readability and be open about its limitations.

      One of the main concerns of the reviewer is that the model has many states and free parameters, some of which are poorly constrained. We agree that the model indeed contains many states. However, in essence, the model corresponds to a two-step docking model, in which SVs get tethered to an empty release site and subsequently dock/prime in a fusion-competent state. This structure of the model corresponds to the ES-LS-TS model (Neher and Brose 2018, Neuron) mentioned by the reviewer or the replacement-docking model (Miki et al., 2016, Neuron). As the reviewer points out, by making the transition rates calcium-dependent in those models, we would indeed be able to capture similar dynamics with these models as with ours. However, instead of directly implementing calcium-dependent rates, we let the rates depend on the number of calcium ions bound to syt7, Doc2 and syt1. We decided to do so because some information on the calcium-binding dynamics of these proteins is available. By simulating the calcium binding to the proteins explicitly, we could integrate this knowledge into our model. Moreover, by explicitly simulating calcium binding to these proteins, we included the time it takes before a new steady-state binding occupancy is reached after a change in calcium levels. This is crucial, especially for Ca2+ sensors with slow kinetics such as syt7 and Doc2. These properties are highly relevant for asynchronous release (which we quantified as the release >5 ms after onset of AP). The consequence is that, because of combinatorics (e.g., if we assume 5 calcium ions to bind to syt1 and 2 to Doc2 this leads to 24 different states), explicit simulation of all relevant states increases the number of distinct states a vesicle can be in. In the main text of the manuscript, we added this explanation of why we decided on the structure of the model as it is presented and discussed it in the context of previous models.

      Our decision to simulate calcium binding to syt1, syt7 and Doc2 also increased the number of parameters in our model. As the reviewer points out, the large number of parameters in our model, relative to the small number of features in the experimental behavior against which the model is compared, is a limitation. However, after thorough exploration of the model, we are certain that the model cannot produce just any desired dynamics. The large number of parameters does make it possible that different combinations of parameter values would lead to similar responses, as can be seen in the parameter space exploration in Figure S9. This means that our modelling effort does not provide estimates of parameter values. We now mention this explicitly in the discussion section of the model. Some of the parameter values we were able to constrain based on previous literature (10 parameters), others were set more arbitrarily (8 parameters), and some of them were adjusted to match the experimental data closely (7 parameters). We now indicate more clearly in Supplementary Table 3 to which category each parameter value belongs. We determined the values of the model parameters through a manual exploration of the parameter space. One of the main reasons why we decided not to fit the model to the data obtained in this work is that the obtained parameters would not be informative (e.g., multiple combinations of parameters will lead to similar results). We agree with the reviewer that a direct quantitative comparison between model predictions and experimental data obtained by fitting would be nice. However, fitting the model to experimental data would be close to impossible computationally. This is in part because of the large number of states, but mainly due to the large number of APs that need to be simulated. Because the transients in our model have slow and fast parts (the decay of the residual Ca2+ transient, and the peak of the local Ca2+ transient), the model is challenging to solve with the ODE solvers available in Matlab, even when using a high-performance computer system optimized for parallel computation (32 cores). Moreover, fitting the model to experimental data would require the addition of extra assumptions and parameters to the model. As the experiments are performed using different samples, different parameter settings are probably required (e.g. it is likely that the number of release sites or the fusion probability differs between cultured hippocampal neurons and hippocampal slices). Additionally, if we decided to fit the model, we would need to define a cost function (i.e., a quantitative measure of how well the model fits the experimental data), which would require us to decide how to weight the different experiments against which we compare our model predictions. The decision on how to weight the different types of data is very difficult (not to say arbitrary).

      Therefore, we constrained the parameter values in our model based on a manual (but systematic) exploration of the parameter space. The simulations of the model were evaluated based on the increase in the number of docked vesicles between 5 and 15 ms after AP stimulation (this should be as large as possible for the control and Doc2- model, and close to 0 for the syt7- model simulations), the peak release rates in response to the first AP (to be equal between all conditions), the ratio between the peak release rate of the 1st and 10th response (depressive phenotype should be more prominent in the syt7- model simulation and the least in the Doc2- simulation), and the amount of asynchronous release (syt7- and Doc2- simulations should have approximately half of the total amount of asynchronously released vesicles compared to the control simulations). Moreover, the parameter values for the calcium transient should be realistic. We do not know the exact parameter values of the calcium transient in the samples used in the experiments performed here, but previous studies have provided a range of realistic parameter values (Brenowitz and Regehr 2007, PMID: 17652580; Helmchen et al., 1998, PMID: 9138591; Sabatini and Regehr 1998, PMID: 9512051; Wang et al., 2008, PMID: 19118179). Furthermore, we decided to set the parameters describing calcium binding to syt7 and Doc2 to the same values, as the scope of the model was to investigate the role of syt7 and Doc2 in asynchronous release when they act on different steps in the reaction scheme. By using the same parameter values both proteins are identical except for their mechanism of action. We added this section to the methods of the manuscript.

      In the parameter space evaluation, we decided to vary parameters one-by-one or in pairs. We decided not to extend the parameter space evaluation further, as the results would be challenging to interpret and visualize, and computationally expensive to simulate.

      (7) The graphics, equations, and nomenclature all need some work. The equations aren't numbered or indexed, so I can't really refer to any of them in particular, but the symbols being used generally were not defined well enough for a naïve reader to follow. The 15 diffEQs compressed into a single expression at the bottom of page 19 are basically impenetrable. The 'equation' near the bottom of p. 20 is not an equation - it is a set of four symbols lacking a definition. The fusion rate equation (with f1 and f2 factors) isn't spelled out clearly enough (top of p. 20). Can fusion occur from any of the 45 docked states but just with a different probability? Or does fusion only occur from the 3 states where Doc2+Syt1 Ca occupancy = 5? The graphical representation of Syt7 occupancy and its effects in Fig S7 doesn't work well. Tons of color and detail but very hard to decipher and intuit what Syt7 is doing to the SV buried in the arrow lengths. And this is a crucial point of the paper - it really needs to shine through in this figure.

      We thank the reviewer for pointing out the unclarities in the description of the model. We have worked on improving this section. Specifically, we have improved the equations and now more clearly explain the symbols used in these equations. We have altered the graphical representation of the effect of calcium binding to syt7 on docking and undocking rates.

      (8) I would strongly recommend abandoning this large-scale soft modeling effort altogether, but if the authors feel that all the states and parameters are absolutely required, they need to justify this point, define all symbols systematically, number all equations, and provide some evidence of actual data fitting, systematic parameter space exploration, and more exposition of why they are making the various assumptions and constraints that were used to lower the number of free parameters. For instance, why are the tethering and untethering (or docking and undocking) rate constants set to equal each other? And why is it assumed that Syt7 enhances both the docking and undocking rates? Why is fusion set to occur as long as the sum of Syt1 and Doc2 calcium occupancy is exactly 5 regardless of the specific occupancy of either Syt1 or Doc2? Again probably quite important but unjustified physically. Given the efforts of this model to capture some sort of realistic calcium liganding by Syt1, Syt7, and Doc2, the model doesn't seem to take into account the copy number of each protein at a release site. Shouldn't it matter if there are 2 Syt7s vs 20 Syt7s? Or the stoichiometry between Doc2 and Syt1? Either this model assumes that there is exactly one copy of each protein at a release site or that all copies are always identically liganded and strictly act as a unit. Neither of these possibilities seems plausible.

      Despite the fact that this model (as all models) is a simplified version of reality and has its limitations, we decided to keep the model in our work to illustrate that the well-defined hypothesis put forth in this paper is consistent with the experimental data. Again, we are not claiming that this model is the only one that may explain this, nor do we claim that we have uniquely identified its parameters. As indicated above, we worked on improving the description of the model in the methods and improved our description of how the parameter values are constrained. For the reasons mentioned above (first and foremost because of infeasibility due to excessive computation time) we did not perform data fitting or change the parameter space exploration. We would like to thank the reviewer for pointing out that some of the assumptions of the model are not explained well enough. We added an extra explanation of these assumptions to the main text.

      One of the assumptions we made, as the reviewer points out, is that the tethering and untethering rate constants, and likewise the docking and undocking rate constants, are set equal to each other. This is indeed an arbitrary assumption, with the main aim of reducing the number of free parameters in our model given that there is currently no experimental constraint on the relation between the two rate constants. We agree that this assumption is as good as any other, and we have pointed this out more clearly in the main text.

      In the model syt7 enhances both docking and undocking rates as we assumed it to function as a catalyst of the docking reaction. A catalyst lowers the energy barrier for the reaction and thereby promotes both forward and backward rates. One of the main reasons we decided on this is because in the model also syt1 and Doc2 are assumed to function by lowering the energy barrier for the fusion reaction. However, since fusion is irreversible this would only affect the forward reaction rate. We cannot exclude that syt7 acts on the forward rate only, which we now mention in the results section of the model.
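
      The catalyst argument above can be illustrated with a minimal sketch, assuming Arrhenius-type rates in which each calcium ion bound to the catalyst lowers the shared barrier by a fixed amount. All numerical values and names below are hypothetical and are not taken from the manuscript.

```python
import math

# Hypothetical values, chosen only for illustration (not from the manuscript).
PREFACTOR = 1000.0       # attempt frequency (1/s)
BARRIER_FORWARD = 5.0    # barrier for docking, in units of kT
BARRIER_BACKWARD = 4.0   # barrier for undocking, in units of kT
DELTA_PER_CA = 1.0       # barrier reduction per Ca2+ bound to the catalyst (kT)


def rate_constant(prefactor, barrier_kt):
    """Arrhenius-type rate: k = A * exp(-E / kT), with E expressed in kT."""
    return prefactor * math.exp(-barrier_kt)


def docking_rates(n_ca_bound):
    """Return (k_dock, k_undock) when the same barrier drop applies to both directions."""
    drop = DELTA_PER_CA * n_ca_bound
    k_dock = rate_constant(PREFACTOR, BARRIER_FORWARD - drop)
    k_undock = rate_constant(PREFACTOR, BARRIER_BACKWARD - drop)
    return k_dock, k_undock


if __name__ == "__main__":
    for n in range(3):
        k_dock, k_undock = docking_rates(n)
        # Both rates grow with n, while their ratio (the equilibrium constant) stays fixed.
        print(n, k_dock, k_undock, k_dock / k_undock)
```

      Under these assumptions the catalyst speeds up equilibration without shifting the equilibrium between tethered and docked vesicles. A sensor acting on the forward rate only would instead change the ratio of the two rates.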

      In our model fusion can occur from any possible docked SV state. The probability of fusion, however, increases the more calcium ions are bound to Doc2 or syt1, with calcium-bound syt1 being more effective in promoting fusion. This structure matches the dual-sensor model proposed by Sun et al., 2007, Science (PMID: 18046404) and Kobbersmed et al. 2020, eLife (PMID: 32077852), and is based on the assumption that each protein bound to calcium lowers the energy barrier by a certain amount. We have explained this in more detail in the results section of the model.
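
      As a hedged illustration of this dependence, the sketch below assumes a multiplicative form in which each bound calcium ion scales the fusion rate by a fixed factor (a fixed barrier reduction corresponds to a fixed fold change in rate). The basal rate and the two factors are hypothetical placeholders, not the values used in the model.

```python
# Hypothetical parameters for illustration only; not the values used in the manuscript.
BASAL_FUSION_RATE = 4e-4   # fusion rate with no Ca2+ bound (1/s)
F_SYT1 = 30.0              # fold increase per Ca2+ bound to syt1
F_DOC2 = 5.0               # fold increase per Ca2+ bound to Doc2 (assumed weaker)


def fusion_rate(n_syt1, n_doc2):
    """Fusion rate of a docked vesicle with n_syt1 and n_doc2 calcium ions bound."""
    return BASAL_FUSION_RATE * F_SYT1 ** n_syt1 * F_DOC2 ** n_doc2


if __name__ == "__main__":
    # Fusion is possible from every docked state, only with a different rate.
    for n_syt1, n_doc2 in [(0, 0), (0, 2), (3, 2), (5, 0)]:
        print(n_syt1, n_doc2, fusion_rate(n_syt1, n_doc2))
```

      With this form no docked state has a zero fusion rate, which is the sense in which fusion can occur from any docked state.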

      We decided that syt1 and Doc2 together could have no more than five calcium ions bound to them. This is based on the idea that syt1 and Doc2 are competing for the same type of resources, which could for instance be a limited number of SNARE complexes that are available to execute the reaction. An indication for competition between the two proteins can be found in the synchronous release amplitudes after stimulus 2, which are larger in the Doc2KO.
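
      The effect of this constraint on the size of the state space can be checked by direct enumeration. The sketch below assumes that docked vesicles track the occupancy of syt1 (0 to 5), Doc2 (0 to 2) and syt7 (0 to 2), and that empty and tethered sites track only Doc2 and syt7. That last assumption is ours; it was chosen because it reproduces the counts quoted by the reviewer (9 empty, 9 tethered, 45 docked).

```python
from itertools import product

MAX_SYT1 = 5     # maximum Ca2+ bound to syt1
MAX_DOC2 = 2     # maximum Ca2+ bound to Doc2
MAX_SYT7 = 2     # maximum Ca2+ bound to syt7
MAX_SHARED = 5   # syt1 and Doc2 together bind at most five Ca2+


def docked_states():
    """All (syt1, Doc2, syt7) occupancies allowed for a docked vesicle."""
    return [
        (s1, d2, s7)
        for s1, d2, s7 in product(
            range(MAX_SYT1 + 1), range(MAX_DOC2 + 1), range(MAX_SYT7 + 1)
        )
        if s1 + d2 <= MAX_SHARED
    ]


def undocked_states():
    """Assumed (Doc2, syt7) occupancies for an empty or a tethered site."""
    return list(product(range(MAX_DOC2 + 1), range(MAX_SYT7 + 1)))


if __name__ == "__main__":
    n_docked = len(docked_states())
    n_undocked = len(undocked_states())
    # empty + tethered + docked
    print(n_undocked, n_undocked, n_docked, 2 * n_undocked + n_docked)
```

      Without the shared limit of five ions, the docked vesicle alone would have 6 x 3 x 3 = 54 occupancy states, so the competition assumption also reduces the state count.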

      The reviewer rightfully points out that for realistic simulations of the role of syt1, syt7 and Doc2 the stoichiometry of these proteins at the release site is relevant. In the ideal scenario, we would have included this in our model. However, this would massively increase the possible number of states (which this reviewer criticizes already in our simpler model), making the model even more computationally expensive to run. Additionally, we currently have no reliable estimates of the number of syt7 and Doc2 molecules per release site. In our model, all syt1s expressed on an SV can together bind up to five calcium ions. We have recently shown that this simplified model can capture the features of all syt1 proteins per vesicle that compete for the binding of three substrates on the plasma membrane to exert their function in speeding up fusion (Kobbersmed et al., 2022 eLife PMID: 35929728). This means that the copy number is indirectly covered in our model. This number of five calcium ions (and two for Doc2 and syt7), however, is not based on the estimated number of syt1s on an SV (which would be around 15, Takamori 2006), but rather on the calcium dependence of the fusion reaction. Similarly, the number of two calcium ions binding to Doc2 is based on the calcium dependence of asynchronous fusion rates (Sun et al., 2007). Based on the reviewer’s comment we now mention more explicitly in the text that the numbers of calcium ions binding to syt1, Doc2 and syt7 correspond to the total number of calcium ions that can bind to each of these molecules per release site/SV.

      We would again like to thank the reviewer for asking us to improve the explanation of the assumptions made in constructing our model and of how we constrained its parameter values.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the editor and reviewers’ careful and professional assessment of this manuscript. We are delighted with the reviewers’ instructive comments and suggestions. We have tried to address the raised points comprehensively. The reviewers’ scrutiny has helped us immensely to discuss and present our work extensively and properly. We are grateful for the reviewers’ efforts and insights. The detailed responses are listed here.

      Recommendations for the authors

      (1) The intuition behind the model is not properly explained, i.e., the derivation of Eqs. 1-2 and the biological meaning of the AA/OO logic modes. A different notation could be helpful.

      We thank the reviewers for this comment, and agree that the interpretation of our model in the manuscript was indeed in need of improvement. We have incorporated this suggestion into the manuscript. For clarity, we have replaced the original notation AA/OO with AND-AND/OR-OR, and hope that the new notation is helpful for interpreting our work.

      In general, considering the diverse audience, including those with an experimental background, we feel that it is essential to present this manuscript in a more digestible manner. We therefore retain the entire derivation of Eqs. 1-2 in the supplementary method. We have added a qualitative introduction to the model derivation and to the molecular-biological significance of the different logic motifs (AND-AND/OR-OR) in the revised manuscript. Please refer to Page 5 of the revised manuscript, lines 161-167 (see below).

      “X and Y are TFs in the CIS network. n1 and n2 are the coefficients of molecular cooperation. k1-k3 in Eq1 and k4-k6 in Eq2 represent the relative probabilities for possible configurations of binding of TFs and CREs (Fig2.A). d1 and d2 are degradation rates of X and Y, respectively. Here, we considered a total of four CRE configurations as shown in Figure 2A (i.e., TFs bind to the corresponding CREs or not, 2^2 = 4). Accordingly, depending on the transcription rates (i.e., r0x, r1, r2, r3 in Eq1, similarly in Eq2) of each configuration, we can model the dynamics of TFs in the Shea-Ackers formalism[1, 2].

      Thus, the distinct logic operations (AND/OR) of two inputs (e.g., activation by X itself and inhibition by Y) can be further implemented by assigning the corresponding profile of transcription rates to the four configurations (Fig2.A). From the perspective of molecular biology, the regulatory logics embody the complicated nature of TF regulation, in which TFs function in a context-dependent manner. Considering the CIS network, when X and Y bind their respective CREs concurrently, whether the expression of the target gene is turned on or off depends on the regulatory logic (specifically, off in the AND logic and on in the OR logic; Fig2.A). Notably, instead of exploring the different logics of one certain gene[3, 4], we focus on different combinations of regulatory logics, because the dynamics of cell fate decisions are generally orchestrated by GRNs with multiple TFs.”
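
      To make the quoted description concrete, the following sketch shows one way a Shea-Ackers-type rate for X could be written. Eqs. 1-2 are not reproduced in this response, so the functional form, the ordering of the configurations (none bound, X only, Y only, both bound) and the on/off rate profiles are our assumptions for illustration. Only the "both bound" entry (off for AND, on for OR) is stated in the text above.

```python
def production_rate(x, y, k, r, n1=2, n2=2):
    """Weighted average of configuration-specific transcription rates.

    k = (k1, k2, k3): relative weights of the X-only, Y-only and both-bound
    configurations; r = (r0, r1, r2, r3): transcription rates of the
    unbound, X-only, Y-only and both-bound configurations (assumed ordering).
    """
    k1, k2, k3 = k
    r0, r1, r2, r3 = r
    w_x = k1 * x ** n1
    w_y = k2 * y ** n2
    w_xy = k3 * x ** n1 * y ** n2
    return (r0 + r1 * w_x + r2 * w_y + r3 * w_xy) / (1.0 + w_x + w_y + w_xy)


def dx_dt(x, y, k, r, d1=1.0):
    """Production minus first-order degradation of X."""
    return production_rate(x, y, k, r) - d1 * x


# Assumed rate profiles (0 = off, 1 = on) for activation by X and inhibition by Y.
AND_RATES = (0.0, 1.0, 0.0, 0.0)   # on only when X is bound and Y is not
OR_RATES = (1.0, 1.0, 0.0, 1.0)    # off only when Y is bound and X is not

if __name__ == "__main__":
    k = (1.0, 1.0, 1.0)   # hypothetical weights
    # With both factors abundant, the both-bound configuration dominates.
    print(production_rate(10.0, 10.0, k, AND_RATES))
    print(production_rate(10.0, 10.0, k, OR_RATES))
```

      In this sketch the two logics differ only in the rate profile, not in the binding weights, which is why the same parameter screen over k can be applied to both motifs.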

      (2) More clearly specify the used parameters and how these are chosen. This would be helpful to get a more quantitative grasp of the conditions that they compare.

      We appreciate the reviewers pointing out unspecified parts in the main text. We have now included related discussion in the revised manuscript. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”).

      We would like to highlight that the Boolean models with different logic motifs (Fig. 2B) explicitly display the difference between state spaces (i.e., attractor basins). Moreover, as the focus of this work is on the role of regulatory logics in cell fate decisions, we consider it reasonable to specify the geometry of the landscape based on the Boolean models. Therefore, we reason that it is intuitive and reliable to assign parameter values by qualitatively mapping our ODE models (Eqs. 1-2) to the corresponding Boolean models (refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”). In producing Figures 2-5, parameters were set heuristically, without a systematic search. However, to draw general conclusions, like the "trade-offs between progression and accuracy" and the presence of the fully-connected stage, we sampled a substantial number of parameter sets to ensure statistically robust findings.

      (3) Include the explanation of how the nullclines and basins shown in the figures (e.g., Fig. 2C, Fig. 4C, Fig. 4F, etc.) are calculated.

      We thank the reviewers for this suggestion. We have incorporated this into the legend of corresponding figures when first mentioned in the main text. Please refer to Page 7 of the revised manuscript, lines 217-223 (see below).

      “Fig2.C:

      (C) State spaces of the AND-AND (top panel) and OR-OR (bottom panel) motifs in ODE models. Dark and red lines represent nullclines of X and Y, respectively. Stable steady states (SSSs) are denoted as orange dots. Unstable steady states (USSs) are denoted as white dots. Each axis represents the concentration of one transcription factor, in arbitrary units. Blue, green and purple areas in state spaces indicate attractor basins representing LX, S and LY, respectively. The color of each point in state space was assigned according to the attractor it finally enters in the deterministic models (Eq1, Eq2). These annotations were used for the following Figures 3-7.”

      (4) Clarity on the decisions in the work is needed. For example, the "introduction" of asymmetry of the noise levels (as stated in line 215) appears completely arbitrary. The reason behind it can be guessed in the following paragraph, but the reader shouldn't have to guess.

      We agree entirely with the reviewers’ comment. Indeed, this should have been stated more explicitly. The motivation for incorporating asymmetry in the noise levels stems from our endeavor to mimic the inherent biological variability in gene expression within a cell population. We have adjusted the manuscript to better convey the motivation for investigating asymmetric noise level. Please refer to Page 8 of the revised manuscript, lines 237-238 (“In biological systems, it is unlikely that the noise level of different genes is kept perfectly the same.”).

      (5) Arbitrary and/or out-of-context jargon is used throughout the manuscript, making it hard to read and follow what the authors mean in some cases. For example, "temporal fully-connected stage" is used for the first time in line 290, and the term is not explained either in the main text or in the manuscript. Similarly, the reference to a Boolean-like and Boolean model (line 163 and Figure 1) without clarifying if this is just an analogy or if a formal model is built, nor the utility and implications of this comparison. Another problem related to jargon occurs on line 291, where the authors talk about "parameter sensibility", but such analysis (as it is normally understood in the field) is never performed; the authors perform a parameter exploration and make some general conclusions about the parameter space, but that is different than a parameter sensitivity analysis.

      We thank the reviewers for this comment, as it has prompted us to better clarify our manuscript. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We do hope that this revision meets the reviewers’ expectations on the clarity and comprehensiveness of our analysis.

      Regarding the term "temporal fully-connected stage", we realized that it was vague and in need of improvement. We now use “transitory fully-connected stage” in the revised manuscript to underline the brief emergence of this particular stage. Please refer to Page 11 of the revised manuscript, line 323.

      We thank the reviewers for pointing out the lack of clarity concerning the Boolean models. We have now amended the manuscript to make this explicit. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). Specifically, we employed the Boolean models (Fig.2B) as a reference to help us heuristically evaluate the suitability of the parameters used in the ODE models. The Boolean models are therefore built formally, and the corresponding update rules are listed in Fig.2A (refer to the middle row in the table called “Logic Function”, now also noted in the legend of Fig.2B, Page 7, lines 213-214). Nevertheless, we do utilize the analogy between the attractor basins from the Boolean models and the ODE models (refer to Fig.2B-C). Accordingly, we used the term “Boolean-like” to describe the landscape presented by the continuous models (Eqs. 1-2; refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”).
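
      A minimal sketch of how such a Boolean reference model can be examined is given below. The update rules in Fig.2A are not reproduced in this response, so the rules used here (self-activation combined with cross-inhibition through AND or OR) are our assumption, as is the synchronous update scheme.

```python
from itertools import product


def and_and(x, y):
    """Assumed AND-AND rules: each gene is on only if it is on and its rival is off."""
    return (int(x and not y), int(y and not x))


def or_or(x, y):
    """Assumed OR-OR rules: each gene is on if it is on or its rival is off."""
    return (int(x or not y), int(y or not x))


def fixed_points(update):
    """States (x, y) that map onto themselves under a synchronous update."""
    return [state for state in product((0, 1), repeat=2) if update(*state) == state]


if __name__ == "__main__":
    print("AND-AND:", fixed_points(and_and))
    print("OR-OR:", fixed_points(or_or))
```

      Under these assumed rules both motifs have three fixed points, two of which are the single-factor states (1, 0) and (0, 1). They differ in the third: (0, 0) for AND-AND and (1, 1) for OR-OR. This is the kind of qualitative difference in state space that the continuous models were tuned to reproduce.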

      We thank the reviewers for this valuable comment, and agree that the usage of “parameter sensibility” was in need of adjustment. We have now amended the manuscript. Please refer to Page 10 of the revised manuscript, lines 318-321 (see below).

      “To manifest the generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediated stage can be observed for 82.7% of them (see Methods; Table S1), indicating little dependence on particular parameter setting (1.8% in the OR-OR motif).”

      (6) Probably related just to the language clarity (i.e., the abuse of jargon), but we don't understand the conclusion on lines 296-298.

      We thank the reviewers for this comment. We have adjusted the manuscript accordingly. Please refer to Page 11 of the revised manuscript, lines 323-327 (see below). And we hope that the reviewers agree with our attempt at mapping into the particular stage in cell fate decisions from the point of landscape.

      “Furthermore, this transitory fully-connected stage locates between the fate-undetermined stage (Fig4.C top panel) and fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before the lineage commitment in experimental observations [5-7]. Therefore, we suspected that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      (7) The so-called "solution landscape" in Figure 4E needs to be better explained.

      We thank the reviewers for this comment. We have introduced the concept of solution landscape, which is a pathway map consisting of all stationary points and their connections, in lines 196-198 of the revised manuscript (see below).

      “Furthermore, we introduced the solution landscape method. Solution landscape is a pathway map consisting of all stationary points and their connections, which can describe different cell states and transfer paths of them [82-84].”

      In Figure 4E, we added a detailed explanation of the solution landscape for the AND-AND motif. Specifically, it describes a hierarchical structure including one 2-saddle (yellow triangle), three 1-saddles (crimson X-cross sign), and three attractors (green dot). The layer of 1-saddles is represented by a blue translucent plane, and the bottom layer is the flow field diagram. The connections from the 2-saddle to the 1-saddles and from the 1-saddles to the attractors are represented by red and blue lines, respectively. The arrow and color of the heatmap correspond to the flow direction and the length of the acceleration at each point in the state space.

      (8) Table S1 is not properly annotated, and then it is impossible to interpret how it supports the observations in the paragraph in lines 342-342.

      We appreciate the reviewers’ useful feedback. We have refined the annotations of all tables in our manuscript (Table S1-3). Please refer to “Supplementary Table” in resubmitted files.

      Specifically, we randomly collected 6,231 sets of parameters for the AND-AND motif and 6,682 sets for the OR-OR motif (k1-k6 in Eq1 and Eq2; refer to Page 6 of the revised supplementary method, see below).

      “First, to collect parameter sets with 3 SSSs, we used Latin hypercube sampling (LHS) to screen k-series parameters symmetrically (i.e., k1 = k4, k2 = k5, k3 = k6) ranging from 0.001 to 5 both in the AND-AND and OR-OR motifs. We ultimately collected 6,231 sets for the AND-AND motif and 6,682 sets for the OR-OR motifs (Table S1).”
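
      The sampling step quoted above can be sketched as follows. This is a standard-library illustration of Latin hypercube sampling with the stated symmetry constraint, assuming sampling on a linear scale; the function names and seed are hypothetical, and the subsequent filtering for parameter sets with 3 SSSs is not shown.

```python
import random

K_LOW, K_HIGH = 0.001, 5.0   # sampling range stated in the supplementary method


def latin_hypercube(n_samples, n_dims, low, high, rng):
    """Draw n_samples points so that each dimension has one point per stratum."""
    width = (high - low) / n_samples
    columns = []
    for _ in range(n_dims):
        # One uniformly placed point inside each of the n_samples equal-width strata.
        points = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(points)   # decouple the strata across dimensions
        columns.append(points)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


def symmetric_k_sets(n_samples, seed=0):
    """Sample (k1, k2, k3) and mirror them so that k1 = k4, k2 = k5, k3 = k6."""
    rng = random.Random(seed)
    samples = latin_hypercube(n_samples, 3, K_LOW, K_HIGH, rng)
    return [(k1, k2, k3, k1, k2, k3) for k1, k2, k3 in samples]


if __name__ == "__main__":
    for k_set in symmetric_k_sets(5):
        print(k_set)
```

      Because of the symmetry constraint only three of the six k parameters are sampled independently, so the screen covers a three-dimensional space rather than a six-dimensional one.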

      To analyze the sequence of vanishing SSSs, we further selected the parameter sets for which 2 SSSs remained as ux increased (corresponding to Eq3 in the revised manuscript, Page 10, line 293). This yielded a collection of 6,207 sets for the AND-AND motif and 6,634 sets for the OR-OR motif. Based on these parameter settings, we checked whether the observations (refer to Page 13, lines 377-378, “The distinct sequences of attractor basin disappearance as ux increasing can be viewed as a trade-off between progression and accuracy.”) are artifacts of a particular parameter choice.

      (9) The flow in Section 5 needs to be reorganised. For instance, it is not clear which question the authors are addressing in line 395, or how the proposed approach answers the question stated in lines 381-382.

      We greatly thank the reviewers for pointing this out, and acknowledge that Section 5 was definitely in need of improvement. We have now amended the manuscript to make this implicit understanding explicit. Please refer to Page 15 of the revised manuscript, lines 426-430 (see below).

      “In prior sections, we systematically investigated two logic motifs under the noise- and signal-driven modes in silico. With various combinations of logic motifs and driving forces, features about fate-decision behaviors were characterized by computational models. Next, we questioned whether observations in computation can be mapped into real biological systems. And how to discern different logic motifs and driving modes is a prerequisite for answering this question.

      To this end, we first evaluated the performance of different models, specifically in simulating the process of stem cells differentiating towards LX (Fig6.A).”

      (10) There are two important weak points for the successful classification of the regulatory logic of real gene expression data as presented in the manuscript: (1) the small number of time-points in the datasets and clear peaks in gene expression heterogeneity cannot be identified, and (2) it is not always clear whether cell differentiation really exclusively relies on a CIS network, and which genes constitute it. These limitations should be solved or at least discussed in the manuscript.

      We thank the reviewer for this comment. First, we agree entirely that datasets with more time points would be more amenable to identifying trends in gene expression variation. We have made a concerted effort to search for such datasets, but unfortunately few are publicly available. Specifically, to apply our computational framework, the datasets of interest need to fulfill the following three criteria: (i) sampling at multiple time points (as many as possible); (ii) to illustrate and validate our findings clearly and representatively, the cell fate decisions in the biological system should follow the classical binary tree-like pattern, i.e., there is one stem cell (or progenitor) fate and two downstream cell fates in the system; (iii) the core GRN circuits orchestrating the fate-decision process have been experimentally confirmed (or at least clearly supported). We have also extended the discussion to include the above points and to explicitly note the limitations of the datasets used. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      Regarding the second point, we acknowledge that the CIS network may not be the core module in every fate-decision case (although, to our knowledge, this can be assumed in many cases, especially those with a binary tree-like pattern). For applicability and potential relevance to our intended readership, we developed the models and drew our conclusions primarily based on the CIS topology because it is representative. We intend to incorporate diverse topologies (such as mutual activation with self-activation, the feed-forward loop, etc.) into the computational framework presented here in the near future. Additionally, we have incorporated this point into the discussion in the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 766-769 (see below).

      “Notwithstanding the fact that the CIS network is prevalent in fate-decision programs, there are other topologies of networks that serve important roles in the cell-state transitions, like feed-forward loop, etc. The framework presented in this work should further incorporate diverse network motifs in the future.”

      As noted by the reviewers, even when the CIS network is assumed, we may not be sure which genes constitute it in some cases. We agree that extending our framework to mine key regulators is an interesting question. We also note that we are very enthusiastic about recent work showing how to nominate core factors from high-throughput data[8, 9]. Of note, in the last section of our manuscript, titled “The chemical-induced reprogramming of human erythroblasts (EBs) to induced megakaryocytes (iMKs) is the signal-driven fate decisions with an OR-OR-like motif”, we leveraged patterns of temporal expression variance to filter out key regulators (Fig7.F and H). We thus underline the potential of mining the genes that comprise core GRN circuits through expression variance. Nevertheless, as the focus of the present paper is the role of regulatory logic in cell fate decisions, we feel it is beyond the scope of the present article to develop our results further on this point. Instead, we have included a discussion of the case in which the genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (11) The models used in Figure S5 are never clearly described.

      We thank the reviewers for pointing this out. We have now introduced the settings of the models used in Figure S5 more clearly in the legend (see below).

      Two logic motifs with the noise-driven mode (FigS5.A, see below):

      Author response image 1.

      “Initial values were identical with attractor of S fate in Figure 2C (SSSs in green attractor basins). Simulation was performed 1000 times for each pseudo-time point, with each temporal state (from left to right) recorded as a dot on the plot. Top panel: Noise level of X (σx) is set to 0.21, and σy is 0.09. Bottom panel: Noise level of Y (σy) is set to 0.21, and σx is 0.09. Red arrow represents the direction of fate transitions of S to LX. Other than adding a white noise, parameters were identical with those in Figure 2C.”

      Two logic motifs with the signal-driven mode (FigS5.B, see below):

      Author response image 2.

      “Initial values were identical with attractor of S fate in Figure 2C (SSSs in green attractor basins). Top panel: Noise level of X (σx) and Y (σy) are both set to 0.06. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux switched from 0 to 0.09 (0, 0.045, 0.09, from left to right). Bottom panel: Noise level of X (σx) and Y (σy) are both set to 0.05. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux switched from 0 to 0.24 (0, 0.12, 0.24, from left to right). Red arrow represents the direction of fate transitions of S to LX. Other model’s parameters were identical with those in Figure 2C.”

      (12) Up until Section 5, "noise levels" have been used to refer to an input/parameter in the model. Here it is assumed as an emergent property. Are the authors talking about the variance in expression (e.g., see line 398)? Is it defined as the coefficient of variation? Clarity is essential to interpret the observations in this section, e.g., "different driving modes change in the patterns of noise rather than expression levels" (lines 399-400).

      We greatly appreciate the reviewers pointing out this ambiguity. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. To classify different logic motifs under the two driving forces, we needed a practical metric that can be quantified from data, and we found that population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, is useful. For clarity, we have decided to substitute “expression variance” for “noise level” in Sections 5-6. We have amended the manuscript accordingly, and hope this revision will be helpful for interpreting our results. Please refer to Page 15 of the revised manuscript.
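      As an illustration of the metric named above, the following is a minimal sketch of a per-time-point coefficient of variation (standard deviation divided by mean across cells). The function name, the array layout (rows are time points, columns are cells) and the toy values are assumptions made for this example; they are not taken from the manuscript or its datasets.

```python
import numpy as np


def coefficient_of_variation(expression):
    """Return std/mean across cells for each time point.

    `expression` is assumed to be a 2-D array with one row per time point
    and one column per cell (a hypothetical layout chosen for this sketch).
    """
    expression = np.asarray(expression, dtype=float)
    mean = expression.mean(axis=1)
    # ddof=1 gives the sample standard deviation across cells.
    std = expression.std(axis=1, ddof=1)
    return std / mean


# Hypothetical toy data: 3 time points x 4 cells, all with mean 10.
toy_expression = np.array([
    [10.0, 10.0, 10.0, 10.0],  # no cell-to-cell variability
    [8.0, 12.0, 9.0, 11.0],    # moderate variability
    [2.0, 18.0, 5.0, 15.0],    # high variability
])
cv = coefficient_of_variation(toy_expression)
```

      Because the coefficient of variation is normalized by the mean, it separates changes in variability from changes in expression level, which is the distinction the reviewers asked to be made explicit.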

      (13) "Pulse-like behaviour" is used in an arbitrary way, not as it is normally used in the field. Moreover, we consider this jargon expression does not contribute to the understanding of the paper. (The authors probably meant "discrete transitions" vs "gradual transitions".)

      We appreciate the reviewers’ valuable feedback regarding our use of the term “Pulse-like behavior”. We agree with the reviewers’ statement, and acknowledge that the terminology describing the patterns of noise level under the different driving modes (noise-driven vs signal-driven; refer to Section 5 in our manuscript) was in need of improvement.

      Upon comprehensive consideration, we decided to adopt the terms “monotonic transitions” and “nonmonotonic transitions” to describe the trends of noise level, highlighting more clearly the contrast between the distinct temporal noise patterns that the two driving forces bring about in cell fate decisions. We anticipate that the current terminology will be beneficial for interpreting our work. Please refer to Page 15 of the revised manuscript.
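      To make the two terms concrete, here is a small sketch that labels a time course as monotonic or nonmonotonic from the signs of its successive differences. The function name, the tolerance parameter and the example series are hypothetical; which driving mode yields which label is a result of the manuscript and is not encoded here.

```python
import numpy as np


def classify_trend(values, tol=0.0):
    """Label a time course as 'monotonic' or 'nonmonotonic'.

    A series counts as monotonic if it never decreases or never increases
    by more than `tol` between consecutive time points. `tol` is an assumed
    knob for ignoring small fluctuations.
    """
    diffs = np.diff(np.asarray(values, dtype=float))
    non_decreasing = np.all(diffs >= -tol)
    non_increasing = np.all(diffs <= tol)
    return "monotonic" if (non_decreasing or non_increasing) else "nonmonotonic"


# Hypothetical expression-variance time courses.
rising = [0.10, 0.20, 0.30, 0.40]
peaked = [0.10, 0.40, 0.20, 0.10]
```

      With only a few time points, as in the datasets discussed in this response, such a label is sensitive to sampling: a peak that falls between two sampled time points would be reported as monotonic.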

      (14) The temporal resolution of the scRNAseq datasets that the authors used is too low to unambiguously distinguish a discrete pattern of gene expression heterogeneity from a rising profile. This limitation needs to be at least acknowledged in the text. Alternatively, the authors might want to identify more recent datasets with higher time resolution.

      We appreciate the reviewers’ insightful suggestions. We agree that datasets with higher time resolution would allow the trends of gene expression variation to be identified less ambiguously. We made a concerted effort to search for such datasets, but unfortunately few are publicly available. Specifically, to apply our computational framework, a dataset needs to fulfill the following three criteria: (i) sampling at multiple time points (as many as possible); (ii) to illustrate and validate our findings clearly and representatively, the cell fate decisions in the biological system should follow the classical binary tree-like pattern, i.e., there is one stem cell (or progenitor) fate and two downstream cell fates; (iii) the core GRN circuit orchestrating the fate-decision process has been experimentally confirmed (or at least clearly supported). Nevertheless, we recognize that this limitation should be mentioned in the paper, so we have extended the discussion to include the above points. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      (15) In the case of embryonic stem cell differentiation, an additional complication is that this protocol yields heterogeneous cell type mixtures, whereas the authors' simulations usually are designed to give differentiation towards a single cell type. This difference makes it difficult to compare measures of gene expression heterogeneity between simulations and the experimental system, which makes the inference of regulatory logic questionable.

      We thank the reviewers for this valuable comment and realize that we were not clear enough in the manuscript regarding the case of embryogenesis. In the biological system devised by Semrau et al[10], mouse embryonic stem cells (mESCs) differentiate into two lineages simultaneously, just as mentioned by the reviewers. We noticed this additional complication and performed additional simulations of the two logic motifs with increasing noise levels of genes X and Y, as presented in Fig.S6E (see below).

      Author response image 3.

      “(E) Time courses on the coefficient of variation in expression levels of X and Y genes in silico during differentiation under the noise-driven mode. Initial values were set to the attractors of S fate in Figure 2C (SSSs in green attractor basins). Top panel: Noise level of X (σx) and Y (σy) are both set to 0.14. Bottom panel: Noise level of X (σx) and Y (σy) are both set to 0.1. Stochastic simulation was performed 1000 times for each pseudo-time point.”

      Given the noise-driven mode, we further employed the expression pattern of Gbx2-Tbx3 circuit to heuristically infer the logic motif.

      (16) In contrast to the hematopoiesis example, the authors do not focus on a specific gene regulatory circuit with the ESC dataset. How their approach is possible on genome-wide data needs to be discussed.

      We thank the reviewers for this comment. Indeed, the core GRN orchestrating the fate-decision process reported by Semrau et al[10] is not fully elucidated. We here focus on the Gbx2-Tbx3 circuit (Fig.6H, Fig.S6D). These two TFs were filtered out from 22 candidate TFs and suggested as potential key regulators in the original paper[10]. Accordingly, at this point we followed the original paper’s statement.

      Regarding the extension to biological systems without specific gene regulatory circuits, we have included a discussion of the possibility that the genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (17) [In supplemental material, pp.1] Possible typo: "In our word, we considered a GRN comprised...".

      Thanks for spotting this typo. We have amended it in the revised supplemental method (refer to Page 1 of the revised supplementary method).

      (18) [In supplemental material, pp.1] In Eqs. (1), the notation for the function HX([X]) implies that HX only depends on X, leaving the combinatorial regulation out. HX([X],[Y]) would be more general and accurate.

      Thanks for pointing this out. We have incorporated this suggestion into the manuscript. Please refer to Page 1 of the revised supplementary method.

      (19) [In supplemental material, pp.1] There are several works that have shown that the Hill coefficient is rarely representative of the number of binding elements. The model can be more general. See, for example, «Santillán, Moisés. "On the Use of the Hill Functions in Mathematical Models of Gene Regulatory Networks." Mathematical Modelling of Natural Phenomena 3, no. 2 (October 22, 2008): 85-97. https://doi.org/10.1051/mmnp:2008056.» and «Nam, Kee-Myoung, Rosa Martinez-Corral, and Jeremy Gunawardena. "The Linear Framework: Using Graph Theory to Reveal the Algebra and Thermodynamics of Biomolecular Systems." Interface Focus 12, no. 4 (June 10, 2022): 20220013. https://doi.org/10.1098/rsfs.2022.0013.»;

      We thank the reviewer for drawing our attention to this and highlighting the above works. Indeed, this is important information to include in the manuscript. We have incorporated this suggestion into the revised supplemental method (refer to Page 1 of the revised supplementary method). These references have now been included in the revised supplemental method (refer to references [2]-[3]).

      (20) [Minor] The configuration labels can be confusing, especially the AA, which is rather an AND NOT gate.

      We thank the reviewers for this comment. For clarity, we have substituted AND-AND/OR-OR for original expression of AA/OO, and hope that new notations are helpful for interpreting our work.

      (21) [Minor] Very low printing quality in Figure 1.

      Thanks for the feedback regarding the printing quality of Figure 1. We have made the necessary adjustments to improve its quality. We have also ensured that all other figures in the manuscript meet the required standards.

      (22) [Minor] We suggest including a quantitative scale for the bias in Fig. 3E.

      Thanks, we have incorporated this suggestion into the manuscript.

      (23) [Recommendation] Authors could also evaluate the cell fate decision processes as mutations or other perturbations affect a regulatory network.

      We thank the reviewers for this valuable recommendation. We agree with the reviewers that involving new cases would be helpful, especially mutation-driven, disease-related fate-decision processes, such as neutropenia in chemotherapy. However, given the considerable effort required to search for appropriate datasets, we have decided not to make this change.

      (24) [Recommendation] The authors could include some discussion of the likely impact of the work on the field and the utility of the methods and data to the community. For example, understanding the fluidity of the epigenetic landscape and the regulatory forces behind cell fate decisions can be of great importance in designing synthetic gene regulatory circuits.

      We greatly appreciate the reviewers pointing this out. In the original manuscript, we intentionally limited the length of the discussion to make the whole story more focused. We thank the reviewers for their insightful suggestions regarding the content of the discussion. We have incorporated this suggestion into the revised manuscript. Please refer to Page 25, lines 751-757 (see below).

      “Recently, synthetic biology has realized the insertion of the CIS network in mammalian cells. One of the prerequisites for recapitulating the complex dynamics of fate transitions in synthetic biology is systematical understanding of the role of GRNs and driving forces in differentiation. And the logic motifs are the essential and indispensable elements in GRNs. Our work also provides a blueprint for designing logic motifs with particular functions. We are also interested in validating the conclusions drawn from our models in a synthetic biology system.”

      In addition, a longstanding question of interest to us in cell fate decisions is what contributes to the distinctive development across species, such as human and mouse. In addition to protein-coding sequences, regulatory interactions between genes (i.e., activation and inhibition) also exhibit conservation, as reported in recent work on a multi-species cell atlas [11], and it is generally acknowledged that gene regulatory networks (GRNs) orchestrate fate-decision procedures. Namely, conserved regulatory programs imply a conserved topology of core GRNs. Thus, the logic of regulation, as another vital element of GRNs, naturally comes under the spotlight (related to the introduction, lines 99-120 of the revised manuscript). Nevertheless, to our knowledge, regulatory logic in cell fate decisions has received only scant attention. We hope that our elucidation of the role of logic motifs in cell fate decisions will attract more inquiries in the community into the regulatory logic of GRNs.

      Public reviews

      In this manuscript, Xue and colleagues investigate the fundamental aspects of cellular fate decisions and differentiation, focusing on the dynamic behaviour of gene regulatory networks. The manuscript explores the debate between static (noise-driven) and dynamic (signal-driven) perspectives within Waddington's epigenetic landscape, highlighting the essential role of gene regulatory networks in this process. The authors propose an integrated analysis of fate-decision modes and gene regulatory networks, using the Cross-Inhibition with Self-activation (CIS) network as a model. Through mathematical modelling, they differentiate two logic modes and their effect on cell fate decisions: one in which transcription requires both the presence of an activator and the absence of a repressor (AA configuration), and one in which transcription occurs as long as the repressor is not the only species on the promoter (OO configuration).
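      The two configurations described above can be written as Boolean truth tables. The sketch below follows the wording of this summary only: the function names are illustrative, and reading the OO rule as "activator present OR repressor absent" (so that an empty promoter is also active) is an interpretation of that wording, not a statement of the authors' model.

```python
from itertools import product


def and_logic(activator, repressor):
    """AA-style rule: transcription needs the activator AND no repressor."""
    return activator and not repressor


def or_logic(activator, repressor):
    """OO-style rule: transcription unless the repressor is bound alone."""
    return activator or not repressor


# Truth table over all promoter occupancy states (activator, repressor).
truth_table = {
    (a, r): (and_logic(a, r), or_logic(a, r))
    for a, r in product([False, True], repeat=2)
}

for (a, r), (aa, oo) in truth_table.items():
    print(f"activator={a!s:5} repressor={r!s:5} AND={aa!s:5} OR={oo!s:5}")
```

      Under this reading the two rules disagree in exactly two occupancy states, the empty promoter and the promoter bound by both factors, and agree when only one factor is bound.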

      The authors establish a relationship between noise profiles, logic-motifs, and fate-decision modes, showing that defining any two of these properties allows the inference of the third. They also identify, under the signal-driven mode, two fundamental patterns of cell fate decisions: either prioritising progression or accuracy in the differentiation process. The authors apply this analysis to available high-throughput datasets of cell fate decisions in hematopoiesis and embryogenesis, proposing the underlying driving force in each case and utilising the observed noise patterns to nominate key regulators.

      The paper makes a substantial contribution by rigorously evaluating assumptions in gene regulatory network modelling. Notably, it extensively compares two model configurations based on different integration logic, illuminating the consequences of these assumptions in a clear, understandable manner. The practical simulation results effectively bridge theoretical models with real biological systems, adding relevance to the study's insights. With its potential to enhance our understanding of gene regulatory networks across biological processes, the paper holds promise. Its implications extend practically to synthetic circuit design, impacting biotechnology. The conclusions stand out, addressing cell fate decisions and noise's role in gene networks, contributing significantly to our understanding. Moreover, the adaptable approach proposed offers versatility for broader applications in diverse scenarios, solidifying its relevance beyond its current scope.

      We thank the reviewers for their enthusiasm for our work, and appreciate the professional, insightful and encouraging assessment.

      However, the manuscript in its current form also has some important weaknesses, including the lack of clarity in the text and the questionable generality of specific observations.

      We thank the reviewers for this comment. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We do hope that this revision meets the reviewers’ expectations on the clarity and comprehensiveness of our analysis.

      For instance, even when focusing on the CIS network, the effect of alternative model implementations is not discussed. Notably, the input signals are only considered as an additive effect over the differential equations, while signals can potentially affect each of the individual processes.

      We agree with the reviewers’ comment that signals may act at each level of the central dogma, including transcription, translation, etc. We have also included an additional section titled “limitation of this study” on this point in the revised manuscript, which explicitly points to the potential limitations of our models. Please refer to Page 25 of the revised manuscript, lines 769-771 (see below).

      “In addition, for simplicity and intuition, we here considered signals as uncoupled and additive effects in ODE models, due to feasible mapping in real biological systems, such as ectopic overexpression.”

      The proposed model allows for a continuum of interactions/competition between transcription factors, yet only very restrictive scenarios are explored (strict AND/OR logic operations).

      We thank the reviewers for this comment, and appreciate them sharing the potential for further generalization of our framework. Indeed, in addition to logic operations, our framework can be applied to all two-node circuits (3^4 = 81 in total), including mutual activation with self-activation. As the focus of this work is to illustrate the role of logic motifs in cell fate decisions, we mainly concentrated on two classical, intuitive and representative (at least to us) logic operations, AND and OR, in the context of the CIS network. Even so, we already have four combinations to consider (two logic motifs and two driving forces), and we feel that the scenarios currently included are sufficient to demonstrate the role of logic motifs. Hence, we decided not to explore more logic operations in this work. Instead, we have included an additional section titled “limitation of this study” in the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 760-762.

      “Although our framework enables the investigation of more logic motifs, we chose two classical and symmetrical logic combinations for our analysis. Future work should involve more logic gates like XOR and explore asymmetrical logic motifs like AND-OR.”
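      The count of 81 two-node circuits can be reproduced by enumeration, assuming that each of the four possible regulatory edges between X and Y (two self-regulations and two cross-regulations) takes one of three states: activation, inhibition, or absent. That three-state reading and all names in the sketch are assumptions made for illustration.

```python
from itertools import product

# Four directed edges in a two-node circuit (source -> target).
EDGES = ("X->X", "X->Y", "Y->X", "Y->Y")
# Assumed three states per edge, which gives 3**4 topologies.
STATES = ("activation", "inhibition", "none")

circuits = [dict(zip(EDGES, states)) for states in product(STATES, repeat=len(EDGES))]

# Cross-Inhibition with Self-activation (CIS), the topology studied here.
cis = {"X->X": "activation", "X->Y": "inhibition",
       "Y->X": "inhibition", "Y->Y": "activation"}

# Mutual activation with self-activation, mentioned as a possible extension.
mutual_activation = {edge: "activation" for edge in EDGES}

print(len(circuits))  # number of enumerated topologies
```

      Both the CIS network and mutual activation with self-activation appear in this list; the enumeration covers topology only and is independent of the logic (AND, OR, or other) used to combine the inputs at each node.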

      Moreover, how the model parameters are chosen throughout the paper is not clear. Similarly, the concentration and times are not clearly specified, making their comparison to experimental data troublesome.

      We thank the reviewers for this comment. Regarding how to specify parameters in our model, we have now revised the manuscript. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). In terms of concentration and time, we acknowledge that their units are arbitrary compared to a real experimental system. We have now noted this point in the legends of the corresponding figures (Fig2.C, Fig3.B&D, Fig6.B-C, Fig7.E).

      We would like to highlight that our entire work is organized in a model-driven (also called top-down) fashion. We did not fine-tune the parameter sets used in our model to specifically match the experimental data. Indeed, this is a longstanding challenge in computational biology, since experimental datasets are usually insufficient to specify the parameters of a dynamical model. So, in general, it is inevitable that more assumptions, such as a non-Markov process[12, 13], are involved, which may lead to artifacts. Thus, we decided to draw qualitative conclusions (e.g., trends over time) from a quantitative model with sampling of parameter sets. Hence, we did not intentionally tailor our models to fit different datasets (i.e., all models used in our work share the same basic parameter settings), and we mapped them onto real biological systems in a top-down manner.

      Regarding clarity, how the general model (equations 1-2) transforms into the specific cases evaluated in the paper is not clearly stated in the main text, nor are the positive and negative effects of individual transcription factors adequately explained. Similarly, in the main text and Figure 2, the authors refer to a Boolean model. However, they do not clearly explain how this relates to the differential equation model, nor its relevance to understanding the paper.

      We thank the reviewers for this comment, as it has prompted us to better clarify our manuscript. We have adjusted the manuscript accordingly and made the necessary adjustments to improve its clarity.

      Additionally, the term "noise levels" is generally used to refer to noise introduced in the "noise-driven" analysis (i.e., as an input or parameter in the models). Nonetheless, it is later claimed to be evaluated as an intrinsic property of the network (likely referring to expression level variability measured by the coefficient of variation).

      We greatly appreciate the reviewers pointing out this ambiguity. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. To classify different logic motifs under the two driving forces, we needed a practical metric that can be quantified from data, and we found that population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, is useful.

      For clarity, we have decided to substitute “expression variance” for “noise level” in Sections 5-6. We have amended the manuscript accordingly.

      Finally, some jargon is introduced without sufficient context about its meaning (e.g., "temporal fully-connected stage").

      Regarding the term "temporal fully-connected stage", we have realized that it was slightly vague and in need of improvement. Instead, we now employ “transitory fully-connected stage” in the revised manuscript to underline the brief emergence of this particular stage. Please refer to Pages 10-11 of the revised manuscript, lines 316-327 (see below).

      “Notably, in the AND-AND motif we observed a brief intermediated stage before S attractor disappears, where all three fates are directly interconnected (Fig4.C 2nd panel and D 2nd panel, Fig.4E). To manifest the generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediated stage can be observed for 82.7% of them (see Methods; Table S1), indicating little dependence on particular parameter setting (1.8% in the OR-OR motif). Unlike the indirect attractor adjacency structure mediated by S attractor (Fig2.D), the solution landscape with fully-connected structure facilitates transitions between any two pairs of fates. Furthermore, this transitory fully-connected stage locates between the fate-undetermined stage (Fig4.C top panel) and fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before the lineage commitment in experimental observations [5-7]. Therefore, we suspected that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      Additionally, proper discussion of previous work is also missing. For instance, the dynamics of the CIS network investigated by the authors have been extensively characterised (see e.g., Huang et al., Dev Biol, 2007), and how the author's results compare to this previous work should be discussed. In particular, the central assumptions behind the derivation of the model proposed in the manuscript must be assessed in the context of previous work.

      Thanks for pointing this out. We have extended the discussion to include above points. We have also discussed and cited the work of Huang mentioned above. Please refer to Page 22, lines 644-647 in the revised manuscript (see below).

      “One of the most representative works is that of Huang et al. [14], who modeled the bifurcation in hematopoiesis to reveal the lineage commitment quantitatively. Compared to simply modularizing activation or inhibition effect by employing Hill function in previous work, our models reconsidered the multiple regulations from the level of TF-CRE binding.”

      References

      (1) Ackers, G.K., A.D. Johnson, and M.A. Shea, Quantitative model for gene regulation by lambda phage repressor. Proc Natl Acad Sci U S A, 1982. 79(4): p. 1129.

      (2) Shea, M.A. and G.K. Ackers, The OR control system of bacteriophage lambda: A physical-chemical model for gene regulation. Journal of Molecular Biology, 1985. 181(2): p. 211-230.

      (3) Hunziker, A., et al., Genetic flexibility of regulatory networks. Proc Natl Acad Sci U S A, 2010. 107(29): p. 12998-3003.

      (4) Kittisopikul, M. and G.M. Suel, Biological role of noise encoded in a genetic network motif. Proc Natl Acad Sci U S A, 2010. 107(30): p. 13300-5.

      (5) Brand, M. and E. Morrissey, Single-cell fate decisions of bipotential hematopoietic progenitors. Curr Opin Hematol, 2020. 27(4): p. 232-240.

      (6) Zhang, Y., et al., Hematopoietic Hierarchy - An Updated Roadmap. Trends Cell Biol, 2018. 28(12): p. 976-986.

      (7) Arinobu, Y., et al., Reciprocal activation of GATA-1 and PU.1 marks initial specification of hematopoietic stem cells into myeloerythroid and myelolymphoid lineages. Cell Stem Cell, 2007. 1(4): p. 416-27.

      (8) Kamimoto, K., et al., Dissecting cell identity via network inference and in silico gene perturbation. Nature, 2023. 614(7949): p. 742-751.

      (9) Hammelman, J., et al., Ranking reprogramming factors for cell differentiation. Nat Methods, 2022. 19(7): p. 812-822.

      (10) Semrau, S., et al., Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat Commun, 2017. 8(1): p. 1096.

      (11) Li, J., et al., Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nature Genetics, 2022. 54(11): p. 1711-1720.

      (12) Stumpf, P.S., F. Arai, and B.D. MacArthur, Modeling Stem Cell Fates using Non-Markov Processes. Cell Stem Cell, 2021. 28(2): p. 187-190.

      (13) Stumpf, P.S., et al., Stem Cell Differentiation as a Non-Markov Stochastic Process. Cell Syst, 2017. 5(3): p. 268-282 e7.

      (14) Huang, S., et al., Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev Biol, 2007. 305(2): p. 695-713.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses molecular dynamics simulations to understand how forces felt by the intracellular domain are coupled to the opening of the mechanosensitive ion channel NOMPC. The concept is interesting - as the only clearly defined example of an ion channel that opens due to forces on a tethered domain, the mechanism by which this occurs is yet to be fully elucidated. The main finding is that twisting of the transmembrane portion of the protein - specifically via the TRP domain that is conserved within the broad family of channels - is required to open the pore. That this could be a common mechanism utilised by a wide range of channels in the family, not just mechanically gated ones, makes the result significant. It is intriguing to consider how different activating stimuli can produce a similar activating motion within this family. However, the support for the finding can be strengthened as the authors cannot yet exclude that other forces could open the channel if given longer or at different magnitudes. In addition, they do not see the full opening of the channel, only an initial dilation. Even if we accept that twist is essential for this, it may be that it is not sufficient for full opening, and other stimuli are required.

      Strengths:

      Demonstrating that rotation of the TRP domain is the essential requirement for channel opening would have significant implications for other members of this channel family.

      Thank you for your positive summary and comments.

      Weaknesses:

      The manuscript centres around 3 main computational experiments. In the first, a compression force is applied on a truncated intracellular domain and it is shown that this creates both a membrane normal (compression) and membrane parallel (twisting) force on the TRP domain. This is a point that was demonstrated in the authors’ prior eLife paper - so the point here is to quantify these forces for the second experiment.

      The second experiment is the most important in the manuscript. In this, forces are applied directly to two residues on the TRP domain with either a membrane normal (compression) or membrane parallel (twisting) direction, with the magnitude and directions chosen to match that found in the first experiment. Only the twisting force is seen to widen the pore in the triplicate simulations, suggesting that twisting, but not compression can open the pore. This result is intriguing and there appears to be a significant difference between the dilation of pore with the two force directions.

      However, there are two caveats to this conclusion. Firstly, is the magnitude of the forces - the twist force is larger than the applied normal force to match the result of experiment 1. However, it is possible that compression could also open the pore at the same magnitude or if given longer. It may be that twist acts faster or more easily, but I feel it is not yet possible to say it is the key and exclude the possibility that compression could do something similar.

      Thank you for your insightful comment. As you pointed out, the membrane-normal pushing forces exerted at residues E1571 and R1581 are approximately one-third and two-thirds, respectively, of the membrane-parallel twisting forces. These magnitudes were derived from a previous simulation (Wang et al., 2021), in which we decomposed the resultant force into its membrane-parallel and membrane-normal components upon applying a compressive force to the intracellular AR end. Our results indicated that, upon reaching the TRP helix, the induced twisting force is indeed greater, which partially reflects actual physiological conditions. Therefore, considering the magnitudes of the resultant forces alone, the twisting force is predominantly greater than the pushing force when the AR domain is subjected to compression.
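      The decomposition described above can be sketched as follows. This is a minimal illustration only, assuming the membrane normal lies along the z axis; the function names and the example force vector are hypothetical and are not taken from the simulations.

```python
import math

def decompose_force(force, normal=(0.0, 0.0, 1.0)):
    """Split a force vector into membrane-normal and membrane-parallel parts.

    `normal` is the assumed membrane normal (z axis here, an illustrative choice).
    """
    norm = math.sqrt(sum(c * c for c in normal))
    unit = [c / norm for c in normal]
    # Scalar projection of the force onto the membrane normal.
    along = sum(f * u for f, u in zip(force, unit))
    f_normal = [along * u for u in unit]
    # Whatever remains lies in the membrane plane.
    f_parallel = [f - fn for f, fn in zip(force, f_normal)]
    return f_normal, f_parallel

def magnitude(vec):
    return math.sqrt(sum(c * c for c in vec))

# Hypothetical resultant force on one residue (arbitrary units).
f_normal, f_parallel = decompose_force([3.0, 4.0, 2.0])
ratio = magnitude(f_normal) / magnitude(f_parallel)
```

      For this made-up vector the membrane-normal component is smaller than the in-plane component; the ratio is the kind of quantity compared in the paragraph above.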

      Then the question became, if forces of the same magnitude are applied in either the membrane-normal or membrane-parallel directions, what would the outcome be? To address this, we conducted additional simulations. Considering the situations discussed above, we applied a smaller membrane-parallel force instead of a larger membrane-normal force that may disrupt the integrity of the protein and membrane structure. As shown in the new Figure S6, we adjusted the applied membrane-parallel force to either half or one-third of the original value. When we applied half of the force used in the original setup, the channel opened in two out of three trajectories. When applying one-third of the force, the channel opened in one out of three trajectories. Together with our previous results, these findings suggest that if forces of equal magnitude are applied in the membrane-normal and membrane-parallel directions, the membrane-parallel force has a higher probability of inducing channel opening.

      Still, one cannot completely exclude the possibility that the pushing force on the TRP helix can open the channel if given a very long time. This becomes unfeasible to examine with MD simulations, so we investigated the likely conformational changes of multiple TRP family proteins upon opening, and found that the TRP rotation is a universal conformational change, while the TRP tilt is much less consistent (Figure 6). These findings give us more confidence that the twist force plays a more crucial role in channel gating than the pushing force. We have added a new table (Table 1) and a new figure (Figure 6) to present this analysis.

      In addition, we did not intend to imply that compression is incapable of contributing to channel opening. In fact, our aim was to highlight that compression can generate both a twisting force and a pushing force, with the twisting force appearing to be the more critical component for facilitating channel opening. We concur that we cannot completely dismiss the possibility that the pushing component may also assist in channel opening. Consequently, we have revised our discussion on pages 4 and 6 to enhance clarity.

      I also note that when force was applied to the AR domain in experiment 1, the pore widened more quickly than with the twisting force alone, suggesting that compression is doing something to assist with opening.

      You are correct that the trajectory corresponding to Experiment 1 (Figure S1(b)) indicates pore opening around 300-400 ns, while the trajectory for Experiment 2 (800 ns) shows pore opening around 600 ns. This observation may suggest that the pore opens more rapidly in Experiment 1, assuming that the simulation conditions were identical for both experiments. However, it is important to note that in Experiment 1, an external force was applied to AR29. In contrast, in Experiment 2, the force was applied exclusively to two selected residues on the TRP domain, while other TRP residues also experienced mechanical forces, albeit to a lesser extent. The differing methods of force application in the two experiments complicate the comparison of pore opening speeds under these conditions.

      We acknowledge that the compression of the AR spring can facilitate pore opening. This compression generates both a twisting component and a pushing component on the TRP domain. Our simulations and structural analyses of multiple TRP channels suggest that the twisting component plays a predominant role in gating. However, we cannot entirely rule out the possibility that the pushing component may also contribute to this process. We have carefully revised our Results (page 6), Discussion (pages 10–12) and Methods (pages 14–17) sections to enhance clarity.

      Given that the forces are likely to be smaller in physiological conditions it could still be critical to have both twist and compression present. As this is the central aspect of the study, I believe that examining how the channel responds to different force magnitudes could strengthen the conclusions and recommend additional simulations be done to examine this.

      Thank you for your valuable comments. We agree that the force applied in Experiment 2 may be larger than the forces present under physiological conditions. Therefore, we performed additional simulations to investigate the possibility of opening the pore using smaller torsional forces.

      As shown in the new Figure S6, we applied half and one-third of the original force and performed three replicate simulations for each condition. With half the force, the pore opened in two out of the three simulations. And with one-third of the applied force, the pore opened in one out of the three replicate simulations. The probability of pore opening within the same simulation time decreased as the applied force was reduced, consistent with our expectations. These new results are provided as supplementary figures (Figure S6) in the revised manuscript.

      We anticipate that further reductions in the forces will result in additional delays in the opening process; however, this would lead to prohibitive computational costs. Consequently, we have decided to conclude our analysis at this stage and have discussed this matter on page 6 of the revised manuscript.

      The second important consideration is that the study never sees a full pore opening, but rather a widening that is less than that seen in open state structures of other TRP channels and insufficient for rapid ion currents. This is something the authors acknowledge in their prior manuscript in eLife 2021. Although this may simply be due to the limited timescale of the simulations, it needs to be clearly stated as a caveat to the conclusions. Twist may be the key to getting this dilation, but we do not know if it is the key to full pore opening. To demonstrate that the observed dilation is a first step in the opening of pores, a structural comparison to open-state TRP channels would be beneficial in providing evidence that this motion is along the expected pathway of channel gating.

      We are grateful for this insightful comment. We acknowledge that our simulations do not capture a fully open state, but rather a dilation that is smaller than the open-state structures of other TRP channels. In our simulations, a pore radius exceeding 2 Å is considered as a partially open state, as this is generally sufficient for the permeation of water molecules or even small cations such as K<sup>+</sup> and Na<sup>+</sup>. However, the passage of larger molecules and ions, such as Ca<sup>2+</sup> and clusters of hydrated ions, remains challenging. As you noted, this partial opening may be attributed to the limited timescale of the simulations.

      Furthermore, in accordance with your suggestion, we analyzed numerous TRP proteins for which multiple open or intermediate states have been resolved, and we have included a new figure (Figure 6). A clockwise rotation of the TRP domain is observed in the majority of these proteins upon gating. For instance, in the case of RnTRPV1, our analysis revealed that during TRPV1 activation, when different ligands are bound (RTX, DkTX), the pore undergoes gradual dilation, which involves a progressive clockwise rotation of the TRP domain. This analysis provides evidence that the observed motion aligns with expected gating transitions, supporting the notion that twist-induced TRP rotation and pore dilation may represent an initial step in the pore opening process.

      Nonetheless, we concur that further studies, including extended simulations, which are currently unfeasible, or experimental validation, will be necessary to ascertain whether our proposed mechanism is adequate for the complete opening of the pore. We have carefully discussed this on pages 10–12.

      Experiment three considers the intracellular domain and determines the link between compression and twisting of the intracellular AR domain. In this case, the end of the domain is twisted and it is shown that the domain compresses, the converse to the similar study previously done by the authors in which compression of the domain was shown to generate torque. While some additional analysis is provided on the inter-residue links that help generate this, this is less significant than the critical second experiment.

      Although experiment three is less significant in revealing the underlying gating mechanism, it provides quantitative measurements of the mechanical properties of the intriguing AR spring structure, which are currently challenging to obtain experimentally. These provide computational predictions for future experiments to validate.

      Reviewer #2 (Public review):

      This study uses all-atom MD simulation to explore the mechanics of channel opening for the NOMPC mechanosensitive channel. Previously the authors used MD to show that external forces directed along the long axis of the protein (normal to the membrane) result in AR domain compression and channel opening. This force causes two changes to the key TRP domains adjacent to the channel gate: 1) a compressive force pushes the TRP domain along the membrane normal, while 2) a twisting torque induces a clock-wise rotation on the TRP domain helix when viewing the bottom of the channel from the cytoplasm. Here, the authors wanted to understand which of those two changes is responsible for increasing the inner pore radius, and they show that it is the torque. The simulations in Figure 2 probe this question with different forces, and we can see the pore open with parallel forces in the membrane, but not with the membrane-normal forces. I believe this result as it is reproducible, the timescales are reaching 1 microsecond, and the gate is clearly increasing diameter to about 4 Å. This seems to be the most important finding in the paper, but the impact is limited since the authors already show how forces lead to channel opening, and this is further teasing apart the forces and motions that are actually the ones that cause the opening.

      Thank you for your insightful comments. We appreciate your recognition of our key finding that torque is responsible for increasing the inner pore radius. Indeed, our simulations illustrated in Figure 2 systematically explore the effects of different forces on pore opening. These results demonstrate that membrane-parallel forces are effective, while membrane-normal forces are not within the simulation time. We acknowledge that this study builds upon previous findings regarding force-induced channel opening. However, we believe that further decomposition of the specific forces and motions responsible for this process provides valuable mechanistic insights. By distinguishing the role of torque from the membrane-normal forces of the TRP helix, which is highly conserved across the TRP channel family, our work contributes to a more precise understanding of TRP channel gating. Moreover, in the revised manuscript, we conducted a systematic analysis of the structures of TRP family proteins and discovered that the clockwise rotation of the TRP domain is likely a universal gating mechanism among the TRP family, which significantly enhances and strengthens our original findings (Figure 6).

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Duan and Song interrogates the gating mechanisms and specifically force transmission in mechanosensitive NOMPC channels using steered molecular dynamics simulations. They propose that the ankyrin spring can transmit force to the gate through torsional forces adding molecular detail to the force transduction pathways in this channel.

      Strengths:

      Detailed, rigorous simulations coupled with a novel model for force transduction.

      Thank you for your positive comments.

      Weaknesses:

      Experimental validation of reduced mechanosensitivity through mutagenesis of proposed ankyrin/TRP domain coupling interactions would greatly enhance the manuscript. I have some additional questions documented below:

      We attempted to measure the mechanical properties of the AR domain and conduct mutagenesis experiments in collaboration with Prof. Jie Yan’s laboratory at the Mechanobiology Institute, National University of Singapore; however, this proved to be a significant challenge at this time. Given the urgency of the publication, we have decided to first publish the computational results and reserve further experimental studies for future investigations.

      (1) The membrane-parallel torsion force can open NOMPC

      How does the TRP domain interact with the S4-S5 linker? In the original structural studies, the coordination of lipids in this region seems important for gating. In this manner, do the TRP domain and S4-S5 linker combined act like an amphipathic helix, as suggested first for MscL (Bavi et al., 2016 Nature Communications) and later identified in many MS channels (Kefauver et al., 2020 Nature)?

      In our analysis of the compression trajectories (trajectory: CI-1, Figure S4), we identified stable interactions between the TRP domain and the S4-S5 linker. These interactions primarily involve the residues S1421 and F1422 of the S4-S5 linker, as indicated by the large pink data points in Figure S4. Therefore, we agree that the TRP helix and the S4–S5 linker can be considered an amphipathic helical unit, analogous to the amphipathic helix observed in MscL and other mechanosensitive channels. Moreover, the pocket adjacent to the S4-S5 linker has been recognized as a binding site for small molecules in other ligand-activated TRP channels, such as the vanilloid-binding TRPV1. We hypothesize that this unit is likely to play a critical role in the polymodal gating of the TRP channel family, including ligand-induced activation. In the revised manuscript, we have included an analysis of the interaction between the TRP domain and the transmembrane (TM) domain on page 4 (Figure S4), and we have briefly discussed its implications on pages 10 and 12.

      (2) Torsional forces on shorter ankyrin repeats of mammalian TRP channels

      Is it possible torsional forces applied to the shorter ankyrin repeats of mammalian TRPs may also convey force in a similar manner?

      This is an intriguing question.

      To answer your question, we studied the full-length squirrel TRPV1 (PDB: 7LQY, Nadezhdin et al. (2021)) using all-atom steered MD simulations. We applied pushing or torsional forces to the intracellular AR1-2 region of TRPV1, separately (Figure S10(a)). Similar to NOMPC, rotation of the TRP domain was observed under both types of mechanical stimulation (Figure S10(b-e)). The conformational change induced by the torsional force on the TRP domain resembles the change observed in NOMPC. This suggests that a torsional force applied to the shorter ankyrin repeats of mammalian TRPs may yield similar effects on channel gating. However, given that these ankyrin repeats do not act like tether elements, the implications of these results in the context of biological functions remain unclear. Additionally, in NOMPC, the AR domain is connected to the TRP domain through a linker helix (LH) domain, composed of multiple stacked helices that form a relatively compact structure (Figure 1(a)). In contrast, TRPV1 does not possess a similarly compact LH domain connecting the AR domain to the TRP domain (Figure S10(a)). These structural differences render our conclusions regarding NOMPC not directly applicable to TRPV1. We have included an additional discussion about this on page 12 (Figure S10).

      (3) Constant velocity or constant force

      For the SMD the authors write "and a constant velocity or constant force". It’s unclear from this reviewer’s perspective which is used to generate the simulation data.

      Thank you for pointing out this ambiguity. In our simulations, we first applied constant-velocity pulling to achieve specific force magnitudes, followed by constant-force pulling. This protocol allowed us to initiate the motion of the protein in a controlled manner and observe the response of the system under sustained forces. We have now clarified this in the revised Methods section.

      Reviewer #1 (Recommendations for the authors):

      The language in the paper requires some editing - particularly in the introduction. For example, what is meant by ion channels ’coalescing to form mechanical receptors’? Are the authors implying it requires multiple channels to form a receptor? It is stated that mechanically gated ion channels are only found in nerve endings when in fact they are found in almost every cell type. Another example is the statement ’In the meantime’ the TRP domain was observed to rotate when this observation came prior to the others mentioned before. While these sound like minor edits, they significantly change the meaning of the introduction. I recommend careful editing of the manuscript to avoid accidental inaccuracies like this.

      Thank you for your feedback on the clarity and accuracy of the introduction. We have carefully revised the manuscript, particularly the abstract and introduction sections, to address these concerns:

      (1) We have reworded the original sentence ’These mechanosensitive ion channels, coalescing to form mechanical receptors, are strategically positioned within the sensory neuron terminals intricately nestled within the epidermal layer.’ into ’In both vertebrates and invertebrates, mechanosensitive ion channels are widely expressed in peripheral sensory neurons located near or within the surface tissues responsible for detecting mechanical stimuli.’

      (2) We have replaced the phrase "In the meantime" with "Interestingly" to introduce the conformational change of the TRP domain that we believe is crucial.

      (3) We have carefully reviewed the entire manuscript and used a language editing tool, Writefull integrated within Overleaf, to proof-check the language problems.

      Reviewer #2 (Recommendations for the authors):

      How do the energy values in Figure 3b, compare with the continuum energy values reported by Argudo et al. JGP (2019)? I wonder what value the authors would get with a new replicate run slower - say 200 ns total aggregate simulation? This would probe the convergence of this energy value. It seems important to determine whether the loading velocity of the experiments performed here with the steered MD is slow enough to allow the protein to relax and adopt lower energy configurations during the transition. The true loading is likely to occur on the millisecond timescale, not the nanosecond to low microsecond timescale. That said, I don’t mean to detract from the result in Figure 2, as this is likely quite solid in my opinion given the nearly 1 microsecond simulations and the replicates showing the same results.

      Thank you for your valuable suggestions. It is important to note that we calculated different physical quantities compared to those reported in Argudo’s study. In Figure 3b, we calculated the torque (instead of the energy, although they share the same dimensional units) of the long AR bundle (AR9-29 of the four filaments combined) and subsequently determined its torsion coefficient. Argudo’s study calculated the torsional spring constant (𝑘<sub>ɵ</sub>) of three 6-AR-unit stretches of one filament, which were designated as ANK1 (AR 12–17), ANK2 (AR 17–22) and ANK3 (AR 22–27). As the four filaments are coupled within the bundled structure and the torsional axes differ between an individual filament and the four-filament bundle, a direct comparison of the torsional spring constants reported in the two studies is not meaningful.

      We agree that extending the simulation time may provide deeper insights into the convergence of energy values. In accordance with your suggestion, we conducted additional simulations to further investigate convergence and compare the results with our existing data, thereby ensuring robustness and consistency. Specifically, we slowed down the original operation of twisting from 10 degrees over 100 ns to 10 degrees over 200 ns, and extended the holding time for selected frames (sampled every 2.5 degrees) from 100 ns to 200 ns. We have updated Figure 3 and relevant main text accordingly (page 7). The results of the new simulations are similar to those of the previous ones, with the fitted torsion coefficient revised from (2.31 ± 0.44) × 10<sup>3</sup> kJ mol<sup>−1</sup> rad<sup>−1</sup> to (2.30 ± 0.31) × 10<sup>3</sup> kJ mol<sup>−1</sup> rad<sup>−1</sup>. This close agreement indicates that our simulations are well-converged. Additionally, we updated the compression–twist coupling coefficient from (1.67 ± 0.14) nm rad<sup>−1</sup> to (1.32 ± 0.11) nm rad<sup>−1</sup>.
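      As a rough illustration of how a torsion coefficient can be obtained from torque measured at held twist angles, the sketch below fits a straight line through the origin by least squares. The linear model, the fit through the origin, and all numbers are assumptions made for illustration; the torque values are synthetic and are not the simulation data, and the fitting procedure actually used for Figure 3 may differ.

```python
import math

def fit_torsion_coefficient(angles_rad, torques):
    """Least-squares slope of torque versus twist angle, forced through the origin.

    Assumes a linear (Hookean) torsional response: torque = kappa * angle.
    """
    numerator = sum(a * t for a, t in zip(angles_rad, torques))
    denominator = sum(a * a for a in angles_rad)
    return numerator / denominator

# Twist angles sampled every 2.5 degrees, as in the holding protocol described above.
angles_deg = [2.5, 5.0, 7.5, 10.0]
angles_rad = [math.radians(a) for a in angles_deg]

# Synthetic, noise-free torques generated from an arbitrary coefficient (illustrative only).
kappa_true = 2.0e3
torques = [kappa_true * a for a in angles_rad]

kappa = fit_torsion_coefficient(angles_rad, torques)
```

      With real data, repeating the fit on the slower and faster twisting protocols and comparing the two slopes is one simple way to check convergence.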

      As you suggested, we conducted an additional analysis to determine whether the loading velocity/force with the steered MD is sufficiently slow to facilitate the relaxation of the protein and its adoption of lower-energy configurations during the transition. For simulations involving the application of membrane-normal or membrane-parallel force on the TRP domain, we utilized DSSP (Define Secondary Structure of Proteins) analysis to assess the stability of the secondary structure of the TRP domain. The results indicated that, during the application of external forces, the secondary structure of the TRP domain maintained good stability, as illustrated in Figure S11. For simulations involving the rotation of the AR domain, we also analyzed the DSSP of the AR9 to AR11 units, which are positioned directly above the AR8 domain where the twisting force is applied. The secondary structure of the AR domain also exhibited good stability (Figure S12). These are briefly discussed in the Methods section of the revised manuscript (page 17).

      It is unclear to me that the force transmission analysis in Figure 4 provides much insight into the mechanics of opening. Perhaps the argument was made, but I did not appreciate it. Related to this the authors state that the transfer velocity is 1.8 nm/ps based on their previous study. Is this value profound or is it simply the velocity of sound in the protein?

      The analysis of force transmission presented in Figure 4 offers detailed insights into the transfer of force along the AR domain. While this may appear straightforward, the information elucidates how a pushing force can induce a twisting force during its transmission through the AR spring structure, as well as the primary contributions that stabilize this transmission pathway. To enhance clarity, we have included an additional discussion on page 9.

      The force transfer velocity is expected to align with the velocity of sound within the protein. The value of 1.8 nm/ps, however, is specific to the unique structure of the AR spring, which is quite interesting to report in our opinion. Additionally, this rapid transfer speed suggests that the simulation timescale is sufficient for enabling the transfer of compression force from the bottom of the AR domain to the TRP domain in our simulations, given that the simulation timescale is considerably longer than the force propagation timescale within the protein.

      The methods description is largely complete, but is missing some details on the MD simulations (barostat, thermostat, piston constants, etc.).

      Thank you for pointing out the missing details; we have added the additional information in the revised Methods section.

      References

      Nadezhdin, K. D., A. Neuberger, Y. A. Nikolaev, L. A. Murphy, E. O. Gracheva, S. N. Bagriantsev, and A. I. Sobolevsky (2021). Extracellular cap domain is an essential component of the TRPV1 gating mechanism. Nature Communications 12(1), 2154.

      Wang, Y., Y. Guo, G. Li, C. Liu, L. Wang, A. Zhang, Z. Yan, and C. Song (2021). The push-to-open mechanism of the tethered mechanosensitive ion channel NOMPC. eLife 10, e58388.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This paper presents a comprehensive study of how neural tracking of speech is affected by background noise. Using five EEG experiments and Temporal response function (TRF), it investigates how minimal background noise can enhance speech tracking even when speech intelligibility remains very high. The results suggest that this enhancement is not attention-driven but could be explained by stochastic resonance. These findings generalize across different background noise types and listening conditions, offering insights into speech processing in real-world environments. I find this paper well-written, the experiments and results are clearly described. However, I have a few comments that may be useful to address.

      I thank the reviewer for their positive feedback.

      (1) The behavioral accuracy and EEG results for clear speech in Experiment 4 differ from those of Experiments 1-3. Could the author provide insights into the potential reasons for this discrepancy? Might it be due to linguistic/acoustic differences between the passages used in experiments? If so, what was the rationale behind using different passages across different experiments?

      The slight differences in behavior and EEG magnitudes may be due to several factors. Different participants took part in the different experiments (with some overlap). Stories and questions were generated with ChatGPT using the same approach, but different research assistants supported story and question generation, and ChatGPT advanced throughout the course of the study, such that different versions were used over time (better version control was only recently introduced by OpenAI). The same Google voice was used for all experiments, so this cannot be a factor. Most critically, within each experiment, assignment of speech-clarity conditions to different stories was randomized, such that statistical comparisons are unaffected by these minor differences between experiments. The noise-related enhancement generalizes across all experiments, showing that minor differences in experimental materials do not impact it.

      (2) Regarding peak amplitude extraction, why were the exact peak amplitudes and latencies of the TRFs for each subject not extracted, and instead, an amplitude average within a 20 ms time window based on the group-averaged TRFs used? Did the latencies significantly differ across different SNR conditions?

      Estimation of peak latency can be challenging if a deflection is not very pronounced in a participant. The N1, in particular, was small for some conditions. Using the mean amplitude in a specific time window is very common practice in EEG research that mitigates this issue. Another, albeit less common, approach is to use a Jackknifing procedure to estimate each participant’s latencies (Smulders 2010 Psychophysiology; although this may sometimes not work well). For the revision, I used the Jackknifing approach to estimate peak latencies for each participant and condition, and extracted the mean amplitude around the peak latency. As expected, this approach provides very similar effects as reported in the main article, here exemplified for Experiments 1 and 2. The results are thus not affected by this data analysis choice. The estimated latencies differed across SNRs, e.g., the N1 latency increased with decreasing SNR (this is less surprising/novel and was thus not added to the manuscript to avoid increasing the amount of information).

      Author response image 1.

      P1-minus-N1 amplitude for Experiment 1 and 2, using amplitudes centered on individually estimated peak latencies. The asterisk indicates a significant difference from the clear speech condition (FDR-thresholded).
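      The jackknifing idea mentioned above can be sketched as follows: peak latency is scored on leave-one-out average waveforms, and per-participant estimates are then retrieved with the formula from Smulders (2010), o_i = n·mean(J) − (n − 1)·J_i, where J_i is the latency scored with participant i left out. This is a minimal sketch on synthetic Gaussian-shaped waveforms; the function names, the use of a simple maximum as the peak criterion, and all values are illustrative assumptions and do not reproduce the analysis pipeline of the study.

```python
import math

def peak_latency(times, waveform):
    """Return the time of the largest value (a simple positive-peak criterion)."""
    idx = max(range(len(waveform)), key=lambda i: waveform[i])
    return times[idx]

def jackknife_latencies(times, waveforms):
    """Retrieve per-participant latency estimates from leave-one-out averages."""
    n = len(waveforms)
    subsample_latencies = []
    for left_out in range(n):
        kept = [w for i, w in enumerate(waveforms) if i != left_out]
        # Average waveform over all participants except the one left out.
        sub_avg = [sum(vals) / (n - 1) for vals in zip(*kept)]
        subsample_latencies.append(peak_latency(times, sub_avg))
    mean_sub = sum(subsample_latencies) / n
    # Smulders (2010): o_i = n * mean(J) - (n - 1) * J_i
    return [n * mean_sub - (n - 1) * j for j in subsample_latencies]

# Synthetic data: three "participants" with Gaussian deflections peaking at 80, 100, 120 ms.
times = list(range(0, 200, 10))
true_peaks = [80, 100, 120]
waveforms = [[math.exp(-((t - p) ** 2) / (2 * 30.0 ** 2)) for t in times]
             for p in true_peaks]

latencies = jackknife_latencies(times, waveforms)
```

      Once a latency has been estimated for each participant, the mean amplitude in a window centered on it can be extracted in the usual way.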

      (3) How is neural tracking quantified in the current study? Does improved neural tracking correlate with EEG prediction accuracy or individual peak amplitudes? Given the differing trends between N1 and P2 peaks in babble and speech-matched noise in experiment 3, how is it that babble results in greater envelope tracking compared to speech-matched noise?

      Neural tracking is generally used for responses resulting from TRF analyses, cross-correlations, or coherence, where the speech envelope is regressed against the brain signals (see review of Brodbeck & Simon 2020 Current Opinion in Physiology). Correlations between EEG prediction accuracy and individual peak amplitudes were not calculated because the data used for the analyses are not independent. The EEG prediction accuracy essentially integrates information over a longer time interval (here 0–0.4 s), whereas TRF amplitudes are more temporally resolved. If one were to shorten the time interval (e.g., 0.08–0.12 s), then EEG prediction accuracy would look more similar to the TRF results (because the TRF is convolved with the amplitude-onset envelope of the speech [predicted EEG] before calculating the EEG prediction accuracy). Regarding the enhancement difference between speech-matched noise and babble, I have discussed a possible interpretation in the discussion section. The result is indeed surprising, but it replicates across two experiments (Experiments 3 and 4), and is consistent with previous work using speech-matched noise that did not find the enhancement. I reproduce the part of the discussion here.

      “Other work, using a noise masker that spectrally matches the target speech, have not reported tracking enhancements (Ding and Simon, 2013; Zou et al., 2019; Synigal et al., 2023). However, in these works, SNRs have been lower (<10 dB) to investigate neural tracking under challenging listening conditions. At low SNRs, neural speech tracking decreases (Ding and Simon, 2013; Zou et al., 2019; Yasmin et al., 2023; Figures 1 and 2), thus resulting in an inverted u-shape in relation to SNR for attentive and passive listening (Experiments 1 and 2).”

      “The noise-related enhancement in the neural tracking of the speech envelope was greatest for 12-talker babble, but it was also present for speech-matched noise, pink noise, and, to some extent, white noise. The latter three noises bear no perceptional relation to speech, but resemble stationary, background buzzing from industrial noise, heavy rain, waterfalls, wind, or ventilation. Twelve-talker babble – which is also a stationary masker – is clearly recognizable as overlapping speech, but words or phonemes cannot be identified (Bilger, 1984; Bilger et al., 1984; Wilson, 2003; Wilson et al., 2012b). There may thus be something about the naturalistic, speech nature of the background babble that facilitates neural speech tracking.”

      “Twelve-talker babble was associated with the greatest noise-related enhancement in neural tracking, possibly because the 12-talker babble facilitated neuronal activity in speech-relevant auditory regions, where the other, non-speech noises were less effective.”
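      The relation between the TRF and EEG prediction accuracy described in the response to point (3) can be illustrated with a minimal sketch. It assumes a single EEG channel, causal lags only, and Pearson correlation as the accuracy measure; the function names (`predict_eeg`, `prediction_accuracy`, `restrict_lags`) and the sampling rate are hypothetical and are not taken from the analysis code of the study.

```python
import numpy as np

def predict_eeg(envelope, trf):
    """Predicted EEG: the stimulus envelope convolved with the TRF (causal lags only)."""
    return np.convolve(envelope, trf)[: len(envelope)]

def prediction_accuracy(eeg, predicted):
    """Pearson correlation between recorded and predicted EEG."""
    return float(np.corrcoef(eeg, predicted)[0, 1])

def restrict_lags(trf, fs, t_min, t_max):
    """Keep TRF weights only for lags within [t_min, t_max] seconds; zero the rest."""
    i0 = int(round(t_min * fs))
    i1 = int(round(t_max * fs))
    out = np.zeros_like(trf, dtype=float)
    out[i0 : i1 + 1] = trf[i0 : i1 + 1]
    return out

# Toy data (assumed): 100 Hz sampling, TRF spanning lags 0-0.4 s.
rng = np.random.default_rng(0)
fs = 100
envelope = np.abs(rng.standard_normal(3000))
trf = np.hanning(41)
eeg = predict_eeg(envelope, trf) + rng.standard_normal(3000)

full_window = prediction_accuracy(eeg, predict_eeg(envelope, trf))
short_window = prediction_accuracy(
    eeg, predict_eeg(envelope, restrict_lags(trf, fs, 0.08, 0.12))
)
```

      With the full lag window, the accuracy pools over all TRF deflections; with the short window, it depends only on the weights in that interval, which is why it would resemble the amplitude of the corresponding TRF component more closely.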

      (4) The paper discusses how speech envelope-onset tracking varies with different background noises. Does the author expect similar trends for speech envelope tracking as well? Additionally, could you explain why envelope onsets were prioritized over envelope tracking in this analysis?

      The amplitude-onset envelope was selected because several previous works have used the amplitude-onset envelope, our previous work that first observed the enhancement also used the amplitude-onset envelope, and the amplitude-onset envelope has been suggested to work better for speech tracking. This was added to the manuscript. For the manuscript revision, analyses were calculated for the amplitude envelope, largely replicating the results for the amplitude-onset envelope. The results for the amplitude envelope are now presented in the Supplementary Materials and referred to in the main text.

      “The amplitude-onset envelope was selected because a) several previous works have used it (Hertrich et al., 2012; Fiedler et al., 2017; Brodbeck et al., 2018a; Daube et al., 2019; Fiedler et al., 2019), b) our previous work first observing the enhancement also used the amplitude-onset envelope (Yasmin et al., 2023; Panela et al., 2024), and c) the amplitude-onset envelope has been suggested to elicit a strong speech tracking response (Hertrich et al., 2012). Results for analyses using the amplitude envelope instead of the amplitude-onset envelope show similar effects and are provided in the Supplementary Materials (Figure 1-figure supplement 1).”

      Recommendations for the authors:

      (1) Include all relevant parameters related to data analysis where applicable. For example, provide the filter parameters (Line 154, Line 177, Line 172), and the default parameters of the speech synthesizer (Line 131).

      Additional filter information and parameter values are provided in the revised manuscript.

      (2) Please share the data and codes or include a justification as to why the data cannot be shared.

      Data and code are provided on OSF (https://osf.io/zs9u5/). A materials availability statement has been added to the manuscript.

      Reviewer #2 (Public review):

      The author investigates the role of background noise on EEG-assessed speech tracking in a series of five experiments. In the first experiment, the influence of different degrees of background noise is investigated and enhanced speech tracking for minimal noise levels is found. The following four experiments explore different potential influences on this effect, such as attentional allocation, different noise types, and presentation mode. The step-wise exploration of potential contributors to the effect of enhanced speech tracking for minimal background noise is compelling. The motivation and reasoning for the different studies are clear and logical and therefore easy to follow. The results are discussed in a concise and clear way. While I specifically like the conciseness, one inevitable consequence is that not all results are equally discussed in depth. Based on the results of the five experiments, the author concludes that the enhancement of speech tracking for minimal background noise is likely due to stochastic resonance. Given broad conceptualizations of stochastic resonance as a noise benefit this is a reasonable conclusion. This study will likely impact the field as it provides compelling support questioning the relationship between speech tracking and speech processing.

      I thank the reviewer for the positive review and thoughtful feedback.

      Recommendations for the authors:

      As mentioned in the public review, I like the conciseness. However, some points might benefit from addressing them.

      (1) The absence of comprehension effects is on the one hand surprising, as the decreased intelligibility should (theoretically) be visible in this data. On the other hand, from my own experience, the generation of "good" comprehension questions is quite difficult. While it is mentioned in the methods section, that comprehension accuracy and gist rating go hand in hand, this is not the case here. I am wondering if the data here should be rather understood as "there is no difference in intelligibility" or that comprehension assessment via comprehension questions is potentially not a valid measure.

      I assume that the reviewer refers to Experiment 1, where SNRs below approximately 15 dB led to reduced gist ratings (used as a proxy for speech intelligibility; Davis and Johnsrude, 2003, J Neurosci; Ritz et al., 2022, J Neurosci). That story comprehension accuracy does not decrease could be due to the comprehension questions themselves (as indicated by the reviewer, “good” questions can be hard to generate, potentially having low sensitivity). On the other hand, speech for the most difficult SNR was still ‘reasonably’ intelligible (gist ratings suggest ~85% of words could be understood), and participants may still have been able to follow the thread of the story. I do not further discuss this point in the manuscript, since it is not directly related to the noise-related enhancement in the neural tracking response: the enhancement was present for high SNRs for which gist ratings did not show a difference relative to clear speech (i.e., 20 dB and above).

      (2) However, if I understood correctly, the "lower" manipulation (same RMS for the whole sound stimulus) of experiment 3 was, what was also used in experiment 1. In experiment 3, unlike 1, there are comprehension effects. I wondered if there are ideas about why that is.

      Yes indeed, the ‘lower’ manipulation in Experiment 3 was also used in Experiments 1, 2, 4, and 5. The generation of the stimulus materials was similar across experiments. However, a new set of stories and comprehension questions was used for each experiment and the participants differed as well (with some overlap). These aspects may have contributed to the difference.

      (3) Concerning the prediction accuracy, for a naive reader, some surrounding information would be helpful: What is the purpose/expectation of this measure? Is it to show that all models are above chance?

      EEG prediction accuracy was included here, mainly because it is commonly used in studies using TRFs. A reader may wonder about EEG prediction accuracy if it were not reported. The hypotheses of the current study are related to the TRF weights/amplitude. This was added to the manuscript.

      “EEG prediction accuracy was calculated because many previous studies report it (e.g., Decruy et al., 2019; Broderick et al., 2021; Gillis et al., 2021; Weineck et al., 2022; Karunathilake et al., 2023), but the main focus of the current study is on the TRF weights/amplitude.”

      (4) Regarding the length of training and test data I got confused: It says per story 50 25-s snippets. As the maximum length of a story was 2:30 min, those snippets were mostly overlapping, right? It seems that depending on the length of the story and the "location within the time series" of the snippets, the number of remaining non-overlapping snippets is variable. Also, within training, the snippets were overlapping, correct? Otherwise, the data for training would be too short. Again, as a naive reader, is this common, or can overlapping training data lead to overestimations?

      The short stories made non-overlapping windows infeasible, but the overlap is unlikely to affect the current results. Using cross-correlation (Hertrich et al 2012 Psychophysiology; which is completely independent for different snippets) instead of TRFs shows the same results (now provided in the supplementary materials). In one of our previous studies where the enhancement was first observed (Yasmin et al. 2023 Neuropsychologia), non-overlapping data were used because the stories were longer. This makes any meaningful impact of the overlap very unlikely. Critically, speech-clarity levels were randomized and all analyses were conducted in the same way for all conditions, thus not confounding any of the results/conclusions. The methods section was extended to further explain the choice of overlapping data snippets.

      “Speech-clarity levels were randomized across stories and all analyses were conducted similarly for all conditions. Hence, no impact of overlapping training data on the results is expected (consistent with noise-related enhancements observed previously when longer stories and non-overlapping data were used; Yasmin et al., 2023). Analyses using cross-correlation, for which data snippets are treated independently, show similar results compared to those reported here using TRFs (Figure 1-figure supplement 2).”
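      The arithmetic behind the reviewer's question about snippet overlap can be made explicit with a short sketch. It assumes evenly spaced snippet onsets, which is an illustrative assumption only (the manuscript text quoted here does not state how the onsets were placed); the function name `snippet_overlap` is hypothetical.

```python
def snippet_overlap(story_s, snippet_s=25.0, n_snippets=50):
    """Summarize overlap among fixed-length snippets taken from one story.

    Assumes snippet onsets are evenly spaced from the story start to the
    last possible onset (an illustrative assumption, not the study's method).
    Returns (max number of non-overlapping snippets,
             step between successive onsets in seconds,
             fractional overlap between neighboring snippets).
    """
    n_independent = int(story_s // snippet_s)
    step = (story_s - snippet_s) / (n_snippets - 1)
    overlap = max(0.0, 1.0 - step / snippet_s)
    return n_independent, step, overlap

# A 2:30-min story (150 s), the maximum length mentioned by the reviewer.
n_independent, step, overlap = snippet_overlap(150.0)
```

      Under these assumptions, a 150-s story holds at most six non-overlapping 25-s snippets, so extracting 50 snippets necessarily implies substantial overlap between neighbors.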

      (5) For experiment 1, three stories were clear, while the other 21 conditions were represented by one story each. Presumably, the ratio of 3:1 can affect TRFs?

      TRFs were calculated for each story individually and then averaged across three stories: either three clear stories, or three stories in babble for neighboring SNRs. Hence, the same number of TRFs were averaged for clear and noise conditions, avoiding exactly this issue. This was described in the methods section and is reproduced here:

      “Behavioral data (comprehension accuracy, gist ratings), EEG prediction accuracy, and TRFs for the three clear stories were averaged. For the stories in babble, a sliding average across SNR levels was calculated for behavioral data, EEG prediction accuracy, and TRFs, such that data for three neighboring SNR levels were averaged. Averaging across three stories was calculated to reduce noise in the data and match the averaging of three stories for the clear condition.”
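      The sliding average across three neighboring SNR levels described in the quoted methods text could be sketched as follows. This is a minimal illustration only: the handling of the lowest and highest SNR levels (here simply dropped via `mode="valid"`) is an assumption, and the function name `sliding_average` is hypothetical.

```python
import numpy as np

def sliding_average(values, width=3):
    """Average each run of `width` neighboring SNR levels.

    `values` holds one number per SNR level (e.g., a TRF amplitude), ordered
    by SNR. Edge levels without a full set of neighbors are dropped here,
    which is an assumption rather than the documented procedure.
    """
    kernel = np.ones(width) / width
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

# One value per story in babble, ordered by SNR (toy numbers).
smoothed = sliding_average([1.0, 2.0, 3.0, 4.0, 5.0])
```

      Each smoothed value then pools three stories, matching the three stories averaged for the clear condition.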

      (6) Was there an overlap in participants?

      Some participants took part in several of the experiments in separate sessions on separate days. This was added to the manuscript.

      “Several participants took part in more than one of the experiments, in separate sessions on separate days: 7, 7, 9, 9, and 14 (for Experiments 1-5, respectively) participated only in one experiment; 3 individuals participated in all 5 experiments; 68 unique participants took part across the 5 experiments.”

      (7) Can stochastic resonance also explain inverted U-shape results with vocoded speech?

      This is an interesting question. Distortions to the neural responses to noise-vocoding may reflect internal noise, but this would require additional research. For example, the Hauswald study (2022 EJN), showing enhancements due to noise-vocoding, used vocoding channels that also reduced speech intelligibility. The study would ideally be repeated with a greater number of vocoding channels to make sure the effects are not driven by increased attention due to reduced speech intelligibility. I did not further discuss this in detail in the manuscript as it would go too far away from the experiments of the current study.

      (8) Typo in the abstract: box sexes is probably meant to say both sexes?

      This text was removed, because more detailed gender identification is reported in the methods, and the abstract needed shortening to meet the eLife guidelines.

      Reviewing Editor Comments:

      Interesting series of experiments to assess the influence of noise on cortical tracking in different conditions, interpreting the results with the mechanism of stochastic resonance.

      I thank the editor for their encouraging feedback.

      For experiment 2, the author wishes to exclude the role of attention, by making participants perform a visual task. Data from low performers on the visual task was excluded, to avoid that participants attended the spoken speech. However, from the high performers on the visual task, how can you be sure that they did not pay attention to the auditory stimuli as well (as auditory attention is quite automatic, and these participants might be good at dividing their attention)? I understand that you can not ask participants about the auditory task during the experiment, but did you ask AFTER the experiment whether they were able to understand the stimuli? I think this is crucial for your interpretation.

      Participants were not asked whether they were able to understand the stimuli. Participants would be unlikely to invest effort/attention in understanding the stories in babble without a speech-related task. Nevertheless, for follow-up analyses, I removed participants who performed above 0.9 in the visual task (i.e., the high performers), and the difference between clear speech and speech in babble replicates. In the plots, data from all babble conditions above 15 dB SNR (highly intelligible) were averaged, but the results look almost identical if all SNRs are averaged. Moreover, the correlation between visual task performance and the babble-related enhancement was not significant. These analyses were added to the Supplementary Materials (Figure 2-figure supplement 1).

      Statistics: inconsistencies across experiments with a lot of simple tests (FDR corrected) and in addition sometimes rmANOVA added - if interactions in rmANOVA are not significant then all the simple tests might not be warranted. So a bit of double dipping and over-testing here, but on the whole the conclusions do not seem to be overstated.

      The designs of the different experiments differed, thus requiring different statistical approaches. Moreover, the different tests assess different comparisons. For all experiments, contrasting the clear condition with all noise conditions was the main purpose. To correct for multiple comparisons, the False Discovery Rate correction was used. Repeated-measures ANOVAs were conducted in addition to this – excluding the clear condition because it would not fit into a factorial structure (e.g., Experiment 3) or to avoid analyzing it twice (e.g., Experiment 5) – to investigate differences between different noise conditions. There was thus no over-testing in the presented study.
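      For readers unfamiliar with False Discovery Rate correction, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The response does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function name `fdr_bh` and the p-values are illustrative only.

```python
import numpy as np

def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean array marking which hypotheses are rejected while
    controlling the false discovery rate at level q.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m   # i/m * q for rank i
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])      # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # reject all hypotheses up to that rank
    return reject

# Toy p-values for four clear-versus-noise contrasts (hypothetical numbers).
significant = fdr_bh([0.001, 0.02, 0.04, 0.5])
```

      In this toy case, the two smallest p-values survive correction, while 0.04 does not, even though it is below the uncorrected 0.05 level.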

      Small points:

      Question on methods: For each story, 50 25-s data snippets were extracted (Page 7, line 190). As you have stories with a duration of 1.5 to 2 minutes, does that mean there is a lot of overlap across data snippets? How does that influence the TRF/prediction accuracy?

      The short stories made non-overlapping windows infeasible, but the overlap is unlikely to affect the current results. Using cross-correlation (Hertrich et al 2012 Psychophysiology; which is completely independent for different snippets) instead of TRFs shows the same results (newly added Figure 1-figure supplement 2). In one of our previous studies where the enhancement was first observed (Yasmin et al. 2023 Neuropsychologia), non-overlapping data were used because the stories were longer. This makes any meaningful impact of the overlap very unlikely. Critically, speech-clarity levels were randomized and all analyses were conducted in the same way for all conditions, thus not confounding any of the results/conclusions. The methods section was extended to further explain the choice of overlapping data snippets.

      “Overlapping snippets in the training data were used to increase the amount of data in the training given the short duration of the stories. Speech-clarity levels were randomized across stories and all analyses were conducted similarly for all conditions. Hence, no impact of overlapping training data on the results is expected (consistent with noise-related enhancements observed previously when longer stories and non-overlapping data were used; Yasmin et al., 2023). Analyses using cross-correlation, for which data snippets are treated independently, show similar results compared to those reported here using TRFs (Figure 1-figure supplement 2).”

      Results Experiment 3: page 17, line 417: no differences were found between clear speech and masked speech - is this a power issue (as it does look different in the figure, Figure 4b)?

      I thank the editor for pointing this out. Indeed, I made a minor mistake. Two comparisons were significant after FDR-thresholding. This is now included in the revised Figure 4. I also checked the other analyses for the same mistake and found none.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript studies nutrient intake rates for stationary and motile microorganisms to assess the effectiveness of swim vs. stay strategies. This work provides valuable insights on how the different strategies perform in the context of a simplified mathematical model that couples hydrodynamics to nutrient advection and diffusion. The swim and stay strategies are shown to yield similar nutrient flux under a range of conditions.

      Strengths:

      Strengths of the work include (i) the model prediction in Fig. 3 of nutrient flux applied to a range of microorganisms including an entire clade that are known to use different feeding strategies and (ii) a study of the interaction between cilia and absorption coverage showing the robustness of their predictions provided these regions have sufficient overlap.

      We thank the referee for their thorough review of our manuscript and for their constructive feedback.

      Weaknesses: To improve the work, the authors should further expand their discussion of the following points:

      (1) The authors comment that a number of species alternate between sessile and motile behavior. It would be helpful to discuss what is known about what causes switching between these modes and whether this provides insights regarding the advantages of the different behaviors.

      The transition between sessile and motile states is often influenced by external environmental conditions, such as prey availability and predator presence, which determine the most advantageous state at any given time. For instance, members of the genus Stentor are known to detach from their colonies and exhibit solitary swimming behavior in response to low prey abundance (Tartar, 2013) or when avoiding predators (Dexter et al. 2019). Similarly, the transition in Vorticella is influenced by chemical cues, such as pH (Baufer et al., 1999) or algae concentration (Langlois, 1975).

      References:

      Dexter, J. P., Prabakaran, S., & Gunawardena, J. (2019). A complex hierarchy of avoidance behaviors in a single-cell eukaryote. Current biology, 29(24), 4323-4329.

      Tartar, V. (2013). The biology of stentor: International series of monographs on pure and applied biology: Zoology. Elsevier.

      Baufer, P. J. D., Amin, A. A., Pak, S. C., & Buhse Jr, H. E. (1999). A method for the synchronous induction of large numbers of telotrochs in Vorticella convallaria by monocalcium phosphate at low pH. Journal of Eukaryotic Microbiology, 46(1), 12-16.

      Langlois, G. A. (1975). Effect of algal exudates on substratum selection by motile telotrochs of the marine peritrich ciliate Vorticella marina. The Journal of Protozoology, 22(1), 115-123.

      (2) An encounter zone of R=1.1a appears to be used throughout the manuscript, but I could not find a biological justification for this particular value. The results appear to be quite sensitive to this choice, as shown in Supplement Fig. 3(B). In the Discussion, it is mentioned that using a much larger exclusion zone leads to significantly different nutrient flux, and it is implied that such a large exclusion zone is not biologically plausible, but this was not explained sufficiently.

      Thank you for pointing this out. We chose the value of the encounter zone based on a rough calculation of cilia length relative to body length. Cilia are typically of the order of 10 microns in length, and the cell body of a ciliate is typically of the order of 100-1000 microns.

      For example, in the work of Jiang & Buskey (2024, parts I and II), nutrient encounter is reported at the leading edge of the ciliary band in Strombidium and Amphorides. Here, cilia appear to be about 20% of the body length and the particles are absorbed quite close to the cell surface. A similar encounter near the cell surface is reported in Gilmour, 1978 and Thomazo et al., 2021.

      In the theoretical model of Andersen and Kiørboe (2020), a much larger encounter zone is used, extending 10 times the body length (that is, an encounter zone that is 1000% larger than the body length). This is obviously not biologically justifiable.

      We edited the manuscript to better justify our choices and provide supporting references.

      References:

      Andersen, A., & Kiørboe, T. (2020). The effect of tethering on the clearance rate of suspension-feeding plankton. Proceedings of the National Academy of Sciences, 117(48), 30101-30103.

      Jiang, H., & Buskey, E. J. (2024). Relating ciliary propulsion morphology and flow to particle acquisition in marine planktonic ciliates II: the oligotrich ciliate Strombidium capitatum. Journal of Plankton Research, fbae011.

      Jiang, H., & Buskey, E. J. (2024). Relating ciliary propulsion morphology and flow to particle acquisition in marine planktonic ciliates I: the tintinnid ciliate Amphorides quadrilineata. Journal of Plankton Research, fbae012.

      Gilmour, T. H. J. (1978). Ciliation and function of the food-collecting and waste-rejecting organs of lophophorates. Canadian Journal of Zoology, 56(10), 2142-2155.

      Thomazo, J. B., Le Révérend, B., Pontani, L. L., Prevost, A. M., & Wandersman, E. (2021). A bending fluctuation-based mechanism for particle detection by ciliated structures. Proceedings of the National Academy of Sciences, 118(31), e2020402118.

      (3) In the schematic in Fig. 2(B), it was unclear if the encounter zone in the envelope model is defined analogously to the Stokeslet model or if a different formulation is used.

      Yes, we defined the encounter zone in the same way in both models. In fact, we used two metrics for evaluating nutrient uptake: one considers only the fluid flow rate through an encounter zone; the other considers the mass transport within the fluid and absorption at the entire ciliary surface. The first metric, the clearance rate Q, is evaluated by calculating the flow rate past an annular disk and is applied consistently to all models, as depicted in Figure 2(B). The second metric, the nutrient uptake rate, which we define as the dimensionless integral of the mass flux over the entire spherical surface, is also applied consistently to all models to evaluate the Sherwood number (Sh). Both metrics are evaluated on the Stokeslet and envelope models.

      We edited the main text to further clarify these two metrics in the revision.
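      The first metric, the clearance rate Q as a flow rate past an annular disk, can be sketched numerically as below. This is an illustration under stated assumptions, not the computation used in the manuscript: the velocity profile `u_normal` (the fluid velocity normal to the disk as a function of radial distance r) is a hypothetical placeholder that would come from the Stokeslet or envelope flow field, and the function name `clearance_rate` is likewise hypothetical.

```python
import numpy as np

def clearance_rate(u_normal, a, R, n=2001):
    """Volume flow rate through an annular disk spanning radii a <= r <= R.

    u_normal: callable giving the velocity component normal to the disk at
              radial distance r (hypothetical; supplied by the flow model).
    Integrates u_normal(r) * 2*pi*r over r with the trapezoidal rule.
    """
    r = np.linspace(a, R, n)
    integrand = u_normal(r) * 2.0 * np.pi * r
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))

# Sanity check with a uniform unit flow and the encounter zone R = 1.1a (a = 1):
# the result should equal the annulus area, pi * (R**2 - a**2).
Q_uniform = clearance_rate(lambda r: np.ones_like(r), a=1.0, R=1.1)
```

      For a uniform flow, the integral reduces to the annulus area times the flow speed, which provides a simple check before substituting a model-specific velocity profile.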

      (4) The force balance argument should be clarified. Equation (3) of the supplement gives the force-velocity relation in the motile case. Since equation (4), which the authors state is the net force in the sessile case, seems to involve the same expression, would it not follow from U=0 in the sessile case that one would simply obtain quiescent flow with Fcilia = 0?

      The force balance equations for the model organism differ between the motile and sessile modes. In the submitted version, SI Eq.(3) and SI Eq.(4) are derived from different force balance equations, where the velocity U does not appear in the sessile Stokeslet model.

      Author response image 1.

      For the Stokeslet model, the force generated by the flagella acting on the fluid is modeled as a point force

      Motile Stokeslet model:

      The force balance on the sphere is given by:

      Where  is the thrust force generated by the flagella in the direction of swimming, is the drag force due to a moving sphere in fluid with speed U, and K is the hydrodynamic force acting on the sphere by the flow generated by the point force F. For a given strength of the Stokeslet, , the swimming speed U can be calculated by the force balance.

      Sessile Stokeslet model:

      The force balance on the sphere is given by:

      Where , T= -F, and K are defined as above. Similarly, for a given point force F, the required force provided by a stalk to fix the sphere can be calculated by the force balance.

      Therefore, SI Eq.(3) and (4) are not directly applicable across both the Stokeslet and envelope models. While the expressions appear similar due to the presence of the forces F and K, separate calculations are needed depending on the force model.

      We edited the SI document and SI Figure 3 to clarify this.

      Reference:

      Andersen, A., & Kiørboe, T. (2020). The effect of tethering on the clearance rate of suspension-feeding plankton. Proceedings of the National Academy of Sciences, 117(48), 30101-30103.

      Reviewer #2 (Public Review):

      Summary:

      The authors have collected a significant amount of data from the literature on the flow regimes associated with microorganisms whose propulsion is achieved through the action of cilia or flagella, with particular interest in the competition between sessile and motile lifestyles. They then use several distinct hydrodynamic models for the cilia-driven flows to quantify the nutrient uptake and clearance rate, reported as a function of the Peclet number. Among the interesting conclusions the authors draw concerns the question of whether, for certain ciliates, there is a clear difference in nutrient uptake rates in the sessile versus motile forms. The authors show that this is not the case, thereby suggesting that the evolutionary pressure associated with such a difference is not present. The analysis also includes numerical calculations of the uptake rate for spherical swimmers in the regime of large Peclet numbers, where the authors note an enhancement due to advection-generated thinning of the solutal boundary layer around the organism.

      Strengths:

      In addressing the whole range of organism sizes and Peclet numbers the authors have achieved an important broad perspective on the problem of nutrient uptake of ciliates, with implications for understanding evolutionary driving forces toward particular lifestyles (e.g. sessile versus motile).

      We thank the referee for their thorough review of our manuscript and for their feedback regarding the inclusion of more relevant references.

      Weaknesses:

      The authors appear to be unaware of rather similar calculations that were done some years ago in the context of Volvox, in which the issue of the boundary layer size and nutrient uptake enhancement were clearly recognized [M.B. Short, et al., Flows Driven by Flagella of Multicellular Organisms Enhance Long-Range Molecular Transport, PNAS 103, 8315-8319 (2006)]. This reference also introduced the model of a fixed shear stress at the surface of the sphere as a representation of the action of the cilia, which may be more realistic than the squirmer-type boundary condition, although the two lead to similar large-Pe scalings.

      We apologize for having omitted this reference from the submitted version of the manuscript. We have read this work thoroughly, and it is indeed highly relevant to the present study.

      The findings reported in Figure 4, that the uptake rate is robust to variations in cilia coverage and absorption fraction, are similar in spirit to an observation made recently in the context of the somatic cell neighbourhood areas in Volvox [Day, et al., eLife 11, e72707 (2022)]. There, it was found that while there is a broad distribution of those areas, and hence of the coarse-grained tangential flagellar force acting on the fluid, the propulsion speed is rather insensitive to those variations.

      Thank you for pointing us to the work of Day, et al., eLife 11, e72707 (2022). We were not aware of this study. The work is broadly relevant to our study, and we added a reference to it in the discussion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      1. Evidence for a disulfide bridge contained in membrane-associated FGF2 dimers

      This aspect was brought up in detail by both Reviewer #1 and Reviewer #3. It has been addressed in the revised manuscript by (i) new experimental and computational analyses, (ii) a more detailed discussion of previous work from our lab in which the experiments requested by the reviewers had already been performed and (iii) a more general discussion of known examples of disulfide formation in protein complexes with a particular focus on membrane surfaces facing the cytoplasm, the inner plasma membrane leaflet being a prominent example. Please find our detailed comments in our direct response to Reviewers #1 and #3, see below.

      2. Affinity towards PI(4,5)P2 comparing FGF2 dimers versus monomers

      This is an aspect that has been raised by Reviewer 3 along with additional comments on the interaction of FGF2 with PI(4,5)P2. Please find our detailed response below. With regard to PI(4,5)P2 affinity aspects of FGF2 dimers versus FGF2 monomers, we think that the increased avidity of FGF2 dimers with two high affinity binding pockets for PI(4,5)P2 is a good explanation for the different values of free energies of binding that were calculated from the atomistic molecular dynamics simulations shown in Fig. 9. This phenomenon is well known for many biomolecular interactions and is also consistent with the cryoEM data contained in our manuscript, showing an FGF2 dimer with two PI(4,5)P2 binding sites facing the membrane surface.

      3. C95-C95 FGF2 dimers as signaling units

      We have put forward this hypothesis since in structural studies analyzing the FGF ternary signaling complex consisting of FGF2, FGF receptor and heparin, FGF2 mutants were used that lack C95. Nevertheless, two FGF2 molecules are contained in FGF signaling complexes. In addition to the papers on the structure of the FGF signaling complex, we have cited work that showed that C95-C95 crosslinked FGF2 dimers are efficient FGF signaling modules (Decker et al, 2016; Nawrocka et al, 2020). Therefore, being based on an assembly/disassembly mechanism with the transient formation of pore-forming FGF2 oligomers, we think it is an interesting idea that the FGF2 secretion pathway produces C95-C95 disulfide-linked FGF2 dimers at the outer plasma membrane leaflet that can engage in FGF2 ternary signaling complexes. While this is a possibility we put forward to stimulate the field, it of course remains a hypothesis which has been clearly indicated as such in the revised manuscript.

      Reviewer #1:

      1. Evidence for disulfide-bridged FGF2 dimers and higher oligomers on non-reducing versus reducing SDS gels

      The experiment suggested by Reviewer #1 is an important one that has been published by our group in previous work. In these studies, we found FGF2 oligomers analyzed on non-reducing SDS gels to be sensitive to DTT, turning the vast majority of oligomeric FGF2 species into monomers [(Müller et al, 2015); Fig. 3, compare panel D with panel H]. This phenomenon could be observed most clearly after short periods of incubations (0.5 hours) of FGF2 with PI(4,5)P2-containing liposomes. These findings constituted the original evidence for PI(4,5)P2-induced FGF2 oligomerization to depend on the formation of intermolecular disulfide bridges.

      In the current manuscript, we established the structural principles underlying this process and identified C95 to be the only cysteine residue involved in disulfide formation. Based on biochemical cross-linking experiments in cells, cryo-electron tomography, predictions from AlphaFold-2 Multimer and molecular dynamics simulations, we demonstrated a strong FGF2 dimerization interface in which C95 residues are brought into close proximity when FGF2 is bound to membranes in a PI(4,5)P2-dependent manner. These findings provide the structural basis by which disulfide bridges can be formed from the thiols contained in the side chains of two C95 residues directly facing each other in the dimerization interface. In the revised manuscript, we included additional data that further strengthen this analysis. In the experiments shown in the new Fig. 10, we combined chemical cross-linking with mass spectrometry, further validating the reported FGF2 dimerization interface. In addition, illustrated in the new Fig. 8, we employed a new computational analysis combining 360 individual atomistic molecular dynamics simulations, each spanning 0.5 microseconds, with advanced machine learning techniques. This new data set corroborates our findings, demonstrating that the C95-C95 interface self-assembles independently of C95-C95 disulfide formation, based on electrostatic interactions. Intriguingly, it is consistent with our experimental findings based on cross-linking mass spectrometry (new Fig. 10) where cross-linked peptides could also be observed with the C77/95A variant form of FGF2, suggesting a protein-protein interface whose formation does not depend on disulfide formation. Therefore, we propose that disulfide formation occurs in a subsequent step, representing the committed step of FGF2 membrane translocation with the formation of disulfide-bridged FGF2 dimers being the building blocks for pore-forming FGF2 oligomers.

      As a more general remark on the mechanistic principles of disulfide formation in different cellular environments, we would like to emphasize that it is a common misconception that the reducing environment of the cytoplasm generally makes the formation of disulfide bridges unlikely or even impossible. From a biochemical point of view, the formation of disulfide bridges is not limited by a reducing cellular environment but is rather controlled by kinetic parameters when two thiols are brought into proximity. Indeed, it has become well established that disulfide bridges can also be formed in compartments other than the lumen of the ER/Golgi system, including the cytoplasm. For example, viruses maturing in the cytoplasm can form stable structural disulfide bonds in their coat proteins (Locker & Griffiths, 1999; Hakim & Fass, 2010). Moreover, many cytosolic proteins, including phosphatases, kinases and transcription factors, are now recognized to be regulated by thiol oxidation and disulfide bond formation, formed as a post-translational modification (Lennicke & Cocheme, 2021). In numerous cases with direct relevance for our studies on FGF2, disulfide bond formation and other forms of thiol oxidation occur in association with membrane surfaces. In fact, many of these processes are linked to the inner plasma membrane leaflet (Nordzieke & Medrano-Fernandez, 2018). Growth factors, hormones and antigen receptors are observed to activate transmembrane NADPH oxidases generating O2·-/H2O2 (Brown & Griendling, 2009). For example, the local and transient oxidative inactivation of membrane-associated phosphatases (e.g., PTEN) serves to enhance receptor associated kinase signaling (Netto & Machado, 2022). It is therefore conceivable that similar processes introduce disulfide bridges into FGF2 while assembling into oligomers at the inner plasma membrane leaflet.
In the revised version of our manuscript, we have discussed the above-mentioned aspects in more detail, with the known role of NADPH oxidases in disulfide formation at the inner plasma membrane leaflet being highlighted.

      Reviewer #2:

      1. Potential effects of a C95A substitution on protein folding and comparison with a C95S substitution with regard to phenotypes observed in FGF2 secretion

      A valid point that we indeed addressed at the beginning of this project. Most importantly, we tested whether both FGF2 C95A and FGF2 C95S are characterized by severe phenotypes in FGF2 secretion efficiency. As shown in the revised Fig. 1, cysteine substitutions by serine showed very similar FGF2 secretion phenotypes compared to cysteine to alanine substitutions (Fig. 1C and 1D). In addition, in the pilot phase of this project, we also compared recombinant forms of FGF2 C95A and FGF2 C95S in various in vitro assays. For example, we tested the full set of FGF2 variants in membrane integrity assays as the ones contained in Fig. 4. As shown in Author response image 1, FGF2 variant forms carrying a serine in position 95 behaved in a very similar manner as compared to FGF2 C95A variant forms. Relative to FGF2 wild-type, membrane pore formation was strongly reduced for both types of C95 substitutions. By contrast, both FGF2 C77S and C77A did show activities that were similar to FGF2 wild-type.

      Author response image 1.

      From these experiments, we conclude that changes in protein structure are not the basis for the phenotypes we report on the C95A substitution in FGF2.

      1. Effects of a C77A substitution on FGF2 membrane recruitment in cells

      The effect of a C77A substitution in FGF2 recruitment to the inner plasma membrane leaflet is indeed a moderate one. This is likely to be the case because C77 is only one residue of a more complex surface that contacts the α1 subunit of the Na,K-ATPase. Stronger effects can be observed when K54 and K60 are changed, residues that are positioned in close proximity to C77 (Legrand et al, 2020). Nevertheless, as shown in the revised Fig. 1, we consistently observed a reduction in membrane recruitment when comparing FGF2 C77A with FGF2 wild-type. When analyzing the raw data without GFP background subtraction, a significant reduction of FGF2 C77A was observed compared to FGF2 wild-type (Fig. 1A and 1B). We therefore conclude that C77 does not only play a role in FGF2/α1 interactions in biochemical assays using purified components (Fig. 7) but also impairs FGF2/α1 interactions in a cellular context (Fig. 1A and 1B).

      1. Identity of the protein band in Fig. 3 labeled with an empty diamond

      This is a misunderstanding as we did not assign this band to a FGF2-GFP dimer. When we produced the corresponding cell lines, we used constructs that link FGF2 with GFP via a ‘self-cleaving’ P2A sequence. During translation, even though arranged on one mRNA, this causes the production of FGF2 and GFP as separate proteins in stoichiometric amounts, the latter being used to monitor transfection efficiency. However, a small fraction is always expressed as a complete FGF2-P2A-GFP fusion protein (a monomer). This band can be detected with the FGF2 antibodies used and was labeled in Fig. 3 by an empty diamond.

      1. Labeling of subpanels in Fig. 5A

      We have revised Fig. 5 according to the suggestion of Reviewer #2.

      1. FGF2 membrane binding efficiencies shown in Fig. 5C

      It is true that FGF2 variant forms defective in PI(4,5)P2-dependent oligomerization (C95A and C77/95A) bind to membranes with somewhat reduced efficiencies. This is also evident from the intensity profiles shown in Fig. 5A and was observed in biochemical in vitro experiments as well. A plausible explanation for this phenomenon would be the increased avidity when FGF2 oligomerizes, stabilizing membrane interactions (see also Fig. 9B).

      1. Residual activities of FGF2 C95A and C77/95A in membrane pore formation?

      We do not interpret the phenomenon in Fig. 5 to which Reviewer #2 refers as controlled activity of FGF2 C95A and C77/95A in membrane pore formation. Rather, GUVs containing PI(4,5)P2 are relatively labile structures, so a certain loss of integrity upon protein binding and extended incubation times is conceivable. This is basically a technical limitation of this assay, in which GUVs are incubated with proteins for 2 hours. Even after substitution of PI(4,5)P2 with a Ni-NTA membrane lipid, background levels of loss of membrane integrity can be observed (Fig. 6). Therefore, the critical point here is that, compared to FGF2 C95A and C77/95A, FGF2 wt and FGF2 C77A display significantly higher levels of loss of membrane integrity in PI(4,5)P2-containing GUVs, a phenomenon that we interpret as controlled membrane pore formation. By contrast, all variant forms of FGF2 show only background levels of loss of membrane integrity in GUVs containing the Ni-NTA lipid.

      1. Why does PI(4,5)P2 induce FGF2 dimerization?

      This has been studied extensively in previous work (Steringer et al, 2017). As also discussed in the current manuscript, the interaction of FGF2 with membranes through its high affinity PI(4,5)P2 binding pocket orients FGF2 molecules on a 2D surface, which increases the likelihood of the formation of the C95-containing FGF2 dimerization interface. Moreover, in the presence of cholesterol at levels typical for plasma membranes, PI(4,5)P2 forms clusters containing up to 4 PI(4,5)P2 molecules (Lolicato et al, 2022), a process that may further facilitate FGF2 dimerization.

      1. Is it possible to pinpoint the number of FGF2 subunits in oligomers observed in cryo-electron tomography?

      We indeed took advantage of the Halo tags that appear as dark globular structures in cryo-electron tomography. For most FGF2 oligomers with FGF2 subunits on both sides of the membrane, we could observe 4 to 6 Halo tags which is consistent with the functional subunit number that has been analyzed for membrane pore formation (Steringer et al., 2017; Sachl et al, 2020; Singh et al, 2023). However, since the number of higher FGF2 oligomers we observed in cryo-electron tomography was relatively small and the nature of these oligomers appears to be highly dynamic, caution should be taken to avoid overinterpretation of the available data.

      Reviewer #3:

      1. Conclusive demonstration of disulfide-linked FGF2 dimers

      A similar point was raised by Reviewer #1, so that we would like to refer to our response on page 2, see above.

      1. Identity of FGF2-P2A-GFP observed in Fig. 3

      Again, a similar point has been made, in this case by Reviewer #2 (Point 3). The observed band is not a FGF2-P2A-GFP dimer but rather the complete FGF2-P2A-GFP fusion protein (a monomer) that corresponds to a small population produced during mRNA translation where the P2A sequence did not cause the production of FGF2 and GFP as separate proteins in stoichiometric amounts.

      1. Quantification of GFP signals in Fig. 6

      Fig. 6 has been revised according to the suggestion of Reviewer #3. A comprehensive comparison of PI(4,5)P2 and the Ni-NTA membrane lipid in FGF2 membrane translocation assays is also contained in previous work that introduced the GUV-based FGF2 membrane translocation assay (Steringer et al., 2017).

      1. Experimental evidence for various aspects of FGF2 interactions with PI(4,5)P2

      Most of the points raised by Reviewer #3 have been addressed in previous work. For example, FGF2 has been demonstrated to dimerize only on membrane surfaces containing PI(4,5)P2 (Müller et al., 2015). In solution, FGF2 remained a monomer even after hours of incubation as analyzed by native gel electrophoresis and reducing vs. non-reducing SDS gels (see Fig. 3 in Müller et al, 2015). In the same paper, the first evidence for a potential role of C95 in FGF2 oligomerization has been reported, however, at the time, our studies were limited to FGF2 C77/95A. In the current manuscript, the in vitro experiments shown in Figs. 2 to 6 establish the unique role of C95 in PI(4,5)P2-dependent FGF2 oligomerization. As discussed above, FGF2 oligomers have been shown to contain disulfide bridges based on analyses on non-reducing gels in the absence and presence of DTT (Müller et al., 2015).

      References

      Brown DI, Griendling KK (2009) Nox proteins in signal transduction. Free Radic Biol Med 47: 1239-1253

      Decker CG, Wang Y, Paluck SJ, Shen L, Loo JA, Levine AJ, Miller LS, Maynard HD (2016) Fibroblast growth factor 2 dimer with superagonist in vitro activity improves granulation tissue formation during wound healing. Biomaterials 81: 157-168

      Hakim M, Fass D (2010) Cytosolic disulfide bond formation in cells infected with large nucleocytoplasmic DNA viruses. Antioxid Redox Signal 13: 1261-1271

      Legrand C, Saleppico R, Sticht J, Lolicato F, Muller HM, Wegehingel S, Dimou E, Steringer JP, Ewers H, Vattulainen I et al (2020) The Na,K-ATPase acts upstream of phosphoinositide PI(4,5)P2 facilitating unconventional secretion of Fibroblast Growth Factor 2. Commun Biol 3: 141

      Lennicke C, Cocheme HM (2021) Redox metabolism: ROS as specific molecular regulators of cell signaling and function. Mol Cell 81: 3691-3707

      Locker JK, Griffiths G (1999) An unconventional role for cytoplasmic disulfide bonds in vaccinia virus proteins. J Cell Biol 144: 267-279

      Lolicato F, Saleppico R, Griffo A, Meyer A, Scollo F, Pokrandt B, Muller HM, Ewers H, Hahl H, Fleury JB et al (2022) Cholesterol promotes clustering of PI(4,5)P2 driving unconventional secretion of FGF2. J Cell Biol 221

      Müller HM, Steringer JP, Wegehingel S, Bleicken S, Munster M, Dimou E, Unger S, Weidmann G, Andreas H, Garcia-Saez AJ et al (2015) Formation of Disulfide Bridges Drives Oligomerization, Membrane Pore Formation and Translocation of Fibroblast Growth Factor 2 to Cell Surfaces. J Biol Chem 290: 8925-8937

      Nawrocka D, Krzyscik MA, Opalinski L, Zakrzewska M, Otlewski J (2020) Stable Fibroblast Growth Factor 2 Dimers with High Pro-Survival and Mitogenic Potential. Int J Mol Sci 21

      Netto LES, Machado L (2022) Preferential redox regulation of cysteine-based protein tyrosine phosphatases: structural and biochemical diversity. FEBS J 289: 5480-5504

      Nordzieke DE, Medrano-Fernandez I (2018) The Plasma Membrane: A Platform for Intra- and Intercellular Redox Signaling. Antioxidants (Basel) 7

      Sachl R, Cujova S, Singh V, Riegerova P, Kapusta P, Muller HM, Steringer JP, Hof M, Nickel W (2020) Functional Assay to Correlate Protein Oligomerization States with Membrane Pore Formation. Anal Chem 92: 14861-14866

      Singh V, Macharova S, Riegerova P, Steringer JP, Muller HM, Lolicato F, Nickel W, Hof M, Sachl R (2023) Determining the Functional Oligomeric State of Membrane-Associated Protein Oligomers Forming Membrane Pores on Giant Lipid Vesicles. Anal Chem 95: 8807-8815

      Steringer JP, Lange S, Cujova S, Sachl R, Poojari C, Lolicato F, Beutel O, Muller HM, Unger S, Coskun U et al (2017) Key steps in unconventional secretion of fibroblast growth factor 2 reconstituted with purified components. eLife 6: e28985

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors' finding that PARG hydrolase removal of polyADP-ribose (PAR) protein adducts generated in response to the presence of unligated Okazaki fragments is important for S-phase progression is potentially valuable, but the evidence is incomplete, and identification of relevant PARylated PARG substrates in S-phase is needed to understand the role of PARylation and dePARylation in S-phase progression. Their observation that human ovarian cancer cells with low levels of PARG are more sensitive to a PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation, suggests that low PARG protein levels could serve as a criterion to select ovarian cancer patients for treatment with a PARG inhibitor drug.

      Thank you for the assessment and summary. Please see below for details as we have now addressed the deficiencies pointed out by the reviewers.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      Public Reviews:

      Reviewer #1 (Public Review):

      I have a major conceptual problem with this manuscript: How can the full deletion of a gene (PARG) sensitize a cell to further inhibition by its chemical inhibitor (PARGi) since the target protein is fully absent?

      Please see below for details about this point. Briefly, we found that PARG is an essential gene (Fig. 7). There was residual PARG activity in our PARG KO cells, although the loss of full-length PARG was confirmed by Western blotting and DNA sequencing (Fig. S9). The residual PARG activity in these cells can be further inhibited by PARG inhibitor, which eventually leads to cell death.

      The authors state in the discussion section: "The residual PARG dePARylation activity observed in PARG KO cells likely supports cell growth, which can be further inhibited by PARGi". What does this statement mean? Is the authors' conclusion that their PARG KOs are not true KOs but partial hypomorphic knockdowns? Were the authors working with KO clones or CRISPR deletion in populations of cells?

      The reviewer is correct that our PARG KOs are not true KOs. We were working with CRISPR edited KO clones. As shown in this manuscript, we validated our KO clones by Western blotting, DNA sequencing and MMS-induced PARylation. Despite these efforts and our inability to detect full-length PARG in our KO clones, we suspect that our PARG KO cells may still express one or more active fragments of PARG due to alternative splicing and/or alternative ATG usage.

      As shown in Fig. 7, we believe that PARG is essential for proliferation. Our initial KO cell lines are not complete PARG KO cells and residual PARG activity in these cells could support cell proliferation. Unfortunately, due to lack of appropriate reagents we could not draw solid conclusions regarding the isoforms or the truncated PARG expressed in these cells (Please see Western blots below).

      Are there splice variants of PARG that were not knocked down? Are there PARP paralogues that can complement the biochemical activity of PARG in the PARG KOs? The authors do not discuss these critical issues nor engage with this problem.

      There are five reviewed or potential PARG isoforms identified in the Uniprot database. The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. However, it is likely that sgRNA-mediated genome editing may lead to the creation of new alternatively spliced PARG mRNAs or the use of alternative ATG, which can produce catalytically active forms of PARG. Instead of searching for these putative spliced PARG RNAs, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoform was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that we were still able to detect residual PARG activity in these PARG KO cells. These data clearly indicate that residual PARG activity is present in our KO cells, but the precise nature of these truncated forms of PARG remains elusive.

      Author response image 1.

      These issues have to be dealt with upfront in the manuscript for the reader to make sense of their work.

      We thank this reviewer for his/her constructive comments and suggestions. We will include the data above and additional discussion upfront in our revised manuscript to avoid any further confusion by our readers.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Nie et al investigate the effect of PARG KO and PARG inhibition (PARGi) on pADPR, DNA damage, cell viability, and synthetic lethal interactions in HEK293A and HeLa cells. Surprisingly, the authors report that PARG KO cells are sensitive to PARGi and show higher pADPR levels than PARG KO cells, which are abrogated upon deletion or inhibition of PARP1/PARP2. The authors explain the sensitivity of PARG KO to PARGi through incomplete PARG depletion and demonstrate complete loss of PARG activity when incomplete PARG KO cells are transfected with additional gRNAs in the presence of PARPi. Furthermore, the authors show that the sensitivity of PARG KO cells to PARGi is not caused by NAD depletion but by S-phase accumulation of pADPR on chromatin coming from unligated Okazaki fragments, which are recognized and bound by PARP1. Consistently, PARG KO or PARG inhibition shows synthetic lethality with Pol beta, which is required for Okazaki fragment maturation. PARG expression levels in ovarian cancer cell lines correlate negatively with their sensitivity to PARGi.

      Thank you for your nice comments. The complete loss of PARG activity was observed in PARG complete/conditional KO (cKO) cells. These cKO clones were generated using wild-type cells transfected with sgRNAs targeting the catalytic domain of PARG in the presence of PARP inhibitor.

      Strengths:

      The authors show that PARG is essential for removing ADP-ribosylation in S-phase.

      Thanks!

      Weaknesses:

      1. This begs the question as to the relevant substrates of PARG in S-phase, which could be addressed, for example, by analysing PARylated proteins associated with replication forks in PARG-depleted cells (EdU pulldown and Af1521 enrichment followed by mass spectrometry).

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      1. The results showing the generation of a full PARG KO should be moved to the beginning of the Results section, right after the first Results chapter (PARG depletion leads to drastic sensitivity to PARGi), otherwise, the reader is left to wonder how PARG KO cells can be sensitive to PARGi when there should be presumably no PARG present.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell lines, i.e. the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make it clear.

      1. Please indicate in the first figure which isoforms were targeted with gRNAs, given that there are 5 PARG isoforms. You should also highlight that the PARG antibody only recognizes the largest isoform, which is clearly absent in your PARG KO, but other isoforms may still be produced, depending on where the cleavage sites were located.

      The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends.

      Although the manufacturer's instructions state that the Anti-PARG antibody (66564S) can only recognize isoform 1, this antibody could recognize isoforms 2 and 3, albeit weakly, based on Western blot results with lysates prepared from PARG cKO cells reconstituted with different PARG isoforms, as shown below. As suggested, we will add a statement in the revised manuscript and provide the Western blotting data below.

      Author response image 2.

      To test whether other isoforms were expressed in 293A and/or HeLa cells, we used two independent antibodies that recognize the C-terminus of PARG for WB as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands; some of them were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoforms or truncated forms were expressed in our PARG KO cells.

      Author response image 3.

      1. FACS data need to be quantified. Scatter plots can be moved to Supplementary while quantification histograms with statistical analysis should be placed in the main figures.

      We agree with this reviewer that quantification of FACS data may provide straightforward results in some of our data. However, it is challenging to quantify positive S phase pADPr signaling in some panels, for example in Fig. 3A and Fig. 4C. In both panels, pADPr signaling was detected throughout the cell cycle and therefore it is difficult to know the percentage of S phase pADPr signaling in these samples. Thus, we decided to keep the scatter plots to demonstrate the dramatic and S phase-specific pADPr signaling in PARG KO cells treated with PARGi. We hope that these data are clear and convincing even without any quantification.

      1. All colony formation assays should be quantified and sensitivity plots should be shown next to example plates.

      As suggested, we will include the sensitivity plot next to Fig. 3D. However, other colony formation assays in this study were performed with a single concentration of inhibitor and therefore we will not provide sensitivity plots for these experiments. Nevertheless, the results of these experiments are straightforward and easy to interpret.

      1. Please indicate how many times each experiment was performed independently and include statistical analysis.

      As suggested, we will add this information in the revised manuscript.

      Reviewer #3 (Public Review):

      Here the authors carried out a CRISPR/sgRNA screen with a DDR gene-targeted mini-library in HEK293A cells looking for genes whose loss increased sensitivity to treatment with the PARG inhibitor, PDD00017273 (PARGi). Surprisingly they found that PARG itself, which encodes the cellular poly(ADP-ribose) glycohydrolase (dePARylation) enzyme, was a major hit. Targeted PARG KO in 293A and HeLa cells also caused high sensitivity to PARGi. When PARG KO cells were reconstituted with catalytically-dead PARG, MMS treatment caused an increase in PARylation, not observed when cells were reconstituted with WT PARG or when the PARG KO was combined with PARP1/2 DKO, suggesting that loss of PARG leads to a strong PARP1/2-dependent increase in protein PARylation. The decrease in intracellular NAD+, the substrate for PARP-driven PARylation, observed in PARG KO cells was reversed by treatment with NMN or NAM, and this treatment partially rescued the PARG KO cell lethality. However, since NAD+ depletion with the FK868 nicotinamide phosphoribosyltransferase (NAMPT) inhibitor did not induce a similar lethality the authors concluded that NAD+ depletion/reduction was only partially responsible for the PARGi toxicity. Interestingly, PARylation was also observed in untreated PARG KO cells, specifically in S phase, without a significant rise in γH2AX signals. Using cells synchronized at G1/S by double thymidine blockade and release, they showed that entry into S phase was necessary for PARGi to induce PARylation in PARG KO cells. They found an increased association of PARP1 with a chromatin fraction in PARG KO cells independent of PARGi treatment, and suggested that PARP1 trapping on chromatin might account in part for the increased PARGi sensitivity. They also showed that prolonged PARGi treatment of PARG KO cells caused S phase accumulation of pADPr eventually leading to DNA damage, as evidenced by increased anti-γH2AX antibody signals and alkaline comet assays. 
Based on the use of emetine, they deduced that this response could be caused by unligated Okazaki fragments. Next, they carried out FACS-based CRISPR screens to identify genes that might be involved in cell lethality in WT and PARG KO cells, finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity, whereas loss of PARP1 had the opposite effects. They also found that BER pathway disruption exhibited synthetic lethality with PARGi treatment in both PARG KO cells and WT cells, and that loss of genes involved in Okazaki fragment ligation induced S phase pADPr signaling. In a panel of human ovarian cancer cell lines, PARGi sensitivity was found to correlate with low levels of PARG mRNA, and they showed that the PARGi sensitivity of cells could be reduced by PARPi treatment. Finally, they addressed the conundrum of why PARG KO cells should be sensitive to a specific PARG inhibitor if there is no PARG to inhibit and found that the PARG KO cells had significant residual PARG activity when measured in a lysate activity assay, which could be inhibited by PARGi, although the inhibited PARG activity levels remained higher than those of PARG cKO cells (see below). This led them to generate new, more complete PARG KO cells they called complete/conditional KO (cKO), whose survival required the inclusion of the olaparib PARPi in the growth medium. These PARG cKO cells exhibited extremely low levels of PARG activity in vitro, consistent with a true PARG KO phenotype.

      We thank this reviewer for his/her constructive comments and suggestions.

      The finding that human ovarian cancer cells with low levels of PARG are more sensitive to inhibition with a small molecule PARG inhibitor, presumably due to the accumulation of high levels of protein PARylation (pADPr) that are toxic to cells is quite interesting, and this could be useful in the future as a diagnostic marker for preselection of ovarian cancer patients for treatment with a PARG inhibitor drug. The finding that loss of base excision repair (BER) and DNA repair genes led to increased PARylation and PARGi sensitivity is in keeping with the conclusion that PARG activity is essential for cell fitness, because it prevents excessive protein PARylation. The observation that increased PARylation can be detected in an unperturbed S phase in PARG KO cells is also of interest. However, the functional importance of protein PARylation at the replication fork in the normal cell cycle was not fully investigated, and none of the key PARylation targets for PARG required for S phase progression were identified. Overall, there are some interesting findings in the paper, but their impact is significantly lessened by the confusing way in which the paper has been organized and written, and this needs to be rectified.

      We believe that PARP1 is one of the major relevant PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression, as PARG is known to be recruited to DNA damage sites through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Precisely how PARG regulates S phase progression warrants further investigation.

      As suggested, we will revise our manuscript accordingly and provide additional explanation/statement upfront to avoid any misunderstandings.  

      Reviewer #1 (Recommendations For The Authors):

      1. Figure 1c. Why does the viability of PARG KO cells improve at higher doses of PARGi? How do the authors explain this paradox?

      This phenomenon was observed in 293A PARG KO cells in the CellTiter-Glo assay, especially at the top three PARGi concentrations (100 µM, 33.33 µM and 11.11 µM). It may be due to the low solubility of this PARGi in the medium, since we sometimes observed precipitation at high concentrations when the PARGi stock was diluted in medium.

      1. Figure 2d. The authors show that PARGi reduced NAD+ level by 20%. This reduction in NAD+ probably does not explain the cell death phenotype observed by parthanatos cell death. What pathway is activated by PARGi to induce cell death?

      Since PARGi treatment of PARG KO cells led to uncontrolled pADPr accumulation, it is possible that some of these cells die by parthanatos. However, we did not observe a dramatic reduction in NAD+ level. A previous study showed that Parg(-/-) mouse ES cells predominantly underwent caspase-dependent apoptosis (Shirai et al., 2013). Indeed, PARP1 cleavage was detected in PARG KO cells with prolonged PARGi treatment, indicating that at least some of these cells die by apoptosis (Fig. 2A). The cytotoxicity of PARGi in PARG KO cells may be due to several mechanisms, including apoptosis, parthanatos and NAD+ reduction.

      1. The authors refer to FK866 in the text without explaining what this agent is. FK866 is a noncompetitive inhibitor of nicotinamide phosphoribosyltransferase (NAMPT), a key enzyme in the regulation of NAD+ biosynthesis from the natural precursor nicotinamide. The authors should explain experimental tools in the text as they use them for clarity to the reader.

      Thanks for the suggestion! We will include additional citations and discuss how FK866 works in our revised manuscript.

      1. In addition to these issues, there are significant formatting and textual problems, such that there are multiple gaps in the body of the text that make coherent reading of the manuscript impossible. Examples are: Page 3 line 10. Page 6 line 5 and line 15, Page 7 line 2, 3, and line 8. Page 8, line 1, and line 3 from bottom. Page 9 line 1, line 7 from bottom and line 9 from the bottom, Page 18 of the results in several places, etc. etc. etc. These formatting errors convey the impression that the submitting authors did not adequately review the manuscript for technical problems prior to submission. The authors need to correct these errors.

      Sorry, we will edit the text and remove these gaps as suggested.

      Reviewer #3 (Recommendations For The Authors):

      1. The major problem with this paper is conceptual - namely, how could PARG knockout cells be hypersensitive to a selective PARG small molecular inhibitor. The evidence in Figure 7 that there is measurable residual PARG activity in the so-called PARG KO 293A and HeLa cells provides a partial explanation for why PARG inhibitor treatment might be deleterious to the PARG KO cells, i.e., because PARGi blocks this residual PARG activity. However, although the authors characterized the PARG alleles in the 293A PARG KO cells by sequencing, the molecular origin of the significant level of residual PARG activity remains unclear (see points 7-9).

      Yes, in our study we showed that PARGi treatment inhibited the residual PARG activity in PARG KO cells, which mimics complete loss of PARG as PARG is an essential gene. These data agree with a previous study using Parg(-/-) mouse cells (Koh et al., 2004). We attempted to define the molecular origin of the residual PARG activity, but unfortunately this was challenging (please see below for additional discussions). Nevertheless, we showed that residual PARG activity could be detected in PARG KO cells and, more importantly, that cells with reduced PARG expression or activity are sensitive to PARGi. These results indicate that PARG expression and/or activity may be used as a biomarker for PARGi-based therapy.

      1. Although the most obvious explanation for the PARGi sensitivity data presented in Figures 1-4 is that the PARG KO cells have residual PARG activity, the authors wait until the discussion on page 26 to raise the possibility that the PARG KO cells might have residual PARG activity that renders them sensitive to PARGi. It would be more logical to move the PARG activity data in Figure 7 earlier in the paper as a supplementary figure, so that the reader is not left wondering how a PARG KO cell remains sensitive to a PARG inhibitor. For this reason, it is recommended that the whole paper be reorganized and rewritten to provide a more logical flow that allows the reader to understand what was done, and why it is hard to generate complete PARG KO cells because the accumulation of pADPR adducts is toxic to the cell.

      Thank you for your suggestion! However, we would like to keep the complete PARG KO result at the end of the Results section, since this was how this project evolved. Initially, we did not know that PARG is an essential gene. Thus, we speculated that PARGi may target not only PARG but also a second target, which only becomes essential in the absence of PARG. To test this possibility, we performed FACS-based and cell survival-based whole-genome CRISPR screens (Fig. 5). However, this putative second target was not revealed by our CRISPR screening data (Fig. 5). We then tested the possibility that these cells may have residual PARG expression or activity and that only cells with very low PARG expression are sensitive to PARGi, which turned out to be the case for ovarian cancer cells. Equipped with PARP inhibitor and sgRNAs targeting the catalytic domain of PARG, we finally generated cells with complete loss of PARG activity to prove that PARG is an essential gene (Fig. 7). This series of experiments underscores the challenge of validating any KO cell line, i.e. the identification of frame-shift mutations, absence of full-length proteins, and phenotypic changes may still not be sufficient to validate KO clones. This is an important lesson we learned and we would like to share it with the scientific community.

      To avoid further misunderstanding, we will include additional statements/comments at the end of the “PARG depletion leads to drastic sensitivity to PARGi” section and at the beginning of “CRISPR screens reveal genes responsible for regulating pADPr signaling and/or cell lethality in WT and PARG KO cells”. We hope that our revised manuscript will make this clear.

      1. Exactly how PARG activity would be coordinated with PARP1/2 activity during normal S phase to ensure that PARylation can serve its required function, whatever that may be, and is then removed by PARG is unclear - how would this be orchestrated at the level of a replication fork?

      PARG is known to be recruited to sites of DNA damage through pADPr- and PCNA-dependent mechanisms (Mortusewicz et al., 2011). Our current hypothesis is that PARP1 is one of the major PARG substrates in S phase cells. Previous studies reported that PARP1 recognizes unligated Okazaki fragments and induces S phase PARylation, which recruits single-strand break repair proteins such as XRCC1 and LIG3 that act as a backup pathway for Okazaki fragment maturation (Hanzlikova et al., 2018; Kumamoto et al., 2021). In this study, we revealed that accumulation of PARP1/2-dependent S phase PARylation eventually led to cell death (Fig. 2). Furthermore, we found that chromatin-bound PARP1 as well as PARylated PARP1 increased in PARG KO cells (Fig. S4A and Fig. 4A), suggesting that PARP1 is one of the key substrates of PARG in S phase cells. Of course, PARG may have additional substrates besides PARP1 which are required for its roles in S phase progression. Precisely how PARG regulates S phase progression warrants further investigation.

      1. Figure 2B: What gRNAs were used to generate the 293A and HeLa PARG knock clones, i.e., where are they located in the PARG gene? If they are not in the catalytic domain it might be possible to generate PARG proteins with N-terminal deletions that are still active (see points 8-10 below).

      The two sgRNAs (#1 and #2) used to generate initial PARG KO cells in this manuscript target all three catalytically active isoforms (isoforms 1, 2 and 3), and sgRNA#2 used in HeLa cells also targets isoforms 4 and 5, but these isoforms are considered catalytically inactive according to the Uniprot database. As suggested, we will modify Fig. S1D and the figure legends to show the localization of gRNAs.

      We agree with this reviewer that truncated but active forms of PARG may exist in these KO cells. We attempted to identify these truncated forms of PARG by using two independent antibodies that recognize the C-terminus of PARG for WB, as shown below. Unfortunately, besides full-length PARG, these antibodies also recognized several other bands, some of which were reduced or absent in PARG KO cells, while others were not. Thus, we could not draw a clear conclusion as to which functional isoform or truncated form was expressed in our PARG KO cells. Nevertheless, we directly measured PARG activity in PARG KO cells (Fig. S9) and showed that we were still able to detect residual PARG activity in these cells. Based on these results, we stated that residual PARG activity was detected in our KO cells, but we were not able to specify the truncated variants of PARG in these cells.

      Author response image 4.

      1. Figure 3B/page 19: The authors state that "emetine, which diminishes Okazaki fragments, greatly inhibited S phase pADPr signaling in PARG KO cells", and from this deduced that Okazaki fragments on the lagging strand activate PARylation. However, emetine is not a specific lagging strand synthesis inhibitor, as implied here, but rather a protein synthesis inhibitor, which inhibits Okazaki fragment formation indirectly (see PMID: 36260751). The authors need to rewrite this section to explain how emetine works in this context.

      As suggested, we will cite this reference and discuss how emetine inhibits Okazaki fragment maturation in our revised manuscript. Additionally, we used three different POLA1 inhibitors to diminish Okazaki fragments. As shown in Fig. S3B, all three POLA1 inhibitors significantly abolished S-phase pADPr induced by PARGi in PARG KO cells. Furthermore, POLA1 inhibitors, adarotene and CD437, were able to rescue cell lethality caused by PARGi in PARG KO cells (Fig. 3E).

      1. Figure 7: It is not clear why these cells are called PARG complete/conditional KO cells (cKO). Generally, "conditional knockout" refers to a cell or animal in which a gene can be conditionally knocked out by inducible expression of Cre. Here, it appears that "conditional" refers to the fact that the PARG KO cells only grow in the presence of olaparib - is this the case?

      Yes, we used the name to separate these cells from our initial PARG KO cells. Moreover, we were only able to obtain and maintain these PARG cKO clones with complete loss of PARG activity in the presence of PARP inhibitor. Therefore, we called them PARG complete/conditional KO (cKO) cells.

      1. Figure 7B and D: The level of full-length PARG protein was much lower in the 293A and HeLa cKO cells compared to WT cells consistent with cKO cells representing a more complete PARG KO. The level of PARG protein in the 293A PARG cKO cells was apparently also lower than in the original PARG KO cells, but the KO and cKO samples should be run side by side to demonstrate this conclusively, and the bands need to be quantified. In panel B, it is not clear from the legend what cKO_3 and cKO_4 are, but presumably, they are different clones, and this should be stated.

      Full-length PARG was not detected in either PARG KO or PARG cKO cells by WB. The apparent lower level of endogenous PARG in Fig. 7D was due to the fact that reconstituted cells had high exogenous PARG expression and therefore we had to reduce exposure time for WB.

      As for cKO_3 and cKO_4 in Fig. 7, they are different clones created by different sgRNAs. As suggested, we will include additional information in the figure legends to clearly state which sgRNA was used to generate the respective KO and cKO clones.

      1. Figure S8: There is not enough information here or in the text to allow the reader to interpret these PARG allele sequences obtained from the PARG KO cells. From the Methods section, it appears that the PARG KO cells were clonal, with sequence data from one clone of each of the 293A and HeLa cell PARG KO cells being shown. If this is right, then in both cell types one out of four PARG alleles is wild type, and therefore one would expect the PARG protein signal to be ~25% of that in WT cells. However, based on the 293A PARG KO cells PARG immunoblot in Figure 2B the PARG protein signal is clearly much lower than 25% (these bands need to be quantified), and this discrepancy needs to be explained. What is the level of PARG protein in the PARG KO HeLa cells? If different PARG KO cell clones are analyzed by sequencing, do they all have an apparently intact PARG allele? Four different gRNA target sites in the PARG gene are shown in panel A in Figure 7, but the description in the text regarding how the four gRNAs were used is totally inadequate - were all four used simultaneously or only the two in the catalytic domain? Were pairs of gRNAs used in an attempt to generate a large intervening deletion - some Southern blots of the PARG gene region in the PARG cKO cells are needed to figure this out. The gRNAs are given numbers in Figure 7A, but it is unclear from the sequences shown in Figures S8 and S9 which gRNA sites are shown. All of this has to be clarified, so that the reader can understand the nature of the KO/cKO cells knockout alleles, and what PARG-related products, if any, they can express.

      Yes, all KO and cKO cells used in this study are single clones. As suggested, we will revise the figure legends of Fig. 7, S8 and S9 to include detailed information. To avoid any further misunderstanding, we will relabel the allele “WT” as “WT (reference)” in Fig. S8 and S9. We did not detect intact/wild-type PARG sequence in any single KO/cKO clone by DNA sequencing. Sequencing of single KO/cKO clones was performed by using the TOPO TA Cloning kit. Briefly, genomic DNA was extracted from each single KO/cKO clone. Approximately 300 bp surrounding the sgRNA targeting sequence was amplified by PCR. The PCR product was cloned into the vector, and approximately 10-15 bacterial clones were extracted and sent for sequencing. If any intact/wild-type PARG sequence was detected in these 10-15 bacterial clones, the KO/cKO clone was considered a heterozygous clone and discarded.

      HEK293A and HeLa cells are not diploid and have complex karyotypes. The PARG gene is located on chromosome 10. Karyotyping by M-FISH shows that HeLa cells have 3 copies of chromosome 10 (Landry et al., 2013). HEK293 cells predominantly have 3 copies of chromosome 10, and sometimes 4 copies can be detected by G-banding (Binz et al., 2019). Therefore, it is anticipated that 1 to 4 mutant alleles would be detected in each KO/cKO clone by sequencing.
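
      The sampling depth described above can be put in rough numerical terms. The following is a minimal sketch, not part of the original analysis. It assumes that every allele copy is amplified and cloned with equal efficiency and that each sequenced bacterial clone is an independent draw. The function name and parameters are hypothetical and serve only to illustrate how likely an intact allele would be missed when 10-15 clones are sequenced from a cell line with 3 or 4 copies of chromosome 10.

```python
def prob_miss_wt(copies: int, wt_copies: int, clones: int) -> float:
    """Probability that none of the sequenced bacterial clones carries a
    wild-type insert.

    Assumptions (illustrative only): all allele copies are amplified and
    cloned with equal efficiency, and each sequenced clone is an
    independent draw from the pool of PCR products.
    """
    if copies < 1 or clones < 0 or not 0 <= wt_copies <= copies:
        raise ValueError("invalid copy or clone counts")
    # Chance that a single clone carries a mutant allele, raised to the
    # number of clones sequenced.
    return ((copies - wt_copies) / copies) ** clones


if __name__ == "__main__":
    # Copy numbers (3 or 4) and clone counts (10 or 15) follow the text;
    # one remaining wild-type copy is the hypothetical worst case.
    for copies in (3, 4):
        for clones in (10, 15):
            p = prob_miss_wt(copies, 1, clones)
            print(f"copies={copies} clones={clones} P(miss WT)={p:.4f}")
```

      Under these assumptions, a single remaining wild-type copy would escape detection in only a few percent of cases at this sampling depth. Unequal amplification of alleles, for example of those carrying large deletions, would change this estimate.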

      Only one sgRNA was transfected into cells for the selection of single clones. We did not use paired or multiple sgRNAs in any of these experiments. As shown in Fig. S1D and Fig. 7A, HEK293A derived and HeLa derived PARG KO single clones were generated with the use of different sgRNAs. In addition, the two PARG cKO single clones from HEK293A and HeLa cells were also generated by the use of two different sgRNAs, as shown in Fig. 7A-B. We will include all the information above in the revised manuscript, i.e. in Methods section as well as in figure legends.

      1. Figure S9A: The sequences of the 293A PARG alleles in the cKO cells suggest that these cells also have one intact PARG allele, which again does not fit with the very low level of intact PARG protein shown in Figure 7B. How do the authors explain this?

      Sorry, this is a misunderstanding. The allele “WT” in Fig. S8 and S9 is the reference sequence. We will change it to “Reference sequence” to avoid further confusion. As mentioned above, we did not detect any intact/wild-type PARG sequence in any of our single KO/cKO clones by sequencing.

      1. Figure S9B: These critical lysate activity data show that the PARG KO cells have ~50% of the PARG activity detected in WT cells. However, this is not consistent with the PARG protein level detected in the PARG immunoblot in Figure 1B, which appears to be less than 5% of the PARG protein level in WT cells (with one intact PARG allele in these cells one would theoretically expect ~25%, although this depends on whether all four alleles are expressed equally). One possibility is that active PARG fragments are generated from one or more of the PARG KO alleles in the PARG KO cells. Targeted sequencing of PARG mRNAs might reveal whether there are shorter RNAs that could encode a protein containing the C-terminal catalytic domain (aa 570-910). In addition, the authors need to show the entire immunoblot to determine if there are smaller proteins recognized by the anti-PARG antibodies that might represent shorter PARG gene products (for this we need to know where the epitope against which the PARG antibodies are directed is located within the PARG protein - ideally the authors need to use an antibody directed against an epitope near the C-terminus).

      As stated in the Methods section, we incubated cell lysates with substrates overnight to evaluate the maximum level of pADPr hydrolysis, i.e. PARG activity, that we were able to detect in this assay. It is very likely that the PARG activity in PARG KO cells was much lower than 50% of that in wild-type cells, because the signal from wild-type lysates was saturated. Thus, the data presented in our manuscript probably underestimate the reduction of PARG activity in PARG KO cells. Nevertheless, these data indicate that residual PARG activity was detected in PARG KO cells, whereas this activity was absent in PARG cKO cells.

      As mentioned above, we used two independent antibodies that recognize the C-terminus of PARG for WB. Unfortunately, we could not draw a clear conclusion as to which functional isoforms or truncated proteins were expressed in our PARG KO cells. The dePARylation assay used here may be the best way to test the residual PARG activity in our KO and cKO cells.

      1. Figure 7D: In this experiment, the level of re-expressed WT PARG protein was much higher than that of the endogenous PARG protein (quantification is needed) - how might this affect the interpretation of these experiments (N.B., WT and catalytically-dead PARG were also re-expressed for the experiments shown in Figure 1, but there are no PARG immunoblots to demonstrate how much the exogenous proteins were overexpressed, or activity measurements). If regulated pADPr signaling is important for a normal S phase, then one would have thought that expressing a very high level of active PARG would create problems.

      In Fig. S1E, we blotted endogenous PARG level in control cells and exogenous PARG level in reconstituted cells. The reviewer is correct that exogenous PARG expression was much higher (~10-fold) than that of endogenous PARG in WT control cells. Nevertheless, we did not observe any obvious phenotypes in PARG KO/cKO cells reconstituted with a high level of exogenous PARG, which may reflect excess PARG level/activity in wild-type control cells.

      References:

      Binz, R. L., Tian, E., Sadhukhan, R., Zhou, D., Hauer-Jensen, M., and Pathak, R. (2019). Identification of novel breakpoints for locus- and region-specific translocations in 293 cells by molecular cytogenetics before and after irradiation. Sci Rep 9, 10554.

      Hanzlikova, H., Kalasova, I., Demin, A. A., Pennicott, L. E., Cihlarova, Z., and Caldecott, K. W. (2018). The Importance of Poly(ADP-Ribose) Polymerase as a Sensor of Unligated Okazaki Fragments during DNA Replication. Mol Cell 71, 319-331 e313.

      Koh, D. W., Lawler, A. M., Poitras, M. F., Sasaki, M., Wattler, S., Nehls, M. C., Stoger, T., Poirier, G. G., Dawson, V. L., and Dawson, T. M. (2004). Failure to degrade poly(ADP-ribose) causes increased sensitivity to cytotoxicity and early embryonic lethality. Proc Natl Acad Sci U S A 101, 17699-17704.

      Kumamoto, S., Nishiyama, A., Chiba, Y., Miyashita, R., Konishi, C., Azuma, Y., and Nakanishi, M. (2021). HPF1-dependent PARP activation promotes LIG3-XRCC1-mediated backup pathway of Okazaki fragment ligation. Nucleic Acids Res 49, 5003-5016.

      Landry, J. J., Pyl, P. T., Rausch, T., Zichner, T., Tekkedil, M. M., Stutz, A. M., Jauch, A., Aiyar, R. S., Pau, G., Delhomme, N., et al. (2013). The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda) 3, 1213-1224.

      Mortusewicz, O., Fouquerel, E., Ame, J. C., Leonhardt, H., and Schreiber, V. (2011). PARG is recruited to DNA damage sites through poly(ADP-ribose)- and PCNA-dependent mechanisms. Nucleic Acids Res 39, 5045-5056.

      Shirai, H., Fujimori, H., Gunji, A., Maeda, D., Hirai, T., Poetsch, A. R., Harada, H., Yoshida, T., Sasai, K., Okayasu, R., and Masutani, M. (2013). Parg deficiency confers radio-sensitization through enhanced cell death in mouse ES cells exposed to various forms of ionizing radiation. Biochem Biophys Res Commun 435, 100-106.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors examined the putative functions of hypothalamic groups identifiable through Foxb1 expression, namely the parvofox Foxb1 of the LHA and the PMd Foxb1, with emphasis on innate defensive responses. First, they reported that chemogenetic activation of Foxb1 hypothalamic cell groups led to tachypnea. The authors tend to attribute this effect to the activation of hM3Dq expressed in the parvofox Foxb1 but did not rule out the participation of the PMd Foxb1 cell group, which may also have expressed hM3Dq, particularly considering the large volume (200 nl) of the viral construct injected. It is also noteworthy that the activation of the Foxb1 hypothalamic cell groups in this experiment did not alter gross locomotor activity, such as time spent in an immobile state. This contrasts with the authors' finding on the optogenetic activation of the Foxb1 hypothalamic fibers projecting to the dorsolateral PAG. In the second experiment, the authors applied optogenetic ChR2-mediated excitation of the Foxb1+ cell bodies' axonal endings in the dlPAG leading to freezing and, in a few cases, bradycardia as well. The effective site to evoke freezing was the rostral PAGdl, and fibers positioned either ventral or caudal to this target had no response. Considering the projection pattern of the Foxb1 hypothalamic cell groups to the PAG, the fibers projecting to the rostral PAGdl are likely to arise from the PMd Foxb1 cell group, and not from the parvofox Foxb1 of the LHA. Here it is important to consider that optogenetic ChR2-mediated excitation of the axonal endings is likely to have activated the cell bodies originating these fibers, and one cannot ascertain whether the behavioral effects are related to the activation of the terminals in the PAGdl or the cell bodies originating the projection.

      Authors’ reply: We acknowledge and agree about the possibility of backpropagation in ChR2-mediated terminal stimulation experiments. We have introduced a paragraph in the discussion section discussing this issue. In short, the observation of an opposing phenotype in ArchT3.0 animals indicates that the ChR2-mediated phenotype is indeed specific to the Foxb1 PAG projection, because the use of light-activated proton pumps for terminal stimulation cannot induce backpropagation of an inhibitory effect to the soma. Potential downsides of the use of proton pumps in small compartments such as the axon are also discussed.

      Moreover, activation of the PMd CCK cell group, which consists of around 90% of the PMd cells, evokes escape, and not freezing. According to the present findings, a specific population of PMd Foxb1 cells may be involved in producing freezing. In addition, only a small number of the animals with correct fiber placement presented sudden onset of bradycardia in response to the photostimulation. Considering the authors' findings, the Foxb1+ hypothalamic groups are likely to mediate behavioral responses related to innate defensive responses, where the parvofox Foxb1 of the LHA would be involved in promoting tachypnea and the PMd Foxb1 group in mediating freezing and bradycardia. These findings are very interesting, and, at this point, they need to be tested in a scenario of real exposure to a natural predator.

      Authors’ reply: We fully agree with the proposed experiments. Due to the previously mentioned retirement of Prof. Celio and the concomitant expiration of licenses for animal experimentation we are prevented from conducting these experiments on our own. We have integrated a statement in the discussion, regarding these potential future experiments.

      Reviewer #2 (Public Review):

      The authors aimed to examine the role of a group of neurons expressing Foxb1 in behaviors through projections to the dlPAG. Standard chemogenetic activation or inhibition and optogenetic terminal activation or inhibition at local PAG were used and results suggested that, while activation led to reduced locomotion and breathing, inhibition led to a small degree of increased locomotion.

      The observed effects on breathing are evident and dramatic. However, this study needs significant improvements in terms of data analysis and presentation, and some of the studies seem incomplete; therefore the data may not yet support the conclusion.

      1. Fig.1 has no experimental data and needs to be replaced with detailed pictures from the viral injected mice showing the projections diagrammed.

      Authors’ reply: We believe that this graphic illustration is helpful to the reader to comprehend the spatial relationship between the parvafoxFoxb1 nucleus, the mammillary nuclei, and the PAG. In a previous study we have characterized the projections of the parvafoxFoxb1 nucleus in detail (using the same Foxb1-Cre mouse line as in the present study) and, in this regard, would like to refer Reviewer #2 to this publication (https://onlinelibrary.wiley.com/doi/10.1002/cne.24057).

      1. Fig. 3 needs control pictures and statistical comparison with different conditions in c-Fos. Also expression in other nearby regions needs to be presented to demonstrate the specificity of the expression.

      Authors’ reply: We have modified the original Fig. 3 with more pictures across all three conditions used in the chemogenetic experiments. Since the new figure now takes up a whole page, and because the data in this figure serve to validate the DREADD experiments, we have decided to move it to the supplementary files. The figure is now labelled as “Supplementary File S1”. All figure and file numberings throughout the text have been adjusted accordingly.

      1. Fig. 5, a great effort has been made to illustrate the point that CCK and Foxb1 are differentially expressed. Why not just perform a double in situ experiment to directly illustrate the point?

      Authors’ reply: We have addressed this comment in the initial release of the eLife manuscript. In short, we agree that a double ISH experiment would have been an alternative approach, but would like to state that scRNAseq is a well-established and valid method for this purpose.

      1. Fig. 7 data on optogenetic stimulation on immobility and breathing, since not all mice showed the same phenotype, what is the criterion for allocating these mice to hit or no hit groups? Given the dramatically reduced breathing and locomotion, what is the temperature response? More data needs to be gathered to support that this is a defense behavior.

      Authors’ reply: The criteria for allocation of animals to the experimental groups are described in section “Optogenetic modulation of Foxb1 terminals in the dlPAG induces immobility” and are based on the stereotaxic coordinates of the tips of the glass fiber implants. We did not perform any experiments in which we recorded body temperatures or temperature preferences in optogenetic animals. Such experiments were outside the scope of the study. As mentioned in a previous comment above, we have added an additional paragraph to the discussion section regarding future investigations of these hypothalamic Foxb1 neurons during exposure to natural predators. Such experiments would certainly allow more insight into the defensive nature of the described phenotype.

      1. The authors claim to target dlPAG. However, in the picture shown in Fig. 8C, almost all PAG contains ChR2 fibers and it is likely all the fibers will be activated by light. Thus, as presented, the data does not support the claim of the specificity on dlPAG. Also c-Fos data needs to be presented on the degree of activation of downstream PAG neurons after light exposure.

      Authors’ reply: We attach the original image 8c, without arrows and indications, in which the localization of ChR2-positive fibers in the dlPAG is more clearly visible. They are located exactly under the tip of the optical fiber. We do not know the functional characteristics of the post-synaptic PAG neurons and have not determined their downstream targets experimentally. Investigating the downstream targets was outside the scope of the current publication.

      Author response image 1.

      1. Fig. 9 only showed one case. A statistical comparison needs to be presented.

      Authors’ reply: Our cardiovascular experiments are of exploratory and descriptive nature (i.e. pilot experiments). It was a conscious decision to not perform hypothesis tests on these experiments. We did not have enough mice to perform statistical tests with sufficient statistical power. Providing results from hypothesis tests on these data would lead to statistically unjustified conclusions. To clarify this issue, we have added a paragraph to the relevant results section.

      1. Optogenetic terminal activation in the PAG will likely elicit back-propagation and subsequent activation of additional downstream brain sites of Foxb1 neurons. More experiments need to be done to assess this and as presented, the data does not support the role of PAG necessarily.

      Authors’ reply: Please see our answer to Reviewer #1 regarding the same issue.

      1. The authors claim negative data from PVH-Cre mice. More data need to be presented to make this case.

      Authors’ reply: We would like to refer to our answer to point 6) that was raised by Reviewer #2.

      The conclusion, even as presented, adds to the known evidence of the PAG in the defense behavior.

      Reviewer #1 (Recommendations For The Authors):

      In the pharmacogenetic experiments, the authors need to clarify which Foxb1 hypothalamic cell group presented the activation of hM3Dq. It is important to know whether this activation-producing tachypnea was restricted to the parvofox Foxb1 or also included the PMd Foxb1 group. It would be important to isolate the effect of the pharmacogenetic activation of each one of these Foxb1 hypothalamic cell groups.

      After determining which cell group would be involved in mediating this respiratory effect, it would be nice to discuss the possible pathways involved in this effect.

      In the optogenetic experiments, the authors should differentiate between the effects of the PAG projecting fibers from the PMd and those from the parvofox groups. As it stands, it seems that the freezing and bradycardia depend on projection from the PMd Foxb1 group to the rostral PAGdl. However, considering the large volume (200 nl) of the viral construct injected, both groups were likely to express channelrhodopsin, and it would be important if the authors could restrict the viral injections to each one of the Foxb1 hypothalamic cell groups.

      Authors’ reply: We fully agree with the suggestion, but due to the recent retirement of Prof. Celio we are unfortunately not allowed to conduct any further animal experiments.

      The authors also reported that photoactivation ventral to the PAGdl, possibly in the PAGl did not yield any clear behavioral response. However, as pointed out in the discussion, a recent publication found that the parvofox Foxb1 projection to the lateral PAG drives social avoidance, and we were wondering whether there was any avoidance behavior during the photoactivation of the PAGl fibers.

      Authors’ reply: We did not conduct any social avoidance experiments ourselves. However, we did perform ultrasonic vocalization experiments (unpublished data) in which we optogenetically stimulated Foxb1+ terminals in the PAG. Due to experimental issues related to the age of the tested mice, we did not obtain conclusive results regarding the ultrasonic vocalizations. By a purely observational account, we did not observe any active avoidance during optogenetic stimulation, but rather a cessation of interaction. We are unable to judge whether this was more pronounced in the PAGl targeted mice or not.

      Another important point is that optogenetic ChR2-mediated excitation of the axonal endings is likely to activate the cell bodies originating these fibers, and one cannot ascertain whether the behavioral effects depend on the activation of the terminals in the PAGdl or the activation of the cell bodies originating these terminals. Note, in the present case, PMd cell bodies may also project elsewhere, such as the cuneiform nucleus, known to mediate freezing responses. To circumvent this problem, during photoactivation of the PAGdl terminals, the authors should inhibit the cell bodies originating these terminals.

      Authors’ reply: We would like to refer to the answer we provided above regarding the issue of backpropagation or ChR2-mediated phenotypes and projection-specificity.

      Another important issue is related to the fact that around 90% of the PMd express CCK (Wang et al., 2021), and previous work showed that activation of these cells yielded escape and not freezing (Wang et al., 2021). Although the authors claim that the single-cell RNA sequencing dataset reveals distinct Foxb1 expression in the PMd, these results derive from tissues collected in the posterior hypothalamus, not exactly restricted to the PMd. Therefore, it would be desirable if the authors could show CCK and Foxb1 double-labeled PMd sections to evaluate the exact percentage of cells expressing either one of these peptides.

      Authors’ reply: The tissues for the scRNAseq data were obtained from hypothalamic tissues between stereotaxic coordinates of AP-2.54 to AP-3.16 (please see Fig. 1b in Mickelsen et al. 2020) and not purely from the posterior hypothalamic nucleus. These tissues hence include a large proportion of the PMd neurons. We would like to point out that the expression profile of the PMd cluster matches well with the ISH data from the Allen Brain Atlas that we have put together in “Supplementary File S6” (originally “Supplementary File S5”).

      The authors should also explain why only a small number of animals that received PAGdl photoactivation presented bradycardia. Moreover, they should also discuss the possible pathways mediating this effect. Here, it is important to point out that the cuneiform nucleus, as suggested by the authors as one possible way to mediate this effect, promotes sympathetic vasomotor activity (Verberne, 1995).

      We have added the sentence: “The projections of the cuneiform nucleus to the rostral ventrolateral medulla promote sympathetic vasomotor activity (Verberne 1995).” to the Discussion section.

      Reviewer #2 (Recommendations For The Authors):

      In this reviewer's view, this study needs substantial improvement:

      1. The writing is very sloppy and difficult to follow. There is no clear logic flow in the main text and the figures need substantial realigning for panels, additions of labelling etc.

      We have added the sentence.

      1. Fig. 6 the hot plate data is out of place and should be placed in supplementary or removed completely.

      Authors’ reply: We and others have previously shown that the parvalbumin+ population of the parvafox nucleus is involved in nociceptive behavior. Hence, we believe it is of interest to show that we do not see the same phenotype with the stimulation of the Foxb1+ population of the parvafox nucleus. These data show that the nociceptive component of the parvafox nucleus is confined to its parvalbumin+ population.

      1. The authors discussed social behavior data in the Discussion, but no such data is presented, which is very confusing.

      Authors’ reply: Indeed, we did not perform any experiments to investigate social behavior. However, we discuss that the locomotor phenotype observed upon optogenetic stimulation of Foxb1+ terminals could have led to a bias in the interpretation of the social behavior experiments published elsewhere by others.

      1. The authors discussed a great deal on potential differences between parvafox and PMd Foxb1 neurons, however, no clear data was presented to show a functional difference between them, which is also confusing.

      Authors’ reply: Even though investigations on the functional differences of parvafox and PMd Foxb1 neurons would be highly interesting, it was outside the scope of the current study. Due to the recent retirement of Prof. Celio, we are not allowed to perform any additional animal experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their paper, Kang et al. investigate rigidity sensing in amoeboid cells, showing that, despite their lack of proper focal adhesions, amoeboid migration of single cells is impacted by substrate rigidity. In fact, many different amoeboid cell types can durotax, meaning that they preferentially move towards the stiffer side of a rigidity gradient.

      The authors observed that NMIIA is required for durotaxis and, building on this observation, they generated a model to explain how durotaxis could be achieved in the absence of strong adhesions. According to the model, substrate stiffness alters the diffusion rate of NMIIA, with softer substrates allowing for faster diffusion. This allows for NMIIA accumulation at the back, which, in turn, results in durotaxis.

      The experiments support the main message of the paper regarding durotaxis by amoeboid cells. In my opinion, a few clarifications on the mechanism proposed to explain this phenomenon could strengthen this research:

      (1) According to your model, the rear end of the cell, which is in contact with softer substrates, will have slower diffusion rates of NMIIA. Does this mean that bigger cells will durotax better than smaller cells because the stiffness difference between front and rear is higher? Is it conceivable to attenuate the slope of the durotactic gradient to a degree where smaller cells lose their ability to durotax, while longer cells retain their capacity for directional movement?

      We thank the reviewer for this comment. In fact, it is not always the case that bigger cells will durotax better than smaller cells. Although bigger cells will sense higher stiffness difference between the front and rear, cells placed on different regions of underlying substrates may respond differently. This is because diffusion coefficient difference is not proportional to stiffness difference in our theoretical model. Therefore, when cells are placed on a very stiff substrate, cells may not durotax. When cells are placed on a region with suitable stiffness, where cells are sensitive to stiffness gradient, bigger cells will durotax better than smaller cells. In this situation, as you mentioned, lowering the stiffness gradient will make smaller cells become adurotactic while longer cells still durotax.

      We tried to further address this question with our durotaxis assay, but there was a challenge: the amoeboid cells we use, including CD4+ Naïve T cells, neutrophils, dHL-60 cells and Dictyostelium, frequently protrude, retract and alter their contact area with the substrate, which makes it difficult for us to distinguish between bigger and smaller cells within a particular cell type. Previously reported durotactic cell lines, such as MDA-MB-231 and HT1080 cells, are bigger than the amoeboid cells we use, but they are mesenchymal cells and adopt distinct mechanisms that always involve stable focal adhesions. For this reason, although we are eager to answer this question experimentally and the stiffness gradient is tunable in our system, we have not found an appropriate approach and experimental setup.

      (2) Where did you place the threshold for soft, middle, and stiff regions (Figure 6)? Is it possible that you only have a linear rigidity gradient in the center of your gel and the more you approach the borders, the flatter the gradient gets? In this case, cells would migrate randomly on uniform substrates. Did you perform AFM over the whole length of the gel or just in the central part?

      We thank the reviewer for this comment. We have performed AFM over the whole length of our gradient gel (Fig. S1A). We divide the gel into three equal parts (stiff: 1-4 mm; middle: 4-7 mm; soft: 7-10 mm) and the stiffness gradient is almost linear within each part as shown in Fig. S1A.

      (3) In which region (soft, middle, stiff) did you perform all the cell tracking of the previous figures?

      We thank the reviewer for this question. We performed the cell tracking in the soft region of the gradient gel.

      (4) What is the level of confinement experienced by the cells? Is it possible that cells on the soft side of the gels experience less confinement due to a "spring effect" whereby the coverslips descending onto the cells might exert diminished pressure because the soft hydrogels act as buffers, akin to springs? If this were the case, cells could migrate following a confinement gradient.

      We thank the reviewer for this comment. Although the possibility that our thin hydrogel layers act as buffers cannot be completely excluded, we have performed the durotaxis assay without the upper gel that provides confinement (Author response image 1A). In this case, CD4+ Naïve T cells, neutrophils, dHL-60 cells and Dictyostelium can still durotax (Author response image 1B-E), indicating that the stiffness gradient itself is sufficient to direct amoeboid cell migration.

      Author response image 1.

      Illustration of the durotaxis system without confinement (A) and y-FMI of CD4+ Naïve T cells (B), neutrophils (C), dHL-60 cells (D) and Dictyostelium (E) cultured on uniform substrate or gradient substrate (n ≥ 30 tracks were analyzed for each experiment, N = 3 independent experiments for each condition, replicates are biological). All error bars are SEM. ****, P < 0.0001, by Student’s t-test.
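
      For readers unfamiliar with the y-FMI reported in this figure, the following is a minimal sketch of how a forward migration index along the y axis is commonly computed from a single cell track. It assumes the conventional definition (net displacement along y divided by the accumulated path length) and that the stiffness gradient runs along +y; the function name `y_fmi` and the example tracks are hypothetical and are not taken from the authors' analysis.

```python
from math import hypot


def y_fmi(track):
    """Forward migration index along y for one track.

    `track` is a time-ordered list of (x, y) positions.
    Assumed definition: net y displacement / accumulated path length.
    """
    # Accumulated distance: sum of the lengths of all steps in the track.
    path_length = sum(
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    )
    if path_length == 0:
        return 0.0  # a cell that never moved has no directional bias
    net_y = track[-1][1] - track[0][1]
    return net_y / path_length


# Hypothetical tracks (arbitrary units), for illustration only.
straight_up = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
diagonal = [(0.0, 0.0), (3.0, 4.0)]
round_trip = [(0.0, 0.0), (0.0, 1.0), (0.0, 0.0)]

fmi_values = [y_fmi(t) for t in (straight_up, diagonal, round_trip)]
```

      Under this definition the index lies between -1 and 1, so a population mean near 0 indicates random migration and a positive mean indicates a bias towards the +y side.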

      Reviewer #2 (Public Review):

      Summary:

      The authors developed an imaging-based device that provides both spatial confinement and stiffness gradient to investigate if and how amoeboid cells, including T cells, neutrophils, and Dictyostelium, can durotax. Furthermore, the authors showed that the mechanism for the directional migration of T cells and neutrophils depends on non-muscle myosin IIA (NMIIA) polarized towards the soft-matrix-side. Finally, they developed a mathematical model of an active gel that captures the behavior of the cells described in vitro.

      Strengths:

      The topic is intriguing as durotaxis is essentially thought to be a direct consequence of mechanosensing at focal adhesions. To the best of my knowledge, this is the first report on amoeboid cells that do not depend on FAs to exert durotaxis. The authors developed an imaging-based durotaxis device that provides both spatial confinement and stiffness gradient and they also utilized several techniques such as quantitative fluorescent speckle microscopy and expansion microscopy. The results of this study have well-designed control experiments and are therefore convincing.

      Weaknesses:

      Overall this study is well performed but there are still some minor issues I recommend the authors address:

      (1) When using NMIIA/NMIIB knockdown cell lines to distinguish the role of NMIIA and NMIIB in amoeboid durotaxis, it would be better if the authors took compensatory effects into account.

      We thank the reviewer for this suggestion. We have investigated the compensation of myosin in NMIIA and NMIIB KD HL-60 cells using Western blot and added this result in our updated manuscript (Fig. S4B, C). The results showed that the level of NMIIB protein in NMIIA KD cells doubled while there was no compensatory upregulation of NMIIA in NMIIB KD cells. This is consistent with our conclusion that NMIIA rather than NMIIB is responsible for amoeboid durotaxis since in NMIIA KD cells, compensatory upregulation of NMIIB did not rescue the durotaxis-deficient phenotype.

      (2) The expansion microscopy assay is not clearly described and some details are missed such as how the assay is performed on cells under confinement.

      We thank the reviewer for this comment. We have updated details of the expansion microscopy assay in our revised manuscript in lines 481-485, including how the assay is performed on cells under confinement:

      Briefly, CD4+ Naïve T cells were seeded on a gradient PA gel with another upper gel providing confinement. 4% PFA was used to fix cells for 15 min at room temperature. After fixation, the upper gradient PA gel was carefully removed and the bottom gradient PA gel with seeded cells was immersed in an anchoring solution containing 1% acrylamide and 0.7% formaldehyde (Sigma, F8775) for 5 h at 37 °C.

      (3) In this study, an active gel model was employed to capture experimental observations. Previously, some active nematic models were also considered to describe cell migration, which is controlled by filament contraction. I suggest the authors provide a short discussion on the comparison between the present theory and those prior models.

      We thank the reviewer for this suggestion. Active nematic models have been employed to recapitulate many phenomena during cell migration (Nat Commun., 2018, doi: 10.1038/s41467-018-05666-8). The active nematic model describes the motion of cells using the orientation field, Q, and the velocity field, u. The director field n (with n and −n equivalent) is employed to represent the nematic state, which has head-tail symmetry. However, in our experiments, actin filaments are clearly polarized: they polymerize and flow towards the direction of cell migration. Therefore, we chose an active gel model, which describes the polarized actin field during cell migration. In the discussion, we have compared the active gel model with the motor-clutch model. We have also added a short comparison between the present model and the active nematic model in the main text (lines 345-347):

      The active nematic model employs active extensile or contractile agents to push or pull the fluid along their elongation axis to simulate cells flowing (61).

      (4) In the present model, actin flow contributes to cell migration while myosin distribution determines cell polarity. How does this model couple actin and myosin together?

      We thank the reviewer for this question. In our model, the polarization field P(r,t) is employed to couple actin and myosin together. Actin accumulates at the front while myosin diffuses in the opposite direction. Therefore, we propose that actin and myosin flow in opposite directions, which is captured in the convection terms of the actin (∇·[c(v+wP)]) and myosin (∇·[m(−wP)]) density fields.
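
      To make the role of the two convection terms concrete, here is a toy one-dimensional sketch, not the authors' active gel model. It assumes a uniform polarization P = +1 pointing towards the cell front, sets the velocity v to zero, uses a single constant diffusion coefficient for both species, and imposes zero flux at both ends of the cell. All parameter values and function names are hypothetical. Under these assumptions a density c advected with +w piles up at the front, while a density m advected with −w piles up at the rear.

```python
def advect_diffuse(rho, velocity, diffusion, dx, dt, steps):
    """Explicit finite-volume update of
        d(rho)/dt = -d/dx [ velocity * rho - diffusion * d(rho)/dx ]
    with zero flux through both ends of the domain (mass is conserved)."""
    rho = list(rho)
    n = len(rho)
    for _ in range(steps):
        # flux[i] is the flux through the face on the left of cell i;
        # flux[0] and flux[n] stay at zero (closed boundaries).
        flux = [0.0] * (n + 1)
        for i in range(1, n):
            # Upwind value for the convective part, central difference
            # for the diffusive part.
            upwind = rho[i - 1] if velocity > 0 else rho[i]
            flux[i] = velocity * upwind - diffusion * (rho[i] - rho[i - 1]) / dx
        for i in range(n):
            rho[i] -= dt / dx * (flux[i + 1] - flux[i])
    return rho


def centre_of_mass(rho, dx):
    """Density-weighted mean position, using cell-centre coordinates."""
    total = sum(rho)
    return sum((i + 0.5) * dx * r for i, r in enumerate(rho)) / total


# Hypothetical, dimensionless parameters chosen only to keep the
# explicit scheme stable (2*D*dt/dx**2 + w*dt/dx < 1).
n_cells = 50
dx = 1.0 / n_cells      # cell occupies x in [0, 1]; front is at x = 1
dt = 0.01
w = 0.05                # convection speed along the polarization
D = 0.01                # diffusion coefficient (same for both species here)
uniform = [1.0] * n_cells

actin = advect_diffuse(uniform, +w, D, dx, dt, steps=2000)   # term c * (w P)
myosin = advect_diffuse(uniform, -w, D, dx, dt, steps=2000)  # term m * (-w P)

actin_com = centre_of_mass(actin, dx)
myosin_com = centre_of_mass(myosin, dx)
```

      In this sketch the shared field P is what ties the two species together: reversing P would swap which end each species accumulates at. The stiffness dependence of myosin diffusion described in the response is deliberately left out.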

      Reviewing Editor (Recommendations For The Authors):

      We suggest that you cite the publication about confinement force microscopy from the Betz lab (https://doi.org/10.1101/2023.08.22.554088).

      We thank the editor for this suggestion. We have cited this publication in line 89 in our updated manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Minor points and text corrections:

      - In line 288 you state that NMIIA basal diffusion rate is larger on softer substrates, while in line 315 you say that NMIIA is more diffusive on stiff. The two sentences seem to contradict each other.

      We thank the reviewer for pointing out this mistake. In our active gel model, the basal diffusion rate of NMIIA is larger on stiffer substrate. We have corrected this mistake in line 288 (line 283 in the updated manuscript) in our revised manuscript.

      - How were the non-muscle myosin images (Figure 3F) collected?

      We thank the reviewer for this question. The non-muscle myosin images in Fig. 3F are single planes collected by epifluorescence-confocal microscopy. We have updated the related method in our revised manuscript in lines 477-478:

      After the mounting medium had solidified, single-plane images were captured using a 63×, 1.4 NA objective lens on an Andor Dragonfly epi-fluorescence confocal imaging system.

      - Is there a quantification of NMIIA accumulation at the back?

      We thank the reviewer for this question. NMIIA distribution is quantified in Fig. 3G. We measured the fluorescence intensity of NMIIA and NMIIB in the soft and stiff regions of cells and found that the soft/stiff fluorescence ratio is about 0.95 for NMIIB and about 1.82 for NMIIA, indicating that NMIIA tends to localize at the back while NMIIB is evenly distributed between the soft and stiff regions of cells.

      - At which frequency were images acquired for Fluorescent Speckle Microscopy? Overall, I think it would help to state the length and frequency of videos in the legends.

      We thank the reviewer for this comment. We have updated the length (10 min for movies 6-10 and 80 sec for movie 11) and frequency (15 sec intervals for movies 6-10 and 2 sec intervals for movie 11) of Fluorescent Speckle Microscopy videos in our revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The cell contour of Figure S5C is not very clear.

      We thank the reviewer for this comment. We have marked the outline of the cell in Fig. S5C in our updated manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) to compare sleep-associated traits between genetic mutants with those obtained from a panel of small molecules to promote the identification of affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss-of-function in late-onset AD genes universally results in night-time sleep loss, consistent with the well supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. psen1, an early-onset associated AD gene, which the authors find is principally responsible for the generation of Aβ40 and Aβ42 in zebrafish, also shows a slight increase in activity at night and slight decreases in night-time sleep. Conversely, psen2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, while betamethasone is identified as a potential therapeutic to promote reversal of psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, are night-time sleep loss phenotypes observed in all knockouts for late-onset AD genes in the larval zebrafish a valid proxy for AD risk?

      We cannot say, but it is an interesting question. We selected the four late-onset Alzheimer’s risk genes (APOE, CD2AP, CLU, SORL1) based on human genetics data and brain expression in zebrafish larvae, not based on their likelihood to modify sleep behaviour, which we could have tried by searching for overlaps with GWAS of sleep phenotypes, for example. Consequently, we find it remarkable that all four of these genes caused a night-time sleep phenotype when mutated. We also find it reassuring that knockout of appa/appb and psen2 did not cause a night-time sleep phenotype, which largely excludes the possibility that the phenotype is a technical artefact (e.g. caused by the F0 knockout method) or a property of every gene expressed in the larval brain.

      Having said that, it could still be a coincidence, rather than a special property of genes associated with late-onset AD. In addition to testing additional late-onset Alzheimer’s risk genes, the ideal way to answer this question would be to test in parallel a random set of genes expressed in the brain at this stage of development. From this random set, one could estimate the proportion of genes that cause a night-time sleep phenotype when mutated. One could then use that information to test whether late-onset Alzheimer’s risk genes are indeed enriched for genes that cause a night-time sleep phenotype when mutated.
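
      The enrichment test outlined in this paragraph could be sketched as a one-sided Fisher's exact test comparing the late-onset risk genes with a random set of brain-expressed genes. Only the count of 4 out of 4 risk genes comes from this study; the counts for the random set below are placeholders, not data, and the function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(hits_a, n_a, hits_b, n_b):
    """One-sided Fisher's exact test.

    Probability of seeing at least `hits_a` phenotype-positive genes in
    group A (size `n_a`) if group A were drawn at random from the pooled
    genes of groups A and B (hypergeometric null).
    """
    total = n_a + n_b
    total_hits = hits_a + hits_b
    denominator = comb(total, n_a)
    upper = min(n_a, total_hits)
    return sum(
        comb(total_hits, k) * comb(total - total_hits, n_a - k)
        for k in range(hits_a, upper + 1)
    ) / denominator


# Group A: the four late-onset risk genes, all with a night-time sleep phenotype.
# Group B: PLACEHOLDER numbers for a random gene set (5 of 50 with a phenotype).
p_value = fisher_one_sided(hits_a=4, n_a=4, hits_b=5, n_b=50)
```

      With only four risk genes, the achievable p-value depends heavily on how many random genes are tested and how often they give a phenotype, which is one reason testing additional late-onset risk genes would also help.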

      For those mutants that cause night-time sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes?

      To attempt to answer these questions, we used ZOLTAR to generate predictions for all the knockout behavioural fingerprints presented in the study, in the same way as for sorl1 in Fig. 5 and Fig. 5–supplement 1. Here are the indications, targets, and KEGG pathways which are shared by the largest number of knockouts (Author response image 1):

      – One indication is shared by 4/7 knockouts: “opioid dependence” (significant for appa/appb, psen1, apoea/apoeb, cd2ap).

      – Four targets are shared by 4/7 knockouts: “strychnine-binding glycine receptor” (psen1, apoea/apoeb, clu, sorl1); “neuronal acetylcholine receptor beta-2” (psen1, apoea/apoeb, cd2ap, clu); thyroid peroxidase (psen1, apoea/apoeb, cd2ap, clu); carbonic anhydrase IV (appa/appb, psen1, psen2, cd2ap).

      – Three KEGG pathways are shared by 5/7 knockouts: “cholinergic synapse” (psen1, apoea/apoeb, cd2ap, clu, sorl1); tyrosine metabolism (psen2, apoea/apoeb, cd2ap, clu, sorl1); and “nitrogen metabolism” (appa/appb, psen1, psen2, apoea/apoeb, cd2ap).

      As a reminder, we hypothesised that loss of Sorl1 affected serotonin signalling based on the following annotations being significant: indication “depression”, target “serotonin transporter”, and KEGG pathway “serotonergic synapse”. Indication “depression” is only significant for sorl1 knockouts; target “serotonin transporter” is also significant for appa/appb and psen2 knockouts; and KEGG pathway “serotonergic synapse” is also significant for psen2 knockouts. ZOLTAR therefore does not predict serotonin signalling to be a major theme common to all mutants with a night-time sleep loss phenotype.

      Particularly interesting is cholinergic signalling appearing in the most common targets and KEGG pathways. Acetylcholine signalling is a major theme in research on AD. For example, the first four drugs ever approved by the FDA to treat AD were acetylcholinesterase inhibitors, which increase acetylcholine signalling by preventing its breakdown by acetylcholinesterase. These drugs are generally considered only to treat symptoms and not modify disease course, but this view has been called into question (Munoz-Torrero, 2008; Relkin, 2007). If, as ZOLTAR suggests, mutations in several Alzheimer’s risk genes affect cholinergic signalling early in development, this would point to a potential causal role of cholinergic disruption in AD.

      Author response image 1.

      Common predictions from ZOLTAR for the seven Alzheimer’s risk genes tested. Predictions from ZOLTAR which are shared by multiple knockout behavioural fingerprints presented in the study. Only indications, targets, and KEGG pathways which are significant for at least three of the seven knockouts tested are shown, ranked from the annotations which are significant for the largest number of knockouts.
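
      The tallying behind this figure can be sketched as follows. This is an illustration in Python, not ZOLTAR's code; the per-knockout sets are transcribed from the KEGG pathways named in this response only (so they are incomplete lists of what is significant for each knockout), and the function name `shared_annotations` is hypothetical.

```python
from collections import Counter

# KEGG pathways reported as significant in this response, per knockout.
# Transcribed from the text above; not a full ZOLTAR output.
significant = {
    "appa/appb": {"nitrogen metabolism"},
    "psen1": {"cholinergic synapse", "nitrogen metabolism"},
    "psen2": {"tyrosine metabolism", "nitrogen metabolism", "serotonergic synapse"},
    "apoea/apoeb": {"cholinergic synapse", "tyrosine metabolism", "nitrogen metabolism"},
    "cd2ap": {"cholinergic synapse", "tyrosine metabolism", "nitrogen metabolism"},
    "clu": {"cholinergic synapse", "tyrosine metabolism"},
    "sorl1": {"cholinergic synapse", "tyrosine metabolism", "serotonergic synapse"},
}


def shared_annotations(significant, min_knockouts=3):
    """Count in how many knockouts each annotation is significant, keep
    those reaching `min_knockouts`, and rank by count (ties by name)."""
    counts = Counter(
        annotation
        for annotations in significant.values()
        for annotation in annotations
    )
    kept = [(name, n) for name, n in counts.items() if n >= min_knockouts]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


ranked = shared_annotations(significant)
```

      With the threshold of three knockouts used in the figure, “serotonergic synapse” (significant for sorl1 and psen2 only) drops out, while the three pathways shared by 5/7 knockouts remain.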

      Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors.

      Yes, absolutely. The behavioural dataset we used (Rihel et al., 2010) did not include stimuli other than day/night light transitions, but the “SauronX” platform and dataset (Myers-Turnbull et al., 2022) seems particularly well suited for this. To provide some context, we and collaborators have occasionally used the dataset by Rihel et al. (2010) to generate hypotheses or find candidate drugs that reverse a behavioural phenotype measured in the sleep/wake assay (Ashlin et al., 2018; Hoffman et al., 2016). The present work was the occasion to enable a wider and more intuitive use of this dataset through the ZOLTAR app, which has already proven successful. Future versions of ZOLTAR may seek to incorporate larger drug datasets using more types of measurements.

      Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes.

      While working on the Author Response, we made some changes to the analysis run by ZOLTAR to calculate enrichments (see Methods and github.com/francoiskroll/ZOLTAR, notes on v2). With the new version, 5-HT receptor type 2 is not a significantly enriched target for the sorl1 knockout fingerprint but type 4 is. 5-HT receptor type 4 was also shown to interact with sorting nexin 27, a subunit of retromer, so is a promising candidate (Joubert et al., 2004). Antibodies against human 5-HT receptor type 2 and 4a exist; whether they would work in zebrafish remains to be tested. In our experience, the availability of antibodies suitable for immunohistochemistry in the zebrafish is a serious experimental roadblock.

      Note, all the results presented in the “Version of Record” are from ZOLTAR v2.

      Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that night-time sleep is disrupted in mutants for multiple late onset AD-related genes.

      - Provides potential mechanistic insights for how AD-related genes might impact sleep and identifies a few drugs that modify their identified phenotypes.

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

      Thank you for these excellent suggestions, please see our answers above.

      Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing the larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, such as by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both behavior and the cellular phenotype. If the connection to serotonin in the sorl1 was more complete, for example, the overarching idea would be more compelling.

      Thank you for this cogent criticism.

      On the first point, we were careful not to claim that betamethasone normalises the molecular/cellular mechanism that causes the psen2 behavioural phenotype. Having said that, yes, to a certain extent that would be the hope of the approach. As you say, every compound which normalises the behavioural fingerprint will not normalise the underlying mechanism, but the opposite seems true: every compound that normalises the underlying mechanism should also normalise the behavioural fingerprint. We think this logic makes the “behaviour-first” approach innovative and interesting. The logic is to discover compounds that normalise the behavioural phenotype first, only subsequently test whether they also normalise the molecular mechanism, akin to testing first whether a drug resolves the symptoms before testing whether it actually modifies disease course. While in practice testing thousands of drugs in sufficient sample sizes and replicates on a mutant line is challenging, the dataset queried through ZOLTAR provides a potential shortcut by shortlisting in silico compounds that have the opposite effect on behaviour.
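      The in silico shortlisting idea can be illustrated with a minimal sketch. It assumes that fingerprints are plain vectors of z-scores and that cosine similarity is the comparison metric, as in the replication analysis reported in this response. The function names and compound labels are hypothetical and this is not the ZOLTAR code itself.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two fingerprints (-1 = opposite, 1 = same)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def shortlist_opposite(mutant_fp, compound_fps, top_n=3):
    """Rank compounds from most anti-correlated to most correlated with the
    mutant fingerprint and return the first top_n as (name, cosine) pairs."""
    scored = sorted(
        (cosine_similarity(mutant_fp, fp), name)
        for name, fp in compound_fps.items()
    )
    return [(name, cos) for cos, name in scored[:top_n]]

# Hypothetical two-parameter fingerprints, for illustration only.
mutant = [1.0, -1.0]
library = {
    "compound_A": [1.0, -1.0],   # same direction as the mutant
    "compound_B": [-2.0, 2.0],   # opposite direction: candidate for rescue
    "compound_C": [1.0, 1.0],    # orthogonal
}
candidates = shortlist_opposite(mutant, library, top_n=2)
```

      Compounds at the top of such a list would only be candidates; whether they rescue the phenotype still has to be tested experimentally.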

      You mention a “reduction in movement bouts” but note here that the number of behavioural parameters tested is key to our argument. To take the two extremes, say the only behavioural parameter we measured in psen2 knockout larvae was time active during the day, then, yes, any stimulant used at the right concentration could probably normalise the phenotype. In this situation, claiming that the stimulant is likely to also normalise the underlying mechanism, or even that it is a genuine “phenotypic rescue”, would not be convincing. Conversely, say we were measuring thousands of behavioural parameters under various stimuli, such as swimming speed, position in the well, bout usage, tail movements, and eye angles, it seems almost impossible for a compound to rescue most parameters without also normalising the underlying mechanism. The present approach is somewhere in between: ZOLTAR uses six behavioural parameters for prediction (e.g. Fig. 6a), but all 17 parameters calculated by FramebyFrame can be used to assess rescue during a subsequent experiment (Fig. 6c). For both, splitting each parameter into day and night increases the resolution of the approach, which partly answers your criticism. For example, betamethasone rescued the day-time hypoactivity without causing night-time hyperactivity, so we are not making the “straw man argument” explained above of using any broad stimulant to rescue the hypoactivity phenotype.

      Furthermore, for diseases where the behavioural defect is the primary concern, such as autism or bipolar disorder, perhaps this behaviour-first approach is all that is needed, and whether or not the compound precisely rescues the underlying mechanism is somewhat secondary. The use of lithium to prevent manic episodes in bipolar disorder is a good example. It was initially tested because mania was thought to be caused by excess uric acid and lithium can dissolve uric acid (Mitchell and Hadzi-Pavlovic, 2000). The theory is now discredited, but lithium continues to be used without a precise understanding of its mode of action. In this example, behavioural rescue alone, assuming the secondary effects are tolerable, is sufficient to be beneficial to patients, and whether it modulates the correct causal pathway is secondary.

      On the second point, we agree that testing first ZOLTAR on a mutant for which we have a fairly good understanding of the mechanism causing the behavioural phenotype could have been a productive approach. Note, however, that examples already exist in the literature (Ashlin et al., 2018; Hoffman et al., 2016). The example from Hoffman et al. (2016) is especially convincing. Drugs generating behavioural fingerprints that positively correlate with the cntnap2a/cntnap2b double knockout fingerprint were enriched with NMDA and GABA receptor antagonists. In experiments analogous to our citalopram and fluvoxamine treatments (Fig. 5c,d and Fig. 5–supplement 1c,d), cntnap2a/cntnap2b knockout larvae were overly sensitive to the NMDA receptor antagonist MK-801 and the GABAA receptor antagonist pentylenetetrazol (PTZ). Among other drugs tested, zolpidem, a GABAA receptor agonist, caused opposite effects on wild-type and cntnap2a/cntnap2b knockout larvae. Knockout larvae were found to have fewer GABAergic neurons in the forebrain. While these studies did not use precisely the same analysis that ZOLTAR runs, they used the same rationale and behavioural dataset to make these predictions (Rihel et al., 2010), which shows that approaches like ZOLTAR can point to causal processes.

      On your last point, we hope our experiment testing fluvoxamine, another selective serotonin reuptake inhibitor (SSRI), makes the connection between Sorl1 and serotonin signalling more convincing.

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers were also from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      We mostly agree with this criticism. One could interpret the larger spread of the data for sorl1 KO larvae treated with 10 µM citalopram as evidence that the knockout larvae do indeed react differently to the drug at this dose, regardless of being driven by a subset of the animals. The result indeed does not survive removing the top 3 (p = 0.18) or top 5 (p = 0.87) sorl1 KO + 10 µM larvae, but this amounts to excluding 21% (3/14) or 36% (5/14) of the datapoints as potential outliers, which is unreasonable. In fact, excluding the top 5 sorl1 KO + 10 µM is equivalent to calling any datapoint with z-score > 0.2 an outlier (z-scores of the top 5 datapoints are 0.2–1.8). Applying consistently the same criterion to the scrambled + 10 µM group would remove the top 6 datapoints (z-scores = 0.5–3.9). Comparing the resulting two distributions again gives the sorl1 KO + 10 µM distribution as significantly higher (p = 0.0015). We would also mention that Euclidean distance, as a summary metric for distance between behavioural fingerprints, has limitations. For example, the measure will be more sensitive to changes in some parameters but not others, depending on how much room there is for a given parameter to change. We included this metric to lend support to the observation one can draw from the fingerprint plot (Fig. 5c) that sorl1 mutants respond in an exaggerated way to citalopram across many parameters, while being agnostic to which parameter might matter most.
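      The outlier argument above can be made concrete with a short sketch of applying one z-score cutoff consistently to a group. It assumes z-scores are computed within each group using the sample standard deviation; the exact definition used for the numbers quoted above is not stated here. The values below are made up for illustration.

```python
from statistics import mean, stdev

def zscores(values):
    """Within-group z-scores (assumption: sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def drop_above_cutoff(values, cutoff):
    """Keep only datapoints whose within-group z-score is <= cutoff."""
    return [v for v, z in zip(values, zscores(values)) if z <= cutoff]

# Made-up distances for one group; a cutoff of 0.2 removes only the largest.
group = [1.0, 2.0, 3.0, 4.0, 10.0]
kept = drop_above_cutoff(group, cutoff=0.2)
```

      With a cutoff this low, a sizeable share of any roughly symmetric group is labelled an outlier, which is why the same criterion has to be applied to both groups before comparing them.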

      Given that the HCR did not reveal anything striking, we agree with you that too much of our argument relied on this result being robust. As you and Reviewer #3 suggested, we repeated this experiment with a different SSRI, fluvoxamine (Fig. 5–supplement 1). We cannot readily explain why the result was opposite to what we found with citalopram, but in both cases sorl1 knockout larvae reacted differently than their control siblings, which adds an argument to our claim that ZOLTAR correctly predicted serotonin signalling as a disrupted pathway from the behavioural fingerprint. Accordingly, we mostly kept the Discussion on Sorl1 the same, although we concede that we may not have identified the molecular mechanism.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities would be helpful or why the second hypothesis was not tested.

      There are no strong scientific reasons why this hypothesis was not tested. The lead author (F Kroll) moved to a different lab and country so the project was finalised at that time. We do not plan on testing this hypothesis at this stage. However, we adapted the wording to make it clear this is one possible alternative hypothesis which could be tested in the future. The small differences found by HCR are actually more in line with the new results from the fluvoxamine experiment, so it may also be that both hypotheses (pre-synaptic neurons releasing less serotonin when reuptake is blocked; or post-synaptic neurons being less sensitive) contribute. The fluvoxamine experiment was performed in a different lab (ICM, Paris; all other experiments were done in UCL, London) in a different wild-type strain (TL in ICM, AB x Tup LF in UCL), which complicates how one interprets this discrepancy.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      For the first part of this point, please see below our answer to Reviewer #3, point (2) c.

      Regarding the F0 strategy potentially adding variability, it is an interesting question which we tested in a larger dataset of behavioural recordings from F0 and stable knockouts for the same genes (unpublished). In summary, the F0 knockout method does not increase clutch-to-clutch or larva-to-larva variability in the assay. F0 knockout experiments found many more significant parameters and larger effect sizes than stable knockout experiments, but this difference could largely be explained by the larger sample sizes of F0 knockout experiments. In fact, larger sample sizes within individual clutches appear to be a major advantage of the F0 knockout approach over in-cross of heterozygous knockout animals as they increase sensitivity of the assay without causing substantial variability. We plan to report in more detail on this analysis in a separate paper as we think it would dilute the focus of the present work.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to be clarified that any findings could be the opposite of expectation in AD.

      On “there is no reason to expect similarity […]”, we disagree. Knockout of appa/appb and knockout of psen1 will both result in loss of Aβ (appa/appb encode Aβ and psen1 cleaves Appa/Appb to release Aβ, cf. Fig. 3e). Consequently, a phenotype caused by the loss of Aβ, or possibly other Appa/Appb cleavage products, should logically be found in both appa/appb and psen1 knockouts.

      On “it is well known that the upregulation of APP is the driver of Alzheimer’s, not downregulation”; we of course agree. Among others, the examples of Down syndrome, APP duplication (Sleegers et al., 2006), or mouse models overexpressing human APP show definitively that overexpression of APP is sufficient to cause AD. Having said that, we would not be so quick in dismissing APP knockout as potentially relevant to understanding of AD.

      Loss of soluble Aβ due to aggregation could contribute to pathology (Espay et al., 2023). Without getting too much into this intricate debate, links between levels of Aβ and risk of disease are often counter-intuitive too. For example, out of 138 PSEN1 mutations screened in vitro, 104 reduced total Aβ production and 11 even seemingly abolished the production of both Aβ40 and Aβ42 (Sun et al., 2017). In short, loss of soluble Aβ occurs in both AD and in our appa/appb knockout larvae.

      We added a sentence in Results (section psen2 knockouts […]) to briefly justify our appa/appb knockout approach. To be clear, we do not want to imply, for example, that the absence of a night-time sleep phenotype for appa/appb is contradictory to the body of literature showing links between Aβ and sleep, including in zebrafish (Özcan et al., 2020). As you say, our experiment tested loss of App, including Aβ, while the literature typically reports on overexpression of APP, as in APP/PSEN1-overexpressing mice (Jagirdar et al., 2021).

      Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed. A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      We agree that some parameters do not tell something intuitive about the biology of the animal. It would be easy to speculate. For example, the “activity slope” parameter may indicate how quickly the animal becomes tired over the course of the day. On the other hand, fractal dimension describes the “roughness/smoothness” of the larva’s activity trace (Fig. 2–supplement 1a); but it is not obvious how to translate this into information about the physiology of the animal. We do not see this as an issue though. While some parameters do provide intuitive information about the animal’s behaviour (e.g. sleep duration or sunset startle as a measure of startle response), the benefit of having a large number of behavioural parameters is to compare behavioural fingerprints and assess rescue of the behavioural phenotype by small molecules (Fig. 6c). For this purpose, the more parameters the better. The “MoSeq” approach from Wiltschko et al., 2020 is a good example from literature that inspired our own Fig. 6c. While some of the “behavioural syllables” may be intuitive (e.g. running or grooming), it is probably pointless to try to explain the ‘meaning’ of the “small left turn in place with head motion” syllable (Wiltschko et al., 2020). Nonetheless, this syllable was useful to assess whether a drug specifically treats the behavioural phenotype under study without causing too many side effects. Unfortunately, ZOLTAR has to reduce the FramebyFrame fingerprint (17 parameters) to just six parameters to compare it to the behavioural dataset from Rihel et al., 2010, but here, more parameters would almost certainly translate into better predictions too, regardless of their intuitiveness.

      It is true however that we did not give much information on how some of the less intuitive parameters, such as activity slope or fractal dimension, are calculated or what they describe about the dataset (e.g. roughness/smoothness for fractal dimension). We added a few sentences in the legend of Fig. 2–supplement 1.

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite a psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      We unfortunately do not have those lines. We investigated importing a psen2 knockout line from abroad, but the process of shipping live animals is becoming more and more cost- and time-prohibitive. However, we observed the same pigmentation phenotype for psen2 knockouts as reported by Jiang et al., 2018, which is at least a partial confirmation of phenocopying a loss-of-function stable mutant.

      b. psen2_KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      We disagree that there should be significant doubt about these mutants being truly functionally null, given the high mutation rate and presence of the expected pigmentation phenotype (Jiang et al., 2018, Fig. 3f and Fig. 3–supplement 3a). The psen2 F0 knockouts were virtually 100% mutated at three exons across the gene (mutation rates were locus 1: 100 ± 0%; locus 2: 99.99 ± 0.06%; locus 3: 99.85 ± 0.24%). Additionally, two of the three mutated exons had particularly high rates of frameshift mutations (locus 1: 97 ± 5%; locus 2: 88 ± 17% frameshift mutation rate). It is virtually impossible that a functional protein is translated given this burden of frameshift mutations. Phenotypically, in addition to the pigmentation defect, double psen1/psen2 F0 knockout larvae had curved tails, the same phenotype as caused by a high dose of the γ-secretase inhibitor DAPT (Yang et al., 2008). These double F0 knockouts were lethal, while knockout of psen1 or psen2 alone did not cause obvious morphological defects. Evidently, most larvae must have been psen2 null mutants in this experiment, otherwise functional Psen2 would have prevented early lethality.

      Translation of zebrafish psen2 can start at downstream start codons if the first exon has a frameshift mutation, generating a seemingly functional Psen2 missing the N-terminus (Jiang et al., 2020). Zebrafish homozygous for this early frameshift mutation had normal pigmentation, showing it is a reliable marker of Psen2 function even when it is mutated. This mechanism is not a concern here as the alternative start codons are still upstream of two of the three mutated exons (the alternative start codons discovered by Jiang et al., 2020 are in exon 2 and 3, but we targeted exon 3, exon 4, and exon 6).

      We understand that the zebrafish community may be cautious about F0 phenotyping compared to stably generated mutants. As mentioned to Reviewer #2, we are planning to assemble a paper that expressly compares behavioural phenotypes measured in F0 vs. stable mutants to allay some of these concerns. Our current manuscript, which combines CRISPR-Cas9 rapid F0 screening with in silico pharmacological predictions, inevitably represents a first step in characterizing the functions of these genes.

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      Correct, there is substantial clutch-to-clutch variability in this behavioural assay. This is not specific to our experiments. Even within the same strain, wild-type larvae from different clutches (i.e. non-siblings) behave differently (Joo et al., 2021). This is why it is essential to compare behavioural phenotypes within individual clutches (i.e. from a single pair of parents, one male and one female), as we explain in Methods (section Behavioural video-tracking) and in the documentation of the FramebyFrame package. We often see two different experimental designs in literature: comparing non-sibling wild-type and mutant larvae, or pooling different clutches which include all genotypes (e.g. pooling multiple clutches from heterozygous in-crosses or pooling wild-type clutches before injecting them). The first experimental design causes false positive findings (Joo et al., 2021), as the clutch-to-clutch variability we and others observe gets interpreted as a behavioural phenotype. The second experimental design should not cause false positives but likely decreases the sensitivity of the assay by increasing the spread within genotypes. In both cases, the clutch-to-clutch variability is hidden, either by interpreting it as a phenotype (first case) or by adding it to animal-to-animal variability (second case). Our experimental design is technically more challenging as it requires obtaining large clutches from unique pairs of parents. However, this approach is better as it clearly separates the different sources of variability (clutch-to-clutch or animal-to-animal). As for every experiment, yes, a larger number of replicates would be better, but we do not plan to assay additional clutches at this time. Our work heavily focuses on the sorl1 and psen2 knockout behavioural phenotypes. The key aspects of these phenotypes were effectively tested in four experiments (five to six clutches) as sorl1 knockout larvae were also tracked in the citalopram and fluvoxamine experiments (Fig. 5 and Fig. 5–supplement 1), and psen2 knockout larvae were also tracked in the small molecule rescue experiment (Fig. 6 and Fig. 6–supplement 1).

      The psen2 behavioural phenotype replicated well across the six clutches tested (pairwise cosine similarities: 0.62 ± 0.15; Author response image 2a). Five of the six clutches were less active and initiated more sleep bouts during the day, as we claimed in Fig. 3.

      In the citalopram experiment, the H<sub>2</sub>O-treated sorl1 knockout fingerprint replicated fairly well the baseline recordings in Fig. 4, despite the smaller sample size (cos = 0.30 and 0.78; Author response image 2b, see “KO Fig. 5”). 5/6 of the significant parameters presented in Fig. 4–supplement 4 moved in the same direction, and knockout larvae were also hypoactive during the day but hyperactive at night. Note that two clutches were tracked on the same 96-well plate in this experiment. We calculated each larva’s z-score using the average of its control siblings, then we averaged all the z-scores to generate the fingerprint. The H<sub>2</sub>O treated sorl1 knockout clutch from the fluvoxamine experiment did not replicate well the baseline recordings (cos = 0.08 and 0.11; Author response image 2b, see “KO Fig. 5–suppl. 1”). Knockout larvae were hypoactive during the day as expected, but behaviour at night was not as robustly affected. As mentioned above, knockouts were made in a different genetic background (TL, instead of AB x Tup LF used for all other experiments), which could explain the discrepancy.
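      The pooling step described above, where each larva is normalised to its own control siblings before z-scores are averaged across clutches, can be sketched as follows. This is an illustrative sketch, not the FramebyFrame code: it assumes each z-score uses the mean and sample standard deviation of the same-clutch controls, and the function names and data layout are hypothetical.

```python
from statistics import mean, stdev

def larva_zscores(larva, controls):
    """z-score each parameter of one larva against its same-clutch controls.
    larva: list of parameter values; controls: list of such lists."""
    z = []
    for p, value in enumerate(larva):
        ctrl = [c[p] for c in controls]
        z.append((value - mean(ctrl)) / stdev(ctrl))
    return z

def pooled_fingerprint(clutches):
    """clutches: list of (knockouts, controls) pairs, one pair per clutch.
    Returns the mean z-score per parameter across all knockout larvae."""
    all_z = [
        larva_zscores(larva, controls)
        for knockouts, controls in clutches
        for larva in knockouts
    ]
    return [mean(column) for column in zip(*all_z)]
```

      Because every larva is compared only with its own siblings, a clutch with a higher overall baseline does not shift the pooled fingerprint.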

      We also took the opportunity to check whether our SSRI treatments replicated well the data from Rihel et al., 2010. For both citalopram (n = 3 fingerprints in the database) and fluvoxamine (n = 4 fingerprints in the database), replication was excellent (cos ≥ 0.67 for all comparisons of a fingerprint from this study vs. a fingerprint from Rihel et al. 2010; Author response image 2c,d). Note that the scrambled + 10 µM citalopram and + 10 µM fluvoxamine fingerprints correlate extremely well (cos = 0.92; can be seen in Author response image 2c,d), which was predicted by the small molecule screen dataset.

      Author response image 2.

      Replication of psen2 and sorl1 F0 knockout fingerprints and SSRI treatments from Rihel et al., 2010. a, (left) Every psen2 F0 knockout behavioural fingerprint generated in this study. Each dot represents the mean deviation from the same-clutch scrambled-injected mean for that parameter (z-score, mean ± SEM). From the experiments in Fig. 6, the psen2 F0 knockout + H<sub>2</sub>O fingerprints are presented. The fingerprints in grey (“not shown”) are from a preliminary drug treatment experiment we did not include in the final study. These fingerprints are from psen2 F0 knockout larvae treated with 0.2% DMSO, normalised to scrambled-injected siblings also treated with 0.2% DMSO. (right) Pairwise cosine similarities (−1.0–1.0) for the fingerprints presented. b, Every sorl1 F0 knockout behavioural fingerprint, as in a). c, The scrambled-injected + citalopram (10 µM) fingerprints (grey) in comparison to the citalopram (10–15 µM) fingerprints from the Rihel et al., 2010 database (green). d, The scrambled-injected + fluvoxamine (10 µM) fingerprint (grey) in comparison to the fluvoxamine fingerprints from the Rihel et al., 2010 database (pink). In c) and d), the scrambled-injected fingerprints are from the experiments in Fig. 5 and Fig. 5–suppl. 1, but were converted here into the behavioural parameters used by Rihel et al., 2010 for comparison. Parameters: 1, average activity (sec active/min); 2, average waking activity (sec active/min, excluding inactive minutes); 3, total sleep (hr); 4, number of sleep bouts; 5, sleep bout length (min); 6, sleep latency (min until first sleep bout).

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early-life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      This is a fair criticism but we do not make this claim (“early-life dysfunction is related to future AD”) from expression alone. The reviewer is probably referring to the following quote:

      “[…] most of these were expressed in the brain of 5–6-dpf zebrafish larvae, suggesting they play a role in early brain development or function,” which does not mention future risk of AD. We do suggest that these genes have a function in development. After all, every gene that plays a role in brain development must be expressed during development, so this wording seemed reasonable. Nevertheless, we adapted the wording to address this point and Reviewer #2’s complaint below. As noted, the primary goal was to check that the genes we selected were indeed expressed in zebrafish larvae before performing knockout experiments. Our discussion does raise the hypothesis that mutations in Alzheimer’s risk genes impact brain development and sleep early in life, but this argument primarily relies on our observation that knockout of late-onset Alzheimer’s risk genes causes sleep phenotypes in 7-day old zebrafish larvae and from previous work showing brain structural differences in children at high genetic risk of AD (Dean et al., 2014; Quiroz et al., 2015), not solely on gene expression early in life.

      Please also see our answer to a similar point raised by Reviewer #2 below (cf. Author response image 7).

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      This is a fair criticism. We agree it is a concern, especially in the case of psen2 as we claim that day-time sleep is increased while zebrafish are diurnal. We do not rely heavily on the day-time inactivity being sleep (the ZOLTAR predictions or the small molecule rescue do not change whether the parameter is called sleep or inactivity), but our choice of labelling can fairly be challenged.

      To address “are psen2 KO responsive to startling stimuli like controls when awake/when quiescent”, we looked at the larvae’s behaviour immediately after lights abruptly switched on in the mornings. Almost every larva, regardless of genotype, responded strongly to every lights-off transition during the experiment. Instead, we chose the lights-on transition for this analysis because it is a weaker startling stimulus for the larvae than the lights-off transition (Fig. 3–supplement 3), potentially exposing differences between genotypes or behavioural states (quiescent or awake). We defined a larva as having reacted to the lights switching on if it made a swimming bout during the second (25 frames) a er the lights-on transition. Across two clutches and two lights-on transitions, an average of 65% (range 52–73%) of all larvae reacted to the stimulus. psen2 knockout larvae were similarly likely, if not more likely, to respond (in average 69% responded, range 60–76%) than controls (60% average, range 44– 75%). When the lights switched on, about half of the larvae (39–51%) would have been classified as asleep according to the one-minute inactivity definition (i.e. the larva did not move in the minute preceding the lights transition). This allowed us to also compare behavioural states, as suggested by the reviewer. For three of the four light transitions, larvae which were awake when lights switched on were more likely to react than asleep larvae, but this difference was not striking (overall, awake larvae were only 1.1× more likely to react; Author response image 3). Awake psen2 knockout larvae were 1.1× (range 1.04–1.11×) more likely to react than awake control larvae, so, yes, psen2 knockout larvae respond normally when awake. Asleep psen2 knockout larvae were 1.4× (range 0.63–2.19×) more likely to react than asleep control larvae, so psen2 knockouts are also more or equally likely to react than control larvae when asleep. 
In summary, the overall health of psen2 knockouts did not seem to be a significant confound in the experiment. As the reviewer suggested, if psen2 knockout larvae were seriously unhealthy, they would not be as responsive as control larvae to a startling stimulus.
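As an illustration only, the awake/asleep and reacted/did-not-react classification described above can be sketched as follows. The function name, the per-frame activity trace, and the use of "any non-zero frame" as a stand-in for a swimming bout are assumptions made for this sketch; it is not the FramebyFrame implementation.

```python
FPS = 25  # frames per second, as stated in the response


def classify_larva(delta_px, lights_on_frame, fps=FPS):
    """Return (state, reacted) for one larva at one lights-on transition.

    delta_px: per-frame activity values for the larva (0 means no movement).
    lights_on_frame: index of the first frame after the lights switched on.
    Assumption: any non-zero frame counts as movement / a swimming bout.
    """
    minute = 60 * fps
    # Activity during the minute preceding lights-on decides the state.
    before = delta_px[max(0, lights_on_frame - minute):lights_on_frame]
    # Activity during the second (25 frames) after lights-on decides the reaction.
    after = delta_px[lights_on_frame:lights_on_frame + fps]
    state = "awake" if any(v > 0 for v in before) else "asleep"
    reacted = any(v > 0 for v in after)
    return state, reacted
```

Counting the larvae in each of the four (state, reacted) categories per genotype would then give the proportions compared in the text.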

      Author response image 3.

      psen2 F0 knockouts react normally to lights switching on, indicating they are largely healthy. At each lights-on transition (9 AM), each larva was categorised as awake if it had moved in the preceding one minute or asleep if it had been inactive for at least one minute. Darker tiles represent larvae which performed a swimming bout during the second following lights-on; lighter tiles represent larvae which did not move during that second. The total count of each waffle plot was normalised to 25 so plots can be compared to each other. The real count is indicated in the corner of each plot. Data is from the baseline psen2 knockout trackings presented in Fig. 3 and Fig. 3–suppl. 2.

      Next, we compared inactive period durations during the day between psen2 and control larvae. If psen2 knockout larvae indeed sleep more during the day compared to controls, we may predict inactive periods longer than one minute to increase disproportionately compared to the increase in shorter inactive periods. This broadly appeared to be the case, especially for one of the two clutches (Author response image 4). In clutch 1, inactive periods lasting 1–60 sec were equally frequent in both psen2 and control larvae (fold change 1.0× during both days), while inactive periods lasting 1–2 min were 1.5× (day 1) and 2.5× (day 2) more frequent in psen2 larvae compared to control larvae. In clutch 2, 1–60 sec inactive periods were also equally frequent in both psen2 and control larvae, while inactive periods lasting 1–2 min were 3.4× (day 1) and 1.5× (day 2) more frequent in psen2 larvae compared to control larvae. Therefore, psen2 knockouts disproportionately increased the frequency of inactive periods longer than one minute, suggesting they genuinely slept more during the day.

      Author response image 4.

      psen2 F0 knockouts preferentially increased the frequency of longer inactive bouts. For each day and clutch, we calculated the mean distribution of inactive bout lengths across larvae of the same genotype (psen2 F0 knockout or scrambled-injected), then compared the frequency of inactive bouts of different lengths between the two genotypes. For example, in clutch 1 during day 2, 0.01% of the average scrambled-injected larva’s inactive bouts lasted 111–120 seconds (X axis 120 sec) while 0.05% of the average psen2 F0 knockout larva’s inactive bouts lasted this long, so the fold change was 5×. Inactive bouts lasting < 1 sec were excluded from the analysis. In clutch 2, day 1 plot, two datapoints fall outside the Y axis limit: 140 sec, Y = 32×; 170 sec, Y = 16×. Data is from the baseline psen2 knockout trackings presented in Fig. 3 and Fig. 3–suppl. 2.
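The fold-change calculation in this legend can be sketched roughly as below. This assumes each larva's inactive bouts have already been binned into a list of frequencies (one value per bout-length bin, in %); the function names and the handling of empty control bins are choices made for the sketch, not details taken from the analysis itself.

```python
from statistics import mean


def mean_distribution(per_larva_freqs):
    """Average, bin by bin, the bout-length distributions of several larvae."""
    return [mean(bin_values) for bin_values in zip(*per_larva_freqs)]


def fold_changes(knockout_freqs, control_freqs):
    """Knockout / control frequency for each bin; None where the control bin is 0."""
    return [
        ko / ctrl if ctrl > 0 else None
        for ko, ctrl in zip(knockout_freqs, control_freqs)
    ]
```

With the worked example from the legend, a control frequency of 0.01% and a knockout frequency of 0.05% in the 111–120 sec bin give a fold change of about 5.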

      Ultimately, this criticism seems challenging to definitively address experimentally. A possible approach could be to use a closed-loop system which, after one minute of inactivity, triggers a stimulus that is sufficient to startle an awake larva but not an asleep larva. If psen2 knockout larvae indeed sleep more during the day, the stimulus should usually not be sufficient to startle them. Nevertheless, we believe the two analyses presented here are consistent with psen2 knockout larvae genuinely sleeping more during the day, so we decided to keep this label. We agree with the reviewer that the one-minute inactivity definition has limitations, especially for day-time inactivity.

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just one drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

      Regarding the comment, “is it not just possible that the drug acts in parallel to the true disrupted pathway”, we think not, assuming we understand the question correctly. Key to our argument is the fact that sorl1 knockout larvae react differently to the drug(s) than control larvae. As an example, take night-time sleep bout length, which was not affected by knockout of sorl1 (Fig. 4–supplement 4). For the sake of the argument, say only dopamine signalling (the “true disrupted pathway”) was affected in sorl1 knockouts and that serotonin signalling was intact. Assuming that citalopram specifically alters serotonin signalling, then treatment should cause the same increase in sleep bout length in both knockouts and controls as serotonin signalling is intact in both. This is not what we see, however. Citalopram caused a greater increase in sleep bout length in sorl1 knockouts than in scrambled-injected larvae. In other words, the effect is non-additive, in the sense that citalopram did not add the same number of z-scores to sorl1 knockouts and controls. We think this shows that serotonin signalling is somehow different in sorl1 knockouts. Nonetheless, we concede that the experiment does not necessarily say much about the importance of the serotonin disruption caused by loss of Sorl1. It could be, for example, that the most salient consequence of loss of Sorl1 is cholinergic disruption (see reply to Reviewer #1 above) and that serotonin signalling is a minor theme.

      Furthermore, we agree with the reviewer and Reviewer #2 that the conclusions were overly confident. As suggested, we decided to repeat this experiment with another SSRI, fluvoxamine. Please find the results of this experiment in Fig. 5–supplement 1. The suggestions to further test the serotonin system in the sorl1 knockouts are excellent as well; however, we do not plan to pursue them at this stage.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Comments:

      - Data are presented in a variety of different ways, occasionally making comparisons across figures difficult. Perhaps at a minimum, behavioral fingerprints as in Figure 3 - Supplementary Figure 1 should be presented for all mutants in the main figures.

      We like this suggestion! Thank you. We brought the behavioural fingerprints figure (previously Fig. 4–supplement 5) as main Fig. 4, and put the figure focused on the sorl1 knockout behavioural phenotype in supplementary, with the other gene-by-gene figures.

      - It is not clear why some data were selected for supplemental rather than main figures. In many cases, detailed phenotypic data is provided for one example mutant in the main figures, and then additional mutants are described in detail in the supplement. Again, to facilitate comparisons between mutants, fingerprints could be provided for all mutants in a main figure, with detailed analyses moved to the supplements.

      The logic was to dedicate one main figure to psen2 (Fig. 3) as an example of an early-onset Alzheimer’s risk gene, and one to sorl1 (previously Fig. 4) as an example of a late-onset Alzheimer’s risk gene. We focused on them in main figures as they are both tested again later (Fig. 5 and Fig. 6). Having said that, we agree that the fingerprints may be a better use of main figure space than the parameters plots. In addition to the above (fingerprints of late-onset Alzheimer’s risk genes in main figure), we rearranged the figures in the early-onset AD section to have the psen2 F0 knockout fingerprint in main.

      - The explication of the utility of behavioral fingerprinting on page 35 is somewhat confusing. The authors describe drugs used to treat depression as enriched among small molecules anti-correlating with the sorl1 fingerprint. However, in Figure 5 - Supplementary Figure 1, drugs used to treat depression are biased toward positive cosines, which are indicated as having a more similar fingerprint to sorl1. These drugs should be described as more present among compounds positively correlating with the sorl1 fingerprint.

      Sorry, the confusion is about “(anti-)correlating”. Precisely, we meant “correlating and/or anti-correlating”, not just anti-correlating. We changed to that wording. In short, the analysis is by design agnostic to whether compounds with a given annotation are found more on the positive cosines side (left side in Fig. 5–supplement 1a) or the negative cosines side (right side). This is because the dataset often includes both agonists and antagonists to a given pathway but these are difficult to annotate. For example, say 10 compounds in the dataset target the dopamine D4 receptor, but these are an unknown mix of agonists and antagonists. In this case, we want ZOLTAR to generate a low p-value when all 10 compounds are found at extreme ends of the list, regardless of which end(s) that is (e.g. top 8 and bottom 2 should give an extremely low p-value). Initially, we were splitting the list, for each annotation, into positive-cosine fingerprints and negative-cosine fingerprints and testing enrichment on both separately, but we think the current approach is better as it better reflects the cases we want to detect and considers all available examples for a given annotation in one test. In sum, yes, in this case drugs used to treat depression were mostly on the positive-cosine side, but the other drugs on the negative-cosine side also contributed to the p-value, so it better reflects the analysis to say “correlating and/or anti-correlating”. You can read more about our logic for the analysis in Methods (section Behavioural pharmacology from sorl1 F0 knockout’s fingerprint).

      - The authors conclude the above-described section by stating: "sorl1 knockout larvae behaved similarly to larvae treated with small molecules targeting serotonin signaling, suggesting that the loss of Sorl1 disrupted serotonin signaling." Directionality here may be important. Are all of the drugs targeting the serotonin transporter SSRIs or similar? If so, then a correct statement would be that loss of Sorl1 causes similar phenotypes to drugs enhancing serotonin signaling. Finally, based on the correlation between serotonin transporter inhibitor trazodone and the sorl1 crispant phenotype, it is potentially surprising that the SSRI citalopram caused the opposite phenotype from sorl1, that is, increased sleep during the day and night. It is potentially interesting that this result was enhanced in mutants, and suggests dysfunction of serotonin signaling, but the statement that "our behavioral pharmacology approach correctly predicted from behaviour alone that serotonin signaling was disrupted" is too strong a conclusion.

      We understand “disrupt” as potentially going either way, but this may not be the common usage. We changed to “altered”.

      The point regarding directionality is excellent, however. We tested the proportion of serotonin transporter agonists and antagonists (SSRIs) on each side of the ranked list of small molecule fingerprints. We used the STITCH database for this analysis as it has more drug–target interactions, though it is likely less curated, than the Therapeutic Target Database (Szklarczyk et al., 2016). As with the Therapeutic Target Database, most fingerprints of compounds interacting with the serotonin transporter SLC6A4 were found on the side of positive cosines (p ~ 0.005 using the custom permutation test), which replicates Fig. 5a with a different source for the drug–target annotations (Author response image 5). On the side of positive cosines (small molecules which generate behavioural fingerprints correlating with the sorl1 fingerprint), there were 2 agonists and 26 antagonists. On the side of negative cosines (small molecules which generate behavioural fingerprints anti-correlating with the sorl1 fingerprint), there were 3 agonists and 2 antagonists. Using a Chi-squared test, this suggests a significant (p = 0.002) over-representation of antagonists (SSRIs) on the positive side (expected count = 24, vs. 26 observed) and agonists on the negative side (expected count = 1, vs. 3 observed). If SLC6A4 antagonists, i.e. SSRIs, indeed tend to cause a behavioural phenotype similar to that of sorl1 knockout, this would point in the direction of our original interpretation of the citalopram experiment, which was that excessive serotonin signalling is what causes the sorl1 behavioural phenotype.
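For readers who want to check the arithmetic, the expected counts quoted above follow from the 2×2 table of observed counts. The sketch below uses only the Python standard library and assumes a Pearson chi-squared test without continuity correction, which the response does not specify; under that assumption the p-value comes out at roughly 0.002.

```python
import math

# Observed counts from the response: rows are the side of the ranked list,
# columns are (agonists, antagonists).
observed = [
    [2, 26],  # positive cosines
    [3, 2],   # negative cosines
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (no Yates correction)."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, the upper-tail probability of a chi-squared
    # statistic x equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Here `expected_counts(observed)` gives about 23.8 antagonists on the positive side and about 0.8 agonists on the negative side, matching the rounded values of 24 and 1 in the text.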

      Author response image 5.

      Using the STITCH database as source of annotations also predicts SLC6A4 as an enriched target for the sorl1 behavioural fingerprint. Same figures as Fig. 5a,b but using the STITCH database (Szklarczyk et al., 2016) as source for the drug targets. a, Compounds annotated by STITCH as interacting with the serotonin transporter SLC6A4 tend to generate behavioural phenotypes similar to the sorl1 F0 knockout fingerprint. 40,522 compound–target protein pairs (vertical bars; 1,592 unique compounds) are ranked from the fingerprint with the most positive cosine to the fingerprint with the most negative cosine in comparison with the mean sorl1 F0 knockout fingerprint. Fingerprints of drugs that interact with SLC6A4 are coloured in yellow. Simulated p-value = 0.005 for enrichment of drugs interacting with SLC6A4 at the top (positive cosine) and/or bottom (negative cosine) of the ranked list by a custom permutation test. b, Result of the permutation test for top and/or bottom enrichment of drugs interacting with SLC6A4 in the ranked list. The absolute cosines of the fingerprints of drugs interacting with SLC6A4 (n = 52, one fingerprint per compound) were summed, giving sum of cosines = 15.9. To simulate a null distribution, 52 fingerprints were randomly drawn 100,000 times, generating a distribution of 100,000 random sum of cosines. Here, only 499 random draws gave a larger sum of cosines, so the simulated p-value was p = 499/100,000 = 0.005 **.
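The permutation test described in this legend can be sketched as follows. This is a simplified illustration under stated assumptions: random draws are made without replacement from the full list of fingerprint cosines, and the function name, arguments, and seed are hypothetical rather than taken from ZOLTAR.

```python
import random


def permutation_p_value(all_cosines, annotated_cosines, n_draws=100_000, seed=0):
    """Simulated p-value for enrichment at the top and/or bottom of a ranked list.

    all_cosines: cosine of every fingerprint in the ranked list.
    annotated_cosines: cosines of the fingerprints carrying the annotation.
    Returns (observed sum of absolute cosines, fraction of random draws
    whose sum of absolute cosines is larger than the observed one).
    """
    rng = random.Random(seed)
    abs_all = [abs(c) for c in all_cosines]
    observed = sum(abs(c) for c in annotated_cosines)
    k = len(annotated_cosines)
    larger = 0
    for _ in range(n_draws):
        # Draw k fingerprints at random and sum their absolute cosines.
        if sum(rng.sample(abs_all, k)) > observed:
            larger += 1
    return observed, larger / n_draws
```

Taking absolute values is what makes the test agnostic to direction: annotated compounds at either extreme of the list increase the observed sum. In the legend's example, 499 of 100,000 draws exceeding the observed sum corresponds to 499/100,000, or about 0.005.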

      If this were true, we would expect, as the reviewer suggested, SSRI treatment (citalopram or fluvoxamine) on control larvae to give a behavioural phenotype similar to that of sorl1 knockout. However, this generally did not appear to be the case (sorl1 knockout fingerprint vs. SSRI-treated control fingerprint, cosine = 0.08 ± 0.35; Author response image 6).

      Author response image 6.

      sorl1 F0 knockouts in comparison to controls treated with SSRIs. a, sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the citalopram experiment) in comparison with the scrambled-injected + citalopram (1 or 10 µM) fingerprints. Each dot represents the mean deviation from the same-clutch scrambled-injected H₂O-treated mean for that parameter (z-score, mean ± SEM). b, As in a), sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the fluvoxamine experiment) in comparison with the scrambled-injected + fluvoxamine (10 µM) fingerprint.

      The comparison with trazodone is an interesting observation, but it is only a weak serotonin reuptake inhibitor (Ki for SLC6A4 = 690 nM, vs. 8.9 nM for citalopram; Owens et al., 1997) and it has many other targets, either as agonist or antagonist, including serotonin, adrenergic, and histamine receptors (Mijur, 2011). In any case, the average trazodone fingerprint does not correlate particularly well with the sorl1 knockout fingerprint (cos = 0.3). Finally, the sorl1 knockout behavioural phenotype could be primarily caused by altered serotonin signalling in the hypothalamus, where we found both the biggest difference in tph1a/1b/2 HCR signal intensity (Fig. 5f) and the highest expression of sorl1 across scRNA-seq clusters (Fig. 1–supplement 2). In this case, it would be correct to expect sorl1 knockouts to react differently to SSRIs than controls, but it would be incorrect to expect SSRI treatment to cause the same behavioural phenotype, as it concurrently affects every other serotonergic neuron in the brain.

      Finally, we agree the quoted conclusion was too strong given the current evidence. We since tested another SSRI, fluvoxamine, on sorl1 knockouts.

      - Also in reference to Figure 5: in panel c, data are presented as deviation from vehicle treated. Because of this data presentation choice, it's no longer possible to determine whether, in this experiment, sorl1 crispants sleep less at night relative to their siblings. Does citalopram rescue / reverse sleep deficits in sorl1 mutants?

      On your first point, please see our response to Reviewer #3 (2)c and Author Response 2b above.

      On “does citalopram rescue/reverse sleep deficits in sorl1 mutants”: citalopram (and fluvoxamine) tends to reverse the key aspects of the sorl1 knockout behavioural phenotype by reducing night-time activity (% time active and total Δ pixels), increasing night-time sleep, and shortening sleep latency (Author response image 7). Extrapolating from the hypothesis presented in Discussion, this may be interpreted as a hint that sorl1 knockouts have reduced levels of 5-HT receptors, as increasing serotonin signalling using an SSRI tends to rescue the phenotype. However, we do not think that focusing on the significant behavioural parameters necessarily makes sense here. Rather, one should take all parameters into account to conclude whether knockouts react differently to the drug than wild types (also see answer to Reviewer #3, (7) on this). For example, citalopram increased the night-time sleep bout length of sorl1 knockouts more than that of controls (Fig. 5), but this parameter was not modified by knockout of sorl1 (Fig. 4). To explain the rationale more informally, citalopram is only used as a tool here to probe serotonin signalling in sorl1 knockouts; whether it worsens or rescues the behavioural phenotype is somewhat secondary. The key question is whether knockouts react differently than controls.

      Author response image 7.

      Comparing untreated sorl1 F0 knockouts vs. treated with SSRIs. a, sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the citalopram experiment) in comparison with the sorl1 knockout + citalopram (1 or 10 µM) fingerprints. Each dot represents the mean deviation from the same-clutch scrambled-injected H₂O-treated mean for that parameter (z-score, mean ± SEM). b, As in a), sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H₂O fingerprint from the fluvoxamine experiment) in comparison with the sorl1 + fluvoxamine (10 µM) fingerprint.

      - Possible molecular pathways targeted by tinidazole, fenoprofen, and betamethasone are not described.

      Tinidazole is an antibiotic, fenoprofen is a non-steroidal anti-inflammatory drug (NSAID), and betamethasone is a steroidal anti-inflammatory drug. Interestingly, long-term use of NSAIDs reduces the risk of AD (in ’t Veld Bas A. et al., 2001). Several mechanisms are possible (Weggen et al., 2007), including reduction of Aβ42 production by interacting with γ-secretase (Eriksen et al., 2003). However, we did not explore the mechanism of action of these drugs on psen2 knockouts so do not feel comfortable speculating. We do not know, for example, whether these findings apply to betamethasone.

      Minor Comments:

      - On page 25, panel "g" should be labeled as "f".

      Thank you!

      - On page 35, a reference should be provided for the statement "From genomic studies of AD, we know that mutations in genes such as SORL1 modify risk by disrupting some biological processes.".

      Thank you, this is now corrected. These were the same studies as mentioned in the Introduction.

      - On page 43, the word "and" should be added - "in wild-type rats and mice, overexpressing mutated human APP and PSEN1, AND restricting sleep for 21 days...".

      Right, this sentence could be misread, we edited it. “overexpressing […]” only applied to the mice, not the rats (as they are wild-type); and both are sleep-deprived.

      - On page 45, a reference should be provided for the statement "SSRIs can generally be used continuously with no adverse effects" and this statement should potentially be softened.

      The reference is at the end of that sentence (Cirrito et al., 2011). You are correct though; we reformulated this statement to: “SSRIs can generally be used safely for many years”. SSRIs indeed have side effects.

      - On page 54, a 60-minute rolling average is described as 45k rows, but this seems to be a 30-minute rolling average.

      Thank you! We corrected it. It should have been 90k rows, as in: 25 frames-per-second × 60 seconds × 60 minutes.
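The row count is simple arithmetic, and can be checked with a short helper. The function name is hypothetical; the 25 frames-per-second rate is the one stated in the response.

```python
def rolling_window_rows(minutes, fps=25):
    """Number of frame-by-frame rows spanned by a rolling window of `minutes`."""
    return fps * 60 * minutes
```

A 60-minute window therefore spans 25 × 60 × 60 = 90,000 rows, while the originally stated 45k rows corresponds to 30 minutes, as the reviewer noted.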

      Reviewer #2 (Recommendations For The Authors):

      "As we observed in the scRNA-seq data, most genes tested (appa, appb, psen1, psen2, apoea, cd2ap, sorl1) were broadly expressed throughout the 6-dpf brain (Fig. 1d and Fig. 1–supplement 3 and 4)."

      - apoea and appb are actually not expressed highly in the scRNA-seq data, and the apoea in situ looks odd, as if it has no expression. The appb gene mysteriously does not look as though it has high expression in the Raj data, but it is clearly expressed based on the in situ. I had previously noticed the same discrepancy, and I attribute it to the transcriptome used to map the Raj data, as the new DanioCell data uses a new transcriptome and indicates high appb expression in the brain. Please point out the discrepancy and possible explanation, perhaps in the figure legend.

      All excellent points, thank you. We included them directly in Results text.

      "most of these were expressed in the brain of 5-6-dpf zebrafish larvae, suggesting they play a role in early brain development or function."

      - Evidence of expression does not suggest function, particularly not a function in brain development. As one example, almost half of the genome is expressed prior to the maternal-zygotic transition but does not have a function in those earliest stages of development. There are numerous other instances where expression does not equal function. Please change the sentence even as simply as "it is possible that they".

      We mostly agree and edited to “[…], so they could play a role […]”.

      Out of curiosity, we plotted, for each zebrafish developmental stage, the proportion of Alzheimer’s risk gene orthologues expressed in comparison to the proportion of all genes expressed (Author response image 8). We defined “all genes” as every gene that is expressed in at least one of the developmental stages (n = 24,856), not the complete transcriptome, to avoid including genes that are never expressed in the brain or whose expression is always below detection limit. We counted a gene as “expressed” if at least three cells had detectable transcripts. Using these definitions, 82 ± 7% of genes are expressed during development. For every developmental stage except 5 dpf (so 11/12), a larger proportion of Alzheimer’s risk genes than all genes are expressed (+5 ± 4%).

      Author response image 8.

      Proportion of Alzheimer’s risk genes orthologues expressed throughout zebrafish development. Proportion of Alzheimer’s risk genes orthologues (n = 42) and all genes (n = 24,856) expressed in the zebrafish brain at each developmental stage, from 12 hours post-fertilisation (hpf) to 15 days post-fertilisation (dpf). “All genes” corresponds to every gene expressed in the brain at any of the developmental stages, not the complete transcriptome. A gene is considered “expressed” (green) if at least three cells had detectable transcripts. Single-cell RNA-seq dataset from Raj et al., 2020.
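The definitions used for this figure can be illustrated with a small sketch. It assumes the single-cell data have already been summarised as, for each developmental stage, a mapping from gene name to the number of cells with detectable transcripts; the function names and the toy counts in the usage note are hypothetical.

```python
def expressed_genes(cell_counts, min_cells=3):
    """Genes with detectable transcripts in at least `min_cells` cells."""
    return {gene for gene, n_cells in cell_counts.items() if n_cells >= min_cells}


def proportion_expressed_per_stage(counts_by_stage, gene_set=None, min_cells=3):
    """Proportion of `gene_set` expressed at each developmental stage.

    If `gene_set` is None, it defaults to "all genes" as defined in the
    legend: every gene expressed in at least one developmental stage.
    """
    expressed = {
        stage: expressed_genes(counts, min_cells)
        for stage, counts in counts_by_stage.items()
    }
    if gene_set is None:
        gene_set = set().union(*expressed.values())
    return {
        stage: len(genes & gene_set) / len(gene_set)
        for stage, genes in expressed.items()
    }
```

For instance, with toy counts `{"12 hpf": {"sorl1": 5, "psen2": 1}, "5 dpf": {"sorl1": 4, "psen2": 7}}`, psen2 is not counted as expressed at 12 hpf because only one cell has detectable transcripts.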

      "This frame-by-frame analysis has several advantages over previous methods that analysed activity data at the one-minute resolution."

      - Which methods are these? There are no citations. There are certainly existing methods in the zebrafish field that can produce similar data to the method developed for this project. This new package is useful, as most existing software is not written in R, so it would help scientists who prefer this programming language. However, I would be careful not to oversell its novelty, since many methods do exist that produce similar results.

      We added the references. They were referenced above after “we combined previous sleep/wake analysis methods”, but should have been referenced again here.

      We are not convinced by this criticism. We would obviously not claim that the FramebyFrame package is as sophisticated and versatile as video-tracking tools like SLEAP or DeepLabCut, but we do think it answers a genuine need that was not addressed by other methods. Specifically, we know of many labs recording pixel count data across multiple days using the Zebrabox or DanioVision (we added support for DanioVision data after submission), but there were no packages to extract behavioural parameters from these data. Other methods involved standalone scripts with no documentation or version tracking. We would concede the FramebyFrame package is mostly targeted at these labs, but we already know of six labs routinely using it and were recently contacted by a researcher tracking Daphnia in the Zebrabox.

      "F0 knockouts of both cutches" - "clutches"

      Thank you!

      Reviewer #3 (Recommendations For The Authors):

      I would suggest totally revamping the Introduction section, and being sure to provide readers with the context and background they need for the data that comes thereafter. Key areas to touch on, in no particular order, include:

      • Far more detail on the behavioral pharm screen upon which this paper builds, as a brief overview of that approach and the data generated are needed.

      Thank you for the suggestion, we added a sentence hinting at this work in the last Introduction paragraph.

      • Limitations of current zebrafish sleep/arousal assays that motivated the authors to develop a new, temporally high-resolution system.

      We think this is better explained in Results, as it currently is. For example, we need to point to Fig. 2–supplement 2a,b,c to explain that one-minute methods were missing sleep bouts and how FramebyFrame resolves this issue.

      • A paragraph about sleep and AD, that does a better job of citing work in humans, mammalian, and invertebrate models that motivate the interest in the connection pursued here.

      Sorry, we think this would place too much focus on sleep and AD. We want the main topic of the paper to be the behavioural pharmacology approach, not AD or sleep per se. As the Introduction states, we see Alzheimer’s risk genes as a case study for the behavioural pharmacology approach, rather than the reason why the approach was developed. Additionally, presenting sleep and AD in Introduction risks sounding like ZOLTAR is specifically designed for this context, while we conceived of it as much more generalisable and explicitly encourage its use to study genes associated with other diseases. Note that the paragraph you suggest is, we think, mostly present in Discussion (section Disrupted sleep and serotonin signalling […]).

      • I modestly suggest eliminating making such a strong case for a gene-first approach being the best way to understand disease. It is not a zero-sum game, and there is plenty to learn from proteomics, metabolomics, etc. I suspect nobody will argue with the authors saying they leveraged the strength of their system and focused on key AD genes of interest.

      From your point below, we understand the following quote is the source of the issue: “For finding causal processes, studying the genome, rather than the transcriptome or epigenome, is advantageous because the chronology from genomic variant to disease is unambiguous […]”. We did not want to suggest it is a zero-sum game, but we now understand how it can be read this way. We slightly adapted the wording. What we want to do is highlight the causality argument as the advantage of the genomics approach. We feel we do not read this argument often enough, while it remains a ‘magic power’ of genomics. One essentially does not have to worry about causality when studying a pathogenic germline variant, while it is a constant concern when studying the transcriptome or epigenome (i.e. did the change in this transcript’s level cause disease, or vice-versa?). To take an example in the context of AD, arguments based on genomics (e.g. Down syndrome or APP duplication) are often the definitive arbiters when debating the amyloid hypothesis, exactly because their causality cannot be doubted.

      Minor comments

      (1) The opening of the introduction is perhaps overly broad, spending an entire paragraph on genome vs transcriptome, etc and making the claim that a gene-first approach is the best path. It isn't zero-sum, and the authors could just get right into AD and study genes of interest. Similar issues occur throughout the manuscript, with sentences/paragraphs that are not necessarily needed.

      Please see our answer to your previous point. On the introduction being overly broad, we perfectly agree it is broad, but related to your point about presenting sleep and AD in the Introduction, we wish to talk about finding causal processes from genomics findings using behavioural pharmacology. We purposefully present research on AD as one instance of this broader goal, not the primary topic of the paper.

      Another example are these sentences, which could be totally removed as the following paragraph starts off making the same point much more succinctly. "From genomic studies of AD, we know that mutations in genes such as SORL1 modify risk by disrupting some biological processes. Presumably, the same processes are disrupted in zebrafish sorl1 knockouts, and some caused the behavioural alterations we observed. Can we now follow the thread backwards and predict some of the biological processes in which Sorl1 is involved based on the behavioural profile of sorl1 knockouts?"

      Thanks for the suggestion, but we think these sentences are useful to place this Results section back in the context of the Introduction. Think of the paper as mainly about the behavioural pharmacology approach, not about Alzheimer’s risk genes. The function of the paragraph here is not simply to explain the method by which we decided to study sorl1; it is to reiterate the rationale behind the behavioural pharmacology approach so that the reader understands where this Results section fits in the overall structure.

      (2) Related to the above, the authors use lecanemab as an example to support their approach, but there has been a great deal of controversy regarding this drug. I don't think such extensive justification is needed. This study uses AD risk genes as a case study in a newly developed behavioral pharm pipeline. A great deal of the rest of the intro seems to just fill space and could be more focused on the study at hand. Interestingly, after gene selection, the next step in their pipeline is sleep/wake analysis yet nothing is covered about AD and sleep in the intro. Some justification of that approach (why focus on sleep/wake as a starting point for behavioral pharm rather than learning and memory?) would be a better use of intro space.

      There has indeed been controversy about lecanemab, but even the harshest critiques of the amyloid hypothesis concede that it slows down cognitive decline (Espay et al., 2023). That is all that is needed to support our argument, which is that research on AD started primarily from genomics and thereby yielded a disease-modifying drug. The controversy seems mostly focused on whether this effect size is clinically significant, and we think we correctly represent this uncertainty (e.g. “antibodies against Aβ such as lecanemab show promise in slowing down disease progression” and “the beneficial effects from targeting Aβ aggregation currently remain modest”).

      Your next point is entirely fair. We mostly answered it above. To explain further, the primary reason why we measured sleep/wake behaviour is to match the behavioural dataset from Rihel et al., 2010 so we can use it to make predictions, not to study sleep in the context of AD per se. Sure, perhaps learning and memory would have been interesting, but we do not know of any study testing thousands of small molecules on zebrafish larvae during a memory task. We understand it can be slightly confusing though, as we then spend a paragraph of Discussion on sleep as a causal process in AD, but we obviously need to discuss this topic given the findings. However, to reiterate, we purposefully designed FramebyFrame and ZOLTAR to be useful beyond studying sleep/wake behaviour. For example, FramebyFrame would not calculate 17 behavioural parameters if the only goal was to measure sleep. We now mention the Rihel et al., 2010 study in the Introduction as you suggested above (“Far more detail on the behavioral pharm screen […]”), as that is the real reason why sleep/wake behaviour was measured in the first place.

      (3) Also related to the above, another more relevant point that could be talked about in the intro is the need for more refined approaches to analyze sleep in zebrafish, given the effort that went into the new analysis system described here. Again, I think the context for why the authors developed this system would be more meaningful than the current content.

      Thank you, we think we answered this point above (especially below Limitations of current zebrafish sleep/arousal assays […]).

      (4) GWAS can stand for Genome-wide association studies (plural) so I do not think the extra "s" is needed (GWASs).

      Indeed, that seems to be the common usage. Thank you.

      (5) AD candidate risk genes were determined from loci using "mainly statistic colocalization". Can the authors add a few more details about what was done and what the "mainly" caveat refers to?

      “Mainly” simply refers to the fact that other methods were used by Schwartzentruber et al. (2021) to annotate the GWAS loci with likely causal genes, but that most calls were ultimately made from statistic colocalisation. Readers can refer to this work to learn more about the methods used.

      (6) The authors write "The loss of psen1 only had mild effects on behaviour" but I think they mean "sleep behaviors" as there could be many other behaviors that are disrupted but were not assessed. The same issue a few sentences later with "Behaviour during the day was not affected" and at the end of the following paragraph.

      Yes, that would be more precise, thank you.

      (7) For the Sorl1 pharmacology data, it is very hard to understand what is being measured behaviorally. Are the authors measuring sleep +/- citalopram, or something else, and why the change to Euclidean distance rather than all the measures we were just introduced to earlier in the manuscript?

      We understand these plots (Fig. 5c,d) are less intuitive, but it is important that we show the difference in behaviour compared to H<sub>2</sub>O-treated larvae of the same genotype. The claim is that citalopram has a larger effect on knockouts than on controls, so the reader needs to focus on the effect of the drug on each genotype, not on the effect of sorl1 knockout. We added the standard fingerprints (i.e. setting controls to z-score = 0) here as Author response figures.

      Euclidean distance takes as input all the measures we introduced. The point is precisely not to select a single measure. For example, say we were only plotting active bout number during the day, we would conclude that 10 µM citalopram has the same effect on knockouts and controls. Conversely, if we had taken sleep bout length at night, we would conclude 10 µM has a stronger effect on knockouts. What is the correct parameter to select? Using Euclidean distance resolves this by taking all parameters into account, rather than arbitrarily choosing one.
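      To illustrate the point, here is a minimal sketch of how a Euclidean distance summarises a whole fingerprint at once. The z-scores and the three-parameter fingerprints are hypothetical values chosen for illustration, and the function name is ours; this is not the code used in the study.

```python
import math

def euclidean_distance(fingerprint_a, fingerprint_b):
    """Euclidean distance between two equal-length behavioural fingerprints."""
    if len(fingerprint_a) != len(fingerprint_b):
        raise ValueError("fingerprints must have the same number of parameters")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fingerprint_a, fingerprint_b)))

# Hypothetical z-scores relative to H2O-treated larvae of the same genotype.
# Only three illustrative parameters are shown; real fingerprints have more.
water = [0.0, 0.0, 0.0]
control_drug = [-1.0, 0.5, 0.2]   # drug-treated controls
knockout_drug = [-1.0, 1.5, 1.0]  # drug-treated knockouts

dist_control = euclidean_distance(control_drug, water)
dist_knockout = euclidean_distance(knockout_drug, water)
```

      In this toy case the first parameter is identical in both genotypes, so plotting it alone would suggest no difference, whereas the distance over all parameters is larger for the knockouts.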

      And what exactly is a "given spike in serotonin"? and how is this hypothesis the conclusion based on the lack of evidence for the second hypothesis? As the authors say, there could be other ways sorl1 knockouts are more sensitive to citalopram, so the absence of evidence for one hypothesis certainly does not support the other hypothesis.

      We mean a given release of serotonin in the synaptic cleft. We have fixed this wording. 

      We tend to disagree on the second point. We can think of two ways that sorl1 knockouts are more sensitive to citalopram: 1) they produce more serotonin, so blocking reuptake causes a larger spike in knockouts; or 2) blocking reuptake causes the same increase in both knockouts and wild-types but knockouts react more strongly to serotonin. We cannot in fact think of another way to explain the citalopram results. Not finding overwhelming evidence for 1) surely supports 2) somewhat, even if we do not have direct evidence for it. As an analogy, if two diagnoses are possible for a patient, testing negative for the first one supports the other one, even before it is directly tested.
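      The logic of the analogy can be sketched with Bayes' rule. The probabilities below are hypothetical and only serve to show the direction of the update; the sketch assumes the two hypotheses are mutually exclusive and exhaustive, which is our assumption here.

```python
def posterior_h2(prior_h1, p_negative_given_h1, p_negative_given_h2):
    """Posterior probability of hypothesis 2 after a negative result for hypothesis 1.

    Assumes hypotheses 1 and 2 are mutually exclusive and exhaustive.
    """
    prior_h2 = 1.0 - prior_h1
    evidence = prior_h1 * p_negative_given_h1 + prior_h2 * p_negative_given_h2
    return prior_h2 * p_negative_given_h2 / evidence

# Hypothetical numbers: equal priors; a negative result is less likely
# if hypothesis 1 is true (0.3) than if hypothesis 2 is true (0.9).
posterior = posterior_h2(prior_h1=0.5, p_negative_given_h1=0.3, p_negative_given_h2=0.9)
```

      With these illustrative inputs, the probability of the second hypothesis rises above its prior of 0.5 even though it was never tested directly.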

      (8) Again some language is used without enough care. Fish are referred to as "drowsier" under some drug conditions. How do the authors know the animal is drowsy? The phenotype is more specific - more sleep, less activity.

      Thank you, we switched to “Furthermore, fenoprofen worsened the day-time hypoactivity of psen2 knockout larvae […]”.

      (9) This sentence is misleading as it gives the impression that results in this manuscript suggest the conclusion: "Our observation that disruption of genes associated with AD diagnosis after 65 years reduces sleep in 7-day zebrafish larvae suggest that disrupted sleep may be a common mechanism through which these genes exert an effect on risk." That idea is widely held in the field, and numerous other previous manuscripts/reviews should be cited for clarity of where this hypothesis came from.

      This idea is not widely held in the field. You likely read this point as “disrupted sleep is a risk factor for AD”, which, yes, is widely discussed in the field, but is not precisely what we are saying. We hypothesise that mutations in some of the Alzheimer’s risk genes cause disrupted sleep, possibly from a very early age, which then causes AD decades later. Studies and reviews on sleep and AD rarely make this hypothesis, at least not explicitly. The closest we know of are a few recent human genetics studies, typically using Mendelian Randomisation, finding that higher genetic risk of AD correlates with some sleep phenotypes, such as sleep duration (Chen et al., 2022; Leng et al., 2021). The work of Muto et al. (2021) is particularly interesting as it found correlations between higher genetic risk of AD and some sleep phenotypes in men in their early twenties, which seems unlikely to be a consequence of early pathology. Note, however, that even these studies do not mention sleep possibly being disrupted early in development, which is what our findings in zebrafish larvae support. As we mention, we think a team should test whether sleep is different in infants at higher genetic risk of AD, essentially performing an experiment analogous to, but obviously much more difficult than, the one we did in zebrafish larvae. We do not know of any study testing this or even raising this idea, so evidently it is not widely held. Having said that, the studies we mention here were not referenced in the Discussion paragraph. We have now corrected this.

      Ashlin TG, Blunsom NJ, Ghosh M, Cockcroft S, Rihel J. 2018. Pitpnc1a Regulates Zebrafish Sleep and Wake Behavior through Modulation of Insulin like Growth Factor Signaling. Cell Rep 24:1389–1396. doi:10.1016/j.celrep.2018.07.012

      Chen D, Wang X, Huang T, Jia J. 2022. Sleep and Late-Onset Alzheimer’s Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 13. doi:10.3389/fgene.2022.794202

      Cirrito JR, Disabato BM, Restivo JL, Verges DK, Goebel WD, Sathyan A, Hayreh D, D’Angelo G, Benzinger T, Yoon H, Kim J, Morris JC, Mintun MA, Sheline YI. 2011. Serotonin signaling is associated with lower amyloid-β levels and plaques in transgenic mice and humans. Proc Natl Acad Sci U S A 108:14968–14973. doi:10.1073/pnas.1107411108

      Dean DC, Jerskey BA, Chen K, Protas H, Thiyyagura P, Roontiva A, O’Muircheartaigh J, Dirks H, Waskiewicz N, Lehman K, Siniard AL, Turk MN, Hua X, Madsen SK, Thompson PM, Fleisher AS, Huentelman MJ, Deoni SCL, Reiman EM. 2014. Brain Differences in Infants at Differential Genetic Risk for Late-Onset Alzheimer Disease: A Cross-sectional Imaging Study. JAMA Neurol 71:11–22. doi:10.1001/jamaneurol.2013.4544

      Eriksen JL, Sagi SA, Smith TE, Weggen S, Das P, McLendon DC, Ozols VV, Jessing KW, Zavitz KH, Koo EH, Golde TE. 2003. NSAIDs and enantiomers of flurbiprofen target γ-secretase and lower Aβ42 in vivo. J Clin Invest 112:440–449. doi:10.1172/JCI18162

      Espay AJ, Herrup K, Kepp KP, Daly T. 2023. The proteinopenia hypothesis: Loss of Aβ42 and the onset of Alzheimer’s Disease. Ageing Res Rev 92:102112. doi:10.1016/j.arr.2023.102112

      Hoffman EJ, Turner KJ, Fernandez JM, Cifuentes D, Ghosh M, Ijaz S, Jain RA, Kubo F, Bill BR, Baier H, Granato M, Barresi MJF, Wilson SW, Rihel J, State MW, Giraldez AJ. 2016. Estrogens Suppress a Behavioral Phenotype in Zebrafish Mutants of the Autism Risk Gene, CNTNAP2. Neuron 89:725–733. doi:10.1016/j.neuron.2015.12.039

      in ’t Veld BA, Ruitenberg A, Hofman A, Launer LJ, van Duijn CM, Stijnen T, Breteler MMB, Stricker BHC. 2001. Nonsteroidal Antiinflammatory Drugs and the Risk of Alzheimer’s Disease. N Engl J Med 345:1515–1521. doi:10.1056/NEJMoa010178

      Jagirdar R, Fu C-H, Park J, Corbett BF, Seibt FM, Beierlein M, Chin J. 2021. Restoring activity in the thalamic reticular nucleus improves sleep architecture and reduces Aβ accumulation in mice. Sci Transl Med 13:eabh4284. doi:10.1126/scitranslmed.abh4284

      Jiang H, Newman M, Lardelli M. 2018. The zebrafish orthologue of familial Alzheimer’s disease gene PRESENILIN 2 is required for normal adult melanotic skin pigmentation. PLOS ONE 13:e0206155. doi:10.1371/journal.pone.0206155

      Jiang H, Pederson SM, Newman M, Dong Y, Barthelson K, Lardelli M. 2020. Transcriptome analysis indicates dominant effects on ribosome and mitochondrial function of a premature termination codon mutation in the zebrafish gene psen2. PloS One 15:e0232559. doi:10.1371/journal.pone.0232559

      Joo W, Vivian MD, Graham BJ, Soucy ER, Thyme SB. 2021. A Customizable Low-Cost System for Massively Parallel Zebrafish Behavioral Phenotyping. Front Behav Neurosci 14.

      Joubert L, Hanson B, Barthet G, Sebben M, Claeysen S, Hong W, Marin P, Dumuis A, Bockaert J. 2004. New sorting nexin (SNX27) and NHERF specifically interact with the 5-HT4a receptor splice variant: roles in receptor targeting. J Cell Sci 117:5367–5379. doi:10.1242/jcs.01379

      Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. 2021. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol 89:177–181. doi:10.1002/ana.25910

      Mitchell PB, Hadzi-Pavlovic D. 2000. Lithium treatment for bipolar disorder. Bull World Health Organ 78:515–517.

      Mittur A. 2011. Trazodone: properties and utility in multiple disorders. Expert Rev Clin Pharmacol 4:181–196. doi:10.1586/ecp.10.138

      Munoz-Torrero D. 2008. Acetylcholinesterase Inhibitors as Disease-Modifying Therapies for Alzheimer’s Disease. Curr Med Chem 15:2433–2455. doi:10.2174/092986708785909067

      Muto V, Koshmanova E, Ghaemmaghami P, Jaspar M, Meyer C, Elansary M, Van Egroo M, Chylinski D, Berthomier C, Brandewinder M, Mouraux C, Schmidt C, Hammad G, Coppieters W, Ahariz N, Degueldre C, Luxen A, Salmon E, Phillips C, Archer SN, Yengo L, Byrne E, Collette F, Georges M, Dijk D-J, Maquet P, Visscher PM, Vandewalle G. 2021. Alzheimer’s disease genetic risk and sleep phenotypes in healthy young men: association with more slow waves and daytime sleepiness. Sleep 44. doi:10.1093/sleep/zsaa137

      Myers-Turnbull D, Taylor JC, Helsell C, McCarroll MN, Ki CS, Tummino TA, Ravikumar S, Kinser R, Gendelev L, Alexander R, Keiser MJ, Kokel D. 2022. Simultaneous analysis of neuroactive compounds in zebrafish. doi:10.1101/2020.01.01.891432

      Owens MJ, Morgan WN, Plott SJ, Nemeroff CB. 1997. Neurotransmitter receptor and transporter binding profile of antidepressants and their metabolites. J Pharmacol Exp Ther 283:1305–1322.

      Özcan GG, Lim S, Leighton PL, Allison WT, Rihel J. 2020. Sleep is bi-directionally modified by amyloid beta oligomers. eLife 9:e53995. doi:10.7554/eLife.53995

      Quiroz YT, Schultz AP, Chen K, Protas HD, Brickhouse M, Fleisher AS, Langbaum JB, Thiyyagura P, Fagan AM, Shah AR, Muniz M, Arboleda-Velasquez JF, Munoz C, Garcia G, Acosta-Baena N, Giraldo M, Tirado V, Ramírez DL, Tariot PN, Dickerson BC, Sperling RA, Lopera F, Reiman EM. 2015. Brain Imaging and Blood Biomarker Abnormalities in Children With Autosomal Dominant Alzheimer Disease: A Cross-Sectional Study. JAMA Neurol 72:912–919. doi:10.1001/jamaneurol.2015.1099

      Relkin NR. 2007. Beyond symptomatic therapy: a reexamination of acetylcholinesterase inhibitors in Alzheimer’s disease. Expert Rev Neurother 7:735–748. doi:10.1586/14737175.7.6.735

      Rihel J, Prober DA, Arvanites A, Lam K, Zimmerman S, Jang S, Haggarty SJ, Kokel D, Rubin LL, Peterson RT, Schier AF. 2010. Zebrafish Behavioral Profiling Links Drugs to Biological Targets and Rest/Wake Regulation. Science 327:348–351. doi:10.1126/science.1183090

      Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, Wauters J, Del-Favero J, Cruts M, van Duijn CM, Van Broeckhoven C. 2006. APP duplication is sufficient to cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain J Neurol 129:2977–2983. doi:10.1093/brain/awl203

      Sun L, Zhou R, Yang G, Shi Y. 2017. Analysis of 138 pathogenic mutations in presenilin-1 on the in vitro production of Aβ42 and Aβ40 peptides by γ-secretase. Proc Natl Acad Sci 114:E476–E485. doi:10.1073/pnas.1618657114

      Szklarczyk D, Santos A, von Mering C, Jensen LJ, Bork P, Kuhn M. 2016. STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Res 44:D380–D384. doi:10.1093/nar/gkv1277

      Weggen S, Rogers M, Eriksen J. 2007. NSAIDs: small molecules for prevention of Alzheimer’s disease or precursors for future drug development? Trends Pharmacol Sci 28:536–543. doi:10.1016/j.tips.2007.09.004

      Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Datta SR. 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. doi:10.1038/s41593-020-00706-3

      Yang T, Arslanova D, Gu Y, Augelli-Szafran C, Xia W. 2008. Quantification of gamma-secretase modulation differentiates inhibitor compound selectivity between two substrates Notch and amyloid precursor protein. Mol Brain 1:15. doi:10.1186/1756-6606-1-15

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the author's proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and accompanying appendix is lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure; notably, several quantities in those equations are never defined, and some of them are important to understand the method (e.g. A, S as the main random variables for inter-arrival times and service times, aR and bR which are the known time average quantities, and these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We have clarified the method and defined all relevant variables in the revised manuscript (Line 537-573). The reviewer correctly pointed out additional sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation. Since our work directly utilized the two-moment approximation, our previous manuscript included only material on that section. However, we agree that providing additional details on the derivation of the exact expression would benefit readers. Therefore, we have summarized this derivation in the revised manuscript (Line 561-564). Additionally, we clarified the method’s assumptions, particularly those involved in transitioning from the exact expression to the two-moment approximation (Line 565-570).

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. In the revised manuscript, we included a diagram illustrating the connection between the queueing procedure and malaria transmission (Appendix 1-Figure 8).

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. proportion of simulations where the truth is captured in the 95% CI or something like this) is important; in many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There seems to be some confusion on what we display in some key figures. Figures 1-2 and 10-14 (labeled as Figure 1-2 and Appendix 1-Figure 11-15 in the revised manuscript) display bootstrapped distributions including the 95% CIs, not the distribution of the mean FOI taken over multiple simulations. To estimate the mean FOI per host on an annual basis, the two proposed methods require either the steady-state queue length distribution (MOI distribution) or the moments of this distribution. Obtaining such a steady-state queue length distribution necessitates either densely tracked time-series observations per host or many realizations at the same sampling time per host. However, under the sparse sampling schemes, we only have two one-time-point observations per host: one at the end of wet/high-transmission and another at the end of dry/low-transmission. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we have a population-level queue length distribution from both simulation outputs and empirical data by aggregating MOI estimates across all sampled individuals. We use this population-level distribution to represent and approximate the steady-state queue length distribution at the individual level, not explicitly considering any individual heterogeneity due to transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation is the total FOI of all hosts per year divided by the number of hosts. Therefore, our estimator, combined with the demographic information on population size, estimates the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year. 
      We clarified this point in the revised manuscript in the subsection of the Materials and Methods, entitled ‘Population-level MOI distribution for approximating time-series observation of MOI per host or many realizations at the same sampling time per host’ (Line 623-639).
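      As a minimal sketch of the aggregation described above, the snippet below pools one-time-point MOI estimates into a population-level distribution and scales a per-host FOI to a population total. The function names and numbers are hypothetical and for illustration only; this is not the code used in the manuscript.

```python
from collections import Counter

def pooled_moi_distribution(moi_estimates):
    """Empirical MOI distribution obtained by pooling all sampled individuals."""
    counts = Counter(moi_estimates)
    total = len(moi_estimates)
    return {moi: counts[moi] / total for moi in sorted(counts)}

def total_infections_per_year(foi_per_host_per_year, population_size):
    """Total infections acquired per year across the population of interest."""
    return foi_per_host_per_year * population_size

# Hypothetical survey: one MOI estimate per sampled host.
survey_moi = [1, 1, 2, 3]
moi_distribution = pooled_moi_distribution(survey_moi)
total = total_infections_per_year(foi_per_host_per_year=5.0, population_size=1000)
```

      The pooled distribution stands in for the steady-state queue length distribution of a single host, so individual heterogeneity is not represented explicitly.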

      We evaluated the impact of individual heterogeneity due to transmission on FOI inference using simulation outputs (Line 157-184, Figure 1-2 and Appendix 1-Figure 11-15). Even with significant heterogeneity among individuals (2/3 of the population receiving approximately 94% of all bites whereas the remaining 1/3 receives the rest of the bites), our methods performed comparably to scenarios with homogeneous transmission. Furthermore, our methods demonstrated similar performance for both non-seasonal and seasonal transmission scenarios.

      Regarding the second point, we quantitatively assessed the ability of the estimator to recover the truth across simulations and included this information in a supplementary table in the revised manuscript (supplementary file 3-FOImethodsPerformance.xlsx). Specifically, we indicated whether the truth lies within the bootstrap distribution and provided a measure of relative deviation, which is defined as the true FOI value minus the median of the bootstrap distribution for the estimate, normalized by the true FOI value. This assessment is a valuable addition which enhances clarity, but please note that our previous graphical comparisons do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” here is relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
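      The two performance measures can be sketched as follows. The relative deviation follows the definition given above; the coverage check uses the full range of the bootstrap distribution, which is an assumption made for this illustration (a 95% interval could be used instead). All values are hypothetical.

```python
import statistics

def relative_deviation(true_foi, bootstrap_estimates):
    """(true FOI - median of bootstrap distribution) / true FOI."""
    return (true_foi - statistics.median(bootstrap_estimates)) / true_foi

def truth_within_bootstrap(true_foi, bootstrap_estimates):
    """True if the true FOI lies within the range of the bootstrap distribution."""
    return min(bootstrap_estimates) <= true_foi <= max(bootstrap_estimates)

# Hypothetical bootstrap estimates of FOI for one simulated scenario.
true_value = 5.0
bootstrap = [4.0, 4.2, 4.5, 4.8, 5.1]

deviation = relative_deviation(true_value, bootstrap)
covered = truth_within_bootstrap(true_value, bootstrap)
```

      A positive relative deviation indicates that the median estimate falls below the true value, i.e. underestimation.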

      We also thank the reviewer for highlighting instances where our proposed methods for FOI inference perform sub-optimally (e.g. Figure 10, Figure 1 under the mid-IRS panel in the previous manuscript). This feedback prompted us to examine these instances more closely and identify the underlying causes related to the stochastic impact introduced during various sampling processes. These include sampling the host population and their infections at a specific sampling depth in the simulated output, matching the depth used for collecting empirical data. In addition, previously, we imputed MOI estimates for treated individuals by sampling only once from non-treated individuals. This time, we conducted 200 samplings and used the final weighted MOI distribution for FOI inference. By doing so, we reduced the impact of extreme single-sampling efforts on MOI distribution and FOI inference. In other words, some of these suboptimal instances correspond to the scenarios where the one-time sampled MOIs from non-treated individuals do not fully capture the MOI distribution of non-treated individuals. We added a section titled ‘Reducing stochastic impact in sampling processes’ to Appendix 1 on this matter (Line 841-849).
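      The repeated-sampling idea can be sketched as below. The way the draws are combined (pooling counts across all draws with equal weight) is an assumption for illustration, as are the function name and the example values; the exact weighting is described in the Appendix section cited above.

```python
import random
from collections import Counter

def imputed_moi_distribution(non_treated_moi, n_treated, n_draws=200, seed=0):
    """MOI distribution with treated hosts imputed from non-treated hosts.

    For each draw, MOI values for the treated hosts are resampled (with
    replacement) from the non-treated hosts; counts are pooled over all draws.
    """
    rng = random.Random(seed)
    pooled = Counter()
    for _ in range(n_draws):
        imputed = rng.choices(non_treated_moi, k=n_treated)
        pooled.update(non_treated_moi)  # observed, non-treated hosts
        pooled.update(imputed)          # imputed, treated hosts
    total = sum(pooled.values())
    return {moi: count / total for moi, count in sorted(pooled.items())}

# Hypothetical example: five non-treated hosts, three treated hosts.
example = imputed_moi_distribution([1, 1, 2, 3, 5], n_treated=3)
```

      Averaging over many draws means that a single unrepresentative draw has little influence on the final distribution.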

      The reviewer correctly noted that our proposed methods tend to underestimate FOI (Figure 1-2, 10-14, ‘Estimated All Errors’ and ‘Estimated Undersampling of Var’ panels in the previous manuscript, corresponding to Figure 1-2 and Appendix 1-Figure 11-15 in the revised manuscript). This underestimation arises from the underestimation of MOI. The Bayesian formulation of the varcoding method does not account for the limited overlap between co-infecting strains, an additional factor that reduces the number of var genes detected per individual. We have elaborated on this matter in the Results and Discussion sections of the revised manuscript (Line 142-149, 252-256).

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times are varied widely; however, it's not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method and it's impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady state queue length distribution? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. In response, we maximized the likelihood of observing the population-level MOI distribution in the sampled population (see our response to your previous comment c), given queue length distributions derived from the two-moment approximation method for various mean and variance combinations of inter-arrival times. We added a new section to the Materials and Methods in the revised manuscript with an explicit likelihood formulation (Line 574-585).

      Additionally, we specified the ranges for the mean and variance parameters for inter-arrival times and provided the grid of values tested in a supplementary table (supplementary file 4-meanVarianceParams.xlsx). Example figures illustrating the shape of the likelihood have also been included in Appendix 1-Figure 9. We tested the impact of different grid value choices on estimation quality by refining the grid to include more points, ensuring the FOI inference results are consistent. The results of the test are documented in the revised manuscript (Line 587-593, Appendix 1-Figure 10).
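      A minimal sketch of the grid search is given below, assuming a multinomial likelihood for the observed MOI counts. The candidate queue length distributions are hypothetical placeholders keyed by (mean, variance) of inter-arrival times; in the actual method they would come from the two-moment approximation, which is not implemented here.

```python
import math

def log_likelihood(moi_counts, pmf):
    """Multinomial log-likelihood (up to a constant) of observed MOI counts."""
    ll = 0.0
    for moi, count in moi_counts.items():
        p = pmf.get(moi, 0.0)
        if p <= 0.0:
            return float("-inf")  # observed MOI impossible under this candidate
        ll += count * math.log(p)
    return ll

def best_grid_point(moi_counts, candidate_pmfs):
    """Grid point (mean, variance) whose distribution maximizes the likelihood."""
    return max(candidate_pmfs,
               key=lambda params: log_likelihood(moi_counts, candidate_pmfs[params]))

# Hypothetical observed counts and placeholder candidate distributions.
observed = {1: 50, 2: 30, 3: 20}
candidates = {
    (100.0, 2000.0): {1: 0.7, 2: 0.2, 3: 0.1},
    (60.0, 1500.0): {1: 0.5, 2: 0.3, 3: 0.2},
    (30.0, 900.0): {1: 0.2, 2: 0.3, 3: 0.5},
}
best = best_grid_point(observed, candidates)
```

      Refining the grid amounts to adding more entries to the candidate set and checking that the selected point, and hence the inferred FOI, does not change materially.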

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection duration and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction, and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3) it seems that this would not translate to 4-year-old being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      We thank the reviewer for this useful comment. The reviewer correctly noted the challenge in empirically measuring the duration of infection for 1-5-year-olds and comparing it to that of naïve adults co-infected with syphilis. We nevertheless continued to use the described method for the duration of infection, while more thoroughly acknowledging and discussing the limitations this aspect of the method introduces. We have highlighted this potential limitation in the Abstract, Introduction, and Discussion sections of the revised manuscript (Line 26-28, 99-103, 270-292). It is important to note that the infection duration from the historical clinical data we have relied on has been used, and is still used, in the malaria modeling community as a credible source for this parameter in untreated natural infections of malaria-naïve individuals in endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      To reduce misspecification in infection duration and fully utilize our proposed methods, future data collection and sampling could prioritize subpopulations with minimal prior infections and an immune profile similar to naïve adults, such as infants and toddlers. As these individuals are also the most vulnerable, prioritizing them aligns with the priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe symptoms and death. We discuss this aspect in detail in the Discussion section of the revised manuscript (Line 287-292).

      In the pre-IRS phase of Ghana surveys, an estimated mean FOI of about 5 per host per year indicates that a 4-year-old child would have experienced around 20 infections, which could suggest they are far from naïve. The extreme diversity of circulating var genes (2) implies, however, that even after 20 infections, a 4-year-old may have only developed immunity to a small fraction of the variant surface antigens (PfEMP1, Plasmodium falciparum erythrocyte membrane protein 1) encoded by this important gene family. Consequently, these children are not as immunologically experienced as it might initially seem. Moreover, studies have shown that long-lived infections in older children and adults can persist for months or even years, including through the dry season. This persistence is driven by high antigenic variation of var genes and associated incomplete immunity. Additionally, parasites can skew PfEMP1 expression to produce less adhesive erythrocytes, enhancing splenic clearance, reducing virulence, and maintaining sub-clinical parasitemia (3, 4, 5). The impact of immunity on infection duration with age for falciparum malaria remains a challenging open question.

      Lastly, the FOI for naïve hosts is a key basic parameter for epidemiological models of complex infectious diseases like falciparum malaria, in both agent-based and equation-based formulations. This is because FOI for non-naïve hosts is typically a function of their immune status, body size, and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts helps parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation.

      Thank you for this question. This parameter represents the carrying capacity of the queuing system, or the maximum number of blood-stage strains with which an individual human host can be co-infected. Empirical evidence, estimated using the varcoding method, suggests this value is 20 (2), providing a lower bound for parameter c. However, the varcoding method does not account for the limited overlap between co-infecting strains, which reduces the number of var genes detected in an individual, thereby affecting the basis of MOI estimation. Additional factors, such as the synchronicity of clones in their 48-hour life cycle on alternate days (6) and within-host competition of strains leading to low-parasitemia levels (7, 8), contribute to under-sampling of strains and are not accounted for in MOI estimation (9). To address these potential under-sampling issues, we previously tested values of 25 and 30.

      This time, we systematically investigated a wider range of values, including substantially higher ones: 25, 30, 40, and 60. We found that the FOI inference results are similar across these values. Figure 3 in the main text and the supplementary figures (Appendix 1-Figure 16-18) illustrate these findings.

      The parameter c influences the steady-state queue length distribution based on the two-moment approximation with specific mean and variance combinations, primarily affecting the distribution’s tail when customer or infection flows are high. Smaller values of c lower the maximum possible queue length, making the system more prone to “overflow”. In such cases, customers or infections may find no space available upon their arrival, hence not incrementing the queue length.

      Empirical MOI distributions for high-transmission endemic regions center around 4 or 5, mostly remaining below 10, with only a small fraction between 15-20 (2). These distributions do not support parameter combinations resulting in frequent overflow for a system with c equal to 25 or 30. As one increases the value of c further, these parameter combinations would cause the MOI distributions to shift to larger values inconsistent with the empirical MOI distributions. We therefore do not expect substantially higher values for parameter c to noticeably change either the relative shape of the likelihood or the MLE.
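
      The reasoning about the tail and overflow can be illustrated with a deliberately simplified stand-in for the queuing model. The sketch below assumes a loss system in which infections arrive as a Poisson process and are rejected when all c slots are occupied; for such a system the steady-state occupancy is a Poisson distribution truncated at c (the Erlang loss result). This is not the two-moment approximation used in the manuscript, and the offered load of 5 is an assumed value chosen only to mimic an MOI distribution centered around 4 or 5. The function names are hypothetical.

```python
import math

def truncated_poisson(load, c):
    """Steady-state occupancy of a loss queue with capacity c.

    `load` is the assumed offered load (arrival rate x mean duration).
    """
    weights = [load ** n / math.factorial(n) for n in range(c + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def mean_occupancy(dist):
    """Mean number of concurrent infections under the distribution."""
    return sum(n * p for n, p in enumerate(dist))

def overflow_probability(dist):
    """Probability that an arriving infection finds all c slots occupied."""
    return dist[-1]

if __name__ == "__main__":
    load = 5.0  # assumed value, not estimated from any data
    for c in (25, 30, 40, 60):
        dist = truncated_poisson(load, c)
        print(c, mean_occupancy(dist), overflow_probability(dist))
```

      Under these assumptions, the occupancy distributions for c = 25 and c = 60 are practically indistinguishable, because almost no probability mass sits near the capacity limit when the load is this low.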

      We have included a subsection on parameter c in the Materials and Methods section of the revised manuscript (Line 596-612).

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context.

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics.

      (3) The mathematical approach is simple and elegant, and thus easy to understand.

      Weaknesses:

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates.

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection distribution from historical clinical data. Please see our response to Reviewer 1, Comment 2a, for a detailed discussion on this matter.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration.

      We thank the reviewer for pointing out a potential improvement to our work. We acknowledge that FOI is inferred from MOI and thus depends on the information contained in MOI. However, MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a fundamental parameter for epidemiological models of complex infectious diseases like falciparum malaria, in both agent-based and equation-based formulations. FOI of non-naïve hosts is typically a function of their immune status, body size, and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts helps parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI is valuable.

      Measuring infection duration is challenging, making the simultaneous estimation of infection duration and FOI an attractive alternative, as the referee noted. This, however, would require closely monitored cohort studies or densely sampled cross-sectional surveys to reduce issues like identifiability. For instance, a higher arrival rate of infections paired with a shorter infection duration could generate a similar MOI distribution to a lower arrival rate with a longer infection duration. In some cases, incorrect combinations of rate and duration might even produce an MOI distribution that appears closer to the targeted distribution. Such cohort studies and densely sampled cross-sectional surveys have not been and will not be widely available across different geographical locations and times. This work utilizes more readily available data from sparsely sampled single-time-point cross-sectional surveys, which precludes more sophisticated derivation of time-varying average arrival rates of infections and lacks the resolution to simultaneously estimate arrival rates and infection duration. In the revised manuscript, we have elaborated on this matter and added a paragraph in the Discussion section (Line 306-309).
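
      The identifiability problem can be seen in a minimal sketch that assumes an uncapacitated queue with Poisson arrivals. For such a queue, the steady-state number of concurrent infections is Poisson with mean equal to the arrival rate times the mean duration, whatever the shape of the duration distribution. The rates and durations below are made-up values, the function name is hypothetical, and the capacity-limited model in the manuscript is more involved.

```python
import math

def moi_distribution(arrival_rate, mean_duration, max_moi=20):
    """Poisson pmf of concurrent infections for an uncapacitated queue."""
    load = arrival_rate * mean_duration
    return [
        math.exp(-load) * load ** k / math.factorial(k)
        for k in range(max_moi + 1)
    ]

# Hypothetical parameter pairs: rate per host per year, duration in years.
high_rate_short = moi_distribution(arrival_rate=10.0, mean_duration=0.5)
low_rate_long = moi_distribution(arrival_rate=5.0, mean_duration=1.0)

# Largest pointwise gap between the two MOI distributions.
max_gap = max(abs(p - q) for p, q in zip(high_rate_short, low_rate_long))
print(max_gap)
```

      Because only the product of rate and duration enters the distribution in this simplified setting, a single cross-sectional MOI distribution cannot separate the two parameters, which is why the duration distribution has to be supplied from an external source.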

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates.

      We thank the reviewer for pointing out aspects of the work that can be further clarified. Disentangling the effect of drug treatment on measurements like infection duration is challenging. Since our methods rely on the known and fixed distribution of infection duration from historical data of naïve patients with neurosyphilis infected with malaria as a therapy, drug treatment can potentially violate this assumption. In the previous manuscript, we did not attempt to directly address the impact of drug treatment. Instead, we considered two extreme scenarios that bound reality, well summarized by the reviewer. Reality lies somewhere in between these two extremes, with antimalarial treatment significantly affecting measurements in some individuals but not in others. Nonetheless, the results of FOI inference do not differ significantly across both extremes.

      The impact of the drugs likely depends on their nature, efficiency, and duration. We note that treatment information was collected via a routine questionnaire, with participants self-reporting that they had received an antimalarial treatment in the two weeks before the surveys (i.e., participants who reported they were sick, sought treatment, and were provided with an antimalarial treatment). No confirmation through hospital or clinic records was conducted, as it was beyond the scope of the study. Additionally, many of these sick individuals seek treatment at local chemists, which may limit the relevance of hospital or clinic records, if they are even available. Consequently, information on the nature, efficiency, and duration of administered drugs was incomplete or lacking. As this is not the focus of this work, we do not elaborate on the impact of drug treatment in the revised manuscript.

      The reviewer correctly noted that this imputation might not add additional information and could reduce MOI variability. Therefore, in the revised manuscript, we reported FOI estimates with drug-treated 1-5-year-olds excluded. Additionally, we discarded the infection status and MOI values of treated individuals and sampled their MOI from non-treated microscopy-positive individuals, imputing a positive MOI for treated and uninfected individuals. We also reported FOI estimates based on these MOI values. This scenario provides an upper bound for FOI estimates. Note that we do not assume that the MOI distribution for treated individuals is the same as that for untreated individuals. Rather, we aim to estimate what their MOI would have been, and consequently, determine what the FOI per individual per year in the combined population would be, had these individuals not received antimalarial treatment. The results of FOI inference do not differ significantly between these two approaches. They can serve as general solutions to antimalarial treatment issues for others applying our FOI inference methods. These details can be found in the revised manuscript (Line 185-210, 462-484).
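
      As a rough illustration of the two scenarios, the sketch below uses hypothetical records with `moi` and `treated` fields. The field names, the toy data, and the helper functions are assumptions made for illustration and are not taken from the authors' code. The first function drops treated individuals; the second replaces each treated individual's MOI with a draw from the non-treated individuals who have a positive MOI, so treated and uninfected individuals also receive a positive value.

```python
import random

def exclude_treated(records):
    """Scenario 1: keep only individuals who did not report treatment."""
    return [r["moi"] for r in records if not r["treated"]]

def impute_treated(records, rng):
    """Scenario 2: replace treated MOI with draws from non-treated positives."""
    donors = [r["moi"] for r in records if not r["treated"] and r["moi"] > 0]
    return [rng.choice(donors) if r["treated"] else r["moi"] for r in records]

# Toy data, invented for this example only.
records = [
    {"moi": 3, "treated": False},
    {"moi": 0, "treated": False},
    {"moi": 1, "treated": True},
    {"moi": 0, "treated": True},
    {"moi": 6, "treated": False},
]
rng = random.Random(0)  # fixed seed so the draws are reproducible
scenario_1 = exclude_treated(records)
scenario_2 = impute_treated(records, rng)
```

      Repeating the second scenario over many random draws, and re-estimating FOI each time, would propagate the imputation uncertainty into the reported interval.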

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals.

      We thank the reviewer for this comment. The reviewer correctly noted that we imputed the MOI values for microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, under the assumption that both groups have similar MOI distributions. This approach was motivated by the analysis of our Ghana surveys, which shows no clear relationship between MOI (or the number of var genes detected within an individual host, on the basis of which our MOI values were estimated) and the parasitemia levels of those hosts. Parasitemia levels underlie the difference in detection sensitivity between PCR and microscopy.

      In the revised manuscript, we elaborated on this issue and included formal regression tests showing the lack of a relationship between MOI/the number of var genes detected within an individual host and the parasitemia levels of those hosts (Line 445-451, Appendix 1-Figure 7). We also described potential reasons or hypotheses behind this observation (Line 452-461).

      Reviewer #3 (Public Review):

      Summary:

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI.

      Strengths:

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics.

      Weaknesses:

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153) and I feel it would be appropriate to differentiate this in the discussion.

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted, and we have extended the discussion of what we have done to test the methods in the revised manuscript (Line 314-328). Performance evaluation via simulation output is common and often necessary for statistical methods. These simulations can come from dynamical or descriptive models, each making their own assumptions to simplify reality. Our stochastic agent-based model (ABM) of malaria transmission, used in this study, has successfully replicated several key patterns from high-transmission endemic regions in the field, including aspects of strain diversity not represented and captured by simpler models (10).

      In what sense this ABM makes a set of biological and structural assumptions that are “probably similar” to those of the queuing methods we present is not clear to us. We agree that using models with different structural assumptions from the method being tested is ideal. Our FOI inference methods based on queuing theory require the duration of infection distribution and the MOI distribution among sampled individuals. However, these FOI inference methods are agnostic to the specific biological mechanisms governing these distributions.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs from cohort studies tracking FOI directly are still lacking. Direct FOI measurements are prone to errors because differentiating new infections from the temporary absence of an old infection in the peripheral blood and its subsequent re-emergence remains challenging. Reasons for this challenge include the low resolution of the polymorphic markers used in cohort studies, which cannot fully differentiate hyper-diverse antigenic strains, and the complexity of within-host dynamics and competitive interaction of co-infecting strains (6, 8, 9). Alternative approaches also do not provide a “true” FOI estimation free from assumptions. These approaches involve fitting simplified epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly, and thus, there are no FOI values available for benchmarking against fitted FOI values. The evaluation or validation of these model-fitting approaches is typically based on their ability to capture other epidemiological quantities that are easier to sample or measure, such as prevalence or incidence, with criteria such as the Akaike information criterion (AIC). This type of evaluation is similar to the one done in this work. We selected FOI values that maximize the likelihood of observing the given MOI distribution. Furthermore, we paired our estimated FOI values for Ghana surveys with the independently measured EIR (Entomological Inoculation Rate), a common field measure of transmission intensity. We ensured that our resulting FOI-EIR points align with existing FOI-EIR pairs and the relationship between these quantities from previous studies. We acknowledge that, like model-fitting approaches, our validation for the field data is also indirect and further complicated by high variance in the relationship between EIR and FOI from previous studies.

      Prompted by the reviewer’s comment, we elaborated on these points in the revised manuscript, emphasizing the indirect nature and existing constraints of our validation with field data in the Discussion section (Line 314-328). Additionally, we clarified certain basic assumptions of our agent-based model in Appendix 1-Simulation data.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone).

      We thank the reviewer for this comment. Please refer to our response to Reviewer 2, comment (3), as we made changes in the revised manuscript regarding individuals treated with antimalarial drugs. We reported two sets of FOI estimates. In the first, we excluded these treated individuals from the analysis as suggested by Reviewer 2. In the second, we discarded their infection status and MOI estimates and sampled their MOI from non-treated individuals.

      The reviewer correctly noted the surprising lack of impact of antimalarial treatment on MOI estimates. This pattern is indeed interesting and counterintuitive. The impact of the drugs likely depends on their nature, efficiency, and duration. We note that treatment information was collected via a routine questionnaire, with participants self-reporting that they had received an antimalarial treatment in the two weeks before the surveys (i.e., participants who reported they were sick, sought treatment, and were provided with an antimalarial treatment). No confirmation through hospital, clinic, or pharmacy records was conducted, as it was beyond the scope of the study. Additionally, many of these sick individuals seek treatment at local chemists, which may limit the relevance of hospital or clinic records, if they are even available. Consequently, information on the nature, efficiency, and duration of administered drugs was incomplete or lacking. As this is not the focus of this work, we do not elaborate on the impact of drug treatment in the revised manuscript.

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI generated by the two-moment approximation method and the agent-based model. This could involve exploring the relationship between the moments of their distributions, possibly by fitting models such as simple linear regression models. Although this approach is in principle possible, it falls outside the focus of our work. Moreover, it would be challenging to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Nonetheless, we note that the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should result in more narrow or concentrated MOI distributions, whereas more variable FOI values should lead to more spread-out MOI distributions. We described this qualitative relationship between MOI and FOI in the revised manuscript (Line 499-502).

      As mentioned in the response to the reviewer’s previous point (1), we hope that our clarification of the basic assumptions underlying our agent-based model in Appendix 1-Simulation data helps the reviewer gain a better sense of the model. We appreciate that agent-based models involve more assumptions and parameters than typical equation-based models in epidemiology, and that their description can be difficult to follow. We have extended this description to rely less on previous publications. As for other ABMs, the population dynamics of the disease are followed over time by tracking individual hosts and strains. This allows us to implement specific immune memory to the large number of strains arising from the var multigene family. There is no equation-based formulation of the transmission dynamics that can incorporate immune memory in the presence of such large variation as well as recombination of the strains. We rely on this model because large strain diversity at high transmission underlies superinfection of individual hosts, and therefore, MOI values larger than one. We relied on the estimation of MOI with a method based on var gene sampling, and therefore, simulated such sampling for individual hosts (which requires an ABM, and one that represents such genes and the resulting strains explicitly).

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying.

      We thank the reviewer for this helpful comment, as it is crucial to avoid any confusion regarding basic definitions. EIR, the entomological inoculation rate, is closely related to the FOI, force of infection, but they are not equivalent. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models of the population dynamics of infectious diseases. For simpler diseases without super-infection, the typical SIR models define FOI as the rate at which a susceptible individual becomes infected. In the context of malaria, FOI refers to the number of new infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We added “blood-stage strains” to the definition of FOI in the previous manuscript, as pointed out by the reviewer, for the following reason. After an individual host acquires an infection/strain from an infectious mosquito bite, the strain undergoes a multi-stage life cycle within the host, including the liver stage and asexual blood stage. Liver-stage infections can fail to advance to the blood stage due to immunity or exceeding the blood-stage carrying capacity. Only active blood-stage infections are detectable in all direct measures of FOI. Quantities used in indirect model-fitting approaches for estimating FOI are also based on or reflect these blood-stage strains/infections. Only these blood-stage strains/infections are transmissible to other individuals, impacting disease dynamics. Ultimately, the FOI we seek to estimate is the one defined as specified above, as well as in both the previous and revised manuscripts, consistent with the epidemiological literature. We expanded on this point in the revised manuscript (Line 641-656).

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method.

      We thank the reviewer for this comment and have modified the relevant sentences to use “consistent” instead of “robust” (Line 229-231).

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology.

      We thank the reviewer for this comment. As mentioned in the response to Reviewer 1 and in response to your previous points, we have shortened, reorganized and rewritten parts of the text in the revised manuscript to improve clarity and readability.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Bar graphs in Figures 6 and 7 are not an appropriate way to rigorously compare whether your estimated MOI (under different approaches) is comparable to your true MOIs. Particularly in Figure 6 it is very difficult to clearly compare what is going on. If anything in Figure 7 it looks like as MOI gets higher, Bayesian methods and barcoding are overestimating relative to the truth. The large Excel file that shows KS statistics could be better summarized (and include p-values not in a separate table) and further discussion of how these methods perform on metrics other than the mean value would be important given that MOI distributions can be heavily right skewed and these high MOI values contain a large proportion of genetic diversity which can be highly informative for the purposes of this estimation.

      We appreciate the reviewer’s comment. It appears there may have been some misinterpretation of the pattern in Figure 7 in the previous manuscript. We believe the reviewer meant “as MOI gets higher, Bayesian methods and varcoding are UNDERESTIMATING relative to the truth” rather than “OVERESTIMATING”.

      We agree with the reviewer that the comparison of MOI distributions can be improved. To better quantify the difference between the MOI distribution from the original varcoding method and its Bayesian formulation relative to true MOIs, we replaced the KS test conducted in the previous manuscript with two alternative, more powerful tests: the Cramer-von Mises Test and the Anderson-Darling Test. The Cramer-von Mises Test quantifies the sum of the squared differences between the two cumulative distribution functions, while the Anderson-Darling Test, a modification of the Cramer-von Mises Test, gives more weight to the tails of the distribution, as noted by the reviewer. We have summarized the results, including test statistics and their associated p-values, in a supplementary table (Line 135-149, Line 862-883, supplementary file 1-MOImethodsPerformance.xlsx and supplementary file 7-BayesianImprovement.xlsx).
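
      To make the difference between the two tests concrete, the sketch below computes both statistics from empirical cumulative distribution functions, using the integral definitions evaluated over the pooled sample. It is a simplified illustration: it returns statistics only, it omits the tie corrections and p-value calculations that a full implementation would include, and the function names and toy MOI values are hypothetical. In practice, library routines such as SciPy's `cramervonmises_2samp` and `anderson_ksamp` would normally be used; whether the authors used them is not stated.

```python
def ecdf(sample, x):
    """Empirical cumulative distribution function of `sample` at `x`."""
    return sum(1 for v in sample if v <= x) / len(sample)

def two_sample_statistics(a, b):
    """Return (Cramer-von Mises, Anderson-Darling) style statistics.

    Both sum squared ECDF differences over the pooled sample; the
    Anderson-Darling form divides each term by H(1 - H), where H is the
    pooled ECDF, which up-weights differences in the tails.
    """
    n, m = len(a), len(b)
    pooled = sorted(list(a) + list(b))
    cvm = 0.0
    ad = 0.0
    for z in pooled:
        diff_sq = (ecdf(a, z) - ecdf(b, z)) ** 2
        h = ecdf(pooled, z)
        cvm += diff_sq
        if 0.0 < h < 1.0:  # the weight is undefined at the extremes
            ad += diff_sq / (h * (1.0 - h))
    scale = n * m / (n + m) ** 2
    return scale * cvm, scale * ad

# Toy MOI samples, invented for illustration only.
true_moi = [1, 2, 2, 3, 4, 5, 8, 12]
estimated_moi = [1, 2, 2, 3, 4, 5, 7, 9]
cvm_stat, ad_stat = two_sample_statistics(true_moi, estimated_moi)
```

      In this toy case the two samples differ only at high MOI values, which is the situation where the tail weighting of the Anderson-Darling statistic matters most.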

      Throughout the text the authors use "consistent" to describe their estimation of FOI, I know this is meant in the colloquial use of the word but consider changing this word to replicable or something similar. When talking about estimators, usually, consistency implies asymptotic convergence in probability which we do not know whether the proposed estimator does.

      We thank the reviewer for this suggestion. We changed “consistent” to “replicable” in the revised manuscript.

      I think there is an issue with the numbering of the figures, they are just numbered continuously between the main text and appendix between 1 and 15, but in the text, there is a different numbering system between the main text and appendix figures.

      We thank the reviewer for this comment. We have double-checked to ensure that the numbering of the figures is consistent with the text in the revised manuscript. Figures are numbered continuously between the main text and the appendix. When referring to these figures in the text, we provide a prefix (i.e., Appendix 1) indicating whether the figure is in the main text or Appendix 1, followed by the figure number.

      The description of the bootstrap for 95% CI is a bit sparse, did bootstrap distributions look symmetric? If not did authors use a skewness adjustment to ensure good coverage? Also, is the bootstrap unit of resampling at the individual level, the simulation scenario level, population level?

      We checked the bootstrap distributions and calculated their skewness. The majority fall within the range of -0.5 to 0.5, with a few exceptions falling within the range of 0.5-0.75 (supplementary file 6-FOIBootstrapSkewness.xlsx). We considered them as fairly symmetric and thus did not use a skewness adjustment.
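
      A check of this kind can be sketched as follows. The example assumes resampling at the individual level and uses the sample mean as a placeholder statistic; the actual FOI estimator, the resampling unit, and the data used by the authors are not reproduced here, and all names and values are hypothetical.

```python
import random

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def bootstrap_means(sample, n_boot, rng):
    """Resample individuals with replacement and recompute the mean."""
    size = len(sample)
    return [sum(rng.choices(sample, k=size)) / size for _ in range(n_boot)]

def percentile_interval(estimates, level=0.95):
    """Unadjusted percentile interval from the bootstrap replicates."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    lower = ordered[int((1 - level) / 2 * last)]
    upper = ordered[int((1 + level) / 2 * last)]
    return lower, upper

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    toy_moi = [1, 2, 2, 3, 4, 5, 8, 12]  # invented values
    replicates = bootstrap_means(toy_moi, 1000, rng)
    print(skewness(replicates), percentile_interval(replicates))
```

      If the skewness of the replicates were large, a bias-corrected or otherwise adjusted interval would be a natural alternative to the plain percentile interval shown here.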

      In Figures 8 and 9 the x-axes seem to imply there are both the true and estimated MOI distributions on the plot but only 1 color of grey is clearly visible. If there are 2 distributions the color or size needs to be changed or if not consider re-labeling the x-axis.

      We thank the reviewer for this comment. There was a mistake in the x-axis labels of Figures 8 and 9. Only the estimated MOI distributions were shown because the true ones are not available for the Ghana field surveys. The labels should simply be “Estimated MOIvar”.

      Reviewer #2 (Recommendations For The Authors):

      (1) Throughout the results section there are lots of vague statements such as "differ only slightly", "exhibit a somewhat larger, but still small, difference", etc. Please include the exact values and ranges within the text where appropriate because it can be difficult to discern from the figure.

      We thank the reviewer for this useful comment. In the revised manuscript, we have provided exact values and ranges where appropriate (supplementary file 1-MOImethodsPerformance.xlsx, supplementary file 3-FOImethodsPerformance.xlsx, and supplementary file 7-BayesianImprovement.xlsx).

      (2) Truncate decimals to 2 places.

      We thank the reviewer for this comment. In the revised manuscript, we have truncated decimals to two places where applicable.

      (3) The queueing theory notation in the methods section is unfamiliar, specifically things like "M/M/c/k", please define the variables used.

      We thank the reviewer for this useful comment. In the revised manuscript, we have defined all the variables used. Please refer to our responses to Reviewer 1 Point (1) a.

      Reviewer #3 (Recommendations For The Authors):

      (1) The work takes many of the models and data from a previous paper published in eLife in 2023 (the 4 most senior authors of this previous manuscript are the 4 authors of the current manuscript). This previous paper introduced some new terminology "census population" which was highlighted as being potentially confusing by 2 of the 3 reviewers of the original article. This was somewhat rebuffed by the authors, though their response was ambiguous about whether the terminology would be changed in any potential future revision. The census population terminology does not appear in this manuscript, though the same data is being used. Publication of similar papers with the same data and different terminology could generate confusion, so I would encourage authors to be consistent and make sure the two papers are in line. To this end, it feels like this paper would be better suited to be classified as a "Research Advances" on this original manuscript and linked, which is a nice functionality that eLife offers.

      We thank the reviewer for this comment, but we do not think our work meets the criteria for “Research Advances” on the previous paper mentioned by the reviewer. The reviewer correctly noted that the current work and the previous paper used the same datasets. However, the two papers have different goals and are not related in terms of content.

      The previous paper examined how epidemiological quantities and diversity measurements of the local parasite population change following the initiation of effective control interventions and subsequently as this control wanes. These quantities included MOI and census population size (MOI was estimated using the Bayesian formulation of the varcoding method, and the census population size was derived by summing MOIvar across individuals in the human population). In contrast, our current work focuses on a different goal: inferring FOI from MOI. We proposed two methods from queuing theory and illustrated them with MOI estimates obtained with the Bayesian formulation of the "varcoding" method. Although the method applied to estimate MOI is indeed the same as that of the paper mentioned by the reviewer, the proposed methods should be applicable to MOI estimates obtained in any other way, as stated in the Abstract of the previous manuscript. That is, the methods we present in the current paper are independent of the way the MOI estimation has been carried out. Our results are not about the MOI values themselves but rather an illustration of the methods for converting those MOI values to FOI. In fact, there are different ways to obtain MOI estimates for Plasmodium falciparum (9). The most common approach for determining MOI involves size-polymorphic antigenic markers, such as msp1, msp2, msp3, glurp, ama1, and csp. Similarly, microsatellites, also termed simple sequence repeats (SSRs), are another type of size-polymorphic marker that can be amplified to estimate MOI by determining the number of alleles detected. Combinations of genome-wide single nucleotide polymorphisms (SNPs) have also been used to estimate MOI.
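
      To make the MOI-to-FOI conversion concrete, the sketch below applies the generic form of Little's law, L = λW, treating the mean MOI as the mean number of "customers" in the system (L), the mean duration of infection as the mean time in the system (W), and FOI as the arrival rate (λ). This is only the textbook identity: the manuscript's estimator may include further corrections, and the function name, MOI values, and duration used here are hypothetical.

```python
def foi_littles_law(moi_values, mean_duration_days):
    """Arrival rate implied by Little's law, lambda = L / W.

    moi_values: MOI per sampled individual (L is their mean).
    mean_duration_days: assumed mean duration of one infection (W), in days.
    Returns infections per host per year.
    """
    mean_moi = sum(moi_values) / len(moi_values)
    rate_per_day = mean_moi / mean_duration_days
    return rate_per_day * 365.0


# Hypothetical MOI estimates and an assumed mean infection duration.
moi_estimates = [1, 2, 2, 3, 4, 1, 2]
print(foi_littles_law(moi_estimates, mean_duration_days=200.0))
```

      Because only the mean MOI enters the calculation, the same function applies whether the MOI values come from var genes, size-polymorphic markers, microsatellites, or SNP panels.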

      The Results section of the current manuscript begins by evaluating how different kinds of errors and sampling limitations affect the estimation of MOI using the Bayesian formulation of the varcoding method. Only that brief section, which is not the core or primary objective of the manuscript, could be considered an extension and an advancement related to the other paper. We considered the effect of these errors on the resulting estimates of FOI.

      We further note that, as the reviewer pointed out, the census population size is not utilized at all in our current work. We are unclear on why this quantity is mentioned here. Our previous paper has been revised and can be found in eLife as such. We have not changed this terminology and have provided a clear explanation for why we chose it. The reviewer seems to have read the previous response to version 1 posted on December 28, 2023 (note that version 2 and the associated response were posted on November 20, 2024). Regardless, this is not the place for a discussion on another paper on a quantity that is irrelevant to the current work being reviewed.

      We understand that the reviewer’s impression may have been influenced by the previous emphasis on the Bayesian formulation of the varcoding method in our manuscript. With the reorganization and rewriting of parts of the manuscript, we hope the revised version will clearly convey the central goal of our work.

      (2) Similar statements that could be toned down. 344 ".... two-moment approximation approach and Little's law are shown to provide consistent and good FOI estimates,.....", 374 "Thus, the flexibility and generality of these two proposed methods allow robust estimation of an important metric for malaria transmission"

      We thank the reviewer for this comment. We have modified the descriptive terms for the performance of our methods. Please also refer to our responses to Reviewer 1, Point (1) c and your previous Point (1).

      (3) Various assumptions seem to have been made which are not justified. For example, heterogeneous mixing is defined as 2/3rd of the population receives 90% of the bites. A reference for this would be good.

      In this work, we considered heterogeneous transmission arising from 2/3 of the population receiving approximately 94% of all bites, because we believe this distribution introduces a reasonable and sufficient amount of heterogeneity in exposure risk across individuals. We are not aware of field studies justifying this degree of heterogeneity.
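
      The degree of heterogeneity implied by these two numbers can be expressed as a per-capita ratio. The short calculation below is an illustration only; the function name is hypothetical, and it assumes exactly two groups with uniform exposure within each group.

```python
def per_capita_bite_ratio(high_pop_frac, high_bite_frac):
    """Per-capita biting rate of the high-risk group relative to the low-risk group."""
    high_rate = high_bite_frac / high_pop_frac
    low_rate = (1.0 - high_bite_frac) / (1.0 - high_pop_frac)
    return high_rate / low_rate


# 2/3 of the population receiving about 94% of all bites.
print(per_capita_bite_ratio(2 / 3, 0.94))
```

      With these inputs, an individual in the high-risk group is bitten roughly eight times as often as one in the low-risk group.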

      (4) The work assumes children under 5 have no immunity (Line 648 says "It is thus safe to consider negligible the impact of immune memory accumulated from previous infections on the duration of a current infection." ). Is there supporting evidence for this and what would happen if this wasn't the case?

      We thank the reviewer for this helpful comment. Please refer to our responses to Reviewer 1 Point (2) a.

      (5) Similarly, there are a few instances of a need for more copy-editing. The text says "We continue with the result of the heterogeneous exposure risk scenarios in which a high-risk group ( 2/3 of the total population) receives around 94% of all bites whereas a low-risk group ( 1/3 of the total population) receives the remaining bites (Appendix 1-Figure 5C)." whereas the referenced caption says "For example, heterogeneous mixing is defined as 2/3rd of population receives 90% of the bites."

      We believe there was a misinterpretation of the legend caption. In the referenced caption, we stated “2/3rd of population receives MORE THAN 90% of the bites”, which aligns with “around 94% of all bites”. Nonetheless, to maintain consistency in the revised manuscript, we have updated the description to uniformly state “approximately 94% of all bites” throughout.

      (6) The term "measurement error" is used to describe the missing potential under-sampling of var genes. Given this would only go one way isn't the term "bias" more appropriate?

      We understand that, in general English, “bias” might seem more precise for describing a deviation in one direction. However, in malaria epidemiology and in models for malaria and other infectious diseases, “measurement error” is a general term that describes deviations introduced in the process of measurement and sampling, which can confound or add noise to the true values being collected. This term is commonly used, and we have adhered to it in the revised manuscript.

      (7) Line 739 "Though FOI and EIR both reflect transmission intensity, the former refers directly to detectable blood-stage infections whereas the latter concerns human-vector contact rates." In my mind this is not true, the EIR is the number of potentially invading parasites (a contact rate between parasites in mosquitoes and humans if you will). The human-vector contact rate is the human biting rate.

      We thank the reviewer for this comment. We have clarified the definition regarding FOI and EIR in our response to your previous comment (3) and in the revised manuscript. We agree that the term “human-vector contact rates” was not precise enough for EIR. We intended “human-infectious vector contact rates”, and we have updated the text to reflect this change (Line 644-645).

      References and Notes

      (1) Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2) Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3) Andrade, C. M. et al. Infection length and host environment influence on Plasmodium falciparum dry season reservoir. EMBO Mol Med., 16(10):2349-2375 (2024).

      (4) Zhang, X. and Deitsch, K. W. The mystery of persistent, asymptomatic Plasmodium falciparum infections. Current Opinion in Microbiology, 70:102231 (2022).

      (5) Tran, T. M. et al. An Intensive Longitudinal Cohort Study of Malian Children and Adults Reveals No Evidence of Acquired Immunity to Plasmodium falciparum Infection, Clinical Infectious Diseases, 57(1):40–47 (2013).

      (6) Farnert, A., Snounou, G., Rooth, I., Bjorkman, A. Daily dynamics of Plasmodium falciparum subpopulations in asymptomatic children in a holoendemic area. Am J Trop Med Hyg., 56(5):538-47 (1997).

      (7) Read, A. F. and Taylor, L. H. The Ecology of Genetically Diverse Infections, Science, 292:1099-1102 (2001).

      (8) Sondo, P. et al. Genetically diverse Plasmodium falciparum infections, within-host competition and symptomatic malaria in humans. Sci Rep 9(127) (2019).

      (9) Labbe, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol, 19(1) (2023).

      (10) He, Q. et al. Networks of genetic similarity reveal non-neutral processes shape strain structure in Plasmodium falciparum. Nat Commun 9(1817) (2018).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents a useful modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution. It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      The prior literature the reviewers referred to concerns "modified WF models". In the original submission, we lumped the standard and modified WF models together as the "generalized WF models". Because this lumping caused confusion, the distinction between them is now made clear. That said, the Haldane model in our proposal is not a modification of the standard WF model because, conceptually, the two models are very different. WF is based on sampling whereas the Haldane model is based on gene transmission.

      While the "modified WF models" often incorporate V(K) [variance in progeny number], the modification is still based on the WF model of population sampling. The modification is mathematically feasible but biologically untenable, as explained explicitly in the revised text. Most important, all four paradoxes are as incompatible with the modified WF models as with the standard model. Note that the Haldane model does not have the sampling step, which is absorbed into the V(K) term. In the integrated WF-Haldane model, these paradoxes are resolved (see the new sections of Discussion, quoted below).

      If readers do not have time to ponder all four paradoxes, they may simply read the first one, as follows. When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and becomes stronger as N increases, especially when approaching the carrying capacity. Such common observations are exactly the opposite of the WF model's central prediction. Any model based on sampling cannot escape the constraint of "greater drift, smaller N".

      Revision - The following text is a reproduction of the last 7 paragraphs of Discussion.

      “The standard WF model has been extended in several directions (overlapping generations, multiple alleles, ploidy, etc.). The modification most relevant to our studies here is the introduction of V(K) into the model, thus permitting V(K) ≠ E(K). While the modifications are mathematically valid, they are often biologically untenable. Kimura and Crow (1963) may be the first to offer a biological mechanism for V(K) ≠ E(K), effectively imposing the Haldane model on the WF model. Other models (Kimura and Crow 1963; Lynch, et al. 1995; Sjodin, et al. 2005; Der, et al. 2011; Cannings 2016) indeed model mathematically the imposition of the branching process on the population, followed by the WF sampling. The constructions of such models are biologically dubious but, more importantly, still unable to resolve the paradoxes. It would seem more logical to use the Haldane model in the first place by having two parameters, E(K) and V(K). 

      Even if we permit V(K) ≠ E(K) under the WF sampling, the models would face other difficulties. For example, a field biologist needs to delineate a Mendelian population and determine its size, N or Ne. In all WF models, one cannot know what the actual population being studied is. Is it the fly population in an orchard being sampled, in the geographical region, or in the entire species range? It is unsatisfactory when a population biologist cannot identify the population being studied. The Haldane model is an individual-output model (Chen, et al. 2017), which does not require the delineation of a Mendelian population.

      We shall now review the paradoxes specifically in relation to the modified WF models, starting with the multi-copy gene systems such as viruses and rRNA genes covered in the companion study (Wang, et al. 2024). These systems evolve both within and between hosts. Given the small number of virions transmitted between hosts, drift is strong in both stages as shown by the Haldane model (Ruan, Luo, et al. 2021; Ruan, Wen, et al. 2021; Hou, et al. 2023). Therefore, it does not seem possible to have a single effective population size in the WF models to account for the genetic drift in two stages. The inability to deal with multi-copy gene systems may explain the difficulties in accounting for the SARS-CoV-2 evolution (Deng, et al. 2022; Pan, Liu, et al. 2022; Ruan, Wen, et al. 2022; Hou, et al. 2023; Ruan, et al. 2023).

      We now discuss the first paradox of this study, which is about the regulation of N. In the general WF models, N is imposed from outside of the model, rather than self-generating within the model. When N is increasing exponentially as in bacterial or yeast cultures, there is almost no drift when N is very low and drift becomes intense as N grows to near the carrying capacity. As far as we know, no modifications of the WF model can account for this phenomenon that is opposite of its central tenet. In the general WF models, N is really the carrying capacity, not population size. 

      The second paradox of sex chromosomes is rooted in V(K) ≠ E(K). As E(K) is the same between sexes but V(K) is different, clearly V(K) = E(K) would not be feasible. The mathematical solution of defining separate Ne's for males and females (Kimura and Crow 1963; Lynch, et al. 1995; Sjodin, et al. 2005; Der, et al. 2011; Cannings 2016) unfortunately obscures the interesting biology. As shown in Wang et al. (2024; MBE), the kurtosis of the distribution of K indicates the presence of super-breeder males. While the Haldane model can incorporate the kurtosis, the modified WF models are able to absorb only up to the variance term, i.e., the second moment of the distribution. The third paradox of genetic drift is manifested in the fixation probability of an advantageous mutation, 2s/V(K). As explained above, the fixation probability is determined by the probability of reaching a low threshold that is independent of N itself. Hence, the key parameter of drift in the WF model, N (or Ne), is missing. This paradox supports the assertion that genetic drift is fundamentally about V(K) with N being a scaling factor.

      As the domain of evolutionary biology expands, many new systems do not fit into the WF models, resulting in the lack of a genetic drift component in their evolutionary trajectories. Multi-copy gene systems are obvious examples. Others include domestications of animals and plants that are processes of rapid evolution  (Diamond 2002; Larson and Fuller 2014; Purugganan 2019; Chen, Yang, et al. 2022; Pan, Zhang, et al. 2022; Wang, et al. 2022). Due to the very large V(K) in domestication, drift must have played a large role. Somatic cell evolution is another example with “undefinable” genetic drift (Wu, et al. 2016; Chen, et al. 2017; Chen, et al. 2019; Ruan, et al. 2020; Chen, Wu, et al. 2022). The Haldane (or WFH) model, as an "individual output" model, can handle these general cases of genetic drift.

      The Haldane model and the WF model are fundamentally different approaches to random forces of evolution. While the WF models encounter many biological contradictions, they have provided approximate mathematical solutions to more realistic scenarios. In systems such as in viral evolution (Ruan, Hou, et al. 2022; Hou, et al. 2023) or somatic cell evolution (Chen, Wu, et al. 2022; Zhai, et al. 2022) whereby the WF solution is absent, further development of the WFH model will be necessary.”

      In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims.

      This point is addressed in the responses to reviewers' comments. Since they are quite technical, they do not fit in the overview here.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a theoretical treatment of what they term the "Wright-Fisher-Haldane" model, a claimed modification of the standard model of genetic drift that accounts for variability in offspring number, and argue that it resolves a number of paradoxes in molecular evolution. Ultimately, I found this manuscript quite strange.

      The notion of effective population size as inversely related to the variance in offspring number is well known in the literature, and not exclusive to Haldane's branching process treatment. However, I found the authors' point about variance in offspring changing over the course of, e.g. exponential growth fairly interesting, and I'm not sure I'd seen that pointed out before.

      Weaknesses:

      I have several outstanding issues. First of all, the authors really do not engage with the literature regarding different notions of an effective population. Most strikingly, the authors don't talk about Cannings models at all, which are a broad class of models with non-Poisson offspring distributions that nonetheless converge to the standard Wright-Fisher diffusion under many circumstances, and to "jumpy" diffusions/coalescents otherwise (see e.g. Mohle 1998, Sagitov (2003), Der et al (2011), etc.). Moreover, there is extensive literature on effective population sizes in populations whose sizes vary with time, such as Sano et al (2004) and Sjodin et al (2005).

      Of course in many cases here the discussion is under neutrality, but it seems like the authors really need to engage with this literature more.

      The reviewer's summary and weakness statements reflect the general criticism summarized by the editors. Our reply and the revisions addressing these criticisms are presented in the long reply to the eLife Assessment above.

      We hence re-emphasize only the key points here.

      (1) The literature that the reviewers fault us for not citing is about the modifications of the standard WF model. We now cite them as well as a few others in that vein. However, the WF-Haldane model we propose is conceptually very different from the modified WF models. This WFH model is in essence the Haldane model which may use the results of the WF models as the starting point to find the exact solutions.

      (2) The check of the power of the modified WF models is whether they can resolve the paradoxes. None of them can. The arguments apply to neutral cases as well as selection effects. Hence, our central point is that the modifications of the standard WF model [e.g., by incorporating V(K)] do not help the WF model in resolving the paradoxes.  Besides, the incorporation of V(K) is mathematically feasible but biologically untenable as presented in the new sections of Discussion.

      Nonetheless, I don't think the authors' modeling, simulations, or empirical data analysis are sufficient to justify their claims.

      The most interesting part of the manuscript, I think, is the discussion of the Density Dependent Haldane model (DDH). However, I feel like I did not fully understand some of the derivation presented in this section, …… - this is the whole notion of exchangeability, also neglected in this manuscript). As such, I don't believe that their analysis of the empirical data supports their claim. [Since the comments above are highly technical and fairly long, they are not copied verbatim.]

      We thank this reviewer for the detailed comments with respect to the potential confusion in the discussion of the Density Dependent Haldane (DDH) model.

      First, the reviewer appears to ask how Eqs (5-6) are derived. We should clarify that both Eq (5) and Eq (6) are assumptions based on population ecology rather than derived results. Eq (7) is then derived by substituting the assumptions in Eq (5) and (6) into Eq (3).

      The definition in Equation (5) allows the growth rate of the population size to be dependent on N itself, such that growth rate E(K) (average offspring number per generation) is greater than 1 when N < Ck and less than 1 when N > Ck. The parameter z is introduced to adjust the sensitivity of E(K) to changes in population size (as shown in Fig. 3a).
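
      The qualitative behaviour described for Eq. (5) can be illustrated with a stand-in function. The Ricker-type form below is an assumption chosen only because it satisfies the stated properties (E(K) > 1 below Ck, E(K) < 1 above Ck, with z controlling the sensitivity); it is not the manuscript's Eq. (5), and the function names and parameter values are hypothetical.

```python
import math


def expected_offspring(n, carrying_capacity, z):
    """Hypothetical density-dependent E(K): a Ricker-type stand-in for Eq. (5)."""
    return math.exp(z * (1.0 - n / carrying_capacity))


def trajectory(n0, carrying_capacity, z, generations):
    """Deterministic population sizes, N(t+1) = N(t) * E(K | N(t))."""
    sizes = [float(n0)]
    for _ in range(generations):
        n = sizes[-1]
        sizes.append(n * expected_offspring(n, carrying_capacity, z))
    return sizes


# Larger z gives a steeper dependence of E(K) on N, i.e. stronger regulation.
for z in (0.1, 0.5):
    sizes = trajectory(n0=10, carrying_capacity=1000, z=z, generations=100)
    print(z, round(sizes[10], 1), round(sizes[-1], 1))
```

      The point of the sketch is only that N is generated by the model through E(K), rather than being supplied from outside as in the WF models.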

      Second, we appreciate the comments regarding the use of individual-based simulations and the apparent lack of interaction between individuals. In our simulations, there is indeed an interaction among individuals, which is represented by Eq (5). This equation reflects how the competition between two alleles affects the expected growth rate E(K), which decreases as the population size increases. Furthermore, once E(K) for the entire population is determined, the offspring numbers of the alleles are independent.

      We believe that the primary purpose of our simulations was not clearly stated. This lack of clarity may be the root of the criticisms. We now note that the simulations are aimed at testing the accuracy of Equation (10).

      Note that Eq. (10) is a textbook result and quite important in our study. This equation shows that the strength of genetic drift, as given by Pf (the fixation probability of an advantageous mutation), is not a function of N at all. This approximate solution has been obtained using the WF model by Kimura.  The Haldane model solution that can explain Paradox 1 is based on Equation (7) as shown below

      Since the fixation probability of Equation (10) cannot be easily obtained using Eq. (7), we conducted simulations to confirm the accuracy of Eq. (10) when applied to the Haldane model.
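
      A simple numerical check of the 2s/V(K) approximation is possible in the special case of a branching process with Poisson-distributed offspring numbers, where the survival probability of a new mutant lineage can be computed without simulation. This is a sketch under that assumption only: the Poisson offspring distribution, the function name, and the value of s are choices made for this example and are not the simulation design of the manuscript, which uses a negative binomial K.

```python
import math


def survival_probability_poisson(mean_offspring, iterations=5000):
    """Survival probability of a branching process with Poisson offspring.

    The extinction probability q is the smallest root of q = exp(m * (q - 1)),
    found here by fixed-point iteration starting from q = 0.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(mean_offspring * (q - 1.0))
    return 1.0 - q


s = 0.02                # hypothetical selective advantage
mean_k = 1.0 + s        # E(K) of the advantageous allele
var_k = mean_k          # Poisson: V(K) equals E(K)

exact = survival_probability_poisson(mean_k)
approx = 2 * s / var_k  # the 2s/V(K) approximation

print("branching-process survival:", exact)
print("2s/V(K):", approx)
```

      Neither quantity depends on N, which is the property the response emphasizes; repeating the calculation with an offspring distribution that has larger V(K) would be the natural next step.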

      We have revised the relevant sections of the manuscript to clarify these points and to better distinguish between assumptions and results. 

      Revision - Details of the DDH model are given in the Supplementary Information. A synopsis is given here: We consider a non-overlapping haploid population with two neutral alleles. The population size at time t is Nt. We assume that expected growth rate E(K) is greater than 1 when N < Ck and less than 1 when N > Ck, as defined by Eq. (5) below:

      The slope of E(K) vs. N (i.e., the sensitivity of the growth rate to changes in population size), as shown in Fig 3a, depends on z. To determine the variance V(K), we assume that K follows the negative binomial distribution whereby parents would suffer reproduction-arresting injury with a probability of pt at each birthing (Supplementary Information). Accordingly, V(K) can then be expressed as

      By Eq. (6), the ratio of V(K)/E(K) could be constant, decrease or increase with the increase of population size. With E(K) and V(K) defined, we could obtain the effective population size by substituting Eq. (5) and Eq. (6) into Eq. (3).

      Eq. (7) presents the relationship between effective population size (Ne) and the population size (N) as shown in Fig. 3. The density-dependent E(K) could regulate N with different strength (Fig. 3a). The steeper the slope in Fig. 3a, the stronger the regulation.

      Simulation of genetic drift in the Haldane model and the Wright-Fisher (WF) model. In both models, interactions between individuals are implicitly included through the dependency of the average number of offspring on population size, as defined by Eq. (5). This dependency leads to the logistic population growth, reflecting the density-dependent interactions.

      Thus, while I think there are some interesting ideas in this manuscript, I believe it has some fundamental issues:

      first, it fails to engage thoroughly with the literature on a very important topic that has been studied extensively. Second, I do not believe their simulations are appropriate to show what they want to show. And finally, I don't think their empirical analysis shows what they want to show.

      References omitted

      The comments are the summary of previous ones, which have been addressed in detail in the preceding sections.

      Reviewer #2 (Public Review):

      Summary:

      This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size [Ne]

      Thanks.  The issue of Ne will be addressed below where the reviewer returns to this issue. The strength of the integrated WFH model is that N (or Ne) is generated by the model itself, rather than externally imposed as in WF models.

      Strengths:

      The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems.

      The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species.

      Thanks. 

      Weaknesses:

      One way to define effective population size is by the inverse of the coalescent rate. This is where the geometric mean of Ne comes from. If Ne is defined this way, many of the paradoxes mentioned seem to resolve naturally. If we take this approach, one could easily show that a large N population can still have a low coalescent rate depending on the reproduction model. However, the authors did not discuss Ne in light of the coalescent theory. This is surprising given that Eldon and Wakeley's 2006 paper is cited in the introduction, and the multiple mergers coalescent was introduced to explain the discrepancy between census size and effective population size, superspreaders, and reproduction variance - that said, there is no explicit discussion or introduction of the multiple mergers coalescent.

      The Haldane model treats N’s very differently from the WF models. In the WF models, N’s are imposed externally (say, constant N, exponentially growing N, temporally fluctuating N’s and so on; all provided from outside of the model). Ne and coalescence are all derived from these given N’s. In order to account for the first paradox (see the next paragraph), N needs to be regulated but the WF models cannot regulate N’s. The density-dependent Haldane model that Reviewer 1 inquired about above is a model that regulates N internally. It can thus account for the paradox.

      Paradox 1 - When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and is much stronger as N increases, especially when approaching the carrying capacity. Such a pattern is a common observation and is exactly the opposite of the WF model's central prediction. In short, a model that does not regulate N cannot explain the paradox.

      Ne is a fix of the WF model in order to account for the missing components of genetic drift. The paradoxes presented in this one and the companion study show that the fix is rather inadequate.  In contrast, by the WFH model, N is regulated within the model itself as E(K) and V(K) are both functions of N.

      The Wright-Fisher model is often treated as a special case of the Cannings 1974 model, which incorporates the variance in reproductive success. This model should be discussed. It is unclear to me whether the results here have to be explained by the newly introduced WFH model, or could have been explained by the existing Cannings model. The abstract makes it difficult to discern the main focus of the paper. It spends most of the space introducing "paradoxes".

      We greatly appreciate the illuminating advice. Nevertheless, we should explain, or should have explained, more clearly that the four paradoxes presented are central to this pair of eLife papers. The WF and Haldane models are very different conceptual ideas altogether. The choice should not be based on mathematical grounds but on how they help us understand biological evolution. We are using four paradoxes to highlight the differences. We have said in the papers that the origin and evolution of COVID-19 caused a lot of confusion partly because the WF models cannot handle multi-copy gene systems, including viruses that evolve both within and between hosts.

      The standard Wright-Fisher model makes several assumptions, including hermaphroditism, non-overlapping generations, random mating, and no selection. It will be more helpful to clarify which assumptions are being violated in each tested scenario, as V(K) is often not the only assumption being violated. For example, the logistic growth model assumes no cell death at the exponential growth phase, so it also violates the assumption about non-overlapping generations.

      We appreciate the question, which has two aspects. First, why do we think the WF models are insufficient? After all, for each assumption of the WF model (as given in the reviewer’s examples), there is often a solution by modifying Ne which relaxes the assumption. In this sense, there is only one grand assumption made by the WF models. That is, however complex the biology is, it is possible to find an Ne that can make the WF model work. Our argument is that Ne is a cumbersome fix of the WF model and it does not work in many situations. That is how we replied about the importance of the paradoxes above. We shall again use the first paradox as an example: because drift is stronger as N becomes larger, the fix has to make Ne negatively correlated with N. In reality, it does not appear possible to resolve this paradox in this way. Another paradox is the evolution of multi-copy gene systems. In short, it seems clear that Ne is not a useful or usable fix.

      The second aspect is: “why, among the many modifications the WF models make, do we only emphasize the inclusion of V(K)?” This is the essence of our two papers. Although V(K) is a modification of the WF models, it does not enable the WF models to resolve the paradoxes. In contrast, the Haldane model incorporates E(K) and V(K) in the model itself. In presenting paradox 3, it was stated that

      This equation shows that the strength of genetic drift, as reflected in Pf (the fixation probability of an advantageous mutation), is not a function of N at all. It supports the view that the essence of genetic drift is V(K) with N as a scaling factor. Note that, if V(K) = 0, there is no genetic drift regardless of N. As V(K) is not an add-on to the Haldane model (unlike in WF models), the Haldane model can resolve the paradoxes.
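
      For reference, the relation under discussion appears elsewhere in this response as 2s/V(K). A minimal restatement, assuming that s denotes the selective advantage of the mutation (a symbol not defined in this response) and that the relation is an approximation for small s, is:

```latex
P_f \approx \frac{2s}{V(K)}
```

      Written this way, N does not appear on the right-hand side, and a smaller V(K) gives a larger P_f, which is consistent with weaker drift.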

      The theory and data regarding sex chromosomes do not align. The fact that \hat{alpha'} can be negative does not make sense. The authors claim that a negative \hat{alpha'} is equivalent to infinity, but why is that? It is also unclear how theta is defined. It seems to me that one should take the first principle approach e.g., define theta as pairwise genetic diversity, and start with deriving the expected pair-wise coalescence time under the MMC model, rather than starting with assuming theta = 4Neu. Overall, the theory in this section is not well supported by the data, and the explanation is insufficient.

      α' can be negative for the same reason that α (the male/female ratio in mutation rate) can be negative (Miyata, et al. 1987; Li, et al. 2002; Makova and Li 2002). Clearly, this has not been a problem in the large literature in which α becomes negative. In fact, in many reports, the estimate of α is negative, which is read as α approaching infinity. Imagine that our equation is α'^2 = 0.25; then α' can be 0.5 or -0.5, although the latter solution is not biologically meaningful.

      As for theta, the reviewer asked why we do not use the pairwise genetic diversity (or theta[pi]) as the first-principle approach to estimating theta. While theta(pi) is the first estimator of theta used, the general principle is that every bin of the frequency spectrum can be used for estimating theta, since the expected value is theta/i where i is the occurrence of the mutation in the sample. (If the sample size is 100, then i is between 1 and 99.) Hence, the issue is which part of the spectrum has the best statistical properties for the questions at hand. The pairwise measure is theta(pi) [which the reviewer recommends]. While theta(pi) and theta(w) are most commonly used, there are in fact numerous ways to estimate theta. (Fu (2022) presents an excellent review.) For our purpose, we need a theta estimate least affected by selection, and we choose the lowest frequency bin of the spectrum, which is theta(1) based on the singletons. Theta(1), least affected by selection, is the basis of the Fu and Li test.
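
      The statement that every frequency bin yields an estimate of theta can be illustrated with a short sketch. It assumes an unfolded site frequency spectrum in which `sfs[i-1]` counts the sites whose derived allele occurs i times in a sample of n sequences, so that the expected count is theta/i. The function name and the input layout are illustrative and are not taken from the papers; theta(1) is taken here simply as the singleton count.

```python
from math import comb


def theta_estimates(sfs):
    """Return three theta estimates from an unfolded site frequency spectrum.

    sfs[i-1] is the number of sites whose derived allele is seen i times,
    for i = 1 .. n-1, where n is the sample size (assumed layout).
    """
    n = len(sfs) + 1
    if n < 2:
        raise ValueError("need at least two sampled sequences")

    # Watterson's estimator: segregating sites divided by the harmonic sum.
    segregating_sites = sum(sfs)
    harmonic_sum = sum(1.0 / i for i in range(1, n))
    theta_w = segregating_sites / harmonic_sum

    # Pairwise diversity: a site with i derived copies differs in i*(n-i) pairs.
    pairwise_differences = sum(i * (n - i) * count
                               for i, count in enumerate(sfs, start=1))
    theta_pi = pairwise_differences / comb(n, 2)

    # Singleton-based estimate: the expected singleton count is theta/1.
    theta_1 = float(sfs[0])

    return {"theta_w": theta_w, "theta_pi": theta_pi, "theta_1": theta_1}
```

      With a spectrum that follows theta/i exactly, for example `[12, 6, 4, 3]` for n = 5, all three estimators return the same value; they diverge when some bins are enriched or depleted.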

      Reviewer #3 (Public Review):

      Summary:

      Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": (1) how Ne depends on N might depend on population dynamics; (2) how Ne is different on the X chromosome, the Y chromosome, and the autosomes, and these differences do match the expectations based on simple counts of the number of chromosomes in the populations; (3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data.

      Strengths:

      (1) The theoretical results are well-described and easy to follow.

      (2) The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm.

      (3) The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind.

      (4) I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size.

      Thanks.

      Weaknesses:

      (1) I am not convinced that these types of effects cannot just be absorbed into some time-varying Ne and still be well-modeled by the Wright-Fisher process.

      Please allow us to refer, again, to two of the four paradoxes. We believe that no modification of the WF model can resolve the paradoxes.

      (1) When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and is much stronger as N increases, especially when approaching the carrying capacity. Such common observations are exactly the opposite of the WF model's key prediction. It is not possible for a model that does not regulate N to explain the paradox.

      (2) There is no way the WF models can formulate Ne for, say, viruses or ribosomal RNA genes that have two levels of populations: the within-host populations as well as the host population itself.

      The fact that there are numerous Ne's suggests that Ne is a collection of cumbersome fixes of the WF model. By the WF-Haldane model, all factors are absorbed into V(K) resulting in a simpler model in the end. V(K) is often a measurable quantity. Note that, even if V(K) is incorporated into the WF model, the paradoxes remain unresolvable.

      (2) Along these lines, there is well-established literature showing that a broad class of processes (a large subset of Cannings' Exchangeable Models) converge to the Wright-Fisher diffusion, even those with non-Poissonian offspring distributions (e.g., Mohle and Sagitov 2001). E.g., equation (4) in Mohle and Sagitov 2001 shows that in such cases the "coalescent Ne" should be (N-1) / Var(K), essentially matching equation (3) in the present paper.

      The criticism of a lack of engagement with well-established literature has been responded to extensively above. Briefly, the literature is about modifications of the WF model which share the same feature of population sampling. With that feature, the paradoxes are unresolvable. For example, however Ne is defined, the fixation probability of an advantageous mutation does not depend on N or Ne. This is the third paradox of the WF models.
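
      The expression quoted by the reviewer, (N-1) / Var(K), can be evaluated directly from a list of offspring numbers. The sketch below is only an illustration of that formula: the function name is hypothetical, and using the population variance of one generation's realized offspring counts as a stand-in for Var(K) is an assumption made here, not a statement about how either paper estimates it.

```python
from statistics import pvariance


def coalescent_ne(offspring_counts):
    """Evaluate (N - 1) / Var(K) for one generation of offspring counts.

    offspring_counts[i] is the number of offspring left by parent i.
    The counts must sum to N, reflecting a constant population size.
    """
    n = len(offspring_counts)
    if sum(offspring_counts) != n:
        raise ValueError("offspring counts must sum to N (constant population size)")

    # Assumption: Var(K) is approximated by the population variance of the counts.
    var_k = pvariance(offspring_counts)
    if var_k == 0:
        raise ValueError("Var(K) is zero, so (N - 1) / Var(K) is undefined")

    return (n - 1) / var_k
```

      For example, `coalescent_ne([0, 2, 0, 2])` has N = 4 and Var(K) = 1, whereas the more skewed `[0, 0, 0, 4]` has Var(K) = 3 and a correspondingly smaller value.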

      (3) Beyond this, I would imagine that branching processes with heavy-tailed offspring distributions could result in deviations that are not well captured by the authors' WFH model. In this case, the processes are known to converge (backward-in-time) to Lambda or Xi coalescents (e.g., Eldon and Wakely 2006 or again in Mohle and Sagitov 2001 and subsequent papers), which have well-defined forward-in-time processes.

      We admire the learned understanding of the literature expressed by the reviewer, who raises two points. First, our model may not be able to handle the heavy-tailed progeny distribution (i.e., the kurtosis of the distribution of K). Second, the Xi coalescent models (cited above) can do that. Below are our clarifications.

      First, the WFH model is based on the general distribution of K, which includes flexible and realistic representations of offspring number distributions. In fact, we have used various forms of K distribution in our publications on the evolution of SARS-CoV-2 (see the Ruan et al. publications in the bibliography). The power-law distribution is particularly useful, as the K-distribution in viral transmission is highly kurtotic. This is reflected in the super-spreader hypothesis. In short, the branching process on which the WFH model is based is mainly about the distribution of K. Nevertheless, the variance V(K) can often yield good approximations when the kurtosis is modest.

      Second, we would like to comment on the models of Eldon and Wakely 2006 or Mohle and Sagitov 2001 and subsequent papers. These papers are based on the Moran model by considering a highly skewed distribution of offspring numbers. Fundamentally, the Moran models generally behave like WF models (standard or modified) and hence have the same problems with the paradoxes that are central to our studies. In fact, the reservations about introducing V(K) into the WF models apply as well to the Moran models. The introduction of V(K) is mathematically valid but biologically untenable. Essentially, the WF models incorporate the Haldane model as a first step in the generation transition. The introduction of V(K) into the Moran model is even less biologically sensible. Furthermore, the model allows K to take only three discrete values: 0, 2, and Nψ (see Eq. (7) in Eldon and Wakely). Their model also assumes a constant population size, which contrasts with our model's flexibility in handling varying population sizes and more complex distributions for K.

      In short, the modifications of the WF (and Moran) models are unnecessarily complicated, biologically untenable but still fail to account for the paradoxes. The WFH model can rectify these problems. 

      (4) These results that Ne in the Wright-Fisher process might not be related to N in any straightforward (or even one-to-one) way are well-known (e.g., Neher and Hallatschek 2012; Spence, Kamm, and Song 2016; Matuszewski, Hildebrandt, Achaz, and Jensen 2018; Rice, Novembre, and Desai 2018; the work of Lounès Chikhi on how Ne can be affected by population structure; etc...)

      The reviewer is correct in pointing out the inexact correlation between N and Ne. Nevertheless, it should still be true that the WF models predict qualitatively weaker drift as N increases. The first paradox is as stated:

      When the population size (N) is growing exponentially, such as in a bacterial culture, drift is nearly absent when N is small and is much stronger as N increases, especially when approaching the carrying capacity. Such common observations are exactly the opposite of the WF model's key prediction.

      (5) I was also missing some discussion of the relationship between the branching process and the Wright-Fisher model (or more generally Cannings' Exchangeable Models) when conditioning on the total population size. In particular, if the offspring distribution is Poisson, then conditioned on the total population size, the branching process is identical to the Wright-Fisher model.

      We thank the reviewer for this important comment. The main difference is that N is imposed from outside the WF models but can be generated from within the Haldane model (see the density-dependent Haldane model). In nature, N of the next generation is the sum of K’s among members of the population. This is how the Haldane model determines N(t+1) from N(t). In the WF models, N is imposed from outside the model and hence the given N determines the distribution of K. For this reason, N regulation is not possible in the WF models, thus resulting in the paradoxes.
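
      The rule that N(t+1) is the sum of the K’s of the N(t) current members can be sketched in a few lines. This is only a schematic of the bookkeeping, not the density-dependent Haldane model of the papers: the function names are hypothetical, and `draw_k` is a caller-supplied rule that receives the current N, so that the distribution of K may depend on N.

```python
def next_generation_size(n_t, draw_k):
    """Return N(t+1) as the sum of offspring numbers K over N(t) individuals.

    draw_k(n_t) returns one individual's offspring number; it is given the
    current population size so that K can depend on N.
    """
    return sum(draw_k(n_t) for _ in range(n_t))


def simulate_sizes(n0, generations, draw_k):
    """Return the list [N(0), N(1), ..., N(generations)]."""
    sizes = [n0]
    for _ in range(generations):
        sizes.append(next_generation_size(sizes[-1], draw_k))
    return sizes
```

      With a deterministic rule such as `lambda n: 2 if n < 10 else 1`, the population doubles while it is small and then holds steady, so N is set from within the model instead of being supplied from outside. A stochastic `draw_k` (for instance one that samples from a chosen offspring distribution) would add the variance V(K) that the response identifies with drift.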

      (6) In the discussion, it is claimed that the last glacial maximum could have caused the bottleneck observed in human populations currently residing outside of Africa. Compelling evidence has been amassed that this bottleneck is due to serial founder events associated with the out-of-Africa migration (see e.g., Henn, Cavalli-Sforza, and Feldman 2012 for an older review - subsequent work has only strengthened this view). For me, a more compelling example of changes in carrying capacity would be the advent of agriculture ~11kya and other more recent technological advances.

      We thank the reviewer and have used this more convincing case as suggested by the reviewer.

      Recommendations for the authors:

      General replies - We thank the editors and reviewers again. The points below are re-iterations of the comments received above and have since been replied to in detail. Specific instructions about wording and notations have also been followed. Again, we are grateful for the inputs from which we learned a great deal.

      Reviewing Editor Comments:

      The reviewers recognize the value of this model and some of the findings, particularly results from the density-dependent Haldane model. However, they expressed considerable concerns with the model and overall framing of this manuscript.

      First, all reviewers pointed out that the manuscript does not sufficiently engage with the extensive literature on various models of effective population size and genetic drift, notably lacking discussion on Cannings models and related works.

      We have addressed this issue in the beginning of Introduction and Discussion, pointing to the long section in the new second half of Discussion. The essence is that the literature is all about the modified WF models. The WF-Haldane model is conceptually and operationally distinct from the WF models, either standard or modified ones.

      Second, there is a disproportionate discussion on the paradoxes, yet some of the paradoxes might already be resolved within current theoretical frameworks. All three reviewers found the modeling and simulation of the yeast growth experiment hard to follow or lacking justification for certain choices. The analysis approach of sex chromosomes is also questioned.

      This criticism is addressed together with the next one as they make the same point.

      The reviewers recommend a more thorough review of relevant prior literature to better contextualize their findings. The authors need to clarify and/or modify their derivations and simulations of the yeast growth experiment to address the identified caveats and ensure robustness. Additionally, the empirical analysis of the sex chromosome should be revisited, considering alternative scenarios rather than relying solely on the MSE, which only provides a superficial solution. Furthermore, the manuscript's overall framing should be adjusted to emphasize the conclusions drawn from the WFH model, rather than focusing on the "unresolved paradoxes", as some of these may be more readily explained by existing frameworks. Please see the reviewers' overall assessment and specific comments.

      Many thanks. We have carefully reframed and presented the WF-Haldane model to make it clear and logically consistent. Whether a new model (i.e., the WF-Haldane model) deserves to be introduced depends on whether it makes any contribution to understanding nature. That is why we emphasize the four paradoxes.

      A most important disagreement between the reviewers and the authors is about the nature of the paradoxes. While the reviewers suggest that they "may" be resolvable by the conventional WF model (standard or modified), they did not offer the possible resolutions.  To use the analogy in our provisional response: the WF vs. Haldane models are compared to gas cars vs electric vehicles.  We can say confidently that the internal combustion engine cannot resolve the conflicting demands of transportation and zero emission. Its design has limited its capability. 

      Reviewer #2 (Recommendations For The Authors):

      Many thanks.  We have incorporated all these suggestions.  When the incorporation is not straightforward, we have carefully revised the text to minimize mis-communications.

      In the introduction -- "Genetic drift is simply V(K)" -- this is a very strong statement. You can say it is inversely proportional to V(K), but drift is often defined based on changes in allele frequency.

      We have changed the word “simply” to “essentially”. This wording is supported by the fixation probability of advantageous mutations, 2s/V(K). We have shown in the text that N does not matter here because the fixation is nearly deterministic when the copy number reaches, say, 100, regardless of whether N is 10^4 or 10^8.

      Page 3 line 86. "sexes is a sufficient explanation."--> "sex could be a sufficient explanation"

      The strongest line of new results is about 2s/V(K). Perhaps, the paper could put more emphasis on this part and demonstrate the generality of this result with a different example.

      The math notations in the supplement are not intuitive. e.g., using i_k and j_k as probabilities. I also recommend using E[X] and V[X]for expectation and variance rather than \italic{E(X)} to improve the readability of many equations.

      Thank you for your careful reading. Regarding the use of i_k and j_k as probabilities, we initially considered using 𝑝 or 𝑞 to represent probabilities. However, since 𝑝 and 𝑞 are already used in the main text, we opted for 𝑖 and 𝑗 to avoid potential confusion. As for your recommendation to use E[X] and V[X] for expectation and variance, we would like to clarify that we follow the standard practice of italicizing these symbols to represent variables.

      Eq. A6, A7: While I manage to follow, P_{10}(t) and P_{10} are not defined anywhere in the text.

      Supplement page 7, the term "probability of fixation" is confusing in a branching model.

      Thank you for your observation. We have carefully revised the supplement to provide clarity on these points.

      Revision - In population genetics, the fixation of the M allele means that the population consists entirely of the M allele, with no W alleles remaining. We define the fixation probability of the M allele by generation t as follows:

      Given that M and W allele reproduce independently, this can be factored as:

      As t approaches infinity, the ultimate fixation probability of M allele can be derived as follows:

      Eq. A28. It is unclear whether Eq. A1 could be used here directly. Some justification would be nice.

      We appreciate your careful review, and we will ensure this connection between the two equations is made clearer in the supplement. 

      Revision - We would like to clarify that Eq. (A1) and Eq. (A28) are essentially the same, with the only difference being the subscript 𝑡, which indicates the time dependence in the dynamic process.

      Supplement page 17. "the biological meaning of negative..". There is no clear justification for this claim. As a reader, I don't have any intuition as to why that is the case.

      Thank you for raising this concern. We have addressed this issue earlier.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study has uncovered some important initial findings about cellular responses to aneuploidy through analysis of gene expression in a set of donated human embryos. While the study's findings are in general solid, some experiments lack statistical power due to small sample sizes. The authors should try to get much more insight with their data highlighting the novel findings.

      We thank the editor for considering our manuscript for publication in eLife, and for the helpful and thorough reviews of our work. Based on the suggestions of the reviewers, we have carried out additional experiments, expanded the sample size and reanalyzed the data. This has resulted in a thoroughly revised manuscript and much improved work, which we are convinced meets the requirements to be published as a version of record. Of note, the experiments for the revision required the support of two additional researchers from our lab, who are now coauthors.

      These are the main changes made to the initial manuscript:

      (1) The RNA-seq data (Figures 1+2) have now been FDR-corrected and reanalyzed. This has not affected the initial observations on the activation of p53 and apoptosis in aneuploid human embryos, or the finding that the transcriptomic changes are driven by gene dosage effects.

      (2) We have included the transcriptome analysis of reversine-treated embryos in the supplementary data.

      (3) For validation of novel findings, such as the presence of DNA damage and the expression of DRAM1 in aneuploid embryos, we now include the stainings of 30 human blastocysts (Figure 3o-t). We found an absence of DNA damage in aneuploid embryos and that DRAM1 is increased in the TE but not the ICM of aneuploid embryos.

      (4) We re-analyzed the co-expression of CASP8/HSP70 in reversine-embryos as suggested by reviewer 1 and found that both proteins tend to be co-expressed. 

      (5) We have added a new analysis of NANOG expression (Figure 4a,b) of the embryos used in Figure 3o-t and have found retention of NANOG protein in both the TE and ICM.

      (6) We have added 6 euploid and 4 aneuploid embryos to Figure 4l-s, which support the conclusions on the absence of autophagy activation in the ICM and failure of PrE formation in aneuploid embryos.

      (7) We have significantly changed the layout of the figures, revised the supplementary tables, added source data files and rewritten the discussion.

      Regarding the sample size of the study, it is important to emphasize that human embryos are ethically sensitive material and that those with the specific genetic content we used in this study are rare, limiting our ability to expand the sample size. For the revision, we have added 40 human blastocysts to our initial 85 embryos. Compared to similar and high-quality studies using human embryos, our study has a relatively large sample size (n=125): Victor et al. 2021: 30 human blastocysts for immunostainings (1); Martin et al. 2023: 14 human blastocysts (2); Martin et al. 2024: 64 human blastocysts (3); Domingo-Muelas et al. 2023: 23 human blastocysts (4).

      Public Reviews:

      Reviewer #1 (Public Review):

      This study investigated an important question in human reproduction: why most fully aneuploid embryos are incompatible with normal fetal development. Specifically, the authors investigated the cellular responses to aneuploidy through analysis of gene expression in a set of donated human blastocysts. The samples included uniform aneuploid embryos of meiotic origin and mosaic aneuploid embryos from the SAC inhibitor reversine treatment. The authors relied mainly on low-input RNA sequencing and immunofluorescence staining. Pathway analysis with RNA-seq data of trophectoderm cells suggested activation of p53 and possibly apoptosis, and this cellular signature appeared to be stronger in TE cells with a higher degree of aneuploidy. Immunostaining also found some evidence of apoptosis, increased expression of HSP70 and autophagy in some aneuploid cells. With combinational OCT4 and GATA4 as lineage markers, it appeared that aneuploidy could alter the second lineage segregation and primitive endoderm formation in particular.

      Although this study is largely descriptive, it generated valuable RNA-seq data from a set of aneuploid TE cells with known karyotypes. Immunostaining results in general were consistent with findings in mouse embryos and human gastruloids.

      We thank the reviewer for the thorough evaluation of our manuscript. We have implemented most of the suggestions, which have further strengthened the original findings.

      While there is a scarcity of human embryo materials for research, the lack of single cell level data limits further extension of the presented data on the consequences of mosaic embryos.  

      We did not include single-cell RNA-seq data of mosaic human embryos in our study because we focused on embryos diagnosed with complex meiotic abnormalities. Our hypothesis was that the cellular consequences of aneuploidy would be strongest and easiest to identify in this type of aneuploidy, and would allow us to provide a basis for the mechanisms of elimination of aneuploid cells in human embryos. In the manuscript (lines 596-626) we acknowledge the limitations of the extrapolation of our results to mosaic embryos.

      A major concern is that the gene list used for pathway analysis is not FDR controlled. It is also unclear how the many plots generated with the "supervised approach" were actually performed. 

      We agree with the concern that our list of differentially expressed genes was ranked by p-value rather than FDR-controlled. We followed the suggestion of the reviewer, revised the RNA-seq analysis and focused primarily on pathway analysis. We have also added the comparison between aneuploid and reversine-treated embryos to the supplementary data and expanded the analysis of high-dosage and low-dosage embryos. Importantly, the new analysis has not changed the original finding that aneuploid embryos show hallmarks of p53 activation and apoptosis, and that these effects are gene dosage dependent. The manuscript now includes two completely revised figures, Figures 1 and 2.

      Since we discarded the data generated from our previous approach, we no longer use the term "supervised approach".

      The authors also appear to have ignored the possibility that high-dosage group could have a higher mitotic defect.

      This is indeed a possibility. In the discussion (lines 504-508) we have now incorporated the notion that the high dosage embryos could have higher mitotic defects, although our data cannot provide any evidence for this. Of note, the gene expression data shows that all aneuploid embryos (including low dosage and reversine embryos) equally show an enrichment for mitotic spindle pathway genes.

      Assuming a fully aneuploid embryo, why do only some cells display p53 and autophagy marker? 

      This is a very good question, on which we can only speculate, but the answer likely lies in the diversity across cells of the same embryo.

      Even in genetically homogeneous tissues and cell cultures, individual cells can exhibit different levels of stress responses, such as p53 activation and apoptosis. This variation may be influenced by the local cellular environment, stochastic gene expression, or differences in cell cycle stages. Other studies on fully aneuploid human embryos could also not detect apoptotic responses in every cell (1, 3).

      For instance, p53 activation differs even between cells that have a similar number of DNA breaks, and this activation is influenced by both cell-intrinsic factors and previous exposure to DNA damage (5).

      The cell cycle tightly regulates the response of cells to different stressors. For instance, cells in G1 or S phase might be more sensitive to apoptosis signals (6), while those in G2/M might escape this response temporarily (7). Autophagy is more strongly induced in G1 and S phases, with reduced activity in G2 and M phases (8).

      Individual cells may also have different levels of success in the activation of the compensatory pathways, including the unfolded protein response, autophagy, or changes in metabolism, resulting in some cells adapting better than others.

      The expression of p53 and the sensitivity to apoptosis could also be influenced by epigenetic differences between cells, which may alter their transcriptional response to aneuploidy. Even in a genetically identical population, cells can have different epigenetic landscapes, leading to heterogeneous gene expression patterns.

      The conclusion about proteotoxic stress was largely based on staining of HSP70. It appears from Figure 3 d,h that the same cells exhibited increased HSP70 and CASP8 staining. Since HSP70 is known to have anti-apoptotic effect, could the increased expression of Hsp70 be an anti-apoptotic response?

      Our conclusion about proteotoxic stress was not solely based on HSP70 expression. We also stained for LC3B and p62, which are markers for autophagy and when highly expressed indirectly point towards underlying proteotoxic stress in the cells. 

      We reanalyzed the imaging of the stainings in the reversine-treated embryos and found that most positive cells were positive for both HSP70 and CASP8 staining, while a minority were single positive (now shown in Figure 3k,l).

      Indeed, HSP70 not only unfolds misfolded and aggregated proteins but also has a function in cell survival and apoptosis (9). HSP70 has, for instance, been found to inhibit the cleavage of Bid by active CASP8 within the extrinsic apoptosis pathway (10). It is thus possible that it temporarily plays this role, and we have acknowledged this in the discussion (lines 623-626). On the other hand, the evidence points to active apoptosis in the TE, with concomitant cell loss, so if HSP70 is indeed having an anti-apoptotic effect, it is having a limited impact.

      Reviewer #2 (Public Review): 

      A high fraction of cells in early embryos carry aneuploid karyotypes, yet even chromosomally mosaic human blastocysts can implant and lead to healthy newborns with diploid karyotypes. Previous studies in other models have shown that genotoxic and proteotoxic stresses arising from aneuploidy lead to the activation of the p53 pathway and autophagy, which helps eliminate cells with aberrant karyotypes. These observations have been here evaluated and confirmed in human blastocysts. The study also demonstrates that the second lineage and formation of primitive endoderm are particularly impaired by aneuploidy.

      This is a timely and potentially important study. Aneuploidy is common in early embryos and has a negative impact on their development, but the reasons behind this are poorly understood. Furthermore, how mosaic aneuploid embryos with a fraction of euploidy greater than 50 % can undergo healthy development remains a mystery. Most of our current information comes from studies on murine embryos, making a substantial study on human embryos of great importance. However, there are only very few new findings or insights provided by this study. Some of the previous findings were reproduced, but it is difficult to say whether this is a real finding, or whether it is a consequence of a low sample number. The authors could get much more insight with their data.

      We thank the reviewer for the thorough evaluation of our manuscript and the valuable suggestions made in the private recommendations. We have expanded the sample size and have carried out additional experiments that have significantly improved the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Instead of using a cut-off to generate a list, the authors could just rank the entire detected transcriptome for GSEA. This method fits better the authors' intentions of "primarily focused on pathway analysis." The cut-off value "-log10(p-value)<0.05" is not correct. As we can see from the PCA plot, one would not expect many cut-off-defined DEGs at all. The most obvious transcriptome change is dosage dependent, as the authors clearly showed with InferCNV.

      We thank the reviewer for this suggestion and agree that this was an important concern of the study. We have entirely revised the RNA-seq analysis based on the proposed approach (Figure 1 and 2, Supplementary Figure 1). Also, we have included the analysis of aneuploid versus reversine treated embryos, which has allowed us to determine the differences between naturally occurring chromosomal abnormalities and those that are induced using reversine (Supplementary Figure 1). 

      We first performed differential gene expression analysis using DESeq2 with a cut-off value for significantly differentially expressed genes of | log2FC | > 1 and an FDR < 0.05. Based on the PCAs and the low number of differentially expressed genes for all comparisons, besides high dosage versus euploid embryos, we focussed primarily on pathway analysis. 
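
      The cut-off described above can be sketched as a simple filter. This is a minimal illustration, not the authors' DESeq2 workflow: the `GeneResult` record, the `significant_genes` helper, and the gene names and values are all hypothetical and chosen only to show how the two thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fc: float  # log2 fold change, e.g. aneuploid versus euploid
    fdr: float      # multiple-testing-adjusted p-value


def significant_genes(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Return genes with |log2FC| > lfc_cutoff and FDR < fdr_cutoff."""
    return [
        r.gene
        for r in results
        if abs(r.log2_fc) > lfc_cutoff and r.fdr < fdr_cutoff
    ]


# Invented example values, for illustration only.
example = [
    GeneResult("GENE_A", 2.3, 0.001),   # passes both thresholds
    GeneResult("GENE_B", -1.8, 0.03),   # passes both (down-regulated)
    GeneResult("GENE_C", 0.4, 0.0001),  # significant but small effect
    GeneResult("GENE_D", 3.1, 0.20),    # large effect but not significant
]

print(significant_genes(example))
```

      Because both conditions must hold, a dataset with modest fold changes yields few genes under this filter, which is consistent with the decision to focus on pathway-level analysis.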

      For the pathway analysis, based on the reviewer’s suggestion, we generated a ranked gene list using the GSEA software (version 4.2.2, MSigDatabase) based on the normalized count matrix of the whole transcriptome that was detected after differential gene expression. The ranked gene list was then subjected to the run GSEA function, and we searched the Hallmark and C2 library for significantly enriched pathways. Thus, we could generate normalized enrichment scores, allowing us to predict whether a pathway is activated or suppressed. The details of the new analysis are described in the Material and Methods section (lines 220-232). Significance was determined using a cut-off value of 25% FDR. This cut-off is proposed in the user guide of the GSEA (https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm) especially for incoherent gene expression datasets, as suggested by our PCAs, which allows for hypothesis driven validation of the dataset. 
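
      To illustrate why a ranked-list approach yields a signed score, the following minimal sketch computes the weighted running-sum enrichment score that underlies GSEA. It is not the GSEA software itself: the function names, the toy ranking metric, and the gene sets are assumptions made for illustration, and the permutation-based normalization and FDR estimation that produce normalized enrichment scores are omitted.

```python
def rank_genes(metrics):
    """Sort genes by ranking metric, highest first.

    metrics: dict mapping gene name to a ranking metric (hypothetical values).
    """
    return sorted(metrics.items(), key=lambda item: item[1], reverse=True)


def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    Walking down the list, the sum increases at genes in the set
    (weighted by |metric| ** weight) and decreases at genes outside it.
    The score is the maximum deviation from zero, keeping its sign.
    """
    hit_weights = [
        abs(metric) ** weight if gene in gene_set else 0.0
        for gene, metric in ranked
    ]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")

    running = 0.0
    best = 0.0
    for (gene, _), hit in zip(ranked, hit_weights):
        if gene in gene_set:
            running += hit / total_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking metric, invented for illustration.
toy = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
ranked = rank_genes(toy)
print(enrichment_score(ranked, {"A", "B"}))  # set at the top: positive score
print(enrichment_score(ranked, {"E", "F"}))  # set at the bottom: negative score
```

      A positive score indicates that the gene set is concentrated among the most up-regulated genes and a negative score that it is concentrated among the most down-regulated genes, which is the basis for describing a pathway as activated or suppressed.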

      Indeed, we found that the most important transcriptome changes are aneuploidy dosage dependent. High dosage embryos show signatures of cellular unfitness, while low-dosage embryos still seem to activate survival pathways (lines 349-364). 

      This new analysis did not only increase robustness of our results but also introduced novel findings, which pave the road for future studies. 

      The validity of our findings is supported by recent work by the Zernicka-Goetz lab. We found that hypoxia is upregulated in low dosage human aneuploid TE cells. In line with our data, the Zernicka-Goetz lab found in a mouse model of low degree chromosomal abnormalities that hypoxia inducible factor 1A (HIF1A) promotes survival of extraembryonic aneuploid cells by reducing levels of DNA damage11.

      (2) It would be very helpful if the authors could perform co-staining of multiple stress markers to better understand the origins of apoptosis and autophagy cells. In Fig 3d and 3h, it seems that the same reversine treated embryo was stained with CASP8, LC3B and HSP70. Is there any correlation between CASP8 and HSP70 at the single cell level? Is there any correlation between p53 and LC3B as the authors suggested, possibly through DRAM1?

      We decided to use the complex aneuploid embryos that were left at our facility for the validation of novel findings such as upregulation of DRAM1 and presence and consequences of DNA damage in aneuploid embryos. As suggested by the editor and the other reviewer, we also added embryos to existing datasets to increase the sample size where necessary. Therefore, we did not include other co-stainings of multiple stress markers.

      Following the reviewer’s suggestion, we reanalyzed the existing stainings and evaluated whether there is a correlation between CASP8 and HSP70 at the single cell level. The reversine-treated embryos were the only embryo group that was co-stained for both CASP8 and HSP70. We quantified the percentage of cells that were single or double positive for CASP8 and HSP70 and found a higher proportion of double-positive cells than of single-positive cells. Therefore, we concluded that there is indeed a correlation between both proteins at the single cell level in reversine-treated embryos and included this data in Figure 3k,l. 
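
      The single- versus double-positive quantification can be sketched as follows. This is only an illustration of the counting logic under the assumption that each cell has already been scored as positive or negative for each marker; the function name and the example cells are hypothetical and do not reproduce the values in Figure 3k,l.

```python
def marker_percentages(cells):
    """Percentage of cells in each CASP8/HSP70 category.

    cells: iterable of (casp8_positive, hsp70_positive) booleans,
    one pair per scored cell (hypothetical input format).
    """
    counts = {
        "double_positive": 0,
        "casp8_only": 0,
        "hsp70_only": 0,
        "double_negative": 0,
    }
    for casp8, hsp70 in cells:
        if casp8 and hsp70:
            counts["double_positive"] += 1
        elif casp8:
            counts["casp8_only"] += 1
        elif hsp70:
            counts["hsp70_only"] += 1
        else:
            counts["double_negative"] += 1

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells were scored")
    return {name: 100.0 * n / total for name, n in counts.items()}


# Invented scores for ten cells, for illustration only.
example_cells = (
    [(True, True)] * 6
    + [(True, False)]
    + [(False, True)]
    + [(False, False)] * 2
)
print(marker_percentages(example_cells))
```

      Comparing the double-positive percentage with the two single-positive percentages gives the kind of summary described above; a formal test of association would additionally require the double-negative count.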

      During the experiments for the revision, we found that the DRAM1 protein was upregulated in the cytoplasm of TE cells but not in the ICM of aneuploid embryos (Figure 3s,t), which validates the findings of the gene expression analysis. This data also supports our findings that autophagy is active in aneuploid TE cells while not significantly increased in aneuploid pluripotent ICM cells. Unfortunately, we could not stain LC3B and DRAM1 in the same embryo because the antibodies were raised in the same species.

      (3) While " the possibilities for functional studies and lineage tracing experiments in human embryos are very limited," the authors can leverage in silico modelling (ie, PMID: 28700688) to address the roles of aneuploidy in blastocyst formation and development. Is there any self-regulating mechanism underlying the ratios of PrE and EPI? Is apoptosis of ICM cells a natural process during PrE formation (PMID: 18725515)?

      It is a very interesting proposal to use in silico modelling to address the roles of aneuploidy during human blastocyst formation and lineage segregation. Although this type of analysis would yield very important insights, we are not able to address this point in the revision because our group lacks expertise in this type of analysis; it would require setting up a collaboration with experts in this field. In the discussion we proposed that future studies can leverage our data to carry out in silico modelling and cited the proposed article (lines 608-610).

      On the second part of the question, we would like to discuss the differences between mouse and human embryo studies. Parts of this were included in the discussion on the possible mechanisms of PrE elimination. 

      Is there a self-regulating mechanism for EPI/PrE formation?

      To extrapolate the knowledge on mouse development to human it is important to bear in mind that (1) human embryos are outbred, as compared to inbred super-fertile laboratory mouse strains and (2) the embryos are donated to research by subfertile couples, which could compromise the EPI/PrE ratios. For instance, Chousal and colleagues found that poor quality blastocysts have a reduced number of PrE cells12. In human embryos the proportion of EPI and PrE cells is indeed highly variable (20%-60%) and while the number of EPI cells does not increase between dpf6 and 7, the number of PrE cells does grow13. We found a similarly variable number of EPI and PrE cells in our study on the lineage segregation mechanisms in good quality human embryos, with an absolute number of EPI of 12.1±6.5 cells and 8.4±3.44 PrE cells14.

      By comparison, in late mouse blastocysts, the ratio of EPI to PrE cells is consistent (2/3)15. Overall, self-regulating mechanisms in the human embryo have not yet been studied in detail because functional testing is not possible.

      Is apoptosis a natural process during PrE formation?

      Yes, in mice apoptosis is a natural process during PrE formation to eliminate misallocated cells of the inner cell mass through cell competition16,17. Yet, in the human embryo there is no evidence of such mechanisms. Although apoptosis is present even in human blastocysts of good quality18, the origin of such apoptotic cells has not yet been shown, although suboptimal culture conditions are known to increase cellular fragmentation19. Conversely, our data and that of others1,2 support the notion that the pluripotent inner cell mass in human embryos is more resistant to apoptosis than the trophectoderm, even in karyotypically aberrant cells. 

      (4) The "count tables generated from the raw data files" could not be found in the source data files.

      This slipped our attention; we have now added the count tables to the source data files. Our apologies.

      (5) Citations on aneuploidy literature were not done in a fully scholarly manner. It appears that authors selectively cite previous papers that are in support of their hypothesis but left out those with alternative conclusions.

      We apologize if we missed any literature that contradicts our findings; this was not intentional. We would be grateful if the reviewer could provide such references. 

      In the manuscript we describe the alignment and differences of key findings with several studies (listed below) and the limitations of our study are extensively described in lines 596-626.

      Our findings align with other work on these aspects:

      - RNA-sequencing data2,20–26

      - Gene dosage effects drive the transcriptome of the aneuploid human embryo27,28

      - Aneuploid cells are cleared by sustained proteotoxic stress followed by p53 activation, autophagy and eventually apoptosis29–37.

      - p53 is active in constitutional aneuploid cells38

      - The ICM is less sensitive to apoptosis1,2

      Our findings differ with other work on these points:

      - p53 activation is independent from DNA-damage39

      - p53 is active in constitutional aneuploid cells40,41

      - Apoptosis is only present in the aneuploid TE of aneuploid cells in the embryo29,30,42    

      Reviewer #2 (Recommendations For The Authors):

      Comments:

      (1) The main problem is that there is no substantial novelty. The authors look at previously identified factors affected by chromosome gains and losses, but none of the new ones from their analysis. Anything that could be potentially novel is not carefully analyzed (e.g. the difference between reversine-treated and aneuploid samples, or new potential candidates) or explained. This is really a pity.

      In the revision, we have further elaborated on the DNA damage aspect by staining for DNA double-stranded breaks and have validated DRAM1 as an activated downstream effector of p53. We have also added the analyses of the gene-expression of the reversine-treated embryos.

      (2) Some of the general statements on aneuploidy are confusing and often borderline generalized. E.g. introduction line 106: "If this (proteotoxic stress) remains unresolved by the activation of autophagy..." I am not aware of any publication suggesting that autophagy resolves proteotoxic stress in aneuploid cells. Citations that replication stress causes DNA damage in aneuploid cells are wrong. This link was first shown by Passerini et al. in 2016. etc.

      We have clarified these statements in the introduction and added the proposed citations on replication stress that causes DNA damage in aneuploid cells (lines 95-108).

      (3) In the figures the authors show a representative image of aneuploid and diploid embryos. Given the aneuploid embryos have widely different karyotypes, it would be important to clarify which of the embryos has been actually shown. Similarly, in the heat maps it is not clear which line is which embryo. This would be very useful.

      We added the karyotypes of the aneuploid embryos to the images in figure 3 and 4. Since the heatmaps were removed from the figures we added the karyotypes to the PCAs in all figures.

      (4) The authors constantly state that aneuploid embryos accumulate more DNA damage, which is supported by some of their observations, e.g. the DNA damage response is upregulated. It would be great if they validated these statements by testing some markers for DNA damage.

      We agree with the reviewer that this was an important point and addressing it has revealed that our initial assumption was incorrect and has provided new interesting findings. From the revised RNA-seq analysis, we found only one pathway (DNA damage response TP53) to be activated in all aneuploid embryos (Fig.1e). The ATM pathway was also activated specifically in high-dosage embryos. Following this, we set out to test if DNA damage was indeed increased in aneuploid embryos by staining for DNA double strand breaks with gH2AX. 

      First, we investigated the gH2AX expression in 5dpf embryos in which we induced DNA damage with Bleomycin. We compared 6 untreated versus 6 Bleomycin treated human embryos (Fig. 3m) and found that gH2AX foci were rarely present in the untreated embryos and that all cells of the treated embryos showed a pan-nuclear gH2AX staining. 

      Second, we compared the presence of gH2AX foci in the TE (NANOG negative cells), ICM (NANOG positive cells) and the whole embryo of 7 euploid versus 11 aneuploid embryos. Interestingly, we found no differences in the number of gH2AX foci or pan-nuclear gH2AX nuclei between euploid and aneuploid embryos (Fig 3o). When dividing our aneuploid embryos into high- and low-dosage embryos, we likewise found no differences. Our data now suggest that complex aneuploid human embryonic cells of meiotic origin do not contain more DNA double-strand breaks, precluding DNA damage as the source of p53 activation. Last, in our previous experiment we found that phosphorylated S15p53 is increased in aneuploid embryos, supporting an active p53 pathway as suggested by our transcriptomic data. Since we did not find increased DNA damage in aneuploid human embryos, we speculate that p53 is phosphorylated on Serine15 through metabolic stress as suggested by Jones and colleagues43. We also argue that proteotoxic stress might induce p53 expression as proposed by Singla and colleagues29.

      (5) The source of embryos is only partially described in a figure legend. This should be expanded and described in the Materials and Methods section. The embryos are named, but this is nowhere explained. One can only assume that T is for trisomy and M is for monosomy.

      We have divided the embryos into different experimental series (Experiment 1-4). This is now described in the Material and methods section (lines 157-175). Also, we have added the experiment number of each embryo to the supplementary tables and to the source data. The abbreviation for T = Trisomy and M= Monosomy was initially introduced in the last paragraph of the figure legend of figure 4.  We now added it to every panel.

      (6) Recent works from non-embryonic cells suggest that the cellular response to monosomy is different than the response to trisomy. Did the authors try to test this possible difference? For example, one could compare embryos M174/21, M2/19 and M17 with T2/10, T10/22 and T1/15/18/22.

      We thank the reviewer for pointing this out. Our RNA-seq dataset consisted of three embryos that contained trisomies only and four embryos that contained monosomies only. When reanalyzing our data we found different transcriptomic responses between monosomic only and trisomic only cells. Compared to euploid cells, monosomy only cells activate mainly the p53 pathway and protein secretion while translation, DNA replication, cell cycle G1/S, DNA synthesis and processing of DNA double strand breaks were inhibited. Trisomy only cells show activated oxidative phosphorylation, ribosome and translation while protein secretion, apoptosis and cell cycle are inhibited. These differences were confirmed by testing transcriptomic differences between trisomic versus monosomic cells. Our results are similar to studies on human embryos20,26 and other monosomic and trisomic cell lines44,45. However, the interpretation of these results is very limited by the small sample size and the comparison of monosomies and trisomies of different chromosomes. Thus, we decided to keep this analysis out of the manuscript.

      Author response image 1.

      On the protein level, next to the small sample size, our results were also limited by the fact that not all embryos were stained with the same combinations of antibodies. LC3B was the only protein for which all embryos were immunostained. Thus, other protein data could not be re-analyzed due to even lower sample sizes. 

      Below we have separated the LC3B puncta per cell counts into euploid, trisomies only, monosomies only and all other aneuploid embryos. We performed a Kruskal-Wallis test with multiple comparisons. It is worth noting that the difference between euploid and monosomies only (and those that contained both) was statistically significant, while the differences between euploid vs trisomies only and trisomies only vs monosomies only were not statistically significant. These differences contradict the studies on monosomic cell lines that found that proteotoxic stress and autophagy are absent from monosomic cells and specific to trisomic cell lines. Here we also decided to keep this specific protein expression analysis out of the manuscript due to the above-mentioned limitations.

      Author response image 2.
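
      For readers unfamiliar with the Kruskal-Wallis comparison described above, the following minimal sketch computes the H statistic from ranked data. It is an illustration under stated assumptions, not the analysis used for the image: the function names and the example groups are hypothetical, and the p-value (obtained by comparing H with a chi-squared distribution with k - 1 degrees of freedom for k groups) and the post hoc multiple comparisons are omitted.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank.

    Returns the ranks (in input order) and the size of each tie group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)

    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Invented per-embryo values, for illustration only.
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_wallis_h(example_groups))
```

      Because the test operates on ranks rather than raw counts, it does not assume normally distributed puncta counts, which suits small groups such as the trisomy-only and monosomy-only embryos.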

      (7) Line 329: "a trisomy 12 meiotic chromosomal abnormality in one reversine-treated embryo." What does it mean? Why meiotic chromosomal abnormality when the reversine treatment was administered 4 days after fertilization? In the discussion, the authors state "presumed meiotic," but this should be discussed and described more clearly.

      Since reversine induces mitotic abnormalities of different types leading to chromosomally mosaic embryos, we could not identify these induced abnormalities using inferCNV on the RNA-seq of TE biopsies of said embryos. However, we were not aware of the karyotype of the embryos that were used for these experiments, as they were thawed after they had been cryopreserved at day 3 of development and had not been subjected to genetic testing. This makes it possible that some of the embryos we used for the reversine experiments in fact carried endogenously acquired meiotic and mitotic chromosomal abnormalities. Since inferCNV only allows us to detect aneuploidies that are homogeneously present in the majority of the cells of the sequenced biopsy, we only picked up this trisomy 12. It is possible that this was not a meiotic abnormality but a mitotic one originating at the first cleavage and present in a high percentage of cells in the blastocyst. At any rate, the exact origin of this aneuploidy has no further implications for the results of the study. We clarified this in the manuscript (lines 310-315).

      (8) Line 422: "The gene expression profiles suggest that the accumulation of autophagic proteins in aneuploid embryos is caused by increased autophagic flux due to differential expression of the p53 target gene DNA Damage Regulated Autophagy Modulator-1 (DRAM1), rather than by inhibition of autophagy (Supplementary Table 2)." This is highly speculative, as the authors do not have any evidence to support this statement.

      To validate this finding we have now stained 7 euploid and 11 aneuploid embryos with a DRAM1 antibody. We found DRAM1 protein to be significantly enriched in the cytoplasm of TE cells but not in the ICM of aneuploid embryos when compared with euploid embryos (Fig. 3s,t). This data is consistent with the finding that autophagy is increased in the TE and not the ICM of aneuploid human embryos (Fig 4l-o). Potential implications of DRAM1 expression have been mentioned in the discussion.

      (9) The figure legends are confusing. They are mixed up with the methods and some key information are missing.

      We revised all figure legends accordingly and removed the experimental set-up figures from the manuscript to reduce any confusion. The methods section was revised and expanded.

      (10) In Figure 1, what is the difference between "activated" and "deregulated"?

      Since we analyzed our RNA-seq dataset with the method proposed by reviewer 1 we now generated normalized enrichment scores. The terms activated and deregulated are thus not present anymore.  

      (11) The p62 images are not really clear. There might be more puncta (not obvious, though), but the staining intensity seems lower in the representative images.  

      We do not agree with the reviewer that there might be more p62 puncta (purple); however, we agree that it was not clearly visible from the pictures. Below we show an example of the counting mask (in green) of the aneuploid embryo from figure 3i, where one can clearly appreciate that all the puncta are captured by the counting mask. In this case, the software counted 1704 puncta. To further clarify, we now added a zoom of a randomly chosen ROI of the p62 stainings to figure 3i.

      Author response image 3.

      (12) The authors claim that there are differences between lineages in response to aneuploidy, such as autophagy not being activated in the OCT4+ lineage, etc. However, the differences are very small and based on a small number of embryos. It is difficult to draw far-reaching conclusions based on a small number of experiments (Fig. 4n-r). The authors also claim in the Abstract that they demonstrated "clear differences with previous findings in the mouse", which are however difficult to identify in the text.

      We agree with the reviewer that our conclusions on figures 4l-o were based on a small number of embryos. We have increased the sample size as much as possible. This is challenging due to the constraints on access to human embryos, and especially the limited number of embryos with meiotic complex aneuploidy. We have performed immunostainings for LC3B, OCT4 and GATA4 of six additional euploid and four additional aneuploid human embryos. This did not change our overall findings that aneuploid embryos upregulate autophagy in the TE rather than the ICM (Figure 4l-o). After the inclusion of additional embryos, we removed from the manuscript our speculation that autophagy is present in ICM cells that have already differentiated towards EPI/PrE.

      We have rephrased the abstract to state that we highlight a few differences with previous findings in the mouse. Here we focused especially on the different transcriptomic response of reversine treated embryos, that aneuploid mouse embryos do not seem to suffer from lineage segregation errors and that the ICM of aneuploid human embryos lacks apoptosis while aneuploid mouse embryos show elimination from the EPI. Likewise, we highlighted the similar stress responses and that we could give novel insights into p53 mediated autophagy and apoptosis activation through DRAM1 in aneuploid TE cells but not the ICM.  

      (13) The text needs thorough editing - long sentences, typos, and grammar errors are frequent. Punctuation is largely missing.

      We have revised the text.

      References

      (1) Victor, A. R. et al. One hundred mosaic embryos transferred prospectively in a single clinic: exploring when and why they result in healthy pregnancies. Fertil Steril 111, 280–293 (2019).

      (2) Martin, A. et al. Mosaic results after preimplantation genetic testing for aneuploidy may be accompanied by changes in global gene expression. Front Mol Biosci 10, 264 (2023).

      (3) Martín, Á. et al. Trophectoderm cells of human mosaic embryos display increased apoptotic levels and impaired differentiation capacity: a molecular clue regarding their reproductive fate? Human Reproduction 39, 709–723 (2024).

      (4) Domingo-Muelas, A. et al. Human embryo live imaging reveals nuclear DNA shedding during blastocyst expansion and biopsy. Cell 186, 3166-3181.e18 (2023).

      (5) Loewer, A., Karanam, K., Mock, C. & Lahav, G. The p53 response in single cells is linearly correlated to the number of DNA breaks without a distinct threshold. BMC Biol 11, 1–13 (2013).

      (6) Kim, H., Watanabe, S., Kitamatsu, M., Watanabe, K. & Ohtsuki, T. Cell cycle dependence of apoptosis photo-triggered using peptide-photosensitizer conjugate. Scientific Reports 2020 10:1 10, 1–8 (2020).

      (7) Pollak, N. et al. Cell cycle progression and transmitotic apoptosis resistance promote escape from extrinsic apoptosis. J Cell Sci 134, (2021).

      (8) Neufeld, T. P. Autophagy and cell growth--the yin and yang of nutrient responses. J Cell Sci 125, 2359–2368 (2012).

      (9) Lanneau, D. et al. Heat shock proteins: essential proteins for apoptosis regulation. J Cell Mol Med 12, 743 (2008).

      (10) Gabai, V. L., Mabuchi, K., Mosser, D. D. & Sherman, M. Y. Hsp72 and Stress Kinase c-jun N-Terminal Kinase Regulate the Bid-Dependent Pathway in Tumor Necrosis Factor-Induced Apoptosis. Mol Cell Biol 22, 3415 (2002).

      (11) Sanchez-Vasquez, E., Bronner, M. E. & Zernicka-Goetz, M. HIF1A contributes to the survival of aneuploid and mosaic pre-implantation embryos. bioRxiv 2023.09.04.556218 (2023) doi:10.1101/2023.09.04.556218.

      (12) Chousal, J. N. et al. Molecular profiling of human blastocysts reveals primitive endoderm defects among embryos of decreased implantation potential. Cell Rep 43, (2024).

      (13) Corujo-Simon, E., Radley, A. H. & Nichols, J. Evidence implicating sequential commitment of the founder lineages in the human blastocyst by order of hypoblast gene activation. Development (Cambridge) 150, (2023).

      (14) Regin, M. et al. Lineage segregation in human pre-implantation embryos is specified by YAP1 and TEAD1. Human Reproduction 38, 1484–1498 (2023).

      (15) Saiz, N., Williams, K. M., Seshan, V. E. & Hadjantonakis, A. K. Asynchronous fate decisions by single cells collectively ensure consistent lineage composition in the mouse blastocyst. Nature Communications 2016 7:1 7, 1–14 (2016).

      (16) Plusa, B., Piliszek, A., Frankenberg, S., Artus, J. & Hadjantonakis, A. K. Distinct sequential cell behaviours direct primitive endoderm formation in the mouse blastocyst. Development 135, 3081–3091 (2008).

      (17) Hashimoto, M. & Sasaki, H. Epiblast Formation by TEAD-YAP-Dependent Expression of Pluripotency Factors and Competitive Elimination of Unspecified Cells. Dev Cell 50, 139-154.e5 (2019).

      (18) Hardy, K. Apoptosis in the human embryo. Rev Reprod 4, 125–134 (1999).

      (19) Ramos-Ibeas, P. et al. Embryo responses to stress induced by assisted reproductive technologies. Mol Reprod Dev 86, 1292–1306 (2019).

      (20) Licciardi, F. et al. Human blastocysts of normal and abnormal karyotypes display distinct transcriptome profiles. Sci Rep 8, 1–9 (2018).

      (21) Maxwell, S. M. et al. Investigation of Global Gene Expression of Human Blastocysts Diagnosed as Mosaic using Next-generation Sequencing. Reproductive Sciences 1–11 (2022) doi:10.1007/s43032-022-00899-x.

      (22) Groff, A. F. et al. RNA-seq as a tool for evaluating human embryo competence. Genome Res 29, 1705–1718 (2019).

      (23) Starostik, M. R., Sosin, O. A. & McCoy, R. C. Single-cell analysis of human embryos reveals diverse patterns of aneuploidy and mosaicism. Genome Res 30, 814–826 (2020).

      (24) Vera-Rodriguez, M., Chavez, S. L., Rubio, C., Pera, R. A. R. & Simon, C. Prediction model for aneuploidy in early human embryo development revealed by single-cell analysis. Nat Commun 6, 7601 (2015).

      (25) Sanchez-Ribas, I. et al. Transcriptomic behavior of genes associated with chromosome 21 aneuploidies in early embryo development. Fertil Steril 111, 991-1001.e2 (2019).

      (26) Fuchs Weizman, N. et al. Towards Improving Embryo Prioritization: Parallel Next Generation Sequencing of DNA and RNA from a Single Trophectoderm Biopsy. Sci Rep 9, 1–11 (2019).

      (27) Fernandez Gallardo, E. et al. A multi-omics genome-and-transcriptome single-cell atlas of human preimplantation embryogenesis reveals the cellular and molecular impact of chromosome instability. bioRxiv 2023.03.08.530586 (2023) doi:10.1101/2023.03.08.530586.

      (28) Dürrbaum, M. & Storchová, Z. Effects of aneuploidy on gene expression: implications for cancer. FEBS J 283, 791–802 (2016).

      (29) Singla, S., Iwamoto-Stohl, L. K., Zhu, M. & Zernicka-Goetz, M. Autophagy-mediated apoptosis eliminates aneuploid cells in a mouse model of chromosome mosaicism. Nat Commun 11, 1–15 (2020).

      (30) Bolton, H. et al. Mouse model of chromosome mosaicism reveals lineage-specific depletion of aneuploid cells and normal developmental potential. Nat Commun 7, 1–12 (2016).

      (31) Ohashi, A. et al. Aneuploidy generates proteotoxic stress and DNA damage concurrently with p53-mediated post-mitotic apoptosis in SAC-impaired cells. Nat Commun 6, 1–16 (2015).

      (32) Santaguida, S. & Amon, A. Short- and long-term effects of chromosome missegregation and aneuploidy. Nature Reviews Molecular Cell Biology vol. 16 473–485 Preprint at https://doi.org/10.1038/nrm4025 (2015).

      (33) Santaguida, S., Vasile, E., White, E. & Amon, A. Aneuploidy-induced cellular stresses limit autophagic degradation. Genes Dev 29, 2010–2021 (2015).

      (34) Chunduri, N. K. & Storchová, Z. The diverse consequences of aneuploidy. Nature Cell Biology 2019 21:1 21, 54–62 (2019).

      (35) Dürrbaum, M. et al. Unique features of the transcriptional response to model aneuploidy in human cells. BMC Genomics 15, 139 (2014).

      (36) Pan, J.-A., Ullman, E., Dou, Z. & Zong, W.-X. Inhibition of protein degradation induces apoptosis through a microtubule-associated protein 1 light chain 3-mediated activation of caspase-8 at intracellular membranes. Mol Cell Biol 31, 3158–70 (2011).

      (37) Stingele, S. et al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol Syst Biol 8, 608 (2012).

      (38) Tang, Y.-C., Williams, B. R., Siegel, J. J. & Amon, A. Identification of aneuploidy-selective antiproliferation compounds. Cell 144, 499–512 (2011).

      (39) Janssen, A., Van Der Burg, M., Szuhai, K., Kops, G. J. P. L. & Medema, R. H. Chromosome segregation errors as a cause of DNA damage and structural chromosome aberrations. Science 333, 1895–1898 (2011).

      (40) Li, M. et al. The ATM-p53 pathway suppresses aneuploidy-induced tumorigenesis. Proc Natl Acad Sci U S A 107, 14188–14193 (2010).

      (41) Thompson, S. L. & Compton, D. A. Proliferation of aneuploid human cells is limited by a p53-dependent mechanism. J Cell Biol 188, 369–381 (2010).

      (42) Yang, M. et al. Depletion of aneuploid cells in human embryos and gastruloids. Nat Cell Biol 23, 314–321 (2021).

      (43) Jones, R. G. et al. AMP-activated protein kinase induces a p53-dependent metabolic checkpoint. Mol Cell 18, 283–293 (2005).

      (44) Chunduri, N. K., Barthel, K. & Storchova, Z. Consequences of Chromosome Loss: Why Do Cells Need Each Chromosome Twice? Cells 2022, Vol. 11, Page 1530 11, 1530 (2022).

      (45) Krivega, M., Stiefel, C. M. & Storchova, Z. Consequences of chromosome gain: A new view on trisomy syndromes. American Journal of Human Genetics vol. 109 2126–2140 Preprint at https://doi.org/10.1016/j.ajhg.2022.10.014 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Brdar, Osterburg, Munick, et al. present an interesting cellular and biochemical investigation of different p53 isoforms. The authors investigate the impact of different isoforms on the in-vivo transcriptional activity, protein stability, induction of the stress response, and hetero-oligomerization with WT p53. The results are logically presented and clearly explained. Indeed, the large volume of data on different p53 isoforms will provide a rich resource for researchers in the field to begin to understand the biochemical effects of different truncations or sequence alterations.

      Strengths:

      The authors achieved their aims to better understand the impact/activity of different p53 isoforms, and their data will support their statements. Indeed, the major strengths of the paper lie in its comprehensive characterization of different p53 isoforms and the different assays that are measured. Notably, this includes p53 transcriptional activity, protein degradation, induction of the chaperone machinery, and hetero-oligomerization with wtp53. This will provide a valuable dataset where p53 researchers can evaluate the biological impact of different isoforms in different cell lines. The authors went to great lengths to control and test for the effect of (1) p53 expression level, (2) promoter type, and (3) cell type. I applaud their careful experiments in this regard.

      Weaknesses:

      One thing that I would have liked to see more of is the quantification of the various pull-down/gel assays - to better quantify the effect of, e.g., hetero-oligomerization among the various isoforms. In addition, a discussion about the role of isoforms that contain truncations in the IDRs is not available. It is well known that these regions function in an auto-inhibitory manner (e.g. work by Wright/Dyson) and also mediate many PPIs, which likely have functional roles in vivo (e.g. recruiting p53 to various complexes). The discussion could be strengthened by focusing on some of these aspects of p53 as well.

      Thank you for these comments. In this paper we have focused on the importance of the integrity of the folded domains of p53 for their function. The unfolded regions in the N- and the C-terminus have not been our main target but the reviewer is right that they play important regulatory functions that are lost in the corresponding isoforms. We have, therefore, added a few sentences in the Discussion section.

      With respect to a better quantification, we have re-evaluated the quantification and adjusted it where necessary (see also reviewer 2). With respect to the hetero-oligomerization, we have run a new mass spectrometry experiment in which we focus only on the p53 peptides. These have now been quantitatively evaluated and the results are provided in Fig. 5 of this manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript entitled "p53 isoforms have a high aggregation propensity, interact with chaperones and lack binding to p53 interaction partners", the authors suggest that the p53 isoforms have high aggregation propensity and that they can co-aggregate with canonical p53 (FLp53), p63 and p73 thus exerting a dominant-negative effect.

      Strengths:

      Overall, the paper is interesting as it provides some characterization of most p53 isoforms DNA binding (when expressed alone), folding structure, and interaction with chaperones. The data presented support their conclusion and bring interesting mechanistic insight into how p53 isoforms may exert some of their activity or how they may be regulated when they are expressed in excess.

      Weaknesses:

      The main limitation of this manuscript is that the isoforms are highly over-expressed throughout the manuscript, although the authors acknowledge that the level of expression is a major factor in the aggregation phenomenon and "that aggregation will only become a problem if the expression level surpasses a certain threshold level" (lines 273-274 and results shown in Figures S3D, 6E). The p53 isoforms are physiologically expressed in most normal human cell types at relatively low levels which makes me wonder about the physiological relevance of this phenomenon.

      Furthermore, it was previously reported that some isoforms clearly induce transcription of target genes which are not observed here. For example, p53β induces p21 expression (Fujita K. et al. p53 isoforms Delta133p53 and p53beta are endogenous regulators of replicative cellular senescence. Nat Cell Biol. 2009 Sep;11(9):1135-42), and Δ133p53α induces RAD51, RAD52, LIG4, SENS1 and SOD1 expression (Gong, L. et al. p53 isoform D113p53/D133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell Res. 2015, 25, 351-369. / Gong, L. et al. p53 isoform D133p53 promotes the efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Sci. Rep. 2016, 6, 37281. / Horikawa, I. et al. D133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell Death Differ. 2017, 24, 1017-1028. / Gong, L. p53 coordinates with D133p53 isoform to promote cell survival under low-level oxidative stress. J. Mol. Cell Biol. 2016, 8, 88-90. / Joruiz et al. Distinct functions of wild-type and R273H mutant Δ133p53α differentially regulate glioblastoma aggressiveness and therapy-induced senescence. Cell Death Dis. 2024 Jun 27;15(6):454.) which demonstrates that some isoforms can induce target genes transcription and have defined normal functions (e.g. Cellular senescence or DNA repair).

      However, in this manuscript, the authors conclude that isoforms are "largely unfolded and not capable of fulfilling a normal cellular function" (line 438), that they do not have "well defined physiological roles" (line 456), and that they only "have the potential to inactivate members of the p53 protein family by forming inactive hetero complexes with wtp53" (line 457-458).

      Therefore, I think it is essential that the authors better discuss this major discrepancy between their study and previously published research.

      This manuscript is not about hunting for the next “signal transduction pathway” that is “regulated” by a specific p53 isoform. For such a project, work would indeed have to be conducted at the endogenous level. However, our manuscript is about the basic thermodynamic behavior of these isoforms in in vitro assays and in some cell culture assays.

      What does depend on the expression level, however, is the interaction with chaperones as well as the tendency to aggregate. We show this in our manuscript by using two different promoters of very different strength: strong overexpression leads to aggregation, much weaker expression to soluble isoforms. For the mass spectrometry experiments we established stably expressing cell lines and did not use transiently overexpressing ones.

      The level above which the chaperone systems of the cell can no longer keep these isoforms soluble, so that they start to aggregate, is certainly an important question, and we have experimental evidence that the percentage of the aggregating isoforms in the insoluble fraction increases when we use different chaperone inhibitors.

      Proteins have to follow the basic physicochemical rules also in cells. And this manuscript sets the stage for re-interpreting the observed cellular effects – not in terms of specific interaction with certain promoters but as causing a stress response and non-specific interaction with other not-well folded domains of other proteins.

      With respect to this discussion about the physiological relevance, it is interesting to look at a study that was published in Cell:

      Rohaly, G., Chemnitz, J., Dehde, S., Nunez, A.M., Heukeshoven, J., Deppert, W. and Dornreiter, I. (2005) A novel human p53 isoform is an essential element of the ATR-intra-S phase checkpoint. Cell, 122, 21-32.

      This manuscript describes how a specific isoform regulates an important pathway. Two other studies also focused on the same isoform but showed that it lacks the nuclear localization signal and therefore does not enter the nucleus. Even if it did, it would have no transcriptional activity due to the unfolding of the DBD.

      Chan, W.M. and Poon, R.Y. (2007) The p53 Isoform Deltap53 lacks intrinsic transcriptional activity and reveals the critical role of nuclear import in dominant-negative activity. Cancer Res, 67, 1959-1969.

      Garcia-Alai, M.M., Tidow, H., Natan, E., Townsley, F.M., Veprintsev, D.B. and Fersht, A.R. (2008) The novel p53 isoform "delta p53" is a misfolded protein and does not bind the p21 promoter site. Protein Sci, 17, 1671-1678.

      This example shows that it is important to re-consider the basic principles of protein structure and protein folding. And that is exactly what this manuscript is about.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Does the p53g C-terminus (322-346) form cross-beta amyloid structures? The strong fluorescence signal in the presence of ThT suggests this may be forming amyloid. I wonder if any amyloid sequence predictors identify this region as amyloidogenic.

      Using the Waltz predictor (https://doi.org/10.1038/nmeth.1432), the amino acids 339-346 have been identified as potentially amyloidogenic. We have added this information to the manuscript.

      (2) The chaperone binding results in Figure 5 are interesting and indeed suggest that many p53 isoforms interact with chaperones in vivo to counteract their destabilized nature. For the 5 p53 isoforms shown in Figure 5D, do they present any HSP70-binding motifs that may not exist in wtp53? These motifs can be predicted from the sequence with established software in a similar manner as the authors performed for TANGO.

      Author response image 1.

      Predicted chaperone binding sites using the LIMBO prediction tool. (http://www.ncbi.nlm.nih.gov/pubmed/19696878)

      We have analyzed the sequence of p53 and the isoforms for potential HSP70 binding sites using the LIMBO prediction tool. The results are shown in the figure above. Wild type p53 has a very strong site that is lost in the β- and ɣ-isoforms. The ɣ-isoform in addition loses another predicted binding site which is replaced with a ɣ-specific one. Overall, this analysis does not provide a very clear picture due to the loss of some and the creation of new, isoform-specific binding sites. We have, therefore, not included this analysis in the manuscript but show it here for the reviewers.

      (3) The mixed hetero-tetramers detected by MS are very interesting, as are the pull-down experiments in Figure 6. However, the extent of hetero-oligomerization is at times hard to follow. Could you more clearly summarize and/or quantify the results of the hetero-oligomerization experiments?

      We have conducted a new mass spectrometry experiment that was focused only on the analysis of p53 peptides. These data are now shown in Figure 5 and Supplementary Figure 6. They show that in the Δ133p53α sample we can detect peptides that are not present in the Δ133p53α isoform and must therefore come from wild type p53. For the Δ133p53β isoform these peptides are absent, suggesting that this isoform does not hetero-oligomerize with wild type p53. Furthermore, none of the β- and ɣ-isoforms show peptides derived from wild type p53, again suggesting that they cannot hetero-oligomerize due to the lack of a functional oligomerization domain.

      (4) There is a typo in Figure 5. The figure title (top of page) says "Figure 4: Chaperons". Also, "chaperons" appears in the legend.

      Thank you for making us aware of this problem. This has been corrected.

      (5) The figures are often quite small with a lot of white space. Figure 4 in particular is arranged in a confusing way with A, D, B, C, E, F, G in T->B L->R order. Perhaps some figures could be expanded or re-arranged to make better use of the available space. E.g. could move B, C above panel D, and then shift F, G to be next to E. This would give you A, [B, C, D], [E, F, G] in a 2x2 format.

      We have rearranged figures 2, 4, 5 and 6 to be able to enlarge the individual figure panels.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2C: Why is the p21-Luc reporter assay performed in SAOS-2 cells when all other assays are performed in H1299?

      The assays we have performed in this study are independent of the cell type because we investigate very basic principles of protein folding and stability. If one removes a third of a folded domain, this domain will no longer fold, independent of the cell type it is in. However, to show that the cell type indeed does not play any role, we have repeated the experiments in H1299 cells. These data are now shown in Figure 2C, and we have moved the original data in SAOS cells to Supplementary Figure 1E.

      (2) Figure 3: I find the statistics on this figure very confusing... It looks like every isoform is compared to the "WT", but in that case, in Figure 3B for example, how can the Δ40p53β be ****, Δ133p53γ be *** while the Δ133p53α, which differs more from WT and has narrower error bars, is non-significant? I guess this comes from the normalization of the GST expression of each isoform but in this case, the isoforms should not be compared to the WT, but to their respective GST sample.

      There was indeed a mistake in the statistics, thank you for pointing this out.

      We repeated the statistical analysis and the relative protein level within each sample is now calculated using the ratio between the respective GST sample and the sample containing E6. Significance for each isoform was assessed by comparing the relative protein level to the protein level of the WT.
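The recalculated normalization described above can be sketched in a few lines. This is a minimal illustration only: it assumes that the relative protein level is the band intensity of the E6-containing sample divided by that of the matching GST control (the response does not state the direction of the ratio), and the function name and all intensity values are hypothetical, not data from the study.

```python
from statistics import mean


def relative_levels(e6_intensities, gst_intensities):
    """Return per-replicate E6/GST ratios.

    Assumption: the ratio is taken as E6 sample over its own GST control,
    so a value of 1.0 means no E6-dependent loss of protein.
    """
    if len(e6_intensities) != len(gst_intensities):
        raise ValueError("each E6 replicate needs a matching GST control")
    return [e6 / gst for e6, gst in zip(e6_intensities, gst_intensities)]


# Hypothetical band intensities (arbitrary units), three replicates each.
wt = relative_levels([20.0, 25.0, 30.0], [100.0, 100.0, 100.0])
isoform = relative_levels([80.0, 90.0, 70.0], [100.0, 100.0, 100.0])

# Each isoform is first normalized to its own GST control; only then are
# the normalized values compared with the normalized WT values.
wt_mean = mean(wt)
isoform_mean = mean(isoform)
```

Normalizing within each sample first means that differences in expression between isoforms cancel out before the comparison with WT is made.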

      (3) Figures 3D and 3E: the authors did not perform the assays on Δ40p53 isoforms because they "contain a fully folded DBD" (lines 218-219). This may be true for Δ40p53α as shown by the pAB240 binding figure 3C, but it is speculative for Δ40p53β and Δ40p53γ since these were not tested in Figure 3C either... Furthermore, Figure 3B suggests that there may be differences between Δ40p53α, Δ40p53β and Δ40p53γ and therefore these two isoforms should be tested for pAB240 IP at least (and DARPin as well if the pAB240 IP shows differences). Also, why were the TAp53β and TAp53γ not tested in Figures 3D and 3E?

      Here we disagree with the reviewer. The PDB is full of structures of the p53 DNA binding domain. All of them – including many structures of the same domain from other species – span residues ~90 to 294 (or the equivalent residues in other species). That means that the β- and ɣ- versions of p53 contain the full DNA binding domain. In contrast to the DNA binding domain, the oligomerization domain, however, is truncated and therefore does not form functional tetramers. This is the reason for the reduced binding affinity to DNA.

      The pAB240 antibody recognizes and binds to an epitope that becomes exposed upon the unfolding of the DBD. This manuscript shows by multiple experiments that the DBD of the β- and the ɣ-isoforms are not compromised but that the oligomerization domain is not functional. In figures 3D and 3E we have not included the TA β- and the ɣ-isoforms, because, again, they have a folded DBD and their inclusion would not provide any additional information compared to TAp53α.

      (4) Figures 4B and 4C are small and extremely difficult to read.

      We agree and have rearranged and enlarged these and other figures. Please see also answer to comment (5) of reviewer 1.

      (5) Figure 5C: the authors claim that "the isoform induced cellular stress that triggers the expression of chaperones" (line 320). However, if the induction of the HSP70 promoter is shown, there is no evidence that this is due to cellular stress. Evidence to support that claim should be shown.

      The expression and accumulation of unfolded, aggregation prone sequences is a stress situation for the cell which triggers the expression of chaperones. The expression of isoforms that are not well folded or of p53 mutants that are not well folded increases expression both from the HSP70 promoter and the heat shock promoter. This shows that the expression of unfolded isoforms induces cellular stress.

      (6) Figure 5D: why was this experiment performed in SAOS2 cells when the whole paper was otherwise performed in H1299 cells? Also, about this figure, the authors write "In addition to this common set, Δ133p53α and Δ40p53α showed only very few additional interaction partners. This situation was very different for Δ133α, Δ133β and TAp53γ." (lines 331 to 333). My feeling is that we should instead read "In addition to this common set, TAp53β and Δ40p53α showed only very few additional interaction partners. This situation was very different for Δ133p53α, Δ133p53β and TAp53γ"

      Thank you for spotting this mistake. Indeed, the correct wording is TAp53β and Δ40p53α and we have corrected the manuscript.

      The mass spectrometry experiments were actually not carried out in SAOS cells, but in U2OS cells. The reason for not using the H1299 cell line was that these cells do not contain functional p53. In contrast, U2OS cells express wild type p53. We have repeated the mass spectrometry analysis and analyzed the data with a special focus on p53 peptides. This information is now added as Figure 5E. In this analysis we show that the Δ133p53α samples contain peptides from the DBD that are not part of this truncated isoform and must therefore originate from wild type p53 with which this isoform hetero-oligomerizes. The corresponding peptides are absent from Δ133p53β, showing that without a functional oligomerization domain this isoform does not interact with wild type p53. Likewise, the data demonstrate that the β- and the ɣ-isoforms do not form hetero-oligomers.

      (7) Supplementary Table 2: the authors claim "For Δ133p53α we could identify peptides between amino acids 102 and 132 that must originate from wild type p53". SAOS2 has a WT TP53 gene and expresses all isoforms endogenously. Therefore, peptides between amino acids 102 and 132 can actually originate from "WT p53" but also TAp53β, TAp53γ, Δ40p53α, Δ40p53β or Δ40p53γ (most likely a mix of these).

      We have not used SAOS cells but U2OS cells. As mentioned above the data show that the Δ133p53α sample contains peptides from wild type p53 and that these peptides cannot be found in the Δ133p53β sample. In addition, peptides originating from the oligomerization domain are only found in the samples of isoforms containing an oligomerization domain but not in samples of β- and ɣ-isoforms. The data are presented in Figure 5 E-G and Supplementary Figure S5.

      Since the Biotin ligase is directly fused to a specific isoform, peptides from other isoforms can only be detected if these directly interact with the isoform fused to the ligase (and contain unique peptides, not present in the isoform fused to the ligase). The data confirm that only isoforms that have a functional oligomerization domain can interact with wild type p53 (or potentially other isoforms with a functional oligomerization domain).
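The reasoning in the paragraph above can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' analysis pipeline: each isoform is described only by the first and last residue of the canonical 393-residue p53 sequence that it retains, the isoform-specific C-terminal tails of the β- and ɣ-variants are ignored, and the dictionary and function names are hypothetical.

```python
# Assumed residue spans on the canonical p53 sequence (hypothetical table
# for illustration; only the alpha variants are listed).
ISOFORM_SPAN = {
    "TAp53alpha": (1, 393),
    "Delta40p53alpha": (40, 393),
    "Delta133p53alpha": (133, 393),
}


def peptide_can_derive_from(isoform, start, end):
    """True if the peptide (residues start..end) lies fully inside the isoform."""
    first, last = ISOFORM_SPAN[isoform]
    return first <= start and end <= last


def must_come_from_partner(bait, start, end):
    """True if the peptide cannot originate from the ligase-fused bait isoform.

    Such a peptide can only have been labeled on an interaction partner,
    for example wild type p53 in a hetero-oligomer.
    """
    return not peptide_can_derive_from(bait, start, end)
```

Under these assumptions, a peptide covering residues 102 to 132 is flagged by `must_come_from_partner("Delta133p53alpha", 102, 132)`, because the bait starts at residue 133, whereas a peptide covering residues 140 to 150 could stem from the bait itself and is therefore uninformative.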

      (8) Figure 6: Why not conduct these luciferase reporter assays using the MDM-2 and p21 promoters like in Figure 2B and 2C since there may be promoter-specific regulation?

      This would be particularly important for the p21 promoter as TAp53β is known to induce it (Fujita K. et al. p53 isoforms Delta133p53 and p53beta are endogenous regulators of replicative cellular senescence. Nat Cell Biol. 2009 Sep;11(9):1135-42) and the Δ133p53α, Δ133p53β and Δ133p53γ isoforms were shown to reduce p21 transcription by TAp73β when co-expressed in H1299 cells (Zorić A. et al. Differential effects of diverse p53 isoforms on TAp73 transcriptional activity and apoptosis. Carcinogenesis. 2013 Mar;34(3):522-9.). Neither of these regulations appears here on the pBDS2 reporter, which is puzzling.

      The main point of this paper is that all isoforms without a complete DNA binding domain and without a complete oligomerization domain do not bind to DNA with high affinity and do not show transcriptional activity, and that this is independent of the promoter. There might be effects of expressing certain isoforms in some cells, but these most likely arise from inducing a stress response via expression of chaperones etc. High affinity sequence specific DNA binding does not play a role here (see results in Figure 2) and we have therefore not conducted these suggested experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Although the reviewers found our work interesting, they raised several important concerns about our study. To address these concerns, we mostly performed new experiments. The most important changes are highlighted in the summary paragraphs.

      First, in response to Reviewer 1’s suggestions, we have conducted the SFN experiments systematically. We further confirmed the mechanism of SFN-activated TFEB in HeLa NPC1 cells with new experiments, including the effects of BAPTA-AM (a calcium chelator), FK506+CsA (calcineurin inhibitors) and NAC (a ROS scavenger) on SFN-induced TFEB nuclear translocation in HeLa NPC1 cells (New Fig. S3) and the effect of SFN on NPC1 expression (New Fig. S5). In particular, we examined the colocalization of DiO (a PM marker) staining and surface LAMP1 staining in HeLa NPC1 cells under SFN treatment to confirm PM exocytosis. In the main text and figure legends, we thoroughly checked the accuracy of the wording. Hence, we have significantly improved the presentation and clarity in the revision.

      Second, in response to Reviewer 2’s suggestions, we have performed additional experiments using TFEB-KO cells to demonstrate the role of TFEB in SFN-evoked lysosomal exocytosis (New Fig. S7B). In TFEB-KO cells, the increase in surface LAMP1 signal upon SFN treatment was significantly reduced, suggesting that SFN induces exocytosis in a TFEB-dependent manner. We also investigated the effect of U18666A on CF555-dextran endocytosis. By examining the localization of CF-dex and LAMP1, we found that CF555 is present in the lysosome under U18666A treatment (Fig. for reviewers only, A, B), suggesting that NPC1 deficiency/U18666A treatment has no effect on CF-dex endocytosis.

      Third, in response to Reviewer 3’s suggestions, we have performed experiments beyond those addressing the other reviewers’ suggestions, i.e., we tested the cytotoxicity of the SFN concentration used in this study in various cell lines (New Fig. S10).

      In addition, according to the reviewers’ suggestions, we made clarifications and corrections wherever appropriate in the manuscript.

      Reviewer #1 (Public review):

      Summary:

      The authors are trying to determine if SFN treatment results in dephosphorylation of TFEB, subsequent activation of autophagy-related genes, exocytosis of lysosomes, and reduction in lysosomal cholesterol levels in models of NPC disease.

      Strengths:

      (1) Clear evidence that SFN results in translocation of TFEB to the nucleus.

      (2) In vivo data demonstrating that SFN can rescue Purkinje neuron number and weight in NPC1<sup>-/-</sup> animals.

      Thank you for the support!

      Weaknesses:

      (1) Lack of molecular details regarding how SFN results in dephosphorylation of TFEB leading to activation of the aforementioned pathways. Currently, datasets represent correlations.

      Thank you for raising this critical point! The reviewer is right that this manuscript did not say much about the molecular mechanism of SFN-evoked TFEB activation, because we explored that mechanism in our previous study (Li, Shao et al. 2021). There we showed that SFN activates TFEB via a ROS-Ca<sup>2+</sup>-calcineurin-dependent but mTOR-independent pathway. In the current manuscript we cited this paper but did not describe the mechanism in detail, which understandably confused the reviewers. Therefore, in the revised manuscript we added more details on the molecular mechanism of SFN-activated TFEB. We also further confirmed this mechanism in HeLa NPC1 cells with new experiments, including the effects of BAPTA-AM (a calcium chelator), FK506+CsA (calcineurin inhibitors) and NAC (a ROS scavenger) on SFN-induced TFEB nuclear translocation in NPC cells (New Fig. S3).

      (2) Based on the manuscript narrative, discussion, and data it is unclear exactly how steady-state cholesterol would change in models of NPC disease following SFN treatment. Yes, there is good evidence that lysosomal flux to (and presumably across) the plasma membrane increases with SFN. However, lysosomal biogenesis genes also seem to be increasing. Given that NPC inhibition, NPC1 knockout, or NPC1 disease mutations are constitutively present and the cell models of NPC disease contain lysosomes (even with SFN) how could a simple increase in lysosomal flux decrease cholesterol levels? It would seem important to quantify the number of lysosomes per cell in each condition to begin to disentangle differences in steady state number of lysosomes, number of new lysosomes, and number of lysosomes being exocytosed.

      Thank you for this constructive comment. According to our data, SFN reduced cholesterol levels in NPC1 cells by inducing lysosomal exocytosis and increasing lysosomal biogenesis. We understand the reviewer’s point that it would be really helpful to distinguish the three states: the original number of lysosomes, the number of new lysosomes, and the number of lysosomes being exocytosed. Unfortunately, owing to technical limitations, there is so far no appropriate method that can clearly determine which state a given lysosome comes from. We hope that future techniques will allow us to explore this mechanism.

      (3) Lack of evidence supporting the authors' premise that "SFN could be a good therapeutic candidate for neuropathology in NPC disease".

      Suggestion was taken! We removed this sentence. Thanks!

      Reviewer #2 (Public review):

      (4) The in vivo experiments demonstrate the therapeutic potential of SFN for NPC. A clear dose response analysis would further strengthen the proposed therapeutic mechanism of SFN.

      Thank you for this constructive suggestion. We examined the effect of two doses of SFN, 30 and 50 mg/kg, on NPC mice. As shown in Fig. 6, SFN at 50 mg/kg, but not at 30 mg/kg, prevents a degree of Purkinje cell loss in lobule IV/V of the cerebellum, suggesting a dose-correlated preventive effect of SFN. In future studies, we will continue optimizing the dosage form and amount of SFN and perform a dose-response analysis.

      (5) Additional data supporting the activation of TFEB by SFN for cholesterol clearance in vivo would strengthen the overall impact of the study.

      We thank the reviewer for this constructive comment. We detected a significant decrease of pS211-TFEB protein in brain tissues of NPC mice upon SFN treatment compared to vehicle, suggesting, for the first time, that SFN activates TFEB in brain tissue. It would be worthwhile to further examine lysosomal cholesterol levels in brain tissues to show the direct effect of SFN. However, in our hands and in the literature, filipin does not seem suitable for detecting lysosomal cholesterol accumulation in brain tissue. So far there is no good method to directly measure lysosomal cholesterol in tissue.

      (6) In Figure 4, the authors demonstrate increased lysosomal exocytosis and biogenesis by SFN in NPC cells. Including a TFEB-KO/KD in this assay would provide additional validation of whether these effects are TFEB-dependent.

      Great suggestion! We investigated the role of TFEB in SFN-evoked lysosomal exocytosis by using TFEB-KO cells. As shown in New Suppl. Fig. 7B, in TFEB-KO cells the increase in surface LAMP1 signal upon SFN (15 μM, 12 h) treatment was significantly reduced, suggesting that SFN induces exocytosis in a TFEB-dependent manner.

      (7) For lysosomal pH measurement, the combination of pHrodo-dex and CF-dex enables ratiometric pH measurement. However, the pKa of pHrodo red-dex (according to Invitrogen) is ~6.8, while lysosomal pH is typically around 4.7. This discrepancy may account for the lack of observed lysosomal pH changes between WT and U18666A-treated cells. Notably, previous studies (PMID: 28742019) have reported an increase in lysosomal pH in U18666A-treated cells.

      We understand the reviewer’s point. However, as stated in the methods and main text, we used pHrodo™ Green-Dextran (P35368, Invitrogen) rather than pHrodo Red-dextran. According to the product information from Invitrogen, pHrodo Green-dex conjugates are non-fluorescent at neutral pH but fluoresce bright green at an acidic pH of around 4, such as that in endosomes and lysosomes. Therefore, pHrodo Green-dex is suitable for monitoring the acidity of lysosomes (Hu, Li et al. 2022). We also used LysoTracker Red DND-99 (Thermo Scientific, L7528) to measure lysosomal pH (Fig. 4G, H), and the results are consistent with those from the pHrodo Green/CF measurement.

      The reviewer mentioned that previous studies have reported an increase in lysosomal pH in U18666A-treated cells. We understand this concern. However, in our hands, with two lysosomal pH sensors, we have not detected a change in lysosomal pH in U18666A-treated NPC1 cell models.

      (8) The authors are also encouraged to perform colocalization studies between CF-dex and a lysosomal marker, as some researchers may be concerned that NPC1 deficiency could reduce or block the trafficking of dextran along endocytosis.

      Thank you for raising this important point; the suggestion was taken! We investigated the effect of NPC1 deficiency on CF555-dextran trafficking into lysosomes by examining the localization of CF-dex and LAMP1. To clearly define whether CF555-dex is present in the lysosome, we first used apilimod to enlarge lysosomes and then examined the relative position of CF555-dex and LAMP1. As shown in Author response image 1A, B, in HeLa cells treated with U18666A, CF555 signals (red) are clearly present inside lysosomes (LAMP1-labelled lysosomal membrane, green signal), suggesting that CF555-dex endocytosis is not affected by NPC1 deficiency (U18666A treatment).

      Author response image 1.

      The effect of NPC1 deficiency on CF555 endocytosis. HeLa cells were transiently transfected with LAMP1-GFP plasmid for 24 h. Cells were then treated with apilimod (100 nM) for 2 h to enlarge the lysosomes, followed by co-treatment with U18666A (2.5 μM, 24 h) and CF555 (12 h). (A) Each panel shows fluorescence images taken by confocal microscopy. (B) Each panel shows the fluorescence intensity of a line scan (white line) through the double-labeled object indicated by the white arrow. Scale bar, 20 μm or 2 μm (for zoom-in images).

      (9) In vivo data supporting the activation of TFEB by SFN for cholesterol clearance would significantly enhance the impact of the study. For example, measuring whole-animal or brain cholesterol levels would provide stronger evidence of SFN's therapeutic potential.

      We really appreciate the reviewer’s comments. Please see response to point #5.

      Reviewer #3 (Public review):

      (10) The manuscript is extremely hard to read due to the writing; it needs careful editing for grammar and English.

      We apologize for the defects in the writing and grammar. We have thoroughly checked the grammar and polished the English to improve the manuscript.

      (11) There are a number of important technical issues that need to be addressed.

      We will address the technical issues mentioned in the following questions.

      (12) The TFEB influence on filipin staining in Figure 1A is somewhat subtle. In the mCherry alone panels there is a transfected cell with no filipin staining and the mCherry-TFEBS211A cells still show some filipin staining.

      Thank you for raising this point. The reviewer is right that not all mCherry-alone cells show the same level of filipin signal and that not all mCherry-TFEBS211A-transfected cells completely lack filipin signal. The statistical results were from randomly selected cells from 3 independent experiments. To avoid confusion, we have included more cells in the statistical analysis to cover all the conditions, as shown in the new Fig. 1B. We hope this clarifies the matter.

      (13) Figure 1C is impressive for the upregulation of filipin with U18666A treatment. However, SFN is used at 15 microM. This must be hitting multiple pathways. Vauzour et al (PMID: 20166144) use SFN at 10 nM to 1microM. Other manuscripts use it in the low microM range. The authors should repeat at least some key experiments using SFN at a range of concentrations from perhaps 100 nM to 5 microM. The use of 15 microM throughout is an overall concern.

      Our choice of SFN concentration is based on our previous study (Li, Shao et al. 2021), in which we showed that SFN (10–15 μM, 2–9 h) induces robust TFEB nuclear translocation in a dose- and time-dependent manner in HeLa cells as well as in other human cell lines without cytotoxicity. In addition, tissue concentrations of SFN can reach 3–30 μM upon broccoli consumption (Hu, Khor et al. 2006), so we used a low micromolar concentration of SFN (15 μM) in our study. Moreover, we further confirmed that SFN (15 μM) induces TFEB nuclear translocation in HeLa NPC1 cells (Fig. 1F, G; Fig. 2B, G) and that this concentration of SFN is not cytotoxic (New Fig. S10).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The following comments are designed to improve and focus the authors' work.

      (14) Related to data in Figure 1. The mechanism through which TFEB can reduce Filipin in U18 conditions is unclear. Inhibition of NPC1 results in hyperactivation of mTOR through cholesterol transport at ER-Lysosome contacts (see Zoncu group publications). If mTORC is hyperactive in NPC disease models, TFEB would be expected to remain cytoplasmic and not enter the nucleus as the representative image in Figure 1A demonstrates.

      In our previous study (Li, Shao et al. 2021), we showed that SFN induces TFEB nuclear translocation in an mTOR-independent manner. Consistent with this result, in this study we confirmed that SFN-induced TFEB nuclear translocation is mTOR-independent in NPC1 cells (new Fig. S4A, B). Thus, SFN induced TFEB nuclear translocation in various NPC cells (Fig. 1F, G, Fig. 2B, G). Please also see the discussion of the mechanism of SFN in response to point #1.

      (15) Therefore, how does overexpression of TFEB, which remains in the cytoplasm, result in a decreased filipin signal? Similar questions relate to Figure 1C-H.

      Medina et al. (Medina, Fraldi et al. 2011) show that TFEB overexpression (not activation, so the overexpressed TFEB is in the cytoplasm) increases the pool of lysosomes in the proximity of the plasma membrane and promotes their fusion with the PM by raising intracellular Ca<sup>2+</sup> levels through the lysosomal Ca<sup>2+</sup> channel MCOLN1, leading to increased lysosomal exocytosis. Hence, TFEB overexpression alone (without TFEB activation) could reduce the filipin signal by increasing lysosomal exocytosis. Treatment with a TFEB agonist such as SFN could further boost this increase.

      (16) It would seem appropriate to measure the NPC1 and NPC2 proteins using western blot to ensure that SFN-dependent clearance of cholesterol is not due to enhanced expression of the native protein in U18-treated cells or enhanced folding of the protein in patient fibroblasts.

      Thank you for this constructive comment! NPC1 mutations account for about 95% of NPC cases and NPC2 mutations for about 5%, and in this study we focused on NPC1 deficiency. We therefore measured the effect of SFN on the expression of NPC1 in human NPC1-patient fibroblasts. Western blot analysis showed that SFN (15 μM, 24 h) treatment did not affect NPC1 expression in human NPC1-patient fibroblasts (new Fig. S5).

      (17) Related to data in Figures 1C-E. Controls are missing related to the effect SFN has on steady-state cholesterol levels. This may be insightful in providing information on the mode of action of this compound.

      Suggestion taken! We have added the SFN-only control in the new Fig. 1C-E.

      (18) The mechanism that links SFN to TFEB-dependent translocation is suggested to involve calcineur independent dephosphorylation of TFEB. However, no data is provided. It would seem important to identify the mechanism(s) through which SFN positively regulates TFEB location. This would shift the manuscript and its model from correlations to causation. Experiments involving calcineurin inhibitors, or agonists of TRPML1 that have been reported as being a key source of Ca<sup>2+</sup> for calcineurin activation, may provide molecular insight.

      Please see the paragraph in response to point #1.

      (19) Related to Figure 4. Using a plasma membrane counterstain to quantify plasma membrane LAMP1 would increase the rigor of the analysis.

      Great idea! We examined the colocalization of DiO (a PM marker) staining and LAMP1 staining in HeLa NPC1 cells under SFN treatment. As shown in the new Fig. 4A, the surface LAMP1 signal (red) colocalized with DiO (green).

      (20) Related to Figure 5. How do the authors explain the kinetic disparity between SFN treatment for 24 vs 72 hrs? If TFEB is activated and promoting lysosomal biogenesis and increased lysosomal flux across the PM, why does cholesterol accumulation lag? Perhaps related to this point. Are other cholesterol metabolizing enzymes that may have altered activity in NPC sensitive to SFN? A similar comment applies to the Sterol regulatory element binding protein pathway, which has been shown to be activated in models of NPC disease.

      We understand the reviewer’s point. As shown in Fig. 5C, D, in NPC1<sup>-/-</sup> MEF cells, SFN treatment for 24 h produced relatively weaker cholesterol clearance than in human cells (Fig. 1C, D, Fig. 2E, I). We therefore explored a longer SFN treatment of 72 h (fresh SFN in medium was added every 24 h), which produced a substantial cholesterol reduction (Fig. 5C, D). This difference could be attributed to the continuous action of SFN, which could prolong exocytosis and lead to more effective cholesterol clearance. In the DMSO-treated MEF cells, cholesterol levels are similar at 24 and 72 h, indicating that 24 h of U18666A treatment has already reached the upper limit of accumulated cholesterol and that a longer treatment time would not change the cholesterol levels. Thus, cholesterol accumulation does not lag.

      We did not investigate whether SFN regulates other cholesterol metabolizing enzymes or sterol regulatory element binding proteins, and we cannot rule out this possibility. In this study we mainly focus on cholesterol clearance by SFN via TFEB-mediated pathways. In our data, TFEB KO significantly diminished SFN-evoked cholesterol clearance. Hence, the contribution of other cholesterol metabolizing enzymes or sterol regulatory element binding proteins may not be as important as that of TFEB, and is thus outside the scope of this study. In the future, we may explore the involvement of other possible pathways in SFN’s effects.

      (21) Related to Figure 7. The western blots for pS211-TFEB are poor. It's suggested that whole blots are shown to increase rigor.

      Thank you for the comments. We now present the blots with more of the surrounding area shown to increase rigor.

      (22) Data demonstrating the ability of SFN to improve Purkinje cell survival are exciting and pair well with the weight analysis, however, to address the overall goal of determining if "SFN could be a good therapeutic candidate for neuropathology in NPC disease" survival analysis should be tested as well.

      Please see the paragraph in response to point #3.

      Minor

      (23) Throughout the manuscript many different Fonts and font sizes are used. This is very jarring to readers. It is suggested that a more uniform approach is taken to presenting these nice datasets.

      We apologize for these oversights. We have checked the entire manuscript to ensure that fonts and font sizes are consistent.

      (24) Related to data presentation. In general, there is a lack of alignment and organization of the figures.

      We apologize for this. We have reorganized the figures so that they are better aligned.

      (25) Line 149, SFN is missing.

      Corrected!

      Reviewer #3 (Recommendations for the authors):

      (26) In Figure 3 the authors should use multiple single siRNAs or perform a functional rescue to determine specificity.

      We understand the reviewer’s point. We designed several siRNAs and validated their efficiency. We then chose the siRNA with the best knockdown efficiency for this study, and the specificity of the siTFEB was validated by Western blot, as shown in Fig. 3A. Furthermore, we used TFEB knockout cells constructed by CRISPR/Cas9 to further examine the role of TFEB in SFN-induced cholesterol clearance (Fig. 3D). Consistent with the results in the siTFEB-transfected HeLa NPC1 cells (Fig. 3B, C), SFN failed to diminish cholesterol in HeLa TFEB KO cells. The result from TFEB KO cells is even more convincing than the siRNA experiment. We also performed a functional rescue by re-expressing TFEB in TFEB KO cells, in which SFN-induced cholesterol clearance was restored (Fig. 3E, F). Collectively, these data indicate that TFEB is required for lysosomal cholesterol reduction upon SFN treatment. We therefore did not repeat this rescue experiment in the siTFEB-transfected HeLa NPC1 cells.

      (27) The label for 3D is missing.

      Corrected! Thanks!

      (28) Figure 4, although the authors use an antibody against the luminal domain of LAMP1 there could still be some permeabilization. A marker of the plasma membrane would be helpful.

      Please see the response to point #19.

      (29) Figure 4, cholesterol in the media because of lysosome exocytosis. This is where the high concentration of SFN is of concern. Is there any cell death that could explain the result? The authors should test for cell death with the SFN treatment.

      Thank you for raising this important point! We have measured the cytotoxicity of SFN at the concentrations used in this study in various cell lines (new Fig. S10). Please also see the paragraph in response to point #13.

      (30) The blot in Figure 6A is unclear. It is very hard to see any change in pS211-TFEB levels, and, given the blurry signal, the detection of phospho-TFEB is uncertain.

      Please see the summary paragraph in response to point #21.

      References:

      Hu, M. Q., P. Li, C. Wang, X. H. Feng, Q. Geng, W. Chen, M. Marthi, W. L. Zhang, C. L. Gao, W. Reid, J. Swanson, W. L. Du, R. Hume and H. X. Xu (2022). "Parkinson's disease-risk protein TMEM175 is a proton-activated proton channel in lysosomes." Cell 185(13): 2292-+.

      Hu, R., T. O. Khor, G. Shen, W. S. Jeong, V. Hebbar, C. Chen, C. Xu, B. Reddy, K. Chada and A. N. Kong (2006). "Cancer chemoprevention of intestinal polyposis in ApcMin/+ mice by sulforaphane, a natural product derived from cruciferous vegetable." Carcinogenesis 27(10): 2038-2046.

      Li, D., R. Shao, N. Wang, N. Zhou, K. Du, J. Shi, Y. Wang, Z. Zhao, X. Ye, X. Zhang and H. Xu (2021). "Sulforaphane activates a lysosome-dependent transcriptional program to mitigate oxidative stress." Autophagy 17(4): 872-887.

      Medina, D. L., A. Fraldi, V. Bouche, F. Annunziata, G. Mansueto, C. Spampanato, C. Puri, A. Pignata, J. A. Martina, M. Sardiello, M. Palmieri, R. Polishchuk, R. Puertollano and A. Ballabio (2011). "Transcriptional activation of lysosomal exocytosis promotes cellular clearance." Dev Cell 21(3): 421-430.

    1. Author response:

      The following is the authors’ response to the original reviews

      We would like to thank you and the reviewers for valuable feedback on the first version of the manuscript. We have now addressed all of the issues raised by the reviewers, mostly by implementing the suggested changes and clarifying important details in the revised version of the manuscript. A detailed response to each comment is provided in the rebuttal letter. Briefly, the main changes were as follows:

      - We changed "homeostatic balance" to "network balance", especially when describing the main finding, as the response changes induced by the stimulation occurred on a fast timescale. We speculate that the sustained changes observed in the post-stimulation condition are the result of homeostatic mechanisms.

      - We added further verification of the target stimulation effect in a supplementary result showing its effect at the target and off-target z-planes, as well as demonstrating the minimal impact of the imaging laser on rsChRmine.

      - We added a simple toy model illustrating suppression specifically applied to co-tuned cells that yields the response amplitude decrease, to further support our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations.

      Strengths:

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population.

      Weaknesses:

      I have several main concerns about the interpretation of these data:

      (1) The premise of the paper suggests that sensory responses are noisy at the level of neurons, but that population activity is reliable and that different neurons may participate in sensory coding on different trials. However, no analysis related to single trial variance or overall stability of population coding is provided. Specifically, showing that population activity is stable across trials in terms of total activity level or in some latent low dimensional representation would be required to support the concept of "homeostatic balancing".

      Thank you for raising an important point. We agree that ‘homeostatic balancing’ may not be the best term to explain the main results. We have now toned down the homeostatic plasticity aspect in explaining the main result and changed the term to a simple ‘network balance’, potentially due to various factors including rapid synaptic plasticity. We speculate that the persistent change in activity of co-tuned cells in the post-stimulation session, rather than a rapid return of their responses to baseline, is a result of homeostatic balance. Relevant changes are implemented throughout the manuscript, including the Introduction (e.g., lines 76-78) and Discussion sections (e.g., lines 453-456).

      (2) Rebalancing would predict either that the responses of stimulated neurons would remain A) elevated after stimulation due to a hebbian mechanism or B) suppressed due to high activity levels on previous trials, a homeostatic mechanism. The authors report suppression in targeted neurons after stimulation blocks, but this appears similar to all other non-stimulated neurons. How do the authors interpret the post-stimulation effect in stimulated neurons?

      It is true that no post-stimulation response change was observed in either co-tuned or non-co-tuned neurons, in both stimulation and control sessions. This could be because neuronal activity had already adapted and decreased sufficiently owing to the consecutive presentation of the acoustic stimuli themselves. However, we still think that if the response decrease in co-tuned non-stimulated neurons were driven purely by stimulation, without homeostasis, their responses should at least have bounced back during the post-stimulation session. We agree that further investigation would be required to confirm such an effect. We elaborated on this as another discussion point in the Discussion section (lines 457-464).

      (3) The authors suggest that ACtx is different from visual cortex in that neurons with different tuning properties are intermingled. While that is true at the level of individual neurons, there is global order, as demonstrated by the authors' own widefield imaging data and others at the single cell level (e.g. Tischbirek et al. 2019). Generally, distance is dismissed as a variable in the paper, but this is not convincing. Work across multiple sensory systems, including the authors' own work, has demonstrated that cortical neuron connectivity is not random but varies as a function of distance (e.g. Watkins et al. 2014). Better justification is needed for the spatial pattern of neurons that were chosen for stimulation. Further, analyses that account for center of mass of stimulation, rather than just the distance from any stimulated neuron would be important to any negative result related to distance.

      Thank you for the further suggestion regarding distance. While Watkins et al. 2014 and Levy and Reyes (2012) showed stronger connectivity for nearby cells as well as for more distant patches, on a functional level Winkowski & Kanold 2013 showed high frequency heterogeneity, especially in L2/3, which is the layer we imaged in this study. Thus, connected cells can have varied tuning, consistent with spine imaging (Konnerth paper). As an additional verification, we now also calculated the distance effect based on the center of mass of the target cells and still observed no distance-related stimulation effect. We have replaced Figure 4B with the result from the center-of-mass calculation.

      (4) Data curation and presentation: Broadly, the way the data were curated and plotted makes it difficult to determine how well-supported the authors' claims are. In terms of curation, the removal of outliers 3 standard deviations above the mean in the analysis of stimulation effects is questionable. Given the single-cell stimulation data presented in Figure 1, the reader is led to believe that holographic stimulation is quite specific. However, the justification for removing these outliers is that there may be direct stimulation 20-30 um from the target. Without plotting and considering the outliers as well, it is difficult to understand if these outsized responses are due to strong synaptic connections with neighboring neurons or rather just direct off-target stimulation. Relatedly, data presentation is limited to the mean + SEM for almost all main effects and pre-post stimulation effects are only compared indirectly. Whether stimulation effects are driven by just a few neurons that are particularly suppressed or distinct populations which are suppressed or enhanced remains unclear.

      Thank you for pointing this out. We now specifically removed neighboring cells that are < 20 um from the target point, and we observed similar results. We replaced all the relevant figures, text, and statistical results to ensure that the exclusion was specific to overlapping neighboring cells.

      Reviewer #2 (Public review):

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role.

      The conclusions of this paper are very interesting but some aspects of the study including methods for optogenetic stimulation, statistical analysis of the results and interpretation of the underlying mechanisms need to be clarified and extended.

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such an approach is complex and requires precise controls to be convincing. In the manuscript, several methodological aspects are not sufficiently described to allow a proper understanding.

      (i) The use of CRmine together with GCaMP8s has been reported as problematic as the 2Ph excitation of GCaMP8s also excites the opsin. Here, the authors use a red-shifted version of CRmine to prevent such cross excitation by the imaging laser. To be convincing, they should explain how they controlled for the absence of rsCRmine activation by the 940nm light. Showing the fluorescence traces immediately after the onset of the imaging session would ensure that neurons are not excited as they are imaged.

      Thank you for pointing this out. We realized that an important reference was omitted. Kishi et al. 2022 validated the efficacy of rsChRmine compared to ChRmine. In that paper, they compared the activity of regular ChRmine and rsChRmine at different wavelengths and settings and showed the efficiency of rsChRmine with reduced optical crosstalk. This reference is now included in the manuscript (line 98). We also checked the spontaneous baseline activity during the approximately 10 s before any sound presentation and observed relatively stable activity throughout, rather than any activation related to imaging session onset. This is also similar to what we see in another group of GCaMP6s transgenic animals.

      Author response image 1.

      Baseline fluorescence activity across cells within FOVs from AAV9-hSyn-GCaMP8s-T2A-rsChRmine injected mice (top) and CBA X Thy1-GCaMP6s F1 transgenic mice (bottom). Fluorescence levels and activity patterns remain similar, suggesting no evident imaging laser-induced activation of rsChRmine. Note that the GCaMP8s examples are smoothed with a 4-point moving average, as GCaMP8s shows faster activity.
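
      The smoothing mentioned in the legend can be sketched as follows. This is only a minimal illustration of a 4-point moving average. The function name `moving_average`, the choice to return one value per complete window, and the example values are assumptions made for illustration, not details taken from the authors' analysis code.

```python
def moving_average(trace, window=4):
    """Return the mean of each run of `window` consecutive samples.

    Assumption for illustration: only complete windows are averaged, so the
    output has len(trace) - window + 1 points (edges are not padded).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if len(trace) < window:
        return []
    running = sum(trace[:window])          # sum of the first full window
    smoothed = [running / window]
    for i in range(window, len(trace)):
        # Slide the window by one sample: add the new value, drop the oldest.
        running += trace[i] - trace[i - window]
        smoothed.append(running / window)
    return smoothed


if __name__ == "__main__":
    # Hypothetical fluorescence samples, not real data.
    print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```

      How the edges of a trace are handled (padding, or a centered versus trailing window) changes the first and last few points, so that choice would need to match the actual analysis.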

      (ii) Holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such an effect, one could in principle decouple the imaging and the excitation planes, and check for the absence of out-of-focus unwanted excitation.

      We further verified whether the laser power at the targeted z-plane influences cells’ activity at nearby z-planes. As the Reviewer pointed out, the previous x- and y-axis shifts were tested with single-cell stimulation. This time, we stimulated five cells simultaneously to match the actual experimental setup and to assess potential artifacts in other planes. We observed no stimulation-driven activity increase in cells at a z-plane shifted by 20 µm (Supplementary Figure 1). This confirms that the holographic stimulation accurately manipulates the pre-selected target cells and that the effects we observe are unlikely to be due to out-of-focus stimulation artifacts. It is true that not all pre-selected cells that showed significant response changes prior to the main experiment were effectively activated at every trial during the experiments. We varied the target cell distances across FOVs, from nearby cells to those farther apart within the FOV, and did not observe a significant relationship between target cell distance and stimulation effect. Lastly, cells within 20 µm of the target were excluded to prevent potential excitation due to the holographic stimulation power. Given the spontaneous movements of the FOV during imaging sessions caused by the animal’s movement, despite our efforts to minimize them, we believe that any excitation of these neighboring neurons would come directly from the stimulation rather than from a light pattern artifact itself.

      (iii) The control shown in Figure 1B is intended to demonstrate the precision of the optogenetic stimulation: when the stimulation spiral is played at a distance larger or equal to 20 µm from a cell, it does not activate it. However, in the rest of the study, the stimulation is applied with a holographic approach, targeting 5 cells simultaneously instead of just one. As the holographic pattern of light could produce out-of-focus hot spots (absent in the single cell control), we don't know what is the extent of the contamination from non-targeted cells in this case. This is important because it would determine an objective criterion to exclude non-targeted but excited cells (last paragraph of the Result section: "For the stimulation condition, we excluded non-target cells that were within 15 µm distance of the target cells...")

      Neurons that are highly sensitive to a certain frequency also show the greatest adaptation effect, which can be observed in the control condition. Therefore, the greater amplitude change in highly sensitive neurons is primarily related to neuronal adaptation to their preferred stimulus. However, when co-tuned target neurons are stimulated, other co-tuned non-target neurons show a greater amplitude decrease than under either stimulation of non-co-tuned target neurons or control (the latter comparison did not reach significance).

      We also applied a more rigorous exclusion criterion of 20 µm instead of 15 µm, as you pointed out, since the spiral size was 20 µm. This yielded a further significant stimulation-driven decrease in response amplitude, only in co-tuned non-target neurons and only for their preferred frequency.

      (2) A strength of this study comes from the design of the experimental protocol used to compare the activity in non-target co-tuned cells when the optogenetic stimulation is paired with their preferred tone versus a non-preferred pure tone. The difficulty lies in the co-occurrence of the rebalancing process and the adaptation to repeated auditory stimuli, especially when these auditory stimuli correspond to a cell's preferred pure tones. To distinguish between the two effects, the authors use a comparison with a control condition similar to the optogenetic stimulation conditions, except that the laser power is kept at 0 mW. The observed effect is shown as an extra reduction of activity in the condition with the optogenetic paired with the preferred tone, compared to the control condition. The specificity of this extra reduction when stimulation is synchronized with the preferred tone, but not with a non-preferred tone, is a potentially powerful result, as it points to an underlying mechanism that links the assemblies of cells that share the same preferred pure tones.

      The evidence for this specificity is shown in Figure 3A and 3D. However, the universality of this specificity is challenged by the fact that it is observed for 16kHz preferring cells, but not so clearly for 54kHz preferring cells: these 54kHz preferring cells also significantly (p = 0.044) reduce their response to 54kHz in the optogenetic stimulation condition applied to 16kHz preferring target cells compared to the control condition. The proposed explanation for this is the presence of many cells with a broad frequency tuning, meaning that these cells could have been categorized as 54kHz preferring cells, while they also responded significantly to a 16kHz pure tone. To account for this, the authors divide each category of pure tone cells into three subgroups with low, medium and high frequency preferences. Following the previous reasoning, one would expect at least the "high" subgroups to show a strong and significant specificity for an additional reduction only if the optogenetic stimulation is targeted to a group of cells with the same preferred frequency. Figure 3D fails to show this. The extra reduction for the "high" subgroups is significant only when the condition of opto-stimulation synchronized with the preferred frequency is compared to the control condition, but not when it is compared to the condition of opto-stimulation synchronized with the non-preferred frequency.

      Therefore, the claim that "these results indicate that the effect of holographic optogenetic stimulation depends not on the specific tuning of cells, but on the co-tuning between stimulated and non-stimulated neurons" (end of paragraph "Optogenetic holographic stimulation decreases activity in non-target co-tuned ensembles") seems somewhat exaggerated. Perhaps increasing the number of sessions in the 54kHz target cell optogenetic stimulation condition (12 FOV) to the number of sessions in the 16kHz target cell optogenetic stimulation condition (18 FOV) could help to reach significance levels consistent with this claim.

      We previously tested this by randomly subselecting 12 FOVs from the 16 kHz stimulation condition to match the number of FOVs between the two groups, and we did not see any difference in the results. To further verify the results, we have now added three more datasets for the 54 kHz target cell stimulation condition (now 15 FOVs), which yielded a similar outcome. We have updated the statistical values to include the added datasets.

      (3) To interpret the results of this study, the authors suggest that mechanisms based on homeostatic signaling could be important to allow the rebalancing of the activity of assemblies of co-tuned neurons. In particular, the authors try to rule out the possibility that inhibition plays a central role. Both mechanisms could produce effects on short timescales, making them potential candidates. The authors quantify the spatial distribution of the balanced non-targeted cells and show that they are not localized in the vicinity of the targeted cells. They conclude that local inhibition is unlikely to be responsible for the observed effect. This argument raises some questions. The method used to quantify spatial distribution calculates the minimum distance of a non-target cell to any target cell. If local inhibition is activated by the closest target cell, one would expect the decrease in activity to be stronger for non-target cells with a small minimum distance and to fade away for larger minimum distances. This is not what the authors observe (Figure 4B), so they reject inhibition as a plausible explanation. However, their quantification doesn't exclude the possibility that non-target cells in the minimum distance range could also be close and connected to the other 4 target cells, thus masking any inhibitory effect mediated by the closest target cell. In addition, the authors should provide a quantitative estimate of the range of local inhibition in layers 2/3 of the mouse auditory cortex to compare with the range of distances examined in this study (< 300 µm). Finally, the possibility that some target cells could be inhibitory cells themselves is considered unlikely by the authors, given the proportions of excitatory and inhibitory neurons in the upper cortical layers. On the other hand, it should be acknowledged that inhibitory cells are more electrically compact, making them easier to be activated optogenetically with low laser power.

      Minimum distance is defined as the smallest distance from a non-target cell to any of the target cells. Thus, if local inhibition were responsible, the closest target cell would likely have affected the non-target cells’ response changes. Based on both Reviewers’ comments, we also calculated the distance effect using the center of mass of the target cells as an additional verification, and we still observed no distance-related stimulation effect. The result is now updated in Figure 4B.
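
      The two distance measures discussed here, together with the 20 µm exclusion criterion, can be illustrated with a short sketch. It assumes cell positions are given as (x, y) coordinates in µm and that the center of mass is the unweighted mean of the target positions. The function names and example coordinates are hypothetical and are not taken from the authors' analysis code.

```python
import math


def min_distance_to_targets(cell, targets):
    """Smallest Euclidean distance from one non-target cell to any target cell."""
    return min(math.dist(cell, target) for target in targets)


def center_of_mass(targets):
    """Unweighted mean position of the target cells (an assumption here)."""
    n = len(targets)
    return tuple(sum(coords) / n for coords in zip(*targets))


def distance_to_center_of_mass(cell, targets):
    """Euclidean distance from one non-target cell to the targets' center of mass."""
    return math.dist(cell, center_of_mass(targets))


def keep_non_targets(cells, targets, exclusion_um=20.0):
    """Drop non-target cells closer than `exclusion_um` to any target cell."""
    return [c for c in cells if min_distance_to_targets(c, targets) >= exclusion_um]


if __name__ == "__main__":
    targets = [(0.0, 0.0), (100.0, 0.0)]   # hypothetical target positions (µm)
    cells = [(0.0, 10.0), (0.0, 30.0)]     # hypothetical non-target positions (µm)
    for cell in keep_non_targets(cells, targets):
        print(cell,
              min_distance_to_targets(cell, targets),
              distance_to_center_of_mass(cell, targets))
```

      The two measures can disagree for the same cell: a cell next to one target but far from the other four has a small minimum distance and a large center-of-mass distance, which is why reporting both addresses the Reviewers' concern.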

      Based on previous literature, such as Levy & Reyes 2012, excitatory and inhibitory connectivity is known to extend over a range of about 100 µm. Our results do not show any additional effect for cells at distances below 100 µm, which suggests that the effect is not limited to local inhibition. We also added further discussion of why our results are less likely to be due to increased inhibition, notwithstanding the characteristics of inhibitory neurons that make them easier to activate optogenetically.

      Reviewer #3 (Public review):

      Summary:

      The authors optogenetically stimulate 5 neurons all preferring the same pure tone frequency (16 or 54 kHz) in the mouse auditory cortex using a holography-based single cell resolution optogenetics during sound presentation. They demonstrate that the response boosting of target neurons leads to a broad suppression of surrounding neurons, which is significantly more pronounced in neurons that have the same pure tone tuning as the target neurons. This effect is immediate and spans several hundred micrometers. This suggests that the auditory cortical network balances its activity in response to excess spikes, a phenomenon already seen in visual cortex.

      Strengths:

      The study is based on a technologically very solid approach based on single-cell resolution two-photon optogenetics. The authors demonstrate the potency and resolution of this approach. The inhibitory effects observed upon targeted stimulation are clear and the relative specificity to co-tuned neurons is statistically clear although the effect size is moderate.

      Weaknesses:

      The evaluation of the results is brief and some aspects of the observed homeostatic are not quantified. For example, it is unclear whether stimulation produces a net increase or decrease of population activity, or if the homeostatic phenomenon fully balances activity. A comparison of population activity for all imaged neurons with and without stimulation would be instructive. The selectivity for co-tuned neurons is significant but weak. Although it is difficult to evaluate this issue, this result may be trivial, as co-tuned neurons fire more strongly. Therefore, the net activity decrease is expected to be larger, in particular, for the number of non-co-tuned neurons which actually do not fire to the target sound. The net effect for the latter neurons will be zero just because they do not respond. The authors do not make a very strong case for a specific inhibition model in comparison to a broad and non-specific inhibitory effect. Complementary modeling work would be needed to fully establish this point.

      Thank you for raising important points. We agree that the term homeostatic balancing may have been an overstatement. We have toned down the claims regarding homeostatic plasticity and now interpret the result as rapid plasticity at the single-trial level. Regardless, the average activity level did not differ among stimulation conditions (control, 16kHz stim, and 54kHz stim), which suggests that the overall activity level was maintained regardless of the stimulation. We added a new figure of the global activity change as Fig. 4A.

      We also added a simple model in which a suppression term was applied either to all neurons or specifically to non-target co-tuned cells, to test the results from our data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) For the first holography paper in A1, more information is needed about how holographic stimulation was performed and how stimulation artifacts were avoided or removed from the data set, especially as the text states that the PMTs were left open for the duration of the experiment.

      We clarified the rationale for leaving the shutter open: to avoid mechanical sounds that could activate neurons in the AC. With the Bruker default setting (software version 5.7), the uncaging shutter opens and closes on every iteration of the stimulation, which generates loud mechanical sounds and makes it impossible to tell whether activation is due to the sound or to the stimulation.

      (2) The choice of the dF/F as the primary tool for quantifying data should be better justified. Presumably, cells have very different variances in baseline activity levels and baseline fluorescence levels that create a highly skewed distribution of responses across the population. Further, a

      To account for differences in baseline activity, we first calculate dF/F for each cell by normalising to the baseline period (about 330 ms before sound onset) immediately preceding each trial. By doing so, we minimize any effect that could have been driven by variable baseline activity levels across neurons.
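
      A minimal sketch of the per-trial, per-cell baseline normalisation described above. The function name, the toy fluorescence values, and the frame rate (30 Hz, so that about 330 ms corresponds to 10 frames) are assumptions made for illustration; they are not taken from the manuscript.

```python
def dff_per_trial(trace, baseline_frames):
    """Return (F - F0) / F0 for one cell on one trial.

    F0 is the mean fluorescence over the first `baseline_frames` samples,
    i.e. the pre-stimulus window immediately preceding this trial.
    """
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    return [(f - f0) / f0 for f in trace]

# Assumed frame rate of 30 Hz: ~330 ms of baseline is about 10 frames.
BASELINE_FRAMES = 10

# Two hypothetical cells with very different raw brightness but the same
# relative response (a 50% increase after sound onset).
dim_cell = [100.0] * 10 + [150.0] * 5
bright_cell = [400.0] * 10 + [600.0] * 5

dff_dim = dff_per_trial(dim_cell, BASELINE_FRAMES)
dff_bright = dff_per_trial(bright_cell, BASELINE_FRAMES)
```

      Both toy cells yield the same dF/F trace, which illustrates how normalising to each trial's own baseline removes differences in raw fluorescence level across neurons.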

      (3) More analysis should be performed to determine why 33% of stimulated cells are not activated, and instead are suppressed during stimulation. Is this related to a cell's baseline fluorescence?

      Great point. Although we did our best to pre-select stimulation-responsive neurons before starting the actual experiments, and to keep the animals head-fixed as stably as possible, these neurons do not remain the “best stimulation-responsive neurons” throughout the entire imaging session. There are several caveats. First, their activity levels in response to the optogenetic stimulation seem to change after they are exposed to acoustic stimulation. Second, since the AC is on the temporal side of the brain, it is likely to be more affected by movements of the animal and its brain during the imaging session than visual cortex or motor cortex. However, 33% of 5 cells is about 1.5 cells, so on average about one cell is missed, although in some sessions all 5 cells were stimulated while in others the holographic stimulation was clearly less effective.

      We even visually checked the fluorescence change evoked by the holographic stimulation before starting any imaging session. Nevertheless, the cells do not remain the ‘best stimulation-responsive cells’ throughout, since we cannot control the natural biological variability of neuronal activity. Regardless, based on the significant stimulation effects observed when presenting different pure tone frequencies and when delivering different target stimulation and the no-stimulation control, we believe that the effect itself is valid. We added these caveats to the manuscript as a further discussion point and as things to consider.

      (4) The linear mixed-effects model should include time as a variable as A) the authors hypothesize that responses should be reduced over time due to sensory adaptation and that B) stimulation induced suppression might be dynamic (though they find it is not).

      Since the stimulation effect seems to be independent from trial-by-trial changes among stimulation conditions (Fig. 4) and we now have toned down on the aspect of homeostasis, we kept the current mixed-effect model variables.

      (5) More speculation is needed on why stimulation suppresses responses from the first trial onwards.

      We further speculate that such rapid response changes arise from activity-dependent synaptic changes, driven by the overall shift in network energy caused by optogenetic stimulation, that act to maintain cortical circuit balance.

      (6) What does each dot represent in Figure 4a vs. Figure 4B? They are very different in number.

      In 4A, each dot is the average amplitude change for one trial. The number of dots is exactly the same across frequencies, cell groups, and conditions, as each dot represents one trial (20 each). Any apparent difference is due only to overlap between frequencies.

      In 4B, each dot is one cell. The panel for 16kHz-preferring cells in the stimulation conditions is denser because it naturally had more FOVs and thus more cells to plot. We further clarified these details in the figure legend.

      (7) How sensory responsive neurons were selected should be shown in the figures. Specifically, which fraction of the 30% of most responsive neurons were stimulated should be stated. Depending on the exact yield in the field of view, all or only a minority of strongly sensory responsive neurons are being stimulated, which in either case would color the interpretation of the data.

      We tried varying the FOV as much as possible across sessions to ensure that the FOVs were directly in A1 and covered a range of frequencies. If we could not identify more than 80 sound-responsive neurons in the processed suite2p data, we searched for another FOV.

      We now included an example FOV of the widefield imaging we first conducted to identify A1, and another example FOV of the 2-photon imaging where we conducted a short sound presentation session to identify the sensory responsive neurons, as an inset of the ‘Cell selection’ part in Figure 1.

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      - p.4, last line: "of" probably missing "the processing the target..."

      Fixed.

      - p.5, top, end of the first paragraph of this page: Figure 3B and 3E don't show exemplar traces.

      Corrected as Figure 2A and 2D.

      - P.5, first sentence of the paragraph "Optogenetic holographic stimulation increases activity in targeted ensembles": reference to Figure 3A and 3D should rather be Figure 2A and 2D.

      Corrected.

      - P.9, 2nd paragraph: sentence with a strange syntax: "since their response amplitude..."

      Corrected.

      - Figure 2: panels C and F are missing.

      Corrected.

      - p.11, methods: "wasthen" should be "was then".

      Corrected.

      - p.12, analysis: it is not clearly explained why the sound evoked activity is computed based on the 160ms to 660ms after sound onset instead of 0ms to 660 ms. It is likely related to some potential contamination but it should be explicitly explained.

      We used this window because of the relatively slow calcium transient, in order to capture the sound-evoked responses more accurately. We added this detail.

      - Methods, analysis: the authors should better explain how they conducted the random permutation described in the Figures 1D, 2B and 2E. Which signals were permutated?

      The random permutation shuffled the target cell IDs.

      - References 55 and 56 don't explicitly state that excitatory neurons generally have stronger responses to sound than inhibitory neurons.

      Thank you for pointing out this error. We replaced those references with Maor et al. 2016 and Kerlin et al. 2010, showing excitatory neurons show more selective tuning, and also changed the wording more appropriately.

      - It is not explained whether the imaging sessions are performed on awake or anaesthetized animals. It is probably done on awake animals, but then it is not clear what procedure is used to get the animals used to the head restraint. It usually takes a few days for the mice to get used to it, and the stress level is often different at the beginning and end of an experiment. Given the experimental protocol used in the study, in which sessions are performed sequentially and compared to each other, this aspect could play a role. However, the main comparison made is probably safe as it compares a control condition (laser at 0mW) and conditions with optogenetic stimulation, all done with similar sequences of sessions.

      The experiment was conducted on awake animals. Although we did not have a control comparing their state at the beginning and the end of the experiment, they all had a widefield imaging session to identify the A1 region, which uses the same head-fixation setup; thus they were already more accustomed to the setup when we conducted 2-photon imaging and stimulation. Regardless of the session, if animals showed any sign of extra discomfort due to the unfamiliar setup, we kept them there for 10-15 minutes until they were accustomed to the setup and stopped moving. If they still showed signs of discomfort, we took them out and tried again on another day. We now included this detail in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      - Evaluate the global effect of stimulation on the population activity averaged across all neurons (activated and non-activated).

      Thank you for your suggestions. We now included a new Figure 3A that presents the population activity across all responsive cells. The average activity level did not differ among stimulation conditions (control, 16kHz stim, and 54kHz stim).

      - Evaluate with a simple model if a population of neurons with different sound tuning receiving non-specific inhibition would not produce the observed effect.

      Thank you for the suggestion. We generated a simple model in which a suppression term was applied either to all neurons or specifically to non-target co-tuned cells to test our results from the data. We took a similar range of number of neurons and FOVs to closely simulate the model to the real dataset structure. On 50 simulated calcium traces of neurons (n),

      Trace<sub>n(t)</sub> = R<sub>n(t)</sub> – theta<sub>n</sub> + epsilon<sub>n(t)</sub>

      Where R<sub>n(t)</sub> is a response amplitude from either the baseline or the stimulation session, theta<sub>n</sub> is a suppression term applied either to all neurons or only to non-target co-tuned neurons, only during the stimulation session, and epsilon<sub>n(t)</sub> is additive noise. Theta was defined based on the average increase in activity amplitude of target neurons due to the stimulation, taken from the real dataset with extra neuron-level jitter. As in the real data analyses, we compared the response change between the stimulation and baseline sessions’ trace amplitudes. Comparing the two model outcomes and the real data, we observed a significant effect of model type (F(2, 2535) = 34.943, p < 0.0001) and a significant interaction between model type and cell group (F(2, 2535) = 36.348, p < 0.0001). Applying suppression to only non-target co-tuned cells during the stimulation session yielded a significant response amplitude decrease for co-tuned cells compared to non co-tuned cells (F(1, 2535) = 45.62, p < 0.0001), which resembles the real data. In contrast, applying suppression to all non-target cells led to similar amplitude changes in both co-tuned and non co-tuned neurons (F(1, 2535) = 0.87, p = 0.35), which was not observed in either the real data or the simulated data restricted to co-tuned cell suppression. Therefore, the model indicates that suppression applied specifically to co-tuned neurons drove the real data outcome. All of this information is now added to the Methods and Results sections, and the figure is added as Figure 3C.
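
      The structure of this model can be illustrated with a short simulation. This is a hedged sketch, not the authors' implementation: the function name, the amplitude range for R, the value of theta, the jitter range, the noise level, and the half-and-half split into co-tuned and non co-tuned cells are all assumptions chosen for illustration, whereas the real model derived theta from the recorded data.

```python
import random

def simulate_change(suppress, n_cells=50, n_trials=20, theta=0.2,
                    noise_sd=0.05, seed=0):
    """Mean (stimulation - baseline) amplitude change per cell group.

    suppress: "all" applies the suppression term to every non-target cell;
              "cotuned" applies it only to co-tuned non-target cells.
    Each trial follows Trace = R - theta + epsilon (theta = 0 at baseline).
    """
    rng = random.Random(seed)
    changes = {"cotuned": [], "non_cotuned": []}
    for n in range(n_cells):
        group = "cotuned" if n % 2 == 0 else "non_cotuned"
        r = rng.uniform(0.5, 1.0)                 # hypothetical amplitude R_n
        theta_n = theta * rng.uniform(0.8, 1.2)   # neuron-level jitter
        if suppress == "cotuned" and group != "cotuned":
            theta_n = 0.0                         # no suppression for this cell
        base = [r + rng.gauss(0.0, noise_sd) for _ in range(n_trials)]
        stim = [r - theta_n + rng.gauss(0.0, noise_sd) for _ in range(n_trials)]
        changes[group].append(sum(stim) / n_trials - sum(base) / n_trials)
    return {g: sum(v) / len(v) for g, v in changes.items()}

specific = simulate_change("cotuned")  # suppression only on co-tuned cells
broad = simulate_change("all")         # suppression on all non-target cells
```

      Under these assumptions, the co-tuned-only variant produces a clear gap between the two groups, while the broad variant shifts both groups by a similar amount, mirroring the qualitative contrast reported above.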

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      One important question is needed to further clarify the mechanisms of aberrant Ca2+ microwaves as described below.

      Synapsin promoter labels both excitatory pyramidal neurons and inhibitory neurons. To avoid aberrant Ca2+ microwave, a combination of Flex virus and CaMKII-Cre or Thy-1-GCaMP6s and 6f mice were tested. However, all these approaches limit the number of infected pyramidal neurons. While the comprehensive display of these results is appreciated, a crucial question remains unanswered. To distinguish whether the microwave of Ca2+ is caused selectively via the abnormality of interneurons, or just a matter of pyramidal neuron density, testing Flex-GCaMP6 in interneuron specific mouse lines such as PV-Cre and SOM-Cre will be critical.

      We agree that unravelling the role of interneurons is important to the understanding of the cellular mechanisms. However, the primary goal of this preprint was to alert the field and those embarking on in vivo Ca2+ imaging to AAV transduction induced artefacts mediated by one of the most widely used viral constructs for Ca2+ imaging in the field. It was important to us to distribute this finding among the community in a timely manner to avoid the unnecessary waste of resources.

      We consider a thorough understanding of cell-type specific mechanisms interesting. However, the biological relevance of the Ca2+ waves is as yet unclear, and disentangling exactly which cellular and subcellular factors drive the aberrant phenomenon will require a large systematic effort which goes beyond our resources. In particular, it will be technically non-trivial to separate biologically relevant contributions from technical differences. For instance, the absence of Ca2+ waves under the principal neuron promoter CaMKII may suggest the involvement of interneurons. However, alternative possibilities are a reduced density of expression across principal neurons, or that the expression levels differ between the 2 promoters.

      The important take-home message of the preprint, in our opinion, is that users should carefully check their viral protocols, adjust the protocols for their specific scientific question, and report any issues. We now emphasise the fact that although Ca2+ waves were not observed following conditional expression of syn.GCaMP with CaMKII.cre, this may not be due to a requirement for interneuronal expression but may simply reflect differences in final GCaMP expression density and levels between the two transduction procedures (P12, L298-303).

      Reviewer #2 (Public Review):

      Weaknesses:

      Whether micro-waves are associated with the age of mice was not quantified. This would be good to know and the authors do have this data.

      We plotted the animal age at the time of injection for all injections of Syn.GCaMP6 into CA1/CA3 and found no correlation with either the occurrence of Ca2+ waves or the frequency of Ca2+ waves over the age range of 5 – 79 wks (see reviewer Fig1; linear regression fit to the Ca2+ wave frequency against age was not significant: intercept = 1.37, slope = -0.007, p=0.62, n = 14; and generalized linear model relating Ca2+ wave ~ age was not significant: z score = 0.19, deviance above null = 0.04, p = 0.85, n=24). We have now added a statement on this to the revised manuscript (P14 L354-359), and for the reviewers we have added the plots below.

      Author response image 1.

      Plot of Ca2+ micro-wave frequency (left: number of Ca2+ waves/min) or occurrence (right: yes/no) against the animal age at the time of viral injection. Blue line is linear (left) or logistic (right) fit to the data with 95% confidence level.

      The effect of micro-waves on single cell function was not analyzed. It would be useful, for example, if we knew the influence of micro-waves on place fields. Can a place cell still express a place field in a hippocampus that produces micro-waves? What effect might a microwave passing over a cell have on its place field? Mice were not trained in these experiments, so the authors do not have the data.

      We agree that these are interesting questions; however, the preprint is focused on describing the GECI expression conditions prone to generating these artefacts. Studying the effects of Ca2+ micro-waves on the circuitry are scientific questions, and would require an experimental framework of testing the aberrant activity on a specific physiological function e.g. place activity or specific oscillations (e.g. sharp-wave activity). Ca2+ microwaves, as the ones described here, have not been reported under physiological conditions or pathophysiological conditions and studying the effects of such artefactual waves on the circuit was not our intention.

      With respect to place cell activity, specifically, it is intuitive that during the Ca2+ micro-wave the participating cell’s place field activity would be obscured by the artefactual activity. Cell activity appears to return immediately following the wave suggesting that the cells could exhibit place activity outside their participation in the Ca2+ micro-waves. However, we do not know if the Ca2+ micro-wave activity disrupts the generation or maintenance of place fields. We have now added a brief reference to possible effects on place coding to the paper (P12, L315-317).

      The CaMKII-Cre approach for flexed-syn-GCaMP expression shows no micro-waves and is convincing, but it is only from 2 animals, even though both had no micro-waves.

      In light of the reviewer’s comment, we have added a further 3 animals with conditional expression of GCaMP6m from the DZNE to complement the current dataset with conditional expression of GCaMP6s from UoB (P10, L236 & 239 and revised table 1). Although Ca2+ waves were not observed in any of the 5 animals in total, we still do not know with all certainty whether this approach is completely safe. Time will show if researchers still encounter the phenotype under certain conditions when using this conditional approach.

      The authors state in their Discussion that even without observable microwaves, a syn-Ca2+-indicator transduction strategy could still be problematic. This may be true, but they do not check this in their analysis, so it remains unknown.

      We agree with the reviewer and have now made this point clearer in the revised discussion (P11, L257-258).

      Reviewer #3 (Public Review):

      Weaknesses:

      I believe that the weaknesses of the manuscript are appropriately highlighted by the authors themselves in the discussion. I would, however, like to emphasize several additional points.

      As the authors state, the exact conditions that lead to Ca2+ micro-waves are unclear from this manuscript. It is also unclear if Ca2+ micro-waves are specific to GECI expression or if high-titer viral transduction of other proteins such as genetically encoded voltage indicators, static fluorescent proteins, recombinases, etc could also cause Ca2+ micro-waves.

      The high expression of other proteins has been shown to result in artefactual phenomena such as toxicity or fluorescent puncta (for GFP see Hechler et al. 2006; Katayama et al. 2008; for GEVI see Rühl et al. 2021), but we are not aware of reports of micro-waves. Although it is certainly possible that high expression levels of other proteins could lead to waves, we suspect the Ca2+ micro-waves observed in this preprint result from a dysregulation of Ca2+ homeostasis. This is not to suggest that voltage indicators could not result in micro-waves (e.g. Ca2+ homeostasis may be indirectly affected).

      The authors almost exclusively tested high titer (>5x10^12 vg/mL) large volume (500-1000 nL) injections using the synapsin promoter and AAV1 serotypes. It is possible that Ca2+ micro-waves are dramatically less frequent when titers are lowered further but still kept high enough to be useful for in vivo imaging (e.g. 1x10^12 vg/mL) or smaller injection volumes are used. It is also possible that Ca2+ micro-waves occur with high titer injections using other viral promoter sequences such as EF1α or CaMKIIα. There may additionally be effects of viral serotype on micro-wave occurrence.

      We agree with all points raised by the reviewer. Notably, we used viral transduction protocols with titers and volumes within the range of those previously used for viral transduction of GCaMP under the synapsin promoter (see P11 L269-275) and we observed Ca2+ micro-waves. As the reviewer suggested, we did find that lowering the titer is an important factor in reducing these Ca2+ micro-waves and there is likely a wide range of approaches that avoid the phenomenon. With regard to viral serotype, we show that micro-waves occurred with both AAV1 and AAV9, but it is possible that other serotypes may avoid the phenomenon.

      We reiterate in the abstract of the revised manuscript that expression level is a crucial factor (P2, L40 and P2, L44-45) and now mention that other promoters and induction protocols that result in high Ca2+ indicator expression may result in Ca2+ micro-waves (P12, L291-294).

      The number of animals in any particular condition are fairly low (Table 1) with the exception of V1 imaging and thy1-GCaMP6 imaging. This prohibits rigorous comparison of the frequency of pathological calcium activity across conditions.

      We have now added 3 more animals with conditional GCaMP6 expression. In total, the study contains 34 animals with viral injection into the hippocampus from different laboratories and under different conditions resulting in multiple groups. As such we are cognizant of the resulting limitations for statistical evaluation.

      However, in light of the reviewer’s comment, we have now employed a generalized linear model tested on all the data to examine the relationship between the Ca2+ micro-wave incidence and the different factors. The multivariate GLM did find a significant relationship between Ca2+ micro-wave incidence and both viral dilution and weeks post injection (see below and revised manuscript P8, L189-193).

      For injections into CA1 in the hippocampus (n=28), a GLM found no relationship between Ca2+ micro-waves and each of the individual variables x (Ca-wave ~ x): viral dilution: z score = 1.14, deviance above null = 1.31, p = 0.254; post injection weeks: z score = 1.18, deviance above null = 1.44, p = 0.239; injection volume: z score = -0.76, deviance above null = 0.59, p = 0.45; construct: z score = 1.18, difference in deviance above null = 1.44, p = 0.239.

      However, a multivariable logistic GLM relating dilution and post injection weeks (Ca-wave ~ dilution + p.i_wks) showed that together both variables were significantly related to Ca2+ micro-waves (Deviance above null = 7.5; Dilution: z score = 2.18, p < 0.05; p.i_wks: z score = 2.22, p < 0.05).
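
      For readers unfamiliar with the "deviance above null" statistic, the following sketch fits a logistic GLM of the form Ca-wave ~ dilution + p.i_wks by Newton-Raphson and compares its deviance with that of an intercept-only model. It is an illustration only: the twelve data rows are synthetic, the coding of the dilution variable is an assumption, and all function names are hypothetical. In practice a statistics package would be used instead of a hand-written fit.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def predict(beta, row):
    """Logistic mean for one design row (row already includes the intercept)."""
    eta = sum(b * v for b, v in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic fit; X rows exclude the intercept column."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        mu = [predict(beta, r) for r in rows]
        grad = [sum((yi - mi) * r[j] for yi, mi, r in zip(y, mu, rows))
                for j in range(p)]
        hess = [[sum(mi * (1.0 - mi) * r[j] * r[k] for mi, r in zip(mu, rows))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def deviance(X, y, beta):
    """Binomial deviance: -2 times the log-likelihood for 0/1 outcomes."""
    total = 0.0
    for r, yi in zip(X, y):
        mu = predict(beta, [1.0] + list(r))
        total += -2.0 * (yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu))
    return total

# Synthetic data (NOT from the study): (dilution factor, weeks post injection).
X = [(1, 2), (1, 4), (1, 6), (1, 8), (2, 2), (2, 4),
     (2, 6), (2, 8), (4, 2), (4, 4), (4, 6), (4, 8)]
y = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0]   # 1 = micro-waves observed

X_null = [() for _ in y]                   # intercept-only model
dev_null = deviance(X_null, y, fit_logistic(X_null, y))
beta_full = fit_logistic(X, y)
dev_full = deviance(X, y, beta_full)
dev_above_null = dev_null - dev_full       # improvement over the null model
```

      A larger deviance above null means the predictors explain more of the variation in micro-wave incidence than the intercept alone; its significance is typically assessed against a chi-squared distribution with as many degrees of freedom as added predictors.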

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Results are straightforward and convincing. While a couple of ways to reduce the aberrant microwaves of calcium responses were demonstrated, delving into the functions of interneurons is crucial for a more comprehensive understanding of cellular causality.

      As mentioned in the public response, disentangling cellular mechanism from technical requirements will need a large and systematic study. To determine the contribution from interneurons, the use of specific interneuron promoters would be required, and viral titers systematically varied to result in similar cellular GCaMP expression levels as seen under the synapsin promoter condition.

      Reviewer #2 (Recommendations For The Authors):

      Do the authors think the cells are firing when they participate in a micro-wave, or do they think the calcium influx is due to something else? A discussion point on this would be good.

      This is an excellent point raised by the reviewer. We do not know if the elevated cellular Ca2+ during the artifactual Ca2+ micro-wave reflects action potential firing or an increase of Ca2+ from intracellular stores. As already described in the text of the preprint, their optical spatiotemporal profile neither fits with known microseizure progression patterns, nor with spreading depolarization/depression. We have adopted the reviewer’s suggestion and added the following point to the discussion section in the revised preprint (P12, L308-315):

      In a limited dataset, we attempted to detect the Ca2+ micro-waves by hippocampal LFP recordings (using a conventional insulated Tungsten wire, diameter ~110µm). We could not identify a specific signature, e.g. ictal activity or LFP depression, which may correspond to these Ca2+ micro-waves. The crucial shortcoming of this experiment, of course, is that we could not simultaneously perform hippocampal 2-photon microscopy with these LFP recordings. Thus, it is uncertain if the Ca2+ micro-waves indeed occurred in proximity to our electrode.

      The results seem to suggest that micro-waves may involve interneurons as their CaMKII-Cre strategy avoids waves - possibly due to a lack of expression of GECIs in interneurons. It would be great to hear the author's thoughts on this and add a brief discussion point.

      As mentioned in the public response to Reviewer 1, it is difficult to disentangle cellular mechanisms from technical requirements, and the exact requirements for the Ca2+ micro-waves to occur are still not fully clear. The absence of Ca2+ micro-waves in our CaMKII-Cre dataset may indeed reflect the requirement of interneurons. However, it could just as well be due to a sparse labelling of principal cells or simply reflect differences in the expression levels of GCaMP under the different promoters.

      All in all, a more complete understanding of the requirements of such Ca2+ micro-waves will require a community effort. Therefore, it is important that each group check the safety profile of their GECI and report problems to the community.

      We have added these points to the revised preprint (P12, L291 and P12, L298)

      Plotting the incidence of micro-waves as a function of the age of mice would be a nice addition (the authors have the data).

      There was no relationship of Ca2+ micro-wave occurrence or frequency with age over the range of 5-79 wks (see public response) and this has been added to the preprint (P14, L354)

      Reviewer #3 (Recommendations For The Authors):

      I appreciate the authors raising the awareness of this issue. I had personally observed micro-waves in my own data as well. In agreement with their findings, I found that the occurrence of micro-waves was dramatically lower when I reduced the viral titer. Anecdotally, I also observed voltage micro-waves when virally transducing genetically encoded voltage indicators at similar titers. For that reason, I am skeptical that this issue is exclusive to GECIs.

      We find it interesting that the reviewer has also seen artefactual micro-waves following viral transduction of genetically encoded voltage indicators. Without seeing the voltage waves the referee is referring to or the conditions, it is of course difficult to compare with the Ca2+ micro-waves we report. However, this comment again raises the question of mechanism. We believe that in the GECI framework, Ca2+ homeostatic aspects are important. Voltage indicators are based on different sensor mechanisms, and expressed in the cell membrane, but it may very well be that there are overlapping factors between Ca2+ and voltage indicators that could trigger a similar, or even the same phenomenon in the end.

      Minor comments:

      (1) Line 131-132: I believe the authors only tested for micro-waves in V1. This should be made clear in the results. It could be that micro-waves could occur in other parts of cortex with the same viral titers.

      Both V1 and somatosensory cortex were tested as described in the methods (P15, L395-397), we have made this clearer in the revised preprint (P6, L138).

      (2) There are no statistics associated with the data from Fig 1e.

      We have now added statistics (P5, L126).

      (3) The authors may be able to make a stronger claim about the pathological nature of the micro-waves if there are differences in the histology between the injected and non-injected hemispheres. For example, is there evidence of widespread cell death in the injected hemisphere (e.g. lower cell count, smaller hippocampal volume, caspase staining, etc).

      We found no evidence of gross morphological changes to the hippocampus following viral transduction with no changes in CA1 pyramidal cell layer thickness or CA1 thickness (pyramidal cell layer thickness: 49 ± 12.5 µm ipsilateral and 50.3 ± 11.1 µm contralateral, n=4, Student’s t-test p=0.89; CA1 thickness: 553.3 ± 14 µm ipsilateral and 555.8 ± 62 µm contralateral, n = 4, Student’s t-test p=0.94; 48 ± 13 weeks post injection at time of perfusion).

      We have added this to the preprint (P5, L117-122)

      (4) The broader micro-waves in the stratum oriens versus the stratum pyramidale are likely due to the spread of the basal dendrites of pyramidal cells. If the typical size of the basal dendritic arbor of CA1 pyramidal neurons is taken into account, does this explain the wider calcium waves in this layer.

      Absolutely, great point; we completely agree. It is likely that the active neuropil (including the dendritic arbour) contributes to the apparently broader diameter. In addition, as is evident in Video 5, cell somata in the stratum oriens (possibly interneurons) are active, and their processes also contribute.

      We have now mentioned these points in the revised preprint (P5, L132).

      (5) Lines 179-181: Is the difference in the prevalence of micro-waves between viral titers statistically significant?

      Although we have a large number of animals in total (n=34) with viral injection into the hippocampus, the number of animals in each condition, given the many factors, is low. We therefore used a generalized linear model to test the relationship between the Ca2+ micro-waves and the variables.

      We have now added this analysis to the revised preprint (P8, L189-193).

      (6) Lines 200-203: The CA3 micro-waves were only observed at one institution. The current wording is slightly misleading.

      We agree and have changed this to be clearer (P9, L216).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for the positive evaluation of our manuscript. We have closely examined the issues raised, and below we offer a point-by-point response to each comment. In the revised manuscript below, all the introduced changes are marked with red font.

      1. There may be a general typo concerning micromolar and millimolar…

      Response 1: The reviewer is correct. During the reformatting of the manuscript, the units used to indicate TPEN concentrations, which should always be µM, were switched to mM in some portions of the text. We have corrected those mistakes.

      2. In Figure 1C/Lines 150-152, the authors use DTPA and EDTA as extracellular chelators for zinc… Was the amount of zinc in the media measured and determined to be below the amount of chelator used? Additionally, these chelators are not specific for zinc, but can bind other divalent cations including calcium. Even though zinc binds more tightly than calcium to these chelators, by mass action calcium and magnesium ions may outcompete DTPA and EDTA, leaving zinc availability unperturbed. How do the authors take these interactions into account to determine that chelation of extracellular zinc has no effect on intracellular calcium oscillations? The best way to test this is to use zinc-responsive fluorescent probes in a sample of the calcium- and magnesium-replete medium and see if the addition of the DTPA or EDTA alters zinc fluorescence in the cuvette.

      Response 2: We tested several conditions to determine the effect of chelators on the zinc concentration of the monitoring media using commercially available Zn2+ probes. The fluorescent zinc probe FluoZin-3, added extracellularly, shows high fluorescence, consistent with trace amounts of zinc and possibly non-specific binding of other cations.

      Further, the media tested was replete with the concentrations of Ca2+ and Mg2+ in TL-HEPES. To establish whether the non-permeable external chelators we used could bind external Zn2+ despite the high concentrations of Ca2+ and Mg2+, we followed the reviewer’s suggestion of adding the chelators to the complete media in the presence of FluoZin-3. The addition of EDTA caused a protracted (~5 min) but significant decrease in FluoZin-3 fluorescence, suggesting that it is effective at removing external Zn2+ despite the presence of other divalent cations (Author response image 1A). We used a second approach in which we added the chelator in the presence of nominal concentrations of Ca2+ and Mg2+ to increase the chelators’ chances of finding and chelating Zn2+ (Author response image 1B). Then, we injected mPlcζ mRNA, which initiated persistent but low-frequency oscillations, as expected given the lack of external Ca2+. Remarkably, upon restoring Ca2+, the responses became high-frequency, and upon increasing Mg2+, they acquired the regular pattern, consistent with Mg2+’s inhibition of channels that mediate Ca2+ influx. These results show that the chelation of extracellular zinc does not replicate TPEN’s effect, which suggests that TPEN’s abrupt inhibitory effect on Ca2+ oscillations is most likely due to the chelation of internal Zn2+.

      Author response image 1.

      Cell-impermeable chelators effectively reduce Zn2+ levels in external media but do not prevent initiation or continuation of Ca2+ oscillations. (A) A representative trace of FluoZin-3 fluorescence in replete monitoring media (TL-HEPES). The media was supplemented with cell-impermeable FluoZin-3, and after initiation of monitoring, EDTA (100 μM) was added at the designated point (triangle). (B) The left black trace represents the initiation of Ca2+ oscillations by injection of mPlcζ mRNA (0.01 μg/μl). The oscillations were monitored in Ca2+- and Mg2+-free media and in the presence of EDTA (110 μM) to chelate residual divalent cations derived from the water source or reagents used to make the media. The right red trace represents the initiation of oscillations as above, but after a period indicated by the black and green bars, Ca2+ and Mg2+ were sequentially added back.

      Notably, low EDTA concentrations (10 µM) have been used to enhance in vitro culture conditions of mammalian embryos. In fact, EDTA is the key ingredient to overcome the two-cell block that initially prevented the in vitro development of zygotes from inbred strains. It is unknown how EDTA mediates this effect, which is detectable in Ca2+- and Mg2+-replete media and is only effective when EDTA is placed extracellularly, but it has been attributed to its ability to chelate toxic metals introduced as impurities by other media components; one study demonstrated that the Zn2+ present in the oil used to overlay the culture medium micro drops was the target (Erbach et al., Human Reproduction, 1995, 10, 3248-54). We included some of these points in the revised version of the manuscript and added this figure as Supplementary Figure 1.

      3. The reviewer noted that while dKO eggs showed reduced labile zinc levels, the amount of total zinc is not determined. Further, the response to thapsigargin in dKO eggs didn’t phenocopy the profile in eggs treated with TPEN. The reviewer argued that without further experimentation, such as comparing polar body extrusion and egg activation rate between WT and dKO, it seems to be a stretch to state that these eggs are zinc deficient.

      Response 3: We agree that the statement, ‘zinc deficient,’ is an overstatement without determining the total zinc levels and associated phenotypes. Therefore, in the revised version of the manuscript, we referred to dKO-derived eggs and embryos as “low-level labile Zn2+ eggs”. Our follow-up studies show that eggs from dKO females seem to undergo egg activation events, such as the timing and rate of second polar body extrusion and pronuclear formation, with a similar dynamic to WT females. Hence, we estimate that the labile Zn2+ levels in dKO eggs are not as low as those of WT eggs treated with TPEN. Consequently, these intermediate zinc levels may have subtle effects, such as changing the Thapsigargin-induced Ca2+ release through the IP3R1 without causing widespread inhibition of cellular events observed after TPEN. We would argue that this approach is significant because it can distinguish how the different cellular events and proteins and enzymes have distinct affinities or zinc requirements and, in this case, start uncovering the channel(s) present in oocytes and eggs that may contribute to regulating zinc homeostasis.

      4. The reviewer pointed out that since zinc is not redox active, it is unclear how zinc could be modifying cysteine residues of IP3R1. The reviewer suggested the possibility that excess zinc is binding to the cysteines and preventing their oxidation, leading to the inhibition of the IP3R1 by blocking the channel, thereby preventing calcium release.

      Response 4: The reviewer correctly points out that the mechanism(s) whereby excess Zn2+ modifies the IP3R1 function is undetermined in our study. Further, our description of ‘modifying’ is ambiguous and could be misinterpreted. Data in the literature, some of which we cite in the manuscript, shows that “oxidation of cysteine residues enhances receptor’s sensitivity to ligands in various cell types”. Zn2+ preferentially binds to reduced cysteine residues, and thus, we agree with the reviewer's suggestion that “excess zinc may occupy reduced cysteine residues, preventing their oxidization required to sensitize the receptor”. As noted by the reviewer, we cannot rule out that it might be directly blocking the IP3R1 channel. We have modified the corresponding paragraphs in the Discussion.

      5. Lines 80 and 411, there are three other reports that demonstrate the zinc reallocation to the egg shell or ejection as the zinc spark; Zebrafish: Converse et al. in Sci. Reports 10, 15673 (2020); X. laevis: Seeler et al. in Nature Chem. 13, 683-691 (2021); C. elegans: Mendoza et al. in Biology of Reproduction 107(2):406-418 (2022).

      Response 5: Thank you for pointing this out, and we have added these references.

      6. Line 129, when discussing that Zn2+ concentrations are reduced after TPEN as visualized by FluoZin-3, the authors should cite the article in which FluoZin-3 was first reported and this result was demonstrated initially: "Detection and Imaging of Zinc Secretion from Pancreatic β-Cells Using a New Fluorescent Zinc Indicator" by Gee et al. J. Am. Chem. Soc 124, 5, 776-778.

      Response 6: Thank you for pointing this out, and we have added this reference.

      7. In Figure 1E/Table 1 the authors evaluated if TPEN supplementation affects meiosis and pronuclear formation; however, the timing of TPEN treatment is unclear. When was TPEN introduced? Were the eggs left in the same media containing TPEN following fertilization, or were they transferred to different media?

      Response 7: Thank you for pointing this out, and we have noted the time of the addition in the figure and text.

      8. Lines 1011 and 1012, ZnTP should be ZnPT.

      Response 8: Thank you for pointing this out, which is now corrected.

      Reviewer #2:

      1. The reviewer raises the question of whether a more complex relationship could exist between the levels of zinc in MII eggs by indicating, “a more active relationship such that zinc efflux associated with each calcium spike could be necessary for terminating the Ca spike by depleting cytoplasmic zinc.” The reviewer also states, “Perhaps, rather than simply a permissive role, the normal Zn fluxes during activation may be acutely changing IP3-R gating sensitivity.”

      Response 1: We agree that the demonstration that TPEN dose-dependently delays and consistently terminates ongoing Ca2+ rises perhaps reflects a more nuanced relationship between cytoplasmic labile zinc concentrations, Ca2+ oscillations, and IP3R1 function. Uncovering the precise nature of this relationship would require additional studies, such as determining the impact of TPEN on IP3 binding to its cognate receptor, regulation of channel gating, and more in-depth functional-structural experiments. However, these studies will demand time and complex experimental design and are beyond the scope of the current work. Nevertheless, they are excellent suggestions for future studies.

      We would argue against the reviewer’s suggestion that “zinc sparks directly contribute to shaping the oscillations.” The Zn2+ released during the sparks is not labile Zn2+ but Zn2+ bound to cortical granule-resident proteins, most of which are inaccessible to the cytosol and hence to IP3R1s, and should not perturb their function. We confirmed (data not shown) that the levels of cytosolic labile Zn2+, as assessed with FluoZin-3, remained steady for over three hours of Plcζ mRNA-initiated oscillations. Further, because the Zn2+ sparks cease after the third or fourth Ca2+ rise, it would mean, at the very least, that this mechanism only operates on the first few responses. Thus, while the change of cytosolic Ca2+ concentrations triggers the Zn2+ sparks, we argue that the opposite influence is unlikely to hold true.

      2. The reviewer also pointed out that the role of Trpv3 and Trpm7 in Zn2+ homeostasis seems to be minor and that the effects of genetic deletion of those channels are not as clear as those obtained by TPEN. Given that dKO eggs make it to the MII and release more but not less calcium upon thapsigargin than control despite the lowered labile Zn2+ level, the reviewer speculated that the loss of those channels changes calcium gating independent of Zn2+ concentration.

      Response 2: TRPV3, TRPM7, and Cav3.2 are the three channels identified to permeate Ca2+ during oocyte maturation and egg activation in mice. We and other groups have observed that in oocytes and eggs, these channels partly compensate for the absence of each other because the deletion of these channels individually has a limited effect on Ca2+ oscillations and fertility. Thus, in the case of oocytes from Trpv3 and Trpm7 dKO animals, the other plasma membrane channel(s), most likely Cav3.2, is plausibly compensating, and its enhanced function underlies the increased Ca2+ response to Thapsigargin.

      Nevertheless, the slower time to peak and the less steep slope of the thapsigargin-induced rise suggest a negative impact of the dKO environment on IP3R1’s ability to mediate Ca2+ release. Based on the rest of the results in the manuscript, we attribute this change to the lower levels of labile Zn2+ in dKO eggs.

      3. Lastly, the reviewer noted the upregulation of the Fura-2AM following addition of ZnPT. The reviewer indicated that 0.05 µM ZnPT might not increase intracellular Zn2+ to change Fura-2 fluorescence, but it might be sufficient Zn2+ to enter the cell and keep the IP3R1 channels open causing a sustained rise in cytoplasmic calcium and preventing oscillations. Further, if this interpretation holds true, the inhibitory effects of high Zn2+ on IP3R1’s gating shown in figure 7 would be precluded.

      Response 3: We acknowledge that the increased levels of Fura-2 fluorescence following the addition of ZnPT could be due to the increased Zn2+ levels acting on IP3R1, increasing its open probability, and elevating cytosolic Ca2+ levels. We have added this consideration to the discussion. Nevertheless, our evidence suggests that this is unlikely because, as shown in Figure 6 H, I, the ER-Ca2+ levels as assessed by D1ER recordings did not change following the addition of ZnPT, whereas Rhod-2 fluorescence did, suggesting that the two events are seemingly uncoupled. Further, constant leak from the ER and extended high cytosolic Ca2+ would lead to egg activation or cell death, neither of which was observed.

      Reviewer #3:

      The reviewer noted that the present study deepened the understanding of the role of zinc in regulating calcium channels and stores at fertilization beyond the previously known Zn2+ requirement in oocyte maturation and the cell cycle progression. We appreciate these comments.

      1. Fig. 1. The reviewer wondered why we selected 10 μM TPEN for most of the experiments in the manuscript. The reviewer noted this concentration only stopped the Ca2+ oscillations in just half of the eggs after ICSI.

      Response 1: We used 10 μM TPEN throughout the study because it blocked ~50% of the oscillations induced by a robust trigger of Ca2+ responses such as ICSI and reduced the frequency in the remaining eggs. This concentration of TPEN abrogates and prevents the responses to milder stimuli, such as acetylcholine and SrCl2. Importantly, thimerosal and Plcζ mRNA overcome the inhibition by 10 μM but not 50 μM TPEN. However, 50 μM TPEN inactivates Emi2, a Zn2+-dependent enzyme, causing parthenogenetic activation and cell cycle progression, and we wanted to avoid this confounding factor. Therefore, we determined that 10 μM is a “threshold” concentration and selected it for the remaining studies. We also reasoned that it would allow the detection of more subtle effects of reducing the levels of labile zinc, causing a milder inhibition of IP3R1 sensitivity and a progressive delay or modification of the responses to other agonists rather than fully abrogating them, which is the case with higher concentrations.

      2. Line 131 - no concentration of TPEN stated? Or 'the addition of different concentrations of TPEN'?

      Response 2: We have corrected this. We have now added 50-100 µM concentrations.

      3. Line 146 - instead of TPEN, all TPEN concentrations?

      Response 3: We have made this correction, as all the concentrations we tested here (5 μM TPEN and above) caused a reduction in the baseline of Fura-2 fluorescence.

      4. Line 1046 - 'We submit'? Propose?

      Response 4: We have replaced the word 'submit' with 'propose'. Thank you for the suggestion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work describes the mechanism of protein disaggregation by the ClpL AAA+ protein of Listeria monocytogenes. Using several model substrate proteins the authors first show that ClpL possesses a robust disaggregase activity that does not further require the endogenous DnaK chaperone in vitro. In addition, they found that ClpL is more thermostable than the endogenous L. monocytogenes DnaK and has the capacity to unfold tightly folded protein domains. The mechanistic basis for the robust disaggregase activity of ClpL was also dissected in vitro and in some cases, supported by in vivo data performed in chaperone-deficient E. coli strains. The data presented show that the two AAA domains, the pore-2 site and the N-terminal domain (NTD) of ClpL are critical for its disaggregase activity. Remarkably, grafting the NTD of ClpL to ClpB converted ClpB into an autonomous disaggregase, highlighting the importance of such a domain in the DnaK-independent disaggregation of proteins. The role of the ClpL NTD domain was further dissected, identifying key residues and positions necessary for aggregate recognition and disaggregation. Finally, using sets of SEC and negative staining EM experiments combined with conditional covalent linkages and disaggregation assays the authors found that ClpL shows significant structural plasticity, forming dynamic hexameric and heptameric active single rings that can further form higher assembly states via their middle domains.

      Strengths:

      The manuscript is well-written and the experimental work is well executed. It contains a robust and complete set of in vitro data that push further our knowledge of such important disaggregases. It shows the importance of the atypical ClpL N-terminal domain in the disaggregation process as well as the structural malleability of such AAA+ proteins. More generally, this work expands our knowledge of heat resistance in bacterial pathogens.

      Weaknesses:

      There is no specific weakness in this work, although it would have helped to have a drawing model showing how ClpL performs protein disaggregation based on their new findings. The function of the higher assembly states of ClpL remains unresolved and will need further extensive research. Similarly, it will be interesting in the future to see whether the sole function of the plasmid-encoded ClpL is to cope with general protein aggregates under heat stress.

      We thank the reviewer for the positive evaluation. We agree with the reviewer that it will be important to test whether ClpL can bind to and process non-aggregated protein substrates. Our preliminary analysis suggests that the disaggregation activity of ClpL is most relevant in vivo, pointing to protein aggregates as the main target.

      We also agree that the role of dimers or tetramers of ClpL rings needs to be further explored. Our initial analysis suggests a function of ring dimers as a resting state. It will now be important to study the dynamics of ClpL assembly formation and test whether substrate presence shifts ClpL assemblies towards an active, single ring state.

      Reviewer #2 (Public Review):

      The manuscript by Bohl et al. is an interesting and carefully done study on the biochemical properties and mode of action of the potent autonomous AAA+ disaggregase ClpL from Listeria monocytogenes. ClpL is encoded on plasmids. It shows high thermal stability and provides the food-borne pathogen Listeria monocytogenes with a substantial increase in resistance to heat. The authors show that ClpL interacts with aggregated proteins through the aromatic residues present in its N-terminal domain and subsequently unfolds proteins from aggregates, translocating polypeptide chains through the central pore in its oligomeric ring structure. The structure of ClpL oligomers was also investigated in the manuscript. The results suggest that the mono-ring structure, and not the dimer or trimer of rings observed in addition to mono-ring structures under EM, is the active species of the disaggregase.

      Presented experiments are conclusive and well-controlled. Several mutants were created to analyze the importance of a particular ClpL domain.

      The study's strength lies in the direct comparison of ClpL biochemical properties with autonomous ClpG disaggregase present in selected Gram-negative bacteria and well-studied E. coli system consisting of ClpB disaggregase and DnaK and its cochaperones. This puts the obtained results in a broader context.

      We thank the reviewer for the detailed comments. There are no specific weaknesses indicated in the public review.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript details the characterization of ClpL from L. monocytogenes as a potent and autonomous AAA+ disaggregase. The authors demonstrate that ClpL has potent and DnaK-independent disaggregase activity towards a variety of aggregated model substrates and that this disaggregase activity appears to be greater than that observed with the canonical DnaK/ClpB co-chaperone. Furthermore, Lm ClpL appears to have greater thermostability as compared to Lm DnaK, suggesting that ClpL-expressing cells may be able to withstand more severe heat stress conditions. Interestingly, Lm ClpP can provide thermotolerance to E. coli that have been genetically depleted of either ClpB or in cells expressing a mutant DnaK103. The authors further characterized the mechanisms by which ClpL interacts with protein aggregates, identifying that the N-terminal domain of ClpL is essential for disaggregase function. Lastly, by EM and mutagenesis analysis, the authors report that ClpL can exist in a variety of larger macromolecular complexes, including dimer or trimers of hexamers/heptamers, and they provide evidence that the N-terminal domains of ClpL prevent dimer ring formation, thus promoting an active and substrate-binding ClpL complex. Throughout this manuscript the authors compare Lm ClpL to ClpG, another potent and autonomous disaggregase found in gram-negative bacteria that have been reported on previously, demonstrating that these two enzymes share homologous activity and qualities. Taken together this report clearly establishes ClpL as a novel and autonomous disaggregase.

      Strengths:

      The work presented in this report amounts to a significant body of novel and significant work that will be of interest to the protein chaperone community. Furthermore, by providing examples of how ClpL can provide in vivo thermotolerance to both E. coli and L. gasseri the authors have expanded the significance of this work and provided novel insight into potential mechanisms responsible for thermotolerance in food-borne pathogens.

      Weaknesses:

      The figures are clearly depicted and easy to understand, though some of the axis labeling is a bit misleading or confusing and may warrant revision. While I do feel that the results and discussion as presented support the authors' hypothesis and overall goal of demonstrating ClpL as a novel disaggregase, interpretation of the data is hindered as no statistical tests are provided throughout the manuscript. Because of this only qualitative analysis can be made, and as such many of the concluding statements involving pairwise comparisons need to be revisited or quantitative data with stats needs to be provided. The addition of statistical analysis is critical and should not be difficult, nor do I anticipate that it will change the conclusions of this report.

      We thank the reviewer for the valid criticism. We addressed the major concern of the reviewer and added the requested statistical analysis to all relevant figures. The analysis confirms our conclusions. We also followed the advice of the reviewer and revised axis labeling to increase clarity.

      Reviewer #1 (Recommendations For The Authors):

      • It would really help to have a model showing how ClpL performs protein disaggregation based on their findings.

      We show that ClpL exerts a threading activity that is fueled by ATP hydrolysis in both AAA domains and executed by pore-located aromatic residues. The basic disaggregation mechanism of ClpL therefore does not differ from that of the ClpB and ClpG disaggregases. Similarly, the specificity of ClpL towards protein aggregates is based on simultaneous interactions of multiple N-terminal domains with the aggregate surface. We recently described a similar mode of aggregate recognition for ClpG [1]. We therefore prefer not to add a model to the manuscript. We are currently preparing a review that includes the characterization of the novel bacterial disaggregases and will present models there, as we consider a review article more appropriate for such illustrations.

      • AAA2 domain of ClpL in Fig 3E should be the same color as in Fig 1A.

      We used light grey instead of dark grey for the ClpL AAA2 domain in Fig 3E, to distinguish between ClpL and ClpB AAA domains. This kind of illustration allows for clearer separation of both AAA+ proteins and the fusion construct LN-ClpB*. We therefore prefer keeping the color code.

      • Partial suppression of the dnaK mutant could be added in the main manuscript Figure.

      The main figure 3 is already very dense and we therefore prefer showing respective data as part of a supplementary figure.

      • It would have been interesting to know if the robust autonomous disaggregation activity of ClpL would be sufficient to rescue the growth of more severe E. coli chaperone mutants, like dnaK tig for example. Did the authors test this?

      We tested whether expression of clpL can rescue growth of E. coli dnaK103 mutant cells at 40°C on LB plates. This experiment is different from the restoration of heat resistance in dnaK103 cells (Figure 3, figure supplement 2A), as continuous growth at elevated temperatures (40°C) is monitored instead of cell survival upon abrupt severe heat shock (49°C). We did not observe rescue of the temperature-sensitive growth phenotype (40°C) of dnaK103 cells upon clpL expression, though expression of clpG complemented the temperature-sensitive growth phenotype (see Author response image 1 below). This finding points to differences in chaperone activities of ClpL and ClpG. It also suggests that ClpL activity is largely restricted to heat-shock generated protein aggregates, enabling ClpL to complement the missing disaggregation function of DnaK but not other Hsp70 activities including folding and targeting of newly synthesized proteins. We believe that dissecting the molecular reasons for differences in ClpG and ClpL complementation activities should be part of an independent study and prefer showing the growth-complementation data only in the response letter.

      Author response image 1.

      Serial dilutions (10⁻¹ to 10⁻⁶) of E. coli dnaK103 mutant cells expressing E. coli dnaK, L. monocytogenes clpL or P. aeruginosa clpG were spotted on LB plates including the indicated IPTG concentrations. Plates were incubated at 30°C or 40°C for 24 h. p: empty vector control.

      Reviewer #2 (Recommendations For The Authors):

      Based on results presented in Fig. 2B the authors conclude "that stand-alone disaggregases ClpL and ClpG but not the canonical KJE/ClpB disaggregase exhibit robust threading activities that allow for unfolding of tightly folded domains" (page 5 line 209). In this experiment, the threading power of disaggregases was assessed by monitoring YFP fluorescence during the disaggregation of aggregates formed by fusion luciferase-YFP protein. In my opinion, the results of the experiment depend not only on the threading power of disaggregases but also on the substrate recognition by analyzed disaggregating systems and/or processivity of disaggregases. N-terminal domain in the case of ClpL and KJE chaperones in the case of the KJE/ClpB system are involved in recognition. This is not discussed in the manuscript and the obtained result might be misinterpreted. The authors have created the LN-ClpB* construct (N-terminal domain of ClpL fused to derepressed ClpB) (Fig. 3 E and F). In my opinion, this construct should be used as an additional control in the experiment in Fig. 2 B. It possesses the same substrate recognition domain and therefore the direct comparison of disaggregases threading power might be possible.

      We performed the requested experiment (new Figure 3 - figure supplement 2D). We did not observe unfolding of YFP by LN-ClpB. Since ClpL and LN-ClpB do not differ in their aggregate targeting mechanisms, this finding underlines the differences in threading power between ClpL and activated (derepressed) ClpB. It also suggests that the AAA threading motors and the aggregate-targeting NTD largely function independently.

      Presented results suggest that tetramer and dimer of rings might be a "storage form" of disaggregase. It would be interesting to analyze the thermotolerance and/or phenotype of ClpL mutants that do not form tetramer and dimer (E352A). This variant possesses similar to WT disaggregation activity but does not form dimers and tetramers. If in vivo the differences are observed (for example toxicity of the mutant), the "storage form" hypothesis will be probable.

      When testing expression of clpL-MD mutants (E352A, F354A), which cannot form dimers and tetramers of ClpL rings, in E. coli ∆clpB cells, we observed reduced production levels as compared to ClpL wildtype and speculated that reduced expression might be linked to cellular toxicity. We therefore compared spotting efficiencies of E. coli ∆clpB cells expressing clpL, ∆N-clpL or the clpL-MD mutants at different temperatures. Expression of clpL at high levels abrogated colony formation at 42°C (new Figure 6 - figure supplement 3). ClpL toxicity was dependent on its NTD, as no effect was observed upon expression of ∆N-clpL. ClpL-MD mutants (E352A, F354A) were expressed at much lower levels and exhibited strongly increased toxicity as compared to ClpL-WT when produced at comparable levels (new Figure 6 - figure supplement 3). This implies a protective role of ClpL ring dimers and tetramers in the cellular environment by downregulating ClpL activity. We envision that the formation of ClpL assemblies restricts accessibility of the ClpL NTDs and reduces substrate interaction. Increased toxicity of ClpL-E352A and ClpL-F354A points to a physiological relevance of the dimers and tetramers of ClpL rings and is in agreement with the proposed function as storage forms. We added this potential role of ClpL ring assemblies to the discussion section. Due to the strongly reduced production levels of ClpL MD mutants and their enhanced toxicity at elevated temperatures, we did not test for their ability to restore thermotolerance in E. coli ∆clpB cells.

      Figure 6G and Figure 6 -figure supplement 2 - it is not clear what is the difference in the preparation of WT and WTox forms of ClpL.

      ClpL WT was purified under reducing conditions (+ 2 mM DTT), whereas WTox was purified in the absence of DTT, thus serving as a control for ClpL-T355C, which forms disulfide bonds upon purification without DTT. We have added the respective information to the figure legend and the materials and methods section.

      Page 5 line 250 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2A should be Figure 3 - Figure Supplement 2A.

      Page 5 line 251 - wrong figure citation. Instead of Figure 1 - Figure Supplement 2B/C should be Figure 3 - Figure Supplement 2B/C.

      Page 7 line 315 - wrong figure citation. Instead of Figure 4F, it should be Figure 4G.

      Figure 1 - Figure Supplement 2E - At first glance, this Figure does not correspond to the text and is confusing. It would be nice to have bars for Lm ClpL activity in the figure. Alternatively, the description of the y-axis might be changed to "relative to Lm ClpL disaggregation activity" instead of "relative disaggregation activity". One has to carefully read the figure legend to find out that 1 corresponds to Lm ClpL activity.

      We have corrected all mistakes and changed the description of the y-axis (Figure 1 – figure supplement 2E) as suggested.

      Reviewer #3 (Recommendations For The Authors):

      (1) While the authors make many experimental comparisons throughout their study, no statistical tests are described or presented with their results or figures, nor are these statistical tests described in the methods. While the data as presented does appear to support the author's conclusions, without these statistical tests no meaningful conclusions from paired analysis can be drawn. Critically, please report these statistical tests. As a general suggestion please include the statistics (p-values) in the results section when presenting this data, as well as in the figure legends, as this will allow the reader to better understand the authors' presentation and interpretation of the data.

      We have added statistical tests to all relevant figures. The analysis confirms our previous statements. We have further clarified our approach to the statistical analysis in the methods section. We report p-values in the results section; however, due to the volume of comparisons, we did not add individual p-values to the figure legends but used standard labeling with stars.

      (2) Some of the axis labels for the presented graphs are a bit misleading or confusing. Many describe a relative (%) disaggregation rate, but it is not clear from the methods or figure legends what this rate is relative to. Is it relative to non-denatured substrates, to no chaperone conditions, etc.? Is it possible to present the figures with the raw data rates/activity (ex. luciferase activity / time) vs. relative rates? I think that labeling these figure axes with "disaggregation rate" is a bit misleading as none of these experiments measure the actual rate of disaggregation of these model substrates per se (say by SEC-MALS or other biophysical measurements), but instead infer the extent of disaggregation by measuring a property of these substrates, i.e. luciferase activity or fluorescence intensity over time. Thus, labeling these figures with the appropriate axis for what is being measured, and then clarifying in the methods and results what is being inferred by these measurements, will help solidify the author's conclusions.

      Relative (%) disaggregation rate usually refers to the disaggregation activity of ClpL wildtype, which serves as the reference. We clarified this point in the revised text and the respective figure legends. We now also refer to the process measured (e.g. relative refolding activity of aggregated Luciferase instead of relative disaggregation activity), as suggested by the reviewer, and added clarifications to the text and the materials and methods.

      Since we have many measurements for our most frequently used assays and have a reasonable estimate of the general variance within these assays, we found it reasonable to show activity data in relation to fixed controls. This reduces the impact of unspecific variance and thereby allows more accurate comparisons between different repetitions. The reference is now indicated in the axis title.
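
      The normalization to a fixed reference described above can be sketched as follows. This is only a minimal illustration, assuming that each raw rate is divided by a single fixed reference rate (for example, the ClpL wildtype value); the function name and all numbers are hypothetical placeholders and are not data from the study.

```python
from statistics import mean, stdev


def relative_activity(raw_rates, reference_rate):
    """Express raw rates as fractions of a fixed reference rate.

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return [rate / reference_rate for rate in raw_rates]


# Placeholder values in arbitrary units, for illustration only.
wildtype_reference = 2.0
mutant_raw = [1.0, 1.2, 0.8]

rel = relative_activity(mutant_raw, wildtype_reference)
rel_mean = mean(rel)   # mean of the relative activities
rel_sd = stdev(rel)    # sample standard deviation (SD, not SEM)
```

      Dividing by one fixed reference keeps every repetition on the same scale, so the reported SD reflects the spread of the replicates rather than day-to-day differences in the reference measurement.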

      (3) The figures are well presented, clutter-free, and graphically easy to understand. Figure legends have sufficient information aside from the aforementioned statistical information and should include the exact number of independent replicates for each panel/experiment (ex. n=4), not just a greater than 3. While the figures do show each data point along with the mean and error, in some figures it is difficult to determine the number of replicate data points. Example figures 2c, 2d, and 3a. Also, please state whether the error is std. error or SEM.

      While we agree that this is valuable information, we fear that overloading the figure legends with information may take a toll on readability. We therefore decided to list the number of replicates for each experiment in a separate supplementary table (Table S2). The depicted error is the SD, not the SEM, which we have also specified in the figure legends.

      (4) There are various examples throughout the results where qualitative descriptors are used to describe comparisons. Examples of this are "hardly enhanced" (Figure 1) and "partially reduced" (Figure 6). While this is not necessarily wrong, qualitative descriptions of comparisons in this manner would require further explanation. What is the definition of "hardly" or "partially"? My recommendation is to just state the data quantitatively, such as "% enhanced" or "reduced by x", this way there is no misinterpretation. Examples of this can be found in Figures 6C-G. This would require a full statistical overview and presentation of these stats in the results.

      We followed the reviewer's advice and no longer use the terms criticized (e.g. “hardly enhanced”). We instead provide the requested quantifications in the text.

      Questions for Figures:

      Figures 1B and 1C:

      (1) Is the disaggregase activity of ClpL towards heat-denatured luciferase and GFP ATP-dependent? While the authors later in the manuscript show that mutations within the Walker B domains dramatically impair reactivation (disaggregation) of denatured luciferase, this does not rule out an ATP-independent effect of these mutations. Thus, the authors should test whether disaggregase activity is observed when wild-type ClpL is incubated with denatured substrates without ATP present or in the presence of ADP only.

      We tested for ClpL disaggregation activity in the absence of nucleotide and in the presence of ADP only (new Figure 1 – figure supplement 2A). We did not observe any activity, demonstrating that ClpL activity depends on ATP binding and hydrolysis (see also Figure 3 – figure supplement 1D: ATPase-deficient ClpL-E197A/E530A lacks disaggregation activity).

      (2) The authors suggest that a reduction in disaggregase activity observed in samples combining Lm ClpL and KJE (Figure 1C, supp. 1C-E) could be due to competition for protein aggregate binding as observed previously with ClpG. Did the authors test this directly by pulldown assay or another interaction-based assay? While ClpL and ClpG appear to work in a similar manner, it would be good to confirm this. Also, clarification on how this competition operates would be useful. Is it that ClpL prevents aggregates from interacting with KJE, or vice versa?

      We probed for binding of ClpL to aggregated Malate Dehydrogenase in the presence of L. monocytogenes or E. coli Hsp70 (DnaK + respective J-domain protein DnaJ) by a centrifugation-based assay. Here, we used the ATPase-deficient ClpL-E197A/E530A (ClpL-DWB) mutant, ensuring stable substrate interaction in the presence of ATP. We observed reduced binding of ClpL-DWB to protein aggregates in the presence of DnaK/DnaJ (new Figure 1 – figure supplement 2G). This finding indicates that both chaperones compete for binding to aggregated proteins and explains the inhibition of ClpL disaggregation activity in the presence of Hsp70.

      (3) Related to the above, while incubation of aggregated substrates with ClpL and KJE does appear to reduce aggregase activity towards GFP (Figure 1c), α-glucosidase (Supp. 1C), and MDH (Supp. 1D), this doesn't appear to be the case towards luciferase (Figure 1b, Supp. 1b). Furthermore, ClpL aggregase activity is reduced towards luciferase when combined with E. coli KJE (Supp. 1e) but not with Lm KJE (Figure 1b). The authors provide no commentary or explanation for these observations. Furthermore, these results complicate the concluding statement that "combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity ... ".

      We suggest that the differing degrees of inhibition of ClpL disaggregation activity by the KJE system reflect diverse binding affinities of KJE and ClpL for the respective aggregates. While we usually observe strong inhibition of ClpL activity in the presence of KJE, this is different for aggregated Luciferase. This points to specific structural features of Luciferase aggregates or the presence of distinct binding sites on the aggregate surface that favour ClpL binding. We have added a corresponding comment to the revised manuscript.

      The former statement that “combining ClpL with Lm KJE always led to a strong reduction in disaggregation activity” referred to aggregated GFP, MDH and α-Glucosidase for which a strong inhibition of ClpL activity was observed. We have specified this point.

      Figures 1D and 1E:

      (1) The authors conclude that the heat sensitivity of ΔClpL L. gasseri cells is because they do not express the canonical ClpB disaggregase. A good test to validate this would be to express KJE/ClpB in these Lg ΔClpL cells to see if heat-sensitivity could be fully or partially rescued.

      We agree that such an experiment would further strengthen the case for an in vivo function of ClpL as an alternative disaggregase. However, such an approach would require co-expression of E. coli ClpB with the authentic E. coli DnaK chaperone system (KJE), as ClpB and DnaK cooperate in a species-specific manner [2-4]. This makes the experiment challenging, also because the individual components need to be expressed at the correct stoichiometry. Furthermore, the presence of the authentic L. gasseri KJE system, which is likely to compete with the E. coli KJE system for aggregate binding, will hamper E. coli KJE/ClpB disaggregation activity in L. gasseri. In view of these limitations, we would like to refrain from conducting such an experiment.

      (2) The rationale for investigating Lg ClpL, and the aggregase activity assays are compelling and support the hypothesis that ClpL contributes to thermotolerance in multiple Gram-positive species. Though, from Figure 1d, why was only Lg ClpL investigated? It appears that S. thermophilus also lacks the canonical ClpB disaggregase and demonstrates ΔClpL heat sensitivity. There is also other Lactobacillus sp. presented that lack ClpB but were not tested for heat sensitivity. Why only test and move forward with L. gasseri? Lastly, L. mesenteroides is ClpB-negative but doesn't demonstrate ΔClpL heat sensitivity. Why?

      We wanted to document high, partner-independent disaggregation activity for another ClpL homolog. We chose L. gasseri, as (i) this bacterial species lacks a ClpB homolog and (ii) a ∆clpL mutant exhibits reduced survival upon severe heat shock (thermotolerance phenotype), which is associated with defects in cellular protein disaggregation. The characterization of L. gasseri ClpL as a potent disaggregase in vitro represents a proof of concept and allows us to generalize our conclusion. We therefore did not further test S. thermophilus ClpL. L. mesenteroides encodes ClpL but not ClpB; however, to the best of our knowledge, a ∆clpL mutant has not yet been characterized in this species. As we wanted to link ClpL in vitro activity with an in vivo phenotype, we did not characterize L. mesenteroides ClpL.

      We agree with the reviewer that the characterization of additional ClpL homologs is meaningful and interesting; however, we strongly believe that such an analysis should be part of an exhaustive and independent study.

      Figures 2A and 2B:

      (1) Figure 2B demonstrates that both ClpL and ClpG, but not the canonical KJE/ClpB, are able to unfold YFP during the luciferase disaggregation process, suggesting that ClpL and ClpG exhibit stronger threading activity. A technical question, can luciferase activity be measured alongside in the same assay sample? If so, would you expect to observe a concomitant increase in luciferase activity as YFP fluorescence decreases?

      KJE/ClpB can partially disaggregate and refold aggregated Luciferase-YFP without unfolding YFP during the disaggregation reaction [5]. YFP unfolding is therefore not linked to refolding of aggregated Luciferase-YFP. On the other hand, unfolding of YFP during disaggregation can hamper the refolding of the fused Luciferase moiety, as observed for the AAA+ protein ClpC in the presence of its partner MecA [5]. These diverse effects make the interpretation of Luciferase-YFP refolding experiments difficult, as the degree of YFP unfolding activity does not necessarily correlate with the extent of Luciferase refolding. We therefore refrained from performing the suggested experiment.

      Figure 2C and 2D:

      (1) Thermal shift assays for ClpL, ClpG, and DnaK were completed with various nucleotides. Were these experiments also completed with samples in their nucleotide-free apo state? Also, while all these chaperones are ATPases, the nucleotides used differ, but no explanation is provided. Comparison should be made of these ATPases bound to the same molecules.

      We did not monitor the thermal stabilities of chaperones without nucleotide, as such a state is likely not relevant in vivo. We used ATPγS in the case of ClpL to keep the AAA+ protein in the ATP conformation. ATP would be rapidly converted to ADP due to the high intrinsic ATPase activity of ClpL. In the case of DnaK, ATPγS cannot be used as it does not induce the ATP conformation [6]. The low intrinsic ATPase activity of DnaK allows the thermal stability of its ATP conformation to be determined in the presence of ATP. This is confirmed by the reduced thermal stability determined for ADP-bound DnaK.

      (2) The authors suggest that incubation at 55°C will cause unfolding of Lm DnaK, but not ClpL, providing ClpL-positive Lm cells disaggregase activity at 55°C. While the thermal shift assays in Figures 2C and 2D support this, an experiment to test this would be to heat-treat Lm DnaK and ClpL at 55°C then test for disaggregase activity using either aggregated luciferase or GFP as in Figure 1.

      We followed the suggestion of the reviewer and incubated Lm ClpL and DnaK at 55-58°C in the presence of ATP for 15 min prior to their use in disaggregation assays. We compared the activities of pre-heated chaperones with controls that were incubated at 30°C for 15 min. Notably, we did not observe a loss of DnaK disaggregation activity, suggesting that thermal unfolding of DnaK at this temperature is reversible. We provide these data as Figure 2 – figure supplement 1 and added a corresponding statement to the revised manuscript.

      Figure 3B:

      (1) The authors state that ATPase activity of ΔN-ClpL was "hardly affected", but from the data provided it appeared to result in an approximate 35% reduction. As discussed above, no stats are provided for this figure, but given the error bars, it is highly likely that this reduction is significant. Please perform this statistical test, and if significant, please reflect this in the written results as well as the figure. Lastly, if this reduction in ATPase activity is significant, why would this be so, and could this contribute to the reduction in aggregase activity towards luciferase and MDH observed in Figure 3A?

      We applied statistical tests as suggested by the reviewer, showing that the reduction in ATPase activity of ∆N-ClpL is statistically significant. N-terminal domains of Hsp100 proteins can modulate ATPase activity, as shown for the family member ClpB, where the NTD functions as an auxiliary regulatory element for fine-tuning of ClpB activity [7]. We speculate that the impact of the ClpL NTD on the assembly state (stabilization of ClpL ring dimers) might affect ClpL ATPase activity. We would like to point out that other ClpL mutants (e.g. NTD mutant ClpL-Y51A; MD mutant ClpL-F354A) have a similarly reduced ATPase activity, yet exhibit substantial disaggregation activity (approx. 2-fold reduced compared to ClpL wildtype). In contrast, ∆N-ClpL does not exhibit any disaggregation activity. This suggests that the loss of disaggregation activity is caused by a substrate binding defect and not by a partial reduction in ATPase activity. We added a comment on the reduced ATPase activity and also discuss its potential reasons in the discussion section.

      (2) I think the authors' conclusion that deletion of the ClpL NTD does not contribute to structural defects of ClpL is premature given the apparent reduction in ATPase activity. Did the authors perform any biophysical analysis of ΔN-ClpL to confirm this conclusion? Thermal shift assays, Native-PAGE, or size-exclusion chromatography for aggregates would all be good assays to demonstrate that the wild-type and ΔN-ClpL have similar structural properties. Surprisingly, Figure 6 describes significant macromolecular changes associated with ΔN-ClpL such that it preferentially forms a dimer of rings. Furthermore, in Supp. Figure 6D the authors report that ΔN-ClpL appears to have an increased Tm as compared to WT- or ΔM-ClpL. The authors should reflect these observations as deletion of the ClpL NTD does appear to contribute to structural changes, though perhaps only at the macromolecular scale, i.e. dimerization of the rings.

      We have characterized the oligomeric state of ∆N-ClpL by size exclusion chromatography (Figure 6 – figure supplement 1A) and negative-stain electron microscopy (Figure 6C), both showing that it forms assemblies similar to ClpL wildtype. We did not observe an increased tendency of ∆N-ClpL to form aggregates, and the protein remained fully soluble after several cycles of thawing and freezing. EM data reveal that ∆N-ClpL exclusively forms ring dimers, suggesting that the NTDs destabilize MD-MD interactions. The stabilized interaction between two ∆N-ClpL rings can explain the increased thermal stability (Figure 6 – figure supplement 1D). We speculate that the ClpL NTDs affect MD-MD interactions either through steric hindrance or by directly contacting MDs. We have added a corresponding statement to the discussion section.

      Figure 3C and 3D:

      (1) Given the larger error in samples expressing ClpG (100) or ClpL (100) statistical analysis with p-values is required to make conclusions regarding the comparison of these samples vs. plasmid-only control. The effect of ΔN-ClpL vs. wild-type ClpL looks compelling and does appear to attenuate the ClpL-induced thermotolerance. This is nicely demonstrated in Figure 3D.

      We quantified the respective spot tests (new Figure 3E) and tested for statistical significance as suggested by the reviewer. We show that restoration of heat resistance is significant for the first 30 min. While we always observe rescue at later timepoints, significance is lost here due to larger deviations in the number of viable cells and thus in the degree of complementation.

      Figure 3F:

      (1) What is the role of the ClpB NTD? It appears to be dispensable for disaggregase activity, assuming that ClpB is co-incubated with KJE. A quick explanation of this domain in ClpB could be useful.

      The ClpB NTD is not required for disaggregation activity, as ClpB is recruited to protein aggregates by DnaK, which interacts with the ClpB MDs. Still, two functions have been described for the ClpB NTD. First, it can bind soluble unfolded substrates such as casein [8]. This substrate binding function can increase ClpB disaggregation activity towards some aggregated model substrates (e.g. Glucose-6-phosphate dehydrogenase) [9]. However, NTD deletion usually does not decrease ClpB disaggregation activity and can even lead to an increase [7, 10, 11]. An increased disaggregation activity of ∆N-ClpB correlates with an enhanced ATPase activity, which is explained by NTDs stabilizing a repressing conformation of the ClpB MDs, which function as main regulators of ClpB ATPase activity [7]. We added a short description on the role of the ClpB NTD to the respective results section.

      (2) The result of fusing the ClpL NTD to ClpB supports a role for this NTD in promoting autonomous disaggregase activity. What would you expect to observe if the fused Ln-ClpB protein was co-incubated with KJE? Would this further promote disaggregase activity, or potentially impair through competition? This experiment could potentially support the authors' hypothesis that ClpL and ClpB/KJE can compete with each other for aggregated substrates as suggested in Figure 1.

      We have performed the suggested experiment using aggregated MDH as model substrate. We did not observe an inhibition of LN-ClpB disaggregation activity in the presence of KJE. In contrast, ClpL disaggregation activity towards aggregated MDH is inhibited upon addition of KJE due to competition for aggregate binding (Figure 1 – figure supplement 2D/F). Disaggregation activity of LN-ClpB in the presence of KJE can be explained by functional cooperation between both chaperone systems, which involves interactions between aggregate-bound DnaK and the ClpB MDs of the LN-ClpB fusion construct. We prefer to show these data only in the response letter and not to include them in the manuscript, as the respective results distract from the main message of the LN-ClpB fusion construct: the ClpL NTD functions as an autonomous aggregate-targeting unit that can be transferred to other Hsp100 family members.

      Author response image 2.

      LN-ClpB cooperates with DnaK in protein disaggregation. Relative MDH disaggregation activities of indicated disaggregation systems were determined. KJE: DnaK/DnaJ/GrpE. The disaggregation activity of Lm ClpL was set to 1. Statistical analysis: one-way ANOVA, Welch’s test for post-hoc multiple comparisons. Significance levels: **p < 0.001. n.s.: not significant.

      Figures 4E and 4F:

      (1) While the effect of various NTD mutations follows a similar trend in regard to the impairment of ClpL-mediated disaggregation of luciferase and MDH, the degree of these effects does appear different. For example, patch A and C mutations reduce ClpL disaggregase activity towards luciferase (~60% / 50% reduction) vs. MDH (>90%) respectively. While these results do suggest a critical role for residues in patches A and C of ClpL, these substrate-specific differences are not discussed. Why would we expect a difference in the effect of these patch A/C ClpL mutations on different substrates?

      We speculate that the aggregate structure and the presence or distribution of ClpL NTD binding sites differ between aggregated Luciferase and MDH. A difference between both aggregated model substrates was also observed when testing for an inhibitory effect of Lm KJE (and Ec KJE) on ClpL disaggregation activity (see comment above). We speculate that the mutated NTD residues make specific contributions to aggregate recognition. The severity of binding defects (and reduction of disaggregation activities) of these mutants will depend on specific features of the aggregated model substrates. We now point out that ClpL NTD patch mutants can differ in disaggregation activities depending on the aggregated model substrate used and refer to potential differences in aggregate structures.

      (2) The authors suggest that the loss of disaggregation activity of selected NTD mutants could be linked to reduced binding to aggregated luciferase. While this is likely given that these mutations do not appear to affect ATPase activity (Supp. 4), it could be possible that these mutants can still bind to aggregated luciferase and some other mechanism may impair disaggregation. A pull-down assay would help to prove whether reduced binding is observed in these NTD ClpL mutants. This also needs to be confirmed for Supp. Figure 4.2H.

      We have shown a strong correlation between loss of aggregate binding and disaggregation activity for several NTD mutants (Fig. 4G, Figure 4 – figure supplement 2H). We decided to perform the aggregate binding assay only with mutants that show a full rather than a partial disaggregation defect, as in our experience the centrifugation-based assay provides clear and reproducible results for loss-of-activity mutants but has limitations in revealing differences for partially affected mutants. This might be explained by the use of non-hydrolyzable ATPγS in these experiments, which strongly stabilizes substrate interactions, potentially masking partial binding defects. We agree with the reviewer that some ClpL NTD mutants might have additional effects on disaggregation activity, e.g. by controlling substrate transfer to the processing pore site. We have added a corresponding comment to the revised manuscript.

      (3) Supp. Figure 4.2H has no description in the figure legend. The Y-axes states % aggregate bound to chaperone. How was this measured? See the above comments for Figures 4E and 4F.

      We apologize for the omission and have added the description to the figure legend. The determination of % aggregate-bound chaperone is based on the quantification of chaperones present in the supernatant and pellet fractions after sample centrifugation. Background levels of chaperones in the pellet fractions in the absence of protein aggregates were subtracted. We added this information to the materials and methods section.
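
      The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: band intensities for the pellet and supernatant fractions are taken as given, and the background is expressed as the pellet fraction measured in a control without aggregates. The function name, parameter names and example numbers are hypothetical and are not taken from the manuscript.

```python
def percent_bound(pellet, supernatant, background_pellet_fraction=0.0):
    """Return the background-corrected percentage of chaperone in the pellet.

    pellet, supernatant: quantified chaperone signal in each fraction.
    background_pellet_fraction: assumed pellet fraction (0 to 1) measured
    without protein aggregates; it is subtracted before conversion to percent.
    """
    total = pellet + supernatant
    if total <= 0:
        raise ValueError("total chaperone signal must be positive")
    bound_fraction = pellet / total - background_pellet_fraction
    # Clamp at zero so that noise cannot yield a negative percentage.
    return max(0.0, 100.0 * bound_fraction)


# Placeholder signals in arbitrary units, for illustration only.
with_aggregates = percent_bound(60.0, 40.0, background_pellet_fraction=0.1)
no_binding = percent_bound(5.0, 95.0, background_pellet_fraction=0.1)
```

      Clamping at zero is a design choice made for this sketch; whether negative values after background subtraction were reported or set to zero is not stated in the response.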

      Figure 6G:

      The authors observed reduced disaggregase activity and ATPase activity of mutant T355C under both oxidative and reducing conditions. While this observation under oxidative conditions supports the authors' hypothesis, under reducing conditions (+DTT) we would expect the enzyme to behave similarly to wild-type ClpL unless this mutation has other effects. Can the authors please comment on this and provide an explanation or hypothesis?

      The reviewer is correct: ClpL-T355C exhibits reduced disaggregation activity (Figure 6 – figure supplement 2B). We observe a similar reduction in disaggregation activity for the ClpL MD mutant F354A, pointing to an auxiliary function of the MD in protein disaggregation. We have made a corresponding comment in the discussion section of the revised manuscript. How exactly ClpL MDs support protein disaggregation is currently unclear and will be the subject of future analysis in the lab. We strongly believe that such an analysis should be part of an independent study.

      Discussion:

      In the fourth feature, it is discussed that one disaggregase feature of ClpL is that it does not cooperate with the ClpP protease. While a reference is provided for the canonical ClpB, no data in this paper, nor a reference, is provided demonstrating that ClpL does not interact with ClpP. As discussed, it is highly unlikely that ClpL interacts with ClpP given that ClpL does not contain the IGL/F loops that mediate the interaction of ClpP with cochaperones, such as ClpX, but data or a reference is needed to make such a factual statement.

      The absence of the IGL/F loop makes an interaction between ClpL and ClpP highly unlikely. However, the reviewer is correct: direct evidence for a ClpP-independent function of ClpL, though very likely, is not provided. We have therefore rephrased the respective statement: “Fourth, novel disaggregases lack the specific IGL/F signature motif, which is essential for cooperation of other Hsp100 proteins with the peptidase ClpP. This feature is shared with the canonical ClpB disaggregase [12], suggesting that protein disaggregation is primarily linked to protein refolding.”

      References

      (1) Katikaridis P, Simon B, Jenne T, Moon S, Lee C, Hennig J, et al. Structural basis of aggregate binding by the AAA+ disaggregase ClpG. J Biol Chem. 2023:105336.

      (2) Glover JR, Lindquist S. Hsp104, Hsp70, and Hsp40: A novel chaperone system that rescues previously aggregated proteins. Cell. 1998;94:73-82.

      (3) Krzewska J, Langer T, Liberek K. Mitochondrial Hsp78, a member of the Clp/Hsp100 family in Saccharomyces cerevisiae, cooperates with Hsp70 in protein refolding. FEBS Lett. 2001;489:92-6.

      (4) Seyffer F, Kummer E, Oguchi Y, Winkler J, Kumar M, Zahn R, et al. Hsp70 proteins bind Hsp100 regulatory M domains to activate AAA+ disaggregase at aggregate surfaces. Nat Struct Mol Biol. 2012;19:1347-55.

      (5) Haslberger T, Zdanowicz A, Brand I, Kirstein J, Turgay K, Mogk A, et al. Protein disaggregation by the AAA+ chaperone ClpB involves partial threading of looped polypeptide segments. Nat Struct Mol Biol. 2008;15:641-50.

      (6) Theyssen H, Schuster H-P, Bukau B, Reinstein J. The second step of ATP binding to DnaK induces peptide release. J Mol Biol. 1996;263:657-70.

      (7) Iljina M, Mazal H, Goloubinoff P, Riven I, Haran G. Entropic Inhibition: How the Activity of a AAA+ Machine Is Modulated by Its Substrate-Binding Domain. ACS chemical biology. 2021;16:775-85.

      (8) Rosenzweig R, Farber P, Velyvis A, Rennella E, Latham MP, Kay LE. ClpB N-terminal domain plays a regulatory role in protein disaggregation. Proc Natl Acad Sci U S A. 2015;112:E6872-81.

      (9) Barnett ME, Nagy M, Kedzierska S, Zolkiewski M. The amino-terminal domain of ClpB supports binding to strongly aggregated proteins. J Biol Chem. 2005;280:34940-5.

      (10) Beinker P, Schlee S, Groemping Y, Seidel R, Reinstein J. The N Terminus of ClpB from Thermus thermophilus Is Not Essential for the Chaperone Activity. J Biol Chem. 2002;277:47160-6.

      (11) Mogk A, Schlieker C, Strub C, Rist W, Weibezahn J, Bukau B. Roles of individual domains and conserved motifs of the AAA+ chaperone ClpB in oligomerization, ATP-hydrolysis and chaperone activity. J Biol Chem. 2003;278:15-24.

      (12) Weibezahn J, Tessarz P, Schlieker C, Zahn R, Maglica Z, Lee S, et al. Thermotolerance Requires Refolding of Aggregated Proteins by Substrate Translocation through the Central Pore of ClpB. Cell. 2004;119:653-65.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Major concerns.

      -The experimental details on the electron microscopy data and more specifically on the processing are too minimal. Because of the missing pieces of information, the data cannot be trusted in its current state. The authors should explain how they processed the data: number of particles, software used, 3D reconstruction algorithms etc. For instance, they do not mention anything about the final resolution and whether they tried to improve it. What is the dimension of the boxes used for 2D classes and 3D reconstruction? Besides, the resulting 3D volumes should be displayed at different orientations or, at least, in a movie so one can see whether the modelled data actually fits into the 3D volume in various orientations. Have the authors tried cryo-EM to improve the resolution of the data? Have they generated 3D classes? Also they should comment on why the resolution is rather low.

      Thank you for your valuable feedback on our work. We appreciate your suggestions for improvement and agree that we could provide more detailed information on the experimental details of our electron microscopy data. To address your concerns, we have provided additional information on the processing of the data in the revised manuscript.

      Regarding the use of cryo-EM, we attempted to use this technique to determine the structure of autoinhibited kinesin-1. Unfortunately, we encountered challenges in getting the kinesin-1 to behave well on the grids, which prevented us from obtaining meaningful results.

      -The report goes back and forth from focusing on KIF5B then KIF5C and back to KIF5B. It is thus confusing for the reader and the rationale for highlighting a specific isoform is not clear. Hence the authors should perform similar analysis for both isoforms. Specifically the AlphaFold deep learning modeling should also be performed using KIF5C in parallel with the analysis performed on KIF5B.

      Thank you for your feedback on our manuscript. We apologize for any confusion caused by the shifting focus between KIF5B and KIF5C. KIF5B and KIF5C are both kinesin-1 isoforms and are expected to have high structural similarity.

      In our current manuscript, we performed AlphaFold structure prediction on both KIF5B and KIF5C stalks and found that they adopt the same structure. Furthermore, the XL-MS data suggest that KIF5B and KIF5C exhibit similar crosslinking patterns. We therefore chose to model KIF5B in this case.

      For the kinesin-1 tetramer, we repeated XL-MS on KIF5B-KLC1 and KIF5C-KLC1 (Author response images 1 and 2) to confirm the analysis in our manuscript. Both datasets showed that KIF5B-KLC1 and KIF5C-KLC1 have a similar folding pattern. The differences between the two are: (1) the crosslinks within KIF5B are sparse compared to those within KIF5C; (2) there are fewer crosslinks between KIF5B and KLC1 than between KIF5C and KLC1. These differences will need further investigation. Given that there are more crosslinks in KIF5C-KLC1, we chose to model KIF5C-KLC1 in our manuscript.

      Author response image 1.

      Crosslinked lysine pairs in KIF5B-KLC1 were mapped onto the domain diagram.

      Author response image 2.

      Crosslinked lysine pairs in KIF5C-KLC1 were mapped onto the domain diagram.

      -The proportion of compact versus extended form for KIF5B and KIF5C differs. It seems that KIF5B has a higher proportion of compact conformations both as homodimers and heterotetramers? Can the authors comment on this and suggest any possible molecular argument which would induce this difference? Can the authors comment on this discrepancy? What would induce any extended form given that the wild type constructs should be compact only? Is there any equilibrium in solution between the two conformations?

      Thank you for your comments on our manuscript. We appreciate your observation that the proportion of compact versus extended form for KIF5B and KIF5C appears to differ. We did observe that KIF5B has a higher proportion of compact conformations both as homodimers and heterotetramers. We have updated our main text and commented on this difference. We do not have a definitive explanation for this difference, but one possibility is that the differences in the sequence of the two isoforms may contribute to their differential propensities for compact versus extended conformations. It is possible that there is an equilibrium between the two conformations, but we did not explicitly investigate this in our study.

      • In Figure 1.C, lower panel, the "extended" conformation does not appear as extended as stated in the text, looking at the negative stain image. In particular, the one on the bottom right look rather compact, instead. The resulting graph shown in Figure 1.E seems a bit off as compared with the images. How were the measurements performed to generate figure 1.E? Were all the particles selected for measurement or were only some of them picked or were the measurements done using class averages? In the same line, the authors should show class averages of the extended conformation as well.

      Thank you for your feedback on our manuscript. We appreciate your comments on the presentation of our data in Figure 1C. We agree that some kinesin molecules may not appear as extended in the negative stain images as we stated in the text. For EM sample preparation, we took the fraction corresponding to the extended conformation, crosslinked it with BS3, and then examined it by EM. The compact kinesin-1 molecules could arise from molecules that aggregated during the crosslinking process.

      Regarding the measurement, we measured the lengths of individual molecules that clearly resembled KIF5B in the raw micrographs. Molecules that showed any sign of aggregation were not measured. As for class averages of the extended state, given that the extended molecule is about 80 nm in length and very flexible, it would be hard to obtain meaningful averages. We have updated the methods section to include this measurement method.

      -In figure 2B, the EM envelope does not accommodate the CC1 domain which extends way beyond the contour of the 3D volume and thus suggest that the modeling and/or the 3D EM reconstruction is not correct. Also the authors do not comment at all on this even though this is a striking feature. The CC1 might thereby be less disorganized or more flexible than expected by the model.

      Thank you for your feedback on our manuscript, particularly with regard to Figure 2B. We appreciate your observation that the EM envelope does not accommodate the CC1 domain, which extends beyond the contour of the 3D volume. We agree that this is a striking feature that may suggest that the modeling and/or the 3D EM reconstruction is not entirely correct. We have added comments regarding this feature in the main text. However, given the current data, we could not generate a better model to describe the structure of CC1 besides using results from the AlphaFold prediction.

      -The so called "C-shaped" feature on the class averages (Fig 3D) does not stand out clearly on all of the class averages. It is visible on the right hand panels but not visible on the left hand side. What is the proportion of classes and thus of the dataset which clearly displayed this peculiar C-shaped feature?? Can the authors analyze this?

      Thank you for your feedback on our manuscript, particularly with regard to Figure 3D. We acknowledge your observation that the "C-shaped" feature is not clearly visible on all of the class averages. We believe that it could be due to the different orientations of the class averages. We have revised our main text to comment on this.

      -The different mutants were subjected to motility assays. However, mutations/truncations could strongly affect their structural features and conformation. The authors should thus, at least for some of them, check their global ultrastructure using electron microscopy, for instance, and 2D class averaging. In particular, it would be worthwhile testing how different mutations induce any transition from a compact to an extended state. Besides, it is not specified whether the truncated mutants are homo-dimeric or monomeric.

      Thank you for your valuable feedback on our manuscript, particularly with regard to the motility assays conducted on the different mutants. All the KIF5B mutants should be homodimers, like WT KIF5B. We agree that it would be beneficial to examine some of the mutants by EM to assess their conformation. However, due to time constraints, we were unable to perform these analyses.

      Minor concerns

      • Does AlphaFold generate several possible models? Can a selection of those be displayed at least in the supplementary material so the reader can understand how any given model is selected? A short introduction on the AlphaFold methodology, on how the different obtained structures compare with one another, and ultimately on how the best structure is selected would be helpful.

      Yes, AlphaFold generates several possible models during the protein structure prediction process. These models are ranked based on their confidence scores, which reflect the degree of certainty with which AlphaFold has predicted each model. In our study, we chose the model with the highest score, while we noticed that the top 5 models from the AlphaFold prediction generally tend to be very similar in the case of the kinesin-1 structure prediction. We have updated the text in the method section to help the reader appreciate our approach.

      -When expressing the hetero-tetramers, do the authors generate homodimers as well? If so, can they estimate the relative proportion of all the possible populations?

      We used the MultiBac expression system to co-express the kinesin heavy chain and light chain in Sf9 cells. We believe that the hetero-tetramers should account for the majority of products, though we cannot rule out the formation of homodimers.

      -The motility assays should be better described.

      We have added more text to describe the assay.

      -The report does not discuss whether any combinations of isoforms (for instance KIF2B-KIF2C) could assemble into a complex and whether it has already been observed in cells?

      We believe that you are asking whether KIF5B and KIF5C form a heterodimer. We are not aware of any previous report on this and have not tested this possibility.

      -The authors should discuss why they do not obtain the same results as Kaan et al (2011). For instance, would the experimental conditions be responsible for the discrepancies observed?

      In the study done by Kaan et al (2011), their structures showed that kinesin-1 motor domains crystallized with a tail peptide holding the motors in an immotile conformation, which supports the model of kinesin-1 autoinhibition where the C-terminal tail of kinesin-1 drives autoinhibition to block motility. However, there are several limitations regarding this study as we mentioned in our manuscript. First, the authors used truncated kinesin heavy chains that only include the motor domain and the neck coil instead of the full length protein. Second, the crystal structure was obtained by adding the tail peptide in trans. Thus, how kinesin-1 folds into an autoinhibited state remains poorly understood, severely limiting our understanding of kinesin-1 regulation.

      Our model confirms the critical role of the tail domain, in agreement with the study by Kaan et al (2011). We observe that the tail domain lies very close to the motor heads, which is consistent with what was reported by Kaan et al (2011). However, due to the lack of sufficient lysine residues and the unstructured nature of the tail domain, we could not resolve the exact conformation of the tail domain.

      We have addressed the question in our discussion section regarding the tail domain and IAK motif.

      -A final schematic model would be beneficial to support the model and could be inserted within the discussion section.

      We have added a final model figure as Figure 7 in the discussion section.

      -The authors should discuss why the shortest mutant is the most active in the motility assay and how this compares with the full length protein in vivo? Can full-length kinesin1 reach similar motility?

      The shortest mutant KIF5B(1-420) only contains the motor domain and CC1, without any regulatory elements to lock it into the inhibited state. It should reflect the intrinsic biophysical property of the kinesin-1 motor domain on the microtubules. We have revised our main text to include this point. However, kinesins in cells are all full length proteins and are subjected to multiple layers of regulation. It would be hard to make the comparison between full length kinesins in vivo and the shortest mutant KIF5B(1-420).

      -Have the authors attempted to obtain the structure of a TRAK1-kinesin-1 complex, for instance by electron microscopy? Will they consider addressing the structure of such full complexes to see whether the protein-protein interactions they infer are indeed reflected within the complexes?

      Yes, we tried to examine the TRAK1-KIF5B complex using negative staining EM. However, due to the flexibility of the TRAK1-KIF5B complex and the low contrast of the TRAK1 protein under negative staining EM, we could not get meaningful results.

      -Can the authors test kinesin-TRAK1 complexes in motility assays?

      There are already two studies (Canty et al., 2021, Henrichs et al., 2020) that confirmed that TRAK1 can activate the motility of kinesin-1, which we cited in our manuscript. Therefore, we did not test it in our studies.

      Reviewer 2

      -The lack of crosslinks seems to be interpreted as the lack of interactions, but this is not necessarily the case. Also, BS3 crosslinks mainly amino groups that are about 25 Å apart, which gives a readout of proximity rather than interactions. How many times were the crosslinking experiments done? In Figure 6, there are not many crosslinks for TRAK and kinesin-1, so it would be good to know if it has been repeated.

      The numbers of XL-MS experiments performed for each sample are: KIF5B (three times), KIF5C (once), KIF5B-KLC1 (twice), KIF5C-KLC1 (twice), KIF5B(1-562) (once), KIF5B-TRAK1 (once) and KIF5B(IAK/AAA) (once). We have added this information to the method section for the XL-MS.

      For the kinesin-1 heterotetramers, we repeated XL-MS on KIF5B-KLC1 and KIF5C-KLC1 (Author response images 1 and 2) to validate the analysis in our manuscript; the results are consistent with those in our manuscript. For the XL-MS experiment on the KIF5B-TRAK1 complex, due to time limitations, we performed it only once but would like to explore it further in the future.

      We summarized identified cross-linked pairs for each kinesin-1 sample as supplementary files.

      -Regarding the interaction between TRAP and Kif5b, the authors propose that TRAP activates Kif5b by disrupting the autoinhibited conformation, based on the lack of crosslinks and the position of the crosslinks identified. What does Kif5b+TRAP (after or before crosslinking) look like by negative stain EM? The authors have done this experiment for the other samples, Kif5b and Kif5b-KLC, so it should be easy for the authors to do this for Kif5b-TRAP. Also, can AlphaFold-Multimer predict the Kif5b-TRAP interface?

      Thanks for bringing this up. We tried to obtain EM images of the TRAK1-KIF5B complex. We observed that KIF5B alone and the TRAK1-KIF5B complex tend to fall apart if they are not crosslinked before being applied to the grids. For the crosslinked samples, we were unable to see TRAK1 clearly on KIF5B due to the flexibility of the TRAK1-KIF5B complex and the low contrast of the TRAK1 protein under negative staining EM. We would like to explore this further.

      As for the AlphaFold prediction on KIF5B-TRAK1 complex, we found that AlphaFold did not perform well in predicting the TRAK1 on kinesin-1 stalk. We tried the combination of various TRAK1 and KIF5B fragments, but could not get any meaningful results.

      -Figure 4. Very long crosslinks are not explained by the model, and suggest the model could be partially incorrect. Can the authors state the distance between the crosslinked residues in their model in figures? Generally the authors should report all crosslink distance in their figures with molecular models.

      Thanks for bringing this up. For the model building, we used the XL-MS data as guidance to model the autoinhibited kinesin-1 with the input from AlphaFold structure prediction and EM map. We assembled the model by piecing together multiple rigid kinesin-1 fragments generated from AlphaFold structure prediction as described in the method section.

      We realize that some crosslinked residues in our model have distances greater than the maximum distance allowed for the BS3 crosslinker, especially the crosslinked pairs between the TPR and motor domains. We acknowledge that our current model could be partially incorrect. Since we do not have high-resolution structural data on kinesin-1, we are unsure how to make our model satisfy all the distance constraints. We have addressed these limitations in our discussion section.
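
      The distance check discussed above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the residue labels and coordinates are hypothetical, and the 30 Å Cα-Cα cutoff is an assumed, commonly used tolerance for BS3 that the response itself does not state.

```python
import math

# Assumed Calpha-Calpha cutoff for BS3, in angstroms. This value is an
# assumption for illustration; the response does not state a cutoff.
BS3_MAX_CA_DISTANCE = 30.0


def check_crosslinks(ca_coords, crosslinks, cutoff=BS3_MAX_CA_DISTANCE):
    """Return (residue_a, residue_b, distance, satisfied) for each crosslink.

    ca_coords maps a residue label to its (x, y, z) Calpha position in
    angstroms; crosslinks is a list of (residue_a, residue_b) label pairs.
    """
    report = []
    for res_a, res_b in crosslinks:
        d = math.dist(ca_coords[res_a], ca_coords[res_b])
        report.append((res_a, res_b, round(d, 1), d <= cutoff))
    return report


# Hypothetical labels and coordinates, not taken from the kinesin-1 model.
ca_coords = {
    "A:K1": (0.0, 0.0, 0.0),
    "A:K2": (12.0, 16.0, 0.0),  # 20 A from A:K1
    "B:K3": (30.0, 40.0, 0.0),  # 50 A from A:K1
}
crosslinks = [("A:K1", "A:K2"), ("A:K1", "B:K3")]

for res_a, res_b, dist, ok in check_crosslinks(ca_coords, crosslinks):
    status = "within cutoff" if ok else "exceeds cutoff"
    print(f"{res_a} - {res_b}: {dist} A ({status})")
```

      Reporting the distance for every crosslinked pair, rather than only a pass/fail flag, is what the reviewer asks for and makes over-length pairs easy to spot.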

      -Figure 5: motility assays, the amount of data analyzed seems quite low. There are only 2 repeats done for each condition. The number of microtubules is reported rather than number of measurements done-can the authors report number of events/motors measured. It would be useful to have the concentration of motors used in the figure. Landing rate: are authors not differentiating motile vs non motile tracks also? What do the mutants look like in EM class averages?

      Thanks for bringing this up. We have revised our method section about the single molecule assay to include this information.

      Finally, we agree that it would be beneficial to check the mutants under EM. However, due to time limitations, we were unable to perform this experiment.

      -The figure in 6D needs revising. This does not look like a pulldown experiment, controls are missing and the proteins do not seem to be stoichiometric. In particular, the third lane. There are also no protein markers.

      Thank you for bringing this up. We revised Figure 6 and added the protocol for the pulldown assay in our method section for protein expression and purification.

      Minor points

      -Is the data available in PRIDE, etc...? Could the authors provide a table of xlinks?

      We have included crosslinked pairs detected in our XL-MS as supplementary files for KIF5B, KIF5C, KIF5B-KLC1, KIF5C-KLC1, KIF5B(1-565), KIF5B(IAK/AAA) and KIF5B-TRAK1. We have added a new section called Data Availability in the main manuscript to fully describe this.

      -It would be better to have the mapping of the crosslinks in the same figures as the corresponding crosslink map.

      Due to the layout of the figure, we chose to show the model and the mapped crosslinks in the same figure.

      -No crosslinks were obtained between the IAK motif and the motor domain. This could be due to the lack of neighbouring groups that can crosslink with the K in the motif, rather than the tail not binding/crosslinking to the motor. The text could be edited to explain this

      Thanks for bringing this up. We edited the text to add this point.

      -Figure 5. Typo in mutation

      We have revised Figure 5.

      -No hyphen between c and terminus (as that is a noun)

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      1) Here are a few sentences that could potentially benefit from further discussion, particularly in the context of the plant developmental framework of an effective germline. It is important to note that the idea of an effective germline is supported by many, but not all, scientists. Nevertheless, as long as this concept remains relevant, a discussion based on it may be appropriate.

      The early establishment of germlines during development is crucial in addressing the impact of somatic mutation on the next generation. To emphasize this aspect, we have included an additional sentence addressing this point in ll. 242–244.

      2) Lines 161-163: The suggestion that long-lived tropical trees do not necessarily suppress somatic mutation rates to the same extent as their temperate counterparts might warrant additional examination.

      We have revised our statement to present a more balanced perspective, and we have also included a sentence to emphasize the importance of conducting further studies in future.

      3) Lines 200-202: The observation of potential influences of GC-biased gene conversion during meiosis or biased purifying selection for C>T inter-individual nucleotide substitutions could be further elaborated upon.

      Our data does not provide enough information to delve into a more detailed discussion regarding GC-biased gene conversion during meiosis or biased purifying selection for C>T substitution. However, future studies that obtain genome sequences from somatic cells, male or female gametophytes, and offspring (such as seeds or seedlings) would offer opportunities to assess these phenomena.

      4) Line 245: The statement "somatic mutations can be transmitted to seeds" might be correct, but it would be helpful to explore the extent to which this occurs.

      In response to the comments from Reviewer 1 (#4) and Reviewer 2 (#16), we have decided to remove the discussion about the heritability of somatic mutations in the next generation. We have completely rewritten the final paragraph to discuss the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals.

      Reviewer #2

      5) l. 108- 115: The authors seem to have made a really great work at assembling and annotating two reference genomes. Even if this does not represent the main result of the manuscript, these genomic resources are a plus for the community, especially given that reference genomes from tropical trees are known to be underrepresented in the literature (e.g. Plomion et al. 2016). The authors have made the particular effort of generating two high-quality reference genome assemblies for two species of the same genus, including one with an excellent contiguity. Even if they do not explicitly indicate the divergence time between the two species, it is clear that the cheapest solution would have been to map the reads of the two species against a single assembly, but this could have generated some biases. So by generating two de novo assemblies, the authors have used here the best design possible to control for some potential biases for the detection of somatic mutations. However, given the interests these two assemblies represent by themselves, I consider that a couple of additional investigations could have been made on local synteny and orthologous genes in particular. Thanks to whole-genome alignments and orthology (e.g. Lovell et al. 2022), they could have generated more general information regarding the two assembles and investigated additional questions regarding mutations, e.g. mutations in collinear / non-collinear (if any) segments, intensity of purifying selection (or neutral evolution) at single vs. multiple copies or between shared vs. private genes, etc.

      To address the comment by Reviewer 2, we performed synteny analysis using MCScanX in TBtools-II and added Supplementary Figure 3 to illustrate the conserved synteny relationship between S. laevis and S. leprosula. Detecting selection in the genome will be the subject of a future study, as our current data are not sufficient for this aim because of the limited number of individuals (n = 2 for each species).

      6) l. 123-124. Here, the authors indicate that they have "validated" 93.9% of the mutations. It would be more accurate to indicate that they have "validated" 31/33 mutations (94%), 22/24 mutations on S1 and 9/9 on S2 (Table S5). Can the authors indicate why no somatic mutations from the F1 and F2 were tested? According to me, the use of the word "validation" is not totally accurate (see also Schmitt et al. 2022), since amplicon sequencing can be viewed as a kind of validation but it doesn't represent a complete validation since it represents new sequencing data that are mapped against the same reference assembly, in such a way that we could always imagine that the same biases are at play, leading to a similarly false positive call. Reciprocally, a "non-validated" mutation could be associated to a mutation that is at a too low allele frequency, at least after amplification, in such a way that the call is not heterozygous despite the fact that the mutation is real. I think that another terminology than "validated" could be used, plus one or two sentences explaining this degree of complexity.

      To improve the clarity of the statement, we have modified the sentence as follows: “We conducted an independent evaluation of a subset of the inferred single nucleotide variants (SNVs) using amplicon sequencing. Our analysis demonstrated accurate annotation for 31 out of 33 mutations (94% overall), with 22 out of 24 mutations on S1 and all 9 mutations on S2 (Supplementary Table 5).”
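
      As a quick arithmetic check, the counts quoted above reproduce both the 93.9% figure cited by the reviewer and the rounded 94% in the revised sentence. The sketch below uses only the counts given in the response; the function and variable names are illustrative.

```python
# Confirmed and tested SNV counts per tree, as quoted in the response.
amplicon_results = {"S1": (22, 24), "S2": (9, 9)}


def percent_confirmed(confirmed, tested):
    """Percentage of tested SNVs confirmed by amplicon sequencing."""
    return 100.0 * confirmed / tested


total_confirmed = sum(c for c, _ in amplicon_results.values())
total_tested = sum(t for _, t in amplicon_results.values())

for tree, (confirmed, tested) in amplicon_results.items():
    pct = percent_confirmed(confirmed, tested)
    print(f"{tree}: {confirmed}/{tested} = {pct:.1f}%")

overall = percent_confirmed(total_confirmed, total_tested)
print(f"overall: {total_confirmed}/{total_tested} = {overall:.1f}%")
```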

      While we did not conduct additional assessments using F1 and F2, we anticipate a similar high level of agreement between the somatic SNV calls and amplicon sequencing in these trees. We have included sentences in the Materials and Methods section to elucidate the challenges involved in validating true somatic mutations.

      7) l. 135-137 the reasoning appears to be quite circular to me. As indicated by the authors in the line just before, an incongruent pattern could also be explained biologically, in such a way that the overall congruency between the phylogenetic tree and the tree architecture cannot be considered as a way to prove the reliability of the detection. In some species, it seems clear that the phylogenetic tree do not seem to follow the plant architecture (Zahradnikova et al. 2020) in such a way that we should argue to not consider the plant architecture in the design and not consider this represents either a way to validate mutations or a way to validate the methodological framework. I suggest removing this sentence.

      We have removed the sentence as suggested by Reviewer 2.

      8) l. 150. It seems that the differences in length and diameter between the two species come from two different studies and therefore that no statistical test has been performed to test its significance.

      We agree with Reviewer 2. To clarify this point, we have replaced “significantly” with “substantially” in the revised text.

      9) l. 156-159: the same sentence is repeated twice.

      We have removed the repeated sentence.

      10) l. 159-161: Comparing somatic mutation rates between studies is difficult. It is too sensitive to the methodology used, here again see Schmitt et al. 2022. I propose to remove these two sentences. It represents an interesting working hypothesis but would require a better design, or at least, to reanalyze all the data with the same pipeline.

      We have toned down our statement, and added a sentence that additional studies are required to compare somatic mutation rates among trees in tropical, temperate, and boreal regions, employing standardized methodologies.

      11) l. 171-175: Here I am wondering if the authors could provide more information regarding the enrichment at CpG sites? I suggest first estimating the proportion of CpG sites thanks to the two genome assemblies and then using this information as a way to weight the results and therefore to estimate the level of enrichment of mutations at CpG sites.

      In response to the comment by Reviewer 2, we first determined the proportion of CpG sites as 0.030 and 0.028 for S. laevis and S. leprosula, respectively, based on the triplet matrix using the reference genome of each species. Subsequently, we estimated the proportion of somatic mutations at CpG sites. The results revealed a 4.54-fold and 3.53-fold increase in somatic mutations at CpG sites for S1 and S2, and a 3.38-fold and 2.56-fold increase for F1 and F2, respectively. We have incorporated this finding into ll. 172–175.
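
      One natural way to express the fold increase described above is to divide the fraction of somatic mutations that fall at CpG sites by the genome-wide proportion of CpG sites. The sketch below assumes that definition; the response does not spell out the exact calculation, and the mutation counts used here are hypothetical placeholders rather than the study's data.

```python
def cpg_fold_enrichment(n_cpg_mutations, n_total_mutations, genome_cpg_fraction):
    """Fraction of mutations at CpG sites divided by the genomic CpG fraction."""
    if n_total_mutations <= 0:
        raise ValueError("n_total_mutations must be positive")
    if not 0 < genome_cpg_fraction <= 1:
        raise ValueError("genome_cpg_fraction must be in (0, 1]")
    observed_fraction = n_cpg_mutations / n_total_mutations
    return observed_fraction / genome_cpg_fraction


# 0.030 is the CpG proportion reported above for S. laevis.
# The mutation counts (15 of 100) are hypothetical, for illustration only.
fold = cpg_fold_enrichment(
    n_cpg_mutations=15,
    n_total_mutations=100,
    genome_cpg_fraction=0.030,
)
print(f"{fold:.2f}-fold enrichment at CpG sites")
```

      A value of 1 would mean that mutations occur at CpG sites in proportion to their abundance in the genome; values above 1 indicate enrichment.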

      12) l. 176-187. Interesting comparison and insights. You could also indicate that SBS5 is detected in all human cancers too. So the detection of SBS1 and SBS5 signatures indeed suggests some shared mutation biases. Note that in humans, a specific signature of UV is associated with TCG -> TTG mutations (Martincorena & Campbell, 2015). It seems that there is a substantial difference in the mutation spectra between the two trees for this specific category; not sure if this difference could be associated with UV.

      We slightly modified the sentence to indicate that SBS5 is also detected in all human cancers. We are very interested in the potential impact of UV on somatic mutations in tropical trees, considering the high levels of UVR in the tropics. Conducting a comparative analysis of the mutational spectrum among trees inhabiting diverse UVR environments would provide valuable insights to substantiate this hypothesis.

      13) l. 206: I rather suggest "the somatic mutation rate per year is roughly the same, suggesting that somatic mutations rates are independent of growth rate".

      In response to the suggestion from Reviewer 2, we have revised the sentence as follows: "The somatic mutation rate per year remains largely consistent, indicating that somatic mutation rates are independent of the growth rate."

      14) l. 207-232: Here, the section looks like a mixture between a result and a discussion. I guess the authors consider that it remains a verbal model at this stage and therefore represents more of a discussion. If so, I agree, but it would be good to discuss this part further, in particular to know how this model could be improved and empirically tested.

      The argument based on the model will be more accurate when the cell cycle duration can be directly estimated for each tree. We have added this explanation in the revised text.

      15) l. 238-239: The parallel drawn with the molecular clock is interesting but according to me, it remains a working hypothesis at this stage, since it is not validated outside the two focal species. I encourage the readers to continue to work on this question and to investigate also some annual plants for instance in the future (assuming that they have a higher α) in order to be able to derive a global model. In addition, even if I consider that the authors use and interpret this parallel wisely, I consider that the use of this terminology could be misleading for some readers. That's why I also suggest removing "molecular clock" from the title and using a more explicit one, e.g. "Somatic mutation rates scale with time not growth rate in dipterocarp trees".

      We agree with Reviewer 2. We have changed the title to “Somatic mutation rates scale with time not growth rate in long-lived tropical trees.”

      16) l. 245-249: The results rather suggest that (i) there is little diversity due to somatic mutations and that (ii) most heritable non-synonymous mutations are deleterious and therefore purged from the population. So rather than this last section of this discussion that has little interest and could be quite debatable, I consider that the authors could extend their discussion, e.g. the differences with somatic mutations in mammals (recently, Cagan and coauthors (2022) demonstrated that somatic mutation rates are inversely correlated with lifespan in mammals) or the overall low rate of molecular evolution in trees could be some directions. But there are many others.

      We have completely rewritten the final paragraph to propose the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals, rather than discussing the heritability of somatic mutations in the next generation.

      17) l. 570-571: I guess, the reader should understand here "fixed at the heterozygous state"

      To avoid confusion, we have modified the text as follows: “If the alternative allele was present or absent in all eight branches in the amplicon sequence, the site was determined as fixed within an individual tree.” We have also removed “heterozygote” in Supplementary Figure 5.

      18) Fig. 4d. the y-axis would be easier to interpret by writing "Delta Inter-individual vs. Somatic SNPs" and/or by adding arrows on the right margin of the plot to indicate the directions with some short sentences such as "more somatic mutations observed than expected assuming the inter-individual comparison", "less somatic mutation than expected". According to me, some statistical tests are lacking here. Are the differences in the mutation spectra significant given the relatively limited amount of somatic mutations detected?

      We have added short sentences explaining the directions.

      19) Supplementary Tables (excel file): please correct the typos. There are many on these supplementary tables.

      We carefully checked supplementary tables and corrected the typos.

      Reviewer #3

      20) To estimate false negative rates, the authors might consider using mutation insertion tools such as Bamsurgeon (https://github.com/adamewing/bamsurgeon) to create simulated mutations. Alternatively, one could assess the calling rate of high-confidence SNPs that differ between individuals of the same species to get at the FNR.

      We agree with Reviewer 3. To calibrate our pipeline, we previously performed simulations to estimate the false negative and false positive rates in a different tree species (Betula platyphylla) using wgsim v0.1.11 (https://github.com/lh3/wgsim). Based on our simulations, we found that the false negative and false positive rates were very low, averaging 0.050 and 0.046, respectively. It is important to note that the estimated false positive rate obtained from the simulation data was substantially lower than the proportion of potential false positive SNVs (as shown in Supplementary Fig. 5). This observation suggests that simulation-based evaluation of the false positive rate is not reliable, at least for the tree species we studied. The same argument could be applied to the false negative rate. Therefore, we conclude that the simulation-based analysis for estimating false positive and false negative rates is not informative for our study.

      The rate of true-positive or false-negative mutation calls can be estimated only when the true mutational status is known, but the data are not currently available. However, under the assumption that the final set of SNVs represents true somatic mutations, we were able to calculate the potential false negative rate. Our findings indicate that this rate is low, specifically less than 10%, when using less stringent filtering thresholds such as BQ20 and MQ20. While these estimated values may not precisely represent the true false negative rate, we included them as potential false negative rates in Supplementary Figure 7 of the revised manuscript. This information provides additional insights into the performance of our pipeline under different filtering thresholds and contributes to the overall assessment of our study.
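
      As an illustration of the kind of calculation discussed above, the following minimal sketch compares a set of called variants with a set of known (for example, simulated) variants. The function name, the site keys and the toy data are hypothetical, and the two definitions used here (missed sites over all known sites, and unsupported calls over all calls) are assumptions, since the response does not state the exact formulas.

```python
def error_rates(known_sites, called_sites):
    """Compare called variants against known (e.g. simulated) variants.

    Both arguments are sets of hashable site keys such as (chrom, pos, alt).
    Returns (false_negative_rate, false_positive_proportion).
    """
    true_pos = len(known_sites & called_sites)    # known and called
    false_neg = len(known_sites - called_sites)   # known but missed
    false_pos = len(called_sites - known_sites)   # called but not known

    # Assumed definitions, not taken from the manuscript:
    #   false negative rate       = FN / (TP + FN)
    #   false positive proportion = FP / (TP + FP)
    fnr = false_neg / (true_pos + false_neg) if known_sites else 0.0
    fpp = false_pos / (true_pos + false_pos) if called_sites else 0.0
    return fnr, fpp


# Toy data (hypothetical): four known sites, one missed, one spurious call.
known = {("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A"), ("chr2", 900, "C")}
called = {("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A"), ("chr3", 7, "G")}

fnr, fpp = error_rates(known, called)
print(f"false negative rate: {fnr:.2f}, false positive proportion: {fpp:.2f}")
```

      The same comparison can be repeated for each filtering threshold (for example BQ20 and MQ20) to see how the estimates change.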

      21) It may be interesting to examine the mutation trees for constancy (or not) in mutation rate per meter. Examining Figure 1, it appears that the number of mutations near the crown "4" node is consistently higher than in nearby nodes (3-1 and 3-2).

      We calculated the branch-level increment of SNVs per meter by dividing the number of single nucleotide variations (SNVs) by the physical distance. Our analysis revealed a slight increase in the number of SNVs per meter as the branch position became higher in S. laevis, as shown in Author response table 1. However, this trend was not clearly observed in S. leprosula. We found this observation in S. laevis intriguing, particularly because our recent analysis (Tomimoto et al., in preparation) demonstrated that genetic distance increases in branch pairs located in the upper part of a tree. This was elucidated through a mathematical model that describes the dynamics of the stem cell population during elongation and branching. We opted not to delve further into the findings in the current manuscript, as this topic will be extensively investigated in a future study.

      Author response table 1.

      The branch-level increment of SNVs per meter.
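
      The calculation behind this table can be sketched as follows. The branch names, SNV counts and distances are invented for illustration and are not values from the study.

```python
def snvs_per_meter(snv_count, distance_m):
    """Branch-level increment: number of SNVs divided by physical distance (m)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return snv_count / distance_m


# Hypothetical branches: name -> (number of SNVs, physical distance in meters).
branches = {"lower": (12, 8.0), "upper": (9, 4.5)}

rates = {name: snvs_per_meter(n, d) for name, (n, d) in branches.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} SNVs per meter")
```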

      22) Line 150: Use of "significantly different" is confusing as the phrase is usually reserved for statistical significance. Consider replacing with "substantially different."

      We have replaced “significantly” with “substantially” in the revised text.

      23) In the Discussion, a clearer explanation of the assumptions that underlie the authors' reasoning would be welcome: e.g., constancy in mutation rate per meter within an individual tree. In particular, the authors assume that mutations that are seen in one leaf and not in another cannot have predated the most recent common meristematic node linking the two leaves. Is this a reasonable assumption? Since the meristem is multicellular, is it possible for a mutation to have arisen earlier in development and "assorted" into one cell lineage but not another?

      We greatly appreciate this important comment. It is true that when the meristem is multicellular and the stem cell lines are retained during mutation accumulation (e.g., a structured meristem analyzed in Tomimoto and Satake 2023), a mutation may have arisen before the bifurcation. Using a mathematical model, we have proved that the intercept and slope of the linear regression between the pairwise genetic distance and physical distance are influenced by the type of meristem (the strength of somatic genetic drift in a meristem) as well as by the branching architecture of the tree. We have included an explanation of this point in the revised manuscript (ll. 244–249).
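
      For readers unfamiliar with the regression referred to here, the following minimal sketch fits an ordinary least-squares line to pairwise genetic distance against pairwise physical distance and returns its intercept and slope. The data are invented so that the answer is known in advance; this is not the mathematical model of the meristem, only the regression step.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical branch pairs: physical distance (m) and genetic distance (SNVs).
# The values are constructed to lie exactly on y = 2 + 0.5 * x.
physical_m = [2.0, 5.0, 10.0, 20.0]
genetic = [3.0, 4.5, 7.0, 12.0]

intercept, slope = fit_line(physical_m, genetic)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f} SNVs per meter")
```

      In this reading, a non-zero intercept corresponds to genetic differences between branches that are not explained by the physical distance separating them.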

      24) Supplementary Data 7: Column J should be "2_2"

      We corrected the typo.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary - This study was designed to investigate changes in gene expression and associated chromatin accessibility patterns in spermatogonia in mice at different postnatal stages from pups to adults. The objective was to describe dynamic changes in these patterns that potentially correlate with functional changes in spermatogonia as a function of development and reproductive maturation. The potential utility of this information is to serve as a reference against which similar data from animals subjected to various disruptive environmental influences can be compared.

      Major Strengths and Weaknesses of the Methods and Results - A strength of the study is that it reviews previously published datasets describing gene expression and chromatin accessibility patterns in mouse spermatogonia. A weakness of the study is that it is not clear what new information is provided by the data provided that was not already known from previously published studies (see below). Specific weaknesses include the following:

      • Terminology - in the Abstract and first part of the Introduction the authors use the generic term "spermatogonial cells" in a manner that seems to be referring primarily to spermatogonial stem cells (SSCs) but initially ignores the well-known heterogeneity among spermatogonia - particularly the fact that only a small proportion of developing spermatogonia become SSCs - and ONLY those SSCs and NOT other developing spermatogonia - support steady-state spermatogenesis by retaining the capacity to either self-renew or contribute to the differentiating spermatogenic lineage throughout the male reproductive lifespan. The authors eventually mention other types of developing male germ cells, but their description of prospermatogonial stages that precede spermatogonial stages is deficient in that M-prospermatogonia - which occur after PGCs but before T1-prospermatogonia - are not mentioned. This description also seems to imply that all T2-prospermatogonia give rise to SSCs which is far from the case. It is the case that prospermatogonia give rise to spermatogonia, but only a very small proportion of undifferentiated spermatogonia form the foundational SSCs and ONLY SSCs possess the capacity to either self-renew or give rise to sequential waves of spermatogenesis.

      We thank Reviewer 1 for the comments and clarifications. As suggested in the previous revision, we use the term spermatogonial cells (SPGs) to make it clear that our cell preparations do not exclusively contain SSCs but all SPGs since they derive from a FACS enrichment strategy. This is explained in the manuscript. Further, we conducted deconvolution analyses on the datasets to examine the composition of the enriched SPGs preparations and provide new sequencing information confirming the presence of SSCs and differentiating SPGs.

      • Introduction - Statements regarding distinguishing transcriptional signatures in spermatogonia at different postnatal stages appear to refer to ALL subtypes of spermatogonia present at each stage collectively, thereby ignoring the well-known fact that there are distinct spermatogonial subtypes present at each postnatal stage and that some of those occur at certain stages but not at others. This brings into question the usefulness of the authors' discussion of what types of genes are expressed and/or what types of changes in chromatin accessibility are detected in spermatogonia at each stage.

      We agree that our data do not provide information about the transcriptional program of each subtype of SPGs. Rather they provide information about the dynamics of transcriptional programs in the transition from postnatal stage to adulthood in an enriched population of SPGs. The datasets are comprehensive and contain mRNA and non-coding RNA (with and without a polyA+ tail), which provides more precise transcriptomic information than classical single cell methods.

      • Methodology - The authors based recovery (enrichment) of spermatogonia from male pups on FACS sorting for THY1 and RMV-1. While sorting total testis cells for THY1+ cells does enrich for spermatogonia, this approach is now known to not be highly specific for spermatogonia (somatic cells are also recovered) and definitely not for SSCs. There are more effective means for isolating SSCs from total testis cells that have been validated by transplantation experiments (e.g. use of the Id4/eGFP transgene marker).

      We acknowledge the technical limitations of our enrichment strategy and made them clear in our revised manuscript.

      The authors then used "deconvolution" of bulk RNA-seq data in an attempt to discern spermatogonial subtype-specific transcriptomes. It is not clear why this is necessary or how it is beneficial given the availability of multiple single-cell RNA-seq datasets already published that accomplish this objective quite nicely - as the authors essentially acknowledge. Beyond this concern, a potential flaw with the deconvolution of bulk RNA-seq data is that this is a derivative approach that requires assumptions/computational manipulations of apparent mRNA abundance estimates that may confound interpretation of the relative abundance of different cellular subtypes within the heterogeneous cell population from which the bulk RNA-seq data is derived. Bottom line, it is not clear that this approach affords any experimental advantage over use of the publicly available scRNA-seq datasets and it is possible that attempts to employ this approach may be flawed, yielding misleading data.

      The deconvolution analyses were necessary to address the question of the cell composition of our preparations raised by reviewers. These analyses were highly beneficial because they clarify the presence of different SPGs, including SSCs, in the samples. They are also advantageous because the datasets on which they are conducted have significantly higher sequencing coverage than published single-cell datasets. These datasets contain the full transcriptome, not just polyA+ transcripts as in 10x datasets, and thus provide considerably richer and more comprehensive transcriptomic information. This is very important to correctly interpret the results and to gain additional biological information. For the deconvolution analyses, we used state-of-the-art methods with proper computational controls for calibration. We selected published single-cell RNA-seq datasets of the highest quality. These analyses are extremely useful because they confirm the predominance of SSCs in the postnatal and adult cell samples and minimal contamination by somatic cells. Our approach also provides a useful workflow that can easily be used by other researchers who cannot afford single-cell RNA-seq, allowing them to gain more information about the cellular composition of their samples. Finally, the execution of any computational analysis, including analyses of single-cell RNA-seq datasets, requires making assumptions during the development and use of a method. The assumptions made for deconvolution analyses are not special in this respect and do not introduce more confounds than other methods. What is critical for such analyses is to include proper controls for calibration, which we carefully did and validated using our own previously published datasets for Sertoli cells.

      • Results & Discussion - In general, much of the information reported in this study is not novel. The authors' discussion of the makeup of various spermatogonial subtypes in the testis at various ages does not really add anything to what has been known for many years on the basis of classic morphological studies. Further, as noted above, the gene expression data provided by the authors on the basis of their deconvolution of bulk RNA-seq data does not add any novel information to what has been shown in recent years by multiple elegant scRNA-seq studies - and, in fact, as also noted above - represents an approach fraught with potential for misleading results. The potential value of the authors' report of "other cell types" not corresponding to major somatic cell types identified in earlier published studies seems quite limited given that they provide no follow-up data that might indicate the nature of these alternative cell types. Beyond this, much of the gene expression and chromatin accessibility data reported by the authors - by their own admission given the references they cite - is largely confirmatory of previously published results. Similarly, results of the authors' analyses of putative factor binding sites within regions of differentially accessible chromatin also appear to confirm previously reported results. Ultimately, it is not at all novel to note that changes in gene expression patterns are accompanied by changes in patterns of chromatin accessibility in either related promoters or enhancers. The discussion of these observations provided by the authors takes on more of a review nature than that of any sort of truly novel results. As a result, it is difficult to discern how the data reported in this manuscript advance the field in any sort of novel or useful way beyond providing a review of previously published studies on these topics.

      • Likely impact - The likely impact of this work is relatively low because, other than the value it provides as a review of previously published datasets, the new datasets provided are not novel and so do not advance the field in any significant manner.

      We acknowledge that much of the reported information is not novel, but this is not necessarily a drawback: sequencing datasets on the same tissues or cells produced by different groups using comparable methods are common. This does not diminish the validity and usefulness of the datasets but rather enriches the respective fields, as omics methods and data analyses can deliver different findings. Thus, our study should not be criticized and disqualified because other datasets have been published; instead, it should be acknowledged for providing high-resolution full-transcriptome information from SPGs at different postnatal stages and in adults that other studies do not provide. In this respect, the subjective nature of Reviewer 1’s statements is of concern. For instance, the statement “…represents an approach fraught with potential for misleading results” suggests that all studies that previously used enrichment strategies are “fraught with potential for misleading results”, which disqualifies the work of many colleagues. Further, it wrongly assumes that newer technologies are exempt from “potential for misleading results”, which is not the case. Single-cell RNA-seq methods, extensively used to study SPGs, have been questioned for their limitations and potential biases due to low sequencing coverage, issues with transcript detection, low capture efficiency and a higher degree of noise than bulk RNA datasets. Thus, caution is needed when interpreting single-cell datasets on SPGs, which also have their biases. For our datasets, we made major efforts to address the criticisms raised by the reviewer and to reduce any potentially misleading information by conducting additional analyses, by providing more details on the methods and enrichment strategy and by being careful with data interpretation. We would be grateful if these efforts could be acknowledged and if the improvements to the manuscript and the value of the datasets could be evaluated with objectivity.

      Reviewer #2 (Public Review):

      This revised manuscript attempts to explore the underlying chromatin accessibility landscape of spermatogonia from the developing and adult mouse testis. The key criticism of the first version of this manuscript was that bulk preparations of mixed populations of spermatogonia were used to generate the data that form the basis of the entire manuscript. To address this concern, the authors applied a deconvolution strategy (CIBERSORTx (Newman et al., 2019)) in an attempt to demonstrate that their multi-parameter FACS isolation (from Kubota 2004) of spermatogonia enriched for PLZF+ cells recovered spermatogonial stem cells (SSCs). PLZF (ZBTB16) protein is a transcription factor known to mark all or nearly all undifferentiated spermatogonia and some differentiating spermatogonia (KIT+ at the protein level) - see Niedenberger et al., 2015 (PMID: 25737569). The authors' deconvolution using single-cell transcriptomes produced at postnatal day 6 (P6) argue that 99% of the PLZF+ spermatogonia at P8 are SSCs, 85% at P15 and 93% in adults. Quite frankly given the established overlap between PLZF and KIT and known identity of spermatogonia at these developmental stages, this is impossible. Indeed - the authors' own analysis of the reference dataset demonstrates abundant PLZF mRNA in P6 progenitor spermatogonia - what is the authors' explanation for this observation? The same is essentially true in the use of adult references for celltype assignment. The authors found 63-82% of SSCs using this different definition of types (from a different dataset), begging the question of which of these results is true.

      For full transparency, we provided information about the deconvolution analyses for all libraries that use cell-type-specific matrices generated from PND6 and adult single-cell RNA-seq reference datasets in our previous response (Fig1-3, response to Reviewer 1). However, we don’t claim “that 99% of the PLZF+ spermatogonia at P8 are SSCs, 85% at P15 and 93% in adults”. Of these percentages, those that correspond to our postnatal libraries are reported in our updated manuscript (please see FigS2). Importantly, we never claimed that these percentages correspond exclusively to “PLZF+ spermatogonia”. Rather, they were inferred using gene expression-specific signature matrices (Fig1-c, response to Reviewer 1, as an example). As is clearly evident in the feature maps in FigS2 of our updated manuscript, the cellular population identified as SSCs using the dataset from Hermann et al., 2018 shows overlap for the expression of Ddx4, Zbtb16 (PLZF), Gfra1 and Id4 but minimal Kit. In agreement with the reviewer’s observation, progenitors also show a signal for Zbtb16 but have a different gene expression signature matrix (see Fig.1c and 2c for an example of gene signature matrices from PND6 and adult samples from the same publication).

      Regarding the question of which of these results are true, we observed that deconvolution analyses of our postnatal libraries using two different postnatal single-cell RNA-seq reference datasets consistently suggest a high contribution (>90%) by SSCs, defined using cell-specific expression matrices following identification of the cell types that most closely match those reported by each study (see FigS2 of the updated manuscript). The analyses of our adult libraries using published adult datasets from the same group (Hermann et al., 2018; Fig1 response to Reviewer 1 and FigS2 of the updated manuscript) suggest that the contribution of adult SSCs to the cell population is lower than at postnatal stages, but SSCs are still the most abundant cell stage identified in our libraries (FigS2g). We reported these analyses and acknowledge that our adult samples likely also contain differentiating SPGs.

      In their rebuttal, the authors also raise a fair point about the precision of differential gene expression among spermatogonial subsets. At the mRNA level, Kit is definitely detectable in undifferentiated spermatogonia, but it is never observed at the protein level until progenitors respond to retinoic acid (see Hermann et al., 2015). I agree with the authors that the mRNAs for "cell type markers" are rarely differentially abundant at absolute levels (0 or 1), but instead, there are a multitude of shades of grey in mRNA abundance that "separate" cell types, particularly in the male germline and among the highly related spermatogonial subtypes of interest (SSCs, progenitor spermatogonia and differentiating spermatogonia). That is, spermatogonial biology should be considered as a continuous variable (not categorical), so examining specific cell populations with defined phenotypes (markers, function) likely oversimplifies the underlying heterogeneity in the male germ lineage. But, here, the authors have ignored this heterogeneity entirely by selecting complex populations and examining them in aggregate. We already know that PLZF protein marks a wide range of spermatogonia, complicating the interpretation of aggregate results emerging from such samples. In their rebuttal, the authors nicely demonstrate the existence of these mixtures using deconvolution estimation. What remains a mystery is why the authors did not choose to perform single-cell multiome (RNA-seq + ATAC-seq) to validate their results and provide high-confidence outcomes. This is an accessible technique and was requested after the initial version, but essentially ignored by the authors.

      We agree with the reviewer that the male germ lineage should be considered as a continuous variable and that examining specific cell populations with defined features oversimplifies its heterogeneity. Regarding the use of single-cell multiome (RNA-seq + ATAC-seq), we also agree that this technology can provide additional insight by integrating RNA and chromatin accessibility in the same cells. However, it is a refined method that is expensive and time-consuming and requires human resources that are beyond our capacity for this project.

      A separate question is whether these data are novel. A prior publication by the Griswold lab (Schleif et al., 2023; PMID: 36983846) already performed ATAC-seq (and prior data exist for RNA-seq) from germ cells isolated from synchronized testes. These existing data are higher resolution than those provided in the current manuscript because they examine germ cells before and after RA-induced differentiation, which the authors do not base on their selection methods. Another prior publication from the Namekawa lab extensively examined the transcriptome and epigenome in adult testes (Maezawa et al., 2020; PMID: 32895557; and several prior papers). The authors should explain how their results extend our knowledge of spermatogonial biology in light of the preceding reports.

      Our data do extend previous studies because they provide high-resolution transcriptomic (full transcriptome) and chromatin accessibility profiling in postnatal and adult stages. They now also provide an approach for deconvolution analyses of bulk RNA datasets that can be of use to the community. Novelty in the field of omics is usually not a prime feature and it is common that datasets on the same tissues or cells be published by different groups using comparable methods and analyses.

      The authors are also encouraged to improve their use of terminology to describe the samples of interest. The mitotic male germ cells in the testis are called spermatogonia (not spermatogonial cells, because spermatogonia are cells). Spermatogonia arise from Prospermatogonia. Spermatogonia are divisible into two broad groups: undifferentiated spermatogonia (comprised of few spermatogonial stem cells or SSCs and many more progenitor spermatogonia - at roughly 1:10 ratio) and differentiating spermatogonia that have responded to RA. The authors also improperly indicate that SSCs directly produce differentiating spermatogonia - indeed, SSCs produce transit-amplifying progenitor spermatogonia, which subsequently differentiate in response to retinoic acid stimulation. Further, the use of Spermatogonial cells (and SPGs) is imprecise because these terms do not indicate which spermatogonia are in question. Moreover, there have been studies in the literature which have used similar terms inappropriately to refer to SSCs, including in culture. A correct description of the lineage and disambiguation by careful definition and rigorous cell type identification would benefit the reader.

      Overall, my concern from the initial version of this manuscript stands - critical methodological flaws prevent interpretation of the results and the data are not novel. Readers should take note that results in essentially all Figures do not reflect the biology of any one type of spermatogonium.

      We revised and improved the terminology wherever possible, also taking into account requests from other reviewers about terminology.

      Reviewer #3 (Public Review):

      In this study, Lazar-Contes and colleagues aimed to determine whether chromatin accessibility changes in the spermatogonial population during different phases of postnatal mammalian testis development. Because actions of the spermatogonial population set the foundation for continual and robust spermatogenesis and the gene networks regulating their biology are undefined, the goal of the study has merit. To advance knowledge, the authors used mice as a model and isolated spermatogonia from three different postnatal developmental age points using cell sorting methodology that was based on cell surface markers reported in previous studies and then performed bulk RNA-sequencing and ATAC-sequencing. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics seem sound, but there are several concerns with the cell population isolated from testes and a lack of acknowledgement of previous studies that have also performed ATAC-sequencing on spermatogonia of mouse and human testes. The limitations, described below, call into question the validity of the interpretations and reduce the potential merit of the findings.

      I suggest changing the acronym for spermatogonial cells from SC to SPG for two reasons. First, SPG is the commonly used acronym in the field of mammalian spermatogenesis. Second, SC is commonly used for Sertoli Cells.

      This was suggested in the previous review by Reviewer 1 and was modified in the revised version of the manuscript.

      The authors should provide a rationale for why they used postnatal day 8 and 15 mice. The FACS sorting approach used was based on cell surface proteins that are not germline specific so there was undoubtedly somatic cells in the samples used for both RNA and ATAC sequencing. Thus, it is essential to demonstrate the level of both germ cell and undifferentiated spermatogonial enrichment in the isolated and profiled cell populations. To achieve this, the authors used PLZF as a biomarker of undifferentiated spermatogonia. Although PLZF is indeed expressed by undifferentiated spermatogonia, there have been several studies demonstrating that expression extends into differentiating spermatogonia. In addition, PLZF is not germ cell specific and single cell RNA-seq analyses of testicular tissue has revealed that there are somatic cell populations that express Plzf, at least at the mRNA level. For these reasons, I suggest that the authors assess the isolated cell populations using a germ cell specific biomarker such as DDX4 in combination with PLZF to get a more accurate assessment of the undifferentiated spermatogonial composition. This assessment is essential for interpretation of the RNA-seq and ATAC-seq data that was generated.

      A previous study by the Namekawa lab (PMID: 29126117) performed ATAC-seq on a similar cell population (THY1+ FACS sorted) that was isolated from pre-pubertal mouse testes. It was surprising to not see this study referenced in the current manuscript. In addition, it seems prudent to cross-reference the two ATAC-seq datasets for commonalities and differences. There are also several published studies on scATAC-seq of human spermatogonia that might be of interest to cross-reference with the ATAC-seq data presented in the current study to provide an understanding of translational merit for the findings.

      These points have been addressed in our previous response and in the revised manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Weaknesses:

      There appears to be a lack of basic knowledge of the process of spermatogenesis. For instance, the statement that "During the first week of postnatal life, a population of SCs continues to proliferate to give rise to undifferentiated Asingle (As), Apaired (Apr) and Aaligned (Aal) cells. The remaining SCs differentiate to form chains of daughter cells that become primary and secondary spermatocytes around postnatal day (PND) 10 to 12." is inaccurate. The Aal cells are the spermatogonial chains, the two are not distinct from one another. In addition, the authors fail to mention spermatogonial stem cells which form the basis for steady-state spermatogenesis. The authors also do not acknowledge the well-known fact that, in the mouse, the first wave of spermatogenesis is distinct from subsequent waves. Finally, the authors do not mention the presence of both undifferentiated spermatogonia (aka - type A) and differentiating spermatogonia (aka - type B). The premise for the study they present appears to be the implication that little is known about the dynamics of chromatin during the development of spermatogonia. However, there are published studies on this topic that have already provided much of the information that is presented in the current manuscript.

      Regarding the inaccuracy and incompleteness of some of the statements about spermatogonial cells and spermatogenesis: in the Introduction, we replaced the following statement: "During the first week of postnatal life, a population of SCs continues to proliferate to give rise to undifferentiated Asingle (As), Apaired (Apr) and Aaligned (Aal) cells. The remaining SCs differentiate to form chains of daughter cells that become primary and secondary spermatocytes around postnatal day (PND) 10 to 12." with: “Spermatogonial cells (SPGs) are the initiators and supporting cellular foundation of spermatogenesis in testis in many species, including mammals. In the mammalian testis, the founding germ cells are primordial germ cells (PGCs), which give rise sequentially to different populations of SPGs: primary transitional (T1)-prospermatogonia (ProSG), secondary transitional (T2)-ProSG, and then spermatogonial stem cells (SSCs) (McCarrey, 2013; Rabbani et al., 2022; Tan et al., 2020). The ProSG population is exhausted by postnatal day (PND) 5 (Drumond et al., 2011) and by PND6-8, distinct SPG subtypes can be distinguished on the basis of specific marker proteins and regenerative capacity (Cheng et al., 2020; Ernst et al., 2019; Green et al., 2018; Hermann et al., 2018; Tan et al., 2020).

      SSCs represent an undifferentiated population of SPGs that retain regenerative capacity and divide to either self-renew or generate progenitors that initiate spermatogenic differentiation, giving rise to differentiating SPGs (diff-SPGs). Diff-SPGs form chains of daughter cells that become primary and secondary spermatocytes around PND10 to 12. Spermatocytes then undergo meiosis and give rise to haploid spermatids that develop into spermatozoa. Spermatozoa are then released into the lumen of seminiferous tubules and continue to mature in the epididymis until becoming capable of fertilization by PND42-48 in mice (Kubota and Brinster, 2018; Rooij, 2017).”

      Regarding the premise and implications of our findings: we clarified the premise of our findings in the revised manuscript. The following statement was included in the Discussion: "our findings complement existing datasets on spermatogonial cells by providing parallel transcriptomic and chromatin accessibility maps at high resolution from the same cell populations at early postnatal, late postnatal and adult stages collected from single individuals (for adults)".

      It is not clear which spermatogonial subtype the authors intended to profile with their analyses. On the one hand, they used PLZF to FACS sort cells. This typically enriches for undifferentiated spermatogonia. On the other hand, they report detection in the sorted population of markers such as c-KIT which is a well-known marker of differentiating spermatogonia, and that is in the same population in which ID4, a well-known marker of spermatogonial stem cells, was detected. The authors cite multiple previously published studies of gene expression during spermatogenesis, including studies of gene expression in spermatogonia. It is not at all clear what the authors' data adds to the previously available data on this subject.

      The authors analyzed cells recovered at PND 8 and 15 and compared those to cells recovered from the adult testis. The PND 8 and 15 cells would be from the initial wave of spermatogenesis whereas those from the adult testis would represent steady-state spermatogenesis. However, as noted above, there appears to be a lack of awareness of the well-established differences between spermatogenesis occurring at each of these stages.

      We applied computational deconvolution to our bulk RNA-seq datasets, employing publicly available single-cell RNA-seq datasets, to estimate and identify cellular composition. Trained on high-quality RNA-seq datasets from pure or single-cell populations, deconvolution algorithms create expression matrices reflecting the cellular diversity in reference datasets. These cell-type-specific expression matrices are subsequently used to determine the cellular composition of bulk RNA-seq samples with unknown cellular components (Cobos et al., 2023).

      For our analysis, we chose CIBERSORTx (Newman et al., 2019), recognized as the most advanced deconvolution algorithm to date, employing it with three high-quality, publicly available single-cell RNA-seq datasets. First, we assessed the cellular composition of all our RNA-seq libraries using datasets generated by Hermann et al. (2018), who characterized the single-cell transcriptomes of testicular cells and various populations of spermatogonial progenitor cells (SPGs) in early postnatal (PND6) and adult stages. This enabled us not only to address potential somatic cell contamination but also to analyze the composition of isolated SPGs using a unified dataset source.
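
      The principle behind this kind of reference-based deconvolution can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CIBERSORTx implementation: it swaps in plain non-negative least squares (solved by projected gradient descent), whereas CIBERSORTx, to our understanding, relies on support vector regression plus additional corrections. The function name `deconvolve`, the toy signature matrix and the toy proportions are hypothetical and are not taken from the study.

```python
def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type fractions in a bulk sample (illustrative sketch).

    signature: list of rows, one per gene; each row holds the reference
               expression of that gene in every cell type.
    bulk:      list of bulk expression values, one per gene.
    Solves min ||S f - b||^2 subject to f >= 0, then rescales f to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Precompute S^T S and S^T b so each iteration stays cheap.
    gram = [[sum(signature[g][i] * signature[g][j] for g in range(n_genes))
             for j in range(n_types)] for i in range(n_types)]
    rhs = [sum(signature[g][i] * bulk[g] for g in range(n_genes))
           for i in range(n_types)]
    # The trace bounds the largest eigenvalue of S^T S, so 1/trace is a safe step.
    step = 1.0 / sum(gram[i][i] for i in range(n_types))
    coef = [0.0] * n_types
    for _ in range(n_iter):
        grad = [sum(gram[i][j] * coef[j] for j in range(n_types)) - rhs[i]
                for i in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        coef = [max(0.0, c - step * g) for c, g in zip(coef, grad)]
    total = sum(coef)
    return [c / total for c in coef] if total > 0 else coef


# Hypothetical toy data: 4 genes (rows) x 3 cell types (columns).
toy_signature = [[10, 1, 0], [0, 8, 1], [1, 0, 9], [2, 2, 2]]
toy_truth = [0.85, 0.10, 0.05]  # made-up mixing proportions
toy_bulk = [sum(s * t for s, t in zip(row, toy_truth)) for row in toy_signature]
estimated = deconvolve(toy_signature, toy_bulk)
print([round(x, 3) for x in estimated])
```

      On noise-free toy data such as this, the estimate should recover the mixing proportions used to build the bulk vector; real bulk libraries are noisy, which is one reason more robust regression schemes are preferred in practice.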

      Author response image 1.

      Deconvolution analysis of bulk RNA-seq samples using PND6 single-cell RNA-seq from Hermann et al., 2018. a. Seurat clusters from PND6 single-cell RNA-seq. b. Feature maps of gene expression for markers of SPGs and somatic cells. c. Gene expression signature matrix from PND6 single-cell RNA-seq datasets. d. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. e. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study.

      By re-analyzing the single-cell RNA-seq datasets, we identified distinct cell-type clusters, marked by specific cellular markers as reported in the original and subsequent studies (Author response image 1a,b and Author response image 2a,b). Then, CIBERSORTx generated gene-expression signature matrices and estimated the cell-type proportions within our 18 bulk RNA-seq libraries. Evaluation of our postnatal libraries (PND8 and 15) against a PND6 signature matrix revealed a predominant derivation from SPGs, with average estimated proportions of spermatogonial stem cells (SSCs) being 0.99 and 0.85 for PND8 and PND15 samples, respectively (Author response image 1c-e). Notably, the analysis of PND15 libraries also suggested the presence of additional SPG types, including progenitors and differentiating SPGs (Author response image 1d), albeit at lower frequency.

      Similarly, evaluation of our adult RNA-seq libraries, using an adult signature matrix, showed an average SSC proportion of 0.82, indicating a primary derivation from SSCs. Consistent with the findings from PND15 libraries, our deconvolution analysis also suggests the presence of additional SPG types, including progenitors and differentiating SPGs (Author response image 1d). However, unlike our early and late postnatal stage libraries, the deconvolution analysis of adult libraries indicated the presence of other cell types (labeled "Other"), not corresponding to the major somatic cell types identified by Hermann et al., 2018. The estimated average proportion of these cells was less than 0.05 in two adult libraries and 0.10 in the others. This variance in cellular composition underlines the deconvolution method's effectiveness in dissecting complex cellular compositions in bulk RNA-seq samples.

      Author response image 2.

      Deconvolution analysis of bulk RNA-seq samples using adult single-cell RNA-seq (Hermann et al., 2018). a. Seurat clusters from adult single-cell RNA-seq. b. Feature maps of gene expression for markers of SPG and somatic cells. c. Gene expression signature matrix from adult single-cell RNA-seq datasets. d. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. e. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study.

      To further validate our observations, we re-analyzed two additional testicular single-cell RNA-seq datasets, one derived from an early postnatal stage (PND7) (Tan et al., 2020) and one from adult testis (Green et al., 2018) (Author response image 3a,b and Author response image 4a,b). We identified distinct cell-type clusters, marked by specific cellular markers (Author response image 3a,b and Author response image 4a,b), and proceeded with the deconvolution analysis using CIBERSORTx. Evaluation of our postnatal libraries (PND8 and 15) against the PND7 signature matrix from Tan et al., 2020 confirmed a derivation from germ cells (Author response image 3d,e), in particular from SSCs (Author response image 3g), with average estimated SSC proportions of 0.93 and 0.86 for PND8 and PND15 samples, respectively, and the remainder estimated to originate from differentiating SPGs (Author response image 3g,h). In the case of the adult samples, evaluation against the adult signature matrix from Green et al., 2018 confirmed a predominant derivation from SSCs, with an average estimated SSC proportion of 0.79, consistent with the 0.82 estimated using the Hermann et al., 2018 dataset.

      Author response image 3.

      Deconvolution analysis of bulk RNA-seq samples with additional single-cell datasets. a. Seurat clusters from PND7 single-cell RNA-seq (Tan et al., 2020). b. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. c. Dotplot of the average estimated proportion of germ cells in all bulk RNA-seq libraries reported in this study. d. Re-clustering of germ cell cluster shown in a. e. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. f. Dotplot of the average estimated proportion of SSCs in all bulk RNA-seq libraries reported in this study. g. Seurat clusters from adult single-cell RNA-seq (Green et al., 2018). h. Barplot of estimated cellular proportions for all bulk RNA-seq libraries reported in this study. i. Dotplot of the average estimated proportion of germ cells in all bulk RNA-seq libraries reported in this study.

      To further validate our deconvolution strategy, we interrogated the cellular composition of bulk RNA-seq libraries derived from cellular populations enriched in Sertoli cells, generated by our group using a similar enrichment/sorting strategy (Thumfart et al., 2022). As expected, our results show that all these libraries are mainly composed of Sertoli cells, suggesting that the deconvolution strategy employed is accurate in detecting cell-type composition (Author response image 4).

      Author response image 4.

      Deconvolution analysis of Sertoli bulk RNA-seq samples. Barplots of estimated cellular proportions for bulk RNA-seq libraries reported in Thumfart et al., 2022. Expression matrices were derived from the analysis of single-cell RNA-seq datasets used to assess cellular composition of the SPG bulk libraries.

      Author response image 5.

      Id4 and Kit are transcribed in SSCs. Seurat clusters from PND6 single-cell RNA-seq (left) and feature maps of gene expression for Id4 (center) and Kit (right). Zoom-in on SSCs (red).

      Finally, regarding the following observation by the reviewer: "On the other hand, they report detection in the sorted population of markers such as c-KIT which is a well-known marker of differentiating spermatogonia, and that is in the same population in which ID4, a well-known marker of spermatogonial stem cells, was detected." It was recently shown using single-cell RNA-seq that “nearly all differentiating spermatogonia at P3 (delineated as c-KIT+) are ID4-eGFP” (Law et al., 2019). While this finding does not exclude that we have a mixture of SPG cells, it supports the possibility that SPG cells express markers of both undifferentiated and differentiated cells, particularly in the early stages of postnatal development. Indeed, we observe that some cells labeled as SSCs show signals for both Id4 and Kit in single-cell RNA-seq data from Hermann et al., 2018 (Author response image 5).

      Therefore, the results from the deconvolution analysis and our immunofluorescence data showing 85-95% PLZF+ cells in our cellular preparations indicate that our bulk RNA-seq libraries are mainly composed of SPGs. The deconvolution analysis also suggests a cellular composition predominantly of SSCs and, to a lesser degree, of differentiating SPGs. Our adult RNA-seq libraries show a small proportion of somatic cells (<0.10).

      In the revised manuscript, we compiled the deconvolution analyses and present them in a condensed version in Supplementary Fig 2. 

      In general, the authors present observational data of the sort that is generated by RNA-seq and ATAC-seq analyses, and they speculate on the potential significance of several of these observations. However, they provide no definitive data to support any of their speculations. This further illustrates the fact that this study contributes little if any new information beyond that already available from the numerous previously published RNA-seq and ATAC-seq studies of spermatogenesis. In short, the study described in this manuscript does not advance the field.

      We acknowledge that RNA-seq and ATAC-seq datasets like ours are observational and that their interpretation can be speculative. Nevertheless, our datasets represent an additional useful resource for the community because they are comprehensive and high resolution, and can be exploited, for instance, in studies of environmental epigenetics and epigenetic inheritance that examine the immediate and long-term effects of postnatal exposure and their dynamics. The depth of our RNA sequencing allowed us to detect transcripts with a high dynamic range, which has been limited with classical RNA sequencing analyses of spermatogonial cells and with single-cell analyses (which have comparatively low coverage). Further, our experimental pipeline is more affordable than single-cell sequencing approaches and, in the case of adults, provides data per animal, informing on the intrinsic variability in transcriptional and chromatin regulation across males. These points will be discussed in the revised manuscript.

      Relevant information for both points was included in the Discussion of the revised manuscript.  

      The phenomenon of epigenetic priming is discussed, but then it seems that there is some expression of surprise that the data demonstrate what this reviewer would argue are examples of that phenomenon. The authors discuss the "modest correspondence between transcription and chromatin accessibility in SCs." Chromatin accessibility is an example of an epigenetic parameter associated with the primed state. The primed state is not fully equivalent to the actively expressing state. It appears that certain histone modifications along with transcription factors are critical to the transition between the primed and actively expressing states (in either direction). The cell types that were investigated in this study are closely related spermatogenic, and predominantly spermatogonial cell types. It is very likely that the differentially expressed loci will be primed in both the early (PND 8 or 15) and adult stages, even though those genes are differentially expressed at those stages. Thus, it is not surprising that there is not a strict concordance between +/- chromatin accessibility and +/- active or elevated expression.

      Relevant information was included in the Discussion of the revised manuscript.

      Reviewer #2:

      The objective of this study from Lazar-Contes et al. is to examine chromatin accessibility changes in "spermatogonial cells" (SCs) across testis development. Exactly what SCs are, however, remains a mystery. The authors mention in the abstract that SCs are undifferentiated male germ cells and have self-renewal and differentiation activity, which would be true for Spermatogonial STEM Cells (SSCs), a very small subset of total spermatogonia, but then the methods they use to retrieve such cells using antibodies that enrich for undifferentiated spermatogonia encompass both undifferentiated and differentiating spermatogonia. Data in Fig. 1B prove that most (85-95%) are PLZF+, but PLZF is known to be expressed both by undifferentiated and differentiating (KIT+) spermatogonia (Niedenberger et al., 2015; PMID: 25737569). Thus, the bulk RNA-seq and ATAC-seq data arising from these cells constitute the aggregate results comprising the phenotype of a highly heterogeneous mixture of spermatogonia (plus contaminating somatic cells), NOT SSCs. Indeed, Fig. 1C demonstrates this by showing the detection of Kit mRNA (a well-known marker of differentiating spermatogonia - which the authors claim on line 89 is a marker of SCs!), along with the detection of markers of various somatic cell populations (albeit at lower levels).

      The reviewer is correct that our spermatogonial cell populations are mixed and include undifferentiated and differentiated cells, hence the name spermatogonial cells (SCs), and probably also contain some somatic cells. We acknowledge that this is a limitation of our isolation approach. To circumvent this limitation, we will conduct in silico deconvolution analysis using publicly available single-cell RNA sequencing datasets to obtain information about markers corresponding to undifferentiated and differentiated spermatogonial cells, and somatic cells. These additional analyses will provide information about the cellular composition of the samples and clarify the representation of undifferentiated and differentiated spermatogonial cells and other cells.

      This admixture problem influences the results - the authors show ATAC-seq accessibility traces for several genes in Fig. 2E (exhibiting differences between P15 and Adult), including Ihh, which is not expressed by spermatogenic cells, and Col6a1, which is expressed by peritubular myoid cells. Thus, the methods in this paper are fundamentally flawed, which precludes drawing any firm conclusions from the data about changes in chromatin accessibility among spermatogonia (SCs?) across postnatal testis development.

      The reviewer raises concern about the lack of correspondence between chromatin accessibility and expression observed for some genes, arguing that this precludes drawing firm conclusions. However, a dissociation between chromatin accessibility and gene expression is normal and expected: chromatin accessibility is only a readout of protein deposition and occupancy (e.g. by transcription factors, chromatin regulators, or nucleosomes) at specific genomic loci, and it does not indicate whether transcription is ongoing. A gene that is repressed or poised for expression can still show a clear signal of chromatin accessibility at regulatory elements. The dissociation between chromatin accessibility and transcription has been reported in many different cells and conditions (PMID: 36069349, PMID: 33098772) including in spermatogonial cells (PMID: 28985528) and in gonads in different species (PMID: 36323261). Therefore, the dissociation between accessibility and transcription is not a reason to conclude that our data are flawed.

      In addition, there already are numerous scRNA-seq datasets from mouse spermatogenic cells at the same developmental stages in question.

      This is true, but full transcriptomic profiling of cell populations like ours provides transcriptional information that is deeper and more comprehensive. Our datasets identified >17,000 genes, while scRNA-seq typically identifies a few thousand genes. Our analyses also identified full-length transcripts, variants, isoforms, and low-abundance transcripts. These datasets are therefore a valuable addition to existing scRNA-seq datasets.

      Moreover, several groups have used bulk ATAC-seq to profile enriched populations of spermatogonia, including from synchronized spermatogenesis which reflects a high degree of purity (see Maezawa et al., 2018 PMID: 29126117 and Schlief et al., 2023 PMID: 36983846 and in cultured spermatogonia - Suen et al., 2022 PMID: 36509798) - so this topic has already begun to be examined. None of these papers was cited, so it appears the authors were unaware of this work.

      We apologize for not mentioning these studies in our manuscript; we will do so in the revised version.

      The authors' methodological choice is even more surprising given the wealth of single-cell evidence in the literature since 2018 demonstrating the exceptional heterogeneity among spermatogonia at these developmental stages (the authors DID cite some of these papers, so they are aware). Indeed, it is currently possible to perform concurrent scATAC-seq and scRNA-seq (10x Genomics Multiome), which would have made these data quite useful and robust. As it stands, given the lack of novelty and critical methodological flaws, readers should be cautioned that there is little new information to be learned about spermatogenesis from this study, and in fact, the data in Figures 2-5 may lead readers astray because they do not reflect the biology of any one type of male germ cell. Indeed, not only do these data not add to our understanding of spermatogonial development, but they are damaging to the field if their source and identity are properly understood. Here are some specific examples of the problems with these data:

      Fig. 2D - Gata4 and Lhcgr are not expressed by germ cells in the testis.

      Fig. 3A - WT1 is expressed by Sertoli cells, so the change in accessibility of regions containing a WT1 motif suggests differential contamination with Sertoli cells. Since Wt1 mRNA was differentially high in P15 (Fig. 3B) - this seems to be the most likely explanation for the results. How was this excluded?

      Fig. 3D - Since Dmrt1 is expressed by Sertoli cells, the "downregulation" likely represents a reduction in Sertoli cell contamination in the adult, like the point above. Did the authors consider this?

      Regarding concerns about contamination by somatic cells (Transcription). In addition to the results of our deconvolution analysis (see response to Reviewer #1), we addressed the specific concern of the paradoxical expression of genes considered markers of somatic cells in the testis. For instance, we plotted the expression values of Ihh, Lhcgr, Gata4, Col16a, Wt1, and Dmrt1 along with the expression values of Ddx4 and Zbtb16. We observe that the expression level of Ddx4 and Zbtb16, genes expressed predominantly in SPGs, is orders of magnitude higher than that observed for the rest of the genes, with the notable exception of Dmrt1, which is also highly expressed (Author response image 6). Indeed, our analysis of publicly available single-cell RNA-seq datasets shows that Dmrt1 is robustly expressed in germ cells (Author response image 7) and, as also noted by the reviewer, in Sertoli cells in postnatal stages. Notably, we observe a significant stepwise decrease in the expression of Dmrt1 across the postnatal maturation of SPG cells. This is highly unlikely to be a result of major contamination by Sertoli cells of just our postnatal libraries. We base this statement on three observations. First, the deconvolution analysis of all our RNA-seq libraries using four different expression signature matrices from high-quality single-cell RNA-seq from testis showed that our libraries are largely derived from SPGs. Second, the evaluation of our adult libraries with the PND6 signature matrix from Green et al., 2018 suggested that the proportion of Sertoli cells in our adult libraries, if any, would be higher than in our postnatal libraries (Author response image 3d, blue bars). This makes it unlikely that the observed decrease in expression of Dmrt1 in adult samples is due to prominent somatic contamination of the postnatal libraries. Third, the stepwise decrease in Dmrt1 expression seems to correlate with progression during postnatal development (Author response image 7), as feature maps of Dmrt1 expression derived from public single-cell RNA-seq experiments show a reduction in expression in adult SPGs in comparison with early postnatal stages (Author response image 7, last two panels). Thus, the observed effects are likely the result of gene regulatory processes that operate during the developmental maturation of SPGs.

      Author response image 6.

      Expression of germ and somatic cell markers in our RNA-seq datasets. Boxplots of log2(CPM) (Top) and CPM (Bottom) values for selected genes from our RNA-seq datasets. Each point in the boxplots represents the expression value of a biological replicate.

      Author response image 7.

      Expression of germ and somatic cell markers in publicly available single-cell RNA-seq datasets. Seurat clusters from all analyzed single-cell RNA-seq datasets (first column from left) and feature maps of gene expression for Zbtb16, Dmrt1 and Wt1.

      Consistent with the reviewer’s observation, Ihh is not expressed in germ cells, and indeed we detect no signal at this locus or at Lhcgr. Furthermore, while we indeed observe a significant increase in the expression of Wt1 in PND15 samples, its expression level is considerably lower than that of SPG markers. This is even more evident when plotting expression data on a linear scale rather than as a log2 transformation of the expression values. Whether such transcriptional profiles reflect developmentally regulated transcription, stochastic effects on gene expression, or potential somatic contamination is difficult to determine. However, based on our deconvolution data, we believe it is unlikely that major contamination could account for our observations.
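
      The difference between the two scales can be illustrated with a short Python sketch. The read counts below are invented for illustration and are not values from our libraries; the pseudocount of 1 added before the log2 transformation is likewise an assumption, and dedicated RNA-seq packages handle this step in their own way.

```python
import math


def cpm(counts):
    """Convert raw read counts of one library into counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount (an assumption) avoids log2(0)."""
    return {gene: math.log2(v + pseudocount) for gene, v in cpm(counts).items()}


# Hypothetical counts for one library; "other" stands for all remaining genes.
toy_counts = {"Zbtb16": 5000, "Ddx4": 4000, "Wt1": 50, "other": 990950}

linear = cpm(toy_counts)
logged = log2_cpm(toy_counts)

# On the linear scale the marker is 100-fold higher than Wt1 ...
print(linear["Zbtb16"] / linear["Wt1"])
# ... while on the log2 scale the same gap is only a few units wide.
print(logged["Zbtb16"] - logged["Wt1"])
```

      A gap of a few log2 units can therefore look modest in a boxplot even when the underlying expression levels differ by two orders of magnitude.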

      Notably, while Wt1 is robustly expressed in nearly all Sertoli cells across postnatal development (Author response image 7), it is also detected in other cell types including SPGs (although in fewer cells and with lower expression levels), consistent with our observations (Author response image 6 and 8). Therefore, the assignment of a gene as a marker of a particular cell type does not imply that the gene is expressed uniquely in that cell type; rather, it is expressed in more cells of that type and likely at higher levels.

      Author response image 8.

      Expression of Wt1 in publicly available single-cell RNA-seq datasets. Feature maps of gene expression for Wt1. Dashed boxes show a zoom-in on the germ cell cluster, in which some cells express Wt1.

      Regarding concerns about contamination by somatic cells (chromatin accessibility). In Figure 2 of our manuscript, we show the chromatin accessibility landscape of different genes, including genes that are not expressed in testicular cells (Ihh) and genes believed to be expressed exclusively in somatic cells (Lhcgr, Gata4, Col16a1, Wt1). For some of these genes, we reported changes in chromatin accessibility at specific sites between PND15 and adults (e.g. Wt1 and Col16a1). The observation of "traces of chromatin accessibility" at these loci and the reported changes in accessibility raised concerns of potential contamination which "fundamentally flaw" our results, as stated by the reviewer. While we acknowledge that all enrichment methods have a margin of potential contamination, we fundamentally disagree with the reviewer's observations.

      The term chromatin accessibility can be misleading. In principle, the term accessibility might suggest the literal lack of protein deposition at a given place in the genome. Rather, chromatin accessibility as evaluated by ATAC-seq (as in this case) must be interpreted as a measure of protein occupancy genome-wide (PMID: 30675018). Depending on the type of fragments analyzed, we can obtain information regarding the occupancy of transcription factors (TFs), nucleosomes, and other chromatin-associated proteins that are present at genomic locations at a given time within a population of cells. The detection of chromatin accessibility at a given locus does not necessarily indicate transcription of the gene in a given cell type. A gene can be repressed or poised for expression and still show a clear signal of chromatin accessibility at its regulatory elements or along the gene body. For instance, in agreement with the reviewer's observation, neither Ihh nor Lhcgr is expressed in our datasets (Author response image 6 and Author response image 9); however, they show a distinctive pattern of chromatin accessibility in our datasets and in publicly available ATAC-seq data derived from undifferentiated (Id4-bright) and differentiating (Id4-dim) SPGs (Cheng et al., 2020) (Author response image 9). A similar argument can be applied to other loci such as Wt1 and Col6a1, for which we also observe extremely low levels of transcription. Therefore, the lack of transcription does not exclude that these loci display clear patterns of chromatin accessibility (Author response image 9). Notably, while traces of chromatin accessibility can also be observed in ATAC-seq datasets from embryonic Sertoli cells (Garcia-Moreno et al., 2019) and other somatic stem cells (hematopoietic stem cells; HSCs) (Xiang et al., 2020) (Author response image 9), the pattern of chromatin accessibility markedly differs from that observed in SPG cells. Therefore, the observed changes in chromatin accessibility are unlikely to result from contaminating somatic cells.

      To strengthen our observation, we identified regions of chromatin accessibility in SPGs, Sertoli cells, and HSCs using both our datasets and publicly available ATAC-seq datasets. Overlap analysis revealed at least four groups of ATAC-seq peaks: 1) peaks shared among all analyzed cell types, 2) peaks shared just among SPG cells, 3) peaks specific to Sertoli cells and 4) peaks specific to HSCs (Author response image 10). Peaks shared among all tested cell types are predominantly located at promoters of genes involved in translation and DNA replication (GO analysis adj p-value<0.05). In contrast, cell-type specific peaks are localized at intergenic and intragenic regions, suggesting localization at enhancer elements (Author response image 10). Indeed, GO analysis of cell-type specific peaks revealed enrichment for genes involved in male meiosis for SPGs, vesicle-mediated transport for Sertoli cells and immune system process for HSCs, consistent with cell-type specific functions. If contamination by somatic cells, such as Sertoli cells, were prominent as stated by the reviewer, we would expect to observe prominent ATAC-seq signal from our datasets at peaks specific to Sertoli cells. Notably, we do not observe ATAC-seq signal at peaks specific to Sertoli cells in our ATAC-seq samples. However, we observe robust signals at shared peaks and peaks specific to SPG cells. This observation strongly argues against the possibility of major contamination by somatic cells.
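
      The logic of such an overlap analysis can be sketched in Python as follows. This is a simplified illustration and not the pipeline used for Author response image 10: each cell type is represented by a single peak list, overlaps are found by brute force rather than with a dedicated interval tool, and the function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_peaks(peak_sets):
    """Label every peak with the set of cell types that have an overlapping peak.

    peak_sets maps a cell-type name to a list of (chrom, start, end) peaks.
    A peak labelled with all cell types is "shared"; a peak labelled only
    with its own cell type is "cell-type specific".
    """
    labels = {}
    for cell_type, peaks in peak_sets.items():
        for peak in peaks:
            present = frozenset(
                other for other, other_peaks in peak_sets.items()
                if other == cell_type or any(overlaps(peak, q) for q in other_peaks)
            )
            labels[(cell_type, peak)] = present
    return labels


# Hypothetical peak coordinates, for illustration only.
toy_peaks = {
    "SPG": [("chr1", 100, 600), ("chr2", 1000, 1500)],
    "Sertoli": [("chr1", 150, 650), ("chr3", 200, 700)],
    "HSC": [("chr1", 90, 400)],
}

for (cell_type, peak), present in classify_peaks(toy_peaks).items():
    print(cell_type, peak, sorted(present))
```

      In this toy example the chr1 peak is shared by all three cell types, whereas the chr2 and chr3 peaks are specific to SPGs and Sertoli cells, respectively. Signal from a sample of interest can then be inspected separately at each group of peaks.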

      Author response image 9.

      Chromatin accessibility profiles at specific loci differ between SPG cells and other cell types. Genome-browser tracks for Ihh, Wt1, Col16a1 and Zbtb16. For each gene, an extended locus view is presented with RNA-seq data (this study) and normalized ATAC-seq tracks from our study and public sources (SPG Id4; GSE131657; Sertoli; GSM3346484; HSC; ENCFF204JEE). Public ATAC-seq datasets were generated using enrichment methods similar to the one employed in our study.

      Author response image 10.

      Shared and cell-type specific ATAC-seq peaks among SPGs, Sertoli cells and HSCs. Top, normalized ATAC-seq signal heatmaps of shared and unique ATAC-seq peaks. PND15 and Adult samples are derived from our study. ATAC-seq signal is plotted +/- 500bp from peak center. Bottom, pie charts of the genomic distribution of ATAC-seq peaks.

      Reviewer #3:

      In this study, Lazar-Contes and colleagues aimed to determine whether chromatin accessibility changes in the spermatogonial population during different phases of postnatal mammalian testis development. Because actions of the spermatogonial population set the foundation for continual and robust spermatogenesis and the gene networks regulating their biology are undefined, the goal of the study has merit. To advance knowledge, the authors used mice as a model and isolated spermatogonia from three different postnatal developmental age points using a cell sorting methodology that was based on cell surface markers reported in previous studies and then performed bulk RNA-sequencing and ATAC-sequencing. Overall, the technical aspects of the sequencing analyses and computational/bioinformatics seem sound but there are several concerns with the cell population isolated from testes and lack of acknowledgment for previous studies that have also performed ATAC-sequencing on spermatogonia of mouse and human testes. The limitations, described below, call into question the validity of the interpretations and reduce the potential merit of the findings. I suggest changing the acronym for spermatogonial cells from SC to SPG for two reasons. First, SPG is the commonly used acronym in the field of mammalian spermatogenesis. Second, SC is commonly used for Sertoli Cells.

      We thank the reviewer for the suggestion and will rename SCs into SPG cells in the revised manuscript.

      The authors should provide a rationale for why they used postnatal day 8 and 15 mice.

      We will provide a rationale for the use of postnatal day 8 and 15 stages in the revised manuscript. Briefly, these stages are interesting to study because early to mid-postnatal life is a critical window of development for germ cells during which environmental exposure can have strong and persistent effects. The possibility that changes in germ cells can happen during this period and persist until adulthood is an important area of research linked to disciplines like epigenetic toxicology and epigenetic inheritance.

      The FACS sorting approach used was based on cell surface proteins that are not germline-specific so there were undoubtedly somatic cells in the samples used for both RNA and ATAC sequencing. Thus, it is essential to demonstrate the level of both germ cell and undifferentiated spermatogonial enrichment in the isolated and profiled cell populations. To achieve this, the authors used PLZF as a biomarker of undifferentiated spermatogonia. Although PLZF is indeed expressed by undifferentiated spermatogonia, there have been several studies demonstrating that expression extends into differentiating spermatogonia. In addition, PLZF is not germ-cell specific and single-cell RNA-seq analyses of testicular tissue have revealed that there are somatic cell populations that express Plzf, at least at the mRNA level. For these reasons, I suggest that the authors assess the isolated cell populations using a germ-cell specific biomarker such as DDX4 in combination with PLZF to get a more accurate assessment of the undifferentiated spermatogonial composition. This assessment is essential for the interpretation of the RNA-seq and ATAC-seq data that was generated.

      In agreement with the reviewer’s observation, Zbtb16 (PLZF) is expressed in germ cells but also in somatic cells, in particular in the dataset derived from Green et al., 2018 (Author response image 11). However, when evaluating the expression patterns of Ddx4, we noticed that similar to Zbtb16, it is expressed both in the germ line and in the somatic compartment (Author response image 11). Notably, we observe expression of Ddx4 in SSC but also in progenitors and differentiating SPGs (Author response image 11g). These observations suggest that at least at the transcript level, both genes are transcribed in germ cells and to a lesser degree in somatic cells. 

      Author response image 11.

      Single-cell expression of Ddx4 and Zbtb16. Seurat clusters from all analyzed single-cell RNA-seq datasets (a,c,e,g,i) and feature maps of gene expression for Ddx4 and Zbtb16 (b,d,f,h,j).

      Finally, our deconvolution analysis using gene expression signature matrices for different cellular populations suggests that our RNA-seq and ATAC-seq libraries are largely derived from SPG cells, and in particular from SSCs.

      Furthermore, while this analysis suggested the presence of somatic cells, their proportion is minimal in comparison with germ cells (Author response images 1-4). This is also supported by ATAC-seq analysis of somatic cells from testis (Author response images 9 and 10). 

      A previous study by the Namekawa lab (PMID: 29126117) performed ATAC-seq on a similar cell population (THY1+ FACS sorted) that was isolated from pre-pubertal mouse testes. It was surprising to not see this study referenced in the current manuscript. In addition, it seems prudent to cross-reference the two ATAC-seq datasets for commonalities and differences. In addition, there are several published studies on scATACseq of human spermatogonia that might be of interest to cross-reference with the ATAC-seq data presented in the current study to provide an understanding of translational merit for the findings.

      We compared our ATAC-seq datasets with those from Maezawa et al., 2017 and Cheng et al., 2020. All these datasets were generated from FACS-sorted cells enriched for undifferentiated and differentiating SPGs. Sequencing files from Cheng et al., 2020 were processed as described in our Methods section, while our pipeline was adjusted to process the files from Maezawa et al., 2017 because they were single-end sequencing files. We generated a reference set of peaks from SPGs and calculated signal scores for all peaks across all samples. We then calculated the Pearson correlation for all pairwise comparisons and generated a heatmap of correlations (Author response image 12). Two clusters emerge that separate the SPG samples from the pachytene spermatocytes and round spermatids reported by Maezawa et al., 2017. As expected, SPG samples clustered together based on study of origin. Consistently, our postnatal samples formed one cluster next to, but separate from, the adult one. Similarly, the id4-bright samples clustered together and next to the id4-dim samples, and the same applied to the Thy1 and cKit samples. Notably, our samples and those from Cheng et al., 2020 correlate more highly with each other than with those from Maezawa et al., 2017. Given the fundamental difference in library sequencing (single-end instead of the paired-end sequencing widely used for ATAC-seq experiments), we reasoned that a comparison with the Maezawa et al., 2017 datasets is not optimal. Together with the data presented earlier (see responses to Reviewers 1 and 2), these results strongly support a predominantly SPG derivation of all our sequencing libraries.

      Author response image 12.

      Pearson correlation at the peak level among different ATAC-seq datasets. a) Our ATAC-seq libraries and ATAC-seq libraries from b) Cheng et al., 2020 and c) Maezawa et al., 2017. Thy1 and cKit libraries correspond to undifferentiated and differentiating SPGs, respectively. PS, pachytene spermatocytes; RS, round spermatids. Correlation analysis was done using Deeptools.
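
The pairwise comparison described above can be illustrated with a minimal sketch. It is not the Deeptools workflow used for the figure; it only shows the underlying calculation, assuming each sample has already been reduced to one signal score per peak of a shared reference set. The sample names and scores are hypothetical toy values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(scores):
    """scores maps sample name -> signal scores over the same reference peaks.

    Returns a nested dict holding every pairwise Pearson correlation,
    which is the matrix a correlation heatmap would display.
    """
    names = sorted(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# Hypothetical toy scores over four reference peaks (not data from the study).
toy_scores = {
    "SPG_postnatal_1": [5.0, 1.0, 3.0, 8.0],
    "SPG_postnatal_2": [6.0, 1.5, 2.5, 9.0],
    "PS": [1.0, 7.0, 6.0, 0.5],
}
matrix = correlation_matrix(toy_scores)
```

Samples with similar accessibility profiles receive coefficients close to 1, so hierarchical clustering of this matrix groups them together, which is the pattern the heatmap is meant to reveal.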

      References

      Cheng K, Chen I-C, Cheng C-HE, Mutoji K, Hale BJ, Hermann BP, Geyer CB, Oatley JM, McCarrey JR. 2020. Unique Epigenetic Programming Distinguishes Regenerative Spermatogonial Stem Cells in the Developing Mouse Testis. iScience 23:101596. doi:10.1016/j.isci.2020.101596

      Cobos FA, Panah MJN, Epps J, Long X, Man T-K, Chiu H-S, Chomsky E, Kiner E, Krueger MJ, Bernardo D di, Voloch L, Molenaar J, Hooff SR van, Westermann F, Jansky S, Redell ML, Mestdagh P, Sumazin P. 2023. Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes. Genome Biol 24:177. doi:10.1186/s13059-023-03016-6

      Drumond AL, Meistrich ML, Chiarini-Garcia H. 2011. Spermatogonial morphology and kinetics during testis development in mice: a high-resolution light microscopy approach. Reproduction 142:145–155. doi:10.1530/rep-10-0431

      Ernst C, Eling N, Martinez-Jimenez CP, Marioni JC, Odom DT. 2019. Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nat Commun 10:1251. doi:10.1038/s41467-019-09182-1

      Garcia-Moreno SA, Futtner CR, Salamone IM, Gonen N, Lovell-Badge R, Maatouk DM. 2019. Gonadal supporting cells acquire sex-specific chromatin landscapes during mammalian sex determination. Dev Biol 446:168–179. doi:10.1016/j.ydbio.2018.12.023

      Green CD, Ma Q, Manske GL, Shami AN, Zheng X, Marini S, Moritz L, Sultan C, Gurczynski SJ, Moore BB, Tallquist MD, Li JZ, Hammoud SS. 2018. A Comprehensive Roadmap of Murine Spermatogenesis Defined by Single-Cell RNA-Seq. Dev Cell 46:651-667.e10. doi:10.1016/j.devcel.2018.07.025

      Hermann BP, Cheng K, Singh A, Cruz LR-DL, Mutoji KN, Chen I-C, Gildersleeve H, Lehle JD, Mayo M, Westernströer B, Law NC, Oatley MJ, Velte EK, Niedenberger BA, Fritze D, Silber S, Geyer CB, Oatley JM, McCarrey JR. 2018. The Mammalian Spermatogenesis Single-Cell Transcriptome, from Spermatogonial Stem Cells to Spermatids. Cell Rep 25:1650-1667.e8. doi:10.1016/j.celrep.2018.10.026

      Kubota H, Brinster RL. 2018. Spermatogonial stem cells†. Biol Reprod 99:52–74. doi:10.1093/biolre/ioy077

      Law NC, Oatley MJ, Oatley JM. 2019. Developmental kinetics and transcriptome dynamics of stem cell specification in the spermatogenic lineage. Nat Commun 10:2787. doi:10.1038/s41467-019-10596-0

      Maezawa S, Yukawa M, Alavattam KG, Barski A, Namekawa SH. 2017. Dynamic reorganization of open chromatin underlies diverse transcriptomes during spermatogenesis. Nucleic Acids Res 46:gkx1052-. doi:10.1093/nar/gkx1052

      McCarrey JR. 2013. Toward a More Precise and Informative Nomenclature Describing Fetal and Neonatal Male Germ Cells in Rodents1. Biol Reprod 89:Article 47, 1-9. doi:10.1095/biolreprod.113.110502

      Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, Khodadoust MS, Esfahani MS, Luca BA, Steiner D, Diehn M, Alizadeh AA. 2019. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol 37:773–782. doi:10.1038/s41587-019-0114-2

      Rabbani M, Zheng X, Manske GL, Vargo A, Shami AN, Li JZ, Hammoud SS. 2022. Decoding the Spermatogenesis Program: New Insights from Transcriptomic Analyses. Annu Rev Genet 56:339–368. doi:10.1146/annurev-genet-080320-040045

      Rooij DG de. 2017. The nature and dynamics of spermatogonial stem cells. Development 144:3022–3030. doi:10.1242/dev.146571

      Tan K, Song H-W, Wilkinson MF. 2020. Single-cell RNAseq analysis of testicular germ and somatic cell development during the perinatal period. Development 147:dev183251. doi:10.1242/dev.183251

      Thumfart KM, Lazzeri S, Manuella F, Mansuy IM. 2022. Long-term effects of early postnatal stress on Sertoli cells. Front Genet 13:1024805. doi:10.3389/fgene.2022.1024805

      Xiang G, Keller CA, Heuston EF, Giardine BM, An L, Wixom AQ, Miller A, Cockburn A, Sauria MEG, Weaver K, Lichtenberg J, Göttgens B, Li Q, Bodine D, Mahony S, Taylor J, Blobel GA, Weiss MJ, Cheng Y, Yue F, Hughes J, Higgs DR, Zhang Y, Hardison RC. 2020. An integrative view of the regulatory and transcriptional landscapes in mouse hematopoiesis. Genome Res 30:gr.255760.119. doi:10.1101/gr.255760.119

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Transcriptional readthrough, intron retention, and transposon expression have been previously shown to be elevated in mammalian aging and senescence by multiple studies. The current manuscript claims that the increased intron retention and readthrough could completely explain the findings of elevated transposon expression seen in these conditions. To that end, they analyze multiple RNA-seq expression datasets of human aging, human senescence, and mouse aging, and establish a series of correlations between the overall expression of these three entities in all datasets.

      While the findings are useful, the strength of the evidence is incomplete, as the individual analyses unfortunately do not support the claims. Specifically, to establish this claim there is a burden of proof on the authors to analyze both intron-by-intron and gene-by-gene, using internal matched regions, and, in addition, thoroughly quantify the extent of transcription of completely intergenic transposons and show that they do not contribute to the increase in aging/senescence. Furthermore, the authors chose to analyze the datasets as unstranded, even though strand information is crucial to their claim, as both introns and readthrough are stranded, and if there is causality, then opposite-strand transposons should show no preferential increase in aging/senescence. Finally, there are some unclear figures that do not seem to show what the authors claim. Overall, the study is not convincing.

      Major concerns: 1) Why were all datasets treated as unstranded? Strand information seems critical, and should not be discarded. Specifically, stranded information is crucial to increase the confidence in the causality claimed by the authors, since readthrough and intron retention are both strand-specific, and therefore should influence only the same-strand transposons and not the opposite-strand ones.

      This is an excellent suggestion. Since only one of our datasets was stranded, we did not run stranded analyses for the sake of consistency. We would like to provide two analyses here that consider strandedness:

      First, we find that within the set of all expressed transposons (passing minimal read filtering), 86% of intronic transposons match the strand of the intron (3147 out of 3613). In contrast, the number is 51% after permutation of the strands. Similarly, when we randomly select 1000 intronic transposons 45% match the strandedness of the intron (here we select from the set of all transposons). This is consistent with the idea that most transposons are only detectable because they are co-expressed on the sense strand of other features that are highly expressed.

      As for the readthrough data, 287 out of 360 transposons (79%) within readthrough regions matched the strand of the gene and its readthrough.

      Second, in the model we postulate, the majority of transposon transcription occurs as a co-transcriptional artifact. This applies equally to genic transposons (gene expression), intronic (intron retention) and gene proximal (readthrough or readin) transposons. Therefore, we performed the following analysis for the set of all transposons in the Fleischer et al. fibroblast dataset.

      When we invert the strand annotation for transposons, before counting and differential expression, we would expect the counts and log fold changes to be lower compared to using the “correct” annotation file.

      Indeed, we show that out of 6623 significantly changed transposons with age only 226 show any expression in the “inverted run” (-96%). (Any expression is defined as passing basic read filtering.)

      Out of the 226 transposons that can be detected in both runs most show lower counts (A) and age-related differential expression converging towards zero (B) in the inverted run (Fig. L1).

      Author response image 1.

      Transposons with inverted strandedness (“reverse”) show lower expression levels (log counts; A) and no differential expression with age (B) when compared to matched differentially expressed transposons (“actual”). For this analysis we selected all transposons showing significant differential expression with age in the actual dataset that also showed at least minimal expression in the strand-inverted analysis (n=226). Data from Fleischer et al. (2018). (A) The log (counts) are clipped because we only used transposons that passed minimal read filtering in this analysis. (B) The distribution of expression values in the actual dataset is bimodal and positive since some transposons are significantly up- or downregulated. This bimodal distribution is lost in the strand-inverted analysis.
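
The two strand-aware checks described above can be sketched as follows. This is a minimal illustration rather than the pipeline used for the response: it assumes transposon and host-feature strands are already paired, uses a simple shuffle as the permutation control, and assumes a BED6-style annotation line for the strand inversion. All function names and toy values are hypothetical.

```python
import random


def strand_match_fraction(pairs):
    """pairs: list of (transposon_strand, host_feature_strand) tuples."""
    matches = sum(1 for te_strand, host_strand in pairs if te_strand == host_strand)
    return matches / len(pairs)


def permuted_match_fraction(pairs, seed=0):
    """Shuffle transposon strands across hosts to estimate the chance-level match rate."""
    rng = random.Random(seed)
    te_strands = [te_strand for te_strand, _ in pairs]
    rng.shuffle(te_strands)
    host_strands = [host_strand for _, host_strand in pairs]
    return strand_match_fraction(list(zip(te_strands, host_strands)))


def invert_strand(bed_line):
    """Flip the strand (6th column) of a tab-separated BED6 line; '.' is left as-is."""
    fields = bed_line.rstrip("\n").split("\t")
    flip = {"+": "-", "-": "+"}
    fields[5] = flip.get(fields[5], fields[5])
    return "\t".join(fields)


# Hypothetical toy data: 80 of 100 transposons share the strand of their host intron.
toy_pairs = (
    [("+", "+")] * 40 + [("-", "-")] * 40 + [("+", "-")] * 10 + [("-", "+")] * 10
)
observed = strand_match_fraction(toy_pairs)
permuted = permuted_match_fraction(toy_pairs)
inverted = invert_strand("chr1\t100\t400\tL1_example\t0\t+")
```

Counting reads against the strand-inverted annotation and comparing the result with the original annotation then gives the "reverse" versus "actual" contrast shown in the figure.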

      2) "Altogether this data suggests that intron retention contributes to the age-related increase in the expression of transposons" - this analysis doesn't demonstrate the claim. In order to prove this they need to show that transposons that are independent of introns are either negligible, or non-changing with age.

      We would like to emphasize that we never claimed that intron retention and readthrough can explain all of the age-related increases in transposon expression. In fact, our data are compatible with a multifactorial origin of transposon expression. Age- and senescence-related transposon expression can occur due to: 1/ intron retention, 2/ readthrough, 3/ loss of intergenic heterochromatin. Specifically, we do not try to refute 3.

      However, since most transposons are found in introns or downstream of genes, this suggests that intron retention and readthrough will be major, albeit non-exclusive, drivers of age-related changes in transposon expression. Even if the fold-change for intergenic transposons with aging or senescence were higher, this would not account for the broad-scale expression patterns seen in RNA-seq data.

      To further illustrate this, we analyzed transposons located in introns, genes, downstream (ds) or upstream (us) of genes (distance to gene < 25 kb) or in intergenic regions (distance to gene > 25 kb). Indeed, we find that although intergenic transposons show similar log-fold changes to other transposon classes (Fig. L2A), their total contribution to read counts is negligible (Fig. L2B, Fig. S15). We have also now added a more nuanced explanation of this issue to the discussion.

      Author response image 2.

      We analyzed transposons located in introns, genes, downstream (ds) or upstream (us) of genes (distance to gene < 25 kb) or in intergenic regions (distance to gene > 25 kb). Independent of their location, transposons show similar differential expression with aging or cellular senescence (A). In contrast, the expression of transposons (log counts) is highly dependent on their location and the median log(count) value decreases in the order: genic > intronic > ds > us > intergenic.

      Author response image 3.

      Total counts are the sum of all counts from transposons located in introns, genes, downstream (ds) or upstream (us) of genes (distance to gene < 25 kb) or in intergenic regions (distance to gene > 25 kb). Counts were defined as cumulative counts across all samples.
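
The location classes and their summed counts can be sketched as below. This is a simplified illustration under stated assumptions: genes on a single chromosome are given as (start, end, strand), a transposon is represented by one position, intronic and exonic overlap are not distinguished (both are labeled "genic"), and a distance of exactly 25 kb is treated as gene-proximal. Names and values are hypothetical.

```python
from collections import defaultdict

PROXIMAL_WINDOW = 25_000  # 25 kb threshold taken from the text above


def classify(te_pos, genes, window=PROXIMAL_WINDOW):
    """Return 'genic', 'ds', 'us' or 'intergenic' for a transposon position.

    genes: list of (start, end, strand) on the same chromosome.
    Downstream/upstream is defined relative to the strand of the nearest gene.
    """
    nearest = None  # (distance, label) of the closest non-overlapping gene
    for start, end, strand in genes:
        if start <= te_pos <= end:
            return "genic"
        if te_pos > end:
            distance = te_pos - end
            label = "ds" if strand == "+" else "us"
        else:
            distance = start - te_pos
            label = "us" if strand == "+" else "ds"
        if nearest is None or distance < nearest[0]:
            nearest = (distance, label)
    if nearest is None or nearest[0] > window:
        return "intergenic"
    return nearest[1]


def total_counts_by_class(transposons, genes):
    """transposons: list of (position, cumulative_count). Sums counts per class."""
    totals = defaultdict(int)
    for position, count in transposons:
        totals[classify(position, genes)] += count
    return dict(totals)


# Hypothetical toy annotation and counts.
toy_genes = [(100_000, 120_000, "+"), (300_000, 310_000, "-")]
toy_transposons = [
    (110_000, 50), (125_000, 20), (95_000, 5),
    (315_000, 7), (295_000, 3), (200_000, 1),
]
totals = total_counts_by_class(toy_transposons, toy_genes)
```

Separating the per-class total from the per-element fold change makes the distinction in the figures explicit: a class can change as strongly as the others with age while contributing very few reads overall.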

      3) Additionally, the correct control regions should be intronic regions other than the transposon, which overall contributed to the read counts of the intron.

      4) Furthermore, analysis of read spanning intron and partly transposons should more directly show this contribution.

      Thank you for this comment. To rephrase this, if we understand correctly, the concern is that an increase in transposon expression could bias the analysis of intron retention since transposons often make up a substantial portion of an intron. We would like to address this concern with the following three points:

      First, if the concern is the correlation between log fold-change of transposons vs log fold-change of their containing introns, we do not think that this kind of data is biased. While transposons make up much of the intron, a single transposon on average only accounts for less than 10% of an intron.

      Second, to address this more directly, we show here that even introns that do not contain expressed transposons are increased in aging fibroblasts and after induction of cellular senescence (Fig. S8). This shows that intron retention is universal and most likely not heavily biased by the presence or absence of expressed transposons.

      Author response image 4.

      We split the set of introns that significantly change with cellular aging (A) or cell senescence (B) into introns that contain at least one transposon (has_t) and those that do not contain any transposons (has_no_t). Intron retention is increased in both groups. In this analysis we included all transposons that passed minimal read filtering (n=63782 in A and n=124173 in B). Median log-fold change indicated with a dashed red line for the group of introns without transposons.

      Third, we provide an argument based on the distribution of transposons within introns (Fig. L3).

      Author response image 5.

      The 5’ and 3’ splice sites show the highest sequence conservation between introns, whereas the majority of the intronic sequence does not. This is because these sites contain binding sites for splicing factors such as U1, U2 and SF1 (A). Transposons could affect splicing and we present a biologically plausible mechanism and two ancillary hypotheses here (B). If transposons affect the splicing (retention) of introns the most likely mechanism would be via impairment of splice site recognition because a transposon close to the site forms a secondary structure, binds an effector protein or provides inadequate sequences for pairing. Hypothesis 1: Transposons impair splicing because they are close to the splice site. Hypothesis 2: Transposons do not impair splicing because they are located away from the splice junction. Retained introns should show a similar depletion of transposons around the junction. Image adapted from: Ren, Pingping, et al. "Alternative splicing: a new cause and potential therapeutic target in autoimmune disease." Frontiers in Immunology 12 (2021): 713540.

      Consistent with hypothesis 2 (“transposons do not impair splicing”), we show that the distribution of transposons within introns is similar for the set of all transposons and for all significant transposons within significantly overexpressed introns (Fig. S7; A and B are similar in the case of aged fibroblasts, and D and E are similar in the case of cellular senescence). If transposon expression were causally linked to changes in intron retention, the most likely mechanism would be via an impairment of splicing. We would then expect transposons to be located close to the splice junction, which is not what we observed. Instead, the data are more consistent with intron retention as a driver of transposon expression.

      Author response image 6.

      Transposons are evenly distributed within introns except for the region close to splice junctions (A-E). Transposons appear to be excluded from the splice junction-adjacent region both in all introns (A, D) and in significantly retained introns (B, E). In addition, transposon density of all introns and significantly retained introns is comparable (C, F). We included only introns containing at least one transposon in this analysis. A) Distribution of 2292769 transposons within 163498 introns among all annotated transposons. B) Distribution of 195190 transposons within 14100 introns significantly retained with age. C) Density (transposon/1kb of intron) of transposons in all introns (n=163498) compared to significantly retained introns (n=14100). D) as in (A) E) Distribution of 428130 transposons within 13205 introns significantly retained with induced senescence. F) Density (transposon/1kb of intron) of transposons in all introns (n=163498) compared to significantly retained introns (n=13205).

      5) "This contrasts with the almost completely even distribution of randomly permuted transposons." How was random permutation of transposons performed? Why is this contrast not trivial, and why is this a good control?

      Permutation was performed using the bedtools shuffle function (Quinlan et al. 2010). We use the set of all annotated transposons and all reshuffled transposons as a control. It is interesting to observe that these two show a very similar distribution with transposons evenly spread out relative to genes. In contrast, expressed transposons are found to cluster downstream of genes. This gave rise to our initial working hypothesis that readthrough should affect transposon expression.
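
To make the idea of the permutation control concrete, here is a minimal sketch of interval shuffling. It is not the bedtools implementation and does not reproduce its options; it simply relocates each interval to a uniformly random position while preserving its length. Keeping each interval on its original chromosome and allowing overlaps between relocated intervals are simplifying assumptions of this sketch, and the chromosome sizes are toy values.

```python
import random


def shuffle_intervals(intervals, chrom_sizes, seed=0):
    """Relocate each (chrom, start, end) interval at random, preserving its length.

    chrom_sizes maps chromosome name -> length in bp.
    """
    rng = random.Random(seed)
    shuffled = []
    for chrom, start, end in intervals:
        length = end - start
        # randint is inclusive, so the relocated interval always fits on the chromosome.
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


# Hypothetical toy input.
toy_sizes = {"chr1": 1_000_000, "chr2": 500_000}
toy_intervals = [("chr1", 1_000, 1_300), ("chr1", 50_000, 56_000), ("chr2", 10, 310)]
shuffled = shuffle_intervals(toy_intervals, toy_sizes)
```

Because the relocated intervals ignore gene positions, their distance-to-gene distribution provides a baseline against which clustering of expressed transposons downstream of genes can be judged.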

      6) Fig 4: the choice to analyze only the 10kb-20kb region downstream to TSE for readthrough regions has probably reduced the number of regions substantially (there are only 200 left) and to what extent this faithfully represent the overall trend is unclear at this point.

      This is addressed in Suppl. Fig. 7, where we repeated the analysis for every 10 kb region between 0 and 100 kb, with similar results.

      Furthermore, we show below in a new figure that the results are comparable when we measure readthrough in the 0 to 10kb region, while the sample size of readthrough regions is increased.

      Finally, it is commonly accepted to remove readthrough regions overlapping genes, which reduces sample size but increases the accuracy of readthrough determination (Rosa-Mercado et al. 2021). Without filtering, readthrough regions can overlap neighboring genes, which is reflected in an elevated ratio of Readthrough_counts/Genic_counts (Fig. S9).

      Author response image 7.

      A) Readthrough was determined in a region 0 to 10 kb downstream of genes for a subset of genes that were at least 10 kb away from the nearest neighboring gene (n=684 regions). The log2 ratio of readthrough to gene expression is plotted across five age groups (adolescent n=32, young n=31, middle-aged n=22, old n=37 and very old n=21). B) As in (A) but data is plotted on a per sample basis. C) Readthrough was determined in a region 0 to 10 kb downstream of genes for a subset of genes that were at least 10 kb away from the nearest neighboring gene (n=1045 regions). The log2 ratio of readthrough to gene expression is plotted for the groups comprising senescence (n=12) and the non-senescent group (n=6). D) As in (C) but data is plotted on a per sample basis and for additional control datasets (serum-starved, immortalized, intermediate passage and early passage). N=3 per group.
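
The filtering and ratio described in this legend can be sketched as follows. This is an illustrative simplification: genes are given as (start, end) intervals on one chromosome, the 10 kb isolation rule is applied to every neighbor regardless of strand, and a pseudocount of 1 is added to avoid division by zero. The pseudocount, the absence of length normalization, and all names are assumptions of this sketch rather than details taken from the analysis.

```python
import math

MIN_GAP = 10_000  # genes must be at least 10 kb from their nearest neighbor


def gap(a, b):
    """Distance in bp between two (start, end) intervals; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def isolated_genes(genes, min_gap=MIN_GAP):
    """Return names of genes whose every neighbor is at least min_gap away."""
    kept = []
    for name, interval in genes.items():
        others = [iv for other, iv in genes.items() if other != name]
        if all(gap(interval, iv) >= min_gap for iv in others):
            kept.append(name)
    return kept


def log2_readthrough_ratio(readthrough_counts, genic_counts, pseudocount=1.0):
    """log2 of readthrough counts relative to counts of the upstream gene."""
    return math.log2((readthrough_counts + pseudocount) / (genic_counts + pseudocount))


# Hypothetical toy annotation: geneA and geneB are only 3 kb apart.
toy_genes = {"geneA": (0, 5_000), "geneB": (8_000, 12_000), "geneC": (40_000, 45_000)}
kept = isolated_genes(toy_genes)
ratio = log2_readthrough_ratio(7, 31)
```

Dropping genes with close neighbors shrinks the set of usable regions, but it prevents reads from a neighboring gene from being counted as readthrough.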

      7) Fig. 5B shows the opposite of the authors claims: in the control samples there are more transposon reads than in the KCl samples.

      Thank you for pointing this out. During preparation of the manuscript the labels of Fig. 5B were switched (however, the color matching between Fig. 5A-C is correct). We apologize for this mistake, which we have now corrected.

      8) "induced readthrough led to preferential expression of gene proximal transposons (i.e. those within 25 kb of genes), when compared with senescence or aging". A convincing analysis would show if there is indeed preferential proximity of induced transposons to TSEs. Since readthrough transcription decays as a function of distance from TSEs, the expression of transposons should show the same trends if indeed simply caused by readthrough. Also, these should be compared to the extent of transposon expression (not induction) in intergenic regions without any readthrough, in these conditions.

      This is a very good suggestion. We now provide two new supplementary figures analyzing the distance-dependence of transposon expression.

      In the first figure (Fig. S13) we show that readthrough decreases with distance (A, B) and that transposon counts are higher for transposons close to genes (C, D), following a similar pattern to readthrough. This is true both in fibroblasts isolated from aged donors (A, C) and with cellular senescence (B, D).

      Author response image 8.

      Readthrough counts (rt_counts) decrease exponentially downstream of genes, both in the aging dataset (A) and in the cellular senescence dataset (B). Although noisier, the pattern for transposon counts (transp_cum_counts) is similar with higher counts closer to gene terminals, both in the aging dataset (C) and in the cellular senescence dataset (D). Readthrough counts are the cumulative counts across all genes and samples. Readthrough was determined in 10 kb bins and the values are assigned to the midpoint of the bin for easier plotting. Transposon counts are the cumulative counts across all samples for each transposon that did not overlap a neighboring gene. n=801 in (C) and n=3479 in (D).
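
The binning used for this figure can be illustrated with a short sketch: counts are accumulated in 10 kb bins by distance downstream of the gene end, and each bin is reported at its midpoint. The input format, a list of (distance, count) pairs, and the toy values are assumptions made for illustration.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # 10 kb bins, as stated in the legend


def cumulative_counts_by_bin(records, bin_size=BIN_SIZE):
    """Sum counts per distance bin and key each bin by its midpoint.

    records: iterable of (distance_downstream_bp, count).
    """
    totals = defaultdict(int)
    for distance, count in records:
        bin_index = distance // bin_size
        midpoint = bin_index * bin_size + bin_size // 2
        totals[midpoint] += count
    return dict(sorted(totals.items()))


# Hypothetical toy records.
toy_records = [(500, 10), (9_999, 5), (10_000, 3), (25_000, 1)]
binned = cumulative_counts_by_bin(toy_records)
```

Plotting the summed counts against the bin midpoints gives one curve per feature type, so the decay of readthrough and of transposon counts with distance can be compared on the same axis.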

      In the second figure (Fig. S14) we show that transposons found downstream of genes with high readthrough show a more pronounced log-fold change (differential expression) than transposons downstream of genes with low readthrough (defined based on log-fold change). This is true in fibroblasts isolated from aged donors (A) and with cellular senescence (B). Furthermore, the difference between high and low readthrough region transposons is diminished for transposons that are more than 10 kb downstream of genes, as would be expected given that readthrough decreases with distance.

      Author response image 9.

      Transposons found downstream of genes with high readthrough (hi_RT) show a more pronounced log-fold change (transp_logfc) than transposons downstream of genes with low readthrough (low_RT). This is true in fibroblasts isolated from aged donors (A) and with cellular senescence (B). Furthermore, the difference between high and low readthrough region transposons is diminished for transposons that are more than 10 kb downstream of genes (“Transp > 10 kb”). Transposons in high readthrough regions were defined as those in the top 20% of readthrough log-fold change. Readthrough was measured between 0 and 10 kb downstream from genes. n=2124 transposons in (A) and n=6061 transposons in (B) included in the analysis.
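
The grouping behind this comparison can be sketched as below. Regions in the top 20% of readthrough log-fold change are labeled high; treating all remaining regions as low is an assumption of this sketch, as are the data structures, names, and toy values.

```python
import statistics


def split_by_readthrough(readthrough_logfc, top_fraction=0.2):
    """Split regions into (high, low) sets by readthrough log-fold change.

    readthrough_logfc maps region name -> log-fold change of readthrough.
    """
    ranked = sorted(readthrough_logfc, key=readthrough_logfc.get, reverse=True)
    n_high = int(len(ranked) * top_fraction)
    return set(ranked[:n_high]), set(ranked[n_high:])


def median_transposon_logfc(transposons, regions):
    """Median log-fold change of transposons located in the given regions.

    transposons: list of (region_name, transposon_logfc).
    """
    return statistics.median(logfc for region, logfc in transposons if region in regions)


# Hypothetical toy values.
toy_readthrough = {"g1": 2.0, "g2": 0.1, "g3": 0.3, "g4": -0.2, "g5": 0.0}
toy_transposons = [("g1", 1.5), ("g1", 1.1), ("g2", 0.2), ("g3", 0.4), ("g5", 0.0)]
high, low = split_by_readthrough(toy_readthrough)
high_median = median_transposon_logfc(toy_transposons, high)
low_median = median_transposon_logfc(toy_transposons, low)
```

Repeating the same split separately for transposons within and beyond 10 kb of the gene end would give the distance-stratified comparison described in the legend.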

      Reviewer #2 (Public Review):

      In this manuscript, the authors examined the role of transcriptional readthrough and intron retention in increasing transcription of transposable elements during aging in mammals. It is assumed that most transposable elements have lost the regulatory elements necessary for transcription activation. Using available RNA-seq datasets, the authors showed that an increase in intron retention and readthrough transcription during aging contributes to an increase in the number of transcripts containing transposable elements.

      Previously, it was assumed that the activation of transposable elements during aging is a consequence of a gradual imbalance of transcriptional repression and a decrease in the functionality of heterochromatin (derepression of transcription in heterochromatin). Therefore, this is an interesting study with an important novel conclusion. However, there are many questions about the bioinformatics analysis and the results obtained.

      Major comments:

      1) In Introduction the authors indicated that only small fraction of LINE-1 and SINE elements are expressed from functional promoters and most of LINE-1 are co-expressed with neighboring transcriptional units. What about other classes of mobile elements (LTR mobile element and transposons)?

      We thank the reviewer for this comment. Historically, most repetitive elements, e.g. DNA elements and retrotransposon-like elements, have been considered inactive, having accrued mutations which prevent them from transposition. On the other hand, based on recent data it is indeed very possible that certain LTR elements become active with aging as suggested in several manuscripts (Liu et al. 2023, Autio et al. 2020). However, these elements are not well annotated and our final analysis (Fig. 6) relies on a well-defined distinction between active and inactive elements. (See also question 2 for further discussion.)

      Finally, we would like to point out some of the difficulties with defining expression and re-activation of LTR/ERV elements based on RNAseq data that have been highlighted for the Liu manuscript and are concordant with several of our results: https://pubpeer.com/publications/364E785636ADF94732A977604E0256

      Liu, Xiaoqian, et al. "Resurrection of endogenous retroviruses during aging reinforces senescence." Cell 186.2 (2023): 287-304.

      Autio A, Nevalainen T, Mishra BH, Jylhä M, Flinck H, Hurme M. Effect of ageing on the transcriptomic changes associated with expression at the HERV-K (HML-2) provirus at 1q22. Immun Ageing. 2020;17(1):11.

      2) Results: Why authors considered all classes of mobile elements together? It is likely that most of the LTR containing mobile elements and transposons contain active promoters that are repressed in heterochromatin or by KRAB-C2H2 proteins.

      We do not consider LTR-containing elements because there is uncertainty regarding their overall expression levels and their expression with aging (Nevalainen et al. 2018). Furthermore, we believe that substantial activity of LTR elements in human genomes should have been detectable through patterns of insertional mutagenesis. Yet studies generally show low to negligible levels of LTR (ERV) mutagenesis, for example at a 200-fold lower rate than for LINEs (Lee et al. 2012).

      Importantly, our analysis in Fig. 6 relies on well-annotated elements like LINEs, which is why we do not include LTR or SINE elements that could be potentially expressed. However, for other analyses we did consider element families independently as can be seen in Table S1, for example.

      Nevalainen, Tapio, et al. "Aging-associated patterns in the expression of human endogenous retroviruses." PLoS One 13.12 (2018): e0207407.

      Lee, Eunjung, et al. "Landscape of somatic retrotransposition in human cancers." Science 337.6097 (2012): 967-971.

      3) Fig. 2. A schematic model of transposon expression is not presented clearly. What is the purpose of showing three identical spliced transcripts?

      This is indeed confusing. There are three spliced transcripts to schematically indicate that the majority of transcripts will be correctly spliced and that intron retention is rare (estimated at 4% of all reads in our dataset). We have clarified the figure now, please see below:

      Author response image 10.

      A schematic model of transposon expression. In our model, represented in this schematic, transcription (A) can give rise to mRNAs and pre-mRNAs that contain retained introns when co-transcriptional splicing is impaired. This is often seen during aging and senescence, and these can contain transposon sequences (B). In addition, transcription can give rise to mRNAs and pre-mRNAs that contain transposon sequences towards the 3’-end of the mRNA when co-transcriptional termination at the polyadenylation signal (PAS) is impaired (C, D) as seen with aging and senescence. Some of these RNAs may be successfully polyadenylated (as depicted here) whereas others will be subject to nonsense mediated decay. Image created with Biorender.

      4) The study analyzed the levels of RNA from cell cultures of human fibroblasts of different ages. The annotation to the dataset indicated that the cells were cultured and maintained. (The cells were cultured in high-glucose (4.5mg/ml) DMEM (Gibco) supplemented with 15% (vol/vol) fetal bovine serum (Gibco), 1X glutamax (Gibco), 1X non-essential amino acids (Gibco) and 1% (vol/vol) penicillin-streptomycin (Gibco).) How valid is the assumption that gene expression levels in cell cultures are the same as in body cells? In cell cultures, transcription is optimized for efficient division and is very different from that of cells in the body. In order to correlate a result on cells with an organism, there must be rigorous evidence that the transcriptomes match.

      We agree and have updated the discussion to reflect this shortcoming. While we do not have human tissue data, we would like to draw the reviewer’s attention to Fig. S3 where we presented some liver data for mice. We now provide an additional supplementary figure (in a style similar to Fig. S2) showing how readthrough, transposon expression and intron retention change in 26 vs 5-month-old mice (Fig. S4). Indeed, intron retention, readthrough and transposon expression increase with age in mice, although this is more pronounced for transposons and readthrough.

      Author response image 11.

      Intron, readthrough and transposon elements are elevated in the liver of aging mice (26 vs 5-month-old, n=6 per group). Readthrough and transposon expression is especially elevated even when compared to genic transcripts. The percentage of upregulated transcripts is indicated above each violin plot and the median log10-fold change for genic transcripts is indicated with a dashed red line.

      Finally, just to elaborate, we used the aging fibroblast dataset by Fleischer et al. for three reasons:

      1) Yes, aging fibroblasts could be a model of human aging, with important caveats as you correctly point out,

      2) it is one of the largest such datasets allowing us to draw conclusions with higher statistical confidence and do things such as partial correlations

      3) it has been analyzed using similar techniques before (LaRocca, Cavalier and Wahl 2020) and this dataset is often used to make strong statements about transposons and aging such as transposon expression in this dataset being “consistent with growing evidence that [repetitive element] transcripts contribute directly to aging and disease”. Our goal was to put these statements into perspective and to provide a more nuanced interpretation.

      LaRocca, Thomas J., Alyssa N. Cavalier, and Devin Wahl. "Repetitive elements as a transcriptomic marker of aging: evidence in multiple datasets and models." Aging Cell 19.7 (2020): e13167.

      5) The results obtained for isolated cultures of fibroblasts are transferred to the whole organism, which has not been verified. The conclusions should be more accurate.

      We agree and have updated the discussion accordingly.

      6) The full pipeline with all the configuration files IS NOT available on github (pabisk/aging_transposons).

      Thank you for pointing this out, we have now uploaded the full pipeline and configuration files.

      7) Analysis of transcripts passing through repeating regions is a complex matter. There is always a high probability of incorrect mapping of multi-reads to the genome. Things worsen if unpaired short reads are used, as in the study (L=51). Therefore, the authors used the Expectation maximization algorithm to quantify transposon reads. Such an option is possible. But it is necessary to indicate how statistically reliable the calculated levels are. It would be nice to make a similar comparison of TE levels using only unique reads. The density of reads would drop, but in this case it would be possible to avoid the artifacts of the EM algorithm.

      We thank the reviewer for this suggestion. We show here that mapping only unique alignments (outFilterMultimapNmax=1 in STAR) leads to similar results.

      For the aging fibroblast dataset:

      Author response image 12.

      For the induced senescence dataset:

      Author response image 13.
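
      The unique-read comparison described above can be illustrated with a minimal sketch. It assumes SAM-format alignment lines that carry the standard `NH:i:` tag (number of reported alignments for a read) and keeps only reads with `NH:i:1`. This is a post-hoc approximation for illustration, not the authors' pipeline, which applied the filter inside STAR; the function names are hypothetical.

```python
def is_unique_alignment(sam_line):
    """Return True if a SAM alignment line carries the tag NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):]) == 1
    # Without an NH tag, uniqueness cannot be established; drop the read.
    return False


def keep_unique_alignments(sam_lines):
    """Keep header lines (starting with '@') and uniquely aligned reads."""
    return [
        line for line in sam_lines
        if line.startswith("@") or is_unique_alignment(line)
    ]
```

      Transposon-level counts derived from the retained reads can then be compared with the expectation-maximization estimates, at the cost of lower read density.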

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors):

      Q1: Please replace lymphocytes with lymphatic endothelial cells throughout the manuscript.

      A1: Thank you for your conscientious review. Per your suggestion, we have replaced “lymphocytes” with “lymphatic endothelial cells (LECs)” throughout the manuscript.

      Q2: Please re-analyse lymphatics using LYVE1 and CD68 or another macrophage marker, as Lyve1 is NOT specific for lymphatics.

      A2: Thank you for your suggestion. We completely agree with your opinion. Because both the CD68 (CST, 97778S) and LYVE1 (Abcam, ab14917) antibodies are rabbit multiclonal antibodies, and to label cardiac lymphatics more accurately, we performed immunofluorescence co-staining using LYVE1 and PDPN antibodies (Thermo, 53-5381-82) and re-measured the lymphatic vessel area using ImageJ software (version 1.53). The result is shown in Figure 1A and 1B. Further, we performed co-staining with PDPN and CD68 to observe the relationship between macrophage and cardiac lymphatic vessel distributions at different time points post-myocardial infarction (MI) (Figure 1-figure supplement 1F). Consistent with your comment, some macrophages in the heart tissue are LYVE1-positive, whereas they may be PDPN-negative. We have added notes on the catalog numbers of anti-PDPN and anti-CD68 in the methods (Page 10, Lines 351‒352) and updated them in the KRT template and MDAR checklist.

      Q3: Rephrase title 2.6, 2.7 to fit the results in these sections that are purely descriptive and do not add any insight into the functional relevance of the findings.

      A3: Thank you for your suggestion. We have rephrased titles 2.6 and 2.7 as follows:

      2.6 AQP1 in LEC is correlated with myocardial edema occurrence and resolution post-MI.

      2.7 Gal9 secreted by LEC can affect macrophage migration.

      Q4: Please refrain from extensive discussion of non-significant findings, such as Figures 6D, and 7A, B, and M (ifng vs ifng + antiGal9 is n.s).

      A4: Thank you for your suggestions. Lymphatic endothelial cells (LECs) are a type of cell that exists in the myocardial tissue in small quantities. Owing to the extremely small number of LECs, elucidating their biological functions and regulation may be challenging during MI. To gain a deeper understanding of the role of the lymphatic system post-MI, we attempted to analyze the transcriptomic changes of LEC subsets at different time points after MI by combining single-cell sequencing and spatial transcriptomics data. We have selected relevant molecules with significant differences in transcription levels and conducted the validation analysis in LECs at different time points after MI. Among them, AQP1 and GAL9 showed significant differences. CD44, as a receptor for GAL9, showed significant differences in its expression in macrophages at different time points after MI. Therefore, we have added the relevant information to the discussion section (marked with yellow) on Page 9, Lines 299‒312.

      Q5: Please explain the method used to calculate lymphatic areas in Figure 1.

      A5: Thank you for your observation. The method we used is consistent with that described in previous studies [1,2] (PMID: 30582443 and PMID: 32404007). The detailed methods have been described in the Methods as follows (Page 10, Lines 358‒363):

      For quantification of vessel area, vessels with visible co-staining were measured using ImageJ software. First, we selected an image, converted it to 8-bit, and then applied a suitable threshold adjustment (present co-stained areas wherever possible). Second, five equally sized squares were selected in the respective zones (remote, infarct, and border zones) of each slice. The ROI Manager tool was used for automatic quantification of signal intensity within each square. Finally, the GraphPad software was used to plot the results as a bar graph.
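
      The quantification steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ImageJ workflow itself: images are plain nested lists of pixel values, and the threshold, ROI size, and ROI corner coordinates are hypothetical parameters chosen by the user.

```python
def to_8bit(image, max_value):
    """Rescale pixel values to the 0-255 range (8-bit conversion)."""
    return [[round(255 * px / max_value) for px in row] for row in image]


def positive_area_fraction(image, top, left, size, threshold):
    """Fraction of pixels at or above `threshold` inside one square ROI."""
    roi_rows = [row[left:left + size] for row in image[top:top + size]]
    pixels = [px for row in roi_rows for px in row]
    return sum(px >= threshold for px in pixels) / len(pixels)


def mean_over_rois(image, corners, size, threshold):
    """Average the positive-area fraction over equally sized square ROIs.

    `corners` is a list of (top, left) positions, e.g. five squares per zone.
    """
    fractions = [
        positive_area_fraction(image, top, left, size, threshold)
        for top, left in corners
    ]
    return sum(fractions) / len(fractions)
```

      Calling `mean_over_rois` once per zone (remote, infarct, border) gives one value per zone and slice, which can then be plotted as a bar graph.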

      Q6: In Figure 1 supp C, the upper and lower panels don't seem to have the same zoom factor.

      A6: Thank you for pointing this out. The upper and lower images in Figure S1C have the same magnification. To facilitate your review, we have added a 1× image and re-labeled the position and scale information of the image. The revised Figure S1C was added to the manuscript and is shown as follows:

      Q7: In Figure 2d please include aqp1 among displayed genes.

      A7: Thank you for your suggestion. The Aqp1 gene is already displayed in the 11th position, and we have labeled it.

      Q8: In Figure 2f include markers of LECs such as Prox1, Flt4, Itga9, and also show Aqp1 here.

      A8: Thank you for your valuable comment. We have updated Figure 2f.

      Q9: Please indicate in Figure 3a what the y axis means? % of total LECs? % of total LECs at a given time point? The data is really not clear.

      A9: Thank you for your suggestion. The y-axis represents the percentage of the total number of LECs at d1, d3, d7, d14, and d28 post-MI, relative to the number of LECs at d0, which is used as the reference value set at 100%. Meanwhile, different colors were applied to represent the proportion of different cell subtypes at different time points. We have updated Figure 3a.
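
      As an illustration of this normalization, the sketch below expresses each subtype count at each time point as a percentage of the total LEC count at d0, so that the d0 bar sums to 100%. It reflects one reading of the description above; the data layout and function name are hypothetical, and no counts from the study are used.

```python
def percent_of_baseline(counts_by_day, baseline="d0"):
    """Express subtype counts per time point as % of the baseline total.

    `counts_by_day` maps a time point (e.g. "d0", "d3") to a dict of
    {subtype: cell_count}. The baseline total is defined as 100%.
    """
    baseline_total = sum(counts_by_day[baseline].values())
    if baseline_total == 0:
        raise ValueError("baseline time point contains no cells")
    return {
        day: {
            subtype: 100.0 * count / baseline_total
            for subtype, count in subtypes.items()
        }
        for day, subtypes in counts_by_day.items()
    }
```

      Summing the values for one time point gives the height of its stacked bar, and the individual values give the colored segments for each subtype.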

      Q10: Add n of LECs per time point in Figures 3a and b.

      A10: Thank you for your suggestion. We have updated Figure 3b.

      Q11: For Figure 3c please explain what marker genes were used to identify LEC enriched areas. What was the spatial resolution of the transcriptomic screens? How do these images relate to the localization of lymphatics in the heart?

      A11: We appreciate your observation. We have added the required information to the Methods on Page 13, Lines 442‒448, as follows:

      “We conducted spatial transcriptome data analysis using the deconvolution algorithm. The deconvolution algorithm refers to the application of feature genes to infer the full matrix information of single-cell transcriptome of cell subclusters. We then compared and anchored the matrix information of the single-cell transcriptome with the information of each SPOT in the spatial transcriptome, predicting cell types based on the similarity between the two sets of information.”

      Q12: For Figure 6, explain the y-axis in panel A, the timepoint in panel G, and the absence of aqp1 staining in blood vessels in images d1 and d3 in panel D.

      A12: Thank you for your suggestion. The y-axis in Figure 6A (Figure to reviewer 7A) shows Aqp1 expression in LECs at different time points from the scRNA-seq data. We have also added the timepoint in Figure 6G, which is 24 hours. To clarify the expression trend of AQP1, we performed immunofluorescence staining of AQP1 and LYVE1 at different time points after MI (d0, d1, d3, d7, and d14). The results are shown in Figure to reviewer 7C. AQP1 expression was found to be increased in the border zone of the infarct at d3 post-MI, adjacent to the LYVE1-positive area.

      Q13: Explain the y-axis unit in Figure 7a.

      A13: Thank you for your comment. The y-axis in Figure 7A shows Lgals9 gene expression in LECs at different time points from the scRNA-seq data.

      Q14: In Figure 7c, d how was the induction of cell death excluded as a cause of IFNg-mediated effects in LECs?

      A14: Thank you for your suggestion. To rule out interference from apoptosis, we performed TUNEL staining of LECs after stimulation with different concentrations of IFN-γ for 24 h. As shown in the Figure to reviewer 9, little apoptosis of LECs was observed in this concentration range. Therefore, we can exclude the potential impact of IFN-γ-induced cell apoptosis.

      Author response image 1.

      TUNEL staining of LECs after stimulation with different concentrations of IFN-γ for 24 h.

      Q15: Results with hypoxia in Figure 7 are mentioned but not shown.

      A15: Thank you for your observation. In the revised article, we supplemented the detection of Gal9 expression after hypoxic stimulation. We conducted hypoxia intervention experiments using two methods. First, we applied 1% oxygen concentration stimulation to detect the expression of Gal9 at 0 h, 2 h, 4 h, 8 h, 12 h, and 24 h. Second, we applied CoCl2 intervention to activate HIF1α expression and simulate cell hypoxia in order to detect Gal9 expression. Both results confirmed that hypoxia could not stimulate LECs to secrete galectin 9. The results are presented in Figure 7-figure Supplement 1 (A-D).

      Reviewer #3 (Recommendations For The Authors):

      Q1: In Figure 1, the so-called "LYVE1-labeled lymphatic capillaries with discontinuous walls" might be macrophages. The authors measured lymphatic area by measuring "vessels with visible lumens", which is unclear. This may underestimate the number of capillaries that expand after MI in the border zone of the infarct area. The authors need to use CD68 and Pdpn markers, as Lyve1 is not specific for lymphatics and also stains macrophages, and Pdpn is more reliable for assessing lymphatic identity.

      A1: Thank you for your good suggestion. We totally agree with your opinion. Because both the CD68 (CST, 97778S) and LYVE1 (Abcam, ab14917) antibodies are rabbit multiclonal antibodies, and to label cardiac lymphatics more accurately, we performed immunofluorescence co-staining using LYVE1 and PDPN antibodies (Thermo, 53-5381-82) and re-measured the lymphatic vessel area using ImageJ software (version 1.53). The result is shown in Figure to reviewer 1 (Figure 1A and 1B in manuscript). Further, we performed co-staining with PDPN and CD68 to observe the relationship between macrophage and cardiac lymphatic vessel distributions at different time points post-myocardial infarction (Figure to reviewer 2, and Figure 1-figure supplement 1F in manuscript). Consistent with your comment, some macrophages in the heart tissue are LYVE1-positive, whereas they may be PDPN-negative. We have added notes on the catalog numbers of anti-PDPN and anti-CD68 in the methods (Page 10, Lines 351‒352) and updated them in the KRT template and MDAR checklist.

      Q2: It is not clear how they analyse the lymphatic area in Figure 1, please explain.

      A2: Thank you for your observation. The method we used is consistent with that described in previous studies [1,2] (PMID: 30582443 and PMID: 32404007). The detailed methods have been described in the Methods as follows (Page 10, Lines 347‒352):

      For quantification of vessel area, vessels with visible co-staining were measured using ImageJ software. First, we selected an image, converted it to 8-bit, and then applied a suitable threshold adjustment (present co-stained areas wherever possible). Second, five equally sized squares were selected in the respective zones (remote, infarct, and border zones) of each slice. The ROI Manager tool was used for automatic quantification of signal intensity within each square. Finally, the GraphPad software was used to plot the results as a bar graph.

      Q3: Figure 1-supplement 1D: The authors claim that the observed structure is a lymphatic valve, however in 2D sections, this shape might result from membrane destruction due to the cutting and staining process. To accurately identify valves, the authors should employ 3D imaging of the lymphatic network, such as using a clearing protocol followed by lightsheet microscopy.

      A3: Thank you for your good suggestion. We performed a 3D scan of another slice using a confocal microscope. The results are shown in Figure 1-supplement 1D. We believe the structure is more consistent with a lymphatic valve than with debris from membrane destruction.

      Q4: In Figure 2, the number of LECs is too little. Indeed, 242 LECs were identified over 44860 total cell numbers and 5688 endothelial cells cannot be representative and cannot afford to distinguish 4 different clusters.

      A4: We further analyzed the percentage of LECs in the adult mouse heart in the physiological state (d0) based on the results of single-cell nuclear sequencing from a public database (GSE214611). A total of 292 LECs were obtained from 26,779 captured cells across three samples, meaning that the percentage of LECs in the normal adult mouse heart is 1.09%. Cardiac LECs are indeed rare, and enrichment methods for cardiac LECs, such as flow cytometry and magnetic bead separation, are still being explored and might provide more conclusive evidence in future studies.
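
      The proportions discussed in this exchange can be checked with simple arithmetic. The sketch below uses only the counts quoted in the reviewer's question and in the response; the helper name is hypothetical.

```python
def percent_of_total(subset_count, total_count):
    """Return subset_count as a percentage of total_count."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * subset_count / total_count


# Counts quoted in the response: 292 LECs among 26,779 cells at d0.
d0_share = percent_of_total(292, 26779)

# Counts quoted by the reviewer: 242 LECs among 44,860 total cells
# and 5,688 endothelial cells.
overall_share = percent_of_total(242, 44860)
endothelial_share = percent_of_total(242, 5688)

print(f"d0: {d0_share:.2f}% of all cells")
print(f"all time points: {overall_share:.2f}% of all cells")
print(f"all time points: {endothelial_share:.2f}% of endothelial cells")
```

      The first value reproduces the 1.09% stated in the response; the other two put the reviewer's counts on the same scale.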

      Q5: The authors claimed that there is transcriptional heterogeneity in regenerated cardiac LECs post-MI, based on their over-clusterization. However, to substantiate this claim, they need to include a control comparison. Currently, the observed differences in cardiac LEC profiles lack a direct connection to the disease condition.

      A5: Thank you for pointing this out. Because we could not download spatial transcriptome data for day d0 in the public database (GSE214611) or from the authors, we have used data of 1 h after IR as a reference for approximating the physiological state in Figure 3 and in Supplemental Figure 1.

      Q6: Line 131, what is the regeneration ratio the authors cite here?

      A6: Thank you for the comment. Regeneration ratio is an inappropriate use of the word, and we apologize for this confusion. We were actually referring to the regenerative potential of LECs.

      Q7: Line 132, it is not clear what is the "normal myocardial tissue" in the graphs presented Figures 3A and B. Is it d0 time point?

      A7: Thank you for your suggestion. The d0 time point means LECs in the normal adult mouse heart.

      Q8: In Figure 2D, please add more lymphatic markers as Ccl21, Flt4, Itga9, FoxC2 and Aqp1.

      A8: Thank you for your suggestion. We have added these markers (except Ccl21, whose expression is too low to display) to Figure 2D in the revised manuscript.

      Q9: The authors must replace "lymphocyte" with "lymphatic" from 2.5, where they start to present interactions between lymphatic and immune cells.

      A9: Thank you for your good comments. We have corrected these words.

      Q10: In Figure 3, please indicate what the color scale means.

      A10: Thank you for your suggestion. We have supplied a color scale label.

      Q11: In Figures 3C and D, the authors distinguished the same LECs clusters in the spatial transcriptomic as in the scRNAseq analysis. This is not clear whether they used the same markers.

      A11: We appreciate your observation. We have added the required information to the Methods on Page 12, Lines 429‒434, as follows:

      “We conducted spatial transcriptome data analysis using the deconvolution algorithm. The deconvolution algorithm refers to the application of feature genes to infer the full matrix information of single-cell transcriptome of cell subclusters. We then compared and anchored the matrix information of the single-cell transcriptome with the information of each SPOT in the spatial transcriptome, predicting cell types based on the similarity between the two sets of information.”

      Q12: In 2.5, it is not clear whether the main message is about macrophage interactions with lymphocytes or with lymphatics (LEC interact with others).

      A12: Thank you for your suggestion. We have revised the title 2.5 as “Assessment of Cell-Cell Communication between LECs and immune cells,” which is clearer for the reader.

      Q13: In 2.6, the authors claim that they reveal "that fluid retention occurs in LEC ca I and LEC co". They don't show any data supporting this.

      A13: Thank you for your comment. “…that fluid retention occurs in LEC ca I and LEC co” is mainly supported by Figure 3D KEGG enrichment. LEC ca I is related to vasopressin-regulated water reabsorption, and LEC co is related to renin secretion.

      Q14: In Figure 6A, please add statistical values, as the authors claim a significant correlation. Please also add a figure to support the correlation between Aqp1 and edema score, as mentioned in 2.6.

      A14: Thank you for pointing this out. We have presented the information on statistical values in Figure 6A. Moreover, we calculated the correlation between Aqp1 and edema score in Figure 6D (shown in Author response image 2).

      Author response image 2.

      Correlation between Aqp1 expression intensity and edema score.

      Q15: In Figure 6B, myocardial edema assessment using H&E staining is not accurate. If the authors wish to analyse cardiac edema, they must use gravimetry or MRI techniques.

      A15: Thank you for your comment. We totally agree with your opinion. However, owing to limitations in experimental conditions, we could not perform MRI detection of mouse myocardial injury. To evaluate whether edema occurred in the mouse heart tissue, we used classic pathological evaluation methods described in the literature (PMID: 30582443). This method has been described in detail as follows (Page 11, Lines 365‒370):

      Four high-power (40×) representative images were chosen per animal from the H&E-stained section; each image must have a clear border of the section visible. Images were blinded, and five visual fields per sample were evaluated. Subsequently, an edema score was determined for each sample (Score 1=no edema, 2=mild edema, 3=severe edema). Graphs represent the average score value per animal.
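
      The per-animal averaging can be sketched as follows. This is a minimal illustration that assumes one integer score on the 1 to 3 scale per evaluated visual field; the function name and input layout are hypothetical.

```python
from statistics import mean

# Scoring scale as given in the Methods excerpt above.
EDEMA_SCALE = {1: "no edema", 2: "mild edema", 3: "severe edema"}


def animal_edema_score(field_scores):
    """Average the per-field edema scores for one animal."""
    if not field_scores:
        raise ValueError("at least one scored field is required")
    for score in field_scores:
        if score not in EDEMA_SCALE:
            raise ValueError(f"score {score!r} is outside the 1-3 scale")
    return mean(field_scores)
```

      The resulting per-animal averages are the values that would be plotted and compared across time points.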

      Q16: Line 227, please correct "LVEC" with "LEC".

      A16: Thank you for your careful review. We have revised this in the manuscript.

      Q17: In Figure 6D, IF co-staining of Aqp1 and lymphatic vessels is mentioned as "significantly reduced". However, we don't see any quantification data supporting this.

      A17: Thank you for your comment. To clarify the expression trend of AQP1, we performed immunofluorescence staining of AQP1 and LYVE1 at different time points post-MI (d0, d1, d3, d7, and d14). The results are shown in the corrected Figure 6-figure supplement 1A. AQP1 expression increased in the border zone of the infarct at d3 post-MI, adjacent to the LYVE1-positive area.

      Q18: As Gal9 was not significantly impaired in LECs post-MI, Figure 7A does not support any real finding concerning the role of this molecule in monocytes/macrophages interaction with cardiac lymphatics.

      A18: Thank you for your comment. The Lgals9 gene is significantly impaired in LEC post-MI, as well as the Cd44 gene in macrophage. We have updated them in Figures 7A and 7B.

      Q19: In Figure 7, please correct INF by IFN.

      A19: Thank you for your careful review. We have revised this in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary.

      The authors' goal was to map the neural circuitry underlying cold-sensitive contraction in Drosophila. The circuitry underlying most sensory modalities has been characterized, but noxious cold sensory circuitry has not been well studied. The authors achieve their goal and map out sensory and post-sensory neurons involved in this behavior.

      Strengths.

      The manuscript provides convincing evidence for sensory and post sensory neurons involved in noxious cold sensitive behavior. They use both connectivity data and functional data to identify these neurons. This work is a clear advance in our understanding of noxious cold behavior. The experiments are done with a high degree of experimental rigor.

      Positive comments

      - CaMPARI is nicely done to map cold-responsive neurons, although it doesn't give data on individual neurons.

      - Chrimson and TNT experiments are nicely done.

      - Cold temperature activates basin neurons, it's a solid and convincing result.

      Weaknesses.

      Among the few weaknesses in this manuscript are the failure to trace the circuit from sensory neuron to motor neuron and the lack of analysis of the muscles driving cold-induced contraction. The authors also need to elaborate more on the novel aspects of their work in the introduction or abstract.

      We have performed a more thorough EM connectivity analysis of the CIII md neuron circuit (Figure 1A, Figure 1 – Figure supplement 1, Figure 10A). We now report all premotor neurons that are connected to CIII md neurons along with two additional projection/command-like neurons. These additional premotor neurons (A01d3, A02e, A02f, A02g, A27k, and A31k), which are primarily implicated in locomotion, were not required for cold nociception (Figure 5 – Figure supplement 2). Collectively, we have tested the requirement in cold nociception for ~94% of synapses between CIII md->premotor neurons and for all premotor neurons with available driver lines. The requirement in cold nociception was also assessed for the two projection/command-like neurons, dLIP7 and A02o, which are required for sensory integration and directional avoidance to noxious touch, respectively (Figure 7 – Figure supplement 2) (Hu et al., 2017; Takagi et al., 2017). Silencing dLIP7 neurons resulted in a modest reduction in cold-evoked behaviors, whereas A02o neurons were not required for cold nociception (Figure 7 – Figure supplement 2). To complete the analysis from thermosensation to evoked behavior, we analyzed cold-evoked Ca<sup>2+</sup> responses of larval musculature (Figure 10). Premotor neurons, which are connected to CIII md neurons, target multiple muscle groups (DL, DO, LT, VL, and VO) (Figure 10A). Individual larval segments have unique cold-evoked Ca<sup>2+</sup> responses, where the strongest cold-evoked Ca<sup>2+</sup> response occurs in the central abdominal segments (Figure 10B-D). When motor neuron activity is inhibited or an anesthetic (ethyl ether) is used, there is a negligible cold-evoked Ca<sup>2+</sup> response compared to controls (Figure 10 – Figure supplement 1). Analysis of cold-evoked Ca<sup>2+</sup> in individual muscles reveals unique Ca<sup>2+</sup> dynamics for individual muscle groups (Figure 10E-H).

      Major comments.

      - Class three sensory neuron connectivity is known, and role in cold response is known (turner 16, 18). Need to make it clearer what the novelty of the experiments are.

      In Figure 1, we are trying to guide the audience to CIII md neuron circuitry and emphasize the necessity and sufficiency of CIII md neurons in cold nociception. Previously, only transient (GCaMP6) cold-evoked Ca<sup>2+</sup> responses were reported (Turner et al., 2016, 2018). However, here using CaMPARI, we performed dendritic spatial (Sholl) analysis of cold-evoked Ca<sup>2+</sup> responses (Figure 1B-C). During the revision, we evaluated both CIII- and cold-evoked CT throughout larval development (Figure 1G, H). All in all, the findings from the first figure reiterate and replicate previous findings for the role of CIII md neurons in cold nociception. CIII md connectivity might be known; however, we investigated the functional and physiological roles of individual circuit neurons.

      - Why focus on premotor neurons in mechano nociceptive pathways? Why not focus on PMNs innervating longitudinal muscles, likely involved in longitudinal larval contraction? Especially since chosen premotor neurons have only weak effects on cold induced contraction?

      We assessed requirements for all premotor neurons that are connected to CIII md neurons and for which there are validated driver lines. Only premotor neurons (DnB, mCSI and Chair-1) that were previously implicated in mechanosensation were also required for cold nociception. Premotor neurons previously implicated in locomotion (A01d3, A02e, A02f, A02g, A27k, and A31k) are not required for cold-evoked behaviors (Figure 5 – Figure supplement 2).

      Reviewer #2 (Public Review):

      Patel et al perform the analysis of neurons in a somatosensory network involved in responses to noxious cold in Drosophila larvae. Using a combination of behavioral experiments, calcium imaging, optogenetics, and synaptic connectivity analysis in the Drosophila larva, they assess the function of circuit elements in the somatosensory network downstream of multimodal somatosensory neurons involved in innocuous and noxious stimuli sensing and probe their function in noxious cold processing. Consistent with their previous findings, they find the multidendritic class III neurons to be the key cold-sensing neurons that are both required and sufficient for the CT behavioral response (shown to be evoked by noxious cold). They further investigate the downstream neurons, identified based on literature and connectivity from EM, at different stages of sensory processing, characterize the different phenotypes upon activating/silencing those neurons, and monitor their responses to noxious cold. The work reveals diverse phenotypes for the different neurons studied and provides the groundwork for understanding how information is processed in the nervous system from sensory input to motor output and how information from different modalities is processed by neuronal networks. However, at times the writing could be clearer and some result interpretations more rigorous.

      Specific comments

      (1) In Figure 1 -supplement 6D-F (Cho co-activation)

      The authors find that Ch neurons are cold sensitive and required for cold nociceptive behavior but do not facilitate behavioral responses induced by CIII neurons.

      The authors show that coactivating mdIII and cho inhibits the CT (a typically observed cold-induced behavioral response) in the second part of the stimulation period, while Cho was required for cold-induced CT. Different levels of activation of md III and Cho (different light intensities) could bring some insights into the observed phenotypes upon Cho manipulation as different levels activate different downstream networks that could correspond to different stimuli. Also, it would be interesting to activate chordotonal during exposure to cold to determine how a behavioral response to cold is affected by the activation of chordotonal sensory neurons.

      Modulating both CIII md and Ch activation to assess the contribution of individual sensory neurons to thermosensation would certainly provide unique insights. However, we believe that such analyses are beyond the scope of the current manuscript and better suited to future follow-up studies.

      (2) Throughout the paper the co-activation experiments investigate whether co-activating the different candidate neurons and md III neurons facilitates the md III-induced CT response. However, the cold noxious stimuli will presumably activate different neurons downstream than optogenetic activation of MdIII and thus can reveal more accurately the role of the different candidate neurons in facilitating cold nociception.

      We agree that the CIII md neuron activation of the downstream circuitry would be different from the cold-evoked activation of neurons downstream of primary sensory neurons. We believe that our current findings lay the foundation for future work to evaluate how multiple sensory neurons work in concert to generate stimulus-specific behavioral responses.

      (3) Use of blue lights in behavioral and imaging experiments

      Strong Blue and UV have been shown to activate MDIV neurons (Xiang, Y., Yuan, Q., Vogt, N. et al. Light-avoidance-mediating photoreceptors tile the Drosophila larval body wall. Nature 468, 921-926 (2010). https://doi.org/10.1038/nature09576) and some of the neurons tested receive input from MdIV.

      In their experiments, the authors used blue light to optogenetically activate CIII neurons and then monitored calcium responses in Basin neurons, premotor neurons, and ascending neurons, and UV light is necessary for photoconversion in CaMPARI experiments. Therefore, some of the neurons monitored could be activated by blue light and not by CIII activation. Indeed, responses of Basin-4 neurons can be observed in the no-ATR condition (Fig 3H, I), as can quite strong responses of DnB neurons (Figure 6E). How do the authors discern that the effects they see on the different neurons are indeed due to cold nociception and not to the synergy of cold and blue-light responses? This could especially be the case for DnB, which could have a role in facilitating the response to cold in a multisensory context (where mdIV are activated by light).

      In addition, the silencing of DNB neurons during cold stimulation does not seem to give very robust phenotypes (no significant CT decrease compared to empty GAL4 control).

      It would be important to for example show that even in the absence of blue light the DNB facilitates the mdIII activation or cold-induced CT by using red light and Chrimson for example or TrpA activation (for coactivation with md III).

      Alternatively, in some other cases, the phenotype upon co-activation could be inhibited by blue light (e.g. chair-1 (Figure 5 H-I)).

      More generally, given the multimodal nature of stimuli activating mdIV, MdIII (and Cho) and their shared downstream circuitry, it is important either to control for the use of blue light in these stimuli or to take the presence of the stimulus into account when interpreting the results, as the coactivation of, for example, Cho and mdIII using blue light could also activate mdIV (and downstream neurons) and alter the state of the network, which could inhibit the md III-induced CT responses.

      Assessing the differences in behavioral phenotypes in the different conditions could give an idea of the influence of combining different modalities in these assays. For example, did the authors observe any other behaviors upon co-activation of MDIII and Cho (at the expense of CT in the second part of the stimulation) or did the larvae resume crawling? Blue light typically induces reorientation behavior. What about when co-activating mdIII and Basin-4?

      Using Chrimson and red light or TrpA in some key experiments e.g. with Cho, Basin-4, and DNB would clarify the implication of these neurons in cold nociception

      We agree that exposure to a bright light source results in avoidance behaviors in Drosophila larvae, which is primarily mediated by CIV md neurons. However, the light intensities used in our assays are much lower than those required to activate sensory neurons. Specifically, based on Xiang et al., 470nm light does not evoke any electrical response at the lowest tested light intensity (0.74mWmm<sup>-2</sup>), whereas the light intensity used in our behavioral experiments was much lower at 0.15mWmm<sup>-2</sup>. Additionally, we assessed larval mobility and turning for control conditions ±ATR and also for sensory neuron activation. As expected, there is an increase in larval immobility upon CIII md neuron activation (Author response image 1). Only activation of CIV md neurons resulted in light-evoked turning, whereas the remaining conditions did not show a stimulus-time-locked turning response (Author response image 1). Furthermore, we tested whether the intensity of 470nm light used in our behavior experiments was enough to evoke a Ca<sup>2+</sup> response in CIII md and CIV md neurons. We expressed RCaMP in sensory neurons using a pan-neural driver (GMR51C10<sup>GAL4</sup>). There was no detectable light-evoked increase in Ca<sup>2+</sup> response in either CIII md or CIV md neurons (Author response image 1).

      Furthermore, we also tested multiple optogenetic actuators (ChR2, ChR2-H134R, and CsChrimson) and two CIII md driver lines (19-12<sup>Gal4</sup> and R83B04<sup>Gal4</sup>). Regardless of the optogenetic actuator or the wavelength of light used, we observe light-evoked CT responses (Figure 1–Figure supplement 6). We found that using CsChrimson raises several procedural challenges with our current experimental setup. In our hands, CsChrimson showed extreme sensitivity to any amount of ambient white light, whereas others have used infrared imaging to counteract ambient light sensitivity. Our imaging setup is equipped for visible-spectrum imaging and cannot be retrofitted to record infrared light sources. Thus, we have limited the use of CsChrimson to optogenetic-Ca<sup>2+</sup> imaging experiments, where we are not recording larval behavior.

      The use of TrpA1 would require heat stimulation for activating the channels, which in turn would impact downstream circuit neurons that are shared amongst sensory neurons.

      For CaMPARI experiments, the PC light was delivered using a custom filter cube similar to the one used in the original CaMPARI paper (Fosque et al., 2015). This filter cube delivers 440nm wavelength as the PC light. PC light exposure in the absence of cold stimulus does not result in differential CaMPARI conversion between CIII md and CIV md (F<sub>red/green</sub> = 0.086 and 0.097, respectively). For the same condition, Ch neurons show high CaMPARI conversion, which is expected as they function in proprioception. Therefore, the chances of downstream neurons being solely activated by PC light remain low. The differential baseline CaMPARI F<sub>red/green</sub> ratios of individual circuit neurons could be a result of varying resting-state cytosolic Ca<sup>2+</sup> concentrations.

      Lastly, in optogenetic-GCaMP experiments, we use CIII md>CsChrimson and Basin-2/-4 or DnB>GCaMP to visualize CIII md-evoked Ca<sup>2+</sup> responses in downstream neurons. Xiang et al. reported that confocal laser excitation for GCaMP does not activate CIV md neurons, which is consistent with what we have observed as well.

      Author response image 1.

      (A) For optogenetic experiments, percent turning was assessed in control conditions and sensory neuron activation. Only CIV md neuron activation results in an increase in bending response. Other conditions do not show blue light-evoked turning. (A’) We assessed larval turning based on ellipse fitting using FIJI; the aspect ratio of the radii is indicative of larval bending state. We empirically determined that a radii ratio of <2.5 represents larval turning/bending. This method of ellipse fitting has previously been used to identify C. elegans postures using WrMTrck in FIJI (Nussbaum-Krammer et al., 2015). (B) Percent immobility for all control conditions plus sensory activation driver lines. Only CIII md neuron activation leads to a sustained stimulus-locked increase in immobility. There are also no blue light-evoked reductions in mobility, indicating that there was no increase in larval movement due to blue light. (C) We assessed the responses of CIII md (ddaF) and CIV md (ddaC) neurons to blue light at a light intensity similar to that used in behavioral optogenetic experiments. There is no blue light-evoked increase in RCaMP fluorescence.
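
      As an illustration only, the ellipse-based bending criterion described in panel A’ could be sketched in Python as below. This is not the FIJI macro used for the analysis: the function names are hypothetical, the ellipse axes are assumed to have been measured already, and scoring percent turning as the fraction of frames flagged as bending is an assumption about how the percentage might be computed.

```python
from typing import Iterable, Tuple

# Threshold taken from the text: a radii ratio below 2.5 is scored as bending.
BENDING_RATIO_THRESHOLD = 2.5


def is_bending(axis_a: float, axis_b: float,
               threshold: float = BENDING_RATIO_THRESHOLD) -> bool:
    """Return True when the long/short axis ratio of the fitted ellipse is below threshold."""
    if axis_a <= 0 or axis_b <= 0:
        raise ValueError("ellipse axes must be positive")
    ratio = max(axis_a, axis_b) / min(axis_a, axis_b)
    return ratio < threshold


def percent_turning(axes_per_frame: Iterable[Tuple[float, float]]) -> float:
    """Percentage of frames scored as bending (assumed per-frame scoring)."""
    flags = [is_bending(a, b) for a, b in axes_per_frame]
    if not flags:
        raise ValueError("no frames supplied")
    return 100.0 * sum(flags) / len(flags)
```

      A straight, elongated larva yields a large ratio and is scored as not bending, whereas a curled larva yields a ratio closer to 1 and is scored as bending.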

      (4) Basins

      - Page 17 line 442-3 "Neural silencing of all Basin (1-4) neurons, using two independent driver lines (R72F11<sup>GAL4</sup> and R57F07<sup>GAL4</sup>).

      Did the authors check the expression profile of the R57F07 line that they use to probe "all basins"? The expression profile published previously (Ohyama et al, 2015, extended data) shows one basin neuron (identified as Basin-4) and some neurons in the brain lobes. Also, the split GAL4 that labels Basin-4 (SS00740) is the intersection between R72F11 and R57F07 neurons. Thus, R57F07 likely labels Basin-4, and if that is the case, the data in Figure 2 (and supplement) and Figure 3 related to this driver line should be annotated as Basin-4, and the results and their interpretation modified to take into account the different phenotypes for all basins and Basin-4 neurons.

      Due to the non-specific nature of R57F07<sup>GAL4</sup> in labeling Basin-4 and additional neuron types, we have decided to remove the driver line from our current analysis. We would need to perform further independent investigations to identify the other cell types and validate their role in cold nociception.

      Page 19 l. 521-525 I am confused by these sentences as the authors claim that Basin-4 showed reduced Calcium responses upon repetitive activation of CDIII md neurons but then they say they exhibit sensitization. Looking at the plots in FIG 3 F-I the Basin-4 responses upon repeated activation seem indeed to decrease on the second repetition compared to the first. What is the sensitization the authors refer to?

      We have rephrased this section.

      On Page 47-In this section of the discussion, the authors emit an interesting hypothesis that the Basin-1 neuron could modulate the gain of behavioral responses. While this is an interesting idea, I wonder what would be the explanation for the finding that co-activation of Cho and MDIII does not facilitate cold nociceptive responses. Would activation of Basin-1 facilitate the cold response in different contexts (in addition to CH0-mediated stimuli)?

      Page 48 Thus the implication of the inhibitory network in cold processing should be better contextualized.

      The authors explain the lower Basin-2 Ca<sup>2+</sup> response to cold/mdIII activation (compared to Basin-4), despite stronger connectivity, as being due to stronger inputs from inhibitory neurons to Basin-2 (compared to Basin-4). The previously described inhibitory neurons that synapse onto Basin-2 receive a rather small fraction of inputs from the class III sensory neurons. The differences in response to cold could potentially be assigned to the activation of the inhibitory neurons by the cold-sensing cho neurons. However, that cannot explain the differences in responses induced by class III neurons. Do the authors refer to additional inhibitory neurons that would receive significant input from MdIII?

      Alternative explanations could exist for this difference in activation: electrical synapses from mdIII onto Basin-4, and stronger inputs from mdIV (compared to Basin-2) in the case of responses to cold stimulus (cold induces responses in md IV sensory neurons). Different subtypes of CD III may respond differentially to cold, and the cold-sensing ones could synapse preferentially on Basin-4, etc.

      A possible explanation for the lack of CT facilitation when Ch and CIII md neurons are both activated is the competing sensory inputs going into Basins and the as-yet unknown role of the inhibitory network between sensory and Basin neurons in cold nociception (Jovanic et al., 2016). Mechanical activation of Ch leads to several behavioral responses (hunch, back-up, pause, crawl, and/or bend) and transitions between behaviors (Kernan et al., 1994; Tsubouchi et al., 2012; Zhang et al., 2015; Turner et al., 2016, 2018; Jovanic et al., 2019; Masson et al., 2020).

      Meanwhile, the primary CIII md-/cold-evoked behavior is CT (Turner et al., 2016, 2018; Patel et al., 2022; Himmel et al., 2023). Certain touch- versus cold-evoked behaviors are mutually exclusive, so co-activation of Ch and CIII md likely leads to competing neural impulses and hence to a lack of enhancement of any single behavior. Furthermore, the mini circuit motif between Ch and Basins, consisting of feedforward, feedback and lateral inhibitory neurons that play a role in behavioral selection and transitions, might impact the overall output of Basin neurons. Upon Ch and CIII md neuron co-activation, the cumulative Basin neuronal output may be biased towards increased behavioral transitions instead of a sustained singular behavioral response.

      We posited one possible mechanism explaining the differences between cold- or CIII md-evoked Ca<sup>2+</sup> responses in Basin 2 and 4 neurons: the differences in evoked Ca<sup>2+</sup> responses may arise from differential connectivity of TePns and inhibitory network neurons to Basin 2 and/or 4. Furthermore, ascending A00c neurons are connected to a descending feedback SEZ neuron, SeIN128, which has connectivity to Basins (1-3, strongest with Basin 2), A02o, DnB, Chair-1 and A02m/n (Ohyama et al., 2015; Zhu et al., 2024). However, how the 5 different subtypes of CIII md neurons respond to cold is unknown. Electrical recordings of the dorsal CIII md neurons revealed variability in the temperature sensitivity of individual neurons within and between neuron subtypes, where population coding results in fine-tuned central temperature representation (Maksymchuk et al., 2022). Evaluating how individual CIII md subtypes contribute to Basin activation could reveal important insights into the precise relationship between CIII md neurons and the multisensory integration Basin neurons. However, there are as yet no known CIII md neuron driver lines that mark a subset of CIII md neurons, which limits further clarification of how primary sensory information is transduced to integration neurons.

      (5) A00c

      Page 26 Figure 4F-I line While Goro may not be involved in cold nociception the A00c (and A05q) seems to be.

      A00c could convey information to other neurons other than Goro and thus be part of a pathway for cold-induced CT.

      A deeper look into A00c connectivity reveals that there is a reciprocal relationship between A00c and the SEZ descending neuron, SeIN128 (Ohyama et al., 2015; Zhu et al., 2024). Additionally, this feedback SEZ descending neuron synapses onto A02o, A05q, Basins (highest connectivity to Basin 2 and weak connectivity to Basin 1 & 3), and select premotor neurons (Chair-1, DnB, and A02m/n) (Ohyama et al., 2015; Zhu et al., 2024). Interestingly, the SEZ feedback neuron likely plays a role in the observed cold-/CIII md neuron-evoked differential calcium activity and behavioral requirement amongst Basin-2 and -4 in cold nociception. We have added this to our discussion section.

      (6) Page 31 766-768 the conclusion that "premotor function is required for and can facilitate cold nociception" seems odd to stress as one would assume that some premotor neurons would be involved in controlling the behavioral responses to a stimulus. It would be more pertinent in the summary to specify which premotor neurons are involved and what is their function

      We have updated the section regarding premotor neurons’ role in cold nociception and now there’s a more specific concluding statement.

      (7) There are several Split GAL4 lines used in the study (with transgenes inserted in the attP40 and attP2 sites). A recent study points to a mutation related to attP40 that can have an effect on muscle function: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9750024/. The controls used in behavioral experiments do not contain the attP40 site. It would be important to check a control genotype bearing an attP40 site, characterize the different parameters of the CT behavior to cold, and take this into account in interpreting the results of the experiments using the SplitGAL4 lines.

      We have performed control experiments with larvae bearing empty attP40;attP2 sites in our neural silencing experiments. The observed muscle phenotypes were present in larvae bearing homozygous copies of attP40 (attP40/attP40) (van der Graaf et al., 2022). However, in our experiments, none of the larvae that we tested behaviorally had homozygous attP40;attP2 insertions. We have updated Table 1 to now include insertion sites.

      Reviewer #3 (Public Review):

      Summary:

      The authors follow up on prior studies where they have argued for the existence of cold nociception in Drosophila larvae. In the proposed pathway, mechanosensitive Class III multidendritic neurons are the noxious cold responding sensory cells. The current study attempts to explore the potential roles of second and third order neurons, based on information of the Class III neuron synaptic outputs that have been obtained from the larval connectome.

      Strengths:

      The major strength of the manuscript is the detailed discussion of the second and third order neurons that are downstream of the mechanosensory Class III multidendritic neurons. These will be useful in further studies of gentle touch mechanosensation and mechanonociception both of which rely on sensory input from these cells. Calcium imaging experiments on Class III activation with optogenetics support the wiring diagram.

      Weaknesses:

      The scientific premise is that a full body contraction in larvae that are exposed to noxious cold is a sensorimotor behavioral pathway. This premise is, to start with, questionable. A common definition of behavior is a set of "orderly movements with recognizable and repeatable patterns of activity produced by members of a species (Baker et al., 2001)." In the case of nociception behaviors, the patterns of movement are typically thought to play a protective role and to protect from potential tissue damage.

      Does noxious cold elicit a set of orderly movements with a recognizable and repeatable pattern in larvae? Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm? Based on the available evidence, the answer to both questions is seemingly no. In response to noxious cold stimulation many, if not all, of the muscles in the larva, simultaneously contract (Turner et al., 2016), and as a result the larva becomes stationary. In response to cold, the larva is literally "frozen" in place and it is incapable of moving away. This incapacitation by cold is the antithesis of what one might expect from a behavior that protects the animals from harm.

      Extensive literature has investigated the physiological responses of insects to cold (reviewed in Overgaard and MacMillan, 2017). In numerous studies of insects across many genera (excluding cold adapted insects such as snow flies), exposure to very cold temperatures quickly incapacitates the animal and induces a state that is known as a chill coma. During a chill coma, the insect becomes immobilized by the cold exposure, but if the exposure to cold is very brief the insect can often be revived without apparent damage. Indeed, it is common practice for many laboratories that use adult Drosophila for studies of behavior to use a brief chilling on ice as a form of anesthesia because chilling is less disruptive to subsequent behaviors than the more commonly used carbon dioxide anesthesia. If flies were to perceive cold as a noxious nociceptive stimulus, then this "chill coma" procedure would likely be disruptive to behavioral studies but is not. Furthermore, there is no evidence to suggest that larval sensation of "noxious cold" is aversive.

      The insect chill coma literature has investigated the effects of extreme cold on the physiology of nerves and muscles and the consensus view of the field is that the paralysis that results from cold is due to complex and combined action of direct effects of cold on muscle and on nerves (Overgaard and MacMillan, 2017). Electrophysiological measurements of muscles and neurons find that they are initially depolarized by cold, and after prolonged cold exposure they are unable to maintain potassium homeostasis and this eventually inhibits the firing of action potentials (Overgaard and MacMillan, 2017). The very small thermal capacitance of a Drosophila larva means that its entire neuromuscular system will be quickly exposed to the effect of cold in the behavioral assays under consideration here. It would seem impossible to disentangle the emergent properties of a complex combination of effects on physiology (including neuronal, glial, and muscle homeostasis) on any proposed sensorimotor transformation pathway.

      Nevertheless, the manuscript before us makes a courageous attempt at attempting this. A number of GAL4 drivers tested in the paper are found to affect parameters of contraction behavior (CT) in cold exposed larvae in silencing experiments. However, notably absent from all of the silencing experiments are measurements of larval mobility following cold exposure. Thus, it is not known from the study if these manipulations are truly protecting the larvae from paralysis following cold exposure, or if they are simply reducing the magnitude of the initial muscle contraction that occurs immediately following cold (ie reducing CT). The strongest effect of silencing occurs with the 19-12-GAL4 driver which targets Class III neurons (but is not completely specific to these cells).

      Optogenetic experiments for Class III neurons relying on the 19-12-GAL4 driver combined with a very strong optogenetic actuator (ChETA) show the CT behavior that was reported in prior studies. It should be noted that this actuator drives very strong activation, and other studies with milder optogenetic stimulation of Class III neurons have shown that these cells produce behavioral responses that resemble gentle touch responses (Tsubouchi et al 2012 and Yan et al 2013). As well, these neurons express mechanoreceptor ion channels such as NompC and Rpk that are required for gentle touch responses. The latter makes the reported calcium responses to cold difficult to interpret in light of the fact that the strong muscle contractions driven by cold may actually be driving mechanosensory responses in these cells (i.e. through deformation of the mechanosensitive dendrites). Are the cIII calcium signals still observed in a preparation where cold-induced muscle contractions are prevented?

      A major weakness of the study is that none of the second or third order neurons (that are downstream of CIII neurons) are found to trigger the CT behavioral responses even when strongly activated with the ChETA actuator (Figure 2 Supplement 2). These findings raise major concerns for this and prior studies and it does not support the hypothesis that the CIII neurons drive the CT behaviors.

      Later experiments in the paper that investigate strong CIII activation (with ChETA) in combination with other second and third order neurons do support the idea that activating those neurons can facilitate body-wide muscle contractions. But many of the co-activated cells in question are either repeated in each abdominal neuromere or they project to cells that are found all along the ventral nerve cord, so it is unsurprising that their activation would contribute to what appears to be a non-specific body-wide activation of muscles along the AP axis. Also, if these neurons are already downstream of the CIII neurons, the logic of this coactivation approach is not particularly clear. A more convincing experiment would be to silence the different classes of cells in the context of the optogenetic activation of CIII neurons to test for a block of the effects, a set of experiments that is notably absent from the study.

      The authors' argument that the co-activation studies support "a population code" for cold nociception is a very optimistic interpretation of a brute force optogenetics approach that ultimately results in an enhancement of a relatively non-specific body-wide muscle convulsion.

      We have responded extensively to reviewer 3’s comments in our provisional response to address the critiques regarding conceptual merit of this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      We would like to thank the reviewers for providing constructive feedback on the manuscript. To address their concerns, we have performed additional experiments, analyzed the new data, and revised the manuscript.

      (1) The utility of a pipeline depends on the generalization properties.

      While the proposed pipeline seems to work for the data the authors acquired, it is unclear if this pipeline will actually generalize to novel data sets possibly recorded by a different microscope (e.g. different brand), or different imaging conditions (e.g. illumination or different imaging artifacts) or even to different brain regions or animal species, etc.

      The authors provide a 'black-box' approach that might work well for their particular data sets and image acquisition settings but it is left unclear how this pipeline is actually widely applicable to other conditions as such data is not provided.

      In my experience, without well-defined image pre-processing steps and without training on a wide range of image conditions pipelines typically require significant retraining, which in turn requires generating sufficient amounts of training data, partly defying the purpose of the pipeline.

      It is unclear from the manuscript, how well this pipeline will perform on novel data possibly recorded by a different lab or with a different microscope.

      To address the generalizability of our DL segmentation model, we performed several validation experiments by deploying our model on out-of-distribution data that 1) had distinct channels, 2) were acquired in a different species (rat) with a different vascular fluorescent label and a different imaging protocol, and 3) were acquired on a different microscope and with a different vascular label. We first used our model to segment images (507x507um lateral FOV, 170-250 um axial range) from three C57BL/6 mice imaged on the same two-photon fluorescence microscope following the same imaging protocol. The vasculature was labelled by intravenous injection of Texas Red dextran (70 kDa MW, Thermo Fisher Scientific Inc, Waltham MA), as in the current experiment. In lieu of the EYFP signal from pyramidal neurons that was present in the original data, we added Gaussian noise with a mean and standard deviation identical to those of the acquired vascular channel in the out-of-distribution dataset. Second, we applied our model to images (507x507um lateral FOV, 300-400 um axial range) from two Fischer rats that were injected with 2000-kDa Alexa680-dextran via a tail vein catheter. These rats were imaged on the same two-photon fluorescence microscope, but with Galvano scanners (instead of resonant scanners). As before, a second channel of Gaussian noise was added to simulate the missing EYFP signal. Finally, we segmented an image of vasculature from an ex-vivo cleared mouse brain (1665x1205x780 um) acquired on a light sheet fluorescence microscope (Miltenyi UltraMicroscope Blaze), with Lectin-DyLight 649 labelling the vessel walls. The Dice Score, Precision, Recall, Hausdorff 95%, and Mean surface distance were reported for segmentations of 2PFM data sets, following the generation of ground truth images by assisted manual segmentation in ilastik. Examples of the generated segmentation masks are presented in Supplementary figure 9 for visual comparison.

      We have described the image pre-processing steps/transforms before model inference in the revised Methods section. In general, should the segmentation results on a data set be deemed unsatisfactory, our model can be further fine-tuned on out-of-distribution data. Furthermore, the image analyses downstream from segmentation are applicable irrespective of the method utilized to arrive at a robust vascular segmentation.
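
      For readers who wish to compute the overlap-based scores named above on their own segmentations, a minimal Python sketch follows. It assumes binary masks flattened to equal-length sequences of 0/1 values; the function name and the convention of returning 0.0 for an empty denominator are illustrative assumptions, and this is not the evaluation code used to produce the reported table. Hausdorff 95% and mean surface distance require surface extraction and are not covered here.

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Dice, precision and recall for two flattened binary masks (1 = vessel)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Count true positives, false positives and false negatives voxel by voxel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    def ratio(num: float, den: float) -> float:
        # Assumed convention: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }
```

      Dice weighs false positives and false negatives equally, so reporting precision and recall alongside it shows whether a model tends to over- or under-segment on a new data set.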

      Author response table 1.

      Dataset performance comparison for UNETR

      (2) Some of the chosen analysis results seem to not fully match the shown data, or the visualization of the data is hard to interpret in the current form.

      We have updated the visualizations to make them more accessible and ensure close correspondence between tables and figures.

      (3) Additionally, some measures seem not fully adapted to the current situation (e.g. the efficiency measure does not consider possible sources or sinks). Thus, some additional analysis work might be required to account for this.

      Thank you for your comment. The efficiency metric was selected as it does not consider sources or sinks. We do agree that accounting for vessel subtypes in the analysis (thus classifying larger vessels as either suppliers/sources or drainers/sinks) would be very useful; however, this classification is extremely laborious, as we have noted in our prior work<sup>1</sup>. We are therefore leveraging machine learning in a parallel project to afford vessel classification by type. Nevertheless, source/sink analysis based on in vivo 2PFM data is confounded by the small FOV.

      (4) The authors apply their method to in vivo data. However, there are some weaknesses in the design that make it hard to accept many of the conclusions and even to see that the method could yield much useful data with this type of application. Primarily, the acquisition of a large volume of tissue is very slow. In order to obtain a network of vascular activity, large volumes are imaged with high resolution. However, the volumes are scanned once every 42 seconds following stimulation. Most vascular responses to neuronal activation have come and gone in 42 seconds so each vessel segment is only being sampled at a single time point in the vascular response. So all of the data on diameter changes are impossible to compare since some vessels are sampled during the initial phase of the vascular response, some during the decay, and many probably after it has already returned to baseline. The authors attempt to overcome this by alternating the direction of the scan (from surface to deep and vice versa). But this only provides two sample points along the vascular response curve and so the problem still remains.

      We thank the Reviewer for bringing up this important point. Although vessels can show relatively rapid responses to perturbation, vascular responses to photostimulation of Channelrhodopsin-2 in neighbouring neurons are long-lasting: they do not come and go in 42 seconds. To demonstrate this point, we acquired higher temporal-resolution images of smaller volumes of tissue over 5 minutes preceding and 5 minutes following the 5-s photoactivation with the original photostimulation parameters. The imaging protocol was different in that we utilized a piezoelectric motor, a smaller field of view (512um x (80-128)um x (34-73)um), and only 3x frame averaging, resulting in a temporal resolution of 1.57-3.17 seconds per frame. This acquisition was repeated at different cortical depths in three Thy1-ChR2 mice and the vascular radii were estimated using our presented pipeline. Significantly responding vessels here were selected via an F-test of radius estimates before vs. after stimulation. LOESS fits to the time-dependent radius of significantly responding vessels are shown in Supplementary Figure 5. Vessels shorter than 20 um in length were excluded from the analysis so as to focus on vessel segments where averaging the vascular radius over many vertices was possible. A video of one of the acquisitions is shown along with the timecourses of select vessels’ calibre changes in Author response image 1. The vascular calibre changes following photostimulation persisted for several minutes, consistent with earlier observations by us and others<sup>2–5</sup>. These small-volume acquisitions demonstrated that dilations consistently lasted longer than 42 seconds (i.e. our original temporal resolution).

      Our temporal sampling was chosen to permit a large field of view acquisition while still being well within the span of the vascular response to look at larger scale vascular coordination that has not previously been studied. The pipeline readily adapts to smaller fields of view at a finer temporal sampling, though such an acquisition precludes the study of the response coordination across hundreds of vessels. While a greater number of baseline frames would help with the baseline variability estimation, maintaining animals under anesthesia during prolonged imaging is exceedingly difficult, precluding us from extending our total acquisition time.

      Author response image 1.

      Estimated vascular radius at each timepoint for select vessels from the imaging stack shown in the following video: https://flip.com/s/kB1eTwYzwMJE

      (5) A second problem is the use of optogenetic stimulation to activate the tissue. First, it has been shown that blue light itself can increase blood flow (Rungta et al 2017). The authors note the concern about temperature increases but that is not the same issue. The discussion mentions that non-transgenic mice were used to control for this with "data not shown". This is very important data given these earlier reports that have found such effects and so should be included.

      We have updated the manuscript to incorporate the data on volumetric scanning in (nontransgenic) C57BL/6 mice undergoing blue light stimulation, with parameters identical to those used in Thy1-ChR2 mice (Supplementary Figure 8). As before, responders were identified as vessels that, following blue light stimulation, showed a radius change greater than twice the standard deviation of their baseline radius; their estimated radii changes are shown in Supplementary Figure 8. There was no statistical difference between the radii distributions of any of the photostimulation conditions and the pre-photostimulation baseline.
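
      The responder criterion described above can be illustrated with a short Python sketch. It is offered under stated assumptions rather than as the pipeline's implementation: the function name is hypothetical, the post-stimulation radius is summarized by its mean, the sample standard deviation is used for the baseline, and changes in either direction (dilation or constriction) are counted.

```python
from statistics import mean, stdev
from typing import Sequence


def is_responder(baseline_radii: Sequence[float],
                 post_radii: Sequence[float],
                 n_sd: float = 2.0) -> bool:
    """Flag a vessel whose mean radius change exceeds n_sd baseline standard deviations."""
    if len(baseline_radii) < 2:
        raise ValueError("at least two baseline estimates are needed for a standard deviation")
    if not post_radii:
        raise ValueError("at least one post-stimulation estimate is needed")

    baseline_mean = mean(baseline_radii)
    baseline_sd = stdev(baseline_radii)  # sample standard deviation (assumption)

    # Absolute change, so both dilations and constrictions can qualify (assumption).
    change = abs(mean(post_radii) - baseline_mean)
    return change > n_sd * baseline_sd
```

      Because the threshold scales with each vessel's own baseline variability, a vessel with a noisy baseline needs a proportionally larger radius change to be counted as a responder.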

      (6) Secondly, there doesn't seem to be any monitoring of neural activity following the photo-stimulation. The authors repeatedly mention "activated" neurons and claim that vessel properties change based on distance from "activated" neurons. But I can't find anything to suggest that they know which neurons were active versus just labeled. Third, the stimulation laser is focused at a single depth plane. Since it is single-photon excitation, there is likely a large volume of activated neurons. But there is no way of knowing the spatial arrangement of neural activity and so again, including this as a factor in the analysis of vascular responses seems unjustified.

      Given the high fidelity of Channelrhodopsin-2 activation with blue light photostimulation found by us and others3, we assume that all labeled neurons within the volume of photostimulation are being activated. Depending on their respective connectivities, their postsynaptic neurons (whether or not they are labeled) may also get activated. We therefore agree with the reviewer that the spatial distribution of neuronal activation is not well defined. The manuscript has been revised to update the terminology from activated to labeled neurons and to stress in the Discussion that the motivation for assessing the distance to the closest labeled neuron as one of our metrics is purely to demonstrate the possibility of linking vascular responses to activations in their neighbouring neurons and of including morphological metrics in the computational pipeline.

      (7) The study could also benefit from more clear illustration of the quality of the model's output. It is hard to tell from static images of 3-D volumes how accurate the vessel segmentation is. Perhaps some videos going through the volume with the masks overlaid would provide some clarity. Also, a comparison to commercial vessel segmentation programs would be useful in addition to benchmarking to the ground truth manual data.

      We generated a video demonstrating the deep-learning model outputs and have made the video available here: https://flip.com/s/_XBs4yVxisNs. We aimed to develop an open-source method for the research community as the vast majority of groups do not have access to commercial software for vessel segmentation.

      (8) Another useful metric for the model's success would be the reproducibility of the vessel responses. Seeing such a large number of vessels showing constrictions raises some flags and so showing that the model pulled out the same response from the same vessels across multiple repetitions would make such data easier to accept.

      We have generated a figure demonstrating the repeatability of the vascular responses following photostimulation in a volume and presented them next to the corresponding raw acquisitions for visual inspection (Supplementary Figure 6). It is important to note that there is significant biological variability in vessels’ responses to repeated stimulation, as described previously 3,6: a well-performing model should be able to quantify biological heterogeneity, as this heterogeneity may itself be of interest. Constrictions have been reported in the literature by our group and others 1,2,4,5,7, though their prevalence has not been systematically studied to date. Concerning the reproducibility of our analysis, we have demonstrated model reproducibility (as a metric of its success) on a dataset where vessels visually appeared to dilate consistently following 452 nm light stimulation: these results are now presented in Supplementary Figure 6 of the revised Manuscript. We thus observed that the model repeatedly detected the vessels that appeared to dilate on visual inspection as dilating. Examples of vessels constricting repeatedly were also examined, and maximum intensity projections of the vessels before and after photostimulation were inspected, confirming their repeated constriction (Author response image 2).

      It is also worth noting that while the presence of the response (defined as change above 2 standard deviations of the radius across baseline frames) was infrequent (2107 vessels responded at least once, out of a total of 10,552 unique vessels imaged), the direction of the response was highly consistent across trials. Given twice the baseline variability as the threshold for response, of the vessels that responded more than once, 31.7% dilated on some trials while constricting on others; 41.1% dilated on each trial; and 27.2% constricted on each trial. (Note that some trials use 1.1 vs. 4.3 mW/mm2 and some have opposite scanning directions).
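
      The trial-wise consistency breakdown reported above (mixed, always dilating, always constricting) can be tabulated as follows. This is a hedged sketch: the label strings, the function names, and the dictionary layout are hypothetical and chosen only for illustration.

```python
from collections import Counter


def consistency_category(trial_labels):
    """Summarize one vessel's responses across repeated trials.

    trial_labels: iterable of 'dilator', 'constrictor', or 'non-responder'.
    Returns None when the vessel responded on fewer than two trials.
    """
    responses = [lab for lab in trial_labels if lab != "non-responder"]
    if len(responses) < 2:
        return None
    kinds = set(responses)
    if kinds == {"dilator"}:
        return "always dilated"
    if kinds == {"constrictor"}:
        return "always constricted"
    return "mixed"


def consistency_fractions(labels_by_vessel):
    """Fraction of repeat responders falling in each consistency category."""
    counts = Counter(
        consistency_category(labels) for labels in labels_by_vessel.values()
    )
    counts.pop(None, None)  # drop vessels that responded fewer than two times
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```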

      Author response image 2.

      Sample capillary constrictions from maximum intensity projections at repeated time points following optogenetic stimulation. The baseline (pre-stimulation) image is shown on the left and the post-stimulation image is on the right, with the estimated radius changes listed to the left.

      (9) A number of findings are questionable, at least in part due to these design properties. There are unrealistically large dilations and constrictions indicated. These are likely due to artifacts of the automated platform. Inspection of these results by eye would help understand what is going on.

      Some of the dilations were indeed large in magnitude. We present select examples of large dilations and constrictions ranging in magnitude from 2.08 to 10.80 um for visual inspection (Author response image 3). For reference, the average magnitude of radius changes across vessels and stimuli was 0.32 +/- 0.54 um. Diameter changes above 5 um were visually inspected.

      Author response image 3.

      Additional views of diameter change in maximum intensity projections ranging in magnitude from 2.08 um to 10.80 um.

      (10) In Figure 6, there doesn't seem to be much correlation between vessels with large baseline level changes and vessels with large stimulus-evoked changes. It would be expected that large arteries would have a lot of variability in both conditions and veins much less. There is also not much within-vessel consistency. For instance, the third row shows what looks like a surface vessel constricting to stimulation but a branch coming off of it dilating - this seems biologically unrealistic.

      We now plot photostimulation-elicited vessel-wise radius changes vs. their corresponding baseline radius standard deviations (Author response image 4). The Pearson correlation coefficient between the baseline standard deviation and the radius change was 0.08 (p<1e-5) for 552nm 4.3 mW/mm^2 stimulation, -0.08 (p<1e-5) for 458nm 1.1 mW/mm^2 stimulation, and -0.04 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. For non-control (i.e. blue) photostimulation conditions, the change in the radius is thus weakly negatively correlated with the vessel’s baseline radius standard deviation: the small magnitude of these coefficients indicates that the radius change is largely independent of the baseline variability in the vessel radius. Classification of vessels by type (arteries vs. veins) is needed before we can comment on differences between these vascular components. The between-vessel (i.e. between parent vessels and their daughter branches separated by branch points) consistency is explicitly evaluated by the assortativity metric, in Figure 9: vessels do somewhat tend to react similarly to their downstream branches: we observed a mean assortativity of 0.4. As for the instance of a surface vessel constricting while a downstream vessel dilates, it is important to remember that the 2PFM FOV restricts us to imaging a very small portion of the cortical microvascular network: one (among many) daughter vessels showing changes in the opposite direction to the parent vessel does not violate the conservation of mass; in addition, mural cells on adjacent branches can respond differently.
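
      For readers who want to reproduce this kind of check on their own data, the correlation can be computed as below. This is an illustrative sketch only: the function names are hypothetical, and `scipy.stats.pearsonr` is assumed as the estimator, which the response does not specify.

```python
import numpy as np
from scipy.stats import pearsonr


def radius_change_vs_baseline_sd(baseline_sd, radius_change):
    """Pearson r and p-value between baseline variability and evoked change.

    baseline_sd: per-vessel standard deviation of the baseline radius (um).
    radius_change: per-vessel radius change after photostimulation (um).
    """
    x = np.asarray(baseline_sd, dtype=float)
    y = np.asarray(radius_change, dtype=float)
    r, p = pearsonr(x, y)
    return float(r), float(p)


def variance_explained(r):
    """Fraction of variance shared by two variables with correlation r."""
    return r ** 2
```

      With thousands of vessels, a very small coefficient can still reach a small p-value; a coefficient of 0.08 corresponds to well under 1% of shared variance (0.08 squared is 0.0064), which is why the magnitude of r is the more informative quantity here.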

      Author response image 4.

      Vessel radius change elicited by photostimulation vs. baseline radius standard deviation across all vessels. The threshold level for response identification is shown as the black line.

      (11) As mentioned, the large proportion of constricting capillaries is not something found in the literature. Do these happen at a certain time point following the stimulation? Did the same vessel segments show dilation at times and constriction at other times? In fact, the overall proportion of dilators and constrictors is not given. Are they spatially clustered? The assortativity result implies that there is some clustering, and the theory of blood stealing by active tissue from inactive tissue is cited. However, this theory would imply a region where virtually all vessels are dilating and another region away from the active tissue with constrictions. Was anything that dramatic seen?

      The kinetics of the vascular responses are not accessible via the current imaging protocol and acquired data; however, this computational pipeline can readily be adapted to test hypotheses surrounding the temporal evolution of the vascular responses, as shown in Supplementary Figure 2 (with higher temporal-resolution data). Some vessels dilate at some time points and constrict at others as shown in Supplementary Figure 2. As listed in Table 2, 4.4% of all vessels constrict and 7.5% dilate for 452nm stimulation at 4.3 mW/mm^2. There was no obvious spatial clustering of dilators or constrictors: we expect such spatial patterns to be more common with different modes of stimulation and/or in the presence of pathology. The assortativity peaked at 0.4 (quite far from 1 where each vessel’s response exactly matches that of its neighbour).
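
      To make the assortativity scale concrete, the sketch below computes a scalar assortativity coefficient as the Pearson correlation of radius changes across pairs of adjacent vessels, counting each adjacency in both directions. This is one common definition and is an assumption here; the manuscript's exact graph construction (for example, whether vessels are treated as nodes or edges) may differ, and the function name is hypothetical.

```python
import numpy as np


def scalar_assortativity(adjacent_pairs, values):
    """Correlation of a scalar attribute across adjacent vessel pairs.

    adjacent_pairs: sequence of (i, j) index pairs of vessels sharing a branch point.
    values: per-vessel attribute, e.g. radius change (um), indexed by vessel.
    Returns +1 when neighbours always match and -1 when they always oppose.
    """
    pairs = np.asarray(adjacent_pairs, dtype=int)
    values = np.asarray(values, dtype=float)

    # Count each undirected adjacency in both directions so the measure is symmetric.
    src = np.concatenate([pairs[:, 0], pairs[:, 1]])
    dst = np.concatenate([pairs[:, 1], pairs[:, 0]])

    x, y = values[src], values[dst]
    return float(np.corrcoef(x, y)[0, 1])
```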

      (12) Why were nearly all vessels > 5um diameter not responding >2SD above baseline? Did they have highly variable baselines or small responses? Usually, bigger vessels respond strongly to local neural activity.

      In Author response image 5, we now present the stimulation-induced radius changes vs. baseline radius variability across vessels with a radius greater than 5 um. The Pearson correlation between the radius change and the baseline radius standard deviation across time was low: r=0.05 (p=0.5) for  552nm 4.3 mW/mm^2 stimulation,  r=-0.27 (p<1e-5) for  458nm 1.1 mW/mm^2 stimulation, and r=-0.31 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. These results demonstrate that the changes following optogenetic stimulation are lower than twice the baseline standard deviation across time for most of these vessels. The pulsatility of arteries results in significant variability in their baseline radius8; in turn, literature to date suggests very limited radius changes in veins. Both of these effects could contribute to the radius response not being detected in many larger vessels.

      Author response image 5.

      The change in the vessel radius elicited by photostimulation vs. baseline vessel radius standard deviation in vessels with a baseline radius greater than 5 um. The threshold level for response identification is shown as the black line.

      References

      (1) Mester JR, Rozak MW, Dorr A, Goubran M, Sled JG, Stefanovic B. Network response of brain microvasculature to neuronal stimulation. NeuroImage. 2024;287:120512. doi:10.1016/j.neuroimage.2024.120512

      (2) Alarcon-Martinez L, Villafranca-Baughman D, Quintero H, et al. Interpericyte tunnelling nanotubes regulate neurovascular coupling. Nature. 2020;585(7823):91-95. doi:10.1038/s41586-020-2589-x

      (3) Mester JR, Bazzigaluppi P, Weisspapir I, et al. In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2. NeuroImage. 2019;192:135-144. doi:10.1016/j.neuroimage.2019.01.036

      (4) O’Herron PJ, Hartmann DA, Xie K, Kara P, Shih AY. 3D optogenetic control of arteriole diameter in vivo. Nelson MT, Calabrese RL, Nelson MT, Devor A, Rungta R, eds. eLife. 2022;11:e72802. doi:10.7554/eLife.72802

      (5) Hartmann DA, Berthiaume AA, Grant RI, et al. Brain capillary pericytes exert a substantial but slow influence on blood flow. Nat Neurosci. Published online February 18, 2021:1-13. doi:10.1038/s41593-020-00793-2

      (6) Mester JR, Bazzigaluppi P, Dorr A, et al. Attenuation of tonic inhibition prevents chronic neurovascular impairments in a Thy1-ChR2 mouse model of repeated, mild traumatic brain injury. Theranostics. 2021;11(16):7685-7699. doi:10.7150/thno.60190

      (7) Hall CN, Reynell C, Gesslein B, et al. Capillary pericytes regulate cerebral blood flow in health and disease. Nature. 2014;508(7494):55-60. doi:10.1038/nature13165

      (8) Meng G, Zhong J, Zhang Q, et al. Ultrafast two-photon fluorescence imaging of cerebral blood circulation in the mouse brain in vivo. Proc Natl Acad Sci U S A. 2022;119(23):e2117346119. doi:10.1073/pnas.2117346119

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 207: a superfluous '.' before the references.

      This has been corrected.

      Line 273 ff:

      While the metrics are described in mathematical terms which is very useful, the appearing distances (d) and mathematical symbols are not. While mostly intuitively clear, precise definitions of all symbols introduced should be given to avoid ambiguities.

      The description has been clarified.

      This applies to all formulas appearing in the manuscript and the authors might want to check them carefully.

      We have updated them wherever needed.

      The mean surface distance seems not to reflect the mean MINIMAL surface distance but just the overall mean surface distance. Or a different definition of the appearing symbols is used, highlighting the need for introducing every mathematical symbol carefully.

      The definitions have been updated for clarity, specifying the distinction between Hausdorff 95% distance and mean surface distance.
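
      As an illustration of the distinction between the two metrics, the sketch below computes both from two sets of surface points (for example, boundary voxel coordinates of the predicted and ground-truth masks, in um). Conventions differ between toolkits, so this is one plausible definition rather than the manuscript's: the mean surface distance is taken as the mean of the minimal distances in both directions, and the 95% Hausdorff distance as the larger of the two directed 95th percentiles. Function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def surface_distances(points_a, points_b):
    """Minimal distance from every point of one surface to the other surface."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d_ab, _ = cKDTree(b).query(a)  # nearest point of B for each point of A
    d_ba, _ = cKDTree(a).query(b)  # nearest point of A for each point of B
    return d_ab, d_ba


def mean_surface_distance(points_a, points_b):
    """Mean of the minimal surface-to-surface distances, pooled over both directions."""
    d_ab, d_ba = surface_distances(points_a, points_b)
    return float(np.concatenate([d_ab, d_ba]).mean())


def hausdorff_95(points_a, points_b):
    """Larger of the two directed 95th-percentile minimal distances."""
    d_ab, d_ba = surface_distances(points_a, points_b)
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))
```

      Using the 95th percentile instead of the maximum makes the Hausdorff-type metric less sensitive to a handful of outlying boundary points.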

      Line 284:

      It is unclear to me why center-line detection was performed in MATLAB and not Python. Using multiple languages/software packages and in addition relying on one that is not freely available/open source makes this tool much less attractive as a real open-source tool for the community. The authors stress in the manuscript abstract that their pipeline is an open and accessible tool, the use of MATLAB defies this logic to some extent in my view.

      Centerline detection for large volumetric data is available in Python, see e.g. SciPy packages, and for large data sets, ClearMap or VesselVio.

      We tested centerline detection in Python (SciPy 1.9.3) and MATLAB. We found that the MATLAB implementation performed better due to its inclusion of a branch length parameter for the identification of terminal branches, which greatly reduced the number of false branches; the Python implementation does not include this feature (in any version) and its output had many more such “hair” artifacts. ClearMap skeletonization uses an algorithm by Palagyi & Kuba (1999) to thin segmentation masks, which does not include hair removal. VesselVio uses a parallelized version of the SciPy implementation of the Lee et al. (1994) algorithm, which does not do hair removal based on a terminal branch length filter; instead, VesselVio performs a threshold-based hair removal that is frequently overly aggressive (it removes true positive vessel branches), as highlighted by its authors.
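
      The idea of a terminal-branch length filter can be sketched on a skeleton graph as below. This is a generic illustration under stated assumptions, not the MATLAB routine or any library's implementation: the edge-list representation, the function name, and the single-pass behaviour are all hypothetical.

```python
from collections import Counter


def prune_short_terminal_branches(edges, min_length_um):
    """Drop terminal skeleton branches shorter than min_length_um.

    edges: list of (node_u, node_v, length_um) skeleton segments running
    between branch points or end points. A branch is treated as terminal
    when at least one of its nodes has degree 1.
    """
    degree = Counter()
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1

    kept = []
    for u, v, length in edges:
        is_terminal = degree[u] == 1 or degree[v] == 1
        if is_terminal and length < min_length_um:
            continue  # short dead-end spur: likely a "hair" artifact
        kept.append((u, v, length))
    return kept
```

      Long terminal branches are kept, so true dead-end vessel segments at the edge of the field of view survive as long as they exceed the chosen length.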

      Moreover, the authors mention that robust center-line detection was critical. In my view, robust center-line extraction typically requires some additional processing of the binarized data, e.g. using a binary smoothing step. Various binary smoothers are available in the literature and as Python code.

      Indeed, binary smoothing was performed: background “holes” located within the vasculature were filled; the masks were dilated (3x) and then eroded to the centreline. Scipy’s binary closing function smoothes the morphology of binary segmentation masks by dilating and then eroding the segmentation masks (as a part of the selected skeletonization algorithm).
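
      A hole-filling and closing step of this kind can be written with `scipy.ndimage` as follows. This is a minimal sketch assuming a 3D boolean mask; the function name, the face-connected structuring element, the padding strategy, and the use of three closing iterations are illustrative choices, not the pipeline's documented settings.

```python
import numpy as np
from scipy import ndimage


def smooth_vessel_mask(mask, closing_iterations=3):
    """Fill enclosed background holes, then apply binary closing.

    mask: boolean array, True inside vessels.
    closing_iterations: number of dilations followed by the same number of erosions.
    """
    mask = np.asarray(mask, dtype=bool)
    pad = max(int(closing_iterations), 1)

    # Fill background cavities that are fully enclosed by foreground.
    filled = ndimage.binary_fill_holes(mask)

    # Pad so that the erosion stage does not strip foreground at the array border.
    padded = np.pad(filled, pad, mode="constant", constant_values=False)
    structure = ndimage.generate_binary_structure(mask.ndim, 1)
    closed = ndimage.binary_closing(padded, structure=structure, iterations=pad)

    # Crop back to the original shape.
    crop = tuple(slice(pad, -pad) for _ in range(mask.ndim))
    return closed[crop]
```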

      Line 303:

      'RBC' is not defined (red blood cells?)

      This has been updated.

      Line 398:

      pPhotonsimulation -> Photostimulation

      This has been corrected.

      Line 400 ff: Efficiency:

      I am not sure how useful the measure really is without any information about the 'sources' (i.e. arteries) and sinks (i.e. veins) as blood does not need to be moved between any two arbitrary nodes.

      While blood reversals are observed, blood is typically not moved arbitrarily between two arbitrary nodes in capillary networks.

      We agree with the reviewer that classifying the vessels by type is important and are currently working on deep learning-based algorithms for the classification of microvasculature into arterioles and venules for future work.

      In addition, short paths between two nodes with low resistivity will potentially dominate the sum and the authors excluded vessels 10um and above. This threshold seems arbitrary.

      The 10-um diameter threshold was not applied in the computation of the network metrics. The 10-um thresholding was restricted to “capillary” identification in Figure 8: the 10-um cutoff for referring to a vessel as a capillary has long been applied in the literature [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11].

      Figure 3:

      It's unclear what the units are for the Mean Surface and Hausdorff Distances (pixel or um?).

      The units have now been specified (um).

      Figure 4:

      The binarized data, and particularly the crops are difficult to interpret in black and white. It would be much more useful to present the segmentation results in a way that is interpretable (e.g. improving the rendering of the 3d information, particularly in the crops by using shadows or color codes for depth, etc).

      We have updated these visualizations and shaded them based on cortical depth.

      Panel C indicates that ilastik is performing badly due to changes in imaging conditions (much higher background level). As pointed out before, in my view, a reasonable pipeline should start by removing and standardizing background levels as well as dynamic ranges and possibly other artifacts before performing a more detailed analysis. This would also make the pipeline more robust against data from other microscopes etc. as only a few preprocessing parameters might need to be adjusted.

      I wonder whether after such a pre-processing step, UNET / UNETR would still perform in a way that was superior to ilastik, as ground truth data was initially generated with the aid of ilastik.

      The Ilastik model is based on semi-automatically generated foreground labels in small batches. We had to split the data into small groups during manual labelling, as larger groups could not be run within the computational limits of Ilastik. Ilastik is typically trained in an iterative fashion on a few patches at a time because it takes 2-3 hours per patch to train, and the resulting model does not generalize to the remaining patches or to out-of-distribution data, even with image pre-processing steps. Following the reviewer's comment, we did try inputting normalized images into Ilastik, but this did not improve its results. UNET and UNETR inputs have been normalized for signal intensities.

      Typical pre-processing/standard computer vision techniques with parameter tuning do not generalize to out-of-distribution data with different image characteristics, motivating the shift to DL-based approaches.

      Figure 5:

      This is a validation figure that might be better shown in an appendix or as a supplement.

      Since this is a methodological paper, we think it is important to highlight the validation of the proposed method.

      Line 476:

      It's surprising that the number of vessel segments almost doubles when taking the union. Is the number of RBC plugs expected to be so high?

      The etiology of discontinuities includes, but is not limited to, RBC plugs; we expect discontinuities to arise also from a very short pixel dwell time (0.067us) of the resonant scanning and have indeed observed apparent vessel discontinuities on resonant scanning that are not present with Galvano scanning using a pixel dwell time of 2us.

      Section 4.4 / 4.5 :

      The analysis in these sections provides mostly tables with numbers that are more difficult to read and hides possible interesting structures in the distribution of the various measures/quantities. For example, why is 5um a good choice to discriminate between small and large vessels, why not resolve this data more precisely via scatter plots?

      Some distributions are shown in the appendix and could be moved to the main analysis.

      Generally, visualizing the data and providing more detailed insights into the results would make this manuscript more interesting for the general reader.

      The radius of vessel segments drops off after 5.0 um, as shown in Supplementary Figure 4A. The 10-um diameter thresholding is based on prior literature [1], [12], [13], [14], [15], [16], [17], [18], [19] and is used to segregate different vessel types in a conservative manner. The smallest capillaries are expected to have pericytes on their vessel walls whereas arteries are expected to have smooth muscle cells on their vessel walls. These differences in mural cells also may lead to differences in respective vessels’ reactivity.

      The data summarized in Tables 1 and 2 are shown as scatter plots in Figures 8, Supplementary Fig 4 and Supplementary Fig 5.

      Line 556:

      The authors deem a certain change in radius as the relevant measure for responding vessels. They deem a vessel responding if it dilates by twice the std deviation in the radius.

      Based on this measure they find that large vessels rarely respond.

      However, I think this analysis might obscure some interesting effects:

      (1) The standard deviation of the radius depends on the correct estimation of the center point. Given the limited spatial resolution the center point (voxel) obtained from the binarization and skeletonization might not lie in the actual center of the vessel. This effect will be stronger for larger vessels. Center point coordinates should thus be corrected to minimize the std in radius.

      (2) Larger vessels will not necessarily have a perfectly circular shape, and thus the std measure is not necessarily a good measure of 'uncertainty' of estimating the actual radius.

      (3) The above reasons possibly contribute to the fact that from Figure 6 it seems vessels with larger radii have higher std in general (as indicated above some more detailed visualization of the data instead of plain tables could reveal such effects better, e.g. scatter radius vs std). This higher std is making it harder to detect changes in larger vessels. However, with respect to the blood flow, the critical factor is the cross-section of the vessel that scales with the radius squared. Thus, a fixed change in radius for a vessel (say 1um) will induce a larger increase in the flow rate in larger vessels as the change in cross-section is also proportional to the radius of the vessel.

      Thus, larger vessels to be deemed responders should probably have lower thresholds, thresholds should be taken on the cross-section change, or at least thresholds should not be higher for larger vessels as it is the case now using the higher std.

      (1) The radius estimate does not depend on the precise placement of the center point as the radius is not being estimated by the distance from the center point to the boundary of the vessel. Instead, our strategy is to estimate the cross-sectional area (A) of the vessel by the Riemann sum of the sectors with the apex at the center point; the radius is then quoted as sqrt(A/pi) (Supplementary figure 3B). The vessel radius estimates from each cross-sectional plane are then averaged across the cross-sectional planes placed every ~1um along the vessel length. The uncertainty in the cross-sectional plane’s vessel radius, the uncertainty in the vessel radius (upon averaging the cross-sectional planes), and the uncertainty in the radius estimate across repeated measures of a state (i.e. across different samples of the baseline vs. post-photostimulation states) are all reported, and the last one is used to define responding vessels.

      To demonstrate the insensitivity to the precise placement of the vessel’s centrepoint, we have jittered the centerline in the perpendicular plane to the vessel tangent plane at each point along the vessel and then estimated the mean radius in 71 cross-sectional planes of larger vessels (mean radius > 5 um). The percent difference in the estimated radius at our selected vessel centrepoints vs. the jittered centrepoints is plotted above. The percent difference in the mean radius estimated was 0.64±3.44%  with 2.45±0.30 um centerpoint jittering. (In contrast, photostimulation was estimated to elicit an average 25.4±18.1% change in the magnitude of the radius of larger vessels, i.e. those with a baseline radius >5um.)

      (2) Indeed, the cross-sectional areas of either large or small vessels are not circles. Consequently, we are placing the vessel boundary, following other published work[20], at the minimum of the signal intensity gradients computed along thirty-six spokes emanating from the centrepoint (cf Figure 2H,K). The cross-sectional area of the vessel in the said cross-sectional plane is then estimated by summing the areas of the sectors flanked by neighbouring spokes. We do not make an assumption about the cross-sectional area being circular. We report radii of circles with the equivalent area as that of the cross-sectional areas merely for ease of communication (as most of the literature to date reports vessel radii, rather than vessel cross-sectional areas.)

      To demonstrate the robustness of this approach, we show the sensitivity of vessel-wise radius estimate on the number of spokes used to estimate the radius in Supplementary Figure 3a. The radius estimate converges after 20 spokes have been used for estimation. Our pipeline utilizes 36 spokes and then excludes minima that lie over 2 STD away from the mean radius estimate across those 36 spokes. With 36 spokes, the vesselwise mean radius estimation was within 0.24±0.62% of the mean of radius estimates using 40-60 spokes.
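
      The spoke-based estimate described in points (1) and (2) can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline's code: the function names are hypothetical, spokes are assumed evenly spaced in angle, the boundary is placed at the most negative intensity gradient along each spoke, and rejected spokes are handled by reweighting the remaining sectors equally.

```python
import numpy as np


def boundary_distance(profile, step_um):
    """Distance along one spoke at which the intensity drops most steeply.

    profile: intensities sampled outward from the centre point.
    step_um: sampling interval along the spoke (um).
    """
    gradient = np.gradient(np.asarray(profile, dtype=float), step_um)
    return float(np.argmin(gradient) * step_um)


def equivalent_radius(spoke_distances, n_sd=2.0):
    """Equivalent-circle radius from boundary distances on evenly spaced spokes."""
    r = np.asarray(spoke_distances, dtype=float)

    # Discard spokes whose boundary lies more than n_sd SDs from the mean distance.
    keep = np.abs(r - r.mean()) <= n_sd * r.std()

    # Each kept spoke stands for a sector of area 0.5 * r**2 * dtheta; with equal
    # angular weights summing to 2*pi, the total area is pi * mean(r**2).
    area = np.pi * np.mean(r[keep] ** 2)
    return float(np.sqrt(area / np.pi))
```

      Because the sector areas are summed rather than the spoke lengths averaged, the estimate is insensitive to where the apex sits inside the lumen: for a circular lumen of radius 6 um viewed from a point 2 um off-centre, 36 spokes recover an equivalent radius of 6 um, whereas the plain mean of the spoke lengths falls below that value.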

      (3) Across-baseline sample uncertainty in vessel radius is not dependent on baseline vessel caliber (i.e. this uncertainty is not larger in larger vessels).

      Supplementary Figure 5 shows vessel radius changes for large vessels without a threshold defining responding or non-responding vessels. To explore the dependence of the outcomes on the threshold used to identify the responding vessels, we have explored an alternative strategy, whereby responding small vessels are identified as those vessels that show a post-photostimulation (vs. baseline) radius change of more than 10%. These data are now plotted for capillaries in Supplementary Figure 10 and are in agreement with Figure 8. These points are now also discussed in the Discussion section of the revised manuscript:

      “Additionally, alternative definitions of responding vessels may be useful depending on the end goal of a study (e.g., this could mean selecting a threshold for the radius change based on a percentage change from the baseline level).”

      Section 4.5.1

      Why is the distance to the next neuron a good measure here? If two or more neurons are just a bit further away there will be twice or multiple times the 'load' while the measure would only indicate the distance to the closest neuron. I wonder how the results change if those 'ensemble' effects are taken into account.

      In this direction, looking for network-level effects with respect to the full spatial organization of the neurons would be very interesting to look at.

      We agree with the reviewer that this question is interesting; however, it is not addressable using present data: activated neuronal firing will have effects on their postsynaptic neighbors, yet we have no means of measuring the spread of activation using the current experimental model.

      Figure 8

      The scatter plots shown are only partly described (e.g. what's the line with error bars in C, why does it only appear for the high-intensity stimulation?).

      Quadratic polynomial fit is shown only in C as the significant response was observed only for this condition, i.e. for the higher intensity blue photostimulation.

      From the scatter plots as shown it is not clear to me why dilations happen on average further away. This might be a density effect not well visible in this representation. The data does not seem to show a clear relationship between neuron distance and Delta R.

      Particularly in the right panel (high stimulation) there seems to be a similar number of close by neurons responding in both directions, but possibly a few more contracting at larger distances?

      So, the overall effect does not seem as 'simple' as suggested in the title of section 4.5.1 in my view, but rather more cells start to contract at larger distances while there seems to be a more intricate balance nearby.

      A more thorough analysis and visualization of the densities etc. might be needed to clarify this point.

      The language has been revised to:

      458-nm photostimulation resulted in a mix of constrictions and dilations with 44.1% of significantly responding vessels within 10 um of a labelled pyramidal neuron constricting and 55.1% dilating, while 53.3% of vessels further than 30 um constricted and 46.7% dilated. The cutoff distances from the closest labelled neuron were based on estimates of cerebral metabolic rate of oxygen consumption that showed a steep gradient in oxygen consumption with distance from arteries, CMRO2 being halved by 30 μm away.

      We added a probability density plot for significant constrictors and dilators to Figure 8 and Supplementary Figure 5.

      Figure 8 Panel D / Section 4.5.2

      This is a very interesting result in my view found in this study.

      I am unclear how to interpret the effect. The authors state that dilators tend to be closer to the surface. Looking at the scatter plot (without real density information except the alpha value) it seems again the number of responders in both directions is about the same, but in deeper regions the contraction is just larger? This would be different from how the authors interpret the data. It is unclear from the provided analysis/plots what is actually the case.

      We added a probability density function plot of the constrictors and dilators, which shows a greater incidence of constrictions (vs. dilations). The text of the paper was then clarified to include the proportion of significant constrictors/ dilators closer than 10 um vs. further than 30 um away from the closest labeled neuron.

      For the analyses above involving $\Delta R$, I recommend also looking at how those results change when considering changes in cross-section instead, i.e. taking into account the actual vessel radius as well, as discussed above.

      It would be interesting to speculate here or in the discussion on a reason why vessels in deeper regions might need to contract more?

      Unaddressed is the question if e.g. contraction in a vessel for small stimulation is predictive of contractions for larger stimulation or any other relationships?

      Thank you for your comment. Given its hierarchical organization and high within-vessel response heterogeneity, we believe that the vasculature is best analyzed as a network. Our radius estimates come from averaged cross-sectional estimates allowing us to examine heterogeneity within individual vessel segments.

      The discussion has been updated to include reasons as to why deeper vessels may contract more:

      “As the blue light stimulation power increased, the mean depth of both constricting and dilating vessels increased, likely resulting from higher intensity light reaching ChR2-expressing neurons deeper in the tissue and exciting superficial neurons (and thus their postsynaptic neurons) to a greater level [21], [22]. The blue light would be expected to excite a lower number of neurons farther from the cortical surface at lower powers.”

      Also, how consistent are contractions/dilations observed at a particular vessel etc.

      To look at the consistency of a particular vessel's response to the 1.1 or 4.3 mW/mm² blue light photostimulation, we categorized all significant responses as constrictions or dilations, defining a responding vessel as one showing a change that is either > 2× the baseline vessel radius variability or > 10% of the vessel’s mean baseline radius.

      Given twice the baseline variability as the threshold for response, of the vessels that responded more than once, 31.7% dilated on some trials while constricting on others; 41.1% dilated on each trial; and 27.2% constricted on each trial. (Note that some trials use 1.1 vs. 4.3 mW/mm² and some have opposite scanning directions.)
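
      The categorization described above can be sketched roughly as follows. This is only an illustration, not the pipeline's code: the function names are hypothetical, "baseline variability" is assumed to be the standard deviation of the baseline radius estimates, and only the 2× variability criterion is shown.

```python
from statistics import mean, stdev


def classify_response(baseline, post, k=2.0):
    """Label one vessel's response on one trial.

    baseline: radius estimates (um) before photostimulation
    post:     radius estimates (um) after photostimulation
    k:        multiple of baseline variability used as threshold
              (variability is ASSUMED here to be the standard deviation)
    """
    delta = mean(post) - mean(baseline)
    threshold = k * stdev(baseline)
    if delta > threshold:
        return "dilation"
    if delta < -threshold:
        return "constriction"
    return "none"


def consistency(labels):
    """Summarize one vessel's labels across trials, ignoring non-responses."""
    responses = [label for label in labels if label != "none"]
    if len(responses) < 2:
        # The vessel did not respond more than once, so consistency is undefined.
        return "insufficient"
    kinds = set(responses)
    if kinds == {"dilation"}:
        return "always dilated"
    if kinds == {"constriction"}:
        return "always constricted"
    return "mixed"
```

      Applying `consistency` to every vessel and counting the three outcomes would give proportions of the kind reported above; the 10% relative-change criterion could be swapped in by replacing the threshold with `0.1 * mean(baseline)`.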

      Section 4.5.3

      The results in assortativity are interesting. It would be interesting to look at how the increase in assortativity is mediated. For, example, is this in localized changes in some parts of the graph as visible in A or are there other trends? Do certain sub-graphs that systematically change their radius have certain properties (e.g. do activated neurons cluster there) or are these effects related to some hotspots that also show a coordinated change in control conditions (the assortativity seems not zero there)?

      I already discussed if the efficiency measure is necessarily the best measure to use here without taking into account 'sources' and 'sinks'.

      We plan to address this in future work once we have successfully trained models for the classification of vessels into arteries, veins, and capillaries. Capillaries will be classified based on their branch order from parent arteries to specify where in the network changes are occurring.

      Figure 9

      It's unclear to me why the Ohm symbol needs to be bold?

      It is not bolded (just the font’s appearance).

      Line 707:

      "458-nm photostimulation caused capillaries to dilate when pyramidal neurons were close, and constrict when they were further away."

      In my view, this interpretation is too simple, given the discussion above. A more detailed analysis could clarify this point.

      The discussion on this point has been revised to:

      458-nm photostimulation resulted in a mix of constrictions and dilations, with 44.1% of significantly responding vessels within 10 μm of a labelled pyramidal neuron constricting, and 55.1% dilating; while 53.3% of vessels further than 30 μm constricted and 46.7% dilated. The cutoff distances from the closest labelled neuron were based on estimates of cerebral metabolic rate of oxygen consumption that showed a steep gradient in oxygen consumption with distance from arteries, CMRO2 being halved by 30 μm away [23].

      Line 740:

      "The network efficiency here can be thought of as paralleling mean transit time, i.e., the time it takes blood to traverse the capillary network from the arteries to the veins".

      The network efficiency as defined by the authors seems not to rely on artery/vein information and thus this interpretation is not fully correct in my view.

      The authors might want to reconsider this measure for one that accounts for sources and sinks, if they like to interpret their results as in this line.

      Yes, the efficiency described does not account for sources and sinks. It estimates the resistivity of capillaries, as a proxy for the ease of moving through the observed capillary nexus. Looking at the efficiency metric from graph theory does not require knowledge of the direction of blood flow, and can comment on the resistivity changes across capillary networks.
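
      To make concrete why such a metric needs no flow direction, here is a minimal sketch of the standard graph-theory global efficiency on an undirected, weighted vessel graph. The exact definition used in the paper is not restated in this response, so the details are assumptions: the edge weight is taken to be a Poiseuille-like resistance proportional to length / radius⁴, and all function names are hypothetical.

```python
import heapq


def segment_resistance(length_um, radius_um):
    """Poiseuille-like resistance up to a constant factor (ASSUMED weighting)."""
    return length_um / radius_um ** 4


def shortest_resistances(graph, source):
    """Dijkstra: lowest cumulative resistance from source to each reachable node.

    graph: dict mapping node -> {neighbor: positive edge resistance}
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


def global_efficiency(graph):
    """Mean of 1/d(i, j) over all node pairs; unreachable pairs contribute 0."""
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0.0
    for u in graph:
        best = shortest_resistances(graph, u)
        total += sum(1.0 / d for v, d in best.items() if v != u)
    return total / (n * (n - 1))
```

      Because the graph is undirected, the value depends only on segment geometry: dilating a segment lowers its resistance and raises efficiency, with no reference to which nodes are arterial sources or venous sinks.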

      For future work, we are investigating methods of classifying vessels as arteries, capillaries, or veins. This type of analysis will provide more detailed information on paths between arteries and veins; it will not provide insight into large-scale network-wide modifications, as those require larger fields of view. 

      Line 754 Pipeline Limitations and Adaptability

      I think the additional 'problem' of generating new training data for novel data sets or data from other microscopes etc should be addressed or the pipeline tested on such data sets.

      Generating training data is typically the biggest time investment when adapting pipelines.

      The generalization properties of the current pipeline are not discussed (e.g. performance on a different microscope / different brain area / different species etc.).

      The public response to reviews has been updated with out-of-distribution data from other imaging protocols, microscopes, and species showing generalizability. These results have also been added to the paper as Supplementary Table 4, and Figure 6. The performance of our pipeline on these out-of-distribution data is now discussed in the updated Discussion section.

      Line 810

      Code availability should be coupled with the publication of this paper as it seems the main contribution. I don't see how the code can be made available after publication only. It should be directly available once the manuscript is published and it could help to make it available to the reviewers before that. It can be updated later of course.

      The code is being made available.

      Reviewer #2 (Recommendations For The Authors):

      This analytical pipeline could be quite useful but it needs to be better demonstrated. If faster volumetric imaging is not possible, perhaps using it over a small volume would still demonstrate its utility at a smaller but more believable scale.

      The higher temporal resolution scans (over smaller tissue volumes) have now been performed and the results of applying our pipeline to these data are summarized in Supplementary Figure 2.

      Using sensory stimuli for neuronal activation might be a better idea than optogenetic stimulation. It isn't necessary but it would avoid the blue light issue.

      The pipeline is readily applicable for analysis of vasoreactivity following different perturbers; however, the robustness of vessels’ response is higher with blue light photostimulation of ChR2 than with sensory stimuli [24]. Notwithstanding, an example of the vascular response to electrical stimulation of the contralateral forepaw is now included in Supplementary Figure 2.

      This tool could be quite useful even without neural activity mapping. It obviously makes it even more powerful, but again, the utility could be demonstrated with just vascular data or even anatomical neuronal data without function.

      We agree with both points, and have emphasized them in the revised discussion section.

      Line 559 says the average capillary diameter change was 1.04 um. The next sentence and the table below all have different values so this is unclear.

      The wording was updated to make this clearer.

      Line 584 - should 458 be 552?

      458 is correct.

      Figure 1 - the schematic doesn't seem right - the 650 LPF with the notches is positioned to pass short light and reflect long wavelengths and the notch bands.

      The figure has been updated to reflect this. The original layout was done for compactness.

      References

      (1) D. A. Hartmann, V. Coelho-Santos, and A. Y. Shih, “Pericyte Control of Blood Flow Across Microvascular Zones in the Central Nervous System,” Annu. Rev. Physiol., vol. 84, pp. 331–354, Feb. 2022, doi: 10.1146/annurev-physiol-061121-040127.

      (2) J. Batista, “An adaptive gradient-based boundary detector for MRI images of the brain,” in 7th International Conference on Image Processing and its Applications, Manchester, UK: IEE, 1999, pp. 440–444. doi: 10.1049/cp:19990360.

      (3) Y. Le, X. Xu, L. Zha, W. Zhao, and Y. Zhu, “Tumor boundary detection in ultrasound imagery using multi-scale generalized gradient vector flow,” J. Med. Ultrason., vol. 42, no. 1, pp. 25–38, Jan. 2015, doi: 10.1007/s10396-014-0559-3.

      (4) X. Ren, “Multi-scale Improves Boundary Detection in Natural Images,” in Computer Vision – ECCV 2008, D. Forsyth, P. Torr, and A. Zisserman, Eds., Berlin, Heidelberg: Springer, 2008, pp. 533–545. doi: 10.1007/978-3-540-88690-7_40.

      (5) C. Grigorescu, N. Petkov, and M. A. Westenberg, “Contour and boundary detection improved by surround suppression of texture edges,” Image Vis. Comput., vol. 22, no. 8, pp. 609–622, Aug. 2004, doi: 10.1016/j.imavis.2003.12.004.

      (6) J. Tang and S. T. Acton, “Vessel Boundary Tracking for Intravital Microscopy Via Multiscale Gradient Vector Flow Snakes,” IEEE Trans. Biomed. Eng., vol. 51, no. 2, pp. 316–324, Feb. 2004, doi: 10.1109/TBME.2003.820374.

      (7) J. Merkow, A. Marsden, D. Kriegman, and Z. Tu, “Dense Volume-to-Volume Vascular Boundary Detection,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016, S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells, Eds., Cham: Springer International Publishing, 2016, pp. 371–379. doi: 10.1007/978-3-319-46726-9_43.

      (8) F. Orujov, R. Maskeliūnas, R. Damaševičius, and W. Wei, “Fuzzy based image edge detection algorithm for blood vessel detection in retinal images,” Appl. Soft Comput., vol. 94, p. 106452, Sep. 2020, doi: 10.1016/j.asoc.2020.106452.

      (9) M. E. Martinez-Perez, A. D. Hughes, S. A. Thom, A. A. Bharath, and K. H. Parker, “Segmentation of blood vessels from red-free and fluorescein retinal images,” Med. Image Anal., vol. 11, no. 1, pp. 47–61, Feb. 2007, doi: 10.1016/j.media.2006.11.004.

      (10) A. M. Mendonca and A. Campilho, “Segmentation of retinal blood vessels by combining the detection of centerlines and morphological reconstruction,” IEEE Trans. Med. Imaging, vol. 25, no. 9, pp. 1200–1213, Sep. 2006, doi: 10.1109/TMI.2006.879955.

      (11) A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever, “Multiscale vessel enhancement filtering,” in Medical Image Computing and Computer-Assisted Intervention — MICCAI’98, W. M. Wells, A. Colchester, and S. Delp, Eds., Berlin, Heidelberg: Springer, 1998, pp. 130–137. doi: 10.1007/BFb0056195.

      (12) K. Bisht et al., “Capillary-associated microglia regulate vascular structure and function through PANX1-P2RY12 coupling in mice,” Nat. Commun., vol. 12, no. 1, p. 5289, Sep. 2021, doi: 10.1038/s41467-021-25590-8.

      (13) Y. Wu et al., “Quantitative relationship between cerebrovascular network and neuronal cell types in mice,” Cell Rep., vol. 39, no. 12, p. 110978, Jun. 2022, doi: 10.1016/j.celrep.2022.110978.

      (14) T. Kirabali et al., “The amyloid-β degradation intermediate Aβ34 is pericyte-associated and reduced in brain capillaries of patients with Alzheimer’s disease,” Acta Neuropathol. Commun., vol. 7, no. 1, p. 194, Dec. 2019, doi: 10.1186/s40478-019-0846-8.

      (15) X. Ren et al., “Linking cortical astrocytic neogenin deficiency to the development of Moyamoya disease–like vasculopathy,” Neurobiol. Dis., vol. 154, p. 105339, Jul. 2021, doi: 10.1016/j.nbd.2021.105339.

      (16) J. Steinman, M. M. Koletar, B. Stefanovic, and J. G. Sled, “3D morphological analysis of the mouse cerebral vasculature: Comparison of in vivo and ex vivo methods,” PLOS ONE, vol. 12, no. 10, p. e0186676, Oct. 2017, doi: 10.1371/journal.pone.0186676.

      (17) A.-A. Berthiaume et al., “Dynamic Remodeling of Pericytes In Vivo Maintains Capillary Coverage in the Adult Mouse Brain,” Cell Rep., vol. 22, no. 1, pp. 8–16, Jan. 2018, doi: 10.1016/j.celrep.2017.12.016.

      (18) S. Katz, R. Gattegno, L. Peko, R. Zarik, Y. Hagani, and T. Ilovitsh, “Diameter-dependent assessment of microvascular leakage following ultrasound-mediated blood-brain barrier opening,” iScience, vol. 26, no. 6, p. 106965, Jun. 2023, doi: 10.1016/j.isci.2023.106965.

      (19) J. Drouin-Ouellet et al., “Cerebrovascular and blood-brain barrier impairments in Huntington’s disease: Potential implications for its pathophysiology,” Ann. Neurol., vol. 78, no. 2, pp. 160–177, Aug. 2015, doi: 10.1002/ana.24406.

      (20) K. P. McDowell, A.-A. Berthiaume, T. Tieu, D. A. Hartmann, and A. Y. Shih, “VasoMetrics: unbiased spatiotemporal analysis of microvascular diameter in multi-photon imaging applications,” Quant. Imaging Med. Surg., vol. 11, no. 3, pp. 969–982, Mar. 2021, doi: 10.21037/qims-20-920.

      (21) E. L. Johnson et al., “Characterization of light penetration through brain tissue, for optogenetic stimulation.” bioRxiv, p. 2021.04.08.438932, Apr. 08, 2021. doi: 10.1101/2021.04.08.438932.

      (22) S. I. Al-Juboori, A. Dondzillo, E. A. Stubblefield, G. Felsen, T. C. Lei, and A. Klug, “Light scattering properties vary across different regions of the adult mouse brain,” PLOS ONE, vol. 8, no. 7, p. e67626, 2013, doi: 10.1371/journal.pone.0067626.

      (23) P. Mächler et al., “Baseline oxygen consumption decreases with cortical depth,” PLOS Biol., vol. 20, no. 10, p. e3001440, Oct. 2022, doi: 10.1371/journal.pbio.3001440.

      (24) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We have significant concerns about the eLife assessment and the reviews. The reviewers acknowledged substantial strengths in our work:

      • Reviewer 3 noted that “the single-unit analyses of tuning direction are robustly characterized”, “the differences in neural correlations across behaviors, regions and perturbations are robust”, and “The evidence for these claims is solid.”

      • Reviewer 2 stated that “the manuscript has been improved” with “new analyses [that] provide improved rigor”.

      Despite these, the final eLife assessment inexplicably downplayed the significance of the findings and strength of evidence.

      Broader Impact and Significance. The findings, not only the data, have theoretical and/or practical implications extending well beyond a single subfield relevant to:

      1. behavioral neuroscientists studying sensorimotor integration

      2. systems and theoretical neuroscientists

      3. neural and biomechanical engineers working on brain-computer interfaces for speech or oral or limb prosthetics

      4. soft robotics researchers

      5. comparative motor control researchers

      6. clinicians involved in the evaluation and rehabilitation of orolingual function (e.g., after stroke or glossectomy, dysphagia)

      Given this broad relevance, we question why the significance was characterized as merely "useful" rather than "important."

      Dismissive Tone Toward Descriptive Research. Some reviews displayed a dismissive or skeptical tone toward the findings and their significance, even when the methods were solid and support for the claims was strong. They critiqued the “descriptive nature” of our study, faulting the lack of mechanistic explanation. However, in poorly understood fields such as orofacial sensorimotor control, descriptive studies provide the empirical foundation for mechanistic studies. Rich descriptive data generate testable hypotheses that drive mechanistic discoveries forward, while mechanistic studies conducted without this groundwork often pursue precise answers to poorly formulated questions.

      Specific Issues with Reviews:

      1. Significant omission in study description:

      The eLife Assessment’s second sentence states: “The data, which include both electrophysiology and nerve block manipulations, will be of value to neuroscientists and neural engineers interested in tongue use.”

      This description omits our simultaneously recorded high-resolution 3D kinematics data—a significant oversight given that combining high-density electrophysiological recording from multiple cortical regions with high-resolution 3D tongue kinematics during naturalistic behaviors in non-human primates represents one of our study's key strengths. Currently, only two research labs in the US possess this capability.

      2. Overemphasis on the “smaller” and “inconsistent” findings

      While we acknowledge some inconsistent findings between animals, the reviews overemphasized these inconsistencies in ways that cast unwarranted doubt on our more significant and consistent results.

      a. Reviewer 1: “[...] the discrepancies in tuning changes across the two NHPs, coupled with the overall exploratory nature of the study, render the interpretation of these subtle differences somewhat speculative.” “[...] in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which seemed to result in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.”

      The skeptical tone of this critique is in opposition to Reviewer 3’s statement that “the evidence for these claims is solid”. Here, Reviewer 1 characterized our findings as “somewhat speculative”, seemingly overlooking the robust and consistent changes we documented:

      • “Following nerve block, MIo and SIo showed significant decreases in the proportion of directionally modulated neurons across both tasks (Fig. 10A; Chi-square, MIo: p <0.001, SIo: p < 0.05).”

      • “Nerve block significantly altered PD distributions during both tasks. During feeding, MIo neurons in both subjects exhibited a significant clockwise shift in mean PD toward the center (0°), resulting in more uniform distributions (Fig. 11A; circular k-test, p < 0.01).”

      These results were obtained through careful subsampling of trials with similar kinematics for both feeding and drinking tasks, ensuring that the tuning changes in the nerve block experiments could not be attributed to differing kinematics.

      b. Reviewer 2: “One weakness of the current study is that there is substantial variability in results between monkeys.”

      This vague critique, without specifying which results showed “substantial variability”, reads as though most findings were inconsistent, unfairly casting doubt on our study’s validity.

      3. Inaccurate statements in the Reviewers’ summaries

      Several reviewer statements contain factual inaccuracies:

      a. Reviewer 2: “A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning).”

      Reviewer 2's characterization of directional tuning misrepresents our findings. We reported substantial differences in the proportion of directionally tuned neurons between MIo and SIo during the feeding task but a smaller difference in the drinking task:

      • “The proportion of directionally tuned neurons [...] differed significantly between MIo and SIo during the feeding task in both subjects (Chi-square, p < 0.001). In rostral and caudal MIo, 80% of neurons were modulated to 3D direction (bootstrap, p < 0.05, Fig. 3B, left), compared to 52% in areas 1/2 and 3a/3b.”

      • “During drinking, the proportion of directionally modulated neurons was more similar between regions (69% in MIo vs. 60% in SIo: Chi-square, p > 0.05, Fig. 3B right).”

      b. Reviewer 2: “There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking.”

      Reviewer 2's claim about task differences directly contradicts our findings. We consistently reported stronger tuning in feeding compared to drinking across multiple measures:

      • “The proportion of directionally tuned neurons was higher in the feeding vs. drinking task (Chi-square, p < 0.05, feeding: 72%, drinking: 66%)”;

      • “Cumulative explained variance for the first three factors was higher in feeding (MIo: 82%, SIo: 81%) than in drinking (MIo: 74%, SIo: 63%)”;

      • “Decoding using LSTM showed consistently higher accuracies in feeding compared to drinking regardless of the length of intervals used ..., behavioral window .., and directional angles ...”

      These results were also summarized in the Discussion.

      c. Reviewer 1: “In Figure 12, factor 2 and 3 are plotted against each other? and factor 1 is left out?”

      Reviewer 1’s observation about Figure 12 is incorrect. Factor 1 was included: Top subplots (feeding) show Factor 1 vs 3 (MIo) and Factor 1 vs 2 (SIo) while the bottom subplots (drinking) show Factor 2 vs 3 (MIo) and Factor 1 vs 2 (SIo). We plotted the two latent factors with highest explained variance for clarity, though all 20 factors were included in intertrajectory distance calculations.

      4. Framing and interpretive over-scrutiny

      Several critiques targeted framing rather than methodological rigor and emphasized that interpretations were speculative even when appropriately hedged:

      a. Reviewer 2: “A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.”

      Reviewer 2 mentioned "inconsistent use of quantifications/statistics" without specifying which analyses were problematic or updating their summary to include our additional population-level findings.

      b. Reviewer 2: “The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results”

      Despite our addressing kinematic concerns through subsampled data analysis, Reviewer 2 remained unsatisfied, contrasting sharply with Reviewer 3's assessment that our arguments were "convincing" with "solid" evidence.

      c. Reviewer 2: “I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw”

      Reviewer 2 expressed skepticism about fundamental encoding differences between tasks, despite our comprehensive controls including subsampled data with similar kinematics and multiple verification analyses (equal neuron numbers, stable neurons, various interval lengths, behavioral windows, and directional angles).

      Without describing why these analyses were insufficient, this criticism goes beyond methods or statistics. It casts doubt and challenges whether the conclusions are even worth drawing despite careful experimental controls.

      d. Reviewer 2: “The manuscript states that "An alternative explanation be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensation afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer".

      By not updating this section, Reviewer 2 failed to acknowledge our responsive revisions, including Fano factor analysis showing higher variability in SIo during feeding versus drinking, and our updated discussion addressing their concerns about trial-to-trial variability: “Varying tongue shape, tongue’s contact with varying bolus properties (size and texture) and other oral structures (palate, teeth) may weaken the directional signal contained in SIo activity. Thus, small differences in tongue kinematics might create large differences in sensory signals across trials. When looking at trial-averaged signals, this natural variability could make the neural response patterns appear less precise or specific than they are. These are consistent with our findings that for both tasks, spiking variability was higher in SIo.”

      Authors’ Response to Recommendations for the authors:

      We thank the editors and the reviewers for their helpful comments. We have provided a response to the reviewers’ recommendations and made some revisions to the manuscript.

      Reviewer #1 (Recommendations for the authors): 

      In the newly added population factor analysis, several methodological decisions remain unclear to me:

      In Figure 7, why do the authors compare the mean distance between conditions in the latent spaces of MIo and SIo? Since these latent spaces are derived separately, they exist on different scales (with MIo appearing roughly four times larger than SIo), and this discrepancy is reflected in the reported mean distances (Figure 7, inset plots). Wouldn't this undermine a direct comparison?

      Thank you for this helpful feedback. The reviewer is correct that the latent spaces are derived separately for MIo and SIo, thus they exist on different scales as we have noted in the caption of Figure 7: “Axes for SIo are 1/4 scale of MIo.” 

      To allow for a direct comparison between MIo and SIo, we corrected the analysis by comparing their normalized mean inter-trajectory distances obtained by first calculating the geometric index (GI) of the inter-trajectory distances, d, between each pair of population trajectories per region as: $\mathrm{GI} = (d_1 - d_2)/(d_1 + d_2)$. We then performed the statistics on the GIs and found a significant difference between mean inter-trajectory distances in MIo vs. SIo. We performed the same analysis comparing the distance travelled between MIo and SIo trajectories by getting the normalized difference in distances travelled and still found a significant difference in both tasks. We have updated the results and figure inset to reflect these changes.
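
      As a brief illustration of why this normalization permits comparison across separately derived latent spaces, the index can be sketched as below. The function name is hypothetical, and which two distances serve as d1 and d2 in the actual analysis is not specified here.

```python
def geometric_index(d1, d2):
    """Normalized difference (d1 - d2) / (d1 + d2), bounded in [-1, 1]
    for non-negative distances."""
    total = d1 + d2
    if total == 0:
        raise ValueError("both distances are zero; the index is undefined")
    return (d1 - d2) / total
```

      Multiplying both distances by a common factor leaves the index unchanged, so a latent space whose axes are roughly four times larger yields the same GI as a smaller one with the same relative separation.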

      In Figure 12, unlike Figure 7 which shows three latent dimensions, only two factors are plotted. While the methods section describes a procedure for selecting the optimal number of latent factors, Figure 7 - figure supplement 3 shows that variance explained continues to increase up to about five latent dimensions across all areas. Why, then, are fewer dimensions shown?

      Thank you for the opportunity to clarify the figure. The m obtained from the 3-fold cross-validation varied for the full sample and was 20 factors for the subsample. We clarify that all statistical analyses were done using 20 latent factors. Using the full sample of neurons, the first 3 factors explained 81% of variance in feeding data compared to 71% in drinking data. When extended to 5 factors, feeding maintained its advantage with 91% variance explained versus 82% for drinking. Because feeding showed higher variance explained than drinking across 3 or 5 factors, only three factors were shown in Figure 7 for better visualization. We added this clarification to the Methods and Results.

      Figure 12 shows the differences in the neural trajectories between the control and nerve block conditions. The control vs. nerve block comparison complicated the visualization of the results. Thus, we plotted only the two latent factors with the highest separation between population trajectories. This was clarified in the Methods and caption of Figure 12.

      In Figure 12, factor 2 and 3 are plotted against each other? and factor 1 is left out?

      This observation is incorrect; Factor 1 was included: Top subplots (feeding) show Factor 1 vs 3 (MIo) and Factor 1 vs 2 (SIo) while the bottom subplots (drinking) show Factor 2 vs 3 (MIo) and Factor 1 vs 2 (SIo).  We have clarified this in the Methods and caption of Figure 12.

      Finally, why are factor analysis results shown only for monkey R? 

      Factor analysis was performed on both animals, but the results were shown only for monkey R to decrease the number of figures in the manuscript. Figure 7 - figure supplement 1 shows the data for both monkeys. Here are the equivalent Figure 7 plots for monkey Y.

      Author response image 1.

      Reviewer #2 (Recommendations for the authors): 

      Overall, the manuscript has been improved. 

      New analyses provide improved rigor (as just one example, organizing the feeding data into three-category split to better match the three-direction drinking data decoding analysis and also matching the neuron counts).

      The updated nerve block change method (using an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory) somewhat reduces my concern that kinematic differences could account for the neural changes, but on the other hand the neural analyses use 250 ms (meaning that the neural differences could be related to behavioral differences earlier in the trial). Why not subselect to trials with similar trajectories throughout the whole movement (or at least show that as an additional analysis, albeit one with lower trial counts)?

      As the reviewer pointed out, selecting similar trajectories throughout the whole movement would result in lower trial counts that lead to poor statistical power. We think that the 100 ms prior to maximum tongue protrusion is a more important movement segment to control for similar kinematics between the control and nerve block conditions since this represents the subject’s intended movement endpoint. 

      A lot of the Results seemed like a list of measurements without sufficient hand-holding or guide-posting to explain what the take-away for the reader should be. Just one example to make concrete this broadly-applicable feedback: "Cumulative explained variance for the first three factors was higher in feeding (MIo: 82%, SIo: 81%) than in drinking (MIo: 74%, SIo: 63%) when all neurons were used for the factor analysis (Fig. 7)": why should we care about 3 factors specifically? Does this mean that in feeding, the neural dimensionality is lower (since 3 factors explain more of it)? Does that mean feeding is a "simpler" behavior (which is counter-intuitive and does not conform to the authors' comments about the higher complexity of feeding)? And from later in that paragraph: what are we to make of the differences in neural trajectory distances (aside from quantifying using a different metric the same larger changes in firing rates that could just as well be quantified as statistics across single-neuron PETHs)?

      Thank you for the feedback on the writing style. We have made some revisions to describe the takeaway for the reader. That fewer latent factors explain 80% of the variance in the feeding data means that the underlying network activity is relatively simple despite apparent complexity. When neural population trajectories are farther away from each other in state space, it means that the patterns of activity across tongue directions are more distinct and separable, thus, less likely to be confused with each other. This signifies that neural representations of 3D tongue directions are more robust. When there is better neural discrimination and more reliable information processing, it is easier for downstream brain regions to distinguish between different tongue directions.  
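
      The notion of trajectories being "farther away from each other in state space" can be illustrated with a small sketch. It assumes each trajectory is a sequence of time points in the m-dimensional latent space, that the distance between two trajectories is the Euclidean distance averaged over matched time points, and that the summary is the mean over all pairs of directions. The function names are hypothetical, and the manuscript's exact computation may differ.

```python
from itertools import combinations
from math import dist
from statistics import mean


def trajectory_distance(traj_a, traj_b):
    """Euclidean distance in latent space, averaged over matched time points."""
    return mean(dist(a, b) for a, b in zip(traj_a, traj_b))


def mean_inter_trajectory_distance(trajectories):
    """Average pairwise distance across all direction-specific trajectories.

    trajectories: dict mapping a direction label to a list of time points,
                  each time point being a sequence of m latent factor values.
    """
    pairs = combinations(trajectories.values(), 2)
    return mean(trajectory_distance(a, b) for a, b in pairs)
```

      A larger value means the direction-specific trajectories occupy more distinct regions of the latent space, which is the sense of "more separable" used above.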

      The addition of more population-level analyses is nice as it provides a more efficient summary of the neural measurements. However, it's a surface-level dive into these methods; ultimately the goal of ensemble "computation through dynamics" analyses is to discover simpler structure / organizational principles at the ensemble level (i.e., show things not evident from single neurons), rather than just using them as a way to summarize data. For instance, here neural rotations are remarked upon in the Results, without referencing influential prior work describing such rotations and why neural circuits may use this computational motif to separate out conditions and shape muscle activity-generating readouts (Churchland et al. Nature 2012 and subsequent theoretical iterations including the Russo et al.). That said, the Russo et al tangling study was well-referenced and the present tangling results were effectively contextualized with respect to that paper in terms of the interpretation. I wish more of the results were interpreted with comparable depth.

      Speaking of Russo et al: the authors note qualitative differences in tangling between brain areas, but do not actually quantify tangling in either. These observations would be stronger if quantified and accompanied with statistics.

      Contrary to the reviewer’s critique, we did frame these results in the context of structure/organizational principles at the ensemble level. We had already cited prior work of Churchland et al., 2012; Michaels et al., 2016; and Russo et al., 2018. In the Discussion, Differences across behaviors, we wrote: “In contrast, MIo trajectories in drinking exhibited a consistent rotational direction regardless of spout location (Fig. 7). This may reflect a predominant non-directional information such as condition-independent time-varying spiking activity during drinking (Kaufman et al., 2016; Kobak et al., 2016; Arce-McShane et al., 2023).”

      Minor suggestions: 

      Some typos, e.g. 

      • no opening parenthesis in "We quantified directional differences in population activity by calculating the Euclidean distance over m latent factors)"

      • missing space in "independent neurons(Santhanam et al., 2009;..."); 

      • missing closing parentheses in "followed by the Posterior Inferior (Figure 3 - figure supplement 1."

      There is a one-page long paragraph in the Discussion. Please consider breaking up the text into more paragraphs each organized around one key idea to aid readability.

      Thank you, we have corrected these typos.

      Could it be that the Kaufman et al 2013 reference was intended to be Kaufman et al 2015 eNeuro (the condition-invariant signal paper)?

      Thank you, we have corrected this reference.

      At the end of the Clinical Implications subsection of the Discussion, the authors note the growing field of brain-computer interfaces with references for motor read-out or sensory write-in of hand motor/sensory cortices, respectively. Given that this study looks at orofacial cortices, an even more clinically relevant development is the more recent progress in speech BCIs (two recent reviews: https://www.nature.com/articles/s41583-024-00819-9, https://www.annualreviews.org/content/journals/10.1146/annurev-bioeng-110122012818), many of which record from human ventral motor cortex, and aspirations towards FES-like approaches for orofacial movements (e.g., https://link.springer.com/article/10.1186/s12984-023-01272-y).

      Thank you, we have included these references.

      Reviewer #3 (Recommendations for the authors): 

      Major Suggestions 

      (1) For the factor analysis of feeding vs licking, it appears that the factors were calculated separately for the two behaviors. It could be informative to calculate the factors under both conditions and project the neural data for the two behaviors into that space. The overlap/separations of the subspace could be informative. 

      We clarify that we performed a factor analysis that included both feeding and licking for MIo, as stated in the Results: “To control for factors such as different neurons and kinematics that might influence the results, we performed factor analysis on stable neurons across both tasks using all trials (Fig. 7- figure supplement 2A) and using trials with similar kinematics (Fig. 7- figure supplement 2B).” We have revised the manuscript to reflect this more clearly.

      (2) For the LSTM, the Factor analyses and the decoding it is unclear if the firing rates are mean subtracted and being normalized (the methods section was a little unclear). Typically, papers in the field either z-score the data or do a softmax.

      The firing rates were z-scored for the LSTM and KNN. For the factor analysis, the spike counts were not z-scored, but the results were normalized. We clarified this in the Methods section.
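
      For readers unfamiliar with the normalization mentioned above, here is a minimal sketch of per-neuron z-scoring, assuming firing rates are arranged as a (samples x neurons) matrix. The function name `zscore_rates` and the guard for silent or constant neurons are illustrative assumptions, not a description of the authors' code.

```python
import numpy as np

def zscore_rates(rates, eps=1e-12):
    """Z-score each neuron (column) of a (samples x neurons) firing-rate matrix."""
    rates = np.asarray(rates, dtype=float)
    mean = rates.mean(axis=0)
    std = rates.std(axis=0)
    # Assumption: neurons with (near-)zero variance are left at zero instead of dividing by 0.
    std = np.where(std < eps, 1.0, std)
    return (rates - mean) / std

# Toy data (hypothetical): 3 samples x 3 neurons; the middle neuron fires at a constant rate.
rates = np.array([[1.0, 5.0, 2.0],
                  [3.0, 5.0, 4.0],
                  [5.0, 5.0, 6.0]])
z = zscore_rates(rates)
```

      When a decoder is trained and tested on separate trials, the mean and standard deviation would normally be estimated on the training trials only and then applied to the test trials.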

      Minor: 

      Page 1: Abstract- '... how OSMCx contributes to...' 

      Since there are no direct causal manipulations of OSMCx in this manuscript, this study doesn't directly study the OSMCx's contribution to movement - I would recommend rewording this sentence.

      Similarly, Page 2: 'OSMCx plays an important role in coordination...' the citations in this paragraph are correlative, and do not demonstrate a causal role.

      There are similar usages of 'OSMCx coordinates...' in other places e.g. Page 8. 

      Thank you, we revised these sentences.

      Page 7: the LSTM here has 400 units, which is a very large network and contains >12000 parameters. Networks of this size are prone to memorization; it would be wise to test the R-squared of the validation set against a shuffled dataset to see if the network is actually working as intended.

      Thank you for bringing up this important point of verifying that the network is learning meaningful patterns versus memorizing. Considering the size of our training samples, the ratio of samples to parameters is appropriate and thus the risk of memorization is low. Indeed, validation tests and cross-validation performed indicated expected network behavior and the R squared values obtained here were similar to those reported in our previous paper (Laurence-Chasen et al., 2023).
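
      The shuffle control requested by the reviewer can be sketched as follows. This is an illustration only: a least-squares linear decoder is swapped in for the LSTM so that the example stays short and self-contained, and all names (`shuffle_control`, `fit_linear`, the synthetic data) are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_linear(x, y):
    """Least-squares fit with an intercept; stand-in for the real decoder."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return coef[0] + x @ coef[1:]

def shuffle_control(x_train, y_train, x_test, y_test, n_shuffles, rng):
    """Held-out R^2 for the real fit and for fits to permuted training targets."""
    real = r_squared(y_test, predict_linear(fit_linear(x_train, y_train), x_test))
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        # Permuting the targets breaks the neural-kinematic relationship.
        y_perm = rng.permutation(y_train)
        null[i] = r_squared(y_test, predict_linear(fit_linear(x_train, y_perm), x_test))
    return real, null

# Synthetic data (hypothetical): 400 samples, 20 "neurons", one kinematic variable.
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 20))
w = rng.normal(size=20)
y = x @ w + 0.1 * rng.normal(size=400)
real_r2, null_r2 = shuffle_control(x[:300], y[:300], x[300:], y[300:], 20, rng)
```

      A decoder that has learned a genuine relationship should give a held-out R-squared well above the whole shuffled distribution; a decoder that only memorizes its training set should not.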


      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper, Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar videoradiography of markers implanted in the tongue. Their findings indicate that most units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which resulted in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The author utilized a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. Moreover, they employed a nerve-blocking procedure to halt sensory feedback. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      Aside from the last part of the result section, the majority of the analyses in this paper are focused on single units. I understand the need to characterize the number of single units that directly code for external variables like movement direction, especially for less-studied areas like the orofacial part of the sensory-motor cortex. However, as a field, our decade-long experience in the arm region of sensory-motor cortices suggests that many of the idiosyncratic behaviors of single units can be better understood when the neural activity is studied at the level of the state space of the population. By doing so, for the arm region, we were able to explain why units have "mixed selectivity" for external variables, why the tuning of units changes in the planning and execution phase of the movement, why activity in the planning phase does not lead to undesired muscle activity, etc. See (Gallego et al. 2017; Vyas et al. 2020; Churchland and Shenoy 2024) for a review. Therefore, I believe investigating the dynamics of the population activity in orofacial regions can similarly help the reader go beyond the peculiarities of single units and in a broader view, inform us if the same principles found in the arm region can be generalized to other segments of sensory-motor cortex.

      We thank and agree with the reviewer on the value of information gained from studying population activity. We also appreciate that population analyses have led to the understanding that individual neurons have “mixed selectivity”. We have shown previously that OSMCx neurons exhibit mixed selectivity in their population activity and clear separation between latent factors associated with gape and bite force levels (Arce-McShane FI, Sessle BJ, Ram Y, Ross CF, Hatsopoulos NG (2023) Multiple regions of primate orofacial sensorimotor cortex encode bite force and gape. Front Systems Neurosci. doi: 10.3389/fnsys.2023.1213279. PMID: 37808467 PMCID: 10556252), and chew-side and food types (Li Z & Arce-McShane FI (2023). Cortical representation of mastication in the primate orofacial sensorimotor cortex. Program No. NANO06.05. 2023 Neuroscience Meeting Planner. Washington, D.C.: Society for Neuroscience, 2023. Online.). 

      The primary goal of this paper was to characterize single units in the orofacial region and to do a follow-up paper on population activity. In the revised manuscript, we have now incorporated the results of population-level analyses. The combined results of the single unit and population analyses provide a deeper understanding of the cortical representation of 3D direction of tongue movements during natural feeding and drinking behaviors. 

      Further, for the nerve-blocking experiments, the authors demonstrate that the lack of sensory feedback severely alters how the movement is executed at the level of behavior and neural activity. However, I had a hard time interpreting these results since any change in neural activity after blocking the orofacial nerves could be due to either the lack of the sensory signal or, as the authors suggest, due to the NHPs executing a different movement to compensate for the lack of sensory information or the combination of both of these factors. Hence, it would be helpful to know if the authors have any hint in the data that can tease apart these factors. For example, analyzing a subset of nerve-blocked trials that have similar kinematics to the control.

      Thank you for bringing up this important point. We agree with the reviewer that any change in the neural activity may be attributed to lack of sensory signal or to compensatory changes or a combination of these factors. To tease apart these factors, we sampled an equal number of trials with similar kinematics for both control and nerve block feeding sessions. We added a clarifying description of this approach in the Results section of the revised manuscript: “To confirm this effect was not merely due to altered kinematics, we conducted parallel analyses using carefully subsampled trials with matched kinematic profiles from both control and nerve-blocked conditions.”

      Furthermore, we ran an additional analysis on the drinking datasets by subsampling a similar distribution of drinking movements from each condition. We compared the neural data and the directional tuning across an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. These analyses, which control for similar kinematics, showed that there was still a decrease in the proportion of directionally modulated neurons with nerve block compared to the control. This confirms that the results may be attributed to the lack of tactile information. These are now integrated in the revised paper under Methods section: Directional tuning of single neurons, as well as Results section: Effects of nerve block: Decreased directional tuning of MIo and SIo neurons and Figure 10 – figure supplement 1.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulations depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).

      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.

      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking. This potentially suggests behavioral context-dependent encoding.

      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods, especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in results between monkeys, and that only one session of data per monkey/condition is analyzed (8 sessions total). This raises the concern that the results could be idiosyncratic. The Methods mention that other datasets were collected, but not analyzed because the imaging pre-processing is very labor-intensive. While I recognize that time is precious, I do think in this case the manuscript would be substantially strengthened by showing that the results are similar on other sessions.

      We acknowledge the reviewer’s concern about inter-subject variability. Animal feeding and drinking behaviors are quite stable across sessions, thus, we do not think that additional sessions will address the concern that the results could be idiosyncratic. Each of the eight datasets analyzed here has sufficient neural and kinematic data to capture neural and behavioral patterns. Nevertheless, we performed some of the analyses on a second feeding dataset from Monkey R. The results from analyses on a subset of this data were consistent across datasets; for example, (1) similar proportions of directionally tuned neurons, (2) similar distances between population trajectories (t-test p > 0.9), and (3) a consistently smaller distance between Anterior-Posterior pairs than others in MIo (t-test p < 0.05) but not SIo (p > 0.1).

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first-order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways, an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript would benefit from drawing the readers' attention (perhaps in their Discussion) to the point that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75.

      Thank you for highlighting this important point. Research on orofacial movements hasn't progressed at the same pace as limb movement studies. Our manuscript focused specifically on characterizing the 3D directional tuning properties of individual neurons in the orofacial area—an analysis that has not been conducted previously for orofacial sensorimotor control. While we initially prioritized this individual neuron analysis, we recognize the value of broader population-level insights.

      Based on your helpful feedback, we have incorporated additional population analyses to provide a more comprehensive picture of orofacial sensorimotor control and expanded our discussion section. We appreciate your expertise in pushing our work to be more thorough and aligned with current neuroscience approaches.

      Can the authors explain (or at least speculate) why there was such a large difference in behavioral effect due to nerve block between the two monkeys (Figure 7)?

      We acknowledge this as a variable inherent to this type of experimentation. Previous studies have found large kinematic variation in the effect of oral nerve block as well as in the following compensatory strategies between subjects. Each animal’s biology and response to perturbation vary naturally. Indeed, our subjects exhibited different feeding behavior even in the absence of nerve block perturbation (see Figure 2 in Laurence-Chasen et al., 2022). This is why each individual serves as its own control.

      Do the analyses showing a decrease in tuning after nerve block take into account the changes (and sometimes reduction in variability) of the kinematics between these conditions? In other words, if you subsampled trials to have similar distributions of kinematics between Control and Block conditions, does the effect hold true? The extreme scenario to illustrate my concern is that if Block conditions resulted in all identical movements (which of course they don't), the tuning analysis would find no tuned neurons. The lack of change in decoding accuracy is another yellow flag that there may be a methodological explanation for the decreased tuning result.

      Thank you for bringing up this point. We accounted for the changes in the variability of the kinematics between the control and nerve block conditions in the feeding dataset where we sampled an equal number of trials with similar kinematics for both control and nerve block. However, we did not control for similar kinematics in the drinking task. In the revised manuscript, we have clarified this and performed similar analysis for the drinking task. We sampled a similar distribution of drinking movements from each condition. We compared the neural data from an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. There was a decrease in the percentage of neurons that were directionally modulated (between 30 and 80%) with nerve block compared to the control. These results have been included in the revised paper under Methods section: Directional tuning of single neurons, as well as Results section: Effects of nerve block: Decreased directionality of MIo and SIo neurons.
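
      The kinematic matching described above can be illustrated with a short sketch, assuming each trial is summarized by a single left-right angle and that matching is done by drawing equal numbers of trials per angle bin. The binning scheme, the function name `match_trials_by_angle`, and the synthetic angles are assumptions made for illustration; the manuscript's exact matching procedure may differ.

```python
import numpy as np

def match_trials_by_angle(angles_control, angles_block, bin_edges, rng):
    """Return sorted trial indices with equal control/block counts in every angle bin."""
    angles_control = np.asarray(angles_control, dtype=float)
    angles_block = np.asarray(angles_block, dtype=float)
    # np.digitize labels values inside the edges 1..len(bin_edges)-1;
    # values outside the edges get 0 or len(bin_edges) and are dropped.
    bins_c = np.digitize(angles_control, bin_edges)
    bins_b = np.digitize(angles_block, bin_edges)
    keep_c, keep_b = [], []
    for b in range(1, len(bin_edges)):
        idx_c = np.flatnonzero(bins_c == b)
        idx_b = np.flatnonzero(bins_b == b)
        n = min(idx_c.size, idx_b.size)
        if n == 0:
            continue
        # Subsample the larger group down to the size of the smaller one.
        keep_c.extend(rng.choice(idx_c, size=n, replace=False))
        keep_b.extend(rng.choice(idx_b, size=n, replace=False))
    return np.sort(np.array(keep_c, dtype=int)), np.sort(np.array(keep_b, dtype=int))

# Synthetic angles in degrees (hypothetical): nerve block shifts and narrows the distribution.
rng = np.random.default_rng(0)
angles_c = rng.normal(0.0, 20.0, size=200)
angles_b = rng.normal(10.0, 10.0, size=150)
edges = np.arange(-60.0, 61.0, 10.0)
idx_c, idx_b = match_trials_by_angle(angles_c, angles_b, edges, rng)
```

      Tuning analyses run on `idx_c` and `idx_b` then compare conditions with the same number of trials and the same binned angle distribution, so a remaining difference in tuning is harder to attribute to reduced kinematic variability alone.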

      While the results from decoding using KNN did not show significant differences between decoding accuracies in control vs. nerve block conditions, the results from the additional factor analysis and decoding using LSTM were consistent with the decrease in directional tuning at the level of individual neurons.  

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". Could an alternative explanation be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensation afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer.

      Thank you for bringing up this point. We have now incorporated this in our revised Discussion (see Comparison between MIo and SIo). We agree with the reviewer that trial-by-trial variability in the afferent signals may account for the lower directional signal in SIo during feeding than in drinking. Indeed, SIo’s mean-matched Fano factor in feeding was significantly higher than that in drinking (Author response image 1). Moreover, the results of the additional population and decoding analyses also support this.

      Author response image 1.

      Comparison of mean-matched Fano Factor between SIo neurons during feeding and drinking control tasks across both subjects (Wilcoxon rank sum test, p < 0.001).

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray-based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. Using linear regressions, they characterize the tuning properties and distributions of the recorded population during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties, and susceptibility to perturbed sensory input are different.

      Strengths:

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data.

      Weaknesses:

      However, this paper has a number of weaknesses in the analysis of this data.

      It is unclear how reliable the neural responses are to the stimuli. The trial-by-trial variability of the neural firing rates is not reported. Thus, it is unclear if the methods used for establishing that a neuron is modulated and tuned to a direction are susceptible to spurious correlations. The authors do not use shuffling or bootstrapping tests to determine the robustness of their fits or determining the 'preferred direction' of the neurons. This weakness colors the rest of the paper.

      Thank you for raising these points. We have performed the following additional analyses: (1) We have added analyses to ensure that the results could not be explained by neural variability. To show the trial-by-trial variability of the neural firing rates, we have calculated the Fano factor (mean overall = 1.34747; control = 1.46471; nerve block = 1.23023). The distribution was similar across directions, suggesting that responses of MIo and SIo neurons to varying 3D directions were reliable. (2) We have used a bootstrap procedure to ensure that directional tuning cannot be explained by mere chance. (3) To test the robustness of our PDs we also performed a bootstrap test, which yielded the same results for >90% of neurons, and a multiple linear regression test for fit to a cosine-tuning function. In the revised manuscript, the Methods and Results sections have been updated to include these analyses.  
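
      The two checks mentioned above, trial-by-trial variability and bootstrap robustness of the preferred direction, can be sketched as follows. This is a minimal illustration under stated assumptions: the Fano factor is the plain variance-to-mean ratio (not the mean-matched variant), the PD comes from a linear cosine-tuning fit of the form rate = b0 + b · d, and all function names and simulated data are hypothetical.

```python
import numpy as np

def fano_factor(spike_counts):
    """Variance-to-mean ratio of spike counts across trials."""
    counts = np.asarray(spike_counts, dtype=float)
    mean = counts.mean()
    if mean == 0:
        return np.nan
    return counts.var(ddof=1) / mean

def preferred_direction(directions, rates):
    """Fit rate = b0 + b . d by least squares and return the unit vector b / |b|."""
    directions = np.asarray(directions, dtype=float)
    rates = np.asarray(rates, dtype=float)
    design = np.column_stack([np.ones(len(rates)), directions])
    coef, *_ = np.linalg.lstsq(design, rates, rcond=None)
    b = coef[1:]
    return b / np.linalg.norm(b)

def bootstrap_pd(directions, rates, n_boot, rng):
    """Resample trials with replacement and refit the PD on each resample."""
    directions = np.asarray(directions, dtype=float)
    rates = np.asarray(rates, dtype=float)
    n = len(rates)
    pds = np.empty((n_boot, directions.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        pds[i] = preferred_direction(directions[idx], rates[idx])
    return pds

# Simulated neuron (hypothetical): cosine-tuned to the +x direction with additive noise.
rng = np.random.default_rng(1)
true_pd = np.array([1.0, 0.0, 0.0])
d = rng.normal(size=(300, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)  # unit 3D movement directions
rates = 10.0 + 5.0 * (d @ true_pd) + rng.normal(size=300)
pd_hat = preferred_direction(d, rates)
pds = bootstrap_pd(d, rates, 200, rng)
# Angle (degrees) between each bootstrap PD and the full-data PD.
angles_deg = np.degrees(np.arccos(np.clip(pds @ pd_hat, -1.0, 1.0)))
```

      A narrow spread of `angles_deg` indicates a PD that is stable under resampling; a neuron whose bootstrap PDs scatter widely would not be counted as reliably tuned.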

      Author response image 2.

      Comparison of Fano Factor across directions for MIo and SIo Feeding Control (Kruskal-Wallis, p > 0.7).

      The authors compare the tuning properties during feeding to those during licking but only focus on the tongue-tip. However, the two behaviors are different also in their engagement of the jaw muscles. Thus many of the differences observed between the two 'tasks' might have very little to do with an alteration in the properties of the neural code - and more to do with the differences in the movements involved.

      Using the tongue tip for the kinematic analysis of tongue directional movements was a deliberate choice as the anterior region of the tongue is highly mobile and sensitive due to a higher density of mechanoreceptors. The tongue tip is the first region that touches the spout in the drinking task and moves the food into the oral cavity for chewing and subsequent swallowing. 

      We agree with the reviewer that the jaw muscles are engaged differently in feeding vs. drinking (see Fig. 2). For example, a wider variety of jaw movements along the three axes are observed in feeding compared to the smaller amplitude and mostly vertical jaw movements in drinking. Also, the tongue movements are very different between the two behaviors. In feeding, the tongue moves in varied directions to position the food between left-right tooth rows during chewing, whereas in the drinking task, the tongue moves to discrete locations to receive the juice reward. Moreover, the tongue-jaw coordination differs between tasks; maximum tongue protrusion coincides with maximum gape in drinking but with minimum gape in the feeding behavior. Thus, the different tongue and jaw movements required in each behavior may account for some of the differences observed in the directional tuning properties of individual neurons and population activity. These points have been included in the revised Discussion.

      Author response image 3.

      Tongue tip position (mm) and jaw pitch (degrees) during feeding (left) and drinking (right) behaviors. Most protruded tongue position coincides with minimum gape (jaw pitch at 0°) during feeding but with maximum gape during drinking.

      Many of the neurons are likely correlated with both jaw movements and tongue movements - this complicates the interpretations and raises the possibility that the differences in tuning properties across tasks are trivial.

      We thank the reviewer for raising this important point. In fact, we verified in a previous study whether the correlation between the tongue and jaw kinematics might explain differences in the encoding of tongue kinematics and shape in MIo (see Supplementary Fig. 4 in Laurence-Chasen et al., 2023): “Through iterative sampling of sub-regions of the test trials, we found that correlation of tongue kinematic variables with mandibular motion does not account for decoding accuracy. Even at times where tongue motion was completely un-correlated with the jaw, decoding accuracy could be quite high.” 

      The results obtained from population analyses showing distinct properties of population trajectories in feeding vs. drinking behaviors provide strong support to the interpretation that directional information varies between these behaviors.

      The population analyses for decoding are rudimentary and provide very coarse estimates (left, center, or right), it is also unclear what the major takeaways from the population decoding analyses are. The reduced classification accuracy could very well be a consequence of linear models being unable to account for the complexity of feeding movements, while the licking movements are 'simpler' and thus are better accounted for.

      We thank the reviewer for raising this point. The population decoding analyses provide additional insight on the directional information in population activity, as well as a point of comparison with the results of numerous decoding studies on the arm region of the sensorimotor cortex. In the revised version, we have included the results from decoding tongue direction using a long short-term memory (LSTM) network for sequence-to-sequence decoding. These results differed from the KNN results, indicating that a linear model such as KNN was better for drinking and that a non-linear and continuous decoder was better suited for feeding. These results have been included in the revised manuscript.

      The nature of the nerve block and what sensory pathways are being affected is unclear - the trigeminal nerve contains many different sensory afferents - is there a characterization of how effectively the nerve impulses are being blocked? Have the authors confirmed or characterized the strength of their inactivation or block? I was unable to find any electrophysiological evidence characterizing the perturbation.

      The strength of the nerve block is characterized by a decrease in the baseline firing rate of SIo neurons, as shown in Supplementary Figure 6 of “Loss of oral sensation impairs feeding performance and consistency of tongue–jaw coordination” (Laurence-Chasen et al., 2022).

      Overall, while this paper provides a descriptive account of the observed neural correlations and their alteration by perturbation, a synthesis of the observed changes and some insight into neural processing of tongue kinematics would strengthen this paper.

      We thank the reviewer for this suggestion. We have revised the Discussion to provide a synthesis of the results and insights into the neural processing of tongue kinematics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The procedure for anesthesia explained in the method section was not clear to me. The following information was missing: what drug/dose was used? How long the animal was under anesthesia? How long after the recovery the experiments were done?

      The animals were fully sedated with ketamine (100 mg/ml, 10 mg/kg) for less than 30 minutes, and all of the data was collected within 90 minutes after the nerve block was administered.

      (2) In Figure 10, panels A and B are very close together, it was not at first clear whether the text "Monkey R, Monkey Y" belongs to panel A or B.

      We have separated the two panels further in the revised figure.

      (3) I found Figure 11 very busy and hard to interpret. Separating monkeys, fitting the line for each condition, or using a bar plot can help with the readability of the figure.

      Thank you for the suggestion. We agree with you and have reworked this figure. To simplify it we have shown the mean accuracy across iterations.

      (4) I found the laterality discussions like "This signifies that there are more neurons in the left hemisphere contributes toward one direction of tongue movement, suggesting that there is some laterality in the PDs of OSMCx neurons that varies between individuals" a bit of an over-interpretation of the data, given the low n value and the dissimilarity in how strongly the nerve block altered the monkeys' behavior.

      Thank you for sharing this viewpoint. We do think that laterality is a good point of comparison with studies on M1 neurons in the arm/hand region. In our study, we found that the peak of the PD distribution coincides with leftward tongue movements in feeding. The distribution of PDs provides insight into how tongue muscles are coordinated during movement. Intrinsic and extrinsic tongue muscles are involved in shaping the tongue (e.g., elongation, broadening) and positioning the tongue (e.g., protrusion/retraction, elevation/depression), respectively. These muscles receive bilateral motor innervation except for genioglossus. Straight tongue protrusion requires the balanced action of the right and left genioglossi while the lateral protrusion involves primarily the contralateral genioglossus. Given this unilateral innervation pattern, we hypothesized that left MIo/SIo neurons would preferentially respond to leftward tongue movements, corresponding to right genioglossus activation. 

      Reviewer #2 (Recommendations for the authors):

      Is the observation that tuning peaks occur most frequently toward the anterior and superior directions consistent with the statistics of the movements the tongue typically makes? This could be analogous to anisotropies previously reported in the arm literature, e.g., Lillicrap TP, Scott SH. 2013. Preference Distributions of Primary Motor Cortex Neurons Reflect Control Solutions Optimized for Limb Biomechanics. Neuron. 77(1):168-79.

      Thank you for bringing our attention to the analogous findings of Lillicrap & Scott, 2013. Indeed, we do observe the highest number of movements in the Anterior Superior directions, followed by the Posterior Inferior. This aligns with the distribution of tuning peaks that we observed. Author response image 4 shows the proportions of observed movements in each group of directions across all feeding datasets. We have incorporated these data into the Results section “Neuronal modulation patterns differ between MIo and SIo” and added this point to the Discussion.

      Author response image 4.

      Proportion of feeding trials in each group of directions. Error bars represent ±1 standard deviation across datasets (n = 4).

      "The Euclidean distance was used to identify nearest neighbors, and the number of nearest neighbors used was K = 7. This K value was determined after testing different Ks which yielded comparable results." In general, it's a decoding best practice to tune hyperparameters (like K) on fully held-out data from the data used for evaluation. Otherwise, this tends to slightly inflate performance because one picks the hyperparameter that happened to give the best result. It sounds like that held-out validation set wasn't used here. I don't think that's going to change the results much at all (especially given the "comparable results" comment), but providing this suggestion for the future. If the authors replicate results on other datasets, I suggest they keep K = 7 to lock in the method.

      K = 7 was chosen based on the size of our smallest training dataset (n = 55). The purpose of testing different K values was not to select which value gave the best result, but to demonstrate that similar K values did not affect the results significantly. We tested the different K values on a subset of the feeding data, but that data was not fully held-out from the training set. We will keep your suggestion in mind for future analysis.
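
      For readers unfamiliar with the decoder discussed here, a minimal sketch of a K-nearest-neighbor classifier using Euclidean distance and K = 7 is given next. It is an illustration only and not the authors' implementation: the function name `knn_predict`, the toy feature vectors, and the direction labels are all assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=7):
    """Classify `query` by majority vote among its k nearest training points.

    Distances are Euclidean, as in the quoted Methods text.
    All names are hypothetical; this is not the authors' code.
    """
    if k > len(train_X):
        raise ValueError("k cannot exceed the number of training samples")
    # Pair each training sample's distance to the query with its label.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance only (stable sort keeps ties in training order).
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data (made up): two well-separated clusters of 2-D feature vectors.
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
           (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
train_y = ["left"] * 4 + ["right"] * 4

prediction = knn_predict(train_X, train_y, (4.9, 5.2), k=7)
```

      To follow the reviewer's suggestion in future work, K would be chosen by scoring candidate values on a validation split that is disjoint from both the training trials and the final test trials.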

      The smoothing applied to the Figure 2 PSTHs appears perhaps excessive (i.e., it may be obscuring interesting finer-grained details of these fast movements). Can the authors reduce the 50 ms Gaussian smoothing (I assume this is the s.d.)? A value of ~25 ms is often used in studying arm kinematics. It also looks like the movement-related modulation may not be finished in these 200 ms / 500 ms windows. I suggest extending the shown time window. It would also be helpful to show some trial-averaged behavior (e.g. speed or % displacement from start) under or behind the PSTHs, to give a sense of what phase of the movement the neural activity corresponds to.

      Thank you for the suggestion. We have taken your suggestions into consideration and modified Figure 2 accordingly. We decreased the Gaussian kernel to 25 ms and extended the time window shown. The trial-averaged anterior/posterior displacement was also added to the drinking PSTHs.
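
      The effect of narrowing the kernel can be illustrated with a short sketch of Gaussian smoothing applied to a binned spike train. This is not the authors' analysis code: the 1 ms bin width, the truncation of the kernel at ±3 s.d., the renormalization at the window edges, and the function names are assumptions chosen for the example.

```python
import math

def gaussian_kernel(sd_ms, bin_ms=1.0, n_sd=3):
    """Return a normalized Gaussian kernel truncated at +/- n_sd standard deviations."""
    half = int(round(n_sd * sd_ms / bin_ms))
    weights = [math.exp(-0.5 * ((i * bin_ms) / sd_ms) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_smooth(counts, sd_ms=25.0, bin_ms=1.0):
    """Smooth binned spike counts; edge bins are renormalized by the kernel mass in range."""
    kernel = gaussian_kernel(sd_ms, bin_ms)
    half = len(kernel) // 2
    smoothed = []
    for t in range(len(counts)):
        acc = 0.0
        mass = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(counts):
                acc += w * counts[idx]
                mass += w
        smoothed.append(acc / mass)
    return smoothed

# Made-up example: a single spike in the middle of a 301-bin (1 ms) window.
spikes = [0] * 301
spikes[150] = 1
rate_25 = gaussian_smooth(spikes, sd_ms=25.0)
rate_50 = gaussian_smooth(spikes, sd_ms=50.0)
```

      In this toy case the 25 ms kernel gives a taller, narrower response to the same spike than the 50 ms kernel, which is why the narrower kernel preserves more of the fine temporal structure in fast movements.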

      Reviewer #3 (Recommendations for the authors):

      The major consideration here is that the data reported for feeding appears to be very similar to that reported in a previous study:

      "Robust cortical encoding of 3D tongue shape during feeding in macaques"

      Are the neurons reported here the same as the ones used in this previous paper? It is deeply concerning that this is not reported anywhere in the methods section.

      These are the same neurons as in our previous paper, though here we include several additional datasets of the nerve block and drinking sessions. We have now included this in the methods section.

      Second, I strongly recommend that the authors consider a thorough rewrite of this manuscript and improve the presentation of the figures. As written, it was not easy to follow the paper, the logic of the experiments, or the specific data being presented in the figures.

      Thank you for this suggestion. We have done an extensive rewrite of the manuscript and revision of the figures.

      A few recommendations:

      (1) Please structure your results sections and use descriptive topic sentences to focus the reader. In the current version, it is unclear what the major point being conveyed for each analysis is.

      Thank you for this suggestion. We have added topic sentences to begin each section of the Results.

      (2) Please show raster plots for at least a few example neurons so that the readers have a sense of what the neural responses look like across trials. Is all of Figure 2 one example neuron or are they different neurons? Error bars for PETH would be useful to show the reliability and robustness of the tuning.

      Figure 2 shows different neurons, one from MIo and one from SIo for each task. There is shading showing ±1 standard error around the line for each direction; however, this was a bit difficult to see. In addition to the other changes we have made to these figures, we made the lines thinner and darkened the error shading to make it more visible. We also added raster plots for the same neurons shown in Figure 2 as a supplement.

      (3) Since there are only two data points, I am not sure I understand why the authors have bar graphs and error bars for graphs such as Figure 3B, Figure 5B, etc. How can one have an error bar and means with just 2 data points?

      Those bars represent the standard error of the proportion. We have changed the y-axis label on these figures to make this clearer.
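
      As a brief illustration of why such a bar can carry an error bar even with few subjects, the standard error of a proportion is computed from the counts within one population rather than across animals. The sketch assumes the usual binomial formula, sqrt(p(1 − p)/n); the response does not state the exact formula used, and the function name and counts are made up for this example.

```python
import math

def proportion_se(successes, n):
    """Return the sample proportion p and its standard error sqrt(p * (1 - p) / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    return p, math.sqrt(p * (1.0 - p) / n)

# Made-up counts: 40 directionally tuned neurons out of 100 recorded.
p, se = proportion_se(40, 100)
```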

      (4) Results in Figure 6 could be due to differential placement of the electrodes across the animals. How is this being accounted for?

      Yes, this is a possibility which we have mentioned in the discussion. Even with careful placement there is no guarantee to capture a set of neurons with the exact same function in two subjects, as every individual is different. Rather we focus on analyses of data within the same animal. The purpose of Figure 6 is to show the difference between MIo and SIo, and between the two tasks, within the same subject. The more salient result from calculating the preferred direction is that there is a change in the distribution between control and nerve block within the same exact population. Discussions relating to the comparison between individuals are speculative and cannot be confirmed without the inclusion of many more subjects.

      (5) For Figure 7, I would recommend showing the results of the Sham injection in the same figure instead of a supplement.

      Thank you for the suggestion, we have added these results to the figure.

      (6) I think the effects of the sensory block on the tongue kinematics are underexplored in Figure 7 and Figure 8. The authors could explore the deficits in tongue shape, and the temporal components of the trajectory.

      Some of these effects on feeding have been explored in a previous paper, Laurence-Chasen et al., 2022. We performed some additional analyses on changes to kinematics during drinking, including the number of licks per 10-second trial and the length of individual licks. The results of these are included below. We also calculated the difference in the speed of tongue movement during drinking, which generally decreased and exhibited an increase in variance with nerve block (F-test, p < 0.001). However, we have not included these figures in the main paper as they do not inform us about directionality.

      Author response image 5.

      Left halves of hemi-violins (black) are control and right halves (red) are nerve block for an individual. Horizontal black lines represent the mean and horizontal red lines the median. Results of two-tailed t-test and f-test are indicated by asterisks and crosses, respectively: *,† p < 0.05; **,†† p < 0.01; ***,††† p < 0.001.
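
      The two comparisons named in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the variance-ratio form of the F statistic and the Welch (unequal-variance) form of the t statistic are assumed, since the response does not specify which variants were used, and the lick durations are invented values.

```python
import math
from statistics import mean, variance

def variance_ratio_f(a, b):
    """F statistic for equality of variances: sample variance of a over that of b."""
    return variance(a) / variance(b), len(a) - 1, len(b) - 1

def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances allowed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Made-up lick durations in seconds, for illustration only.
control = [0.30, 0.32, 0.31, 0.29, 0.33]
nerve_block = [0.25, 0.40, 0.31, 0.22, 0.45]

f_stat, df_num, df_den = variance_ratio_f(nerve_block, control)
t_stat = welch_t(nerve_block, control)
```

      The sketch stops at the test statistics. Converting them to p-values requires the F and t distributions, which are available in statistics packages (for example SciPy's `scipy.stats.f` and `scipy.stats.t`).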

      (9) In Figures 9 and 10, are the same neurons being recorded before and after the nerve block? It is unclear if the overall "population" properties are different, or if the properties of individual neurons are changing due to the nerve block.

      Yes, the same neurons are being recorded before and after nerve block. Specifically, Figure 9B shows that the properties of many individual neurons do change due to the nerve block. Differences in the overall population response may be attributed to some of the units having reduced/no activity during the nerve block session.

      Additionally, I recommend that the authors improve their introduction and provide more context to their discussion. Please elaborate on what you think are the main conceptual advances in your study, and place them in the context of the existing literature. By my count, there are 26 citations in this paper, 4 of which are self-citations - clearly, this can be improved upon.

      Thank you for this suggestion. We have extensively rewritten the Introduction and Discussion. We now discuss the main conceptual advances of our study and place them in the context of the existing literature.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors have tried to dissect the functions of Proteasome activator 28γ (PA28γ), which is known to activate proteasomal function in an ATP-independent manner. Although multiple works have highlighted the role of this protein in tumours, this study specifically tried to develop a correlation with Complement C1q binding protein (C1QBP), which is associated with immune response and energy homeostasis.

      Strengths:

      The observations of the authors hint that beyond PA28y's association with the proteasome, it might also stabilize certain proteins such as C1QBP which influences energy metabolism.

      Weaknesses:

      The strength of the work also becomes its main drawback. That is, how PA28y stabilizes C1QBP or how C1QBP elicits its pro-tumourigenic role under PA28y OE.

      In most of the experiments, the authors have been dependent on the parallel changes in the expression of both the proteins to justify their stabilizing interaction. However, this approach is indirect at best and does not confirm the direct stabilizing effect of this interaction. IP experiments do not indicate direct interaction and have some quality issues. The upregulation of C1QBP might be indirect at best. It is quite possible that PA28y might be degrading some secondary protein/complex that is responsible for C1QBP expression. Since the core idea of the work is PA28y direct interaction with C1QBP stabilizing it, the same should be demonstrated in a more convincing manner.

      Thank you very much for the important comments. Using AlphaFold 3, we found that the interaction between PA28γ and C1QBP may depend on amino acids 1-167 and 1-213 (Revised Appendix Figure 1D-H), which was confirmed by our immunoprecipitation (Revised Figure 1I). In the future, we will use nuclear magnetic resonance spectroscopy to analyze the protein-protein interaction between PA28γ and C1QBP and verify it with in vitro GST pull-down experiments.

      In all of the assays, C1QBP has been detected as doublet. However, the expression pattern of the two bands varies depending on the experiment. In some cases, the upper band is intensely stained and in some the lower bands. Do C1QBP isoforms exist and are they differentially regulated depending on experiment conditions/tissue types?

      Thank you very much for the important comments. We have rechecked the experimental results showing two bands, which may have been caused by the use of a polyclonal antibody against C1QBP (Abcam: ab101267). Therefore, we repeated the experiment with a monoclonal antibody against C1QBP (Cell Signaling Technology: #6502) and replaced the corresponding images in the revised figures (Revised Figure 1E and Revised Appendix Figure 3D).

      Problems with the background of the work: Line 76. This statement is far-fetched. There are presently a number of works of literature that have dealt with the metabolic programming of OSCC including identification of specific metabolites. Moreover, beyond the estimation of OCR, the authors have not conducted any experiments related to metabolism. In the Introduction, the significance of this study and how it will extend our understanding of OSCC needs to be elaborated.

      Thank you very much for the important comments. Based on your suggestion, we have revised the content and updated the references (“Introduction”, Paragraph 2, Line 13-17 and Paragraph 4, Line 5-8). In addition, we plan to conduct experiments to investigate the regulation of metabolism by PA28γ and C1QBP and update our data in the future.

      The modified content is as follows:

      “Current research on metabolic reprogramming in OSCC primarily focused on mechanism of glycolytic metabolism and metabolic shift from glycolysis to oxidative phosphorylation (OXPHOS) of oral squamous cell carcinoma, which lays the groundwork for novel therapeutic interventions to counteract OSCC (Chen et al., 2024; Zhang et al., 2020).”

      “It is the first study to describe the undiscovered role of PA28γ in promoting the malignant progression of OSCC by elevating mitochondrial function, providing new clinical insights for the treatment of OSCC.”

      Reviewer #2 (Public review):

      Summary:

      The authors tried to determine how PA28g functions in oral squamous cell carcinoma (OSCC) cells. They hypothesized it may act through metabolic reprogramming in the mitochondria.

      Strengths:

      They found that the genes of PA28g and C1QBP are in an overlapping interaction network after an analysis of a genome database. They also found that the two proteins interact in coimmunoprecipitation and pull-down assays using the lysate from OSCC cells with or without expression of the exogenous genes. They used truncated C1QBP proteins to map the interaction site to the N-terminal 167 residues of C1QBP protein. They observed the levels of the two proteins are positively correlated in the cells. They provided evidence for the colocalization of the two proteins in the mitochondria, the effect on mitochondrial form and function in vitro and in vivo OSCC models, and the correlation of the protein expression with the prognosis of cancer patients.

      Weaknesses:

      Many data sets are shown in figures that cannot be understood without more descriptions, either in the text or the legend, e.g., Figure 1A. Similarly, many abbreviations are not defined.

      Thank you very much for the important comments. We have revised the descriptions in the legend to make it easier to understand.

      Some of the pull-down and coimmunoprecipitation data do not support the conclusion about the PA28g-C1QBP interaction. For example, in Appendix Figure 1B the Flag-C1QBP was detected in the Myc beads pull-down when the protein was expressed in the 293T cells without the Myc-PA28g, suggesting that the pull-down was not due to the interaction of the C1QBP and PA28g proteins. In Appendix Figure 1C, assume the SFB stands for a biotin tag, then the SFB-PA28g should be detected in the cells expressing this protein after pull-down by streptavidin; however, it was not. The Western blot data in Figure 1E and many other figures must be quantified before any conclusions about the levels of proteins can be drawn.

      Thank you very much for the meticulous review. We have rechecked the experimental results and found that we made a mistake in the labeling of the image. We have corrected it in the revised figure (Revised Appendix Figure 1B, C). In addition, we have quantified the gray values of the western blot bands using ImageJ software to confirm the results.

      The immunoprecipitation method is flawed as it is described. The antigen (PA28g or C1QBP) should bind to the respective antibody that in turn should bind to Protein G beads. The resulting immunocomplex should end up in the pellet fraction after centrifugation and be analyzed further by Western blot for coprecipitates. However, the method in the Appendix states that the supernatant was used for the Western blot.

      Thank you very much for the careful review. We have corrected it in the revised appendix file (“Supplemental Materials and Methods”, Part “Immunoprecipitation assay”, Line 4-6).

      The modified content is as follows:

      The sample was shaken on a horizontal shaker for 4 h, after which the deposit was collected for western blotting.

      To conclude that PA28g stabilizes C1QBP through their physical interaction in the cells, one must show whether a protease inhibitor can substitute for PA28g and prevent C1QBP degradation, and show whether a mutation that disrupts the PA28g-C1QBP interaction can reduce the stability of C1QBP. In Figure 1F, all cells expressed Myc-PA28g. Therefore, the conclusion that PA28g prevented C1QBP degradation cannot be reached. Instead, since more Myc-PA28g was detected in the cells expressing Flag-C1QBP compared to the cells not expressing this protein, a conclusion would be that the C1QBP stabilized the PA28g. Figure 1G is a quantification of Western blot data that should be shown.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure. Compared with the control group, the presence of Myc-PA28γ significantly increased the expression level of Flag-C1QBP (Revised Figure 1F). Gray value analysis showed that in cells transfected with Myc-PA28γ, the decay rate of Flag-C1QBP was significantly slower than that of the control group (Revised Figure 1G), suggesting that PA28γ can delay the protein degradation of C1QBP and stabilize its protein level. This indicates that an increase in the level of PA28γ protein can significantly enhance the expression level of C1QBP protein, while PA28γ can slow down the degradation rate of C1QBP and improve its stability. In addition, we plan to conduct experiments to investigate the effects of protease inhibitors and PA28γ mutants on the stability of C1QBP and update our data in the future.

      The binding site for PA28g in C1QBP was mapped to the N-terminal 167 residues using truncated proteins. One caveat would be that some truncated proteins did not fold correctly in the absence of the sequence that was removed. Thus, the C-terminal region of the C1QBP with residues 168-283 may still bind to the PA28g in the context of full-length protein. In Figure 1I, more Flag-C1QBP 1-167 was pulled down by Myc-PA28g than the full-length protein or the Flag-C1QBP 1-213. Why?

      Thank you very much for the important comments. Immunoprecipitation is a qualitative experiment. Using AlphaFold 3, we found that interaction between PA28γ and C1QBP may depend on amino acids 1-167 and 1-213 (Revised Appendix Figure 1D-H), which was confirmed by our immunoprecipitation (Revised Figure 1I).

      The interaction site in PA28g for C1QBP was not mapped, which prevents further analysis of the interaction. Also, if the interaction domain can be determined, structural modeling of the complex would be feasible using AlphaFold2 or other programs. Then, it is possible to test point mutations that may disrupt the interaction and if so, the functional effect.

      Thank you very much for the important comments. Based on your suggestion, we have added relevant content to the revised appendix figure (Revised Appendix Figure 1D-H).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) There are a lot of typos in the figure and manuscript that need to be addressed.

      Thank you very much for the important comments. We have corrected the typos in the revised figure and manuscript.

      (2) Figure 1A: The amount of protein that has been immunoprecipitated is more than the actual amount present in the lysate. The authors should calculate the efficiency of the precipitation to support their results.

      Thank you very much for the important comments. Immunoprecipitation is a qualitative experiment. Moreover, it can enrich specific proteins and their binding partners, increase their concentration in the sample, and thus improve the sensitivity of detection.

      (3) Figure 1D: The relative expression levels of C1QBP look similar in almost all cell lines except for HN12. It seems that the relation of PA28y with C1QBP is more of a cell type-specific effect. It would be better if the blots were quantified, and the differences were statistically determined.

      Thank you very much for the important comments. We have quantified the gray values of the western blot bands using ImageJ software to confirm the results.

      (4) Figure 1E: How do the authors quantify the expression of the protein in absolute terms? From the methods, it is understood that the flag-tagged construct is stably expressed. Under such conditions, how the authors observed the variable expression of the protein should be elaborated.

      Thank you very much for the important comments. We transfected 293T cells with 0 μg, 0.5 μg, 1 μg, and 2 μg of Flag-PA28γ plasmid. After collecting the protein for western blot, we found that Flag-PA28γ protein expression increased gradually with the plasmid dose. Moreover, C1QBP protein expression increased in step with Flag-PA28γ expression, which indicates a dose-dependent relationship between the two proteins.

      (5) Figures 1F, G: The data does not correlate with the arguments presented in the text. The authors propose that interaction with PA28y increases the stability of C1QBP. However, the experiment lacks appropriate controls. Ideally, the expression of C1QBP should be tested in the presence and absence of PA28y. Moreover, the observed difference in expression between lanes 1-4 and 5-8 for myc-PA28y needs to be explained. Are the samples from different sources with variable PA28y expression? Figure 1G quantification for C1QBP does not correlate with the figure presented in F since the expression of the protein in the first four lanes is undetectable.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure. Compared with the control group, the presence of Myc-PA28γ significantly increased the expression level of Flag-C1QBP (Revised Figure 1F). Gray value analysis showed that in cells transfected with Myc-PA28γ, the decay rate of Flag-C1QBP was significantly slower than that of the control group (Revised Figure 1G), suggesting that PA28γ can delay the protein degradation of C1QBP and stabilize its protein level. This indicates that an increase in the level of PA28γ protein can significantly enhance the expression level of C1QBP protein, while PA28γ can slow down the degradation rate of C1QBP and improve its stability. In addition, we plan to conduct experiments to investigate the effects of protease inhibitors and PA28γ mutants on the stability of C1QBP and update our data in the future.

      (6) Appendix Figure 1B: Lane 1 does not express Myc-tagged protein but pull-down has been performed using Myc beads. Then how come Flag-C1QBP is getting pulled down in lane 1 if there is no PA28y? This indicates a non-specific interaction of C1QBP with the substrata under the experimental conditions used. Similarly, in Figure 1C SFB-PA28y is expressed in both lanes but is reflected only in lane 2 and not in lane 1 even when pull-down is being performed using SFB beads, again reflecting the non-specificity of the interactions shown through immunoprecipitation.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure (Revised Appendix Figure 1B, C).

      (7) Figure 2A: The co-localization of PA28y with C1QBP in mitochondria is not very convincing. The authors are urged to provide high-resolution images for the same along with quantification of co-localization coefficients.

      Thank you very much for the important comments. We plan to obtain high-resolution images of co-localization of PA28γ with C1QBP in mitochondria and add the quantification analysis. We will update our data in the future.
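
      For reference, two commonly reported co-localization measures, Pearson's correlation coefficient and Manders' overlap coefficients (M1, M2), can be sketched as follows. This is a generic illustration and not the analysis planned by the authors: the flattened pixel lists, the zero thresholds, and the variable names are assumptions, and the intensities are invented.

```python
import math

def pearson_coefficient(ch1, ch2):
    """Pearson's correlation coefficient between two equally sized pixel lists."""
    n = len(ch1)
    mean1 = sum(ch1) / n
    mean2 = sum(ch2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(ch1, ch2))
    var1 = sum((a - mean1) ** 2 for a in ch1)
    var2 = sum((b - mean2) ** 2 for b in ch2)
    return cov / math.sqrt(var1 * var2)

def manders_coefficients(ch1, ch2, thr1=0.0, thr2=0.0):
    """Manders' M1 and M2: the fraction of each channel's total intensity found
    in pixels where the other channel exceeds its threshold."""
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / sum(ch1)
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / sum(ch2)
    return m1, m2

# Made-up flattened pixel intensities for two fluorescence channels.
channel_a = [0, 10, 20, 30, 0, 5]   # hypothetical PA28γ signal
channel_b = [0, 12, 18, 33, 4, 0]   # hypothetical C1QBP signal

pcc = pearson_coefficient(channel_a, channel_b)
m1, m2 = manders_coefficients(channel_a, channel_b)
```

      In practice the thresholds would be set above background for each channel, and the coefficients would be computed per cell or per region of interest and then summarized across cells.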

      (8) Figure 2C: Mitochondria dynamics is an interplay of multiple factors. From the images, it seems that PA28y OE elevates mitochondria biogenesis in general which is having an umbrella effect on mitochondria fusion/fission and OCR. Images also do not convincingly indicate changes in mitochondrial length. The role of PA28y on mitochondria dynamics requires further justification. However, the presented data does not underline whether the changes in mitochondria behaviour are a consequence of PA28y and C1QBP interaction. Correlating higher mitochondria respiration with ROS generation is a far-fetched conclusion since, at present, there are multiple reports that suggest otherwise.

      Thank you very much for the important comments. We plan to knock out the interaction regions between PA28γ and C1QBP (like amino acids 1-167 and 1-213) to confirm whether PA28γ affects mitochondrial function through C1QBP and update our data in the future.

      (9) Line 157: The presented data does not substantiate the claims made that Pa28y regulates mitochondrial function through C1QBP.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Results”, Part “PA28γ and C1QBP colocalize in mitochondria and affect mitochondrial functions”, Paragraph 3, Line 1-2).

      The modified content is as follows:

      “Collectively, these data suggest that PA28γ, which co-localizes with C1QBP in mitochondria, may involve in regulating mitochondrial morphology and function.”

      (10) Line 159: From the past data it is not very clear how PA28y upregulates C1QBP, hence the statement is not well supported. The presented data indicates the presence of a functional association between the two proteins.

      Thank you very much for the important comments. We detected the expression of C1QBP in two PA28γ-overexpressing OSCC cells (UM1 and 4MOSC2) and found an increase in C1QBP expression (Revised Figure 4B). Based on the results of the protein levels of the mitochondrial respiratory chain complex and other mitochondrial functional proteins, we believe that PA28γ regulates mitochondrial function by upregulating C1QBP.

      (11) Figure 4A, B: Given the mitochondrial role of C1QBP, the lesser levels of mitochondrial proteins upon C1QBP silencing are expected. Does it get phenocopied upon PA28y silencing? Similarly, all the subsequent mitochondrial phenotypes in D should be seen in a PA28y-depleted background.

      Thank you very much for the important comments. We plan to detect the mitochondrial protein expressions and OCRs of PA28γ-silenced OSCC cells. We will update our data in the future.

      (12) Line 198: The presented data do indicate a functional association between these two proteins but do not provide solid evidence for the same.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 1, Line 9-10).

      The modified content is as follows:

      “Excitingly, we found the evidence that PA28γ interacts with and stabilizes C1QBP.”

      (13) Line 218-220: In this work, the authors highlight the non-degradome role of PA28y and hence, this fact should be treated appropriately in discussion in line with the presented data.

      Thank you very much for the important comments. Based on your suggestion, we have added relevant content to the revised manuscript (“Discussion”, Paragraph 2, Line 16-19).

      The modified content is as follows:

      “In addition, PA28γ can also play as a non-degradome role on tumor angiogenesis. For example, PA28γ can regulate the activation of NF-κB to promote the secretion of IL-6 and CCL2 in OSCC cells, thus promoting the angiogenesis of endothelial cells (S. Liu et al., 2018).”

      (14) Line 236-240: Although the authors' statement that organ heterogeneity is the cause of the contrasting result is justifiable, there is no direct evidence here of PA28y involvement in the regulation of OXPHOS or of its impact on cellular metabolism (glycolysis, metabolic signalling, etc).

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 3, Line 7-9).

      The modified content is as follows:

      “Therefore, PA28γ's regulation of OXPHOS may impact cellular energy metabolism.”

      (15) Line 249: No conclusive data supporting this statement.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 5, Line 1-3).

      The modified content is as follows:

      “Furthermore, our study reveals that PA28γ can regulate C1QBP and influence mitochondrial morphology and function by enhancing the expression of OPA1, MFN1, MFN2 and the mitochondrial respiratory complex.”

      Reviewer #2 (Recommendations for the authors):

      (1) The images shown in Figure 2A need to be quantified before the conclusion about the mitochondrial colocalization of the two proteins can be drawn. In Figure 2B and Appendix Figure 2A, the mitochondrial vacuoles and ridge should be indicated for general readers, and quantification should be performed before the conclusion is drawn.

      Thank you very much for the important comments. We will update our data in the future.

      (2) The OCR data from two cell lines are shown in Figure 2E and F. Which is which? The sentence, "The results indicated ... compared to control cells" in lines 130-132, was confusing; perhaps, it would be clear if "were significantly greater" could be deleted.

      Thank you very much for the important comments. We have re-labeled Figure 2E and F to make them clearer (Revised Figure 2E, F). Based on your suggestion, we have deleted the words in the revised manuscript (“Results”, Part “PA28γ and C1QBP colocalize in mitochondria and affect mitochondrial functions”, Paragraph 1, Line 9-11).

      The modified content is as follows:

      “The results indicated significantly higher basal respiration, maximal OCRs and ATP production in PA28γ-overexpressing cells compared to control cells (Fig. 2G-I and Appendix Fig. 2B-D).”

      (3) Figures 4E-H show the migration, invasive, and proliferation capabilities of the cells. Which for which?

      Thank you very much for the important comments. We have relabeled Figure 4F-H to make it clear (Revised Figure 4F-H).

      (4) In the Discussion, lines 198-201, it states that "C1QBP enhances ... function of OPA1, MNF1, MFN2..." What is the evidence? In lines 222-224, it says that "the binding sites ... may mask the specific ... modification sites". Please justify. In lines 253-254, "fuse" and "fuses" are misleading. Did the authors mean "localize" and "localizes"?

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 1, Line 9-13, Paragraph 2, Line 20-23, and Paragraph 5, Line 3-6).

      The modified content is as follows:

      “Excitingly, we found evidence that PA28γ interacts with and stabilizes C1QBP. We speculate that aberrantly accumulated C1QBP enhances the function of mitochondrial OXPHOS and leads to the production of additional ATP and ROS by activating the expression and function of OPA1, MFN1, MFN2 and mitochondrial respiratory chain complex proteins.”

      “Our study reveals that PA28γ interacts with C1QBP and stabilizes C1QBP at the protein level. Therefore, we speculate that the binding sites of PA28γ and C1QBP may mask the specific post-translational modification sites of C1QBP and inhibit its degradation.”

      “Mitochondrial fusion, crucial for oxidative metabolism and cell proliferation, is regulated by MFN1, MFN2, and OPA1. The first two fuse with the outer mitochondrial membrane, while the last fuses with the inner mitochondrial membrane (Westermann, 2010).”

      (5) Figure 6 was not referred to in the text. In this figure, PA28g and C1QBP are located in the inner membrane and matrix. Has this been determined? What are the blue ovals that are intermediaries of PA28g/C1QBP and OPA1/MFN1/MFN2?

      Thank you very much for the important comments. According to our immunofluorescence assay (Figure 2A), PA28γ is present in both the nucleus and cytoplasm. A recent study has demonstrated that PA28γ can shuttle between the nucleus and cytoplasm, participating in various cellular processes. Furthermore, GeneCards information indicates that the subcellular localization of PA28γ includes the nucleus, cytoplasm and mitochondria (Author response image 1). In this article, we mainly focus on the functions of PA28γ and C1QBP located in the cytoplasm. Therefore, Figure 6 mainly displays PA28γ and C1QBP in the cytoplasm. Based on your suggestion, we have made some modifications to make the revised figure more accurate (Revised Figure 6).

      Author response image 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Drawing on insights from preceding studies, the researchers pinpointed mutations within the spag7 gene that correlate with metabolic aberrations in mice. The precise function of spag7 has not yet been fully described, so the primary objective of this investigation is to unravel its pivotal role in the development of obesity and metabolic disease in mice. First, they generated a mouse model lacking spag7 and observed that KO mice exhibited diminished birth size, which subsequently progressed to manifest obesity and impaired glucose tolerance upon reaching adulthood. This behaviour was primarily attributed to a reduction in energy expenditure. In fact, KO animals demonstrated compromised exercise endurance and muscle functionality, stemming from a deterioration in mitochondrial activity. Intriguingly, none of these effects was observed when using a tamoxifen-induced KO mouse model, implying that Spag7's influence is predominantly confined to the embryonic developmental phase. Explorations within placental tissue unveiled that mice afflicted by Spag7 deficiency experienced placental insufficiency, likely due to aberrant development of the placental junctional zone, a phenomenon that could impede optimal nutrient conveyance to the developing fetus. Overall, the authors assert that Spag7 emerges as a crucial determinant orchestrating accurate embryogenesis and subsequent energy balance in the later stages of life.

      The study boasts several noteworthy strengths. Notably, it employs a combination of animal models and a thorough analysis of metabolic and exercise parameters, underscoring a meticulous approach. Furthermore, the investigation encompasses a comprehensive evaluation of fetal loss across distinct pregnancy stages, alongside a transcriptomic analysis of skeletal muscle, thereby imparting substantial value. However, a pivotal weakness of the study centres on its translational applicability. While the authors claim that "SPAG7 is well-conserved with 97% of the amino acid sequence being identical in humans and mice", the precise role of spag7 in the human context remains enigmatic. This limitation hampers a direct extrapolation of findings to human scenarios. Additionally, the study's elucidation of the molecular underpinnings behind the spag7-mediated anomalous development of the placental junction zone remains incomplete. Finally, the hypothesis positing a reduction in nutrient availability to the fetus, though intriguing, requires further substantiation, leaving an aspect of the mechanism unexplored.

      Hence, in order to fortify the solidity of their conclusions, these concerns necessitate meticulous attention and resolution in the forthcoming version of the manuscript. Upon the comprehensive addressing of these aspects, the study is poised to exert a substantial influence on the field, its significance reverberating significantly. The methodologies and data presented undoubtedly hold the potential to facilitate the community's deeper understanding of the ramifications stemming from disruptions during pregnancy, shedding light on their enduring impact on the metabolic well-being of subsequent generations.

      Thanks to this reviewer for their thoughtful analysis and commentary. Human mutations in SPAG7 are exceedingly rare (SPAG7 | pLoF (genebass.org)), potentially because of the deleterious effects of SPAG7-deficiency on prenatal development. This makes investigation into the causative effects of SPAG7 in humans challenging. There exist mutations in the SPAG7 region of the genome that are associated with BMI, but no direct coding variants within the spag7 gene itself have been studied.

      We agree with the reviewer that the precise role of spag7 in the placenta remains unknown. However, given its robust expression and high protein levels in the placenta, including in key cells, such as the syncytiotrophoblast (https://www.proteinatlas.org/ENSG00000091640-SPAG7/tissue/Placenta), it is highly likely that spag7 is critical for normal placenta development and function. Multiple studies (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9716072/) have recently shown that sperm associated RNAs play a critical role in embryonic and early placenta development. Our findings will provide the basis for future studies that can elucidate the role of spag7 in human placenta.

      Reviewer #2:

      Summary:

      The authors of this manuscript are interested in discovering and functionally characterizing genes that might cause obesity. To find such genes, they conducted a forward genetic screen in mice, selecting strains which displayed increased body weight and adiposity. They found a strain, with germ-line deficiency in the gene Spag7, which displayed significantly increased body weight, fat mass, and adipose depot sizes manifesting after the onset of adulthood (20 weeks). The mice also display decreased organ sizes, leading to decreased lean body mass. The increased adiposity was traced to decreased energy expenditure at both room temperature and thermoneutrality, correlating with decreased locomotor activity and muscle atrophy. Major metabolic abnormalities such as impaired glucose tolerance and insulin sensitivity also accompanied the phenotype. Unexpectedly, when the authors generated an inducible, whole body knockout mouse using a globally expressed Cre-ERT2 along with a globally floxed Spag7, and induced Spag7 knockout before the onset of obesity, none of the phenotypes seen in the original strain were recapitulated. The authors trace this discrepancy to the major effect of Spag7 being on placental development.

      Strengths:

      Strengths of the manuscript are its inherently unbiased approach, using a forward genetic screen to discover previously unknown genes linked to obesity phenotypes. Another strong aspect of the work was the generation of an independent, complementary, strain consisting of an inducible knockout model, in which the deficiency of the gene could be assessed in a more granular form. This approach enabled the discovery of Spag7 as a gene involved in the establishment of the mature placenta, which determines the metabolic fate of the offspring. Additional strengths include the extensive array of physiological parameters measured, which provided a deep understanding of the whole-body metabolic phenotype and pinpointed its likely origin to muscle energetic dysfunction.

      Weaknesses:

      Weaknesses that can be raised are the lack of molecular mechanistic understanding of the numerous phenotypic observations. For example, the specific role of Spag7 to promote placental development remains unclear. Also, the reason why placental developmental abnormalities lead to muscle dysfunction, and whether indeed the entire metabolic phenotype of the offspring can be attributed solely to decreased muscle energetics is not fully explored.

      Overall, the authors achieved a remarkable success in identifying genes associated with development of obesity and metabolic disease, discovering the role of Spag7 in placental development, and highlighting the fundamental role of in-utero development in setting future metabolic state of the offspring.

      We thank this reviewer for their thoughtful analysis and commentary. Significant effort has been made to understand the causes of the metabolic phenotypes observed in SPAG7-deficient mouse models. It is clear that hyperphagia is not the cause and the muscle energetics deficit is likely not the sole cause. We expect that decreased access to nutrition in utero will lead to widespread and varied metabolic adaptation.

      We agree with the reviewer that further work can be done to understand the molecular mechanism driving the metabolic phenotypes of SPAG7-deficient animals. We believe that a full investigation of the processes behind the developmental abnormalities is beyond the scope of this paper and is best addressed in a separate paper.

      Reviewer #3:

      Summary:

      The manuscript by Flaherty III S.E. et al identified SPAG7 gene in their forward mutagenetic screening and created the germline knockout and inducible knockout mice. The authors reported that the SPAG7 germline knockout mice had lower birth weight likely due to intrauterine growth restriction and placental insufficiency. The SPAG7 KO mice later developed obesity phenotype as a result of reduced energy expenditure. However, the inducible SPAG7 knockout mice had normal body weight and composition.

      Strengths:

      In this reviewer's opinion, this study has high significance in the field of metabolic research for the following reasons.

      1) The authors' findings are significant in the field of obesity research, especially from the perspective of maternal-fetal medicine. The authors created and analyzed the SPAG7 KO mice and found that the KO mice had a "thrifty phenotype" and developed obesity.

      2) SPAG7 gene function hasn't been thoroughly studied. The reported phenotype will fill the gap of knowledge.

      Overall, the authors have presented their results in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings.

      Weaknesses:

      The manuscript can be further strengthened with more clarification on the following points.

      1) The germline whole-body KO mice were female mice (Line 293); however, the inducible knockout mice were male mice (Line 549). Sexual dimorphism is often observed in metabolic studies; therefore, the metabolic phenotype of both female and male mice needs to be reported for the germline and inducible knockouts in order to draw a justified conclusion.

      2) SPAG7 has an NLS. Does this protein function in gene expression? Whether the overall metabolic phenotype is the direct cause of SPAG7 ablation is unclear. For example, the Hsd17b10 gene was downregulated in all tissues in the KO mice. Could this have been coincidentally selected for and thus be the cause of the developmental issues and adulthood obesity? Do the iSpag7 mice demonstrate reduced expression of Hsd17b10?

      3) Figure 2c should display the energy expenditure normalized to body weight (or lean body mass).

      4) Please provide more information for the figure legend, including the statistical test that was conducted for each data set, animal numbers for each genotype and sexes.

      5) The authors should report how long after treatment the data was collected for figures 4F-M.

      6) The authors should justify ending the data collection after 8 weeks for the iSPAG7 mice in Figures 4C-E. In the WT vs germline KO mice, there was no clear difference in body weight or lean mass at 15 weeks of age.

      Response to point #1 (Weakness): We thank the reviewer for their thoughtful analysis and commentary. All inducible KO animals described in the paper are female (the typo in Line 549 has been corrected). We did perform studies in both male and female animals for both of these lines. Males display similar metabolic phenotypes, though not as robustly as the females. A table summarizing key data from male and female germline KO animals and inducible KO animals has been included below.

      Author response table 1.

      Author response table 2.

      Response to point #2 (Weakness): SPAG7 contains an R3H domain, which is predicted to bind polynucleotides, and other proteins that contain R3H domains are known to bind RNA or ssDNA. The iSPAG7 mice do display decreased hsd17b10 expression (to a lesser degree than the germline KOs) in the tissues examined. When we knock down SPAG7 in specific tissues, we also see hsd17b10 expression decrease specifically in those tissues. These data all suggest that hsd17b10 expression is, at least, linked to spag7 expression. They also raise the question of why these animals have no metabolic phenotype. Some possible explanations are that hsd17b10 expression is essential only during early development, or that the smaller downregulation of hsd17b10 in the iSPAG7 mice is insufficient to produce the metabolic phenotypes seen in the germline KOs, where the downregulation is larger.

      Response to point #3 (Weakness): How best to normalize total energy expenditure data is a subject of debate within the energy expenditure field. As the animals have increased body weight and decreased lean mass, normalizing to either will skew the results in different directions. We have included the data normalized to body weight and to lean mass below. The decrease in total energy expenditure remains significant in either scenario.

      Author response image 1.
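
      The normalization question raised in the response to point #3 can be illustrated with a minimal Python sketch. All numbers below are hypothetical placeholders chosen for illustration, not data from the study, and the names (`Animal`, `ee_per_body_weight`, `ee_per_lean_mass`) are assumptions. The sketch only shows that when a group is heavier but has less lean mass, dividing the same total energy expenditure by body weight enlarges the apparent deficit, whereas dividing by lean mass shrinks it.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    """Hypothetical per-animal summary; values are illustrative only."""
    total_ee_kcal_per_day: float
    body_weight_g: float
    lean_mass_g: float


def ee_per_body_weight(a: Animal) -> float:
    """Total energy expenditure divided by body weight."""
    return a.total_ee_kcal_per_day / a.body_weight_g


def ee_per_lean_mass(a: Animal) -> float:
    """Total energy expenditure divided by lean mass."""
    return a.total_ee_kcal_per_day / a.lean_mass_g


# Placeholder values: the KO animal is heavier but has less lean mass.
wt = Animal(total_ee_kcal_per_day=12.0, body_weight_g=30.0, lean_mass_g=20.0)
ko = Animal(total_ee_kcal_per_day=10.0, body_weight_g=40.0, lean_mass_g=18.0)

# KO/WT ratios under each choice of denominator.
raw_ratio = ko.total_ee_kcal_per_day / wt.total_ee_kcal_per_day
bw_ratio = ee_per_body_weight(ko) / ee_per_body_weight(wt)
lean_ratio = ee_per_lean_mass(ko) / ee_per_lean_mass(wt)

print(f"raw: {raw_ratio:.3f}, per body weight: {bw_ratio:.3f}, "
      f"per lean mass: {lean_ratio:.3f}")
```

      With these placeholder values the KO/WT ratio stays below 1 under both normalizations, but its size depends on the denominator, which is why reporting both, as the authors do, is informative.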

      Response to point #4 (Weakness): The information has been added to all figures.

      Response to point #5 (Weakness): Weeks after treatment have been added to the figure legends for Figures 4F-M.

      Response to point #6 (Weakness): Highly significant changes in fat mass, glucose tolerance and insulin sensitivity are already present in the germline SPAG7 KO mice at 15 weeks of age or earlier. Tamoxifen injection effectively induced SPAG7 gene KO in less than a week in the iSPAG7 KO mice. Given the absence of significant changes, or any trends towards significance, in glucose and insulin tolerance tests as well as other metabolic tests in the iSPAG7 KO mice at 15 weeks of age (the same age at which these changes were observed in the germline KO) and 8 weeks after SPAG7 gene KO, we did not anticipate seeing changes beyond this point and decided to stop the study at 9 weeks after treatment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study uses state-of-the-art methods to label endogenous dopamine receptors in a subset of Drosophila mushroom body neuronal types. The authors report that DopR1 and Dop2R receptors, which have opposing effects on intracellular cAMP, are present in axon termini of Kenyon cells, as well as those of two classes of dopaminergic neurons that innervate the mushroom body, indicative of autocrine modulation by dopaminergic neurons. Additional experiments showing opposing effects of starvation on DopR1 and DopR2 levels in mushroom body neurons are consistent with a role for dopamine receptor levels increasing the efficiency of learned food-odour associations in starved flies. Supported by solid data, this is a valuable contribution to the field.

      We thank the editors for the assessment, but request that “DopR2” be changed to “Dop2R”. The dopamine receptors in Drosophila have confusing names, but the ones we characterized in this study are called Dop1R1 (according to FlyBase; aka DopR1, dDA1, Dumb) and Dop2R (ibid; aka Dd2R). DopR2 is the name of a different dopamine receptor.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.

      Strengths:

      The split-GFP approach allows visualization of subcellular enrichment of dopamine receptors in the plasma membrane of GAL4-expressing neurons allowing for a high level of specificity.

      The authors resolve the presynaptic localization of DopR1 and Dop2R, in "giant" Drosophila neurons differentiated from cytokinesis-arrested neuroblasts in culture as it is not clear in the lobes and calyx.

      Starvation-induced opposite responses of dopamine receptor expression in the PPL1 and PAM DANs provide key insights into models of appetitive learning.

      Starvation-induced increase in D2R allows for increased negative feedback that the authors test in D2R knockout flies where appetitive memory is diminished.

      This dual autoreceptor system is an attractive model for how amplitude and kinetics of dopamine release can be fine-tuned and controlled depending on the cellular function and this paper presents a good methodology to do it and a good system where the dynamics of dopamine release can be tested at the level of behavior.

      Weaknesses:

      LI measurements of Kenyon cells and lobes indicate that Dop2R was approximately twice as enriched in the lobe as the average density across the whole neuron, while the lobe enrichment of Dop1R1 was about 1.5 times the average. Are these levels consistent across different times of day and states of the animal? How were these conditions controlled, and how sensitive is receptor expression to the time of day of dissection, staining, etc.?

      To answer this question, we repeated the experiment in two replicates at different times of day and confirmed that the receptor localization was consistent (Figure 3 – figure supplement 1); LI measurements showed that Dop2R is enriched more in the lobe and less in the calyx compared to Dop1R1 (Figure 3D). The states of animals that could affect LI (e.g. feeding state and anesthesia for sorting, see methods) were kept constant. 
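
      The enrichment figures quoted by the reviewer can be read through a minimal Python sketch. It assumes, purely for illustration, that LI is the mean signal in a region divided by the mean signal across the whole neuron; the manuscript's Methods give the actual definition, and the function name and pixel values below are hypothetical.

```python
from typing import Sequence


def localization_index(signal: Sequence[float],
                       region_mask: Sequence[bool]) -> float:
    """Assumed LI: mean signal inside the region / mean over the whole neuron.

    `signal` holds per-pixel intensities for one neuron; `region_mask` marks
    which of those pixels belong to the region of interest (e.g. the lobe).
    """
    if len(signal) != len(region_mask):
        raise ValueError("signal and region_mask must have the same length")
    region = [s for s, inside in zip(signal, region_mask) if inside]
    if not region:
        raise ValueError("region_mask selects no pixels")
    whole_mean = sum(signal) / len(signal)
    if whole_mean == 0:
        raise ValueError("whole-neuron mean signal is zero")
    return (sum(region) / len(region)) / whole_mean


# Hypothetical pixel values: two bright "lobe" pixels, four dim pixels elsewhere.
signal = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0]
lobe_mask = [True, True, False, False, False, False]
li_lobe = localization_index(signal, lobe_mask)
print(f"lobe LI = {li_lobe:.2f}")
```

      Under this assumed definition, an LI above 1 indicates enrichment relative to the neuron-wide average and an LI below 1 indicates depletion.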

      The authors assume without discussion as to why and how presynaptic enrichment of these receptors is similar in giant neurons and MB.

      In the revision, we added a short summary to recapitulate that the giant neurons exhibit many characteristics of mature neurons (Lines #152-156): "Importantly, these giant neurons exhibit characteristics of mature neurons, including firing patterns (Wu et al., 1990; Yao & Wu, 2001; Zhao & Wu, 1997) and acetylcholine release (Yao et al., 2000), both of which are regulated by cAMP and CaMKII signaling (Yao et al., 2000; Yao & Wu, 2001; Zhao & Wu, 1997)." In addition, we found punctate Brp accumulations localized to the axon terminals of the giant neurons (former Figure 4D and 4E). Therefore, the giant neuron serves as an excellent model to study the presynaptic localization of dopamine receptors in isolated large cells.

      Figures 1-3 show the extensive expression of receptors in alpha and beta lobes, while Figure 5 focusses on PAM and localization in γ and β' projections of PAM, leading to the conclusion that presynaptic dopamine neurons express these receptors and have feedback regulation. Consistency between lobes, or discussion of these differences, is important to consider.

      In the revised manuscript, we show data in the γ KCs (Figure 4C, Figure 5 - figure supplement 1) in addition to α/β KCs, and demonstrate the consistent synaptic localization of Dop1R1 and Dop2R as in α/β KCs (Figure 4B and 5A). 

      Receptor expression in any learning-related MBONs is not discussed, and it would be intriguing as how receptors are organized in those cells. Given that these PAMs input to both KCs and MBONs these will have to work in some coordination.

      The subcellular localization of dopamine receptors in MBONs indeed provides important insights into the site of dopaminergic signaling in these neurons (Takemura et al., 2017; Pavlowsky et al., 2018; Pribbenow et al., 2022). Therefore, we added new data for Dop1R1 and Dop2R in MBON-γ1pedc>αβ (Figure 6). Interestingly, these receptors are localized to the dendritic projection in the γ1 compartment as well as to presynaptic boutons (Figure 6).

      Although authors use the D2R enhancement post starvation to show that knocking down receptors eliminated appetitive memory, the knocking out is affecting multiple neurons within this circuit including PAMs and KCs. How does that account for the observed effect? Are those not important for appetitive learning? 

      In the appetitive memory experiment (Figure 9C), we knocked down Dop2R only in the select neurons of the PPL1 cluster, and this manipulation does not directly affect Dop2R expression in PAMs and KCs.

      Starvation-induced enhancement of Dop2R expression in the PPL1 neurons (Figure 8F) would attenuate their outputs and therefore disinhibit expression of appetitive memory in starved flies (Krashes et al., 2009). Consistently, Dop2R knock-down in PPL1 impaired appetitive memory in starved flies (Figure 9C). We revised the corresponding text to make this point clearer (Lines #224-227).

      The evidence for fine-tuning is completely based on receptor expression and one behavioral outcome which could result from many possibilities. It is not clear if this fine-tuning and presynaptic feedback regulation-based dopamine release is a clear possibility. Alternate hypotheses and outcomes could be considered in the model as it is not completely substantiated by data at least as presented.

      The reviewer’s concern is valid, and the presynaptic dopamine tuning by autoreceptors may need more experimental support. We therefore additionally discussed another possibility (Lines #289-291): “Alternatively, these presynaptic receptors could potentially receive extrasynaptic dopamine released from other DANs. Therefore, the autoreceptor functions need to be experimentally clarified by manipulating the receptor expression in DANs.”

      Reviewer #2 (Public Review):

      Summary:

      Hiramatsu et al. investigated how cognate neurotransmitter receptors with antagonizing downstream effects localize within neurons when co-expressed. They focus on mapping the localization of the dopaminergic Dop1R1 and Dop2R receptors, which correspond to the mammalian D1- and D2-like dopamine receptors, which have opposing effects on intracellular cAMP levels, in neurons of the Drosophila mushroom body (MB). To visualize specific receptors in single neuron types within the crowded MB neuropil, the authors use existing dopamine receptor alleles tagged with 7 copies of split GFP to target reconstitution of GFP tags only in the neurons of interest as a read-out of receptor localization. The authors show that both Dop1R1 and Dop2R, with differing degrees, are enriched in axonal compartments of both the Kenyon Cells cholinergic presynaptic inputs and in different dopamine neurons (DANs), which project axons to the MB. Co-localization studies of dopamine receptors with the presynaptic marker Brp suggest that Dop1R1 and, to a larger extent Dop2R, localize in the proximity of release sites. This localization pattern in DANs suggests that Dop1R1 and Dop2R work in dual-feedback regulation as autoreceptors. Finally, they provide evidence that the balance of Dop1R1 and Dop2R in the axons of two different DAN populations is differentially modulated by starvation and that this regulation plays a role in regulating appetitive behaviors.

      Strengths:

      The authors use reconstitution of GFP fluorescence of split GFP tags knocked into the endogenous locus at the C-terminus of the dopamine receptors as a readout of dopamine receptor localization. This elegant approach preserves the endogenous transcriptional and post-transcriptional regulation of the receptor, which is essential for studies of protein localization.

      The study focuses on mapping the localization of dopamine receptors in neurons of the mushroom body. This is an excellent choice of system to address the question posed in this study, as the neurons are well-studied, and their connections are carefully reconstructed in the mushroom body connectome. Furthermore, the role of this circuit in different behaviors and associative memory permits the linking of patterns of receptor localization to circuit function and resulting behavior. Because of these features, the authors can provide evidence that two antagonizing dopamine receptors can act as autoreceptors within the axonal compartment of MB innervating DANs. The differential regulation of the balance of the two receptors under starvation in two distinct DAN innervations provides evidence of the role that regulation of this balance can play in circuit function and behavioral output.

      Weaknesses:

      The approach of using endogenously tagged alleles to study localization is a strength of this study, but the authors do not provide sufficient evidence that the insertion of 7 copies of split GFP to the C terminus of the dopamine receptors does not interfere with the endogenous localization pattern or function. Both sets of tagged alleles (1X Venus and 7X split GFP tagged) were previously reported (Kondo et al., 2020), but only the 1X Venus tagged alleles were further functionally validated in assays of olfactory appetitive memory. Despite the smaller size of the 7X split-GFP array tag knocked into the same location as the 1X Venus tag, the reconstitution of 7 copies of GFP at the C terminus of the dopamine receptor might substantially increase the molecular bulk at this site, potentially impeding the function of the receptor more significantly than the smaller, single Venus tag. The data presented by Kondo et al. (2020) are insufficient to conclude that the two alleles are equivalent.

      In the revision, we validated the function of these engineered receptors with a new set of olfactory learning experiments. Both of these receptors in KCs were shown to be required for aversive memory (Kim et al., 2007; Scholz-Kornehl et al., 2016). As in the anatomical experiments, we induced GFP<sub>1-10</sub> expression in KCs of flies homozygous for 7xGFP<sub>11</sub>-tagged receptors using MB-Switch and 3 days of RU486 feeding. We confirmed that the STM performance of these flies was not significantly different from that of the control (Figure 2 – figure supplement 1). Thus, these fusion receptors are functional.

      The authors' conclusion that the receptors localize to presynaptic sites is weak. The analysis of the colocalization of the active zone marker Brp whole-brain staining with dopamine receptors labeled in specific neurons is insufficient to conclude that the receptors are localized at presynaptic sites. Given the highly crowded neuropil environment, the data cannot differentiate between the receptor localization postsynaptic to a dopamine release site or at a presynaptic site within the same neuron. The known distribution of presynaptic sites within the neurons analyzed in the study provides evidence that the receptors are enriched in axonal compartments, but co-labeling of presynaptic sites and receptors in the same neuron or super-resolution methods are needed to provide evidence of receptor localization at active zones.  The data presented in Figures 5K-5L provides compelling evidence that the receptors localize to neuronal varicosities in DANs where the receptors could play a role as autoreceptors.

      Given the highly crowded environment of the mushroom body neuropil, the analysis of dopamine receptor localization in Kenyon cells is not conclusive. The data is sufficient to conclude that the receptors are preferentially localizing to the axonal compartment of Kenyon cells, but co-localization with brain-wide Brp active zone immunostaining is not sufficient to determine if the receptor localizes juxtaposed to dopaminergic release sites, in proximity of release sites in Kenyon cells, or both.

      To better resolve the microcircuits of KCs, we triple-labeled the plasma membrane and DAR::rGFP in KCs, together with Brp, and examined their localization by high-resolution Airyscan imaging. This strategy revealed receptor clusters associated with Brp accumulation within KCs (Figure 4). To further verify the association of DARs and active zones within KCs, we co-expressed Brp<sup>short</sup>::mStraw and GFP<sub>1-10</sub> and confirmed their colocalization (Figure 5A), suggesting presynaptic localization of DARs in KCs. With these additional characterizations, we now discuss the significance of receptors at the presynaptic sites of KCs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an important and interesting study that uses the split-GFP approach. Localization of receptors and correlating them to function is important in understanding the circuit basis of behavior.

      For Figure 1, the authors show PAM, PPL1 neurons, and the ellipsoid body as a validation of their tools (Dop1R1-T2A-GAL4 and Dop2R-T2A-GAL4) and the idea that these receptors are colocalized. However, it appears that the technique was applied to the whole brain, so it would be great to see the whole brain to understand how much labelling is specific and how much is stochastic. Methods could include how dissection conditions were controlled and how sensitive receptor expression is to the time of day of dissection, staining, etc.

      The expression patterns of the receptor T2A-GAL4 lines (Figure 1A and 1B) are consistent across multiple whole brains (Kondo et al., 2020; Author response image 1).

      Author response image 1.

      The significance of the expression of these two receptors in an active zone is not clearly discussed and presynaptic localization is not elaborated on. Would something like expansion microscopy be useful in resolving this? It would be important to discuss that as giant neurons in culture don't replicate many aspects of the MB system.

      In the revised manuscript, we elaborated on the discussion of the function of the two antagonizing receptors at the AZ (Lines #226-275).

      Does MB-GeneSwitch > GFP1-1 reliably express in gamma lobes? Most of the figures show alpha/beta lobes.

      Yes. MB-GeneSwitch is also expressed in γ KCs, but weakly. 12 hours of RU486 feeding, which we did in the previous experiments, was insufficient to induce GFP reconstitution in the γ KCs. By extending the time of transgene induction, we visualized expression of Dop1R1 and Dop2R more clearly in γ KCs. Their localization is similar to that in the α/β KCs (Figure 4C, Figure 5 - figure supplement 1).

      Figure 6, y-axis says protein level. At first, I thought it was related to starvation so maybe authors can be more specific as the protein level doesn't indicate any aspect of starvation.

      We appreciate this comment, and the y-axis labels have now been changed to “rGFP levels” (Figure 8C and 8F, Figure 8 - figure supplement 1B, 1D and 1F).

      Reviewer #2 (Recommendations For The Authors):

      Title:

      The title of the manuscript focuses on the tagging of the receptors and their synaptic enrichment.

      Given that the alleles used in the study were generated in a previously published study (Kondo et al, 2020), which describes the receptor tagging and that the data currently provided is insufficient to conclude that the receptors are localizing to synapses, the title should be changed to reflect the focus on localizing antagonistic cognate neurotransmitter receptors in the same neuron and their putative role as autoreceptors in DANs.

      Following this advice, we removed the methodology from the title and revised it to “Synaptic enrichment and dynamic regulation of the two opposing dopamine receptors within the same neurons”.

      Minor issues with text and figures:

      Figure 1

      A conclusion from Figure 1 is that the two receptors are co-expressed in Kenyon cells. Please provide panels equivalent to the ones shown in D-G, with Kenyon cells cell bodies, or mark these cells in the existing panels, if present. Line 111 refers to panel 1D as the Kenyon cells panel, which is currently a PAM panel.

      We added images for coexpression of these receptors in the cell bodies of KCs (Figure 1 - figure supplement 1) and revised the text accordingly (Lines #89-90).

      Given that most of the study centers on visualizing receptor localization, it would benefit the reader to include labels in Figure 1 that help understand that these panels reflect expression patterns rather than receptor localization. For instance, rCD2::GFP could be indicated in the Dop1R1-LexA panels.

      As suggested, labels were added to indicate the UAS and lexAop markers (Figure 1D, 1E, 1G-1I and Figure 1 – figure supplement 1).

      Given that panels D-E focus on the cell bodies of the neurons, it could be beneficial for the reader to present the ellipsoid body neurons using a similar view that only shows the cell bodies. Similarly, one could just show the glial cell bodies.

      We now show the cell bodies of ring neurons (Figure 1G) and ensheathing glia (Figure 1I).

      For panel 1E, please indicate the subset of PPL1 neurons that both expressed Dop1R1 and Dop2R, as indicated in the text, as it is currently unclear from the image.

      Dop1R1-T2A-LexA was barely detected in all PPL1 (Figure 1E). We corrected the confusing text (Lines #95-96).

      Figure 2

      The cartoon of the cell-type-specific labeling should show that the tag is 7XFP-11 and the UAS component FP-10, as the current cartoon leads the reader to conclude that the receptors are tagged with a single copy of split GFP. The detail that the receptors are tagged with 7 copies of split GFP is only provided through the genotype of the allele in the resource table. This design aspect should be made clear in the figure and the text when describing the allele and approach used to tag receptors in specific neuron types.

      We have now added the construct design to the scheme (Figure 2A) and revised the corresponding text (Lines #101-103).

      Panel A. The arrow representing the endogenous promoter in the yellow gene representation should be placed at the beginning of the coding sequence. Currently, the different colors of what I assume are coding (yellow) and non-coding (white) transcript regions are not described in the legend.  I would omit these or represent them in the same color as thinner boxes if the authors want to emphasize that the tag is inserted at the C terminus within the endogenous locus.

      The color scheme was revised to be more consistent and intuitive (Figure 2A).

      Figure 3

      Labels of the calyx and MB lobes would benefit readers not as familiar with the system used in the study. In addition, it would be beneficial to the reader to indicate in panel A the location of the compartments analyzed in panel H (e.g., peduncle, α3).

      Figure 3A was amended to clearly indicate the analyzed MB compartments.

      Adding frontal and sagittal to panels B-E, as in Figure 2, would help the reader interpret the data. 

      In Figure 3B, “Frontal” and “Sagittal” are now indicated.

      Panel F-G. A scale bar should be provided for the data shown in the insets. Could the author comment on the localization of Dop1R1 in KCs? The data in the current panel suggests that only a subset of KCs express high levels of receptors in their axons, as a portion of the membrane is devoid of receptor signals. This would be in line with differential dopamine receptor expression in subsets of Kenyon cells, as shown in Kondo et al., 2020, which is currently not commented on in the paper. 

      We confirmed that the majority of the KCs express both Dop1R1 and Dop2R genes (Figure 1 - figure supplement 1). LIs should be compared within the same cells rather than between cell types, because differences in protein levels between cell types also reflect GAL4 expression levels.

      Panel H. Some P values are shown as n.s. (p> 0.05). Other non-significant p values in this panel and in other figures throughout the paper are instead reported (e.g. peduncle P=0.164). For consistency, please report the values as n.s. as indicated in the methods for all non-significant tests in this panel and throughout the manuscript.

      We now present the new dataset, and the graph represents the appropriate statistical results (Figure 3D; see the methods section for details).

      The methods of labeling the receptors through the expression of the GeneSwitch-controlled GFP1-10 in Kenyon cells induced by RU486 are not provided in the methods. Please provide a description of this as referenced in the figure legend and the genotypes used in the analysis shown in the panels.

      The method of RU486 feeding has been added. We apologize for the missing method.

      Figure 4

      Please provide scale bars for the inset in panels A-B.

      Scale bars were added to all confocal images.

      The current analysis cannot distinguish between postsynaptic and presynaptic dopamine receptors in KCs, and the figure title should reflect this.

      We now present new data on dopamine receptors in KCs and clearly distinguish the Brp clusters of KCs from those of other cell types (Figure 4, Figure 5).

      The reader could benefit from additional details of using the giant neuron model, as it is not commonly used, and it is not clear how to relate this to interpret the localization of dopaminergic receptors within Kenyon cells. The use of the venus-tagged receptor variant should be introduced in the text, as using a different allele currently lacks context. Figures 4F-4J show that the receptor is localizing throughout the neuron. Quantifying the fraction of receptor signal colocalizing with Brp could aid in interpreting the data.  However, it would still not be clear how to interpret this data in the context of understanding the localization of the receptors in neurons within fly brain circuits. In the absence of additional data, the data provided in Figure 4 is inconclusive and could be omitted, keeping the focus of the study on the analysis of the two receptors in DANs. Co-expressing a presynaptic marker in Kenyon cells (e.g., by expressing Brp::SNAP)  in conjunction with rGFP labeled receptor would provide additional evidence of the relationship of release sites in Kenyon cells and tagged dopamine receptors in these same cells and could add evidence in support to the current conclusion.

      Following the advice, we added a short summary to recapitulate that the giant neurons exhibit many characteristics of mature neurons (Lines #152-156): "Importantly, these giant neurons exhibit characteristics of mature neurons, including firing patterns (Wu et al., 1990; Yao & Wu, 2001; Zhao & Wu, 1997) and acetylcholine release (Yao et al., 2000), both of which are regulated by cAMP and CaMKII signaling (Yao et al., 2000; Yao & Wu, 2001; Zhao & Wu, 1997)." Therefore, the giant neuron serves as an excellent model to study the presynaptic localization in large cells in isolation.

      To clarify that Brp clusters and dopamine receptors show polarized localization rather than "localizing throughout the neuron", we now show lower-magnification data (Figure 5C). These data clearly demonstrate punctate Brp accumulations localized to the axon terminals of the giant neurons (former Figure 4D and 4E). This is the same membrane segment where Dop1R1 and Dop2R are localized (Figure 5C). Therefore, the association of Brp clusters and the dopamine receptors in the isolated giant neurons suggests that the subcellular localization in the brain neurons is independent of the circuit context.

      As the giant neurons do not form intermingled circuits, Venus-tagged receptors are sufficient for this experiment and genetically simpler.

      Following the suggestion to clarify the AZ association of the receptors in KCs, we coexpressed Brp<sup>short</sup>-mStraw and GFP<sub>1-10</sub> in KCs and confirmed their colocalization (Figure 5A).

      Figure 6

      The data and analysis show that starvation induces changes in the α3 compartment in PPL1 neurons only, while the data provided shows no significant change for PPL1 neurons innervating other MB compartments. This should be clearly stated in lines 174-175, as it is implied that there is a difference in the analysis for compartments other than α3. Panel L of Figure 6 - supplement 1 shows no significant change for all three compartments analyzed and should be indicated as n.s. in all instances, as stated in the methods. 

      We revised the text to clarify that the starvation-induced differences of Dop2R expression were not significant (Lines #217-219). The reason to highlight the α3 compartment is that both Dop1R1 and Dop2R are coexpressed in this PPL1 neuron (Figure 8D).

      Additional minor comments:

      There are a few typos and errors throughout the manuscript. The text should be carefully proofread to correct these. Here are the ones that came to my attention:

      Please reference all figure panels in the text. For instance, Figure 3A is not mentioned and should be revised in line 112 as Figure 3A-E.

      Lines 103-104. The sentence "LI was visualized as the color of the membrane signals" is unclear and should be revised. 

      Figure 4 legend - dendritic claws should likely be B and C and not B and E.

      Lines 147 - Incorrect figure panels, should be 5C-L or 5D-E.

      Line 241 - DNAs should be DANs.

      Methods - please define what the abbreviation CS stands for.

      We greatly appreciate the reviewer's careful reading. All of these have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for improvement performance would strengthen the manuscript.

      We appreciate the Editorial assessment on our paper’s strengths and novelty. We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning. Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning.

      Strengths:

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

      We have previously shown that neural replay of MEG activity representing the practiced skill was prominent during rest intervals of early learning, and that the replay density correlated with micro-offline gains (Buch et al., 2021). These findings are consistent with recent reports (from two different research groups) that hippocampal ripple density increases during these inter-practice rest periods, and predicts offline learning gains (Chen et al., 2024; Sjøgård et al., 2024). However, decoder performance in our earlier work (Buch et al., 2021) left room for improvement. Here, we reported a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses:

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions.

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while online monitoring of head position was not performed for this study, it was assessed at the beginning and at the end of each recording. The head was restrained with an inflatable air bladder, and head movement between the beginning and end of each scan did not exceed 5mm for all participants included in the study.

      The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. We agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. However, such correlations between small head movements and finger movements could only meaningfully contribute to decoding performance if: (A) they were consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) they systematically varied between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). The possibility of any head movement artefacts meeting all these conditions is unlikely. Alternatively, for this task design a much more likely confound could be the contribution of eye movement artefacts to the decoder performance (an issue raised by Reviewer #3 in the comments below).

      Remember from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, it is possible that if participants are attending to the visual feedback provided on the display, they may generate eye movements that are systematically related to the task. Since we did record eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point that the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (triggered by a KeyDown event), 2) the gaze position 150ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.21817). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts) (end of figure legend).
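
      As a quick arithmetic check on the numbers quoted in this legend, the minimal sketch below averages the five reported fold accuracies and sets the result beside the chance level for a five-class problem. The chance level of 1/5 assumes the five asterisk positions are equally frequent, which is an assumption made here for illustration; the variable names are likewise illustrative.

```python
# Fold accuracies as reported in the legend of Author response image 1.
fold_accuracies = [0.21718, 0.22023, 0.21859, 0.22113, 0.21373]

# Assumption: five equally frequent asterisk positions, so chance is 1/5.
n_classes = 5
chance_level = 1.0 / n_classes

# Overall cross-validated accuracy taken as the unweighted mean of the folds.
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
margin_over_chance = mean_accuracy - chance_level

print(f"mean cross-validated accuracy: {mean_accuracy:.5f}")
print(f"chance level: {chance_level:.2f}")
print(f"margin over chance: {margin_over_chance:.5f}")
```

      The unweighted mean is only appropriate if the folds are of similar size; with unequal folds a sample-weighted mean would be the closer analogue.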

      Remember that the task display does not provide explicit feedback related to performance, only information about the present position in the sequence. Thus, it is possible that participants did not actively attend to the feedback. In fact, inspection of the eye position data revealed that on the majority of trials, participants displayed random-walk-like gaze patterns around a central fixation point located near the center of the screen. Thus, participants did not attend to the asterisk position on the display, but instead intrinsically generated the action sequence. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks) as provided in the study task – feedback which is typically ignored by the user.

      The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued. The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals (Buch et al., 2021; Classen et al., 1998; Karni et al., 1995; Kleim et al., 1998). Similarly, the increased involvement of bilateral frontal and parietal regions to decoding during early skill learning in the non-dominant hand is well known. Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported (Doyon et al., 2002; Grafton et al., 1992; Hardwick et al., 2013; Kennerley et al., 2004; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001), and appears to be even more prominent during early fine motor skill learning in the non-dominant hand (Lee et al., 2019; Sawamura et al., 2019). The frontal regions identified in these studies are known to play crucial roles in executive control (Battaglia-Mayer & Caminiti, 2019), motor planning (Toni, Thoenissen, et al., 2001), and working memory (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998) processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998), in addition to working memory (Grover et al., 2022). Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task. We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular for the following reasons. First, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, the tests of these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are completely independent of one another and of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These approximately 1150 intra-parcel voxels (on average, with the total number varying between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities with the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications (Srinivas et al., 2016). One could also view this hybrid-space decoding approach as a spatial analogue to common time-frequency based analyses such as theta-gamma phase amplitude coupling (θ/γ PAC), which assess interactions between two or more narrow-band spectral features derived from the same time-series data (Lisman & Jensen, 2013).
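
      The way the two feature scales are combined can be illustrated with a toy sketch. It is not the analysis code used in the study: the function names, the ten voxels per parcel, the random activity values and the choice of "top" parcels are all hypothetical, and only the count of 148 parcels is taken from the text above.

```python
import random


def parcel_averages(voxels_by_parcel):
    """Inter-parcel features: mean source activity of the voxels in each parcel."""
    return [sum(voxels) / len(voxels) for voxels in voxels_by_parcel]


def hybrid_features(voxels_by_parcel, top_parcels):
    """Concatenate all parcel averages with the raw voxels of selected parcels."""
    features = parcel_averages(voxels_by_parcel)
    for parcel_index in top_parcels:
        # Intra-parcel features: every voxel of a selected parcel, unaveraged.
        features.extend(voxels_by_parcel[parcel_index])
    return features


random.seed(0)
# Hypothetical toy data: 148 parcels with 10 voxels each at one time point.
voxels_by_parcel = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(148)]
top_parcels = [3, 17, 42]  # hypothetical indices of the top-ranked parcels

features = hybrid_features(voxels_by_parcel, top_parcels)
print(len(features))  # 148 parcel averages + 3 * 10 voxel features = 178
```

      In this toy layout the average of a selected parcel and its individual voxels both appear in the feature vector, which is the overlap the Reviewer asks about.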

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features portray different information – by constructing an alternative hybrid-space decoder (Hybrid<sub>Alt</sub>) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing the performance to the decoder used in the manuscript (Hybrid<sub>Orig</sub>). The prediction was that if the overlapping parcel contained similar information to the more spatially resolved voxel patterns, then removing the parcel features (n=8) from the decoding analysis should not impact performance. In fact, despite making up less than 1% of the overall input feature space, removing those parcels resulted in a significant drop in overall performance greater than 2% (78.15% ± 7.03% SD for Hybrid<sub>Orig</sub> vs. 75.49% ± 7.17% for Hybrid<sub>Alt</sub>; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04; Author response image 2).

      Author response image 2.

      Comparison of decoding performances with two different hybrid approaches. Hybrid<sub>Alt</sub>: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of remaining parcels. Hybrid<sub>Orig</sub>: Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note, that Hybrid<sub>Orig</sub> (the approach used in our manuscript) significantly outperforms the Hybrid<sub>Alt</sub> approach, indicating that the excluded parcel features provide unique information compared to the spatially overlapping intra-parcel voxel patterns (end of figure legend).
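
      Two of the numbers in the comparison above can be checked with a short sketch: the share of the input space taken by the 8 removed parcel features, and the two-sided p-value implied by the reported z under a normal approximation. The sketch also includes a plain implementation of the paired Wilcoxon signed-rank z statistic. The paired accuracies are hypothetical, and the implementation omits tie and continuity corrections, so it will not necessarily reproduce the output of any particular statistics package.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Normal-approximation z and two-sided p for paired samples.

    Zero differences are dropped and tied |differences| share an average rank.
    No tie or continuity correction is applied to the variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p


# Share of inputs taken by the 8 overlapping parcel features, using the
# approximate counts quoted above (148 parcel + ~1150 voxel features).
overlap_share = 8 / (148 + 1150)
print(f"overlapping parcel features: {overlap_share:.2%} of inputs")

# Two-sided p-value implied by the reported z = 3.7410.
p_from_reported_z = math.erfc(3.7410 / math.sqrt(2))
print(f"p implied by reported z: {p_from_reported_z:.4e}")

# Hypothetical paired accuracies (percent) for eight subjects.
hybrid_orig = [78.2, 81.5, 70.3, 85.1, 76.4, 79.9, 73.0, 82.6]
hybrid_alt = [75.9, 79.0, 68.1, 82.0, 74.8, 76.5, 71.2, 80.1]
z, p = wilcoxon_signed_rank_z(hybrid_orig, hybrid_alt)
print(f"toy example: z = {z:.3f}, p = {p:.4f}")
```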

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen.

      We agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated. This is an important confound in connectivity analyses (Colclough et al., 2015; Colclough et al., 2016), which were not performed in our investigation.

      In our study, correlations between adjacent voxels effectively reduce the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. – the rank is greater than 1), the intra-parcel spatial patterns could meaningfully contribute to the decoder performance, as shown by the following results:

      First, we obtained higher decoding accuracy with voxel-space features (74.51% ± 7.34% SD) compared to parcel space features (68.77% ± 7.6%; Figure 3B), indicating individual voxels carry more information in decoding the keypresses than the averaged voxel-space features or parcel space features. Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding shows that correlated voxels form mini subclusters that are much smaller spatially than the parcel they reside within.

      Author response image 3.

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel space features in decoding keypresses and the min-max normalized MRMR score was mapped to a structural brain surface. Note that individual voxels within a parcel showed different contributions to decoding (end of figure legend).
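
      The min-max normalization mentioned in this legend rescales scores to the range 0 to 1. A minimal sketch follows; the scores are hypothetical placeholders rather than MRMR output, and mapping a constant score vector to zeros is a choice made here for illustration.

```python
def min_max_normalize(scores):
    """Rescale scores linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Degenerate case: all scores equal, so no voxel stands out.
        return [0.0 for _ in scores]
    return [(s - lowest) / (highest - lowest) for s in scores]


# Hypothetical feature importance scores for four voxels in one parcel.
voxel_scores = [0.12, 0.40, 0.05, 0.33]
normalized = min_max_normalize(voxel_scores)
print(normalized)
```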

      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment.

      We have already addressed the issue of movement related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics (Bansal et al., 2011; Mollazadeh et al., 2011), muscle activation patterns (Flint et al., 2012) and temporal sequencing (Churchland et al., 2012) during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (which the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies) (Heusser et al., 2016). Again, this interpretation is supported by our data as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, eg in the context of LFOs, where they merely refer to "frontoparietal regions".

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans assessed changes in functional connectivity patterns while participants performed a similar sequence learning task to our present study (Bassett et al., 2011). Using a dynamic network analysis approach, Bassett et al. showed that flexibility in the composition of individual network modules (i.e. – changes in functional brain region membership of orthogonal brain networks) is up-regulated in novel learning environments and explains differences in learning rates across individuals. Thus, consistent with our findings, it is likely that functional brain networks rapidly reconfigure during early learning of novel sequential motor skills.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning (Albouy et al., 2013; Albouy et al., 2012). For example, reactivation events in the posterior parietal (Qin et al., 1997) and medial prefrontal (Euston et al., 2007; Molle & Born, 2009) cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains (Frankland & Bontempi, 2005), including motor sequence learning (Albouy et al., 2015; Buch et al., 2021; F. Jacobacci et al., 2020). Further, synchronized interactions between MPFC and hippocampus are more prominent during early as opposed to later learning stages (Albouy et al., 2013; Gais et al., 2007; Sterpenich et al., 2009), perhaps reflecting “redistribution of hippocampal memories to MPFC” (Albouy et al., 2013). MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning (Euston et al., 2012). Consistently, coupling between hippocampus and MPFC has been shown during initial memory encoding and during subsequent rest (van Kesteren et al., 2010; van Kesteren et al., 2012). Importantly, MPFC activity during initial memory encoding predicts subsequent recall (Wagner et al., 1998). Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex” (Albouy et al., 2012), which is also engaged in the development of an abstract representation of the sequence (Ashe et al., 2006). In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning (Doyon et al., 2009; Hikosaka et al., 2002; Penhune & Steele, 2012). The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice (Schendan et al., 2003), all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding (Morris, 2006; Tse et al., 2007). Thus, several prefrontal and frontoparietal regions contributing to long term learning (Berlot et al., 2020) are also engaged in early stages of encoding. Altogether, there is strong biological support for the contribution of bilateral prefrontal and frontoparietal regions to decoding during early skill learning. We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here.

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power (Bonstrup et al., 2019) and neural replay density (Buch et al., 2021) during inter-practice rest periods) to observed micro-offline gains.

      Reviewer #2 (Public review):

      Summary

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea.

      Weaknesses

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3~4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in more than 60% of the training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured this future press information as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained by similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation.

      We now include a new control analysis that addresses this issue as well as additional re-examination of previously reported results with respect to this issue – all of which are inconsistent with this alternative explanation that “contextualization” reflects a change in mixing of keypress related MEG features as opposed to a change in the underlying representations themselves. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged. One must also keep in mind that since participants repeat the sequence multiple times within the same trial, a majority of the index finger keypresses are performed adjacent to one another (i.e. - the “4-4” transition marking the end of one sequence and the beginning of the next). Thus, increased overlap between consecutive index finger keypresses as typing speed increased should increase their similarity and mask contextualization related changes to the underlying neural representations.

      We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis also indicated that the data do not support the alternative explanation that contextualization effects simply reflect increased mixing (Adjusted R<sup>2</sup> = 0.00431; F = 5.62). We now include this new negative control analysis in the revised manuscript.
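
      The structure of this control regression can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code used in the study: the number of subjects, the number of sequences, the simulated values and all function names (`zscore`, `solve`, `ols_fit`) are assumptions made for the example. It z-scores three transition-time predictors and a distance score within each subject, pools the data, and reports the adjusted R<sup>2</sup> and F statistic of an ordinary least squares fit.

```python
import random
import statistics


def zscore(values):
    """Z-score a list of numbers (population SD, chosen here for simplicity)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(X, y):
    """Ordinary least squares with intercept; returns (coefficients, adjusted R^2, F)."""
    n, p = len(y), len(X[0])
    design = [[1.0] + list(row) for row in X]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * x for b, x in zip(beta, r)) for r in design]
    y_mean = statistics.fmean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return beta, adj_r2, f_stat


# Synthetic data (hypothetical): 5 subjects, 200 correct sequences each.
# Distances are simulated independently of the transition times, so the
# predictors should explain almost none of the variance.
rng = random.Random(0)
X_all, y_all = [], []
for _subject in range(5):
    n_seq = 200
    # Three predictors standing in for the 4-1, 2-4 and 4-4 transition times.
    times = [[rng.gauss(0.3, 0.05) for _ in range(n_seq)] for _ in range(3)]
    distance = [rng.gauss(1.0, 0.2) for _ in range(n_seq)]
    z_times = [zscore(t) for t in times]  # within-subject normalization
    X_all.extend(zip(*z_times))
    y_all.extend(zscore(distance))

beta, adj_r2, f_stat = ols_fit(X_all, y_all)
print(f"adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.2f}")
```

      In this sketch an adjusted R<sup>2</sup> close to zero means that transition times account for a negligible share of the variance in the distance score, which is the pattern a negative control of this kind looks for.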

      We also re-examined our previously reported classification results with respect to this issue. We reasoned that if mixing effects reflecting the ordinal sequence structure are an important driver of the contextualization finding, these effects should be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A display a distribution of misclassifications that is inconsistent with an alternative mixing effect explanation of contextualization.

      Based upon the increased overlap between adjacent index finger keypresses (i.e. – “4-4” transition), we also reasoned that the decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position should show decreased performance as typing speed increases. However, Figure 4C in our manuscript shows that this is not the case. The 2-class hybrid classifier actually displays improved classification performance over early practice trials despite greater temporal overlap. Again, this is inconsistent with the idea that the contextualization effect simply reflects increased mixing of individual keypress features.

      In summary, the re-examination of previously reported data and the new control analyses both converged on the idea that the proximity between keypresses does not explain contextualization.

      We do agree with the Reviewer that the naturalistic, generative, self-paced task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the KeyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study.

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3 — figure supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% performance accuracy for the trained sequence data on Day 2 (a reduction of only 3.36%) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus, robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped an additional 7.67%. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impacts overall decoder performance and generalization. We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one potential area of relevance of the study claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me how the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself.

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the KeyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses. We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for optimizing accurate decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms with 25 ms increments variably aligned from 0 to +100ms with 10ms increments relative to the KeyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined through comparison of the resulting performance between each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximize statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200ms epoch aligned to the KeyDown event (t<sub>0</sub> = 0 ms). We then asked if the representations (i.e. – spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Future work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.
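
      A grid search over window width and alignment of the kind described above can be sketched in a few lines. This is an illustration only: `window_grid`, `select_best_window` and `toy_score` are hypothetical names, and `toy_score` is a placeholder that stands in for the cross-validated decoder accuracy, which is not reproduced here.

```python
from itertools import product


def window_grid(width_min=25, width_max=350, width_step=25,
                onset_min=0, onset_max=100, onset_step=10):
    """Enumerate candidate windows as (start_ms, stop_ms) relative to KeyDown.

    Both endpoints of each range are included here, which is an assumption
    about the convention rather than a detail taken from the study.
    """
    widths = range(width_min, width_max + 1, width_step)
    onsets = range(onset_min, onset_max + 1, onset_step)
    return [(onset, onset + width) for width, onset in product(widths, onsets)]


def select_best_window(windows, score_fn):
    """Return the window with the highest score under score_fn."""
    return max(windows, key=score_fn)


def toy_score(window):
    """Placeholder score that peaks at a 0-200 ms window (NOT real decoder accuracy)."""
    start, stop = window
    return -abs(start) - abs((stop - start) - 200)


windows = window_grid()
best = select_best_window(windows, toy_score)
print(len(windows), best)
```

      With both endpoints of each range included, this toy grid contains 14 × 11 = 154 candidate windows, whereas the response reports 140, so the exact endpoint convention used in the study is not captured by this sketch. In a real analysis `score_fn` would train and cross-validate a decoder on features extracted from each candidate window.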

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well.

      The Reviewer suggests that the current data is not enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last Index<sub>OP5</sub> and first Index<sub>OP1</sub> from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Figure 5 – figure supplement 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations between contextualization changes during rest intervals and micro-offline gains were significantly higher than those between online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and those between online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest periods.
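
      One way to compare two distributions of within-subject correlation coefficients is sketched below on synthetic data. The specific choices are assumptions made for illustration and are not taken from the study: a Fisher z-transform of each coefficient followed by a paired t statistic, 20 simulated subjects, 35 simulated trial-wise values per subject, and the simulated effect sizes.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [u - v for u, v in zip(a, b)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Synthetic data (hypothetical): offline contextualization change is simulated
# to covary with micro-offline gains; the online pair is simulated as unrelated.
rng = random.Random(1)
r_offline, r_online = [], []
for _subject in range(20):
    n_trials = 35
    ctx_rest = [rng.gauss(0, 1) for _ in range(n_trials)]
    gains_offline = [0.6 * c + rng.gauss(0, 1) for c in ctx_rest]
    ctx_practice = [rng.gauss(0, 1) for _ in range(n_trials)]
    gains_online = [rng.gauss(0, 1) for _ in range(n_trials)]
    r_offline.append(pearson_r(ctx_rest, gains_offline))
    r_online.append(pearson_r(ctx_practice, gains_online))

# Fisher z-transform so that the coefficients are closer to normally distributed.
z_offline = [math.atanh(r) for r in r_offline]
z_online = [math.atanh(r) for r in r_online]
t_stat, dof = paired_t(z_offline, z_online)
print(f"t({dof}) = {t_stat:.2f}")
```

      A positive t statistic in this sketch indicates that the offline correlations are, on average, larger than the online ones across subjects. Obtaining a p-value would additionally require the t distribution, which is omitted here to keep the example dependency-free.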

      With respect to the second concern, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization, as presented in the original manuscript, is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is smaller for the online analysis compared to the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first Index<sub>OP1</sub> to the last Index<sub>OP5</sub> keypress in the same trial, we observed no learning-related trend (Figure 5 – figure supplement 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach and neither predicted online learning (Figure 5 – figure supplement 6).

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals.

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multiscale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter).

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?).

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.
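
      The logic of a shuffled-label control can be sketched as follows. This is a toy example on synthetic 4-class data: the nearest-centroid classifier is a simple stand-in for the decoders used in the study, and the function names, the fold scheme and the simulated features are all assumptions made for illustration.

```python
import random
import statistics


def nearest_centroid_accuracy(features, labels, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier (stand-in decoder)."""
    n = len(labels)
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        # Class centroids are estimated from the training folds only.
        sums, counts = {}, {}
        for i in range(n):
            if i in test_idx:
                continue
            acc = sums.setdefault(labels[i], [0.0] * len(features[i]))
            for d, v in enumerate(features[i]):
                acc[d] += v
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        for i in test_idx:
            pred = min(centroids, key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(features[i], centroids[c])))
            correct += pred == labels[i]
    return correct / n


def shuffled_label_accuracies(features, labels, n_shuffles=50, seed=0):
    """Empirical chance distribution: repeat the decoding with permuted labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        permuted = labels[:]
        rng.shuffle(permuted)  # breaks the link between features and classes
        scores.append(nearest_centroid_accuracy(features, permuted))
    return scores


# Synthetic 4-class data (hypothetical stand-in for keypress features):
# each class has an elevated mean on one of four feature dimensions.
rng = random.Random(42)
labels = [k % 4 for k in range(200)]
features = [[rng.gauss(2.0 if lab == d else 0.0, 1.0) for d in range(4)]
            for lab in labels]

real_accuracy = nearest_centroid_accuracy(features, labels)
null_scores = shuffled_label_accuracies(features, labels)
print(f"real: {real_accuracy:.3f}, "
      f"shuffled: {statistics.fmean(null_scores):.3f} "
      f"(SD {statistics.stdev(null_scores):.3f})")
```

      The mean and standard deviation of the shuffled-label scores give an empirical chance level that can be compared with the theoretical value (25% for four balanced classes), in the same spirit as the chance estimates reported above.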

      Second, please note that the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much smaller dimensional space (number of classes – 1; e.g. – 3 dimensions, for 4-class keypress decoding). Given the very high dimension of the voxel-space input features in this case, the resulting mapping exhibits reduced accuracy. Despite this general consideration, please refer to Figure 3—figure supplement 3, where we observe improvement in voxel-space decoder performance when utilizing alternative dimensionality reduction techniques.

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space. We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for).

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above. We agree they must both be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. This classification performance difference of 7.67 percentage points when tested on the Day 2 data could reflect the performance bias of the classifier for the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.
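
      The misclassification argument above can be made concrete with a minimal sketch. The confusion-matrix counts below are hypothetical and are not taken from the manuscript; the function name is likewise an assumption. For a given true class, the sketch computes how its errors are distributed across the other classes. Under the mixing account, errors for "4" would concentrate on "1" and "2" rather than "3".

```python
def misclassification_shares(confusion, labels, target):
    """For one true class, return the fraction of its errors assigned to each other class."""
    row = confusion[labels.index(target)]
    errors = {lab: n for lab, n in zip(labels, row) if lab != target}
    total = sum(errors.values())
    return {lab: (n / total if total else 0.0) for lab, n in errors.items()}

labels = ["1", "2", "3", "4"]
# Hypothetical counts: rows are true keypresses, columns are predicted keypresses.
confusion = [
    [90, 3, 4, 3],
    [4, 89, 3, 4],
    [3, 4, 90, 3],
    [3, 4, 3, 90],
]

# Error distribution for true "4" keypresses across the three other classes.
shares = misclassification_shares(confusion, labels, "4")
```

      In this made-up example the errors for "4" are spread roughly evenly across the other three keypresses, which is the pattern that would argue against a dominant mixing effect.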

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R<sup>2</sup> = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
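
      Two ingredients of this regression, within-subject z-scoring and the adjusted R<sup>2</sup> statistic, can be sketched as follows. This is a hedged illustration rather than the analysis code: the function names are hypothetical, the use of the population standard deviation is an assumption, and the regression fit itself is omitted.

```python
from statistics import mean, pstdev

def zscore_within_subject(values, subjects):
    """Z-score each value against the mean and SD of its own subject.

    Assumes every subject has at least two distinct values (non-zero SD).
    """
    by_subject = {}
    for v, s in zip(values, subjects):
        by_subject.setdefault(s, []).append(v)
    stats = {s: (mean(vs), pstdev(vs)) for s, vs in by_subject.items()}
    return [(v - stats[s][0]) / stats[s][1] for v, s in zip(values, subjects)]

def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical transition times (ms) for two subjects with different overall speeds.
times = [100.0, 200.0, 300.0, 10.0, 20.0, 30.0]
subjects = ["a", "a", "a", "b", "b", "b"]
z = zscore_within_subject(times, subjects)

# Three predictors, as in a model with the 4-1, 2-4 and 4-4 transition times.
example_adj = adjusted_r2(0.5, n_obs=11, n_predictors=3)
```

      Z-scoring within subject removes between-subject differences in overall speed, so the regression addresses the within-subject relationship that the Reviewer asks about.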

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test evaluates the representation in an unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would miss most learning effects on a task in which speed is the main learning metric.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).

      The Reviewer argues that the last finger movement of a trial and the first movement of the next trial are performed under different circumstances and in different contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial is pre-planned before the first keypress is performed. This occurs in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is concerned about a difference in the temporal mixing effect issue raised above between the first and last keypresses performed in a trial. Please note that since neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence (Kornysheva et al., 2019), mixing effects are most likely present also for the first keypress in a trial.

      Separately, the Reviewer suggests that contextualization during early learning may reflect preplanning or online planning. This is an interesting proposal. Given the decoding time-window used in this investigation, we cannot dissect separate contributions of planning, memory and sensory feedback to contextualization. Taking advantage of the superior temporal resolution of MEG relative to fMRI tools, work under way in our lab is investigating decoding time-windows more appropriate to address each of these questions.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice). It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable.

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.
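
      The matched definitions discussed here can be sketched as follows. This is an illustration under stated assumptions: Euclidean distance stands in for the neural representation distance, the feature vectors are made up, and the dictionary keys `first_op1` and `last_op5` are hypothetical names for the first index finger press of the first correct sequence and the last index finger press of the last correct sequence in a trial.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def differentiation(trials):
    """Return (online, offline) distance lists for a sequence of practice trials.

    online[i]  : first_op1 vs last_op5 within trial i
    offline[i] : last_op5 of trial i vs first_op1 of trial i + 1
    """
    online = [euclidean(t["first_op1"], t["last_op5"]) for t in trials]
    offline = [
        euclidean(cur["last_op5"], nxt["first_op1"])
        for cur, nxt in zip(trials, trials[1:])
    ]
    return online, offline

# Hypothetical two-dimensional feature vectors for two consecutive trials.
trials = [
    {"first_op1": [0.0, 0.0], "last_op5": [3.0, 4.0]},
    {"first_op1": [3.0, 0.0], "last_op5": [3.0, 0.0]},
]
online, offline = differentiation(trials)
```

      Defined this way, both measures compare the same pair of movement types, so any mismatch in neighbouring keypresses applies equally to the online and offline estimates.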

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement related artefacts on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that there is a higher relative frequency of errors in early trials, and a resulting systematic trend in phase shifts between the visual display updates (i.e. – a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. To the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings so were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e. – a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (Overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that most participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.

      The minimal participant engagement with the visual display in this explicit sequence learning motor task (which is highly generative in nature) contrasts markedly with behavior observed when reactive responses to stimulus cues are needed in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies using the two sequence learning tasks.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"?

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” vs micro-offline gains (see Figure 5 – figure supplements 4, 5 and 6). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.

      The authors follow the assumption that micro-offline gains reflect offline learning.

      We disagree with this statement. The original (Bonstrup et al., 2019) paper clearly states that micro-offline gains do not necessarily reflect offline learning in some cases and must be carefully interpreted based upon the behavioral context within which they are observed. Further, the paper lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of (Pan & Rickard, 2015), which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study (Bonstrup et al., 2019), as well as in all our subsequent work. Pan & Rickard state:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks (Brawn et al., 2010; Rickard et al., 2008). Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard make several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They state:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead (Pan & Rickard, 2015). One promising possibility is to switch to 10 s performance durations for each performance-break cycle (Pan & Rickard, 2015). That design appears sufficient to eliminate at least the majority of the reactive inhibition effect (Brawn et al., 2010; Rickard et al., 2008).”

      We mindfully incorporated recommendations from (Pan & Rickard, 2015) into our own study designs including 1) utilizing 10s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which are prior to the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects.
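
      For readers less familiar with these measures, the following minimal sketch shows how micro-online and micro-offline gains can be computed and restricted to early learning trials. It assumes a simplified definition: within-trial change from the first to the last correct sequence for micro-online gains, and change from the end of one trial to the start of the next for micro-offline gains. The tapping speeds, the function names and the `n_early_trials` cut-off are hypothetical.

```python
def micro_gains(trial_speeds):
    """Compute per-trial micro-online and per-rest micro-offline gains.

    trial_speeds: one list per practice trial, holding the tapping speed
    (e.g. keypresses/s) of each correct sequence in the order performed.
    """
    # Micro-online: change from the first to the last correct sequence within a trial.
    online = [trial[-1] - trial[0] for trial in trial_speeds]
    # Micro-offline: change from the end of one trial to the start of the next.
    offline = [nxt[0] - cur[-1] for cur, nxt in zip(trial_speeds, trial_speeds[1:])]
    return online, offline

def cumulative_early_offline(offline, n_early_trials):
    """Sum micro-offline gains over the rest periods between the first n_early_trials trials."""
    return sum(offline[: n_early_trials - 1])

# Hypothetical speeds for three practice trials.
speeds = [[1.0, 1.5], [2.0, 2.25], [2.5, 2.25]]
online, offline = micro_gains(speeds)
early_offline = cumulative_early_offline(offline, n_early_trials=2)
```

      Restricting the sum to early trials reflects the recommendation discussed above: later trials, where the scalloped pattern emerges, are excluded because gains across rest there may reflect recovery from reactive inhibition rather than learning.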

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.” The initial (Bonstrup et al., 2019) report was followed up by a large online crowd-sourcing study (Bonstrup et al., 2020). This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 4 below for further details on these conditions).

      Author response image 4.

      This Figure shows that micro-offline gains observed in learning and nonlearning contexts are attributed to different underlying causes. Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from (Bonstrup et al., 2019). During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains only during this Early Learning stage reflect rapid memory consolidation (see also (Bonstrup et al., 2020)). After early learning, at about practice trial 11, skill plateaus. This plateau skill period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019) as well as others in the literature (Brooks et al., 2024; Gupta & Rickard, 2022; Florencia Jacobacci et al., 2020), argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning. The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds (end of Fig legend).

      Evidence documented in that paper (Bonstrup et al., 2020) showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n=389); 2) equivalent when significantly shortening practice period duration, thus confirming that they are not a result of recovery from performance fatigue (n=118); 3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after passage of time (n=373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (N = 71) (Bonstrup et al., 2020). Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve (Pan & Rickard, 2015) refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods with the density explaining differences in the magnitude of micro-offline gains across subjects (Buch et al., 2021). Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study (Buch et al., 2021)) linked to micro-offline gains during early skill learning. These functional changes were coupled with rapid alterations in brain microstructure in the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice (Deleglise et al., 2023). Crucial to this point, Chen et al. (2024) and Sjøgård et al (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple density during rest periods (which are known markers for neural replay (Buzsaki, 2015)) in the human hippocampus (80-120 Hz) to micro-offline gains during early skill learning.

      Thus, there is now substantial converging evidence in humans across different indirect noninvasive and direct invasive recording techniques linking hippocampal activity, neural replay dynamics and offline performance gains in skill learning.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks; however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024).

      The recent work of (Gupta & Rickard, 2022, 2024) does not present any data that directly opposes our finding that early skill learning (Bonstrup et al., 2019) is expressed as micro-offline gains during rest breaks. These studies are an extension of the Rickard et al (2008) paper that employed a massed (30s practice followed by 30s breaks) vs spaced (10s practice followed by 10s breaks) experimental design to assess if recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30s practice/10s break and 10s practice/10s break as used in the work from our group). The primary aim of the study was to assess whether it was more likely that changes in performance when retested 5 minutes after skill training (consisting of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) had ended reflected memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate the fitting of a quantitative statistical model to the data.

      To reiterate, neither study included any analysis of micro-online or micro-offline gains, nor any comparison focused on skill gains during early learning trials (only at retest 5 min later). Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods than early learning. In fact, we reported the same findings for trials following the early learning period in our original 2019 paper (Bonstrup et al., 2019) (Author response image 4). Please note that we also reported that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later (Bonstrup et al., 2019) (see the Results section and further elaboration in the Discussion). We interpreted these findings as indicative that the mechanisms underlying offline gains over the micro-scale of seconds during early skill learning versus over minutes or hours very likely differ.

      In the recent preprint from (Das et al., 2024), the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning” which is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. The study utilizes a spaced vs. massed practice groups between-subjects design inspired by the reactive inhibition work from Rickard and others to test this hypothesis.

      Crucially, their design incorporates only a small fraction of the training used in other investigations to evaluate early skill learning (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024). A direct comparison between the practice schedule designs for the spaced and massed groups in Das et al., and the training schedule all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 5):

      Author response image 5.

      This figure shows (A) Comparison of Das et al. Spaced & Massed group training session designs, and the training session design from the original (Bonstrup et al., 2019) paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow for direct comparisons between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e. – practice schedules) between the Massed and Spaced groups from the Das et al. report are extremely small (less than 12% of the overall session schedule) (gaps in the red shaded area) and (2) the overall amount of practice is much less than in the design from the original Bönstrup report (Bonstrup et al., 2019) (which has been utilized in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019) is used to estimate the performance range accounted for by the equivalent periods covering Test 1, Training 1 and Test 2 from Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range (end of figure legend).

      Participants in the original Bönstrup et al. (2019) study experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 5). Thus, the overall amounts of practice and rest differ substantially between studies, with much more limited training occurring for participants in Das et al.

      In addition, the training interventions (i.e. – the practice schedule differences between the Spaced and Massed groups) were designed in a manner that minimized any chance of effectively testing their hypothesis. First, the interventions were applied over an extremely short period relative to the length of the total training session (5% and 12% of the total training session for Massed and Spaced groups, respectively; see gaps in the red shaded area in Author response image 5). Second, the intervention was applied during a period in which only half of the known total learning occurs. Specifically, we know from Bönstrup et al. (2019) that only 46.57% of the total performance gains occur in the practice interval covered by the Das et al. Training 1 intervention. Thus, early skill learning as evaluated by multiple groups (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024) is truncated to about half in the Das et al. experiment.

      Furthermore, a substantial amount of learning takes place during the Test 1 and Test 2 periods in Das et al. (32.49% of total gains combined). The fact that substantial learning is known to occur over both the Test 1 (18.06%) and Test 2 (14.43%) intervals presents a fundamental problem described by Pan and Rickard (Pan & Rickard, 2015). They reported that averaging over intervals where substantial performance gains occur (i.e. – performance is not stable) injects crucial artefacts into analyses of skill learning:

      “A large amount of averaging has the advantage of yielding more precise estimates of each subject’s pretest and posttest scores and hence more statistical power to detect a performance gain. However, calculation of gain scores using that strategy runs the risk that learning that occurs during the pretest and (or) posttest periods (i.e., online learning) is incorporated into the gain score (Rickard et al., 2008; Robertson et al., 2004).”

      The above statement indicates that the Test 1 and Test 2 performance scores from Das et al. (2024) are substantially contaminated by the learning rate within these intervals. This is particularly problematic if the intervention design results in different Test 2 learning rates between the two groups. This, in fact, is apparent in their data (Figure 1C,E of the Das et al., 2024 preprint), as the Test 2 learning rate for the Spaced group is negative (indicating a unique interference effect observable only for this group). Specifically, the Massed group continues to show an increase in performance during Test 2 and 4 relative to the last 10 seconds of practice during Training 1 and 2, respectively, while the Spaced group displays a marked decrease. This post-training performance decrease for the Spaced group is in stark contrast to the monotonic performance increases observed for both groups at all other time-points. One possible cause could be related to the structure of the Test intervals, which include 20 seconds of uninterrupted practice. For the Spaced group, this effectively is a switch to a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), which interferes with the greater Training 1 interval gains observed for the Spaced group. Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (Figure 1E), the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      In summary, the experimental design and analyses used by Das et al. do not contradict the view that early skill learning is expressed as micro-offline gains during rest breaks. The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) are in many ways more confirmatory than contradictory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings. Still, these studies do highlight a limitation of the current micro-online/offline framework, which was originally only intended to be applied to early skill learning over spaced practice schedules when reactive inhibition effects are minimized (Bonstrup et al., 2019; Pan & Rickard, 2015). Extrapolation of this current framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g. – the Nonrepeating groups from Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found Figure 2B too small to be useful, as the actual elements of the cells are very hard to read.

      We have removed the grid colormap panel (top-right) from Figure 2B. All of this colormap data is actually a subset of data presented in Figure 2 – figure supplement 1, so can still be found there.

      Reviewer #2 (Recommendations for the authors):

      (1) Related to the first point in my concerns, I would suggest the authors compare decoding accuracy between correct presses followed by correct vs. incorrect presses. This would clarify if the decoder is actually taking the MEG signal for subsequent press into account. I would also suggest the authors use pre-movement MEG features and post-movement features with shorter windows and compare each result with the results for the original post-movement MEG feature with a longer window.

      The present study does not contain enough errors to perform the analysis proposed by the Reviewer. As noted above, we did re-examine our data and now report a new control regression analysis, which indicates that the proximity between keypresses does not explain contextualization effects.

      (2) I was several times confused by the author's use of "neural representation of an action" or "sequence action representations" in understanding whether these terms refer to representation on the level of whole-brain, region (as defined by the specific parcellation used), or voxels. In fact, what is submitted to the decoder is some complicated whole-brain MEG feature (i.e., the "neural representation"), which is a hybrid of voxel and parcel features that is further dimension-reduced and not immediately interpretable. Clarifying this point early in the text and possibly using some more sensible terms, such as adding "brain-wise" before the "sequence action representation", would be the most helpful for the readers.

      We have now clarified this terminology in the revised manuscript.

      (3) Although comparing many different ways in feature selection/reduction, time window selection, and decoder types is undoubtedly a meticulous work, the current version of the manuscript seems still lacking some explanation about the details of these methodological choices, like which decoding method was actually used to report the accuracy, whether or not different decoding methods were chosen for individual participants' data, how training data was selected (is it all of the correct presses in Day 1 data?), whether the frequency power or signal amplitude was used, and so on. I would highly appreciate these additional details in the Methods section.

      The reported accuracies were based on a linear discriminant analysis (LDA) classifier. A comparison of different decoders (Figure 3 – figure supplement 4) shows that LDA was the optimal choice.

      Whether or not different decoding methods were chosen for individual participants' data

      We used the same decoder (LDA) for all participants when reporting the final accuracy.

      How training data was selected (is it all of the correct presses in Day 1 data?),

      Decoder training was conducted as a randomized split of the data (all correct keypresses of Day 1) into training (90%) and test (10%) samples for 8 iterations.

      Whether the frequency power or signal amplitude was used

      Signal amplitude was used for feature calculation.
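
      The randomized 90%/10% split repeated over 8 iterations, as described above, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the analysis code used in the study: the function name `repeated_random_splits`, the sample count of 200 and the fixed seed are hypothetical, and the sketch does not stratify by keypress class because the response does not state whether stratification was applied.

```python
import random


def repeated_random_splits(n_samples, test_fraction=0.10, n_iterations=8, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated randomized hold-out splits.

    Each iteration reshuffles all sample indices and holds out `test_fraction`
    of them for testing; the remainder is used for decoder training.
    """
    rng = random.Random(seed)  # fixed seed (assumption) so the splits are reproducible
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical example: 200 correct Day 1 keypresses, 8 random 90/10 splits.
splits = list(repeated_random_splits(n_samples=200))
for i, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"iteration {i}: {len(train_idx)} training, {len(test_idx)} test samples")
```

      In practice, a decoder would be fit on each training subset and scored on the matching test subset, with the reported accuracy summarizing performance across the 8 iterations.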

      (4) In terms of the Methods, please consider adding some references about the 'F1 score', the 'feature importance score,' and the 'MRMR-based feature ranking,' as the main readers of the current paper would not be from the machine learning community. Also, why did the LDA dimensionality reduction reduce accuracy specifically for the voxel feature?

      We have now added the following statements to the Methods section that provide more detailed descriptions and references for these metrics:

      “The F1 score, defined as the harmonic mean of the precision (percentage of true predictions that are actually true positive) and recall (percentage of true positives that were correctly predicted as true) scores, was used as a comprehensive metric for all one-versus-all keypress state decoders to assess class-wise performance that accounts for both false-positive and false-negative prediction tendencies [REF]. A weighted mean F1 score was then computed across all classes to assess the overall prediction performance of the multi-class model.”

      and

      “Feature Importance Scores

      The relative contribution of source-space voxels and parcels to decoding performance (i.e. – feature importance score) was calculated using minimum redundant maximum relevance (MRMR) and highlighted in topography plots. MRMR, an approach that combines both relevance and redundancy metrics, ranked individual features based upon their significance to the target variable (i.e. – keypress state identity) prediction accuracy and their non-redundancy with other features.”
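
      The class-wise and weighted mean F1 scores defined in the first quoted statement above can be sketched as follows. This is an illustrative sketch rather than the code used in the study: the function name `weighted_f1` and the toy labels are hypothetical, and weighting each class by its number of true instances is an assumption, since the quoted text does not specify the weights.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Return (per-class F1 dict, support-weighted mean F1) for multi-class labels."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    per_class = {}
    for c in classes:
        # One-versus-all counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        # F1 is the harmonic mean of precision and recall.
        denom = precision + recall
        per_class[c] = 2 * precision * recall / denom if denom else 0.0
    # Assumption: the weighted mean uses class support as weights.
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return per_class, weighted


# Hypothetical toy labels for a 4-class keypress decoder.
y_true = [1, 1, 2, 2, 3, 4]
y_pred = [1, 2, 2, 2, 3, 4]
per_class, weighted = weighted_f1(y_true, y_pred)
print(per_class, round(weighted, 4))
```

      Because F1 combines precision and recall, a class that is often predicted in error (low precision) or often missed (low recall) is penalized in either case.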

      As stated in the Reviewer responses above, the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much lower-dimensional space (number of classes-1; e.g. – 3 dimensions for 4-class keypress decoding). The reduction in accuracy observed only for the voxel-space feature was likely due to the loss of relevant information during this mapping. This reduction in accuracy for voxel-space decoding was specific to LDA. Figure 3—figure supplement 3 shows that voxel-space decoder performance actually improved when utilizing alternative dimensionality reduction techniques.

      (5) Paragraph 9, lines #139-142: "Notably, decoding associated with index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest number of misclassifications of all digits (N = 141 or 47.5% of all decoding errors; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed at different learning state or sequence context locations."

      This does not seem to be a fair comparison, as the index finger appears twice as many as the other fingers do in the sequence. To claim this, proper statistical analysis needs to be done taking this difference into account.

      We thank the Reviewer for bringing this issue to our attention. We have now corrected this comparison to evaluate relative false negative and false positive rates between individual keypress state decoders, and have revised this statement in the manuscript as follows:

      “Notably, decoding of index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest false negative (0.116 per keypress) and false positive (0.043 per keypress) misclassification rates compared with all other digits (false negative rate range = [0.067 0.114]; false positive rate range = [0.020 0.037]; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed within different contexts (i.e. - different learning states or sequence locations).”
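
      The point of reporting rates rather than raw error counts can be illustrated with a short sketch. The confusion matrix below is entirely hypothetical (it is not the study's data), and the normalization is an assumption: the false negative rate is taken relative to the number of true keypresses of a class, and the false positive rate relative to the number of keypresses of all other classes. The exact normalization behind the "per keypress" values is not specified in the response.

```python
def per_class_error_rates(confusion):
    """Compute false negative and false positive rates for each class.

    confusion[i][j] is the number of keypresses of true class i decoded as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = {}
    for c in range(n_classes):
        n_true = sum(confusion[c])             # all keypresses that truly belong to class c
        tp = confusion[c][c]
        fn = n_true - tp                       # class c keypresses decoded as something else
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp  # other classes decoded as c
        n_other = total - n_true
        rates[c] = {
            "fnr": fn / n_true if n_true else 0.0,
            "fpr": fp / n_other if n_other else 0.0,
        }
    return rates


# Hypothetical counts: class 0 (index finger) occurs twice per sequence,
# so it has twice as many keypresses as each of the other classes.
confusion = [
    [180, 8, 6, 6],
    [5, 90, 3, 2],
    [4, 3, 91, 2],
    [3, 2, 2, 93],
]
rates = per_class_error_rates(confusion)
for c, r in rates.items():
    print(f"class {c}: FNR = {r['fnr']:.3f}, FPR = {r['fpr']:.4f}")
```

      In this toy example, class 0 accounts for the largest raw number of misses simply because it occurs most often, yet its false negative rate equals that of class 1. Normalizing by class frequency is what makes the comparison between digits fair.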

      (6) Finally, the authors could consider acknowledging in the Discussion that the contribution of micro-offline learning to genuine skill learning is still under debate (e.g., Gupta and Rickard, 2023; 2024; Das et al., bioRxiv, 2024).

      We have added a paragraph in the Discussion that addresses this point.

      Reviewer #3 (Recommendations for the authors):

      In addition to the additional analyses suggested in the public review, I have the following suggestions/questions:

      (1) Given that the authors introduce a new decoding approach, it would be very helpful for readers to see a distribution of window sizes and window onsets eventually used across individuals, at least for the optimized decoder.

      We have now included a new supplemental figure (Figure 4 – figure Supplement 2) that provides this information.

      (2) Please explain in detail how you arrived at the (interpolated?) group-level plot shown in Figure 1B, starting from the discrete single-trial keypress transition times. Also, please specify what the shading shows.

      Instantaneous correct sequence speed (skill measure) was quantified as the inverse of time (in seconds) required to complete a single iteration of a correctly generated full 5-item sequence. Individual keypress responses were labeled as members of correct sequences if they occurred within a 5-item response pattern matching any possible circular shifts of the 5-item sequence displayed on the monitor (41324). This approach allowed us to quantify a measure of skill within each practice trial at the resolution of individual keypresses. The dark line indicates the group mean performance dynamics for each trial. The shaded region indicates the 95% confidence limit of the mean (see Methods).
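
      The labeling of correct sequences via circular shifts, and the resulting keypress-resolution speed measure, can be sketched as follows. This is a simplified illustration, not the study's analysis code: the function names and the example keypress times are hypothetical, and measuring the completion time from the first to the last keypress of each matching 5-item window is an assumption, since the exact interval definition is not given in the response.

```python
SEQUENCE = "41324"  # the 5-item sequence displayed on the monitor


def circular_shifts(seq):
    """Return all circular shifts of a sequence, e.g. 41324 -> 13244 -> 32441 ..."""
    return {seq[i:] + seq[:i] for i in range(len(seq))}


def correct_sequence_speed(keys, times, sequence=SEQUENCE):
    """Return (time, speed) pairs at the resolution of individual keypresses.

    A keypress contributes a speed value when it completes a 5-item window that
    matches any circular shift of the target sequence. Speed is the inverse of
    the window duration in seconds (assumed: first to last keypress of the window).
    """
    shifts = circular_shifts(sequence)
    n = len(sequence)
    speeds = []
    for end in range(n - 1, len(keys)):
        start = end - n + 1
        window = "".join(keys[start:end + 1])
        duration = times[end] - times[start]
        if window in shifts and duration > 0:
            speeds.append((times[end], 1.0 / duration))
    return speeds


# Hypothetical trial: two error-free iterations typed at 0.3 s per transition.
keys = list("4132441324")
times = [i * 0.3 for i in range(len(keys))]
speeds = correct_sequence_speed(keys, times)
for t, s in speeds:
    print(f"t = {t:.1f} s, speed = {s:.3f} sequences/s")
```

      Because any circular shift counts as a match, every keypress after the fourth yields a speed estimate as long as typing remains error-free, which is what gives the measure its within-trial resolution.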

      (3) Similarly, please explain how you arrived at the group-level plot shown in Figure 1C. What are the different colored lines (rows) within each trial? How exactly did the authors reach the conclusion that KTT variability stabilizes by trial 6?

      Figure 1C provides additional information to the correct sequence speed measure above, as it also tracks individual transition speed composition over learning. Figure 1C, thus, represents both changes in overall correct sequence speed dynamics (indicated by the overall narrowing of the horizontal speed lines moving from top to bottom) and the underlying composition of the individual transition patterns within and across trials. The coloring of the lines is a shading convention used to discriminate between different keypress transitions. These curves were sampled with 1ms resolution, as in Figure 1B. Addressing the underlying keypress transition patterns requires within-subject normalization before averaging across subjects. The distribution of KTTs was normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning.

      (4) Maybe I missed it, but it was not clear to me which of the tested classifiers was eventually used. Or was that individualized as well? More generally, a comparison of the different classifiers would be helpful, similar to the comparison of dimension reduction techniques.

      We have now included a new supplemental figure that provides this information.

      (5) Please add df and effect sizes to all statistics.

      Done.

      (6) Please explain in more detail your power calculation.

      The study was powered to determine the minimum sample size needed to detect a significant change in skill performance following training using a one-sample t-test (two-sided; alpha = 0.05; 95% statistical power; Cohen’s D effect size = 0.8115 calculated from previously acquired data in our lab). The calculated minimum sample size was 22. The included study sample size (n = 27) exceeded this minimum.

      This information is now included in the revised manuscript.
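
      As a rough cross-check of the power calculation described above, the minimum sample size can be approximated in closed form. The sketch below uses the normal approximation for a two-sided one-sample t-test plus a commonly used small-sample correction term (half the squared critical z value). The software and exact method used for the original calculation are not stated in the response, so this approximation and the function name `one_sample_t_min_n` are assumptions for illustration only.

```python
from math import ceil
from statistics import NormalDist


def one_sample_t_min_n(effect_size, alpha=0.05, power=0.95):
    """Approximate minimum n for a two-sided one-sample t-test.

    Uses n ~ ((z_{1-alpha/2} + z_{power}) / d)^2 + z_{1-alpha/2}^2 / 2, where
    the second term is a small-sample correction for using t instead of z.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    n_normal = ((z_alpha + z_power) / effect_size) ** 2
    return ceil(n_normal + z_alpha ** 2 / 2)


# Parameters reported in the response: alpha = 0.05, power = 95%, d = 0.8115.
n_min = one_sample_t_min_n(0.8115)
print(n_min)
```

      With the reported parameters this approximation yields 22, in agreement with the stated minimum sample size. An exact calculation based on the noncentral t distribution may differ slightly for other inputs.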

      (7) The cut-off for the high-pass filter is unusually high and seems risky in terms of potential signal distortions (de Cheveigne, Neuron 2019). Why did the authors choose such a high cut-off?

      The 1Hz high-pass cut-off frequency for the 1-150Hz band-pass filter applied to the continuous raw MEG data during preprocessing has been used in multiple previous MEG publications (Barratt et al., 2018; Brookes et al., 2012; Higgins et al., 2021; Seedat et al., 2020; Vidaurre et al., 2018).

      (8) "Furthermore, the magnitude of offline contextualization predicted skill gains while online contextualization did not", lines 336/337 - where is that analysis?

      Additional details pertaining to this analysis are now provided in the Results section (Figure 5 – figure supplement 4).

      (9) How were feature importance scores computed?

      We have now added a new subheading in the Methods section with a more detailed description of how feature importance scores were computed.

      (10)  Please add x and y ticks plus tick labels to Figure 5 - Figure Supplement 3, panel A

      Done

      (11) Line 369, what does "comparable" mean in this context?

      The sentence in the “Study Participants” part of the Methods section referred to here has now been revised for clarity.

      (12) In lines 496/497, please specify what t=0 means (KeyDown event, I guess?).

      Yes, the KeyDown event occurs at t = 0. This has now been clarified in the revised manuscript.

      (13) Please specify consistent boundaries between alpha- and beta-bands (they are currently not consistent in the Results vs. Methods (14/15 Hz or 15/16 Hz)).

      We thank the Reviewer for alerting us to this discrepancy caused by a typographic error in the Methods. We have now corrected this so that the alpha (8-14 Hz) and beta-band (15-24 Hz) frequency limits are described consistently throughout the revised manuscript.

      References

      Albouy, G., Fogel, S., King, B. R., Laventure, S., Benali, H., Karni, A., Carrier, J., Robertson, E. M., & Doyon, J. (2015). Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage, 108, 423-434. https://doi.org/10.1016/j.neuroimage.2014.12.049

      Albouy, G., King, B. R., Maquet, P., & Doyon, J. (2013). Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus, 23(11), 985-1004. https://doi.org/10.1002/hipo.22183

      Albouy, G., Sterpenich, V., Vandewalle, G., Darsaud, A., Gais, S., Rauchs, G., Desseilles, M., Boly, M., Dang-Vu, T., Balteau, E., Degueldre, C., Phillips, C., Luxen, A., & Maquet, P. (2012). Neural correlates of performance variability during motor sequence acquisition. NeuroImage, 60(1), 324-331. https://doi.org/10.1016/j.neuroimage.2011.12.049

      Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. Annu Rev Neurosci, 25, 189-220. https://doi.org/10.1146/annurev.neuro.25.112701.142922

      Ashe, J., Lungu, O. V., Basford, A. T., & Lu, X. (2006). Cortical control of motor sequences. Curr Opin Neurobiol, 16(2), 213-221. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16563734

      Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W., & Donoghue, J. P. (2011). Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol, 105(4), 1603-1619. https://doi.org/10.1152/jn.00532.2010

      Barratt, E. L., Francis, S. T., Morris, P. G., & Brookes, M. J. (2018). Mapping the topological organisation of beta oscillations in motor cortex using MEG. NeuroImage, 181, 831-844. https://doi.org/10.1016/j.neuroimage.2018.06.041

      Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Carlson, J. M., & Grafton, S. T. (2011). Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A, 108(18), 7641-7646. https://doi.org/10.1073/pnas.1018985108

      Battaglia-Mayer, A., & Caminiti, R. (2019). Corticocortical Systems Underlying High-Order Motor Control. J Neurosci, 39(23), 4404-4421. https://doi.org/10.1523/JNEUROSCI.2094-18.2019

      Berlot, E., Popp, N. J., & Diedrichsen, J. (2020). A critical re-evaluation of fMRI signatures of motor sequence learning. Elife, 9. https://doi.org/10.7554/eLife.55241

      Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N., & Cohen, L. G. (2020). Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn, 5, 7. https://doi.org/10.1038/s41539-020-0066-9

      Bonstrup, M., Iturrate, I., Thompson, R., Cruciani, G., Censor, N., & Cohen, L. G. (2019). A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol, 29(8), 1346-1351 e1344. https://doi.org/10.1016/j.cub.2019.02.049

      Brawn, T. P., Fenn, K. M., Nusbaum, H. C., & Margoliash, D. (2010). Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci, 30(42), 13977-13982. https://doi.org/10.1523/JNEUROSCI.3295-10.2010

      Brookes, M. J., Woolrich, M. W., & Barnes, G. R. (2012). Measuring functional connectivity in MEG: a multivariate approach insensitive to linear source leakage. NeuroImage, 63(2), 910-920. https://doi.org/10.1016/j.neuroimage.2012.03.048

      Brooks, E., Wallis, S., Hendrikse, J., & Coxon, J. (2024). Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn, 9(1), 23. https://doi.org/10.1038/s41539-024-00238-6

      Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M., & Cohen, L. G. (2021). Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep, 35(10), 109193. https://doi.org/10.1016/j.celrep.2021.109193

      Buneo, C. A., & Andersen, R. A. (2006). The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia, 44(13), 2594-2606. https://doi.org/10.1016/j.neuropsychologia.2005.10.011

      Buzsaki, G. (2015). Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. https://doi.org/10.1002/hipo.22488

      Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H., & Staresina, B. P. (2024). Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv, 2024.10.06.614680. https://doi.org/10.1101/2024.10.06.614680

      Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., & Shenoy, K. V. (2012). Neural population dynamics during reaching. Nature, 487(7405), 51-56. https://doi.org/10.1038/nature11129

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol, 79(2), 1117-1123. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=9463469

      Colclough, G. L., Brookes, M. J., Smith, S. M., & Woolrich, M. W. (2015). A symmetric multivariate leakage correction for MEG connectomes. NeuroImage, 117, 439-448. https://doi.org/10.1016/j.neuroimage.2015.03.071

      Colclough, G. L., Woolrich, M. W., Tewarie, P. K., Brookes, M. J., Quinn, A. J., & Smith, S. M. (2016). How reliable are MEG resting-state connectivity metrics? NeuroImage, 138, 284-293. https://doi.org/10.1016/j.neuroimage.2016.05.070

      Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P., & Azanon, E. (2024). “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv, 2024.07.11.602795. https://doi.org/10.1101/2024.07.11.602795

      Deleglise, A., Donnelly-Kehoe, P. A., Yeffal, A., Jacobacci, F., Jovicich, J., Amaro, E., Jr., Armony, J. L., Doyon, J., & Della-Maggiore, V. (2023). Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex, 33(10), 6120-6131. https://doi.org/10.1093/cercor/bhac489

      Doyon, J., Bellec, P., Amsel, R., Penhune, V., Monchi, O., Carrier, J., Lehéricy, S., & Benali, H. (2009). Contributions of the basal ganglia and functionally related brain structures to motor learning. [Review]. Behavioural brain research, 199(1), 61-75. https://doi.org/10.1016/j.bbr.2008.11.012

      Doyon, J., Song, A. W., Karni, A., Lalonde, F., Adams, M. M., & Ungerleider, L. G. (2002). Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A, 99(2), 1017-1022. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11805340

      Euston, D. R., Gruber, A. J., & McNaughton, B. L. (2012). The role of medial prefrontal cortex in memory and decision making. Neuron, 76(6), 1057-1070. https://doi.org/10.1016/j.neuron.2012.12.002

      Euston, D. R., Tatsuno, M., & McNaughton, B. L. (2007). Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science, 318(5853), 1147-1150. https://doi.org/10.1126/science.1148979

      Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E., & Slutzky, M. W. (2012). Local field potentials allow accurate decoding of muscle activity. J Neurophysiol, 108(1), 18-24. https://doi.org/10.1152/jn.00832.2011

      Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memories. Nat Rev Neurosci, 6(2), 119-130. https://doi.org/10.1038/nrn1607

      Gais, S., Albouy, G., Boly, M., Dang-Vu, T. T., Darsaud, A., Desseilles, M., Rauchs, G., Schabus, M., Sterpenich, V., Vandewalle, G., Maquet, P., & Peigneux, P. (2007). Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A, 104(47), 18778-18783. https://doi.org/10.1073/pnas.0705454104

      Grafton, S. T., Mazziotta, J. C., Presty, S., Friston, K. J., Frackowiak, R. S., & Phelps, M. E. (1992). Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci, 12(7), 2542-2548.

      Grover, S., Wen, W., Viswanathan, V., Gill, C. T., & Reinhart, R. M. G. (2022). Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci, 25(9), 1237-1246. https://doi.org/10.1038/s41593-022-01132-3

      Gupta, M. W., & Rickard, T. C. (2022). Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn, 7(1), 25. https://doi.org/10.1038/s41539-022-00140-z

      Gupta, M. W., & Rickard, T. C. (2024). Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep, 14(1), 4661. https://doi.org/10.1038/s41598-024-52726-9

      Hardwick, R. M., Rottschy, C., Miall, R. C., & Eickhoff, S. B. (2013). A quantitative meta-analysis and review of motor learning in the human brain. NeuroImage, 67, 283-297. https://doi.org/10.1016/j.neuroimage.2012.11.020

      Heusser, A. C., Poeppel, D., Ezzyat, Y., & Davachi, L. (2016). Episodic sequence memory is supported by a theta-gamma phase code. Nat Neurosci, 19(10), 1374-1380. https://doi.org/10.1038/nn.4374

      Higgins, C., Liu, Y., Vidaurre, D., Kurth-Nelson, Z., Dolan, R., Behrens, T., & Woolrich, M. (2021). Replay bursts in humans coincide with activation of the default mode and parietal alpha networks. Neuron, 109(5), 882-893 e887. https://doi.org/10.1016/j.neuron.2020.12.007

      Hikosaka, O., Nakamura, K., Sakai, K., & Nakahara, H. (2002). Central mechanisms of motor skill learning. Curr Opin Neurobiol, 12(2), 217-222. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12015240

      Jacobacci, F., Armony, J. L., Yeffal, A., Lerner, G., Amaro, E., Jr., Jovicich, J., Doyon, J., & Della-Maggiore, V. (2020). Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A, 117(38), 23898-23903. https://doi.org/10.1073/pnas.2009576117

      Karni, A., Meyer, G., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1995). Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature, 377(6545), 155-158. https://doi.org/10.1038/377155a0

      Kennerley, S. W., Sakai, K., & Rushworth, M. F. (2004). Organization of action sequences and the role of the pre-SMA. J Neurophysiol, 91(2), 978-993. https://doi.org/10.1152/jn.00651.2003

      Kleim, J. A., Barbay, S., & Nudo, R. J. (1998). Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol, 80, 3321-3325.

      Kornysheva, K., Bush, D., Meyer, S. S., Sadnicka, A., Barnes, G., & Burgess, N. (2019). Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron, 101(6), 1166-1180 e1163. https://doi.org/10.1016/j.neuron.2019.01.018

      Lee, S. H., Jin, S. H., & An, J. (2019). The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep, 9(1), 14066. https://doi.org/10.1038/s41598-019-50644-9

      Lisman, J. E., & Jensen, O. (2013). The theta-gamma neural code. Neuron, 77(6), 1002-1016. https://doi.org/10.1016/j.neuron.2013.03.007

      Mollazadeh, M., Aggarwal, V., Davidson, A. G., Law, A. J., Thakor, N. V., & Schieber, M. H. (2011). Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci, 31(43), 15531-15543. https://doi.org/10.1523/JNEUROSCI.2999-11.2011

      Molle, M., & Born, J. (2009). Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron, 61(4), 496-498. https://doi.org/10.1016/j.neuron.2009.02.002

      Morris, R. G. M. (2006). Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. [Review]. The European journal of neuroscience, 23(11), 2829-2846. https://doi.org/10.1111/j.1460-9568.2006.04888.x

      Mylonas, D., Schapiro, A. C., Verfaellie, M., Baxter, B., Vangel, M., Stickgold, R., & Manoach, D. S. (2024). Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci, 44(14). https://doi.org/10.1523/JNEUROSCI.1839-23.2024

      Pan, S. C., & Rickard, T. C. (2015). Sleep and motor learning: Is there room for consolidation? Psychol Bull, 141(4), 812-834. https://doi.org/10.1037/bul0000009

      Penhune, V. B., & Steele, C. J. (2012). Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res., 226(2), 579-591. https://doi.org/10.1016/j.bbr.2011.09.044

      Qin, Y. L., McNaughton, B. L., Skaggs, W. E., & Barnes, C. A. (1997). Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci, 352(1360), 1525-1533. https://doi.org/10.1098/rstb.1997.0139

      Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J., & Ard, M. C. (2008). Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn, 34(4), 834-842. https://doi.org/10.1037/0278-7393.34.4.834

      Robertson, E. M., Pascual-Leone, A., & Miall, R. C. (2004). Current concepts in procedural consolidation. Nat Rev Neurosci, 5(7), 576-582. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15208699

      Sawamura, D., Sakuraba, S., Suzuki, Y., Asano, M., Yoshida, S., Honke, T., Kimura, M., Iwase, Y., Horimoto, Y., Yoshida, K., & Sakai, S. (2019). Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep, 9(1), 20397. https://doi.org/10.1038/s41598-019-56956-0

      Schendan, H. E., Searl, M. M., Melrose, R. J., & Stern, C. E. (2003). An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron, 37(6), 1013-1025. https://doi.org/10.1016/s0896-6273(03)00123-5

      Seedat, Z. A., Quinn, A. J., Vidaurre, D., Liuzzi, L., Gascoyne, L. E., Hunt, B. A. E., O'Neill, G. C., Pakenham, D. O., Mullinger, K. J., Morris, P. G., Woolrich, M. W., & Brookes, M. J. (2020). The role of transient spectral 'bursts' in functional connectivity: A magnetoencephalography study. NeuroImage, 209, 116537. https://doi.org/10.1016/j.neuroimage.2020.116537

      Shadmehr, R., & Holcomb, H. H. (1997). Neural correlates of motor memory consolidation. Science, 277, 821-824.

      Sjøgård, M., Baxter, B., Mylonas, D., Driscoll, B., Kwok, K., Tolosa, A., Thompson, M., Stickgold, R., Vangel, M., Chu, C., & Manoach, D. S. (2024). Hippocampal ripples mediate motor learning during brief rest breaks in humans. bioRxiv. https://doi.org/10.1101/2024.05.02.592200

      Srinivas, S., Sarvadevabhatla, R. K., Mopuri, K. R., Prabhu, N., Kruthiventi, S. S. S., & Babu, R. V. (2016). A Taxonomy of Deep Convolutional Neural Nets for Computer Vision [Technology Report]. Frontiers in Robotics and AI, 2. https://doi.org/10.3389/frobt.2015.00036

      Sterpenich, V., Albouy, G., Darsaud, A., Schmidt, C., Vandewalle, G., Dang Vu, T. T., Desseilles, M., Phillips, C., Degueldre, C., Balteau, E., Collette, F., Luxen, A., & Maquet, P. (2009). Sleep promotes the neural reorganization of remote emotional memory. J Neurosci, 29(16), 5143-5152. https://doi.org/10.1523/JNEUROSCI.0561-09.2009

      Toni, I., Ramnani, N., Josephs, O., Ashburner, J., & Passingham, R. E. (2001). Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage, 14(5), 1048-1057. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11697936

      Toni, I., Thoenissen, D., & Zilles, K. (2001). Movement preparation and motor intention. NeuroImage, 14(1 Pt 2), S110-117. https://doi.org/10.1006/nimg.2001.0841

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      van Kesteren, M. T., Fernandez, G., Norris, D. G., & Hermans, E. J. (2010). Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A, 107(16), 7550-7555. https://doi.org/10.1073/pnas.0914892107

      van Kesteren, M. T., Ruiter, D. J., Fernandez, G., & Henson, R. N. (2012). How schema and novelty augment memory formation. Trends Neurosci, 35(4), 211-219. https://doi.org/10.1016/j.tins.2012.02.001

      Vidaurre, D., Hunt, L. T., Quinn, A. J., Hunt, B. A. E., Brookes, M. J., Nobre, A. C., & Woolrich, M. W. (2018). Spontaneous cortical activity transiently organises into frequency specific phase-coupling networks. Nat Commun, 9(1), 2987. https://doi.org/10.1038/s41467-018-05316-z

      Wagner, A. D., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., Rosen, B. R., & Buckner, R. L. (1998). Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. [Comment]. Science (New York, N.Y.), 281(5380), 1188-1191. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=9712582&retmode=ref&cmd=prlinks

      Wolpert, D. M., Goodbody, S. J., & Husain, M. (1998). Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci, 1(6), 529-533. https://doi.org/10.1038/2245

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of microtubule dynamics and its effects on neuronal aging. Using C. elegans as a model, the authors investigate the role of evolutionarily conserved Hippo pathway in microtubule dynamics of touch receptor neurons (TRNs) in an age-dependent manner. Using genetic, molecular, behavioral, and pharmacological approaches, the authors show that age-dependent loss of microtubule dynamics might underlie structural and functional aging of TRNs. Further, the authors show that the Hippo pathway specifically functions in these neurons to regulate microtubule dynamics. Specifically, authors show that hyperactivation of YAP-1, a downstream component of the Hippo pathway that is usually inhibited by the kinase activity of the upstream components of the pathway, results in microtubule stabilization and that might underlie the structural and functional decline of TRNs with age. However, how the Hippo pathway regulates microtubule dynamics and neuronal aging was not investigated by the authors.

      Strengths:

      This is a well-conducted and well-controlled study, and the authors have used multiple approaches to address different questions.

      Weaknesses:

      There are no major weaknesses identified, except that the effect of the Hippo pathway seems to be specific to only a subset of neurons. I would like the authors to address the specificity of the effect of the Hippo pathway in TRNs, in their resubmission.

      Our genetic experiments, including TRN-specific rescue/overexpression of YAP-1 and knockdown of WTS-1, strongly suggest a cell-autonomous function of the WTS-1-YAP-1 axis in TRNs; nevertheless, the Hpo pathway could have broader roles in neuroprotection. While this pathway may regulate microtubule stability in multiple neurons, other characteristics of TRNs, such as their anatomical localization near the cuticle or their long projections along the body axis, could contribute to their susceptibility to age-related deformation. Alternatively, the Hpo pathway may be truly TRN-specific. TRNs have unique microtubules in terms of both composition and structure. Among the nine α- and six β-tubulin genes in C. elegans, one α-tubulin (mec-12) and one β-tubulin (mec-7) show highly enriched expression in TRNs [1, 2], and TRNs contain a special 15-protofilament microtubule structure, while all other neurons in C. elegans have 11-protofilament microtubules [3]. Transcriptional regulation through YAP-1 may affect the specific microtubule structure of TRNs, leading to premature neuronal deformation. We have included this in the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study examines a novel role of the Hpo signaling pathway, specifically of wts-1/LATS and the downstream regulator of gene expression, yap, in age-related neurodegeneration in C. elegans touch-responsive mechanosensory neurons, ALM and PLM. The study shows that knockdown or deletion of wts-1/LATS causes age-associated morphological abnormalities of these neurons, accompanied by functional loss of touch responsiveness. This is further associated with enhanced, abnormal, microtubule stabilization in these neurons.

      Strengths:

      This study examines a novel role of the Hpo signaling pathway, specifically of wts-1/LATS and the downstream regulator of gene expression, yap, in age-related neurodegeneration in C. elegans touch-responsive mechanosensory neurons, ALM and PLM. The study shows that knockdown or deletion of wts-1/LATS causes age-associated morphological abnormalities of these neurons, accompanied by functional loss of touch responsiveness. This is further associated with enhanced, abnormal, microtubule stabilization in these neurons. Strong pharmacological and especially genetic manipulations of MT-stabilizing or severing proteins show a strong genetic link between yap and regulation of MTs stability. The study is strong and uses robust approaches, especially strong genetics. The demonstrations on the aging-related roles of the Hpo signaling pathway, and the link to MTs, are novel and compelling. Nevertheless, the study also has mechanistic weaknesses (see below).

      Weaknesses:

      Specific comments:

      (1) The study demonstrates age-specific roles of the Hpo pathway, specifically of wts-1/LATS and yap, specifically in TRN mechanosensory neurons, without observing developmental defects in these neurons, or effects in other neurons. This is a strong demonstration. Nevertheless, the study does not address whether there is a correlation of Hpo signaling pathway activity decline specifically in these neurons, and not other neurons, and at the observed L4 stage and onwards (including the first day of adulthood, 1DA stage). Such demonstrations of spatio-temporal regulation of the Hpo signaling pathway and its activation seem important for linking the Hpo pathway with the observed age-related neurodegeneration. Can this age-related response be correlated to indeed a decline in Hpo signaling during adulthood? Especially at L4 and onwards? It will be informative to measure this by examining the decline in wts-1 as well as yap levels and yap nuclear localization.

      As described above, we have included possible explanations for the specificity of the Hpo pathway in TRNs. Since components of the Hpo pathway are expressed in various tissues, including the intestine and hypodermis, this pathway could have broader neuroprotective roles across multiple neurons. Alternatively, it could function specifically in TRNs. Given that TRNs possess microtubules that are unique in both structure and composition, and that the Hpo pathway has crucial roles in regulating microtubule stability, the roles of the Hpo pathway may indeed be TRN-specific. As we described in the manuscript, our observations, along with those of others, indicate that neuronal deformation of TRNs begins around the 4th day of adulthood. Additionally, the degree of morphological deformation in wts-1 mutants at the L4 stage is comparable to that of aged wild-type worms on the 15th day of adulthood. Therefore, to assess the functional decline of WTS-1 or nuclear localization of YAP-1, observations should begin in 4-day-old animals. Using fluorescence-tagged YAP-1 under the mec-4 promoter, we could not detect a significant increase in nuclear YAP-1 in TRNs of 4-day-old adults. Additionally, we were unable to assess YAP-1 intracellular localization in older animals, such as 10-day-old animals, possibly due to the small cell size of neurons or the morphological alteration of TRNs with aging. Although we did not detect functional decline of WTS-1 or increased nuclear YAP-1 in TRNs, nuclear localization of YAP-1 increases with age in other tissues, such as the intestine and hypodermis (Author response image 1). This may result from inactivation of the Hippo (Hpo) pathway, an indirect consequence of structural and functional decline (such as tissue stiffness associated with aging), or a combination of both. Additionally, given that morphological deformation of TRNs appears to begin around the fourth day of adulthood, nuclear localization of YAP-1 in the intestine and hypodermis seems to have a later onset and to be more moderate. It is possible that YAP-1 nuclear localization in TRNs occurs earlier or that other factors contribute to early-stage touch neuronal deformation.

      Author response image 1.

      Quantification of the proportion of worms exhibiting nuclear localization of YAP-1. We used GFP-tagged YAP-1 driven by its own 4 kb promoter. A total of 90 animals were observed each day.

      (2) The Hpo pathway eventually activates gene expression via yap. Although the study uses robust genetic manipulations of yap and wts-1/LATS, it is not clear whether the observed effects are attributed to yap-mediated regulation of gene expression (see 3).

      Given that the neuronal deformation in the wts-1 mutant was completely restored by the loss of yap-1 or egl-44, it strongly suggests that YAP-TEAD-mediated transcriptional regulation is responsible for the premature neuronal degeneration of the wts-1 mutant. However, in this study, we were unable to identify specific transcriptional target genes associated with these phenomena, which represents a limitation of our research (please see below).

      (3) The observations on the abnormal MT stabilization, and the subsequent genetic examinations of MT-stability/severing genes, are a significant strength of the study. Nevertheless, despite the strong genetic links to yap and wts-1/LATS, it is not clear whether MT-regulatory genes are regulated by transcription downstream of the Hpo pathway, thus not enabling a strong causal link between MT regulation and Hpo-mediated gene expression, making this strong part of the study mechanistically circumstantial. Specifically, it will be good to examine whether the genes addressed herein, for example, Spastin, are transcriptionally regulated downstream of the Hpo pathway. This comment is augmented by the finding that in the wts-1/yap-1 double mutants, MT abnormality, and subsequent neuronal morphology and touch responses are restored, clearly indicating that there is an associated transcriptional regulation.

      If the target genes of YAP-1 are not identified, it will be difficult to fully understand how YAP-1 regulates microtubule stability. Microtubule-stabilizing genes, whose knockdown alleviates wts-1 mutant neuronal deformation, could be potential transcriptional targets of YAP-1. Among these genes, PTRN-1 and DLK-1 contain MCAT sequences (CATTCCA/T), a well-conserved DNA motif recognized by the TEAD transcription factor, in their promoters near the transcription start site (TSS). We hypothesized that the expression of fluorescence-tagged reporters of promoter regions containing these MCAT sequences would be enhanced in the absence of wts-1 activity. Although both reporters were expressed in TRNs, they did not show significant changes in the wts-1 mutant background. We also focused on spv-1, a worm homolog of ARHGAP29, which negatively regulates RhoA. YAP is known to modulate actin cytoskeleton rigidity through transcriptional regulation of ARHGAP29 [4]. The promoter of spv-1 contains two MCAT sequences, and loss of spv-1 mitigated neuronal deformation of the wts-1 mutant. However, reporters of promoter regions containing MCAT sequences were only weakly expressed in the processes of TRNs. More importantly, ectopic expression of a dominant-negative form of rho-1/rhoA did not lead to significant deformation of TRNs. While YAP typically functions as a transcriptional co-activator, it has also been reported to repress the expression of target genes, such as DDIT4 and Trail, in collaboration with the TEAD transcription factor [5]. As the reviewer pointed out, spas-1 might be transcriptionally repressed by yap-1, given that its loss leads to premature deformation of TRNs. However, since the phenotype of the spas-1 mutant has a later onset than that of the wts-1 mutant and is relatively restricted to ALM, we excluded it from our candidate gene search. Despite extensive genetic approaches, we were unable to establish a strong causal link between YAP-1 and the regulation of microtubule stability. Unbiased screenings, such as tissue-specific transcriptome analysis, may help address the remaining questions. We have outlined the limitations of this study in the discussion section of the revised manuscript.
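      The MCAT consensus quoted above (CATTCCA/T) can be written as a simple sequence pattern. The following is a minimal sketch of how candidate sites might be located in a promoter sequence. It is not the authors' procedure: the function name, the toy sequence, and the decision to scan both strands are illustrative assumptions, and a real analysis would start from the annotated promoter regions of the genes of interest.

```python
import re

# MCAT consensus from the response: CATTCCA/T, i.e. CATTCC followed by A or T.
# Lookaheads are used so that overlapping matches are not skipped.
MCAT_FORWARD = re.compile(r"(?=(CATTCC[AT]))")
# Reverse complement of the consensus, to catch sites on the opposite strand.
MCAT_REVERSE = re.compile(r"(?=([AT]GGAATG))")


def find_mcat_sites(sequence):
    """Return sorted (position, strand, matched_sequence) tuples.

    Positions are 0-based offsets on the strand as given.
    """
    seq = sequence.upper()
    hits = [(m.start(), "+", m.group(1)) for m in MCAT_FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in MCAT_REVERSE.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (hypothetical, not a real promoter).
    toy_promoter = "GGCATTCCATTTAGGAATGCC"
    for position, strand, site in find_mcat_sites(toy_promoter):
        print(position, strand, site)
```

      A motif match of this kind only nominates candidates; as the response notes, reporters containing such sites did not change in the wts-1 background, so the presence of a motif does not by itself indicate regulation.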

      Other comments:

      (1) The TRN-specific knockdown of wts-1 and yap-1 is a clear strength. Nevertheless, these do not necessarily show cell-autonomous effects, as the yap transcription factor may regulate the expression of external cues, secreted or otherwise, thus generating non-cell autonomous effects. For example, it is known that yap regulates TGF-beta expression and signaling.

      In the absence of LATS1/2 activity, activated YAP has been reported to drive biliary epithelial cell lineage specification by directly regulating TGF-β transcription during and after liver development [6]. Even when functioning in an autocrine manner, TGF-β can exhibit non-cell-autonomous effects. While it primarily acts on the same cell that secretes it, some molecules may also affect neighboring cells, leading to paracrine effects. Additionally, TGF-β can modify the extracellular matrix (ECM), indirectly affecting surrounding cells. Similarly, if YAP regulates transcription of secreted proteins in TRNs, the resulting extracellular factors or surrounding cells may influence touch neuronal microtubules in a non-cell-autonomous manner. Although our genetic data strongly suggest a cell-autonomous function of WTS-1-YAP-1 in TRNs, we could not exclude the possibility that YAP-1 functions non-cell-autonomously, as we were unable to identify its transcriptional targets. We have included this in the discussion section of the revised manuscript.

      (2) Continuing from comment (3) above, it seems that many of the MT-regulators chosen here for genetic examinations were chosen based on demonstrated roles in neurodegeneration in other studies. It would be good to show whether these MT-associated genes are directly regulated by transcription by the Hpo pathway.

      As we described above, several MT-associated genes, such as ptrn-1, dlk-1 and spv-1, contain MCAT sequences in their promoters, and their knockdown alleviated wts-1-induced neuronal deformation. These genes were tested to determine whether they were directly regulated by WTS-1-YAP-1. Based on our findings, we concluded that they were unlikely to be regulated by the Hpo pathway in TRNs.

      (3) The impairment of the touch response may not be robust: it is only a 30-40% reduction at L4, and even less reduction at 1DA. It would be good to offer possible explanations for this finding.

      As pointed out by the reviewer, the touch responses of wts-1 mutants showed an approximately 33% reduction at both L4 and 1DA compared to age-matched wild-type animals. At the L4 stage, control worms responded to nearly every gentle touch (94%), whereas wts-1 mutants responded to only 60% of stimuli. By 1DA, control worms exhibited a slight decline in touch responses compared to L4 (82.5%), whereas wts-1 mutants displayed a more pronounced impairment (55.7%) (Fig 1E). Considering the severity and frequency of structural degeneration in wts-1 mutants at both stages, this functional impairment appears relatively moderate. As we noted in the manuscript, our observations, along with those of others, indicate that structural abnormalities in ALM and PLM neurons begin to appear around the fourth day of adulthood and progressively worsen as the worms age [7]. In a previous study, Tank et al. categorized day 10 worms into two groups based on their movement ability and then assessed structural deformation in each animal to determine whether structural and functional degeneration of TRNs were correlated. In these same groups of animals, they examined the gentle touch response and found that animals responded to gentle touch at rates of 46 ± 5.1% and 84 ± 12.2%, respectively [8]. On average, day 10 animals therefore had a touch response of about 65%, which is consistent with our observation in day 10 animals (Fig. 5E, 56.3%). Given these observations, the function of TRNs in wts-1 mutants or aged animals appears to be preserved despite severe structural failures. The gentle touch response evokes an escape behavior in which animals quickly move away from the stimulus; thus, proper touch responses are essential for avoiding predators and ensuring survival. The response has been reported to be necessary for evading fungal predation, such as escaping from a constricting hyphal ring [9]. Given that the gentle touch response is crucial for survival, its function is likely well preserved despite structural abnormalities, such as age-related deformation.
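      The percentages quoted in this response can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that "reduction" means a reduction relative to the age-matched control, and that the two day 10 groups from Tank et al. are weighted equally when averaged; neither assumption is stated in the response, and the function name is hypothetical.

```python
def relative_reduction(control_pct, mutant_pct):
    """Percent reduction of the mutant response relative to the control."""
    return 100.0 * (control_pct - mutant_pct) / control_pct


# Response rates quoted in the response (Fig 1E): control vs. wts-1 mutant.
l4_reduction = relative_reduction(94.0, 60.0)
da1_reduction = relative_reduction(82.5, 55.7)

# Unweighted mean of the two day 10 groups reported by Tank et al.
# (assumes equal group sizes, which the response does not state).
day10_mean = (46.0 + 84.0) / 2

if __name__ == "__main__":
    print(f"L4 relative reduction:  {l4_reduction:.1f}%")
    print(f"1DA relative reduction: {da1_reduction:.1f}%")
    print(f"Day 10 mean response:   {day10_mean:.1f}%")
```

      Under these assumptions the relative reductions come out at roughly 36% (L4) and 32% (1DA), in the neighbourhood of the approximately 33% quoted, and the day 10 mean is 65%.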

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) Why is the effect of the Hippo pathway on microtubule dynamics specific to TRNs? Is it the structure of TRNs that makes them prone to the effects of age-dependent decline in microtubule dynamics? The authors are advised to discuss it in their resubmission.

      As described above, we have included possible explanations for the tissue specificity of the Hpo pathway in TRNs and the vulnerability of TRNs to age-associated decline in the discussion section of the revised manuscript.

      (2) The authors are advised to explain the shorter life span of wts-1; yap-1 double mutants (with restored TRNs) compared to wts-1 single mutants in Figure 2F. The life span of yap-1 single mutants should be included in Figure 2F. Further, based on the data, the shorter lifespan of wts-1 mutants cannot be attributed to abnormal TRNs as the lifespan of wts-1; yap-1 double mutants is even shorter. The authors are advised to explain the shorter life span of wts-1 mutants compared to wild-type controls.

      wts-1 is known to be involved in various developmental processes, including the maintenance of apicobasal polarity in the intestine, growth rate control, and dauer formation [10-12]. Since WTS-1 activity is restored in the intestine of the mutant used for lifespan measurement, the shorter lifespan of the wts-1 mutant may result from the loss of WTS-1 in tissues other than the intestine. Although we were unable to include lifespan data for the yap-1 mutant, recent studies indicate that yap-1(tm1416) mutants or yap-1 RNAi-treated worms exhibit a shortened lifespan [13, 14]. Thus, our data showing a slightly shorter lifespan of the wts-1; yap-1 mutant compared with the wts-1 mutant may result from the synergistic action of yap-1 and yap-1-independent downstream factors of wts-1. While this study does not provide an explanation for the shortened lifespan of wts-1 or wts-1; yap-1 mutants, the fact that the wts-1; yap-1 double mutant with restored TRNs still has a shorter lifespan than the wts-1 mutant strongly suggests that premature deformation of wts-1 mutant neurons is a touch neuron-specific event, rather than one associated with the whole body, as described in the manuscript.

      Minor comments:

      (1) In the abstract, please provide definitions for LATS and YAP. Authors can mention that LATS is a kinase and YAP a transcriptional co-activator in the Hippo pathway.

      (2) In the last paragraph on page 9, change "these function" to "this function", and change "knock-downed" to "knocked down".

      (3) On page 10, paragraph 2, change "regarding the action mechanism" to "regarding the mechanism of action".

      (4) On page 11, paragraph 1, change "endogenous WTS-1 could inhibits" to "endogenous WTS-1 could inhibit".

      (5) On page 16, paragraph 1, change "consistent to the hypothesis" to "consistent with this hypothesis".

      (6) Overall, the paper is well written. However, there is still room to improve the language and diction used by the authors.

      We have addressed all of the reviewer's minor comments in the revised manuscript.

      References

      (1) Hamelin M, Scott IM, Way JC, Culotti JG. The mec-7 beta-tubulin gene of Caenorhabditis elegans is expressed primarily in the touch receptor neurons. EMBO J. 1992;11(8):2885-93. Epub 1992/08/01. doi: 10.1002/j.1460-2075.1992.tb05357.x. PubMed PMID: 1639062; PubMed Central PMCID: PMC556769.

      (2) Fukushige T, Siddiqui ZK, Chou M, Culotti JG, Gogonea CB, Siddiqui SS, et al. MEC-12, an alpha-tubulin required for touch sensitivity in C. elegans. J Cell Sci. 1999;112(Pt 3):395-403. Epub 1999/01/14. doi: 10.1242/jcs.112.3.395. PubMed PMID: 9885292.

      (3) Chalfie M, Thomson JN. Structural and functional diversity in the neuronal microtubules of Caenorhabditis elegans. J Cell Biol. 1982;93(1):15-23. Epub 1982/04/01. doi: 10.1083/jcb.93.1.15. PubMed PMID: 7068753; PubMed Central PMCID: PMC2112106.

      (4) Qiao Y, Chen J, Lim YB, Finch-Edmondson ML, Seshachalam VP, Qin L, et al. YAP Regulates Actin Dynamics through ARHGAP29 and Promotes Metastasis. Cell Rep. 2017;19(8):1495-502. Epub 2017/05/26. doi: 10.1016/j.celrep.2017.04.075. PubMed PMID: 28538170.

      (5) Kim M, Kim T, Johnson RL, Lim DS. Transcriptional co-repressor function of the hippo pathway transducers YAP and TAZ. Cell Rep. 2015;11(2):270-82. Epub 2015/04/07. doi: 10.1016/j.celrep.2015.03.015. PubMed PMID: 25843714.

      (6) Lee DH, Park JO, Kim TS, Kim SK, Kim TH, Kim MC, et al. LATS-YAP/TAZ controls lineage specification by regulating TGFbeta signaling and Hnf4alpha expression during liver development. Nat Commun. 2016;7:11961. Epub 2016/07/01. doi: 10.1038/ncomms11961. PubMed PMID: 27358050; PubMed Central PMCID: PMC4931324.

      (7) Toth ML, Melentijevic I, Shah L, Bhatia A, Lu K, Talwar A, et al. Neurite sprouting and synapse deterioration in the aging Caenorhabditis elegans nervous system. J Neurosci. 2012;32(26):8778-90. Epub 2012/06/30. doi: 10.1523/JNEUROSCI.1494-11.2012. PubMed PMID: 22745480; PubMed Central PMCID: PMC3427745.

      (8) Tank EM, Rodgers KE, Kenyon C. Spontaneous age-related neurite branching in Caenorhabditis elegans. J Neurosci. 2011;31(25):9279-88. Epub 2011/06/24. doi: 10.1523/JNEUROSCI.6606-10.2011. PubMed PMID: 21697377; PubMed Central PMCID: PMC3148144.

      (9) Maguire SM, Clark CM, Nunnari J, Pirri JK, Alkema MJ. The C. elegans touch response facilitates escape from predacious fungi. Curr Biol. 2011;21(15):1326-30. Epub 2011/08/02. doi: 10.1016/j.cub.2011.06.063. PubMed PMID: 21802299; PubMed Central PMCID: PMC3266163.

      (10) Cai Q, Wang W, Gao Y, Yang Y, Zhu Z, Fan Q. Ce-wts-1 plays important roles in Caenorhabditis elegans development. FEBS Lett. 2009;583(19):3158-64. Epub 2009/09/10. doi: 10.1016/j.febslet.2009.09.002. PubMed PMID: 19737560.

      (11) Kang J, Shin D, Yu JR, Lee J. Lats kinase is involved in the intestinal apical membrane integrity in the nematode Caenorhabditis elegans. Development. 2009;136(16):2705-15. Epub 2009/07/15. doi: 10.1242/dev.035485. PubMed PMID: 19605499.

      (12) Lee H, Kang J, Ahn S, Lee J. The Hippo Pathway Is Essential for Maintenance of Apicobasal Polarity in the Growing Intestine of Caenorhabditis elegans. Genetics. 2019;213(2):501-15. Epub 2019/07/29. doi: 10.1534/genetics.119.302477. PubMed PMID: 31358532; PubMed Central PMCID: PMC6781910.

      (13) Teuscher AC, Statzer C, Goyala A, Domenig SA, Schoen I, Hess M, et al. Longevity interventions modulate mechanotransduction and extracellular matrix homeostasis in C. elegans. Nat Commun. 2024;15(1):276. Epub 2024/01/05. doi: 10.1038/s41467-023-44409-2. PubMed PMID: 38177158; PubMed Central PMCID: PMC10766642.

      (14) Saul N, Dhondt I, Kuokkanen M, Perola M, Verschuuren C, Wouters B, et al. Identification of healthspan-promoting genes in Caenorhabditis elegans based on a human GWAS study. Biogerontology. 2022;23(4):431-52. Epub 2022/06/25. doi: 10.1007/s10522-022-09969-8. PubMed PMID: 35748965; PubMed Central PMCID: PMC9388463.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Review:

      Reviewer #1 (Public review): 

      Summary: 

      Odor- and taste-sensing are mediated by two different systems, the olfactory and gustatory systems, and have different behavioral roles. In this study, Wei et al. challenge this dichotomy by showing that odors can activate gustatory receptor neurons (GRNs) in Drosophila to promote feeding responses, including the proboscis extension response (PER) that was previously thought to be driven only by taste. While previous studies suggested that odors can promote PER to appetitive tastants, Wei et al. go further to show that odors alone cause PER, this effect is mediated through sweet-sensing GRNs, and sugar receptors are required. The study also shows that odor detection by bitter-sensing GRNs suppresses PER. The authors' conclusions are supported by behavioral assays, calcium imaging, electrophysiological recordings, and genetic manipulations. The observation that both attractive and aversive odors promote PER leaves an open question as to why this effect is adaptive. Overall, the study sheds new light on chemosensation and multimodal integration by showing that odor and taste detection converge at the level of sensory neurons, a finding that is interesting and surprising while also being supported by another recent study (Dweck & Carlson, Sci Advances 2023).

      Strengths: 

      (1) The main finding that odors alone can promote PER by activating sweet-sensing GRNs is interesting and novel.

      (2) The study uses video tracking of the proboscis to quantify PER rather than manual scoring, which is typically used in the field. The tracking method is less subjective and provides a higher-resolution readout of the behavior.

      (3) The study uses calcium imaging and electrophysiology to show that odors activate GRNs. These represent complementary techniques that measure activity at different parts of the GRN (axons versus dendrites, respectively) and strengthen the evidence for this conclusion. 

      (4) Genetic manipulations show that odor-evoked PER is primarily driven by sugar GRNs and sugar receptors rather than olfactory neurons. This is a major finding that distinguishes this work from previous studies of odor effects on PER and feeding (e.g., Reisenman & Scott, 2019; Shiraiwa, 2008) that assumed or demonstrated that odors were acting through olfactory neurons.

      We appreciate the reviewer’s positive assessment of the novelty and significance of our work.

      Weaknesses/Limitations: 

      (1) The authors may want to discuss why PER to odors alone has not been previously reported, especially as they argue that this is a broad effect evoked by many different odors. Previous studies testing the effect of odors on PER only observed odor enhancement of PER to sugar (Oh et al., 2021; Reisenman & Scott, 2019; Shiraiwa, 2008) and some of these studies explicitly show no effect of odor alone or odor with low sugar concentration; regardless, the authors likely would have noticed if PER to odor alone had occurred. Readers of this paper may also be aware of unpublished studies failing to observe an effect of PER on odor alone (including studies performed by this reviewer and unrelated work by other colleagues in the field), which of course the authors are not expected to directly address but may further motivate the authors to provide possible explanations.

      We appreciate the reviewer’s comment. We believe that differences in genotype are likely the main reason. The strength of odor-evoked PER varied widely across genotypes and was quite weak in some strains, including the commonly used w[1118] empty Gal4 and w[1118] empty split Gal4, as shown in Figure 1-figure supplement 3 (Figure S3 in original submission). However, given that we observed odor-evoked PER in various genotypes (many in the main Figures and three in Figure 1-figure supplement 3, including Drosophila simulans), the data illustrate that it is a general phenomenon in Drosophila. Indeed, although Oh et al. (2021) did not emphasize it in the text, their Fig. 1E showed that yeast odor evoked PER at a probability of 20%, which is much higher than the rate of spontaneous PER in many genotypes. This literature may therefore provide further support for the presence of odor-evoked PER. We have expanded our text in the Discussion to describe these issues.

      Another possibility is our use of DeepLabCut to quantitatively track the kinematics of proboscis movement, which may have facilitated the detection of PER.

      (2) Many of the odor effects on behavior or neuronal responses were only observed at very high concentrations. Most effects seemed to require concentrations of at least 10<sup>-2</sup> (0.01 v/v), which is at the high end of the concentration range used in olfactory studies (e.g., Hallem et al., 2004), and most experiments in the paper used a far higher concentration of 0.5 v/v. It is unclear whether these are concentrations that would be naturally encountered by flies.

      We acknowledge that the concentrations used are on the higher side, suggesting that GRNs may need to be stimulated with relatively concentrated odors to induce PER. Although it is difficult to determine the naturalistic range of odor concentration, it is widely reported that olfactory neurons, including olfactory receptor neurons and projection neurons, do not saturate and exhibit odor identity-dependent responses at a concentration of 10<sup>-2</sup>, where odor-evoked PER can be observed. Furthermore, we have shown in Figure 6 that a low concentration (10<sup>-4</sup>) of banana odor, ethyl butyrate, and 4-methylcyclohexanol all significantly increased the rate of odor-taste multisensory PER even in flies whose olfactory organs had been removed, suggesting that low-concentration odors can influence feeding behavior via GRNs in a natural context where odors and tastants coexist at food sites. Finally, we note that odors were further diluted by a factor of 0.375 by mixing the odor stream with the main air stream before being applied to the flies, as described in Methods.

      (3) The calcium imaging data showing that sugar GRNs respond to a broad set of odors contrasts with results from Dweck & Carlson (Sci Adv, 2023) who recorded sugar neurons with electrophysiology and observed responses to organic acids, but not other odors. This discrepancy is not discussed.  

      As the reviewer points out, Dweck and Carlson (Sci Adv, 2023) reported, using single sensillum electrophysiology (base recording), that sugar GRNs only respond to organic acids, whereas we found, using calcium imaging from a group of axons and single sensillum electrophysiology (tip recording), that these GRNs respond to a wide variety of odors. Given that we observed odor responses using two methods, the discrepancy is likely due to differences in the genotypes examined. We have now discussed this point in the text.

      (4) Related to point #1, it would be useful to see a quantification of the percent of flies or trials showing PER for the key experiments in the paper, as this is the standard metric used in most studies and would help readers compare PER in this study to other studies. This is especially important for cases where the authors are claiming that odor-evoked PER is modulated in the same way as previously shown for sugar (e.g., the effect of starvation in Figure S4).

      For starved flies, we would like to remind the reviewer that the percentage of trials showing PER is reported in Fig. 1E, which shows a similar trend as the integrated PER duration. For fed flies, we have analyzed the percentage of PER and added the result to Figure 2-figure supplement 1C (Figure S4 in original submission).

      (5) Given the novelty of the finding that odors activate sugar GRNs, it would be useful to show more examples of GCaMP traces (or overlaid traces for all flies/trials) in Figure 3. Only one example trace is shown, and the boxplots do not give us a sense of the reliability or time course of the response. A related issue is that the GRNs appear to be persistently activated long after the odor is removed, which does not occur with tastes. Why should that occur? Does the time course of GRN activation align with the time course of PER, and do different odors show differences in the latency of GRN activation that correspond with differences in the latency of PER (Figure S1A)?

      Following the reviewer’s suggestion, we now report GCaMP responses for all the trials in all the flies (both Gr5a>GCaMP and Gr66a>GCaMP flies), where the time course and trial-to-trial/animal-to-animal variability of calcium responses can be observed (Figure 3-figure supplement 2).

      Regarding the second point, we recorded responses to both sucrose and odors in some flies and found that calcium responses of GRNs are long-lasting not only to odors but also to sucrose, as shown in Author response image 1. This may be due in part to the properties of GCaMP6s and slower decay of intracellular calcium concentration as compared to spikes.

      Author response image 1.

      Example calcium responses to sucrose and odor (MCH) in the same fly (normalized by the respective peak responses to better illustrate the time course of responses). Sucrose (blue) and odor (orange) concentrations are 100 mM and 10<sup>-1</sup>, respectively. Odor stimulation begins at 5 s and lasts for 2 s. Sucrose was applied with the same timing and duration, although the precise timing and duration of tastant application could not be fully controlled. Because of this limitation, we did not quantify the off time constants of the two responses.

      To address whether the time course of GRN activation aligns with the time course of PER, and whether different odors evoke different latencies of GRN activation that correspond to latencies of PER, we plotted the time course of GRN responses and PER, and further compared the response latencies across odors and across two types of responses in Gr5a>GCaMP6s flies. As shown in Author response image 2, no significant differences were found in response latency between the six odors for PER and odor responses. Furthermore, Pearson correlation between GRN response latencies and PER latencies was not significant (r = 0.09, p = 0.872).

      Author response image 2.

      (A) PER duration in each second in Gr5a-Gal4>UAS-GCaMP6s flies. The black lines indicate the mean and the shaded areas indicate standard error of the mean. n = 25 flies. (B) Time course of calcium responses (ΔF/F) to nine odors in Gr5a GRNs. n = 5 flies. (C) Latency to the first odor-evoked PER in Gr5a-Gal4>UAS-GCaMP6s flies. Green bar indicates the odor application period. p = 0.67, one-way ANOVA. Box plots indicate the median (orange line), mean (black dot), quartiles (box), and 5-95% range (bar). Dots are outliers. (D) Latency of calcium responses (10% of rise to peak time) in Gr5a GRNs. Green bar indicates the odor application period. p = 0.32, one-way ANOVA. Box plots indicate the median (orange line), mean (black dot), quartiles (box), and 5-95% range (bar). Dots are outliers.

      (6) Several controls are missing, and in some cases, experimental and control groups are not directly compared. In general, Gal4/UAS experiments should include comparisons to both the Gal4/+ and UAS/+ controls, at least in cases where control responses vary substantially, which appears to be the case for this study. These controls are often missing, e.g. the Gal4/+ controls are not shown in Figure 2C-G and the UAS/+ controls are not shown in Figure 2J-L (also, the legend for the latter panels should be revised to clarify what the "control" flies are). For the experiments in Figure S5, the data are not directly compared to any control group. For several other experiments, the control and experimental groups are plotted in separate graphs (e.g., Figure 2C-G), and they would be easier to visually compare if they were together. In addition, for each experiment, the authors should denote which comparisons are statistically significant rather than just reporting an overall p-value in the legend (e.g., Figure 2H-L).

      We thank the reviewer for the input. We have conducted additional experiments for four Gal4/+ controls in Figure 2 and added detailed information about control flies in the figure legend (Figure 2C-F).

      For the RNAi flies shown in Figure 2 and Figure 2-figure supplement 3, we used the recommended controls suggested by the VDRC. These control flies were crossed with tubulin-Gal4 lines to include both Gal4 and UAS control backgrounds.

      Regarding Figure S5 in original submission (current Figure 2-figure supplement 2), we now present the results of statistical tests which revealed that PER to certain odors is statistically significantly stronger than that to the solvent control (mineral oil) for both wing-removed and wing-leg-removed flies.

      For Figure 2C-F, we now plot the results for experimental and control groups side by side in each figure.

      Regarding the results of statistical tests, we have provided more information in the legend and also prepared a summary table (supplemental table). 

      (7) Additional controls would be useful in supporting the conclusions. For the Kir experiments, how do we know that Kir is effective, especially in cases where odor-evoked PER was not impaired (e.g., Orco/Kir)? The authors could perform controls testing odor aversion, for example. For the Gr5a mutant, few details are provided on the nature of the control line used and whether it is in the same genetic background as the mutant. Regardless, it would be important to verify that the Gr5a mutant retains a normal sense of smell and shows normal levels of PER to stimuli other than sugar, ruling out more general deficits. Finally, as the method of using DeepLabCut tracking to quantify PER was newly developed, it is important to show the accuracy and specificity of detecting PER events compared to manual scoring.  

      A previous study (Sato, 2023, Front Mol Neurosci) showed that avoidance of 100 μM 2-methylthiazoline (2MT) was abolished and avoidance of 1 mM 2MT was partially impaired in Orco>Kir2.1 flies. However, because Orco-Gal4 does not label all ORNs, and because we have more concrete results from flies in which all olfactory organs were removed and in which specific GRNs and Grs were manipulated, we decided to remove the data for Orco>Kir2.1 flies and have updated the text and Figure 2 accordingly.

      For the Gr5a mutant and its control, we have added detailed information about the genotype in the figure legend and in the Methods. We used the exact same lines as reported in Dahanukar et al. (2007), which we obtained from Dr. Dahanukar. Dahanukar et al. have already shown that the Gr5a mutant loses responses only to certain types of sugars (it retains normal responses to some other sugars), demonstrating that Gr5a mutants do not exhibit general deficits.

      As for the PER scoring method, we manually scored PER duration in wild type flies for the representative data and compared the results with those obtained using DeepLabCut. The two sets of results were similar (no statistical difference). We have reported the result in Figure 1-figure supplement 1C.

      (8) The authors' explanation of why both attractive and aversive odors promote PER (lines 249-259) did not seem convincing. The explanation discusses the different roles of smell and taste but does not address the core question of why it would be adaptive for an aversive odor, which flies naturally avoid, to promote feeding behavior.  

      We have extended our explanation in the Discussion by adding the following possibility: “Enhancing PER to aversive odors might also be adaptive as animals often need to carry out the final check by tasting a trace amount of potentially dangerous substances to confirm that those should not be further consumed.”

      Reviewer #2 (Public review): 

      Summary: 

      A gustatory receptor and neuron enhances an olfactory behavioral response, proboscis extension. This manuscript clearly establishes a novel mechanism by which a gustatory receptor and neuron evokes an olfactory-driven behavioral response. The study expands recent observations by Dweck and Carlson (2023) that suggest new and remarkable properties among GRNs in Drosophila. Here, the authors articulate a clear instance of a novel neural and behavioral mechanism for gustatory receptors in an olfactory response.

      Strengths: 

      The systematic and logical use of genetic manipulation, imaging and physiology, and behavioral analysis makes a clear case that gustatory neurons are bona fide olfactory neurons with respect to proboscis extension behavior.

      Weaknesses: 

      No weaknesses were identified by this reviewer.  

      We appreciate the reviewer’s recognition of the novelty and significance of our work.

      Reviewer #3 (Public review): 

      Summary: 

      Using flies, Kazama et al. combined behavioral analysis, electrophysiological recordings, and calcium imaging experiments to elucidate how odors activate gustatory receptor neurons (GRNs) and elicit a proboscis extension response, which is interpreted as a feeding response. 

      The authors used DeepLabCut v2.0 to estimate the extension of the proboscis, which represents an unbiased and more precise method for describing this behavior compared to manual scoring.

      They demonstrated that the probability of eliciting a proboscis extension increases with higher odor concentrations. The most robust response occurs at a 0.5 v/v concentration, which, despite being diluted in the air stream, remains a relatively high concentration. Although the probability of response is not particularly high it is higher than control stimuli. Notably, flies respond with a proboscis extension to both odors that are considered positive and those regarded as negative.

      The authors used various transgenic lines to show that the response is mediated by GRNs.

      Specifically, inhibiting Gr5a reduces the response, while inhibiting Gr66a increases it in fed flies. Additionally, they find that odors induce a strong positive response in both types of GRNs, which is abolished when the labella of the proboscis are covered. This response was also confirmed through electrophysiological tip recordings.

      Finally, the authors demonstrated that the response increases when two stimuli of different modalities, such as sucrose and odors, are presented together, suggesting clear multimodal integration.

      Strengths: 

      The integration of various techniques, that collectively support the robustness of the results.

      The assessment of electrophysiological recordings in intact animals, preserving natural physiological conditions.

      We appreciate the reviewer’s recognition of the novelty and significance of our work.

      Weaknesses: 

      The behavioral response is observed in only a small proportion of animals.  

      We acknowledge that the probability of odor-evoked PER is lower than that of sucrose-evoked PER, which is close to 100% depending on the concentration. To further quantify the proportion of animals that exhibit odor-evoked PER, we now report this number alongside the probability of PER for each odor shown in Fig. 1E. We found that, in wild type Dickinson flies, 73% and 68% of flies exhibited PER to at least one odor presented at concentrations of 0.5 and 0.1, respectively.
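
      As an illustration only, the "proportion of flies showing PER to at least one odor" described above could be computed as in the following sketch. The data layout (one dictionary per fly, mapping an odor label to whether PER occurred), the function name, and the example values are all hypothetical and are not taken from the study.

```python
def fraction_responding(per_by_fly):
    """Fraction of flies that showed PER to at least one odor.

    per_by_fly: list of dicts (hypothetical layout), one per fly,
    mapping odor label -> True if that fly showed PER to that odor.
    """
    responders = sum(1 for odors in per_by_fly if any(odors.values()))
    return responders / len(per_by_fly)


# Made-up example data for four flies and two odors.
flies = [
    {"odor_A": True, "odor_B": False},
    {"odor_A": False, "odor_B": False},
    {"odor_A": False, "odor_B": True},
    {"odor_A": True, "odor_B": True},
]
fraction = fraction_responding(flies)  # 3 of 4 flies respond to at least one odor
```

      Counting a fly once regardless of how many odors evoked PER is what distinguishes this number from the per-odor response probability.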

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Minor comments/suggestions: 

      - Define "MO" in Figure 1D.  

      We have defined it as mineral oil in the figure legend.

      - Clarify how peak response was calculated for GCaMP traces (is it just the single highest frame per trial?).

      We extended the description in the Methods as follows: “The peak stimulus response was quantified by averaging ΔF/F across five frames at the peak, followed by averaging across three trials for each stimulus. Odor stimulation began at frame 11, and the frames used for peak quantification were 12 to 16.” We made sure that information about the image acquisition frame rate was provided earlier in the text.
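
      The quantification quoted above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study; it assumes that frames are numbered from 1 (so frames 12 to 16 map to Python indices 11 to 15) and that each trial is a plain list of ΔF/F values. The function name and example traces are hypothetical.

```python
from statistics import mean

# Assumption: frame numbers in the text are 1-based, so frames 12-16
# correspond to zero-based indices 11-15.
PEAK_FRAMES = range(11, 16)


def peak_response(trials):
    """Average dF/F over the five peak frames per trial, then across trials."""
    per_trial = [mean(trace[i] for i in PEAK_FRAMES) for trace in trials]
    return mean(per_trial)


# Made-up traces: 20 frames each, flat baseline with a step during frames 12-16.
trial_a = [0.0] * 11 + [1.0] * 5 + [0.0] * 4
trial_b = [0.0] * 11 + [2.0] * 5 + [0.0] * 4
trial_c = [0.0] * 11 + [3.0] * 5 + [0.0] * 4
response = peak_response([trial_a, trial_b, trial_c])  # mean of 1.0, 2.0, 3.0
```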

      - Clarify how the labellum was covered in Figure 3 and show that this does not affect the fly's ability to do PER (e.g., test PER to sugar stimulation on tarsus) - otherwise one might think that gluing the labella could affect PER.

      In Figure 3, only calcium responses were recorded, and PER was not recorded simultaneously from the same flies. To ensure stable recording from GRN axons in the SEZ, we kept the fly’s proboscis in an extended position as gently as possible using a strip of parafilm. In some of the imaging experiments, we covered the labellum with UV curable glue, whose purpose was not to fix the labellum in an extended position but to prevent the odors from interacting with GRNs on the labellum. We have added a text in the Methods to explain how we covered the labellum.

      - Clarify how the coefficients for the linear equation were chosen in Figure 3G.  

      We used linear regression (implemented in Python using scikit-learn) to model the relationship between neural activity and behavior, aiming to predict the PER duration based on the calcium responses of two GRN types, Gr5a and Gr66a. The coefficients were estimated using the LinearRegression function. We added this description to the Methods. 

      - Typo in "L-type", Figure 4A.  

      We appreciate the reviewer for pointing out this error and have corrected it.

      - Clarify over what time period ephys recordings were averaged to obtain average responses.

      We have modified the description in the Methods as follows: “The average firing rate was quantified by using the spikes generated between 200 and 700 ms after the stimulus contact following the convention to avoid the contamination of motion artifact (Dahanukar and Benton, 2023; Delventhal et al., 2014; Hiroi et al., 2002).”
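
      A minimal sketch of this windowed firing-rate calculation is given below. It assumes spike times are expressed in milliseconds relative to stimulus contact and that the window is half-open (200 ms included, 700 ms excluded); both are assumptions, and the function name and spike times are hypothetical.

```python
def mean_firing_rate(spike_times_ms, start_ms=200.0, end_ms=700.0):
    """Firing rate (spikes/s) from spikes inside [start_ms, end_ms).

    spike_times_ms: spike times in ms relative to stimulus contact
    (assumed convention).
    """
    n_spikes = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    window_s = (end_ms - start_ms) / 1000.0
    return n_spikes / window_s


# Made-up spike train: four spikes fall inside the 200-700 ms window.
spikes = [50.0, 210.0, 300.0, 450.0, 699.0, 700.0, 900.0]
rate = mean_firing_rate(spikes)  # 4 spikes / 0.5 s
```

      Excluding the first 200 ms discards the period most likely to contain the contact artifact, at the cost of ignoring the earliest part of the response.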

      - The data and statistics indicate that MCH does not enhance feeding in Figure 6G, so the text in lines 207-208 is not accurate.

      We have modified the text as follows: “A similar result was observed with ethyl butyrate, and a slight, although not significant, increase was also observed with 4-methylcyclohexanol (Figure 6G).”

      - P-value for Figure S9 correlation is not reported.  

      We appreciate the reviewer for pointing this out. The p-value is 0.00044, and we have added it to the figure legend (current Figure 5-figure supplement 1).

      Reviewer #2 (Recommendations for the authors): 

      Honestly, I have no recommendations for improvement. The manuscript is extremely well-written and logical. The experiments are persuasive. A lapidary piece of work.

      We appreciate the reviewer for the positive assessment of our work.

      Reviewer #3 (Recommendations for the authors): 

      - I suggest explaining the rationale for selecting a 4-second interval, beginning 1 second after the onset of stimulation.

      Integrated PER duration was defined as the sum of PER duration over 4 s starting 1 s after the odor onset. This definition was set based on the following data.

      (1) We used a photoionization detector (PID) to measure the actual time that the odor reaches the position of a tethered fly, which was approximately 1.1 seconds after the odor valve was opened. Therefore, we began analyzing PER responses 1 second after the odor onset (valve opening) to align with the actual timing of stimulation.

      (2) As shown in Fig.1D and 1F, the majority of PER occurred within 4 s after the odor arrival.

      We have now added the above rationale in the Methods.
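
      The definition of integrated PER duration given above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: PER bouts are assumed to be available as (start, end) pairs in seconds relative to valve opening, and the function name and example bouts are hypothetical.

```python
def integrated_per_duration(bouts, window_start=1.0, window_len=4.0):
    """Sum PER time that falls within the analysis window.

    bouts: list of (start_s, end_s) PER episodes, in seconds relative
    to odor valve opening (assumed representation).
    The default window starts 1 s after valve opening and lasts 4 s.
    """
    window_end = window_start + window_len
    total = 0.0
    for start, end in bouts:
        # Keep only the part of each bout that overlaps the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            total += overlap
    return total


# Made-up bouts: the first and last are clipped by the window edges.
bouts = [(0.5, 1.5), (2.0, 3.0), (4.5, 6.0)]
duration = integrated_per_duration(bouts)  # 0.5 + 1.0 + 0.5 s
```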

      - I could not find the statistical analysis for Figures 1E and 1G. If these figures are descriptive, I suggest the authors revise the sentences: 'Unexpectedly, we found that the odors alone evoked repetitive PER without an application of a tastant (Figures 1D-1G, and Movie S1). Different odors evoked PER with different probability (Figure 1E), latency (Figure S1A), and duration (Figures 1F, 1G, and S2)'.

      We have added the results of statistical analysis to the figure legend.

      - In Figure 2, the authors performed a Scheirer-Ray-Hare test, which, to my knowledge, is a nonparametric test for comparing responses across more than two groups with two factors. If this is the case, please provide the p-values for both factors and their interaction.

      We now show the p-values for both factors, odor and group as well as their interaction in the supplementary table. 

      - In line 83, I suggest the authors avoid claiming that 'these data show the olfactory system modulates but is not required for odor-evoked PER,' as they are inhibiting most, but not all olfactory receptor neurons. In this regard, is it possible to measure the olfactory response to odors in these flies?  

      We thank the reviewer for the comment. Because Orco-Gal4 does not label all the ORNs and because we have more concrete results on flies in which all the olfactory organs are removed as well as specific GRNs and Gr are manipulated, we decided to remove the data for Orco>kir2.1 flies and have updated the text and Figure 2 accordingly.

      - In Figure 2, I wonder if there are differences in the contribution of various receptors in detecting different odors. A more detailed statistical analysis might help address this question.

      Although it might be possible to infer the contribution of different gustatory receptors by constructing a quantitative model to predict PER, this is difficult because, except for Gr5a, the manipulations in Figure 2 target the activity of individual GRNs rather than Grs. The idea could be tested in the future by more systematically manipulating the many Grs encoded in the fly genome.

      - For Figures 2J-L, please clarify which group serves as the control.  

      We have added this information to the legend. 

      - In Figure 3, I recommend including an air control in panels D and F to better appreciate the magnitude of the response under these conditions.

      The responses to all three controls, air, mineral oil and water, were almost zero. As the other reviewer suggested to present trial-to-trial variability as well, we now show responses to all the controls in all the trials in all the animals tested in Figure 3-figure supplement 2.

      - I had difficulty understanding Figure 3G. Could the authors provide a more detailed explanation of the model?

      We used linear regression (implemented in Python using scikit-learn) to model the relationship between neural activity and behavior, aiming to predict the PER duration based on the calcium responses of two GRN types, Gr5a and Gr66a. The weights for GRNs were estimated using the LinearRegression function. The weights for Gr5a and Gr66a were positive and negative, respectively, indicating that Gr5a activity enhances PER whereas Gr66a activity reduces it.

      To evaluate the model performance, we calculated the coefficient of determination (R<sup>2</sup>), which was 0.81, meaning the model explained 81% of the variance in the PER data.

      The scatter plot in Fig. 3G shows a tight relationship between the predicted PER duration (y-axis) plotted against the actual PER duration (x-axis), demonstrating a strong predictive power of the model.

      We added the details to the Methods.
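
      For readers unfamiliar with this kind of model, the following sketch fits a two-predictor ordinary least-squares regression and computes R<sup>2</sup>. It is not the analysis code from the study: it uses NumPy's least-squares solver in place of scikit-learn's LinearRegression (which fits the same ordinary least-squares model), it assumes an intercept term is included, and the data are synthetic values generated from made-up weights purely to show the mechanics.

```python
import numpy as np


def fit_per_model(gr5a, gr66a, per_duration):
    """Fit PER ~ w5a * Gr5a + w66a * Gr66a + b by ordinary least squares.

    Returns (w5a, w66a, b, r2). Including an intercept is an assumption.
    """
    gr5a = np.asarray(gr5a, dtype=float)
    gr66a = np.asarray(gr66a, dtype=float)
    y = np.asarray(per_duration, dtype=float)

    # Design matrix: one column per GRN type plus a constant column.
    X = np.column_stack([gr5a, gr66a, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Coefficient of determination from residual and total sums of squares.
    predicted = X @ coef
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return coef[0], coef[1], coef[2], r2


# Synthetic, noise-free example built from hypothetical weights (+2, -1, 0.5).
gr5a = np.array([0.1, 0.4, 0.8, 1.2, 0.6, 0.9])
gr66a = np.array([0.5, 0.2, 0.3, 0.1, 0.7, 0.4])
per = 2.0 * gr5a - 1.0 * gr66a + 0.5
w5a, w66a, b, r2 = fit_per_model(gr5a, gr66a, per)
```

      In this toy example the fit recovers a positive weight for the first predictor and a negative weight for the second, which is the sign pattern described in the response for Gr5a and Gr66a.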

      - In Figure S4a, the reported p-value is 0.88, which seems to be a typo, as the text indicates that PER is enhanced in a starved state.

      Thank you for pointing this out. We have modified the figure legend to describe that PER was enhanced in a starved state only for the experiments conducted with odors at 10<sup>-1</sup> concentration (current Figure 2-figure supplement 1).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for their tremendously helpful comments. We outline below changes we have made to the manuscript in response to each point. These include new analyses and a substantial rewrite to address the concerns about lack of clarity.

      We believe the revisions strengthen the evidence for our conclusion that grid fields can be either anchored to or independent from a task reference frame, and that anchoring is selectively associated with successful path integration-dependent behaviour. Our additional analyses of non-grid cells indicate that while some are coherent with the grid population, many are not, suggesting cell populations within the MEC may implement grid-dependent and grid-independent computations in parallel.

      We hope the reviewers will agree that our novel experimental strategy complements and avoids limitations of perturbation-based approaches, and by providing evidence to dissociate the two major hypotheses for whether and when grid cells contribute to behaviour our results are likely to have a substantial impact on the field.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Clark et al. uncovered an association between the positional encoding of grid cell activity with good performance in spatial navigation tasks that require path integration, highlighting the contribution of grid firing to behaviour… The conclusions of this paper are mostly well supported by data, the finding about the association between grid cell encoding and behaviour in spatial memory tasks is important. However, some aspects of the analysis need to be clarified or extended.

      Thank you for the overview and constructive comments.

      (1) While the current dataset aims to demonstrate a "correlation" between grid cell encoding and task performance, the other variables that could confound this correlation should be carefully examined.

      (1.1) The exact breakdown of the fraction of beaconed/non-beaconed/probe trials is never shown. If the session makeup has a significant effect on the coding scheme or other results, this variable should be accounted for.

      The lack of information about the trial organisation was a substantial oversight in our preparation of the first version of the manuscript. Session makeup cannot account for effects on grid stability and its relationship to behavioural outcome, but this was not made at all clear.

      In all sessions trial types were varied in a fixed repeating sequence. Therefore, continuous blocks of trials on which grid firing is anchored to (or independent from) the track cannot be explained by the mouse experiencing a particular trial type. We have revised the manuscript to make this clearer, e.g. p 5, ‘These switches could not be explained by variation between trials in the availability of cues or rewards, as these were interleaved in blocks that repeated throughout a session (see Methods), whereas periods in which grid cell activity was in a given mode extended across the repeating blocks (e.g. Figures 3D,E, 4A, 5E,F).’ and methods p 12, ‘Trials were delivered in repeating blocks throughout a recording session…’

      (1.2) The manuscript did not provide information about whether individual mice experienced sessions with different combinations of the three trial types, and whether they show different preferences in position or distance encoding even in comparable sessions. This leads to the question of whether different behaviour and activity encoding were dominated by experimental or natural differences between individual mice. Presenting the data per mouse will be helpful.

      As we note above, because trial types were interleaved in a fixed sequence, experience of a particular trial type cannot account for switching between task-anchored and task-independent firing modes. This was insufficiently clear in the first version of the manuscript.

      We varied the proportions of trials of a particular type between sessions with the aim of maximising the number of non-beaconed and probe trials. This was necessary because we find that if we introduce too high a proportion of these trials early in training then mice appear to ‘lose interest’ in the task and their performance drops off. We therefore used an approach in which we increased the proportions of non-beaconed and probe trials over training days as mice became familiar with the task. This is now described in the methods (p 12).

      Because the decision for when to vary the proportion of trial types was based on the previous day’s performance, the experimental design was not optimised for addressing the reviewer’s question about dissociating experimental from natural differences in mice. To provide some initial insight we have analysed the relationship between task anchored coding and proportion of beaconed trials in a session (Figure 3, Figure Supplement 7). While on average there is a higher proportion of trials in which grid fields are task-anchored in sessions with more beaconed trials, this effect is small and most of the variance is independent from the proportion of beaconed trials.

      (1.3) Related to the above point, in Figure 5, the mice appeared to behave worse in probe trials than non-beaconed trials. If the mouse did not know if a trial is a probe or a non-beacon trial, they should behave equivalently until the reward location and thus should stop an equal amount. If this difference is because multiple probe trials are placed consecutively, did the mouse learn that it will not get a reward and then stop trying to get rewards? Did this affect switching between position and distance coding?

      Thank you for flagging this. This reflected an inconsistency arising from the way we detected stops, which we have now corrected. Briefly, the temporal resolution of the processed location data to which the stop detection threshold was applied was insufficiently high. As a result, stops in the non-beaconed group were picked up, as they tended to be longer because mice remained still to consume rewards, whereas some stops in the probe group were missed because they were relatively short. We have corrected this by repeating the analyses on raw position data at the highest temporal resolution available. This analysis is now clearly described in the Methods (see p13 “A stop was registered in Blender3D if the speed of the mouse dropped below 4.7 cm/s. Speed was calculated on a rolling basis from the previous 100 ms at a rate of 60 Hz.”).
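
      The stop criterion quoted above can be illustrated with the following sketch. It is an assumption-laden reconstruction, not the Blender3D implementation: speed is taken to be the displacement over the previous 100 ms (6 samples at 60 Hz) divided by that interval, and position is assumed to be a list of track locations in cm sampled at 60 Hz. All names and example values are hypothetical.

```python
SAMPLE_RATE_HZ = 60          # position sampling rate from the quoted Methods
WINDOW_S = 0.1               # rolling window of 100 ms
STOP_THRESHOLD_CM_S = 4.7    # speed threshold from the quoted Methods


def detect_stops(positions_cm):
    """Flag samples where rolling speed over the last 100 ms is below threshold.

    Returns one boolean per sample, starting at the first sample that has
    a full 100 ms of history (an assumed handling of the session start).
    """
    lag = round(WINDOW_S * SAMPLE_RATE_HZ)  # 6 samples
    flags = []
    for i in range(lag, len(positions_cm)):
        displacement = abs(positions_cm[i] - positions_cm[i - lag])
        speed = displacement / (lag / SAMPLE_RATE_HZ)
        flags.append(speed < STOP_THRESHOLD_CM_S)
    return flags


# Made-up trajectory: 10 cm/s for 12 samples, then stationary for 12 samples.
moving = [i * 10.0 / SAMPLE_RATE_HZ for i in range(13)]
stationary = [moving[-1]] * 12
flags = detect_stops(moving + stationary)
```

      Because speed is evaluated at every 60 Hz sample, a stop lasting only a fraction of a second can still be detected, which is the property the corrected analysis relies on.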

      (1.4) It is not shown how the behaviours (e.g., running speed away from the reward zone, licking for reward) in beaconed/non-beaconed/probe trials were different and whether the difference in behaviours led to the different encoding schemes.

      Because trial types were interleaved and repeated with a period less than the length of typical trial sequences during which grid cell activity remained either task-anchored or task-independent, differences between trial types are unlikely to explain use of the different coding schemes. Hopefully, this is clarified by the comments above.

      To further describe the relationship between behavioural outcomes, trial types and grid anchoring, we now also show running speed as a function of location for each combination of trial types and trial outcomes (Figure 6, Figure Supplement 1). This illustrates and replicates our previous findings (Tennant et al. 2018) that running speed profiles are similar for a given trial outcome regardless of trial type (Figure 6, Figure Supplement 1A), and further shows that the behavioural profile for a given trial outcome and trial type does not differ when grid cells are in task-anchored and task-independent modes (Figure 6, Figure Supplement 1B). This argues against the possibility that differences in behaviour lead to the different encoding schemes.

      (2) Regarding the behaviour and activity encoding on a trial-by-trial basis, did the behavioural change occur first, or did the encoding switch occur first, or did they happen within the same trial? This analysis will potentially determine whether the encoding is causal for the behaviour, or the other way around.

      This is a good question but our experimental design lacks sufficient statistical power to address the timing of mode switches within a trial. This is because mode switching is relatively infrequent (so the n for switching is low) and only a subset of trials are uncued (making the relevant n even lower), while at a trial level the behavioural outcome is variable (increasing the required n for adequate power).

      (3) The author determined that the grid cell coding schemes were limited to distance encoding and position encoding. However, there could be other schemes, such as switching between different position encodings (with clear spatial fields but at different locations), as indicated by Low et al., 2021, and switching between different distance encodings (with different distance periods). If these other schemes indeed existed in the data, they might contribute to the variation of the behaviours.

      Switching between position encoding schemes appears to be rare within our dataset and unlikely to contribute to variation in behaviour. In most sessions we did not observe switching between grid phases / position encodings (e.g. Figures 2A-B, 3B-E, 4A, 5C-D, F). In one session we found switching between different phases when grid cells were task-anchored. Because the grid period was unchanged, the spatial periodograms remained similar. We report this example in the revised manuscript (Figure 5E).

      (4) The percentage of neurons categorised in each coding scheme was similar between non-grid and grid cells. This implies that non-grid cells might switch coding schemes in sync with grid cells, which would mean the whole MEC network was switching between distance and position coding. This raises the question of whether the grid cell coding scheme was important per se, or just the MEC network coding scheme.

      We very much appreciate this suggestion. We note first that while the proportion of task-anchored grid and non-grid cells is similar, task-independent periodic firing of non-grid cells is much rarer than for grid cells (Figure 2E), suggesting a dissociation between the populations. To further address the question we have included additional analyses of non-grid cells (Figure 3, Figure Supplement 5). This shows that while some non-grid cells have anchoring that switches coherently with simultaneously recorded grid cells, others do not. Figures 4 and 5 now show examples of non-grid cell activity recorded simultaneously with grid cells.

      Together, our data suggest that the MEC implements multiple coding schemes: one that is associated with the grid network and includes some non-grid cells; and one (or more) that can be independent from the grid network. This dissociation adds to the insights into MEC function that are provided by our study and is now highlighted in the abstract and discussion.

      (5) In Figure 2 there are several cell examples that are categorised as distance or position coding but have a high fraction of the other coding scheme on a per-trial basis. Given this variation, the full session data in F should be interpreted carefully, since this included all cells and not just "stable" coding cells. It will be cleaner to show the activity comparison only between the stable cells.

      We have now included examples in Figure 2A-C where the grid mode is stable throughout a session. As the view of activity at a session level is important, we have not updated Figure 2F, but have clarified the terminology to now clearly refer to classification at either session or trial levels. In addition, we have repeated the analyses shown in Figure 2F but after grouping cells according to whether their firing has a single mode on >85% of the trials (Figure 3 Figure Supplement 4). This analysis supports similar conclusions to those of Figure 2F.
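
      The >85% grouping rule can be sketched as follows. This is only an illustration of the stated criterion: the per-trial labels, the label strings and the function name `classify_session` are hypothetical, and how each trial is assigned a mode is described in the manuscript rather than here.

```python
from collections import Counter

def classify_session(trial_modes, stability_threshold=0.85):
    """Return the dominant mode if it covers more than the threshold fraction of trials.

    trial_modes: sequence of per-trial labels (hypothetical strings such as
    "task-anchored", "task-independent" or "aperiodic").
    """
    if not trial_modes:
        raise ValueError("at least one trial is required")
    # Find the most frequent per-trial mode and the fraction of trials it covers.
    mode, count = Counter(trial_modes).most_common(1)[0]
    if count / len(trial_modes) > stability_threshold:
        return mode
    return "unstable"

# Example: 18 of 20 trials (90%) share one mode, so the cell counts as stable.
labels = ["task-anchored"] * 18 + ["task-independent"] * 2
session_class = classify_session(labels)
```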

      (6) The manuscript is not well written. Throughout the manuscript, there are many unexplained concepts (especially in the introduction) and methods, mis-referenced figures, and unclear labels.

      We very much appreciate the feedback and have substantially rewritten the manuscript. We have paid particular attention to explaining key concepts in the introduction and have carefully checked the figures. We welcome further feedback on whether this is now clearer.

      Reviewer #2 (Public Review):

      Clark and Nolan's study aims to test whether the stability of grid cell firing fields is associated with better spatial behaviour performance on a virtual task… This study is very timely as there is a pressing need to identify/delimitate the contribution of grid cells to spatial behaviours. More studies in which grid cell activity can be associated with navigational abilities are needed.

      Thank you for the supportive comments and highlighting the importance of the question.

      The link proposed by Clark and Nolan between "virtual position" coding by grid cells and navigational performance is a significant step toward better understanding how grid cell activity might support behaviour. It should be noted that the study by Clark and Nolan is correlative. Therefore, the effect of selective manipulations of grid cell activity on the virtual task will be needed to evaluate whether the activity of grid cells is causally linked to the behavioural performance on this task. In a previous study by the same research group, it was shown that inactivating the synaptic output of stellate cells of the medial entorhinal cortex affected mice's performance of the same virtual task (Tennant et al., 2018). Although this manipulation likely affects non-grid cells, it is still one of the most selective manipulations of grid cells that are currently available.

      Again, thank you for the supportive comments. We recognise the previous version of the manuscript did not sufficiently clarify the motivation for our approach, or the benefits of capitalising on behavioural variability as a complementary strategy to perturbation approaches. We now make this clearer in the revised introduction (p 2, paragraphs 2 and 3).

      When interpreting the "position" and "distance" firing mode of grid cells, it is important to appreciate that the "position" code likely involves estimating distance. The visual cues on the virtual track appear to provide mainly optic flow to the animal. Thus, the animal has to estimate its position on the virtual track by estimating the distance run from the beginning of the track (or any other point in the virtual world).

      We appreciate the ambiguity here was confusing. We have re-named the groups to ‘task-anchored’, corresponding to when grid cells encode position on the track (as well as distance as the reviewer correctly points out), and ‘task-independent’, corresponding to the group we previously referred to as distance encoding.

      It is also interesting to consider how grid cells could remain anchored to virtual cues. Recent work shows that grid cell activity spans the surface of a torus (Gardner et al., 2022). A run on the track can be mapped to a trajectory on the torus. Assuming that grid cell activity is updated primarily from self-motion cues on the track and that the grid cell period is unlikely to be an integer of the virtual track length, having stable firing fields on the virtual track likely requires a resetting mechanism taking place on each trial. The resetting means that a specific virtual track position is mapped to a constant position on the torus. Thus, the "virtual position" mode of grid cells may involve 1) a trial-by-trial resetting process anchoring the grid pattern to the virtual cues and 2) a path integration mechanism. Just like the "virtual position" mode of grid cell activity, successful behavioural performance on non-beaconed trials requires the animal to anchor its spatial behaviour to VR cues.
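
      The resetting argument above can be made concrete with a small arithmetic sketch. The track length and grid period used in the example are illustrative values only and are not taken from the study. The sketch assumes the grid phase advances purely with distance run unless it is reset at the start of each trial.

```python
def trial_start_phases(track_length_cm, grid_period_cm, n_trials, reset_each_trial):
    """Grid phase (as a fraction of one period) at the start of each trial.

    Assumption: phase advances only with distance travelled; a reset returns
    the phase to zero at the start of the next trial.
    """
    phases = []
    distance = 0.0
    for _ in range(n_trials):
        phases.append((distance % grid_period_cm) / grid_period_cm)
        # Without a reset, distance accumulates across trials and the phase drifts.
        distance = 0.0 if reset_each_trial else distance + track_length_cm
    return phases

# Illustrative values: a 200 cm track and a 75 cm grid period.
drifting = trial_start_phases(200.0, 75.0, 4, reset_each_trial=False)
anchored = trial_start_phases(200.0, 75.0, 4, reset_each_trial=True)
```

      Without resetting, the start phase cycles through different values from trial to trial, so fields fall at different track positions on each run. With resetting, the start phase is constant and fields align across trials.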

      Reviewer #3 (Public Review):

      This study addresses the major question of 'whether and when grid cells contribute to behaviour'. There is no doubt that this is a very important question. My major concern is that I'm not convinced that this study gives a significant contribution to this question, although this study is well-performed and potentially interesting. This is mainly due to the fact that the relation between grid cell properties and behaviour is exclusively correlative and entirely based on single cell activity, although the introduction mentions quite often the grid cell network properties and dynamics. In general, this study gives the impression that grid cells exclusively support the cognitive processes involved in this task. This problem is in part related to the text.

      Thank you for the comments. We recognise now that the previous text was insufficiently clear. We have modified the introduction to clarify the value of an approach that takes advantage of behavioural variability. Importantly, this approach is complementary to perturbation strategies we and others have used previously. In particular, it addresses critical limitations of perturbation strategies, which can be confounded by off-target effects and possible adaptation, both of which are extremely difficult to fully rule out. We hope that with this additional clarification it is now clear, first, that as for any important question multiple complementary testing strategies are required to make progress, and second, that our study makes a new and important contribution by introducing a novel experimental approach and by following this up with careful analyses that clearly distinguish competing hypotheses.

      However, it would be interesting to look at the population level (even beyond grid cells) to test whether at the network level, the link between behavioural performance and neural activity is more straightforward compared to the single-cell level. This approach could reconcile the present results with those obtained in their previous study following MEC inactivation.

      We’re unclear here about what the reviewer means by ‘more straightforward’ as clear relationships between activity of single grid cells and populations of grid cells are well established (Gardner et al., 2021; Waaga et al., 2021; Yoon et al., 2013).

      To give a clearer indication of the corresponding population level representations, as mentioned in response to Reviewer #1, we now include additional data showing many simultaneously recorded neurons, and analyses of non-grid as well as grid cells (Figures 4, 5, Figure 5 Figure Supplement 2).

      To reconcile results with our previous study of MEC inactivation we have paid additional attention to the roles of non-grid cells (following suggestions by Reviewer #1). We show that while some non-grid cells show transitions between task-anchored and task-independent firing that are coherent with the grid population, many others have more stable firing that is independent of grid representations. This is consistent with the idea that the MEC supports localised behaviour in the cued and uncued versions of the task (Tennant et al., 2018), and suggests that while grid cells preferentially contribute when cues are absent, non-grid cells could also support the cued version. We make this additional implication clear in the revised abstract and discussion.

      The authors used a statistical method based on the computation of the frequency spectrum of the spatial periodicity of the neural firing to classify grid cells as 'position-coding' (with fields anchored to the virtual track) and 'distance-coding' (with fields repeating at regular intervals across trials). This is an interesting approach that has nonetheless the default to be based exclusively on autocorrelograms. It would be interesting to compare with a different method based on the similarities between raw maps.

      While our main analyses use a periodogram-based method to identify when grid cells are / are not anchored to the task environment, we validate these analyses by examination of the rate maps in each condition (Figures 2-4). For example, when grid cells are task-anchored, according to the periodogram analysis, the rate maps clearly show spatially aligned peaks, whereas when grid cells are not anchored the peaks in their rate maps are not aligned (Figure 2A vs 2B; Figure 3B-E; Figure 4C). We provide further validation by showing that spatial information (in the track reference frame) is substantially higher when grid cell activity is task-anchored vs task-independent (Figures 2F, 3G, 4F and Figure 3 Figure Supplement 4).
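
      The logic of a periodogram-based classification can be sketched as below. This is a simplified stand-in and not the authors' implementation: it uses a plain FFT over firing rate binned by position and concatenated across trials, and the tolerance value and function names are hypothetical. The underlying idea is that firing which repeats at the same track location on every trial has spectral power at integer numbers of cycles per track, whereas firing that is periodic but not anchored to the track peaks at a non-integer frequency.

```python
import numpy as np

def peak_spatial_frequency(rate, bins_per_trial):
    """Return the dominant spatial frequency in cycles per track length."""
    rate = np.asarray(rate, dtype=float)
    # Remove the mean so the zero-frequency component does not dominate.
    power = np.abs(np.fft.rfft(rate - rate.mean())) ** 2
    # Sample spacing is 1/bins_per_trial track lengths, so frequencies
    # are expressed in cycles per track.
    freqs = np.fft.rfftfreq(rate.size, d=1.0 / bins_per_trial)
    return freqs[np.argmax(power)]

def is_task_anchored(peak_freq, tolerance=0.05):
    """Hypothetical criterion: the peak lies close to an integer frequency."""
    return abs(peak_freq - round(peak_freq)) <= tolerance

# Synthetic example: 20 trials of 100 position bins each.
x = np.arange(20 * 100) / 100.0                 # position in track lengths
anchored = 1.0 + np.cos(2 * np.pi * 3.0 * x)    # 3 fields per track, aligned across trials
drifting = 1.0 + np.cos(2 * np.pi * 2.35 * x)   # periodic, but not aligned to the track
```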

      To further address this point we have carried out additional complementary analyses in which we identify task anchored vs task independent modes using a template matching method applied to the raw rate maps (Figure 6, Figure Supplement 2). These analyses support similar conclusions to our periodogram-based analyses.

      Beyond this minor point, cell categorization is performed using all trial types.

      Each trial type (i.e. beacon or non-beacon) is supposed to force mice to use different strategies and should induce different spatial representations within the entorhinal-hippocampal circuit (and not only in the grid cell system). In that context, since all trials are mixed, it is difficult to extrapolate general information.

      We recognise that the description of the task design was insufficiently clear but are unsure why ‘it is difficult to extrapolate general information’. Before addressing this point, we should first be clear that mice are not ‘forced’ to adopt any particular strategy. Rather, on uncued trials a path integration strategy is the most efficient way to solve the task. However, mice could instead use a less efficient strategy, for example by stopping at short intervals they still obtain rewards. Detailed behavioural analyses indicate that such random stopping strategies are used by naive mice, while with training mice learn to use spatial stopping strategies (Tennant et al. 2018).

      In terms of ‘extracting general information’ from the task, the following findings lead to general predictions: 1) Grid cells can exist in either task-anchored or task-independent periodic firing modes; 2) These modes can be stable across a session, but often mode-switching occurs within a session; 3) While some non-grid cells show task-independent periodic firing, this is much less common than for grid cells, which suggests a model in which many non-grid MEC neurons operate independently from the grid network; 4) When a marker cue is available mice locate a reward equally well when grid cells are in task-anchored versus task-independent modes, which argues against theories in which grid cells are a key part of a general system for localisation; 5) When marker cues are absent task-anchored grid firing is associated with successful reward localisation, which corroborates a key prediction of theories in which grid cells contribute to path integration.

      In revising the manuscript we have attempted to improve the writing to make these advances clearer, and have clarified methodological details that made interpretation more challenging than it should have been. For example, as noted in our response to Reviewer #1, we have included additional details to clarify the organisation of trials and relationships between trials, behavioural outcomes and neural codes observed.

      On page 5 the authors state that 'Since only position representations should reliably predict the reward location, ..., we reasoned that the presence of positional coding could be used to assess whether grid firing contributes to the ongoing behaviour'. I do not agree with this statement. First of all, position coding should be more informative only in a cue-guided trial. Second, distance coding could be as informative as position coding since at the network level may provide information relevant to the task (such as distance from the reward).

      Again, this point perhaps reflects a lack of clarity on our part in writing the manuscript. When grid cells are anchored to the track reference frame (now called ‘task-anchored’, previously ‘position encoding’), then the location of the rate peaks in grid firing is reliable from trial to trial. This is the case whether or not the trial is cued. When grid cells are independent of the track reference frame (now called ‘task-independent’, previously ‘distance encoding’), then the location of the firing rate peaks varies from trial to trial. In the latter case, position cannot be read out directly from trial to trial.

      In principle, in the task-independent mode track position could be calculated by storing the grid network configuration at the start of the track, which would differ on each trial, and then implementing a mechanism to readout relative distance as mice move along the track. However, if mice do use this computation we would expect them to do so equally well on cued and uncued trials. By contrast, our results clearly show a dissociation between trial types in the relationship between grid firing and behavioural outcome. We highlight and discuss this possibility in the revised manuscript (p 10, ‘Alternatively, mice could in principle estimate track location with a system that utilises information about distance travelled obtained from task-independent grid representations’).

      Third, position-coding is interpreted as more relevant because it predominates in correct trials. However, this does not imply that this coding scheme is indeed used to perform correct trials.

      We have revised the manuscript to clarify our goal of distinguishing major hypotheses for the roles of grid cells in behaviour (Introduction, ‘On the one hand, theoretical arguments that grid cell populations can generate high capacity codes imply that they could in principle contribute to all spatial behaviours (Fiete et al., 2008; Mathis et al., 2012; Sreenivasan and Fiete, 2011). On the other hand, if the behavioural importance of grid cells follows from their hypothesised ability to generate position representations by integrating self-motion signals (McNaughton et al., 2006), then their behavioural roles may be restricted to tasks that involve path integration strategies.’).

      By showing that performance on cued trials is similar regardless of whether grid cells are task-anchored or not, we provide strong evidence against the idea that grid firing is in general necessary for location-based behaviours. By showing that task anchoring is associated with successful localisation when cues are absent we corroborate a key prediction of hypothesised roles for grid cells in path integration-dependent behaviour. Therefore, we substantially reduce the space of behaviours to which grid cells might contribute. Importantly, this space is much larger for the MEC, which is required for cued and uncued versions of the task. We have revised the introduction and discussion to make these points clearer.

      While we believe our results add a key piece of evidence to the puzzle of when and where grid cells contribute to behaviour, we agree that further work will be required to develop and test more refined hypotheses. Alternative models also remain plausible, for example perhaps the behaviourally relevant computations are implemented elsewhere in the brain with grid anchoring to the track as an indirect consequence. Nevertheless, explanations of this kind are more difficult to reconcile with evidence that inactivation of stellate cells in the MEC impairs learning of the task, and other manipulations that modify grid firing impair performance on similar tasks. We now discuss these possibilities (discussion p 10, ‘mice could in principle estimate track location with a system that utilises information about distance travelled obtained from task-independent grid representations’).

      It could be more informative to push forward the correlative analysis by looking at whether behavioural performance can be predicted by the coding scheme on a trial-by-trial basis.

      The previous version of the manuscript showed these analyses (now in Figure 6). Thus, task anchored grid firing predicts more successful performance on uncued trials at the session level (Figure 6A-B) and at the trial level (Figure 6C-D).

      Reviewer #1 (Recommendations For The Authors):

      (1) The author particularly mentioned that the 1D tracks are different from the "cue-rich environments that are typically used to study grid cells". It is not clear what conclusions would hold for a cue-rich environment or a track, which may require relatively less path integration compared to the cue-sparse environment. This point should be discussed.

      This is an important point that we did not pay sufficient attention to in the previous version of the manuscript. Our finding of successful localisation in the cued environment when grid cells are not task anchored implies that grid anchoring is not required to solve cued tasks. The implication here is that cue rich environments may then not be the most suitable for investigation of grid roles in behaviour as non-grid mechanisms may suffice, although this does not rule out the possibility that anchored grid codes may play important roles in learning about cue rich environments. We now address this point in the discussion (p 10, ‘An implication of this result is that cue rich tracks often used to investigate grid activity patterns may not engage behaviours that require anchored grid firing.’).

      (2) It would be good to see the statistics for the number of different cells (stable position or distance encoding, and unstable cells) identified per mouse/session and the number of grid cells per session.

      These are now added to Supplemental Data 2 and will also be accessible through code and datasets that we will make available alongside the version of record.

      (3) Figure 2F: any explanation about why AG cells had high spatial information?

      Previously the calculation used bits per spike and as aperiodic cells have low firing rates the spatial information was high. We have replaced this with bits per second, which provides a more intuitive measure and no longer implies high spatial information. We have amended this in the methods (p 15, ‘Spatial information was calculated in bits per second…’).
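
      The difference between the two measures can be illustrated with the widely used Skaggs formulation of spatial information. Whether the authors used exactly this formulation is an assumption here, and the function name `spatial_information` is hypothetical. In this formulation, bits per spike equals bits per second divided by the mean firing rate, so scaling a rate map down leaves bits per spike unchanged while bits per second falls in proportion.

```python
import numpy as np

def spatial_information(rate_map_hz, occupancy_s):
    """Return (bits_per_second, bits_per_spike) for a binned rate map.

    Assumed (Skaggs-style) definition:
        I_sec   = sum_i p_i * r_i * log2(r_i / r_mean)
        I_spike = I_sec / r_mean
    where p_i is the occupancy probability of bin i and r_i its firing rate.
    """
    rate = np.asarray(rate_map_hz, dtype=float)
    occupancy = np.asarray(occupancy_s, dtype=float)
    p = occupancy / occupancy.sum()
    mean_rate = np.sum(p * rate)
    if mean_rate <= 0:
        return 0.0, 0.0
    valid = rate > 0  # bins with zero rate contribute nothing to the sum
    bits_per_second = np.sum(p[valid] * rate[valid] * np.log2(rate[valid] / mean_rate))
    return float(bits_per_second), float(bits_per_second / mean_rate)

# Same spatial tuning at two overall firing rates, with uniform occupancy.
occupancy = np.ones(4)
high_rate = spatial_information([0.0, 0.0, 0.0, 4.0], occupancy)
low_rate = spatial_information([0.0, 0.0, 0.0, 0.4], occupancy)
```

      In this example both cells carry the same bits per spike, but the low-rate cell carries one tenth of the bits per second, which illustrates why a rate-weighted measure no longer favours sparsely firing cells.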

      (4) The following methods sections should provide additional details:

      (4.1) Details of the training protocol are largely left to reference papers. The reference papers give a general outline of the training protocol, but the details are not completely comparable given the single experiment performed on these mice. More details should be given on training stages and experience at the time of the experiment.

      The task is more clearly described in the introduction (p 3), and additional details of the training protocol are now provided in the methods (p 12-13).

      (4.2) The methods reference mean speed across sessions, but it is not clear where this was used.

      This was very poor wording. We have now changed this to ‘For each session the mean speed was calculated for each trial outcome’.

      (4.3) The calculation of the spatial autocorrelogram on a per-trial basis should be more explicitly stated. Is it the average of each 10 cm increment with the centre trial?

      We have added additional information to the methods (p 16-18).

      (4.4) 1D field detection is not sufficiently explained in Figure 1/S2. This information should also appear in the methods section.

      This is now clarified on page 16 in section ‘Analysis of neural activity and behaviour during the location memory task’.

      (5) The data in Figure 4A and B only shows speed vs. location for one example mouse. The combined per mouse or per session data should also be shown.

      This is now shown in Figure 5A and Figure 5, Figure Supplement 2.

      (6) Figure 5 is somewhat confusing. Why are A/B by session and C/D by trial? The methods imply that A/B are originally averaged by cell, but that duplicate cells in the same session are excluded because behaviour versus session type is identical. This method should be valid if all grid cells within a session are all "stable". This is likely given the synchrony of code-switching between grid cells, but not all co-active grid cells behaved identically.

      It is understandable that C/D are performed by trial, but it should be made clear that it is not a comparable analysis to A/B. It is unclear what N refers to in C. The figure says by trial, but the legend says the error bar is by cell. If data is calculated by trial and then averaged by cell, this should be more clearly stated.

      In Figure 6A/B (previously Figure 5A/B) we focus our analysis on sessions in which the mode of grid firing, either task-anchored or task-independent, was relatively stable on a trial-to-trial basis (see Figure 3F for definitions). This enables us to then compare behaviour averaged across each session, with sessions categorised as task-anchored and task-independent. This analysis has the advantage that it focuses on large blocks of time (whole sessions) in which the mode of grid firing is unambiguous, but the disadvantage is that it excludes many sessions in which grid firing switches between task-anchored and task-independent modes.

      Figure 6C/D (previously Figure 5C/D) addresses this limitation by carrying out similar analyses with behaviour sorted into task-anchored versus task-independent groups at the level of trials. A potential limitation for this analysis is that grid firing is somewhat variable on a trial-by-trial basis and so some trials may be mis-classified. We don’t expect this to lead to systematic bias, but it may make the data more noisy. Nevertheless, these analyses are important to include as they allow assessment of whether conclusions from 6A/B hold when all sessions are considered.

      We have added additional clarification of the rationale for these analyses to the main text (p7-8, ‘We addressed this by using additional trial-level comparisons’). We have also added clarification in the methods section for categorisation of task-anchored versus task-independent trials when multiple grid cells were recorded simultaneously (p 17, ‘When assigning a common classification across a group of cells recorded simultaneously...’) and an explanation for the N in the figure legend. We also clarify that the analyses use a nested random effects design to account for dependencies at the levels of sessions and mice (methods, p 20, ‘Random effects had a nested structure to account for animals and sessions…’).

      (7) Panels E and F of Figure 5 are not explained in the main text.

      This is now corrected (see p8, ‘Additional analyses…’).

      (8) Figure 5: Since stable grid cells and all grid cells are shown, it will be better to show unstable cells, which can be compared with grid cells.

      Given that the rationale for differences between Figure 6A/B and C/D (previously Figure 5A-D) was not previously clear, the reason for focussing on stable grid cells here was likely also not clear (see point 6 above). We don’t show unstable grid cells in Figure 6A-B as the behaviour averaged at the level of a session would be a mix of trials when they are task-anchored and when they are task-independent. Therefore, the analysis would not test predictions about the relationship between task-anchored vs task-independent modes and behaviour. We hope this is now clear in the manuscript given the revisions introduced to address point 6 above.

      (9) The methods describing the statistics for these experiments are also confusing. The methods section should be written more clearly, and it should be made clear in the text or figure legend whether this data is the "original" data or is processed in relation to the model, such as excluding duplicate grid cells within a session. The figure legend should also state that a GLMM was used to calculate the statistics.

      We have revised the methods section with the goal of improving clarity, adding detail and removing ambiguity. This includes updates of the methods for the GLMM analysis, which are referred to within the Figure 6 legend. A clear definition of a stable session is now also added to the Figure 6 legend.

      Reviewer #2 (Recommendations For The Authors):

      When grid fields are anchored to the virtual world (position mode), there is probably small trial-to-trial variability in the firing location of the firing fields. Is this trial-to-trial variability related to the variability in the stop location? This would provide a more direct link between path integration in grid cell networks and behaviour that depends on path integration.

      When attempting to address this we find that the firing of individual grid cells is too variable to allow sufficiently precise decoding of their fields at a single trial level. This is expected given the Poisson statistics of spike generation and previous evaluations of grid coding (e.g. (Stemmler et al., 2015)).

      The conclusion of the abstract is: "Our results suggest that positional anchoring of grid firing enhances the performance of tasks that require path integration." This statement is slightly confusing. The task requires 1) anchoring the behaviour to the visual cues presented at the start of the trial and 2) path integration from thereon to identify the rewarded location. The performance is higher when grid cells anchor to the visual cues presented at the start of the trial. What the results show is that the anchoring of grid firing fields to visual landmarks enhances the performance of tasks that require path integration from visual landmarks (i.e. grid cells being anchored to the reference frame that is behaviorally relevant).

      To try to more clearly explain the logic and conclusion we have rewritten the abstract, including the final sentence.

      Similar comment for the title of Figure 5: "Positional grid coding is not required for cued spatial localisation but promotes path integration-dependent localisation." Positional coding means that grid cells are anchored to the behaviorally relevant reference frame.

      To address the lack of clarity we have modified the title of Figure 6 (previously Figure 5) to read ‘Anchoring of grid firing to the task reference frame promotes localisation by path integration but is not required for cued localisation’.

      In Figure 1, there is a wide range of beaconed (40-80%) and non-beaconed (10-60%) trials given. It is not 100% clear whether these refer to the percentage of trials of a given type within the recording sessions. Was the proportion of non-beaconed trials manipulated? If so, was the likelihood of position and distance coding changing according to the percentage of non-beaconed trials?

      The ranges given refer to proportions across different behavioural sessions. Within any given behavioural session the proportion was constant. We now make this clear in the figure legend and in the results and methods sections.

      We did not manipulate proportions of trial types during a session. Manipulations between sessions were carried out with the goal of maximising the numbers of uncued trials that the mice would carry out (see response to public comments above). While the effect of trial type at the session level is not relevant to the hypotheses we aim to test here, we have included an additional analysis of the relationship between task anchoring and the proportions of trial types in a session (Figure 3, Figure Supplement 7) (also discussed above). As disentangling the effects of learning and motivation will be complex and likely require new experimental designs, we have not drawn strong conclusions or pursued the analysis further.

      I was not convinced that the labels "position" and "distance" were appropriate for the two grid cell firing modes. My understanding is that the "position" code also requires the grid cell network to estimate distance. It seems that the main difference between the "position" and "distance" modes is that when in the "position" mode, the activity on the torus is reset to a constant toroidal location when the animal reaches a clearly identifiable location on the virtual track. In the "distance" mode, this resetting does not take place.

      As previously mentioned, we agree these terms weren’t the best and have since relabelled these as “task-anchored” and “task-independent”.

      There are a few sections in the manuscript that implicitly suggest that a causal link between grid cell activity and behaviour was demonstrated. For instance: "It has been challenging to directly test whether and when grid cells contribute to behaviour.": The assumption here is that the manuscript overcomes this challenge, but the study is correlative.

      We have modified the wording to be clear that we are introducing new tests of predictions made by hypotheses about causal relationships between grid coding and behaviour (introduction, p 1-2). We also clarify that our results argue against the hypothesis that grid cells provide a general code for behaviour, but corroborate predictions of hypotheses in which they are specifically important for path integration (discussion, p 10).

      We have modified the title, abstract and main text to try to treat claims about causality with care. We now more thoroughly introduce and contrast the approach we report here with previous experiments that use perturbations (introduction, p 2). While it is tempting to make stronger claims for causality with these approaches, there are also logical limitations with perturbation-based approaches, for example the challenges of fully excluding off-target effects and adaptation. We now explain how these strategies are complementary. Our view is that both strategies will be required to develop strong arguments for whether and when grid cells contribute to behaviour. From this perspective, it is encouraging that our conclusions are in agreement with what are probably the most specific perturbations of grid cells reported to date (Gil et al. 2017), while perturbations that more generally affect MEC function appear to impair cued and path integration-dependent behaviours (Tennant et al. 2018). We now discuss these points more clearly (introduction, p 2).

      I am slightly confused by the references to the panels in Figure 4.

      "In some sessions, localization of the reward occurred almost exclusively when grid cells were anchored to position and not when they encoded distance (Figure 4C)." Figure 4C only shows position coding.

      "In other sessions, animals localised the reward when grid firing was anchored to position or distance, but overall performance was improved on positional trials (Figure 4D-E)." The reference should probably point to Figure 4E-F or just to 4E.

      "In a few sessions, we observed spatial stopping behaviour comparable to cued trials, even when grid firing almost exclusively encoded distance rather than position (Figure 4F)." From Figure 4F, it seems that the performance on non-beaconed trials is better during "position" coding.

      We have now updated Figure 5 (Figure 4 in the original manuscript) and references to the Figure in the text. Now Figure 5 shows the activity of cells recorded in stable and unstable task-anchored and task-independent sessions (see Figure 5C-F).

      Minor issues:

      Is this correct: (Figure 4A and Figure 4, Figure Supplement 1).

      This has been corrected.

      Figure 4B: There could be an additional label for position and distance.

      Figure 4B from the original manuscript has now been removed.

      Figure 4C-F. The panels on the right side should be explained in the Figure Legend.

      Legends for Figure 5C-F (previously Figure 4C-F) have now been updated.

      Reviewer #3 (Recommendations For The Authors):

      Specific questions :

      (1) Position coding reflects a coding scheme in which fields are spaced by a fixed distance; previous studies have shown that a virtual track grid map is a slice of the 2D classic grid. In that case, the fields are still anchored to the track but would produce a completely different map. Did the authors check whether it is the case at least for some cells? If not, what could explain such a major difference?

      To avoid confusion we now use the term ‘task-anchored’ rather than ‘position coding’ (see comments above). We should further clarify that our conclusions rest on whether or not the grid fields are anchored to the track. Task-anchored firing does not require that grid fields maintain their spacing from 2D environments, only that fields are at the same track position on each trial. Thus, whether the spacing of the fields corresponds to a slice through a 2D grid makes no difference to the hypotheses we test here.

      We agree that the relationship between 1D and 2D field organisation could be an interesting future direction, for example anchoring could involve resetting the grid phase while maintaining a stable period, or it could be achieved through local distortions in the grid period. However, since these outcomes would not help distinguish the hypotheses we test here we have not included analyses to address them.

      (2) Previous studies have highlighted the role of grid cells in goal coding. Here there is an explicit reward in a particular area. Are there any grid modifications around this area? This question is not addressed in this study.

      Again, we note that the hypotheses we test here relate to the firing mode of grid cells - task-anchored or task-independent - and interpretation of our results is independent of the specific pattern of grid fields on the track. This question nevertheless leads to an interesting prediction that if grid fields cluster in the goal area then this clustering should be apparent in the task-anchored but not the task-independent firing mode.

      We test this by considering the average distribution of firing fields across all grid cells in each firing mode (Author response image 1). We find that when grid firing is task-anchored there is a clear peak around the reward zone, which is consistent with previous work by Butler et al. and Boccara et al. Consistent with our other prediction, this peak is reduced when grid cells are in the task-independent mode.

      Author response image 1.

      Plots show the grid field distribution during stable grid cell sessions (> 85 % task-anchored or task-independent) (A) or during task-anchored and task-independent trials (B). Shaded regions in A and B represent standard error of the mean measured across sessions and epochs respectively.

      (3) The behavioural procedure during recording is not fully explained. Do trial types alternate within the same session by blocks? How many trials are within a block? Is there any relation between trial alternation and the switch in the coding scheme observed in a large subset of the grid cells?

      We agree this wasn’t sufficiently clear in the previous version of the manuscript. Trial types were interleaved in a fixed order within each session. We have updated the results and methods sections to provide details (see responses above).

      (4) From the examples in Figure 2 it seems that firing fields tend to shift toward the start position. Is it the case in all cells? Could this reflect some reorganisation at the network level with cells signalling the starting as time progresses?

      This is inconsistent between cells. To make this variability clear we have included additional examples of spiking profiles from different grid cells (Figure 2 - 5). Because quantification of the phenomena would not, so far as we can tell, help distinguish our core hypotheses we have not included further analyses here.

      (5) Are grid cells with different coding properties recorded in different parts of the MEC? Are there any differences between these cell categories in the 2D map?

      The recordings we made are from the dorsal region of the MEC (stated at the start of the results section). We don’t have data to speak to other parts of the MEC.

      Minor:

      There are very few grid cell examples that repeat in the different figures. I would suggest showing more examples both in the main text and supplementary material.

      We have now provided multiple additional examples in Figures 2, 4 and 5. Grid cell examples repeat in the main figures twice, in both cases only when additional examples are shown from the same recording session (Figure 2A example #1 with Figure 5C, Figure 3E with Figure 4A). Further similar repeats are found in the supplemental figures (Figure 3D with Figure 5, Figure Supplement 2A, Figure 3C with Figure 5, Figure Supplement 2F).

      Fig1 A-B shows the predictions in a 1D track based on distance or position coding. The A inset represents the modification of field distribution from a 2D arena to a 1D track, as performed in this study. The inset B is misleading since it represents the modifications expected from a circular track to a 1D track as in Jacob et al 2019, that is not what the authors studied. It would be better to present either the predictions based on the present study or the prediction based on previous studies. In that case, they should mention the possibility that the 1D map is a slice of the 2D map.

      The goal of Figure 1A-B is to illustrate predictions (right) based on conclusions from previous studies (left). Figure 1A shows predicted 1D track firing given anchoring to the environment typically observed in grid cell studies in 2D arenas. Figure 1B shows predicted 1D track firing given the shifting firing patterns observed by Jacob et al. in a circular 2D track. To improve clarity, we have modified the legend to make clear that the schematics to the right are predictions given the previous evidence summarised to the left. As we outline above, the critical prediction relates to whether the representations anchor to the track. Whether the 1D representation is a perfect slice isn’t relevant to the hypotheses tested and so isn’t included in the schematic (see comments above).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors of this study seek to visualize NS1 purified from dengue virus infected cells. They infect Vero cells with DV2-WT and DV2 NS1-T164S (a mutant virus previously characterized by the authors). The authors utilize an anti-NS1 antibody to immunoprecipitate NS1 from cell supernatants and then elute the antibody/NS1 complex with acid. The authors evaluate the eluted NS1 by SDS-PAGE, Native Page, mass spec, negative-stain EM, and eventually Cryo-EM. SDS-PAGE, mass spec, and native page reveal a >250 Kd species containing both NS1 and the proteinaceous component of HDL (ApoA1). The authors produce evidence to suggest that this population is predominantly NS1 in complex with ApoA1. This contrasts with recombinantly produced NS1 (obtained from a collaborator) which did not appear to be in complex with or contain ApoA1 (Figure 1C). The authors then visualize their NS1 stock in complex with their monoclonal antibody by CryoEM. For NS1-WT, the major species visualized by the authors was a ternary complex of an HDL particle in complex with an NS1 dimer bound to their mAB. For their mutant NS1-T164S, they find similar structures, but in contrast to NS1-WT, they visualize free NS1 dimers in complex with 2 Fabs (similar to what's been reported previously) as one of the major species. This highlights that different NS1 species have markedly divergent structural dynamics. It's important to note that the electron density maps for their structures do appear to be a bit overfitted since there are many regions with electron density that do not have a predicted fit and their HDL structure does not appear to have any predicted secondary structure for ApoA1. The authors then map the interaction between NS1 and ApoA1 using cross-linking mass spectrometry revealing numerous NS1-ApoA1 contact sites in the beta-roll and wing domain. The authors find that NS1 isolated from DENV infected mice is also present as a >250 kD species containing ApoA1. 
They further determine that immunoprecipitation of ApoA1 out of the sera from a single dengue patient correlates with levels of NS1 (presumably COIPed by ApoA1) in a dose-dependent manner.

      In the end, the authors make some useful observations for the NS1 field (mostly confirmatory) providing additional insight into the propensity of NS1 to interact with HDL and ApoA1. The study does not provide any functional assays to demonstrate activity of their proteins or conduct mutagenesis (or any other assays) to support their interaction predictions. The authors' assertion that higher-order NS1 exists primarily as a NS1 dimer in complex with HDL is not well supported as their purification methodology of NS1 likely introduces bias as to what NS1 complexes are isolated. While their results clearly reveal NS1 in complex with ApoA1, the lack of other NS1 homo-oligomers may be explained by how they purify NS1 from virally infected supernatant. Because NS1 produced during viral infection is not tagged, the authors use an anti-NS1 monoclonal antibody to purify NS1. This introduces a source of bias since only NS1 oligomers with their mAb epitope exposed will be purified. Further, the use of acid to elute NS1 may denature or alter NS1 structure and the authors do not include controls to test functionality of their NS1 stocks (capacity to trigger endothelial dysfunction or immune cell activation). The acid elution may force NS1 homo-oligomers into dimers which then reassociate with ApoA1 in a manner that is not reflective of native conditions. Conducting CryoEM of NS1 stocks only in the presence of full-length mAbs or Fabs also severely biases what species of NS1 is visualized since any NS1 oligomers without the B-ladder domain exposed will not be visualized. If the residues obscured by their mAb are involved in formation of higher-order oligomers then this antibody would functionally inhibit these species from forming. 
The absence of critical controls, use of one mAb, and acid elution for protein purification severely limits the interpretation of these data and do not paint a clear picture of if NS1 produced during infection is structurally distinct from recombinant NS1. Certainly there is novelty in purifying NS1 from virally infected cells, but without using a few different NS1 antibodies to purify NS1 stocks (or better yet a polyclonal population of antibodies) it's unclear if the results of the authors are simply a consequence of the mAb they selected.

      Data produced from numerous labs studying structure and function of flavivirus NS1 proteins provide diverse lines of evidence that the oligomeric state of NS1 is dynamic and can shift depending on context and environment. This means that the methodology used for NS1 production and purification will strongly impact the results of a study. The data in this manuscript certainly capture one of these dynamic states and overall support the general model of a dynamic NS1 oligomer that can associate with both host proteins as well as itself but the assertions of this manuscript are overall too strong given their data, as there is little evidence in this manuscript, and none available in the large body of existing literature, to support that NS1 exists only as a dimer associated with ApoA1. More likely the results of this paper are a result of their NS1 purification methodology.

      Suggestions for the Authors:

      Major:

      (1) Because of the methodology used for NS1 purification, it is not clear from the data provided if NS1 from viral infection differs from recombinant NS1. Isolating NS1 from viral infection using a polyclonal antibody population would be better to answer their questions. On this point, Vero cells are also not the best candidate for their NS1 production given these cells do not come from a human. A more relevant cell line like U937-DC-SIGN would be preferable.

      We performed an optimization of sNS1 secretion from DENV infection in different cell lines (Author response image 1 below) to identify the best cell line candidate to obtain relatively high yield of sNS1 for the study. As shown in Author response image 1, the levels of sNS1 in the tested human cell lines Huh7 and HEK 293T were at least 3-5 fold lower than in Vero cells. Although using a monocytic cell line expressing DC-SIGN as suggested by the reviewer would be ideal, in our experience the low infectivity of DENV in monocytic cell lines will not yield sufficient amount of sNS1 needed for structural analysis. For these practical reasons we decided to use the closely related non-human primate cell line Vero for sNS1 production supported by our optimization data.

      Author response image 1.

      sNS1 secretion in different mammalian and mosquito cell lines after DENV2 infection. The NS1 secretion level is measured using the Platelia™ Dengue NS1 Ag ELISA kit (Bio-Rad) on day 3 (left) and day 5 (right) post infection respectively.

      (2) The authors need to support their interaction predictions and models via orthogonal assays like mutagenesis followed by HDL/ApoA1 complexing and even NS1 functional assays. The authors should be able to mutate NS1 at regions predicted to be critical for ApoA1/HDL interaction. This is critical to support the central conclusions of this manuscript.

      In our previous publication (Chan et al., 2019 Sci Transl Med), we used similarly purified sNS1 (immunoaffinity purification followed by acid elution) from infected culture supernatants from both DENV2 wild-type and T164S mutant (both also studied in the present work) to carry out stimulation assay on human PBMCs as described by other leading laboratories investigating NS1 (Modhiran et al., 2015 Sci Transl Med). For reader convenience we have extracted the data from our published paper and present it as Author response image 2 below.

      Author response image 2.

      (A) IL6 and (B) TNFa concentrations measured in the supernatants of human PBMCs incubated with either 1µg/ml or 10µg/ml of the BHK-21 immunoaffinity-purified WT and TS mutant sNS1 for 24 hours. Data is adapted from Chan et al., 2019.

      Incubation of immunoaffinity-purified sNS1 (WT and TS) with human PBMCs from 3 independent human donors triggered the production of the proinflammatory cytokines IL6 and TNF in a concentration-dependent manner (Author response image 2), consistent with the published data by Modhiran et al., 2015 Sci Transl Med. Interestingly, the TS mutant-derived sNS1 induced higher proinflammatory cytokine production than WT virus-derived sNS1, which appears to correlate with the more lethal and severe disease phenotype in mice, as also reported in our previous work (Chan et al., 2019). Additionally, the functionality of our immunoaffinity-purified infection-derived sNS1 (isNS1) is now further supported by our preliminary results on the NS1-induced endothelial cell permeability assay using the purified WT and mutant isNS1 (Author response image 3). As shown in Author response image 3, both the isNS1wt and the isNS1ts mutant reduced the relative transendothelial resistance from 0 to 9 h post-treatment, with the peak resistance reduction observed at 6 h post-treatment, suggesting that the purified isNS1 induced endothelial dysfunction as reported in Puerta-Guardo et al., 2019, Cell Rep. It is noteworthy that the isNS1 in our study behaves similarly to the commercial recombinant sNS1 (rsNS1 purchased from the same source used in the study by Puerta-Guardo et al., 2019) in inducing endothelial hyperpermeability. Collectively, our previously published and current data suggest that the purified isNS1 (as a complex with ApoA1) has a role in disease pathogenesis, which is also supported by a recent publication (Benfrid et al., EMBO 2022). The acid elution has not affected the functionality of NS1.

      Author response image 3.

      Functional assessment of isNS1wt and isNS1ts on vascular permeability in vitro. A trans-endothelial permeability assay via measurement of the transendothelial electrical resistance (TEER) on human umbilical vascular endothelial cells (hUVEC) was performed, as described previously (Puerta-Guardo et al., 2019, Cell Rep). Ovalbumin serves as the negative control, while TNF-α and rsNS1 serve as the positive controls.

      We agree with the reviewer about the suggested mutagenesis study. We will perform site-directed mutagenesis at selected residues and further structural and functional analyses and report the results in a follow-up study.

      (3) The authors need to show that the NS1 stocks produced using acid elution are functional compared to standard recombinantly produced NS1. Do acidic conditions impact structure/function of NS1?

      We are providing the same response to comments 1 & 2 above. We would like to reiterate that we have previously used sNS1 from immunoaffinity purification followed by acid elution to test its function in stimulating PBMCs to produce pro-inflammatory cytokines (Chan et al., 2019; Author response image 2). Similar to Modhiran et al. (2015) and Benfrid et al. (2022), the sNS1 that we extracted using acid elution is capable of activating PBMCs to produce pro-inflammatory cytokines. We have now further demonstrated the ability of both WT and TS isNS1 to induce endothelial permeability in vitro in hUVECs, using the TEER assay (Author response image 3). Based on the data presented in the rebuttal figures as well as our previous publication, we do not think that the acid elution has a significant impact on the function of isNS1.

      We performed affinity purification to enrich the complex for better imaging and analysis (Supp Fig. 1b) since the crude supernatant contains serum proteins and serum-free infections also do not provide sufficient isNS1. The major complex observed in negative stain is 1:1 (also under acidic conditions, which implies that the complexes are stable and intact). We agree that it is possible that other oligomers can form but we have observed only a small population (74 out of 3433 particles, 2.15%; 24 micrographs) of HDL:sNS1 complex at 1:2 ratio as shown in the Author response image 4 below and in the manuscript (p. 4 lines 114-117, Supp Fig. 1c). Other NS1 dimer:HDL ratios including 2:1 and 3:1 have been reported by Benfrid et al., 2022 by spiking healthy sera with recombinant sNS1 and subsequent re-affinity purification. However, this method used an approximately 8-fold higher sNS1 concentration (400 μg/mL) than the maximum clinically reported concentration (50 μg/mL) (Young et al., 2000; Alcon et al., 2002; Libraty et al., 2002). In our hands, the sNS1 concentration in the concentrated media from in vitro infection was quantified as 30 μg/mL, which is more physiologically relevant.

      We conclude that the integrity of the HDL of the complex is not lost during sample preparation, as we are able to observe the complex under the negative staining EM as well as infer from XL-MS. Our rebuttal data and our previous studies with our acid-eluted isNS1 from immunoaffinity purification clearly show that our protein is functional and biologically relevant.

      Author response image 4.

      (A) Representative negative stain micrograph of sNS1wt. (B) Representative 2D averages of negative stained isNS1wt. Red arrows indicate the characteristic wing-like protrusions of NS1 inserted in HDL. (C) Data adapted from Figure 2 in Benfrid et al. (2022).

      (4) Overall, the data obtained from the mutant NS1 (contrasted to WT NS1) reveal how dynamic the oligomeric state of NS1 proteins is, but the authors do not provide any insight into how/why this is. Some additional lines of evidence using either structural studies or mutagenesis to compare WT and their mutant and even NS1 from a different serotype of DENV would help the field to understand the dynamic nature of NS1.

      The T164S mutation in DENV2 NS1 was proposed as the residue associated with disease severity in the 1997 Cuban dengue epidemic (Halstead SB. “Intraepidemic increases in dengue disease severity: applying lessons on surveillance and transmission”. Whitehorn, J., Farrar. J., Eds., Clinical Insights in Dengue: Transmission, Diagnosis & Surveillance. The Future Medicine (2014), pp. 83-101). Our previous manuscript examined this mutation by engineering it into a less virulent clade 2 DENV isolated in Singapore and showed that sNS1 production was higher without any change in viral RNA replication. Transcript profiling of mutant compared to WT virus showed that genes that are usually induced during vascular leakage were upregulated for the mutant. We also showed that infection of interferon deficient AG129 mice with the mutant virus resulted in disease severity, increased complement protein expression in the liver, tissue inflammation and greater mortality compared to WT virus infected mice. The lipid profiling in our study (Chan et al., 2019) suggested small differences with WT but was overall similar to HDL as described by Gutsche et al. (2011). We were intrigued by our functional results and wanted to explore more deeply the impact of the mutation on sNS1 structure which at that stage was widely believed to be a trimer of NS1 dimers with a central channel (~ X Å) stuffed with lipid as established in several seminal publications (Flamand et al., 1999; Gutsche et al., 2011; Muller et al., 2012). 
In fact, the “This Week in Virology” netcast (https://www.microbe.tv/twiv/twiv-725/) discussed two back-to-back publications in Science (Modhiran et al., Science 371(6625):190-194; Biering et al., Science 371(6625):194-200) which showed that therapeutic antibodies can ameliorate the NS1-induced pathogenesis, and expert discussants posed questions that also pointed to the need for a more accurate definition of the molecular composition and architecture of the circulating NS1 complex during virus infection to get a clearer handle on its pathogenic mechanism. Our current studies and also the recent high resolution cryoEM structures (Shu et al., 2022) do not support the notion of a central channel “stuffed with lipid”. Even in the rare instances where trimers of dimers are shown, the narrow channel in the center could only accommodate one lipid molecule no bigger than a typical triglyceride molecule. This hexamer model cannot explain the lipid proteomics data in the literature.

      In our study we observed predominantly 1:1 NS1 dimer to HDL (~30 μg/mL), mirroring the maximum clinically reported concentration of sNS1 in the sera of DENV patients (40-50 μg/mL), as we highlighted in our main text (P. 18, lines 461-471). What is often quoted (also see later) is the recent study of Flamand & co-workers, which shows 1-3 NS1 dimers per HDL (Benfrid et al, 2022) by spiking rsNS1 (400 μg/mL) with HDL. This should not be confused with the previous models which suggested a lipid-filled central channel holding together the hexamer. The use of physiologically relevant concentrations is important for these studies.

      Our interpretation for the mutant (isNS1ts) is that the hydrophilic serine at residue 164, located in the greasy finger loop, may weaken isNS1ts binding to HDL, hence the observation of free sNS1 dimers in our immunoaffinity-purified (acid-eluted) sample. The disease severity and increased complement protein expression in AG129 mice liver can be ascribed to weakly bound mutant NS1 with fast on/off rate with HDL being transported to the liver where specific receptors bind to free sNS1 and interact with effector proteins such as complement to drive inflammation and associated pathology. Our indirect support for this is that the XL-MS analysis of purified isNS1ts identified only 7 isNS1ts:ApoA1 crosslinks while 25 isNS1wt:ApoA1 crosslinks were identified from purified isNS1wt (refer to Fig. 4 and Supp. Fig. 8).

      Taken together, the cryoEM and XL-MS analyses of purified isNS1ts suggest that isNS1ts has weaker affinity for HDL compared to isNS1wt. We welcome constructive discussion of our interpretation, and we hope that we and others will obtain more data to support or refute our proposed explanation. Our focus has been to compare WT with mutant sNS1 from DENV2 and we agree that it will be useful to study other serotypes.

      Reviewer #2:

      CryoEM:

      Some of the neg-stain 2D class averages for sNS1 in Fig S1 clearly show 1 or 2 NS1 dimers on the surface of a spherical object, presumably HDL, and indicate the possibility of high-quality cryoEM results. However, the cryoEM results are disappointing. The cryo 2D class averages and refined EM map in Fig S4 are of poor quality, indicating sub-optimal grid preparation or some other sample problem. Some of the FSC curves (2 in Fig S7 and 1 in Fig S6) have extremely peculiar shapes, suggesting something amiss in the map refinement. The sharp drop in the "corrected" FSC curves in Figs S5c and S6c (upper) indicate severe problems. The stated resolutions (3.42 & 3.82 Å) for the sNS1ts-Fab56.2 are wildly incompatible with the images of the refined maps in Figs 3 & S7. At those resolutions, clear secondary structural elements should be visible throughout the map. From the 2D averages and 3D maps shown in the figures this does not seem to be the case. Local resolution maps should be shown for each structure.

      The same sample is used for the negative staining and the cryoEM results presented. The cryoEM 2D class averages are similar to the negative stain ones, with many spherical-like densities with no discernible features, presumably because they contain HDL only or because the NS1 features are averaged out. The key difference lies in the 2D class averages where the NS1 could be seen. The side views of NS1 (wing-like protrusion) are more obvious in the negative stain while the top views of NS1 (cross-shaped protrusion) are more obvious under cryoEM. HDL particles are inherently heterogeneous and known to range from 70-120 Å, as highlighted in the main text (p. 8, lines 203 and 228). This helps to explain why the reviewer may find the cryoEM result disappointing. The sample is inherently challenging to resolve structurally; this does not mean that the sample is of poor quality. In terms of grid preparation, Supp Fig 4b shows a representative motion-corrected micrograph of the isNS1ts sample in which individual particles can be discerned and are evenly distributed across the grid at high density.

      We acknowledge that most of the dips in the FSC curves (Fig S5-7) are irregular and affect the accuracy of the stated resolutions, particularly for the HDL-isNS1ts-Fab56.2 and isNS1ts-Fab56.2 maps for which the local resolution maps are shown (Fig S7d-e). Probable reasons include (1) the heterogeneous nature of HDL, (2) the preferred orientation issue (p 7, lines 198-200), and (3) data quality that is intrinsically less than ideal for high-resolution single particle analysis. Adjusting the dynamic masking so that the mask is not sharper than the resolution of the map, by varying the near (default = 3 Å) and far (12 Å) parameters over 6-12 Å and 14-20 Å respectively during data processing, did not improve the FSC curves. To report a more accurate global resolution, we have revised Figures S5-7 with new FSC curve plots generated using the remote 3DFSC processing server.
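
      As a minimal sketch of how a global resolution is usually read off an FSC curve, the snippet below finds the first spatial frequency at which the curve falls below the widely used 0.143 threshold and converts it to Å. The function name and the example curve are hypothetical and are not taken from the manuscript or from cryoSPARC or 3DFSC output.

```python
def fsc_resolution(spatial_freqs, fsc_values, threshold=0.143):
    """Resolution (in Å) at the first drop of the FSC below `threshold`.

    spatial_freqs: increasing spatial frequencies in 1/Å (one per shell).
    fsc_values:    FSC value for each shell.
    Returns None if the curve never drops below the threshold.
    """
    points = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(points, points[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation between the two bracketing shells.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Hypothetical, smoothly decaying curve (illustration only).
freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
fsc = [0.99, 0.95, 0.80, 0.50, 0.20, 0.05]
resolution = fsc_resolution(freqs, fsc)
print(f"Estimated resolution: {resolution:.2f} Å")
```

      Only the first crossing is reported here. A curve with irregular dips can cross the threshold more than once, which is one reason a single number read from such a curve should be treated with caution.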

      Regardless, the overall architecture and the relative arrangement of NS1 dimer, Fab, and HDL are clearly visible and identifiable in the map. These results agree well with our biochemical data and mass-spec data.

      The samples were clearly challenging for cryoEM, leading to poor quality maps that were difficult to interpret. None of the figures are convincing that NS1, Ab56.2 or Fab56.2 are correctly fit into EM maps. There is no indication of ApoA1 helices. Details of the fit of models to density for key regions of the higher-resolution EM maps should be shown and the models should be deposited in the PDB. An example of modeling difficulty is clear in the sNS1ts dimer with bound Fab56.2 (figs 3c & S7e). For this complex, the orientation of the Fab56.2 relative to the sNS1ts dimer in this submission (Fig 3c) is substantially different than in the bioRxiv preprint (Fig 3c). Regions of empty density in Fig 3c also illustrate the challenge of building a model into this map.

      We acknowledge the modelling challenge posed by low-resolution maps in general, such as the handedness of the Fab molecule pointed out by the reviewer (which is why others have developed approaches such as anti-Fab nanobodies to aid structure determination). The change in orientation of Fab56.2 relative to the sNS1ts dimer was informed by the HDX-MS results, which were not yet available at the time of the bioRxiv preprint mentioned. The lack of density for ApoA1 helices is expected given the heterogeneous nature of HDL. To the best of our knowledge, engineered apoA1 helices were also not reported in many cryoEM structures of membrane proteins solved in membrane scaffold protein (MSP) nanodiscs, even though nanodiscs, which are composed of engineered apoA1 helices, have well-defined size classes.

      Regions of weak density in Fig 3c are expected due to the preferred orientation issue acknowledged in the results section of the main text (p. 9, line 245). The cryoEM density maps have been deposited in the Electron Microscopy Data Bank (EMDB) under accession codes EMD-36483 (isNS1ts:Fab56.2) and EMD-36480 (Fab56.2:isNS1ts:HDL). The protein model files for the isNS1ts:Fab56.2 and Fab56.2:isNS1ts:HDL models are available upon request. Crosslinking MS raw files and the search results can be downloaded from https://repository.jpostdb.org/preview/14869768463bf85b347ac2 with the access code: 3827. The HDX-MS data have been deposited to the ProteomeXchange consortium via the PRIDE partner repository (ref 51) with the dataset identifier PXD042235.

      Mass spec:

      Crosslinking-mass spec was used to detect contacts between NS1 and ApoA1, providing strong validation of the sNS1-HDL association. As the crosslinks were detected in a bulk sample, they show that NS1 is near ApoA1 in many/most HDL particles, but they do not indicate a specific protein-protein complex. Thus, the data do not support the model of an NS1-ApoA1 complex in Fig 4d. Further, a specific NS1-ApoA1 interaction should have evidence in the EM maps (helical density for ApoA1), but none is shown or mentioned. If such exists, it could perhaps be visualized after focused refinement of the map for sNS1ts-HDL with Fab56.2 (Fig S7d). The finding that sNS1-ApoA1 crosslinks involved residues on the hydrophobic surface of the NS1 dimer confirms previous data that this NS1 surface engages with membranes and lipids.

      We thank the reviewer for the comment. XL-MS identifies protein-protein interactions by proximity: crosslinked residues must lie within the distance spanned by the spacer arm of the crosslinker. The XL-MS data do support the NS1-ApoA1 complex model obtained by cryo-EM, because the identified crosslinks, when superimposed on the EM map, are within the cut-off distance of 30 Å. We agree that the XL-MS data do not dictate the specific interactions between specific residues of NS1 and ApoA1 in the EM model. We also do not claim that a specific residue of NS1 in the beta roll or wing domain is interacting with a specific residue of ApoA1 in the H4 and H5 domains. We claim that the beta roll and wing domain regions of NS1 are interacting with ApoA1 in HDL, which reflects the proximity-based nature of the NS1-ApoA1 interactions supported by the XL-MS data.
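
      The distance check described above can be sketched as follows, assuming that each crosslinked residue is represented by a single 3D coordinate (for example a Cα position) in the fitted model. The residue labels and coordinates are hypothetical placeholders, not values from the deposited data.

```python
import math

CUTOFF_ANGSTROM = 30.0  # cut-off distance quoted in the response


def check_crosslinks(coords, crosslinks, cutoff=CUTOFF_ANGSTROM):
    """Return (residue_a, residue_b, distance, within_cutoff) per crosslink."""
    results = []
    for res_a, res_b in crosslinks:
        distance = math.dist(coords[res_a], coords[res_b])
        results.append((res_a, res_b, distance, distance <= cutoff))
    return results


# Hypothetical coordinates in Å (illustration only).
coords = {
    "NS1:res_a": (0.0, 0.0, 0.0),
    "ApoA1:res_b": (12.0, 16.0, 0.0),  # 20 Å from NS1:res_a
    "ApoA1:res_c": (30.0, 40.0, 0.0),  # 50 Å from NS1:res_a
}
crosslinks = [("NS1:res_a", "ApoA1:res_b"), ("NS1:res_a", "ApoA1:res_c")]

report = check_crosslinks(coords, crosslinks)
for res_a, res_b, distance, ok in report:
    status = "within" if ok else "exceeds"
    print(f"{res_a} - {res_b}: {distance:.1f} Å ({status} cut-off)")
```

      A crosslink that satisfies the cut-off shows that the model is compatible with the observed proximity. It does not by itself identify a specific residue-level interface.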

      As explained in the previous response on the lack of indication of ApoA1 helical density, this is expected given the heterogeneous nature of HDL. It is typical to see lipid membranes as unstructured and of lower density than the structured protein. In our study, local refinement was performed on either the global map (presented in Fig S7d) or focused on the NS1-Fab region only. Both yielded similar maps as illustrated in the real space slices shown in Author response image 5. The mask and map overlay is depicted in similar orientations to the real space slices, and at different contour thresholds of 0.05 (Author response image 5e) and 0.135 (Author response image 5f). While the overall map is of poor resolution and directional anisotropy is evident, there are clear signal differences in the low-density region (i.e. the HDL sphere) indicative of NS1 interaction with ApoA1 in HDL, extending from the NS1 wing to the base of the HDL sphere.

      Author response image 5.

      Real space slices of the map and mask used during local refinement for the overall structure (a-b) and for the focused mask on the NS1 region (c-d). The corresponding map (grey) contoured at 0.05 (e) and 0.135 (f) in similar orientations to those shown for the real space slices of map and masks. The focused mask of NS1 is colored in semi-transparent yellow. Real space slices of map and mask were generated during data processing in cryoSPARC 4.0 and the map figures were prepared using ChimeraX.

      Sample quality:

      The paper lacks any validation that the purified sNS1 retains established functions, for example the ability to enhance virus infectivity or to promote endothelial dysfunction.

      Please see the detailed response to question 2 in Reviewer #1’s comments. In essence, we have shown that both isNS1wt and isNS1ts are capable of inducing endothelial permeability in an in vitro TEER assay (Rebuttal Fig 3) and also in our previous study that quantified inflammation in human PBMCs (Rebuttal Fig 2).

      Peculiarities include the gel filtration profiles (Fig 2a), which indicate identical elution volumes (apparent MWs) for sNS1wt-HDL bound to Ab56.2 (~150 kDa) and to the ~3X smaller Fab56.2 (~50 kDa). There should also be some indication of sNS1wt-HDL pairs crosslinked by the full-length Ab, as can be seen in the raw cryoEM micrograph (Fig S5b).

      Obtaining high quality structures is often more demanding of sample integrity than are activity assays. Given the low quality of the cryoEM maps, it's possible that the acidification step in immunoaffinity purification damaged the HDL complex. No validation of HDL integrity, for example with acid-treated HDL, is reported.

      Please see detailed response for question 3 in Reviewer #1’s comments.

      Acid treatment is perhaps discounted by a statement (line 464) that another group also used immunoaffinity purification in a recent study (ref 20) reporting sNS1 bound to HDL. However the statement is incorrect; the cited study used affinity purification via a strep-tag on recombinant sNS1.

      We thank the Reviewer for pointing this out and have rewritten this paragraph (p 18, line 445-455). We also expanded our discussion to highlight our prior functional studies showing that acid-eluted isNS1 proteins do induce endothelial hyperpermeability (p 18-19, line 470-476).

      Discussion:

      The Discussion reflects a view that the NS1 secreted from virus-infected cells is a 1:1 sNS1dimer:HDL complex with the specific NS1-ApoA1 contacts detected by crosslinking mass spec. This is inconsistent with both the neg-stain 2D class average with 2 sNS1 dimers on an HDL (Fig S1c) and with the recent study of Flamand & co-workers showing 1-3 NS1 dimers per HDL (ref 20). It also ignores the propensity of NS1 to associate with membranes and lipids. It is far more likely that NS1 association with HDL is driven by these hydrophobic interactions than by specific protein-protein contacts. A lengthy Discussion section (lines 461-522) includes several chemically dubious or inconsistent statements, all based on the assumption that specific ApoA1 contacts are essential to NS1 association with HDL and that sNS1 oligomers higher than the dimer necessarily involve ApoA1 interaction, conclusions that are not established by the data in this paper.

      We thank the Reviewer and have revised our discussion to cover available structural and functional data to draw conclusions that invariably also need further validation by others. One point that is repeatedly brought up by Reviewer 1 & 2 is the quality and functionality of our sample. Our conclusion now reiterates this point based on our own published data (Chan et al., 2019) and also the TEER assay data provided as Author response image 3.

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      (1) Fig. S3B, should the label for lane 4 be isNS1? In figure 1C you do not see ApoA1 for rsNS1 but for S3B you do? Which is correct?

      This has been corrected in Fig. S3B: the label for lane 4 is now isNS1 and the label for lane 1 is now rsNS1, where no ApoA1 band (25 kDa) is found.

      (2) Line 436, is this the correct reference? Reference 43?

      This has been corrected in the main text. (p 20, Line 507; Lee et al., 2020, J Exp Med).

      Reviewer #2 (Recommendations For The Authors):

      The cryoEM data analysis is incompletely described. The process (software, etc) leading to each refined EM map should be stated, including the use of reference structures in any step. These details are not in the Methods or in Figs S4-7, as claimed in the Methods. The use of DeepEMhancer (which refinements?) with the lack of defined secondary structural features in the maps and without any validation (or discussion of what was used as "ground truth") is concerning. At the least, the authors should show pre- and post-DeepEMhancer maps in the supplemental figures.

      The data processing steps in the Methods section have been described with improved clarity. DeepEMhancer is a deep learning solution for cryo-EM volume post-processing to reduce noise levels and obtain more detailed versions of the experimental maps (Sanchez-Garcia, et al., 2021). DeepEMhancer was only used to sharpen the maps and reduce the noise for classes 1 and 2 of isNS1wt in complex with Ab56.2 for visualization purpose only and not for any refinements. To avoid any confusion, the use of DeepEMhancer has been removed from the supp text and figures.

      Line 83 - "cryoEM structures...recently reported" isn't ref 17

      This reference has been corrected to Shu et al. (2022) on p 3, line 83.

      Fig. S3 - mis-labeled gel lanes

      This has been corrected in Fig. S3B: the label for lane 4 is now isNS1 and the label for lane 1 is now rsNS1.

      Fig S6c caption - "Representative 2D classes of each 3D classes, white bar 100 Å. Refined 3D map for classes 1 and 2 coloured by local resolution". The first sentence is unclear, and there is no white scale bar and no heat map.

      The Fig S6c caption has been corrected to “Representative 3D classes contoured at 0.06 and their particle distributions as labelled and coloured in cyan. Scale bar of 100 Å as shown. Refined 3D maps and their respective FSC resolution charts and posterior precision directional distributions as generated in cryoSPARC 4.0”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors performed experimental evolution of MreB mutants that have a slow-growing round phenotype and studied the subsequent evolutionary trajectory using analysis tools from molecular biology. It was remarkable and interesting that they found that the original phenotype was not restored (most common in these studies) but that the round phenotype was maintained. 

      Strengths: 

      The finding that the round phenotype was maintained during evolution rather than that the original phenotype, rod-shaped cells, was recovered is interesting. The paper extensively investigates what happens during adaptation with various different techniques. Also, the extensive discussion of the findings at the end of the paper is well thought through and insightful.

      Weaknesses: 

      I find there are three general weaknesses: 

      (1) Although the paper states in the abstract that it emphasizes "new knowledge to be gained" it remains unclear what this concretely is. On page 4 they state three research questions; these could be more extensively discussed in the abstract. Also, these questions read more like genetics questions while the paper is a lot about cell biological findings.

      Thank you for drawing attention to the unnecessary and gratuitous nature of the last sentence of the Abstract. We are in agreement. It has been modified, and we have taken  advantage of additional word space to draw attention to the importance of the two competing (testable) hypotheses laid out in the Discussion. 

      As to new knowledge, please see the Results and particularly the Discussion. But beyond this, and as recognised by others, there is real value for cell biology in seeing how (and whether) selection can compensate for effects that are deleterious to fitness. The results will very often depart from those delivered from, for example, suppressor analyses, or bottom-up engineering.

      In the work recounted in our paper, we chose to focus, by way of proof-of-principle, on the most commonly observed mutations, namely, those within pbp1A. But beyond this gene, we detected mutations in other components of the cell shape / division machinery whose connections are not yet understood and which are the focus of on-going investigation.

      As to the three questions posed at the end of the Introduction, the first concerns whether selection can compensate for deleterious effects of deleting mreB (a question that pertains to evolutionary aspects); the second seeks understanding of genetic factors; the third aims to shed light on the genotype-to-phenotype map (which is where the cell biology comes into play). Given space restrictions, we cannot see how we could usefully expand on, let alone discuss, these three questions in the restrictive space available in the Abstract.

      (2) It is not clear to me from the text what we already know about the restoration of MreB loss from suppressor studies (in the literature). Are there suppressor screens in the literature and which part of the findings is consistent with suppressor screens and which parts are new knowledge?

      As stated in the Introduction, in a previous study with B. subtilis (which harbours three MreB isoforms and where the isoform named “MreB” is essential for growth under normal conditions), suppressors of MreB lethality were found to occur in ponA, which encodes a class A penicillin binding protein (Kawai et al., 2009). This led to recognition that MreB plays a role in recruiting Pbp1A to the lateral cell wall. On the other hand, Patel et al. (2020) have shown that deletion of class A PBPs leads to an up-regulation of rod complex activity. Although there is a connection between rod complex and class A PBPs, a further study has shown that the two systems work semi-autonomously (Cho et al., 2016).

      Our work confirms a connection between MreB and Pbp1A, and sheds new light on how this interaction is established by means of natural selection, which targets the integrity of the cell wall. Indeed, the Rod complex and class A PBPs have complementary activities in the building of the cell wall, with each of the two systems able to compensate for the other in order to maintain cell wall integrity. Please see the major part of the Discussion. In terms of specifics, the connection between mreB and pbp1A (shown by Kawai et al (2009)) is indirect because it is based on extragenic transposon insertions. In our study, the genetic connection is mechanistically demonstrated. In addition, we show that the evolutionary dynamics are rapid, and we have enriched understanding of the genotype-to-phenotype map.

      (3) The clarity of the figures, captions, and data quantification need to be improved.  

      Modifications have been implemented. Please see responses to specific queries listed below.

      Reviewer #2 (Public Review): 

      Yulo et al. show that deletion of MreB causes reduced fitness in P. fluorescens SBW25 and that this reduction in fitness may be primarily caused by alterations in cell volume. To understand the effect of cell volume on proliferation, they performed an evolution experiment through which they predominantly obtained mutations in pbp1A that decreased cell volume and increased viability. Furthermore, they provide evidence to propose that the pbp1A mutants may have decreased PG cross-linking which might have helped in restoring the fitness by rectifying the disorganised PG synthesis caused by the absence of MreB. Overall this is an interesting study. 

      Queries: 

      Do the small cells of mreB null background indeed have no DNA? It is not apparent from the DAPI images presented in Supplementary Figure 17. A more detailed analysis will help to support this claim.

      It is entirely possible that small cells have no DNA, because if cell division is aberrant then division can occur prior to DNA segregation resulting in cells with no DNA. It is clear from microscopic observation that both small and large cells do not divide. It is, however, true, that we are unable to state – given our measures of DNA content – that small cells have no DNA. We have made this clear on page 13, paragraph 2.

      What happens to viability and cell morphology when pbp1A is removed in the mreB null background? If it is actually a decrease in pbp1A activity that leads to the rescue, then pbp1A- mreB- cells should have better viability, reduced cell volume and organised PG synthesis. Especially as the PG cross-linking is almost at the same level as the T362 or D484 mutant.  

      Please see fitness data in Supp. Fig. 13. Fitness of ∆mreBpbp1A is no different to that caused by a point mutation. Cells remain round.  

      What is the status of PG cross-linking in ΔmreB Δpflu4921-4925 (Line 7)? 

      This was not analysed as the focus of this experiment was PBPs. A priori, there is no obvious reason to suspect that ∆4921-25 (which lacks oprD) would be affected in PBP activity.

      What is the morphology of the cells in Line 2 and Line 5? It may be interesting to see if PG cross-linking and cell wall synthesis is also altered in the cells from these lines. 

      The focus of investigation was restricted to L1, L4 and L7. Indeed, it would be interesting to look at the mutants harbouring mutations in ftsZ, but this is beyond the scope of the present investigation (but is on-going). The morphology of L2 and L5 is shown in Supp. Fig. 9.

      The data presented in 4B should be quantified with appropriate input controls. 

      Band intensity has now been quantified (see new Supp. Fig. 20). The controls are SBW25, SBW25∆pbp1A, SBW25 ∆mreB and SBW25 ∆mreBpbp1A as explained in the paper.

      What are the statistical analyses used in 4A and what is the significance value? 

      Our oversight. These were reported in Supp. Fig. 19, but should also have been presented in Fig. 4A. Data are means of three biological replicates. The statistical tests are comparisons between each mutant and SBW25, and assessed by paired t-tests.  
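
      For readers unfamiliar with the test named above, the following is a minimal sketch of a paired t statistic computed across matched biological replicates. The replicate values are hypothetical and the function is an illustration, not the authors' analysis script.

```python
import math
import statistics


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for paired observations."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical relative values for three paired replicates.
mutant = [0.80, 0.70, 0.90]
wild_type = [1.00, 0.95, 1.05]

t_value, dof = paired_t_statistic(mutant, wild_type)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

      With three replicates there are only two degrees of freedom, so the two-sided 5% critical value of t is about 4.30.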

      A more rigorous statistical analysis indicating the number of replicates should be done throughout. 

      We have checked and made additions where necessary and where previously lacking. In particular, details are provided in Fig. 1E, Fig. 4A and Fig. 4B. For Fig. 4C we have produced quantitative measures of heterogeneity in new cell wall insertion. These are reported in Supp. Fig. 21 (and referred to in the text and figure caption) and show that patterns of cell wall insertion in ∆mreB are highly heterogeneous.

      Reviewer #3 (Public Review): 

      This paper addresses an understudied problem in microbiology: the evolution of bacterial cell shape. Bacterial cells can take a range of forms, among the most common being rods and spheres. The consensus view is that rods are the ancestral form and spheres the derived form. The molecular machinery governing these different shapes is fairly well understood but the evolutionary drivers responsible for the transition between rods and spheres are not. Enter Yulo et al.'s work. The authors start by noting that deletion of a highly conserved gene called MreB in the Gram-negative bacterium Pseudomonas fluorescens reduces fitness but does not kill the cell (as happens in other species like E. coli and B. subtilis) and causes cells to become spherical rather than their normal rod shape. They then ask whether evolution for 1000 generations restores the rod shape of these cells when propagated in a rich, benign medium. 

      The answer is no. The evolved lineages recovered fitness by the end of the experiment, growing just as well as the unevolved rod-shaped ancestor, but remained spherical. The authors provide an impressively detailed investigation of the genetic and molecular changes that evolved. Their leading results are: 

      (1) The loss of fitness associated with MreB deletion causes high variation in cell volume among sibling cells after cell division.

      (2) Fitness recovery is largely driven by a single, loss-of-function point mutation that evolves within the first ~250 generations that reduces the variability in cell volume among siblings. 

      (3) The main route to restoring fitness and reducing variability involves loss of function mutations causing a reduction of TPase and peptidoglycan cross-linking, leading to a disorganized cell wall architecture characteristic of spherical cells. 

      The inferences made in this paper are on the whole well supported by the data. The authors provide a uniquely comprehensive account of how a key genetic change leads to gains in fitness and the spectrum of phenotypes that are impacted and provide insight into the molecular mechanisms underlying models of cell shape. 

      Suggested improvements and clarifications include: 

      (1) A schematic of the molecular interactions governing cell wall formation could be useful in the introduction to help orient readers less familiar with the current state of knowledge and key molecular players. 

      We understand that this would be desirable, but there are numerous recent reviews with detailed schematics that we think the interested reader would be better consulting. These are referenced in the text.

      (2) More detail on the bioinformatics approaches to assembling genomes and identifying the key compensatory mutations are needed, particularly in the methods section. This whole subject remains something of an art, with many different tools used. Specifying these tools, and the parameter settings used, will improve transparency and reproducibility, should it be needed.

      We overlooked providing this detail, which has now been corrected by provision of more information in the Materials and Methods. In short, we used breseq, with the clonal option and default parameters. Additional analyses were conducted using Geneious. The breseq output files are provided at https://doi.org/10.17617/3.CU5SX1 (which includes all read data).

      (3) Corrections for multiple comparisons should be used and reported whenever more than one construct or strain is compared to the common ancestor, as in Supplementary Figure 19A (relative PG density of different constructs versus the SBW25 ancestor). 

      The data presented in Supp Fig 19A (and Fig 4A) do not involve multiple comparisons. In each instance the comparison is between SBW25 and each of the different mutants. A paired t-test is thus appropriate.

      (4) The authors refrain from making strong claims about the nature of selection on cell shape, perhaps because their main interest is the molecular mechanisms responsible. However, I think more can be said on the evolutionary side, along two lines. First, they have good evidence that cell volume is a trait under strong stabilizing selection, with cells of intermediate volume having the highest fitness. This is notable because there are rather few examples of stabilizing selection where the underlying mechanisms responsible are so well characterized. Second, this paper succeeds in providing an explanation for how spherical cells can readily evolve from a rod-shaped ancestor but leaves open how rods evolved in the first place. Can the authors speculate as to how the complex, coordinated system leading to rods first evolved? Or why not all cells have lost rod shape and become spherical, if it is so easy to achieve? These are important evolutionary questions that remain unaddressed. The manuscript could be improved by at least flagging these as unanswered questions deserving of further attention. 

      These are interesting points, but our capacity to comment is entirely speculative. Nonetheless, we have added an additional paragraph to the Discussion that expresses an opinion that has yet to receive attention:

      “Given the complexity of the cell wall synthesis machinery that defines rod-shape in bacteria, it is hard to imagine how rods could have evolved prior to cocci. However, the cylindrical shape offers a number of advantages. For a given biomass (or cell volume), shape determines surface area of the cell envelope, which is the smallest surface area associated with the spherical shape. As shape sets the surface/volume ratio, it also determines the ratio between supply (proportional to the surface) and demand (proportional to cell volume). From this point of view, it is more efficient to be cylindrical (Young 2006). This also holds for surface attachment and biofilm formation (Young 2006). But above all, for growing cells, the ratio between supply and demand is constant in rod shaped bacteria, whereas it decreases for cocci. This requires that spherical cells evolve complex regulatory networks capable of maintaining the correct concentration of cellular proteins despite changes in surface/volume ratio. From this point of view, rod-shaped bacteria offer opportunities to develop unsophisticated regulatory networks.”
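
      The surface-to-volume argument in the quoted paragraph can be illustrated with a short sketch. It assumes an idealised rod that grows only by elongation at fixed radius and counts only the lateral wall (end caps are ignored), and compares it with a sphere that grows by increasing its radius. The function names and dimensions are hypothetical.

```python
import math


def sphere_surface_to_volume(volume):
    """Surface/volume ratio of a sphere of the given volume (equals 3 / r)."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 3.0 / radius


def cylinder_surface_to_volume(radius, length):
    """Lateral surface/volume ratio of a cylinder (equals 2 / r).

    End caps are ignored, which is an idealisation of a rod-shaped cell.
    """
    lateral_area = 2.0 * math.pi * radius * length
    volume = math.pi * radius ** 2 * length
    return lateral_area / volume


# Compare the ratio before and after the cell doubles its volume.
sphere_before = sphere_surface_to_volume(1.0)
sphere_after = sphere_surface_to_volume(2.0)
rod_before = cylinder_surface_to_volume(radius=0.5, length=2.0)
rod_after = cylinder_surface_to_volume(radius=0.5, length=4.0)

print(f"sphere: {sphere_before:.2f} -> {sphere_after:.2f}")
print(f"rod:    {rod_before:.2f} -> {rod_after:.2f}")
```

      Under these assumptions the ratio for the rod does not depend on length, whereas the ratio for the sphere falls by a factor of 2^(-1/3) when the volume doubles. Including hemispherical end caps would make the rod's ratio vary slightly with length.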

      why not all cells have lost rod shape and become spherical.

      Please see Kevin Young’s 2006 review on the adaptive significance of cell shape

      The value of this paper stems both from the insight it provides on the underlying molecular model for cell shape and from what it reveals about some key features of the evolutionary process. The paper, as it currently stands, provides more on which to chew for the molecular side than the evolutionary side. It provides valuable insights into the molecular architecture of how cells grow and what governs their shape. The evolutionary phenomena emphasized by the authors - the importance of loss-of-function mutations in driving rapid compensatory fitness gains and that multiple genetic and molecular routes to high fitness are often available, even in the relatively short time frame of a few hundred generations - are well-understood phenomena and so arguably of less broad interest. The more compelling evolutionary questions concern the nature and cause of stabilizing selection (in this case cell volume) and the evolution of complexity. The paper misses an opportunity to highlight the former and, while claiming to shed light on the latter, provides rather little useful insight.

      Thank you for these thoughts and comments. However, we disagree that the experimental results are an overlooked opportunity to discuss stabilising selection. Stabilising selection occurs when selection favours a particular phenotype causing a reduction in underpinning population-level genetic diversity. This is not happening when selection acts on SBW25 ∆mreB leading to a restoration of fitness. Driving the response are biophysical factors, primarily the critical need to balance elongation rate with rate of septation. This occurs without any change in underlying genetic diversity.  

      Recommendations for the authors:  

      Reviewer 1 (Recommendations for the Authors): 

      Hereby my suggestion for improvement of the quantification of the data, the figures, and the text. 

      -  p 14, what is the unit of elongation rate?  

      At first mention we have made clear that the unit is given in minutes^-1

      -  p 14, please give an error bar for both p=0.85 and f=0.77, to be able to conclude they are different 

      Error on the probability p is estimated at the 95% confidence level by the formula 1.96 × √(p(1 − p)/N), where N is the total number of cells. This has been added to the paragraph on the probability p in the Image Analysis section of the Material and Methods.

      We also added errors on p measurement in the main text.
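
      A minimal sketch of this kind of error estimate is given below, assuming the standard normal-approximation (Wald) interval for a proportion. The values 0.85 and 0.77 are the ones the reviewer refers to; the cell count N = 200 is a hypothetical placeholder.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


n_cells = 200  # hypothetical number of cells
for p in (0.85, 0.77):
    half_width = wald_half_width(p, n_cells)
    print(f"p = {p:.2f} +/- {half_width:.3f} (95% CI, N = {n_cells})")
```

      Two proportions can then be compared by checking whether their intervals overlap, bearing in mind that the approximation becomes poor for small N or for p close to 0 or 1.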

      -  p 14, all the % differences need an errorbar 

      The error bars and means are given in Fig 3C and 3D.

      -  Figure 1B adds units to compactness, and what does it represent? Is the cell size the estimated volume (that is mentioned in the caption)? Shouldn't the datapoints have error bars? 

      Compactness is defined in the “Image Analysis” section of the Material and Methods. It is a dimensionless parameter. The distribution of individual cell shapes / sizes is depicted in Fig 1B. Error does arise from segmentation, but the degree of variance (a few pixels) is much smaller than the representations of individual cells shown.

      -  Figure 1C caption, are the 50.000 cells? 

      Correct. Figure caption has been altered.

      -  Figure 1D, first the elongation rate is described as a volume per minute, but now, looking at the units it is a rate, how is it normalized? 

      Elongation rate is explained in the Materials and Methods (see the image analysis section) and is not volume per minute. It is dV/dt = r*V (the unit of r is min^-1). Page 9 includes specific mention of the unit of r.
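
      Because dV/dt = r*V implies V(t) = V0*exp(r*t), the rate r can be read off as the slope of ln(V) against time. The sketch below shows this with an ordinary least-squares fit on synthetic data. The function name, time points, and volumes are hypothetical, and this is not necessarily the procedure used in the Materials and Methods.

```python
import math


def elongation_rate(times_min, volumes):
    """Least-squares slope of ln(volume) versus time, in min^-1."""
    if len(times_min) != len(volumes) or len(times_min) < 2:
        raise ValueError("need at least two matching time/volume points")
    logs = [math.log(v) for v in volumes]
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    return sxy / sxx


# Synthetic cell growing exponentially with r = 0.02 min^-1.
true_rate = 0.02
times = [0.0, 5.0, 10.0, 15.0, 20.0]
volumes = [3.0 * math.exp(true_rate * t) for t in times]

rate = elongation_rate(times, volumes)
print(f"fitted r = {rate:.4f} min^-1")
```

      Since the fit is on ln(V), the units of volume cancel and r carries units of inverse time only.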

      -  Figure 1E, how many cells (n) per replicate? 

      Our apologies. We have corrected the figure caption that now reads:

      “Proportion of live cells in ancestral SBW25 (black bar) and ΔmreB (grey bar) based on LIVE/DEAD BacLight Bacterial Viability Kit protocol. Cells were pelleted at 2,000 x g for 2 minutes to preserve ΔmreB cell integrity. Error bars are means and standard deviation of three biological replicates (n>100).”

      -  Figure 1G, how does this compare to the wildtype 

      The volume for wild type SBW25 is 3.27 µm^3 (within the “white zone”). This is mentioned in the text.

      -  Figure 2B, is this really volume, not size? And can you add microscopy images? 

      The x-axis is volume (see Materials and Methods, subsection image analysis). Images are available in Supp. Fig. 9.

      -  Figure 3A what does L1, L4 and L7 refer to? Is it correct that these same lines are picked for WT and delta_mreB?

      Thank you for pointing this out. This was an earlier nomenclature. It was shorthand for the mutants that are specified everywhere else by genotype and has now been corrected. 

      -  Figure 3c: either way write out p, so which probability, or you need a simple cartoon that is plotted. 

      The value p is the probability to proceed to the next generation and is explained in the Materials and Methods, subsection image analysis. We feel this is intuitive and does not require a cartoon. We nonetheless added a sentence to the Materials and Methods to aid clarity.

      -  Figure 4B can you add a ladder to the gel? 

      No ladder was included, but the controls provide all the necessary information. The band corresponding to PBP1A is defined by presence in SBW25, but absence in SBW25 ∆pbp1A.

      -  Figure 4c, can you improve the quantification of these images? How were these selected and how well do they represent the community? 

      We apologise for the lack of quantitative description for data presented in Fig 4C. This has now been corrected. In brief, we measured the intensity of fluorescent signal from between 10 and 14 cells and computed the mean and standard deviation of pixel intensity for each cell. To rule out possible artifacts associated with variation of the mean intensity, we calculated the ratio of the standard deviation divided by the square root of the mean. These data reveal heterogeneity in cell wall synthesis and provide strong statistical support for the claim that cell wall synthesis in ∆mreB is significantly more heterogeneous than the control. The data are provided in new Supp. Fig. 21. 
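
      The heterogeneity measure described above (standard deviation of pixel intensity divided by the square root of the mean) can be sketched as follows. The function name, the use of the sample standard deviation, and the pixel values are assumptions made for illustration; they are not the authors' analysis code or data.

```python
import math
import statistics


def heterogeneity_index(pixel_intensities):
    """Standard deviation of pixel intensity divided by the square root of the mean.

    Dividing by sqrt(mean) rather than by the mean reduces the dependence of the
    index on overall brightness when the noise is approximately Poisson-like.
    """
    mean = statistics.fmean(pixel_intensities)
    if mean <= 0:
        raise ValueError("mean intensity must be positive")
    sd = statistics.stdev(pixel_intensities)  # sample SD (assumption)
    return sd / math.sqrt(mean)


# Two hypothetical cells with the same mean intensity (100) but different spread.
uniform_cell = [100, 102, 98, 101, 99]
patchy_cell = [40, 160, 90, 150, 60]

h_uniform = heterogeneity_index(uniform_cell)
h_patchy = heterogeneity_index(patchy_cell)
print(f"uniform: {h_uniform:.3f}, patchy: {h_patchy:.3f}")
```

      With equal means, the cell with patchier signal yields the larger index, which is the sense in which the measure reports heterogeneity of labelling.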

      Minor comments: 

      -  It would be interesting if the findings of this experimental evolution study could be related to comparative studies (if these have ever been executed).  

      Little is possible, but Hendrickson and Yulo published a portion of the originally posted preprint separately. We include a citation to that paper. 

      -  p 13, halfway through the page, the second paragraph lacks a conclusion, why do we care about DNA content? 

      It is a minor observation that was included by way of providing a complete description of cell phenotype.  

      -  p 17, "suggesting that ... loss-of-function", I do not understand what this is based upon. 

      We show that the fitness of a pbp1A deletion is indistinguishable from the fitness of one of the pbp1A point mutants. This fact establishes that the point mutation had the same effects as a gene deletion thus supporting the claim that the point mutations identified during the course of the selection experiment decrease (or destroy) PBP1A function.

      -  p 25, at the top of the page: do you have a reference for the statement that a disorganized cell wall architecture is suited to the topology of spherical cells? 

      The statement is a conclusion that comes from our reasoning. It stems from the fact that it is impossible to entirely map the surface of a sphere with parallel strands.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Basha and colleagues aim to test whether the thalamic nucleus reuniens can facilitate the hippocampus/prefrontal cortex coupling during sleep. Considering the importance of sleep in memory consolidation, this study is important to understand the functional interaction between these three majorly involved regions. This work suggests that the thalamic nucleus reuniens has a functional role in synchronizing the hippocampus and prefrontal cortex.

      Strengths:

      The authors performed recordings in naturally sleeping cats, and analysed the correlation between the main slow wave sleep oscillatory hallmarks: slow waves, spindles, and hippocampal ripples, and with reuniens' neurons firing. They also associated intracellular recordings to assess the reuniens-prefrontal connectivity, and computational models of large networks in which they determined that the coupling of oscillations is modulated by the strength of hippocampal-thalamic connections.

      Thank you for your positive evaluation.

      Weaknesses:

      The authors' main claim is made on slow waves and spindle coupling, which are recorded both in the prefrontal cortex and surprisingly in reuniens. Known to be generated in the cortex by cortico-thalamic mechanisms, the slow waves and spindles recorded in reuniens show no evidence of local generation in the reuniens, which is not anatomically equipped to generate such activities. Until shown differently, these oscillations recorded in reuniens are most likely volume-conducted from nearby cortices. Therefore, such a caveat is a major obstacle to analysing their correlation (in time or frequency domains) with oscillations in other regions.

      (1) We fully agree with the reviewer that reuniens likely generates neither slow waves nor spindles. We do not make such a claim, as we clearly stated in the discussion (lines 319-324). We propose that Reuniens neurons mediate different forms of activity. In the model, we introduced the MD nucleus only because without MD we were unable to generate spindles. While the slow waves and spindles are generated in other thalamocortical regions, the REU neurons show these rhythms due to long-range projections from these regions to REU, as shown in the model.

      (2) Definitely, we cannot exclude some influence of volume conductance on obtained LFP recordings in REU nucleus. However, we show modulation of spiking activity within REU by spindles. Spike modulation cannot be explained by volume conductance but can be explained by either synaptic drive (likely the case here) or some intrinsic neuronal processes (like T-current).

      (3) In our REU recordings, we used tetrodes for spike identification. If slow waves and spindles were volume conducted, they would have identical shapes on all channels of a tetrode. Following the reviewer's comment, we took these recordings and subtracted one channel from another. The difference in signal during slow waves is on the order of 0.1 mV. Considering that the distance between electrodes is on the order of 20 µm, such a difference in voltage is major and can only be explained by local extracellular currents, likely due to synaptic activities originating in afferent structures.
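
      The logic of the channel-subtraction check in point (3) can be illustrated with synthetic signals. In the sketch below, a component shared by both channels stands in for a purely volume-conducted signal and a component present on one channel only stands in for a local current; all amplitudes, frequencies, and names are hypothetical and do not reproduce the recordings.

```python
import math


def channel_difference(ch_a, ch_b):
    """Sample-by-sample difference between two channels of the same tetrode."""
    if len(ch_a) != len(ch_b):
        raise ValueError("channels must have the same number of samples")
    return [a - b for a, b in zip(ch_a, ch_b)]


def peak_to_peak(signal):
    """Peak-to-peak amplitude of a signal."""
    return max(signal) - min(signal)


fs = 1000                                   # sampling rate in Hz (assumed)
t = [i / fs for i in range(2 * fs)]         # 2 s of samples

# Shared component: identical on both channels, as expected for volume conduction.
shared = [0.5 * math.sin(2 * math.pi * 1.0 * x) for x in t]     # mV, 1 Hz
# Local component: present on channel 1 only, in the spindle frequency range.
local = [0.05 * math.sin(2 * math.pi * 10.0 * x) for x in t]    # mV, 10 Hz

ch1 = [s + l for s, l in zip(shared, local)]
ch2 = list(shared)

diff_volume_conducted = channel_difference(ch2, ch2)  # identical channels
diff_with_local = channel_difference(ch1, ch2)        # one channel sees a local source

print(f"identical channels: {peak_to_peak(diff_volume_conducted):.3f} mV")
print(f"with local source:  {peak_to_peak(diff_with_local):.3f} mV")
```

      Identical channels cancel exactly, whereas any residual after subtraction reflects signal that differs between closely spaced electrodes. In real recordings, channel gain and noise differences also contribute to the residual, so the comparison is only a first-order check.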

      Finally, the choice of the animal model (cats) is not the best suited one, as too few data, particularly anatomical ones regarding reuniens connectivity, are available to support functional results.

      (1) The thalamus of the majority of mammals (definitely primates and carnivores, including cats) contains local circuit interneurons (about 30% of all neurons). The vast majority of studies in rodents (except in the LGN) report either an absence or an extremely low number of thalamic interneurons (e.g., Jager P, Moore G, Calpin P, et al. Dual midbrain and forebrain origins of thalamic inhibitory interneurons. eLife. 2021; 10: e59272). Therefore, studies on species other than rodents are necessary and bring new information that is impossible to obtain in rodents.

      (2) The cat brain is much larger than the brain of mice or rats; therefore, the effects of volume conductance from cortex to REU are much smaller, if not negligible. The distance between REU and the closest cortical structure (ectosylvian gyrus) in cats is about 15 mm.

      (3) Indeed, there are far fewer anatomical data on cats than on rodents. This is why we performed the experiments shown in figure 1. This figure contains functional anatomy data. Antidromic responses show that the recorded structure projects to the stimulated structure. Orthodromic responses show that the stimulated structure projects to the recorded structure.

      Reviewer #2 (Public Review):

      Summary:

      The interplay between the medial prefrontal cortex and ventral hippocampal system is critical for many cognitive processes, including memory and its consolidation over time. A prominent idea in recent research is that this relationship is mediated at least in part by the midline nucleus reuniens with respect to consolidation in particular. Whereas the bulk of evidence has focused on neuroanatomy and the effects of temporary or permanent lesions of the nucleus reuniens, the current work examined the electrophysiology of these three structures and how they inter-relate, especially during sleep, which is anticipated to be critical for consolidation. They provide evidence from intracellular recordings of the bi-directional functional connectivity among these structures. There is an emphasis on the interactions between these regions during sleep, especially slow-wave sleep. They provide evidence, in cats, that cortical slow waves precede reuniens slow waves and hippocampal sharp-wave ripples, which may reflect prefrontal control of the timing of thalamic and hippocampal events. They also find evidence that hippocampal sharp-wave ripples trigger thalamic firing and precede the onset of reuniens and medial prefrontal cortex spindles. The authors suggest that the effectiveness of bidirectional connections between the reuniens and the (ventral) CA1 is particularly strong during non-rapid eye movement sleep in the cat. This is a very interesting, complex study on a highly topical subject.

      Strengths:

      An excellent array of different electrophysiological techniques and analyses are conducted. The temporal relationships described are novel findings that suggest mechanisms behind the interactions between the key regions of interest. These may be of value for future experimental studies to test more directly their association with memory consolidation.

      We thank the reviewer for this very positive evaluation of our study.

      Weaknesses:

      Given the complexity and number of findings provided, clearer explanation(s) and organisation that directed the specific value and importance of different findings would improve the paper. Most readers may then find it easier to follow the specific relevance of key approaches and findings and their emphasis. For example, the fact that bidirectional connections exist in the model system is not new per se. How and why the specific findings add to existing literature would have more impact if this information was addressed more directly in the written text and in the figure legends.

      Thank you for this comment. In the revised version, we have done our best to simplify the presentation and to explain our findings more clearly.

      Reviewing Editor (Recommendations for Authors):

      Please discuss the ability of reuniens to generate spindles?

      We briefly discussed this in the previous version. We have now extended the discussion (p. 18).

      For population data, how many cats were used in acute and chronic experiments, where does the population data originate in Fig. 2? How repeatable were the findings across animals? Was histology verified in each animal?

      As stated at the beginning of the Methods section, we used a total of 20 cats: 16 anesthetized (acute) and 4 non-anesthetized (chronic). We added the number of cats in the appropriate places in the Results section. The population data in figure 2 come from 48, 49 or 52 recording sessions (depending on the type of analysis, as indicated in the figure legend) from 4 chronic cats; we clarified this information in the legend. Results were highly repeatable across animals. Histology was verified in all chronic and acute animals; we added a sentence to the Methods section.

      Explanation of figures is very poor, values in figures should be reported in results so they can be compared in the context of the description.

      In this revised version, we report most of the numbers shown in the figures and their legends in the main text (Results section).

      The depth of the recording tungsten electrodes are meaningless without the AP and ML coordinates given how heterogenous mPFC is. What is the ventromedial wall of the mPFC in the cat?

      We added the ML and AP coordinates in the Methods section. We replaced “ventromedial wall” with “ventroposterior part of the mPFC”.

      What are the two vertical lines in 1F?

      This was an error while preparing the figure. The panel was corrected.

      Line 90 mean +-SD of what? There are no numbers.

      Thanks, we now indicate the values.

      Panel 2L does not show increased spindling in reuniens prior to PFC as indicated in the results, please explain. It does show SWR in the hippocampus prior to spindles, what is the meaning of such a time relationship?

      Panel 2L did show increased spindling in reuniens prior to mPFC, but indeed, at the time scale shown, it was not very clear. In this revised manuscript, we added an inset zooming in around time zero to make this point clearer.

      Panel 2L indeed shows an increase in SWRs prior to the increase in spindles in both Reuniens and mPFC.

      As stated in the discussion, ‘We found that hippocampal SWRs trigger thalamic firing and precede the onset of reuniens and mPFC spindles, which points to SWRs as one of candidate events for spindle initiation.’

      It is unclear what the slow waves of PFC mean, these represent filtered PFC lfp, but is this a particular oscillation? They continue to occur during the spindle, while the slow waves supposedly trigger the spindle. Please explain and clarify.

      We recently published a review article, written with several scientists studying both human and animal sleep, that includes Box 1 (Timofeev I, Schoch S, LeBourgeois M, Huber R, Riedner B, Kurth S. Spatio-temporal properties of sleep slow waves and implications for development. Current Opinion in Physiology. 2020; 15: 172–182). In this box, among other terms, we provide the current definitions of slow waves and slow oscillation. Briefly, if slow waves repeat with a given rhythm, they typically form a slow oscillation. However, if they occur in isolation or are not rhythmic, they remain slow waves but cannot be called a slow oscillation.

      Regarding the relation between spindles and the slow oscillation: we are currently systematically analyzing data on spindles and slow waves obtained from head-restrained and freely behaving cats. One of the main findings is that a majority of ‘cortical’ spindles are local, to the extent that spindles can occur in alternation in two neighboring cortical cells. Largely, LFP sleep spindles occur more or less synchronously within the suprasylvian gyrus of cats, where indeed a large majority of them were triggered by slow waves. The synchrony between LFP spindles in the suprasylvian gyrus and in other cortical areas is much less clear. So, it is not surprising that spindles in one brain region can occur when a slow wave is present in some other brain region. Something similar was also shown in humans (Mölle M, Bergmann TO, Marshall L, Born J. Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep. 2011; 34 (10): 1411-1421).

      In this regard, we are not ready to include modifications in the manuscript.

      Line 134, where is spindle amplitude shown? Plots report power within the spindle frequency band, which obviously captures more than just spindles.

      No, the plots in figure 3 B, C show the phase-amplitude coupling (PAC) strength. These were calculated from detected spindles; therefore, while we cannot exclude some false spindle detections, we are confident that they are at a negligible level. We modified the text and, instead of spindle amplitude, we now describe slow wave-spindle amplitude coupling, which describes our analysis exactly.

      The discussion must include the medio dorsal nucleus which is the largest thalamic input to the prefrontal cortex and also receives input from the hippocampus. In particular, the case must be made for why reuniens would play a more important or different role than MD? (For example: Occurrence of Hippocampal Ripples is Associated with Activity Suppression in the Mediodorsal Thalamic Nucleus - PMC (nih.gov)).

      We cited the suggested study. We cannot say whether reuniens plays a more or less important role. What is clear is that hippocampal ripples at the onset of spindles trigger increased firing in both MD and reuniens. Our extracellular recordings (Fig. 4, K) suggest that the increased firing is associated with spike bursts. We also have a parallel unpublished study in anesthetized mice showing SWR-triggered inhibitory potentials in both reuniens and MD that reverse at around -65 to -70 mV. Because the majority of SWRs occurred at the onset of the cortical up state, the relative roles of cortico-thalamic and hippocampo-thalamic drive are not easy to separate. We hope to do this convincingly in our forthcoming study, with the limitation that it was done in anesthetized mice.

      Reviewer #1 (Recommendations For The Authors):

      I strongly encourage the authors to perform current source density analyses on the LFP signals recorded in the nucleus reuniens to make sure that the observed oscillations are indeed locally generated. So far, the anatomical organisation in reuniens cannot support the local generation of oscillations, such as spindles and slow wave. At least in rodents (the cat reuniens does not seem too different, until shown differently), there were no oscillators found in reuniens, and at least not arranged like in cortical areas, allowing the summation in time, and particularly space, of rhythmic input currents. Bipolar recordings with pairs of twisted electrodes might also be useful to assess the local existence of spindles and slow waves.

      Current source density calculation is possible when one knows the exact distance between recording sites. As we used tetrodes made with 4 twisted platinum-iridium wires, we know more or less the range of distance between recording sites, but not the exact distance between any given pair of electrodes.

      In addition, the physical distance between the reuniens and any cortical structure is about 8-9 mm. With such distances, volume conductance is expected to be negligible. If slow waves and spindles were volume conducted, they would have identical shapes on all channels of a tetrode. Following the reviewer's comment, we took these recordings and subtracted one channel from another. The difference in signal during slow waves is on the order of 0.1 mV. Considering that the distance between electrodes is on the order of 20 µm, such a difference in voltage is major and can only be explained by local extracellular currents, likely due to synaptic activities originating in afferent structures.

      Below, we plotted the voltage of one channel of the tetrode versus another channel of the same tetrode. If the signal was simply volume conducted, one would expect to see the vast majority of points on the x=y line (red).

      Author response image 1.

      Below is a segment of mPFC LFP recording (upper black trace), the mPFC LFP filtered for spindle frequency (7-15 Hz), and the detected spindles (black lines above the filtered trace). Two LFP traces from a tetrode in the Reuniens (orange and light blue) are then overlaid. The second trace from the bottom (blue) represents the subtraction of the Reuniens 2 channel from the Reuniens 1 channel, and just below it (lower blue trace) is this subtraction trace filtered for spindle frequency (7-15 Hz), showing a clear voltage difference in the spindle range between the two electrodes. Note also that around time 179-179.5 s, there is a clear spindle oscillation in the mPFC recording which is not present in the Reuniens recordings.

      Author response image 2.

      Therefore, we are convinced that in our recordings, volume conductance did not play any significant role.

      Another concern regarding delays between events, like slow waves, measured between two regions (as exemplified by Figure 3). It appears that the delays were calculated from the filtered signal. Figure 3G shows a delay between the peak of the mPFC slow wave between the raw and the filtered signal, which might be artifactual of the processing. It is though not (or less) visible for the reuniens recording. Such mismatch might explain the observed differences in delays.

      Thanks for this comment. We recomputed the analysis using the original signal (smoothed) and obtained very similar results. Panels H and I of figure 3 were updated using the new analysis performed on original signal.

      The overall analyses of LFP-triggered reuniens MUA activity lack statistics (at least z-scored firing to normalise the firings).

      Fig. 2 H and I are representative examples of histograms; statistical data are shown in circular plots, as explained in the legend. Fig. 2 L shows population data, for which we now provide the standard error. Fig. 4 C and D show individual examples. Fig. 4 E shows histograms of the activity of all identified putative single units. Units that show significant modulation are displayed above the white line. Fig. 4 F shows population data for significantly modulated units.

      A last point of detail in the model, which surprisingly shows reuniens to excitatory hippocampal cells' connectivity. Recent literature reports that reuniens only connect hippocampal interneurons, and not principal cells (at least in rodents, I could not find any report in cats). I wonder how changing this parameter would affect the results of the computational investigation, particularly the results shown in Figure 6.

      There are several studies in the literature showing a direct excitation from the Reuniens to pyramidal cells in the CA1, here are three of them:

      Goswamee, P., et al. (2021). "Nucleus Reuniens Afferents in Hippocampus Modulate CA1 Network Function via Monosynaptic Excitation and Polysynaptic Inhibition." Frontiers in Cellular Neuroscience 15.

      Dolleman-Van der Weel MJ, Lopes da Silva FH, Witter MP (1997) Nucleus Reuniens Thalami Modulates Activity in Hippocampal Field CA1 through Excitatory and Inhibitory Mechanisms. The Journal of Neuroscience 17:5640.

      Dolleman-van der Weel MJ, Lopes da Silva FH, Witter MP (2017) Interaction of nucleus reuniens and entorhinal cortex projections in hippocampal field CA1 of the rat. Brain Structure and Function 222:2421-2438.

      Because this is not a review paper, we opted to not cite all the papers describing connectivity between mPFC, hippocampus and thalamus.

      Reviewer #2 (Recommendations For The Authors):

      I respectfully suggest that the earlier (public) comments listed above should be addressed. In addition, it would be useful to make it clearer when non-rapid eye movement sleep was being addressed and when rapid eye movement was being addressed. Is it of value to use a single term instead of adding "slow wave sleep" or else clarify when either term is used? The addition of more subheadings might help. Moreover, the relative contribution/value of evidence from these two sleep states was not addressed or was not very clear.

      We tried to make it clearer when NREM and when REM was analysed.

      We replaced slow-wave sleep with NREM sleep in the figure 5 title.

      We added several subheadings in the discussion.

      Relative contribution of NREM vs REM sleep was not addressed? Sorry but we do not clearly understand your question. Figs. 2 and 3 deal mainly with NREM sleep (Fig 2.B has an example of REM sleep). Fig. 4 essentially describes results obtained during REM sleep.

      I was not sure if the Abstract summarised the key take-home messages from the large amount of evidence provided. Some choices are needed, of course, but "evidence of bidirectional connectivity" struck me as less novel than other evidence provided. Given the huge amount of findings provided, which is commendable, it is still useful to present it perhaps in a more digestible fashion. For example, the headings or the first sentence(s) below headings could indicate the aim or the outcome of the specific method/analysis/findings.

      We rewrote the abstract and added a conclusion to highlight the major findings and their meaning.

      It is more common to use NRe or Re, rather than REU.

      We avoided using RE because, for decades, we have used RE to abbreviate the thalamic reticular nucleus in several publications. In this revised version, we spell the name out in full: Reuniens.

      Line 49 mentions "short-term" memory. Please specify this more clearly as it is otherwise ambiguous. Also, line 303.

      We rephrased the sentence: In particular, the hierarchical coupling of slow waves, spindles and SWRs is thought to play a key role in memory consolidation.

      Line 303 was likely about the ventromedial wall: we corrected that sentence.

      Line 62: the word, "required" (for memory function) is too strong because there is evidence that it is not always required.

      We changed the wording to “plays a major role”.

      The focus within the medial prefrontal cortex could be specified more clearly / earlier.

      The mPFC is mentioned in the second sentence of the abstract and in the first sentence of the introduction.

      Line 134: The heading states "determine" and then mentions modulation. These terms may not be interchangeable or they need clarification.

      We changed it to “slow wave-spindle amplitude coupling”, which describes our analysis exactly.

      Line 204: Does "cortical network" mean prefrontal cortex network"?

      Yes, as described in lines 192-193, the two cortical networks (N1 and N2) of the model represent the mPFC layer 5 and 6 respectively.

      Lines 283 to 289: These were not very clear to me.

      These lines described the potential mechanisms for the responses to hippocampal and reuniens stimulation recorded intracellularly (results in figure 1). We modified this paragraph for clarity.

      Line 296: Specify the "claim".

      We modified the sentence to read: “[…] provides supporting evidence for this claim that nucleus Reuniens might synchronize the activity of ventral hippocampus and mPFC.”

      The discussion naturally focuses on the thalamic nucleus reuniens, but also occasionally mentions the thalamic mediodorsal nucleus. The distinction, assuming this is highly relevant, could be expressed more clearly (direct comparison with their previous papers).

      We never published a study on the mediodorsal nucleus. We do have some unpublished results from recordings in the MD nucleus and they reveal the presence of an inhibitory component at the beginning of cortical active states, therefore behaving in a similar way to first order nuclei. It is then possible that spindles recorded in the reuniens are actually generated in the MD nucleus and then transmitted to Reuniens through the thalamic reticular nucleus, as both MD and reuniens are connected to the rostral thalamic reticular nucleus. We added some discussion about this.

      Figure 1B: Do the authors have any additional evidence of the placements in the reuniens, because the photo provided suggests a large area beyond the reuniens boundary. Also, please confirm is the CEM between Rh and Re in the cat (I think the Rh and Re are adjacent in the rat).

      Figure 1B shows an electrolytic lesion, which is necessarily bigger than the tip of the electrode. The center of the electrolytic lesion therefore indicates where the electrode tip was located, which is well within the reuniens nucleus.

      Also, yes, CE (Nucleus centralis thalami, pars medialis) is located between the reuniens and rhomboid nuclei in cats. This can be found in two cat atlases:

      Reinoso-Suárez, F. (1961). Topographischer Hirnatlas der Katze für experimental-physiologische Untersuchungen (Merck).

      Berman AL, Jones EG (1982) The Thalamus and Basal Telencephalon of the Cat: A Cytoarchitectonic Atlas with Stereotaxic Coordinates: University of Wisconsin Press.

      The first mention of hippocampus in the figure legends should remind the reader by stating "ventral hippocampus".

      In this revised version, we added “ventral” in several instances, both in the main text and in the figure legends.

      Figure 2: It seems unusual to mention "unusually short NREM". Presumably, things are the same otherwise - if so, perhaps mention that, especially if some of the effects reflect an "unusual" episode.

      We display this particular segment because we want to show a continuous recording in which the individual elements characterizing specific states are still visible.

      Some effects look like they are strong and others perhaps weaker. If so, how do these impact the final conclusions?

      Sorry, we did not clearly understand what the reviewer means here. In general, if an effect shows a statistically significant difference (at the conventional 0.05 threshold), we consider it significant. Any other cases are described on an individual basis.

      Perhaps "MAD" should be in full on the first occasion, if not already.

      It was spelled out at line 659, but we now spell it out also in the results section and in figure 2 legend.

      Methods: the key question is the use of rodent recordings to classify cat recordings. It would be good to have a reference indicating that this can be directly used for cats, which may have different sleep cycles and patterns compared to rats.

      We did not use rodent recordings to classify cat recordings; however, we did use a state detection script that was developed with rodent recordings. As mentioned in the Methods section, we adapted the script to cat mPFC recordings, and manual corrections were then made to correctly detect REM episodes. Respectfully, our lab has investigated sleep-wake states in non-anesthetized animals for a few decades; we have developed state detection algorithms for mice, cats, and marmosets when needed (to analyse months of recordings), and we have extensive expertise in identifying states of vigilance from electrophysiological recordings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      INTRODUCTION & THEORY

      (1) Can the authors please clarify why the first trial of extinction in a standard protocol does NOT produce the retrieval-extinction effect? Particularly as the results section states: "Importantly, such a short-term effect is also retrieval dependent, suggesting the labile state of memory is necessary for the short-term memory update to take effect (Fig. 1e)." The importance of this point comes through at several places in the paper:

      1A. "In the current study, fear recovery was tested 30 minutes after extinction training, whereas the effect of memory reconsolidation was generally evident only several hours later and possibly with the help of sleep, leaving open the possibility of a different cognitive mechanism for the short-term fear dementia related to the retrieval-extinction procedure." ***What does this mean? The two groups in study 1 experienced a different interval between the first and second CS extinction trials; and the results varied with this interval: a longer interval (10 min) ultimately resulted in less reinstatement of fear than a shorter interval. Even if the different pattern of results in these two groups was shown/known to imply two different processes, there is absolutely no reason to reference any sort of cognitive mechanism or dementia - that is quite far removed from the details of the present study.

      Indeed, the only difference between the standard extinction paradigm and the retrieval-extinction paradigm is the interval between the first and second CS extinction trials. It has been shown before that a second CS+ presented 1 hour after the initial retrieval CS+ resulted in the dephosphorylation of GluR1 in rats, which was indicative of memory destabilization. A second CS+ presented only 3 minutes after the initial retrieval CS+, as in standard extinction training, did not cause the GluR1 dephosphorylation effect (Monfils et al., 2009). Therefore, an isolated presentation of the CS+ seems to be important in preventing the return of fear expression. Behaviorally, when the CSs were presented in a more temporally spaced (vs. massed) or a more gradual manner during extinction training, the fear amnesia effects were more salient (Cain et al., 2003, Gershman et al., 2013). It has also been suggested that the old memory can be successfully modified only when the old memory and the new experience (through extinction) can be inferred to have been generated by the same underlying latent cause (Gershman et al., 2017). On the other hand, if the new experiences are believed to be generated by a different latent cause, then the old memory is less likely to be subject to modification. Therefore, from a theoretical perspective, the way the first and second CS are temporally organized (retrieval-extinction or standard extinction) might affect how the latent cause is inferred and lead to different levels of fear expression. These findings, together with studies of both fear and drug memories using the retrieval-extinction paradigm (Liu et al., 2014, Luo et al., 2015, Schiller et al., 2010, Xue et al., 2012), seem to suggest that the retrieval-extinction and the standard extinction procedures engage different cognitive and molecular mechanisms that lead to significantly different behavioral outcomes.

      In our study, we focus on the short-term and long-term amnesia effects of the retrieval-extinction procedure but also point out the critical role of retrieval in eliciting the short-term effect.

      1B. "Importantly, such a short-term effect is also retrieval dependent, suggesting the labile state of memory is necessary for the short-term memory update to take effect (Fig. 1e)." ***As above, what is "the short-term memory update"? At this point in the text, it would be appropriate for the authors to discuss why the retrieval-extinction procedure produces less recovery than a standard extinction procedure as the two protocols only differ in the interval between the first and second extinction trials. References to a "short-term memory update" process do not help the reader to understand what is happening in the protocol.

      Sorry for the lack of clarity here. By short-term memory update we meant the short-term amnesia in fear expression.

      (2) "Indeed, through a series of experiments, we identified a short-term fear amnesia effect following memory retrieval, in addition to the fear reconsolidation effect that appeared much later."

      ***The only reason for supposing two effects is because of the differences in responding to the CS2, which was subjected to STANDARD extinction, in the short- and long-term tests. More needs to be said about how and why the performance of CS2 is affected in the short-term test and recovers in the long-term test. That is, if the loss of performance to CS1 and CS2 is going to be attributed to some type of memory updating process across the retrieval-extinction procedure, one needs to explain the selective recovery of performance to CS2 when the extinction-to-testing interval extends to 24 hours. Instead of explaining this recovery, the authors note that performance to CS1 remains low when the extinction-to-testing interval is 24 hours and invoke something to do with memory reconsolidation as an explanation for their results: that is, they imply (I think) that reconsolidation of the CS1-US memory is disrupted across the 24-hour interval between extinction and testing even though CS1 evokes negligible responding just minutes after extinction.

      In our results, we did not focus only on the fear expression related to CS2. In fact, we also demonstrated that the CS1-related fear expression diminished in the short-term memory test but re-appeared in the long-term memory test after the CS1 retrieval-extinction training.

      The “…recovery of performance to CS2 when the extinction-to-testing interval extends to 24 hours…” is a result that has been demonstrated in various previous studies (Kindt and Soeter, 2018, Kindt et al., 2009, Nader et al., 2000, Schiller et al., 2013, Schiller et al., 2010, Xue et al., 2012). That is, the reconsolidation framework stipulates that a pharmacological or behavioral intervention during the labile state of the reconsolidation window modifies only the fear memory linked to the reminded retrieval cue, and not the expression of the non-reminded CS-US memory (but see also Liu et al., 2014 and Luo et al., 2015, who used the unconditioned stimulus as the reminder cue in the retrieval-extinction paradigm to prevent the return of fear memory associated with different CSs). In fact, we hypothesized in the last figure (Fig. 6) that the temporal dynamics of CS1- and CS2-related fear expression were due to the interplay between the short-term and long-term (reconsolidation) effects of the retrieval-extinction paradigm.

      (3) The discussion of memory suppression is potentially interesting but, in its present form, raises more questions than it answers. That is, memory suppression is invoked to explain a particular pattern of results but I, as the reader, have no sense of why a fear memory would be better suppressed shortly after the retrieval-extinction protocol compared to the standard extinction protocol; and why this suppression is NOT specific to the cue that had been subjected to the retrieval-extinction protocol.

      We discussed memory suppression as one of the potential mechanisms to account for the three characteristics of the short-term amnesia effects: cue-independence, temporal dynamics (short-term) and thought-control-ability relevance. According to the memory suppression theory, the memory suppression effect is NOT specific to the cue and this effect was demonstrated via the independent cue test in a variety of studies (Anderson and Floresco, 2022, Anderson and Green, 2001, Gagnepain et al., 2014, Zhu et al., 2022). Therefore, we suggest in the discussion that it might be possible the CS1 retrieval cue prompted an automatic suppression mechanism and yielded the short-term fear amnesia consistent with various predictions from the memory suppression theory:

      “In our experiments, subjects were not explicitly instructed to suppress their fear expression, yet the retrieval-extinction training significantly decreased short-term fear expression. These results are consistent with the short-term amnesia induced with the more explicit suppression intervention (Anderson et al., 1994; Kindt and Soeter, 2018; Speer et al., 2021; Wang et al., 2021; Wells and Davies, 1994). It is worth noting that although consciously repelling unwanted memory is a standard approach in memory suppression paradigm, it is possible that the engagement of the suppression mechanism can be unconscious. For example, in the retrieval-induced forgetting (RIF) paradigm, recall of a stored memory impairs the retention of related target memory and this forgetting effect emerges as early as 20 minutes after the retrieval procedure, suggesting memory suppression or inhibition can occur in a more spontaneous and automatic manner (Imai et al., 2014). Moreover, subjects with trauma histories exhibited more suppression-induced forgetting for both negative and neutral memories than those with little or no trauma (Hulbert and Anderson, 2018). Similarly, people with higher self-reported thought-control capabilities showed more severe cue-independent memory recall deficit, suggesting that suppression mechanism is associated with individual differences in spontaneous control abilities over intrusive thoughts (Küpper et al., 2014). It has also been suggested that similar automatic mechanisms might be involved in organic retrograde amnesia of traumatic childhood memories (Schacter et al., 2012; Schacter et al., 1996).”

      3A. Relatedly, how does the retrieval-induced forgetting (which is referred to at various points throughout the paper) relate to the retrieval-extinction effect? The appeal to retrieval-induced forgetting as an apparent justification for aspects of the present study reinforces points 2 and 3 above. It is not uninteresting but needs some clarification/elaboration.

      We introduced the retrieval-induced forgetting (RIF) to make the point that RIF was believed to be related to the memory suppression mechanism and the RIF effect can appear relatively early, consistent with what we observed in the short-term amnesia effect. We have re-written the manuscript to make this point clearer:

      “It is worth noting that although consciously repelling unwanted memory is a standard approach in memory suppression paradigm, it is possible that the engagement of the suppression mechanism can be unconscious. For example, in the retrieval-induced forgetting (RIF) paradigm, recall of a stored memory impairs the retention of related target memory and this forgetting effect emerges as early as 20 minutes after the retrieval procedure, suggesting memory suppression or inhibition can occur in a more spontaneous and automatic manner (Imai et al., 2014). Moreover, subjects with trauma histories exhibited more suppression-induced forgetting for both negative and neutral memories than those with little or no trauma (Hulbert and Anderson, 2018). Similarly, people with higher self-reported thought-control capabilities showed more severe cue-independent memory recall deficit, suggesting that suppression mechanism is associated with individual differences in spontaneous control abilities over intrusive thoughts (Küpper et al., 2014).”

      (4) Given the reports by Chalkia, van Oudenhove & Beckers (2020) and Chalkia et al (2020), some qualification needs to be inserted in relation to reference 6. That is, reference 6 is used to support the statement that "during the reconsolidation window, old fear memory can be updated via extinction training following fear memory retrieval". This needs a qualifying statement like "[but see Chalkia et al (2020a and 2020b) for failures to reproduce the results of 6]."

      https://pubmed.ncbi.nlm.nih.gov/32580869/

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115860/

      We have incorporated the reviewer’s suggestion into the revised manuscript in both the introduction:

      “Pharmacological blockade of protein synthesis and behavioral interventions can both eliminate the original fear memory expression in the long-term (24 hours later) memory test (Lee, 2008; Lee et al., 2017; Schiller et al., 2013; Schiller et al., 2010), resulting in the cue-specific fear memory deficit (Debiec et al., 2002; Lee, 2008; Nader, Schafe, & LeDoux, 2000). For example, during the reconsolidation window, retrieving a fear memory allows it to be updated through extinction training (i.e., the retrieval-extinction paradigm (Lee, 2008; Lee et al., 2017; Schiller et al., 2013; Schiller et al., 2010), but also see (Chalkia, Schroyens, et al., 2020; Chalkia, Van Oudenhove, et al., 2020; D. Schiller, LeDoux, & Phelps, 2020)).”

      And in the discussion:

      “It should be noted that while our long-term amnesia results were consistent with the fear memory reconsolidation literatures, there were also studies that failed to observe fear prevention (Chalkia, Schroyens, et al., 2020; Chalkia, Van Oudenhove, et al., 2020; Schroyens et al., 2023). Although the memory reconsolidation framework provides a viable explanation for the long-term amnesia, more evidence is required to validate the presence of reconsolidation, especially at the neurobiological level (Elsey et al., 2018). While it is beyond the scope of the current study to discuss the discrepancies between these studies, one possibility to reconcile these results concerns the procedure for the retrieval-extinction training. It has been shown that the eligibility for old memory to be updated is contingent on whether the old memory and new observations can be inferred to have been generated by the same latent cause (Gershman et al., 2017; Gershman and Niv, 2012). For example, prevention of the return of fear memory can be achieved through gradual extinction paradigm, which is thought to reduce the size of prediction errors to inhibit the formation of new latent causes (Gershman, Jones, et al., 2013). Therefore, the effectiveness of the retrieval-extinction paradigm might depend on the reliability of such paradigm in inferring the same underlying latent cause. Furthermore, other studies highlighted the importance of memory storage per se and suggested that memory retention was encoded in the memory engram cell ensemble connectivity whereas the engram cell synaptic plasticity is crucial for memory retrieval (Ryan et al., 2015; Tonegawa, Liu, et al., 2015; Tonegawa, Pignatelli, et al., 2015). It remains to be tested how the cue-independent short-term and cue-dependent long-term amnesia effects we observed could correspond to the engram cell synaptic plasticity and functional connectivity among engram cell ensembles (Figure 6). 
      This is particularly important, since the cue-independent characteristic of the short-term amnesia suggests that either different memory cues fail to evoke engram cell activities, or the retrieval-extinction training transiently inhibits connectivity among engram cell ensembles. Finally, SCR is only one aspect of the fear expression; how the retrieval-extinction paradigm might affect subjects’ other emotional (such as the startle response) and cognitive fear expressions such as reported fear expectancy needs to be tested in future studies since they do not always align with each other (Kindt et al., 2009; Sevenster et al., 2012, 2013).”

      5A. What does it mean to ask: "whether memory retrieval facilitates update mechanisms other than memory reconsolidation"? That is, in what sense could or would memory retrieval be thought to facilitate a memory update mechanism?

      It is widely documented in the literature that memory retrieval renders the old memory labile and susceptible to the memory reconsolidation process. However, as we mentioned in the manuscript, studies have shown that memory reconsolidation requires de novo protein synthesis and usually takes hours to complete. What remains unknown is whether old memories are subject to modifications other than the reconsolidation process. Our task specifically tested the short-term effect of the retrieval-extinction paradigm and found that fear expression diminished 30 min after the retrieval-extinction training. Such an effect cannot be accounted for by memory reconsolidation.

      5B. "First, we demonstrate that memory reactivation prevents the return of fear shortly after extinction training in contrast to the memory reconsolidation effect which takes several hours to emerge and such a short-term amnesia effect is cue independent (Study 1, N = 57 adults)."

      ***The phrasing here could be improved for clarity: "First, we demonstrate that the retrieval-extinction protocol prevents the return of fear shortly after extinction training (i.e., when testing occurs just min after the end of extinction)." Also, cue-dependence of the retrieval-extinction effect was assessed in study 2.

      We thank the reviewer and have modified the phrasing of the sentence:

      “First, we demonstrate that memory retrieval-extinction protocol prevents the return of fear expression shortly after extinction training and this short-term effect is memory reactivation dependent (Study 1, N = 57 adults).”

      5C. "Furthermore, memory reactivation also triggers fear memory reconsolidation and produces cue-specific amnesia at a longer and separable timescale (Study 2, N = 79 adults)." ***In study 2, the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction. This result is interesting but cannot be easily inferred from the statement that begins "Furthermore..." That is, the results should be described in terms of the combined effects of retrieval and extinction, not in terms of memory reactivation alone; and the statement about memory reconsolidation is unnecessary. One can simply state that the retrieval-extinction protocol produced a cue-specific disruption in responding when testing occurred 24 hours after the end of extinction.

      We have revised the text according to the reviewer’s comment.

      “Furthermore, across different timescales, the memory retrieval-extinction paradigm triggers distinct types of fear amnesia in terms of cue-specificity and cognitive control dependence, suggesting that the short-term fear amnesia might be caused by different mechanisms from the cue-specific amnesia at a longer and separable timescale (Study 2, N = 79 adults).”

      5D. "...we directly manipulated brain activities in the dorsolateral prefrontal cortex and found that both memory retrieval and intact prefrontal cortex functions were necessary for the short-term fear amnesia."

      ***This could be edited to better describe what was shown: E.g., "...we directly manipulated brain activities in the dorsolateral prefrontal cortex and found that intact prefrontal cortex functions were necessary for the short-term fear amnesia after the retrieval-extinction protocol."

      Edited:

      “Finally, using continuous theta-burst stimulation (Study 3, N = 75 adults), we directly manipulated brain activity in the dorsolateral prefrontal cortex, and found that both memory reactivation and intact prefrontal cortex function were necessary for the short-term fear amnesia after the retrieval-extinction protocol.”

      5E. "The temporal scale and cue-specificity results of the short-term fear amnesia are clearly dissociable from the amnesia related to memory reconsolidation, and suggest that memory retrieval and extinction training trigger distinct underlying memory update mechanisms."

      ***The pattern of results when testing occurred just minutes after the retrieval-extinction protocol was different from that obtained when testing occurred 24 hours after the protocol. Describing this in terms of temporal scale is unnecessary, and suggesting that memory retrieval and extinction trigger different memory update mechanisms is not obviously warranted. The results of interest are due to the combined effects of retrieval+extinction and there is no sense in which different memory update mechanisms should be identified with retrieval (mechanism 1) and extinction (mechanism 2).

      We did not argue for different memory update mechanisms for the “retrieval (mechanism 1) and extinction (mechanism 2)” in our manuscript. Instead, we proposed that the retrieval-extinction procedure, which was mainly documented in the previous literature for its association with the reconsolidation-related fear memory retention (the long-term effect), also had a much faster effect (the short-term effect). These two effects differed in many aspects, suggesting that different memory update mechanisms might be involved.

      5F. "These findings raise the possibility of concerted memory modulation processes related to memory retrieval..."

      ***What does this mean?

      As we mentioned in our response to the previous comment, we believe that the retrieval-extinction procedure triggers different types of memory update mechanisms working on different temporal scales.

      (6) "...suggesting that the fear memory might be amenable to a more immediate effect, in addition to what the memory reconsolidation theory prescribes..."

      ***What does it mean to say that the fear memory might be amenable to a more immediate effect?

      We intended to state that the retrieval-extinction procedure can produce a short-term amnesia effect and have thus revised the text.

      (7) "Parallel to the behavioral manifestation of long- and short-term memory deficits, concurrent neural evidence supporting memory reconsolidation theory emphasizes the long-term effect of memory retrieval by hypothesizing that synapse degradation and de novo protein synthesis are required for reconsolidation."

      ***This sentence needs to be edited for clarity.

      We have rewritten this sentence:

      “Corresponding to the long-term behavioral manifestation, concurrent neural evidence supporting memory reconsolidation hypothesis emphasizes that synapse degradation and de novo protein synthesis are required for reconsolidation.”

      (8) "previous behavioral manipulations engendering the short-term declarative memory effect..."

      ***What is the declarative memory effect? It should be defined.

      We meant the amnesia studied in declarative memory research, such as the memory deficit caused by the think/no-think paradigm. The text has been modified for clarity:

      “On the contrary, previous behavioral manipulations engendering the short-term amnesia on declarative memory, such as the think/no-think paradigm, hinges on the intact activities in brain areas such as dorsolateral prefrontal cortex (cognitive control) and its functional coupling with specific brain regions such as hippocampus (memory retrieval) (Anderson and Green, 2001; Wimber et al., 2015).”

      (9) "The declarative amnesia effect emerges much earlier due to the online functional activity modulation..."

      ***Even if the declarative memory amnesia effect had been defined, the reference to online functional activity modulation is not clear.

      We have rephrased the sentence:

      “The declarative amnesia effect arises much earlier due to the more instant modulation of functional connectivity, rather than the slower processes of new protein synthesis in these brain regions.”

      (10) "However, it remains unclear whether memory retrieval might also precipitate a short-term amnesia effect for the fear memory, in addition to the long-term prevention orchestrated by memory consolidation."

      ***I found this sentence difficult to understand on my first pass through the paper. I think it is because of the phrasing of memory retrieval. That is, memory retrieval does NOT precipitate any type of short-term amnesia for the fear memory: it is the retrieval-extinction protocol that produces something like short-term amnesia. Perhaps this sentence should also be edited for clarity.

      We have changed “memory retrieval” to “retrieval-extinction” where applicable.

      I will also note that the usage of "short-term" at this point in the paper is quite confusing: Does the retrieval-extinction protocol produce a short-term amnesia effect, which would be evidenced by some recovery of responding to the CS when tested after a sufficiently long delay? I don't believe that this is the intended meaning of "short-term" as used throughout the majority of the paper, right?

      By “short-term”, we meant the lack of fear expression in the test phase (measured by skin conductance responses) shortly after the retrieval-extinction procedure (30 mins in studies 1 & 2 and 1 hour in study 3). It does not indicate that the effect is by itself “short-lived”.

      (11) "To fully comprehend the temporal dynamics of the memory retrieval effect..." ***What memory retrieval effect? This needs some elaboration.

      We’ve changed the phrase “memory retrieval effect” to “retrieval-extinction effect” to refer to the effect of retrieval-extinction on fear amnesia.

      (12) "We hypothesize that the labile state triggered by the memory retrieval may facilitate different memory update mechanisms following extinction training, and these mechanisms can be further disentangled through the lens of temporal dynamics and cue-specificities."

      ***What does this mean? The first part of the sentence is confusing around the usage of the term "facilitate"; and the second part of the sentence that references a "lens of temporal dynamics and cue-specificities" is mysterious. Indeed, as all rats received the same retrieval-extinction exposures in Study 2, it is not clear how or why any differences between the groups are attributed to "different memory update mechanisms following extinction".

      As the reviewer mentioned, if data were collected at only one time point, we could not determine whether different memory update mechanisms are involved. In study 2, however, the 3 groups differed only in the time at which the reinstatement test was conducted. Accordingly, our results showed that the fear amnesia effects for CS1 and CS2 cannot be simply explained by forgetting: different memory update mechanisms must be at work to explain the characteristics of the SCR related to both CS1 and CS2 at three different time scales (30 min, 6 h and 24 h). It was based on these results, together with the results from the TMS study (study 3), that we proposed the involvement of a short-term memory update mechanism in addition to the reconsolidation-related fear amnesia (which should become evident much later) induced by the retrieval-extinction protocol.

      (13) "In the first study, we aimed to test whether there is a short-term amnesia effect of fear memory retrieval following the fear retrieval-extinction paradigm."

      ***Again, the language is confusing. The phrase, "a short-term amnesia effect" implies that the amnesia itself is temporary; but I don't think that this implication is intended. The problem is specifically in the use of the phrase "a short-term amnesia effect of fear memory retrieval." To the extent that short-term amnesia is evident in the data, it is not due to retrieval per se but, rather, the retrieval-extinction protocol.

      We have changed the wording and replaced “memory retrieval” with “retrieval-extinction” where applicable.

      (14) The authors repeatedly describe the case where there was a 24-hour interval between extinction and testing as consistent with previous research on fear memory reconsolidation. Which research exactly? That is, in studies where a CS re-exposure was combined with a drug injection, responding to the CS was disrupted in a final test of retrieval from long-term memory which typically occurred 24 hours after the treatment. Is that what the authors are referring to as consistent? If so, which aspect of the results are consistent with those previous findings? Perhaps the authors mean to say that, in the case where there was a 24-hour interval between extinction and testing, the results obtained here are consistent with previous research that has used the retrieval-extinction protocol. This would clarify the intended meaning greatly.

      Our 24-hour test results after the retrieval-extinction protocol were consistent with both pharmacological and behavioral intervention studies of fear memory reconsolidation (Kindt and Soeter, 2018, Kindt et al., 2009, Liu et al., 2014, Luo et al., 2015, Monfils et al., 2009, Nader et al., 2000, Schiller et al., 2013, Schiller et al., 2010, Xue et al., 2012), since the final test phase in those studies typically occurred 24 hours after the treatment. At the 24-hour interval, the memory reconsolidation effect would become evident following either drug administration or behavioral intervention (extinction training).

      DATA

      (15) Points about data:

      15A. The eight participants who were discontinued after Day 1 in study 1 were all from the no-reminder group. Can the authors please comment on how participants were allocated to the two groups in this experiment so that the reader can better understand why the distribution of non-responders was non-random (as it appears to be)?

      15B. Similarly, in study 2, of the 37 participants that were discontinued after Day 2, 19 were from Group 30 min, and 5 were from Group 6 hours. Can the authors comment on how likely these numbers are to have been by chance alone? I presume that they reflect something about the way that participants were allocated to groups, but I could be wrong.

      We went back and checked our data. As we mentioned in the supplementary materials, we categorized subjects as non-responders if their SCR response to any CS was less than 0.02 on Day 1 (fear acquisition). Most of the discontinued participants (non-responders) in the no-reminder group (study 1) and the 30min & 24 h groups (study 2) were tested when the heating season had just ended or was yet to start, respectively. It has been documented that body thermal conditions are related to the quality of skin conductance response (SCR) measurements (Bauer et al., 2022, Vila, 2004). We suspect that the high number of non-responders might be related to body thermal conditions caused by the lack of central heating.

      15C. "Post hoc t-tests showed that fear memories were resilient after regular extinction training, as demonstrated by the significant difference between fear recovery indexes of the CS+ and CS- for the no-reminder group (t26 = 7.441, P < 0.001; Fig. 1e), while subjects in the reminder group showed no difference of fear recovery between CS+ and CS- (t29 = 0.797, P = 0.432, Fig. 1e)."

      ***Is the fear recovery index shown in Figure 1E based on the results of the first test trial only? How can there have been a "significant difference between fear recovery indexes of the CS+ and CS- for the no-reminder group" when the difference in responding to the CS+ and CS- is used to calculate the fear recovery index shown in 1E? What are the t-tests comparing exactly, and what correction is used to account for the fact that they are applied post-hoc?

      As we mentioned in the results section of the manuscript, the fear recovery index was defined as “the SCR difference between the first test trial and the last extinction trial of a specific CS”. We then calculated the “differential fear recovery index” (figure legend of Fig. 1e) between CS+ and CS- for both the reminder and no-reminder groups. The post-hoc t-tests were used to examine whether there was significant fear recovery (compared with 0) in the reminder (t<sub>29</sub> = 0.797, P = 0.432, Fig. 1e) and no-reminder (t<sub>26</sub> = 7.441, P < 0.001; Fig. 1e) groups. We realize that the Bonferroni correction was not described in the original manuscript and have added it to the revision where applicable.
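      The index definitions in this response can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the example SCR values are hypothetical, one SCR value per trial is assumed, and the Bonferroni step simply multiplies each p-value by the number of tests.

```python
from math import sqrt
from statistics import mean, stdev


def fear_recovery_index(last_extinction_scr, first_test_scr):
    """SCR on the first test trial minus SCR on the last extinction trial, for one CS."""
    return first_test_scr - last_extinction_scr


def differential_fear_recovery(cs_plus, cs_minus):
    """Recovery index for CS+ minus recovery index for CS-.

    Each argument is a (last_extinction_scr, first_test_scr) pair.
    """
    return fear_recovery_index(*cs_plus) - fear_recovery_index(*cs_minus)


def one_sample_t(values, mu=0.0):
    """Return (t, df) for a one-sample t-test of the mean against mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))
    return t, n - 1


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical subjects (not data from the study):
# ((CS+ last extinction, CS+ first test), (CS- last extinction, CS- first test))
subjects = [
    ((0.10, 0.50), (0.10, 0.20)),
    ((0.20, 0.45), (0.15, 0.20)),
    ((0.05, 0.40), (0.05, 0.10)),
]
indexes = [differential_fear_recovery(plus, minus) for plus, minus in subjects]
t_stat, df = one_sample_t(indexes)  # tests the group mean index against 0
```

      Testing each group's differential index against 0 asks whether fear recovered in that group; with two such tests, `bonferroni` would double each p-value before comparison with the significance threshold.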

      15D. "Finally, there is no statistical difference between the differential fear recovery indexes between CS+ in the reminder and no reminder groups (t55 = -2.022, P = 0.048; Fig. 1c, also see Supplemental Material for direct test for the test phase)."

      ***Is this statement correct - i.e., that there is no statistically significant difference in fear recovery to the CS+ in the reminder and no reminder groups? I'm sure that the authors would like to claim that there IS such a difference; but if such a difference is claimed, one would be concerned by the fact that it is coming through in an uncorrected t-test, which is the third one of its kind in this paragraph. What correction (for the Type 1 error rate) is used to account for the fact that the t-tests are applied post-hoc? And if no correction, why not?

      We are sorry about the typo. The reviewer was correct that we meant to claim here that “… there is a significant difference between the differential fear recovery indexes between CS+ in the reminder and no-reminder groups (t<sub>55</sub> = -2.022, P = 0.048; Fig. 1e)”. Note that the t-test performed here was a confirmatory test following our two-way ANOVA with main effects of group (reminder vs. no-reminder) and time (last extinction trial vs. first test trial) on the differential CS SCR response (CS+ minus CS-), in which we found a significant group x time interaction effect (F<sub>1,55</sub> = 4.087, P = 0.048, η<sup>2</sup> = 0.069). The significant difference between the differential fear recovery indexes simply re-expresses the interaction effect mentioned above, and therefore no correction for multiple comparisons is needed. We have reorganized the sequence of the sentences such that this t-test now directly follows the results of the ANOVA:
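      The equivalence invoked here can be checked from the reported statistics alone: in a 2 (group) x 2 (time) mixed design, the interaction F with one numerator degree of freedom equals the square of the independent-samples t computed on the difference scores. The sketch below assumes a pooled-variance t-test (so the error degrees of freedom are n1 + n2 - 2, with group sizes of 30 and 27 inferred from the reported t<sub>29</sub> and t<sub>26</sub>) and assumes that the reported η<sup>2</sup> is a partial eta squared; neither assumption is stated explicitly in the response.

```python
# Reported values from the response
t_reported = -2.022
f_reported = 4.087
n_reminder, n_no_reminder = 30, 27  # inferred from t(29) and t(26), an assumption

# Error degrees of freedom for a pooled-variance two-sample t-test
df_error = n_reminder + n_no_reminder - 2

# With one numerator df, the interaction F should equal t squared
f_from_t = t_reported ** 2

# Partial eta squared recovered from F and its degrees of freedom
partial_eta_sq = f_reported / (f_reported + df_error)

print(df_error, round(f_from_t, 3), round(partial_eta_sq, 3))
# prints: 55 4.088 0.069
```

      The recovered values agree with the reported F = 4.087 (to rounding of t) and η<sup>2</sup> = 0.069, which is consistent with the t-test and the interaction being one and the same test.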

      “The interaction effect was confirmed by the significant difference between the differential fear recovery indexes between CS1+ and CS2+ in the reminder and no-reminder groups (t<sub>55</sub> = -2.022, P = 0.048; Figure 1E, also see Supplemental Material for the direct test of the test phase).”
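
      The equivalence invoked above can be checked numerically. In a two-group, two-time-point mixed design, the group x time interaction F equals the square of the pooled two-sample t computed on the per-subject change scores (first test trial minus last extinction trial). The Python sketch below illustrates this with made-up differential SCR values; the function names and the data are hypothetical, and this is not the analysis code used for the manuscript.

```python
import math
from statistics import mean


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def interaction_f(groups):
    """Group x time interaction F for a mixed design with two time points.

    `groups` is a list of (pre, post) pairs of equal-length lists, one pair
    per group. The F has (k - 1, N - k) degrees of freedom for k groups.
    """
    all_pre = [x for pre, _ in groups for x in pre]
    all_post = [x for _, post in groups for x in post]
    time_means = (mean(all_pre), mean(all_post))
    grand = mean(all_pre + all_post)
    n_total = len(all_pre)

    ss_inter = 0.0
    ss_error = 0.0
    for pre, post in groups:
        cell = (mean(pre), mean(post))
        group_mean = mean(cell)
        # Interaction: cell mean minus both marginal effects.
        for t in range(2):
            ss_inter += len(pre) * (cell[t] - group_mean - time_means[t] + grand) ** 2
        # Error: subject x time residual within each group.
        for x_pre, x_post in zip(pre, post):
            subject_mean = (x_pre + x_post) / 2
            for t, y in enumerate((x_pre, x_post)):
                ss_error += (y - cell[t] - subject_mean + group_mean) ** 2

    df_inter = len(groups) - 1
    df_error = n_total - len(groups)
    return (ss_inter / df_inter) / (ss_error / df_error)


# Hypothetical differential SCRs (CS+ minus CS-), in microsiemens.
reminder_pre = [0.10, 0.20, 0.15, 0.05]    # last extinction trial
reminder_post = [0.12, 0.18, 0.20, 0.10]   # first test trial
control_pre = [0.10, 0.15, 0.20]
control_post = [0.40, 0.35, 0.50]

change_reminder = [b - a for a, b in zip(reminder_pre, reminder_post)]
change_control = [b - a for a, b in zip(control_pre, control_post)]

t = pooled_t(change_reminder, change_control)
f = interaction_f([(reminder_pre, reminder_post), (control_pre, control_post)])
print(f"t^2 = {t ** 2:.6f}, interaction F = {f:.6f}")
```

      The reported statistics are consistent with this identity: (-2.022)<sup>2</sup> ≈ 4.088, which agrees with the reported interaction F of 4.087 up to rounding of t.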

      15E. In study 2, why is responding to the CS- so high on the first test trial in Group 30 min? Is the change in responding to the CS- from the last extinction trial to the first test trial different across the three groups in this study? Inspection of the figure suggests that it is higher in Group 30 min relative to Groups 6 hours and 24 hours. If this is confirmed by the analysis, it has implications for the fear recovery index which is partly based on responses to the CS-. If not for differences in the CS- responses, Groups 30 minutes and 6 hours are otherwise identical.

      Following the reviewer’s comments, we went back and calculated the mean SCR difference of CS- between the first test trial and the last extinction trial for all three studies (see Author response image 1 below). In study 1, there was no difference in the mean CS- SCR (between the first test trial and last extinction trial) between the reminder and no-reminder groups (Kruskal-Wallis test, panel a), though both groups showed significant fear recovery even in the CS- condition (Wilcoxon signed rank test, reminder: P = 0.0043, no-reminder: P = 0.0037). Next, we examined the mean SCR for CS- for the 30min, 6h and 24h groups in study 2 and found that there was indeed a group difference (one-way ANOVA, F<sub>2,76</sub> = 5.3462, P = 0.0067, panel b), suggesting that the CS- related SCR was influenced by the test time (30min, 6h or 24h). We also tested the CS- related SCR for the 4 groups in study 3 (where the test was conducted 1 hour after the retrieval-extinction training) and found that, across TMS stimulation types (PFC vs. VER) and reminder types (reminder vs. no-reminder), the ANOVA yielded neither a main effect of TMS stimulation type (F<sub>1,71</sub> = 0.322, P = 0.572) nor a main effect of reminder type (F<sub>1,71</sub> = 0.0499, P = 0.824, panel c). We added the R-VER group results in study 3 (see panel c) to panel b, plotted the CS- SCR difference across 4 different test time points, and found that the CS- SCR decreased as the test-extinction delay increased (Jonckheere-Terpstra test, P = 0.00028). These results suggest a natural “forgetting” tendency for the CS- related SCR and highlight the importance of having the CS- as a control condition with which the CS+ related SCR was compared.

      Author response image 1.

      15F. Was the 6-hour group tested at a different time of day compared to the 30-minute and 24-hour groups; and could this have influenced the SCRs in this group?

      For the 30min and 24h groups, the test phase can be arranged in the morning, in the afternoon or at night. However, for the 6h group, the test phase was inevitably in the afternoon or at night since we wanted to exclude the potential influence of night sleep on the expression of fear memory (see Author response table 1 below). If we restricted the test time in the afternoon or at night for all three groups, then the timing of their extinction training was not matched.

      Author response table 1.

      Nevertheless, we also went back and examined the data for the subjects tested only in the afternoon or at night in the 30min and 24h groups to match the 6h group, where all the subjects were tested either in the afternoon or at night. According to Author response table 1 above, we have 17 subjects for the 30min group (9 + 8), 18 subjects for the 24h group (9 + 9) and 26 subjects for the 6h group (12 + 14). As Author response image 2 shows, the SCR patterns in the fear acquisition, extinction and test phases were similar to the results presented in the original figure.

      Author response image 2.

      15G. Why is the range of scores in "thought control ability" different in the 30-minute group compared to the 6-hour and 24-hour groups? I am not just asking about the scale on the x-axis: I am asking why the actual distribution of the scores in thought control ability is wider for the 30-minute group?

      We went back and tested whether the TCAQ score variance was the same across the three groups. We found that there was a significant difference in the variance of the TCAQ score distribution across the three groups (F<sub>2,155</sub> = 4.324, P = 0.015, Levene test). However, post-hoc analyses found that the variance of TCAQ was not significantly different between the 30min and 6h groups (F<sub>26,25</sub> = 0.4788, P = 0.0697), nor between the 30min and 24h groups (F<sub>26,25</sub> = 0.4692, P = 0.0625). To further validate our correlational results between the TCAQ score and the fear recovery index, we removed from the 30min group the TCAQ scores that were outside the TCAQ score range of the 6h & 24h groups (resulting in 4 “outlier” TCAQ scores in the 30min group, panel a in Author response image 3 below), and the Levene test confirmed that the variance of the TCAQ scores showed no difference across groups after removing the 4 “outlier” data points in the 30min group (F<sub>2,147</sub> = 0.74028, P = 0.4788). Even with the 4 “outliers” removed from the 30min group, the correlational analysis of the TCAQ scores and the fear recovery index still yielded a significant result in the 30min group (beta = -0.0148, t = -3.731, P = 0.0006, see panel b below), indicating that our results were not likely due to the inclusion of subjects with extreme TCAQ scores.

      Author response image 3.
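
      For readers unfamiliar with the variance comparison above, the Levene statistic can be sketched in a few lines of Python. This is a minimal illustration that assumes the mean-centred form of the test (the response does not state whether mean or median centring was used); the function name and the toy scores are hypothetical and are not the TCAQ data.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W using absolute deviations from each group's mean.

    W is referred to an F distribution with (k - 1, N - k) degrees of
    freedom, where k is the number of groups and N the total sample size.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - mean_i|: spread of each score around its group mean.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(y - centre) for y in g])
    z_means = [mean(zi) for zi in z]
    z_grand = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy scores: the second group is deliberately more spread out.
narrow = [1, 2, 3]
wide = [0, 5, 10]
print(levene_w(narrow, wide))
```

      A W near zero indicates similar spread across groups; larger values indicate that at least one group is more variable than the others.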

      (16) During testing in each experiment, how were the various stimuli presented? That is, was the presentation order for the CS+ and CS- pseudorandom according to some constraint, as it had been in extinction? This information should be added to the method section.

      We mentioned the order of the stimuli in the testing phase in the methods section: “… For studies 2 & 3, …a pseudo-random stimulus order was generated for fear acquisition and extinction phases of three groups with the rule that no same trial-type (CS1+, CS2+ and CS-) repeated more than twice. In the test phase, to exclude the possibility that the difference between CS1+ and CS2+ was simply caused by the presentation sequence of CS1+ and CS2+, half of the participants completed the test phase using a pseudo-random stimuli sequence and the identities of CS1+ and CS2+ reversed in the other half of the participants.”
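
      The ordering rule quoted above can be sketched as follows. This Python example is an illustration only: it assumes that “repeated more than twice” means no more than two consecutive trials of the same type, it uses simple rejection sampling, and the trial counts, function names and seed are hypothetical rather than taken from the manuscript.

```python
import random


def max_run_length(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def pseudo_random_order(counts, max_run=2, rng=None, max_tries=100_000):
    """Shuffle trials until no trial type occurs more than `max_run` times in a row."""
    rng = rng or random.Random()
    trials = [label for label, n in counts.items() for _ in range(n)]
    for _ in range(max_tries):
        rng.shuffle(trials)
        if max_run_length(trials) <= max_run:
            return list(trials)
    raise RuntimeError("no valid order found; relax the constraint or raise max_tries")


def swap_cs_identities(order):
    """Counterbalancing: exchange the CS1+ and CS2+ labels, leave CS- untouched."""
    swap = {"CS1+": "CS2+", "CS2+": "CS1+"}
    return [swap.get(trial, trial) for trial in order]


# Hypothetical trial counts; a fixed seed keeps the example reproducible.
order = pseudo_random_order({"CS1+": 6, "CS2+": 6, "CS-": 6}, rng=random.Random(0))
counterbalanced = swap_cs_identities(order)
print(order)
print(counterbalanced)
```

      Swapping the labels preserves the run-length constraint, so one generated sequence can serve both halves of the sample.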

      (17) "These results are consistent with previous research which suggested that people with better capability to resist intrusive thoughts also performed better in motivated dementia in both declarative and associative memories."

      ***Which parts of the present results are consistent with such prior results? It is not clear from the descriptions provided here why thought control ability should be related to the present findings or, indeed, past ones in other domains. This should be elaborated to make the connections clear.

      In the 30min group, we found that subjects’ TCAQ scores were negatively correlated with their fear recovery indices. That is, people with a better capacity to resist intrusive thoughts were also less likely to experience the return of fear memory, which is consistent with previous results. Together with our brain stimulation results, this indicates that the short-term amnesia is related to subjects’ cognitive control ability and intact dlPFC function. It is because of these similarities that we propose that the short-term amnesia might be related to the automatic memory suppression mechanism originating from declarative memory research. Since we have not provided all the evidence at this point of the results section, we only briefly listed the connections with previous declarative and associative memory research.

      Reviewer #2 (Public Review):

      The fear acquisition data is converted to a differential fear SCR and this is what is analysed (early vs late). However, the figure shows the raw SCR values for CS+ and CS- and therefore it is unclear whether the acquisition was successful (despite there being an "early" vs "late" effect - no descriptives are provided).

      As the reviewer mentioned, the fear acquisition data were converted to a differential fear SCR and we conducted a two-way mixed ANOVA with group (reminder vs. no-reminder) x time (early vs. late part of fear acquisition) on the differential SCRs. We found a significant main effect of time (early vs. late; F<sub>1,55</sub> = 6.545, P = 0.013, η<sup>2</sup> = 0.106), suggesting successful fear acquisition in both groups. Fig. 1c also showed the mean differential SCR for the latter half of the acquisition phase in both the reminder and no-reminder groups, and there was no significant difference in acquired SCRs between groups (early acquisition: t<sub>55</sub> = -0.063, P = 0.950; late acquisition: t<sub>55</sub> = -0.318, P = 0.751; Fig. 1c).

      In Experiment 1 (Test results) it is unclear whether the main conclusion stems from a comparison of the test data relative to the last extinction trial ("we defined the fear recovery index as the SCR difference between the first test trial and the last extinction trial for a specific CS") or the difference relative to the CS- ("differential fear recovery index between CS+ and CS-"). It would help the reader assess the data if Figure 1e presents all the indexes (both CS+ and CS-). In addition, there is one sentence that I could not understand "there is no statistical difference between the differential fear recovery indexes between CS+ in the reminder and no reminder groups (P=0.048)". The p-value suggests that there is a difference, yet it is not clear what is being compared here. Critically, any index taken as a difference relative to the CS- can indicate recovery of fear to the CS+ or absence of discrimination relative to the CS-, so ideally the authors would want to directly compare responses to the CS+ in the reminder and no-reminder groups. The latter issue is particularly relevant in Experiment 2, in which the CS- seems to vary between groups during the test and this can obscure the interpretation of the result.

      In all the experiments, the fear recovery index (FRI) was defined as the SCR difference between the first test trial and the last extinction trial for any CS. Subsequently, the differential FRI was defined as the difference between the FRI of a specific CS+ and the FRI of the CS-. The differential FRI would effectively remove the non-specific time-related effect (using the CS- FRI as the baseline). We have revised the text accordingly.
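
      The two definitions above amount to simple subtractions, sketched here in Python for a single participant. The function names and SCR values are hypothetical and serve only to make the arithmetic explicit.

```python
import math


def fear_recovery_index(last_extinction_scr, first_test_scr):
    """FRI: SCR on the first test trial minus SCR on the last extinction trial."""
    return first_test_scr - last_extinction_scr


def differential_fri(cs_plus, cs_minus):
    """Differential FRI: FRI of a CS+ minus FRI of the CS-.

    Each argument is a (last_extinction_scr, first_test_scr) pair.
    """
    return fear_recovery_index(*cs_plus) - fear_recovery_index(*cs_minus)


# Hypothetical SCRs (microsiemens) for one participant.
cs_plus = (0.10, 0.40)   # CS+ FRI  = 0.30
cs_minus = (0.05, 0.15)  # CS- FRI  = 0.10
result = differential_fri(cs_plus, cs_minus)
print(round(result, 3))
```

      Because the CS- FRI is subtracted, any recovery shared by both stimuli (for example, a general change in arousal between extinction and test) cancels out of the differential index.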

      As we responded to reviewer #1, the CS- fear recovery indices (FRI) for the reminder and no-reminder groups were not statistically different (Kruskal-Wallis test, panel a, Author response image 1), though both groups showed significant fear recovery even in the CS- condition (Wilcoxon signed rank test, reminder: P = 0.0043, no-reminder: P = 0.0037, panel a). Next, we examined the mean SCR for CS- for the 30min, 6h and 24h groups in study 2 and found that there was indeed a group difference (one-way ANOVA, F<sub>2,76</sub> = 5.3462, P = 0.0067, panel b), suggesting that the CS- SCR was influenced by the test time delay. We also tested the CS- SCR for the 4 groups in study 3 and found that, across TMS stimulation types (PFC vs. VER) and reminder types (reminder vs. no-reminder), the ANOVA yielded neither a main effect of TMS stimulation type (F<sub>1,71</sub> = 0.322, P = 0.572) nor a main effect of reminder type (F<sub>1,71</sub> = 0.0499, P = 0.824, panel c). We added the R-VER group results in study 3 (see panel c) to panel b, plotted the CS- SCR difference across 4 different test time points, and found that the CS- SCR decreased as the test-extinction delay increased (Jonckheere-Terpstra test, P = 0.00028). These results suggest a natural “forgetting” tendency for the CS- fear recovery index and highlight the importance of having the CS- as a control condition with which to compare the CS+ recovery index (resulting in the differential recovery index). Parametric and non-parametric analyses were adopted based on whether the data met the assumptions for the parametric analyses.

      In Experiment 1, the findings suggest that there is a benefit of retrieval followed by extinction in a short-term reinstatement test. In Experiment 2, the same effect is observed on a cue that did not undergo retrieval before extinction (CS2+), a result that is interpreted as resulting from cue-independence, rather than a failure to replicate in a within-subjects design the observations of Experiment 1 (between-subjects). Although retrieval-induced forgetting is cue-independent (the effect on items that are suppressed [Rp-] can be observed with an independent probe), it is not clear that the current findings are similar. Here, both cues have been extinguished and therefore been equally exposed during the critical stage.

      We appreciate the reviewer’s insight on this issue. Although in the discussion we raised the possibility of memory suppression to account for the short-term amnesia effect, we did not intend to compare our paradigm side-by-side with retrieval-induced forgetting. In our previous work (Wang et al., 2021), we reported that the effect of actively suppressing CS+ related fear memory during standard extinction training generalized to other CS+, yielding a cue-independent effect. In the current experiments, we did not implement active suppression; instead, we used the CS+ retrieval-extinction paradigm. It is thus possible that the CS+ retrieval cue may function to facilitate automatic suppression. Indeed, in the no-reminder group (standard extinction) of study 1, we did observe the return of fear expression, suggesting the critical role of the CS+ reminder before the extinction training. Based on the results mentioned above, we believe our short-term amnesia results were consistent with the hypothesis that the retrieval CS+ (reminder) might prompt subjects to adopt an automatic suppression mechanism in the following extinction training, yielding cue-independent amnesia effects.

      The findings in Experiment 2 suggest that the amnesia reported in Experiment 1 is transient, in that no effect is observed when the test is delayed by 6 hours. The phenomena whereby reactivated memories transition to extinguished memories as a function of the amount of exposure (or number of trials) is completely different from the phenomena observed here. In the former, the manipulation has to do with the number of trials (or the total amount of time) that the cues are exposed to. In the current study, the authors did not manipulate the number of trials but instead the retention interval between extinction and test. The finding reported here is closer to a "Kamin effect", that is the forgetting of learned information which is observed with intervals of intermediate length (Baum, 1968). Because the Kamin effect has been inferred to result from retrieval failure, it is unclear how this can be explained here. There needs to be much more clarity on the explanations to substantiate the conclusions.

      Indeed, in our studies, we did not manipulate the amount of exposure (or number of trials) but only the retention interval between extinction and test. Our results demonstrated that the retrieval-extinction protocol yielded a short-term amnesia for fear memory, qualitatively different from the reconsolidation-related amnesia proposed in the previous literature. After examining the temporal dynamics, cue-specificity and TCAQ association of the short-term amnesia, we speculated that the short-term effect might be related to an automatic suppression mechanism. Of course, further studies will be required to test such a hypothesis.

      Our results might not be easily compared with the “Kamin effect”, a term coined to describe the “retention of a partially learned avoidance response over varying time intervals” using a learning-re-learning paradigm (Baum, 1968, Kamin, 1957). However, the retrieval-extinction procedure used in our studies was different from the learning-re-learning paradigm in the original paper (Kamin, 1957) and the reversal-learning paradigm the reviewer mentioned (Baum, 1968).

      There are many results (Ryan et al., 2015) that challenge the framework that the authors base their predictions on (consolidation and reconsolidation theory), therefore these need to be acknowledged. Similarly, there are reports that failed to observe the retrieval-extinction phenomenon (Chalkia et al., 2020), and the work presented here is written as if the phenomenon under consideration is robust and replicable. This needs to be acknowledged.

      We thank the reviewer for pointing out the related literature and have added a separate paragraph about other results in the discussion (as well as citing relevant references in the introduction) to provide a full picture of the reconsolidation theory to the audience:

      “It should be noted that while our long-term amnesia results were consistent with the fear memory reconsolidation literatures, there were also studies that failed to observe fear prevention (Chalkia, Schroyens, et al., 2020; Chalkia, Van Oudenhove, et al., 2020; Schroyens et al., 2023). Although the memory reconsolidation framework provides a viable explanation for the long-term amnesia, more evidence is required to validate the presence of reconsolidation, especially at the neurobiological level (Elsey et al., 2018). While it is beyond the scope of the current study to discuss the discrepancies between these studies, one possibility to reconcile these results concerns the procedure for the retrieval-extinction training. It has been shown that the eligibility for old memory to be updated is contingent on whether the old memory and new observations can be inferred to have been generated by the same latent cause (Gershman et al., 2017; Gershman and Niv, 2012). For example, prevention of the return of fear memory can be achieved through gradual extinction paradigm, which is thought to reduce the size of prediction errors to inhibit the formation of new latent causes (Gershman, Jones, et al., 2013). Therefore, the effectiveness of the retrieval-extinction paradigm might depend on the reliability of such paradigm in inferring the same underlying latent cause. Furthermore, other studies highlighted the importance of memory storage per se and suggested that memory retention was encoded in the memory engram cell ensemble connectivity whereas the engram cell synaptic plasticity is crucial for memory retrieval (Ryan et al., 2015; Tonegawa, Liu, et al., 2015; Tonegawa, Pignatelli, et al., 2015). It remains to be tested how the cue-independent short-term and cue-dependent long-term amnesia effects we observed could correspond to the engram cell synaptic plasticity and functional connectivity among engram cell ensembles (Figure 6). 
This is particularly important, since the cue-independent characteristic of the short-term amnesia suggests that either different memory cues fail to evoke engram cell activities, or the retrieval-extinction training transiently inhibits connectivity among engram cell ensembles. Finally, SCR is only one aspect of the fear expression; how the retrieval-extinction paradigm might affect subjects’ other emotional (such as the startle response) and cognitive fear expressions, such as reported fear expectancy, needs to be tested in future studies since they do not always align with each other (Kindt et al., 2009; Sevenster et al., 2012, 2013).”

      The parallels between the current findings and the memory suppression literature are speculated in the general discussion, and there is the conclusion that "the retrieval-extinction procedure might facilitate a spontaneous memory suppression process". Because one of the basic tenets of the memory suppression literature is that it reflects an "active suppression" process, there is no reason to believe that in the current paradigm, the same phenomenon is in place, but instead, it is "automatic". In other words, the conclusions make strong parallels with the memory suppression (and cognitive control) literature, yet the phenomena that they observed are thought to be passive (or spontaneous/automatic).

      Ultimately, it is unclear why 10 mins between the reminder and extinction learning will "automatically" suppress fear memories. Further down in the discussion, it is argued that "For example, in the well-known retrieval-induced forgetting (RIF) phenomenon, the recall of a stored memory can impair the retention of related long-term memory and this forgetting effect emerges as early as 20 minutes after the retrieval procedure, suggesting memory suppression or inhibition can occur in a more spontaneous and automatic manner". I did not follow with the time delay between manipulation and test (20 mins) would speak about whether the process is controlled or automatic.

      In our previous research, we showed that the memory suppression instruction together with the extinction procedure successfully prevented the return of fear expression in the reinstatement test trials 30 minutes after the extinction training (Wang et al., 2021). In the current experiments, we replaced the suppression instruction with the retrieval cue before the extinction training (retrieval-extinction protocol) and observed similar short-term amnesia effects. These results prompted us to hypothesize in the discussion that the retrieval cue might facilitate an automatic suppression process. We made the analogy to the RIF phenomenon in the discussion to suggest that the suppression of (competing) memories could be unintentional and fast (20 minutes), both of which were consistent with our results. We agree with the reviewer that this hypothesis is more of a speculation (hence its placement in the discussion), and more studies are required to further test it. However, what we want to emphasize in this paper is the report of the short-term amnesia effects, which clearly differed from the memory reconsolidation effect in a variety of aspects.

      Among the many conclusions, one is that the current study uncovers the "mechanism" underlying the short-term effects of retrieval extinction. There is little in the current report that uncovers the mechanism, even in the most psychological sense of the mechanism, so this needs to be clarified. The same applies to the use of "adaptive".

      Whilst I could access the data on the OSF site, I could not make sense of the Matlab files as there is no signposting indicating what data is being shown in the files. Thus, as it stands, there is no way of independently replicating the analyses reported.

      We have re-organized the data on the OSF site, and they should be accessible now.

      The supplemental material shows figures with all participants, but only some statistical analyses are provided, and sometimes these are different from those reported in the main manuscript. For example, the test data in Experiment 1 is analysed with a two-way ANOVA with the main effects of group (reminder vs no-reminder) and time (last trial of extinction vs first trial of the test) in the main report. The analyses with all participants in the sup mat used a mixed two-way ANOVA with a group (reminder vs no reminder) and CS (CS+ vs CS-). This makes it difficult to assess the robustness of the results when including all participants. In addition, in the supplementary materials, there are no figures and analyses for Experiment 3.

      We are sorry for the lack of clarity in the supplementary materials. Supplementary Figs. S1 & S2 show the data re-analysis with all the responders (learners + non-learners). The statistical analyses performed on the responders in both figures yielded results similar to those in the main text. For other analyses reported in the supplementary materials, we deliberately provided different analyses to demonstrate the robustness of our results. For example, to rule out the possibility that the effects we observed in the two-way ANOVA in the main text were driven by different SCR responses on the last extinction trial, we ran the two-way ANOVA on the first-trial SCR of the test phase only, and these analyses provided similar results. Please note that we did not include non-learners in these analyses (the text of the supplementary materials).

      Since we did not exclude any non-learners in study 3, all the results were already reported in the main text.

      One of the overarching conclusions is that the "mechanisms" underlying reconsolidation (long term) and memory suppression (short term) phenomena are distinct, but memory suppression phenomena can also be observed after a 7-day retention interval (Storm et al., 2012), which then questions the conclusions achieved by the current study.

      As we stated before, the focus of the manuscript was to demonstrate a novel short-term fear amnesia effect following the retrieval-extinction procedure. We discussed memory suppression as one of the potential mechanisms for such a short-term effect. In fact, the durability of the memory suppression effect is still under debate. Although Storm et al. (2012) suggested that retrieval-induced forgetting can persist for as long as a week, other studies failed to observe long-term forgetting (after 24 hrs; Carroll et al., 2007; Chan, 2009). It is also worth noting that Storm et al. (2012) tested RIF one week later using half of the items, the other half of which had been tested 5 minutes after the retrieval practice. Therefore, it can be argued that the long-term RIF effect may be contaminated by the test/re-test process on the same set of (albeit different) items at different time points (5 minutes & 1 week).

      Reviewer #3 (Public Review):

      (1) The entire study hinges on the idea that there is memory 'suppression' if (1) the CS+ was reminded before extinction and (2) the reinstatement and memory test takes place 30 minutes later (in Studies 1 & 2). However, the evidence supporting this suppression idea is not very strong. In brief, in Study 1, the effect seems to only just reach significance, with a medium effect size at best, and, moreover, it is unclear if this is the correct analysis (which is a bit doubtful, when looking at Figure 1D and E). In Study 2, there was no optimal control condition without reminder and with the same 30-min interval (which is problematic, because we can assume generalization between CS1+ and CS2+, as pointed out by the authors, and because generalization effects are known to be time-dependent). Study 3 is more convincing, but entails additional changes in comparison with Studies 1 and 2, i.e., applications of cTBS and an interval of 1 hour instead of 30 minutes (the reason for this change was not explained). So, although the findings of the 3 studies do not contradict each other and are coherent, they do not all provide strong evidence for the effect of interest on their own.

      Related to the comment above, I encourage the authors to double-check if this statement is correct: "Also, our results remain robust even with the "non-learners" included in the analysis (Fig. S1 in the Supplemental Material)". The critical analysis for Study 1 is a between-group comparison of the CS+ and CS- during the last extinction trial versus the first test trial. This result only just reached significance with the selected sample (p = .048), and Figures 1D and E even seem to suggest otherwise. I doubt that the analysis would reach significance when including the "non-learners" - assuming that this is what is shown in Supplemental Figure 1 (which shows the data from "all responded participants").

      Our subjects were categorized based on the criteria specified in supplementary table S1. More specifically, we excluded the non-responders (mean CS SCR < 0.02 µS in the fear acquisition phase) and the non-learners, and focused our analyses on the learners. Non-responders were dismissed after day 1 (the day of fear acquisition), but both learners and non-learners finished the experiments. This fact gave us the opportunity to examine data for both the learners and the responders (learners + non-learners). What we showed in Fig. 1D and E were the differential SCRs (CS+ minus CS-) of the last extinction trials and the differential fear recovery indices (CS+ minus CS-), respectively. We have double-checked the figures, and both the learners (Fig. 1) and the responders (i.e., learners and non-learners, supplementary Fig. 1) showed significant differences between the reminder and no-reminder groups on the differential fear recovery index.

      Also related to the comment above, I think that the statement "suggesting a cue-independent short-term amnesia effect" in Study 2 is not correct and should read: "suggesting extinction of fear to the CS1+ and CS2+", given that the response to the CS+'s is similar to the response to the CS-, as was the case at the end of extinction. Also the next statement "This result indicates that the short-term amnesia effect observed in Study 2 is not reminder-cue specific and can generalize to the non-reminded cues" is not fully supported by the data, given the lack of an appropriate control group in this study (a group without reinstatement). The comparison with the effect found in Study 1 is difficult because the effect found there was relatively small (and may have to be double-checked, see remarks above), and it was obtained with a different procedure using a single CS+. The comparison with the 6-h and 24-h groups of Study 2 is not helpful as a control condition for this specific question (i.e., is there reinstatement of fear for any of the CS+'s) because of the large procedural difference with regard to the intervals between extinction and reinstatement (test).

      In Fig. 2e, we showed the differential fear recovery indices (FRI) for the CS+ in all three groups. Since the fear recovery index (FRI) was calculated as the SCR difference between the first test trial and the last extinction trial for any CS, a differential fear recovery index (difference between CS+ FRI and CS- FRI) not significantly different from 0 should be interpreted as a lack of fear expression in the test phase. Since spontaneous recovery, reinstatement and renewal are considered canonical phenomena demonstrating that extinction training does not really “erase” the conditioned fear response, adding a no-reinstatement group as a control condition would effectively work as a spontaneous recovery group, and the comparison between the reinstatement and no-reinstatement groups would turn into testing the difference in fear recovery using different methods (reinstatement vs. spontaneous recovery).

      (2) It is unclear which analysis is presented in Figure 3. According to the main text, it either shows the "differential fear recovery index between CS+ and CS-" or "the fear recovery index of both CS1+ and CS2+". The authors should clarify what they are analyzing and showing, and clarify to which analyses the ** and NS refer in the graphs. I would also prefer the X-axes and particularly the Y-axes of Fig. 3a-b-c to be the same. The image is a bit misleading now. The same remarks apply to Figure 5.

      We are sorry about the lack of clarity here. Figures 3 & 5 showed the correlational analyses between TCAQ and the differential fear recovery index (FRI) between CS+ and CS-. That is, the differential FRI of CS1+ (CS1+ FRI minus CS- FRI) and the differential FRI of CS2+ (CS2+ FRI minus CS- FRI).

      We have rescaled both X and Y axes for figures 3 & 5 (please see the revised figures). 

      (3) In general, I think the paper would benefit from being more careful and nuanced in how the literature and findings are represented. First of all, the authors may be more careful when using the term 'reconsolidation'. In the current version, it is put forward as an established and clearly delineated concept, but that is not the case. It would be useful if the authors could change the text in order to make it clear that the reconsolidation framework is a theory, rather than something that is set in stone (see e.g., Elsey et al., 2018 (https://doi.org/10.1037/bul0000152), Schroyens et al., 2022 (https://doi.org/10.3758/s13423-022-02173-2)).

      In addition, the authors may want to reconsider if they want to cite Schiller et al., 2010 (https://doi.org/10.1038/nature08637), given that neither the main findings of this paper nor the analyses could be replicated (see Chalkia et al., 2020 (https://doi.org/10.1016/j.cortex.2020.04.017; https://doi.org/10.1016/j.cortex.2020.03.031)).

      We thank the reviewer for these comments and have incorporated the mentioned papers into our revised manuscript by pointing out the ongoing debate surrounding the reconsolidation theory in the introduction:

      “Pharmacological blockade of protein synthesis and behavioral interventions can both eliminate the original fear memory expression in the long-term (24 hours later) memory test (Lee, 2008; Lee et al., 2017; Schiller et al., 2013; Schiller et al., 2010), resulting in the cue-specific fear memory deficit (Debiec et al., 2002; Lee, 2008; Nader, Schafe, & LeDoux, 2000). For example, during the reconsolidation window, retrieving a fear memory allows it to be updated through extinction training (i.e., the retrieval-extinction paradigm; Lee, 2008; Lee et al., 2017; Schiller et al., 2013; Schiller et al., 2010; but also see Chalkia, Schroyens, et al., 2020; Chalkia, Van Oudenhove, et al., 2020; D. Schiller, LeDoux, & Phelps, 2020).”

      As well as in the discussion:

      “It should be noted that while our long-term amnesia results were consistent with the fear memory reconsolidation literature, there were also studies that failed to observe fear prevention (Chalkia, Schroyens, et al., 2020; Chalkia, Van Oudenhove, et al., 2020; Schroyens et al., 2023). Although the memory reconsolidation framework provides a viable explanation for the long-term amnesia, more evidence is required to validate the presence of reconsolidation, especially at the neurobiological level (Elsey et al., 2018). While it is beyond the scope of the current study to discuss the discrepancies between these studies, one possibility to reconcile these results concerns the procedure for the retrieval-extinction training. It has been shown that the eligibility for old memory to be updated is contingent on whether the old memory and new observations can be inferred to have been generated by the same latent cause (Gershman et al., 2017; Gershman and Niv, 2012). For example, prevention of the return of fear memory can be achieved through a gradual extinction paradigm, which is thought to reduce the size of prediction errors to inhibit the formation of new latent causes (Gershman, Jones, et al., 2013). Therefore, the effectiveness of the retrieval-extinction paradigm might depend on the reliability of such a paradigm in inferring the same underlying latent cause. Furthermore, other studies highlighted the importance of memory storage per se and suggested that memory retention was encoded in the memory engram cell ensemble connectivity whereas the engram cell synaptic plasticity is crucial for memory retrieval (Ryan et al., 2015; Tonegawa, Liu, et al., 2015; Tonegawa, Pignatelli, et al., 2015). It remains to be tested how the cue-independent short-term and cue-dependent long-term amnesia effects we observed could correspond to the engram cell synaptic plasticity and functional connectivity among engram cell ensembles (Figure 6). This is particularly important, since the cue-independent characteristic of the short-term amnesia suggests that either different memory cues fail to evoke engram cell activities, or the retrieval-extinction training transiently inhibits connectivity among engram cell ensembles. Finally, SCR is only one aspect of fear expression; how the retrieval-extinction paradigm might affect subjects’ other emotional (such as the startle response) and cognitive (such as reported fear expectancy) fear expressions needs to be tested in future studies, since they do not always align with each other (Kindt et al., 2009; Sevenster et al., 2012, 2013).”

      Relatedly, it should be clarified that Figure 6 is largely speculative, rather than a proven model as it is currently presented. This is true for all panels, but particularly for panel c, given that the current study does not provide any evidence regarding the proposed reconsolidation mechanism.

      We agree with the reviewer that Figure 6 is largely speculative. We realize that there are still debates regarding the retrieval-extinction procedure and the fear reconsolidation hypothesis. We have provided a more elaborate discussion and pointed out that Figure 6 is only a working hypothesis that requires further testing:

      “Although mixed results have been reported regarding the durability of suppression effects in the declarative memory studies (Meier et al., 2011; Storm et al., 2012), future research will be needed to investigate whether the short-term effect we observed is specifically related to associative memory or the spontaneous nature of suppression (Figure 6C).”

      Lastly, throughout the paper, the authors equate skin conductance responses (SCR) with fear memory. It should at least be acknowledged that SCR is just one aspect of a fear response, and that it is unclear whether any of this would translate to verbal or behavioral effects. Such effects would be particularly important for any clinical application, which the authors put forward as the ultimate goal of the research.

      Again, we agree with the reviewer on this issue, and we have acknowledged that SCR is only one aspect of the fear response and that caution should be exercised in clinical application:

      “Finally, SCR is only one aspect of fear expression; how the retrieval-extinction paradigm might affect subjects’ other emotional (such as the startle response) and cognitive (such as reported fear expectancy) fear expressions needs to be tested in future studies, since they do not always align with each other (Kindt et al., 2009; Sevenster et al., 2012, 2013).”

      (4) The Discussion quite narrowly focuses on a specific 'mechanism' that the authors have in mind. Although it is good that the Discussion is to the point, it may be worthwhile to entertain other options or (partial) explanations for the findings. For example, have the authors considered that there may be an important role for attention? When testing very soon after the extinction procedure (and thus after the reminder), attentional processes may play an important role (more so than with longer intervals). The retrieval procedure could perhaps induce heightened attention to the reminded CS+ (which could be further enhanced by dlPFC stimulation)?

      We thank the reviewer for this suggestion and have added more discussion of the potential mechanisms involved. Unfortunately, the literature on attention and fear recovery is rather scarce, and any account involving attention would be even more speculative given that our study design and results mainly concern subjects’ skin conductance responses (SCR).

      (5) There is room for improvement in terms of language, clarity of the writing, and (presentation of the) statistical analyses, for all of which I have provided detailed feedback in the 'Recommendations for the authors' section. Idem for the data availability; they are currently not publicly available, in contrast with what is stated in the paper. In addition, it would be helpful if the authors would provide additional explanation or justification for some of the methodological choices (e.g., the 18-s interval and why stimulate 8 minutes after the reminder cue, the choice of stimulation parameters), and comment on reasons for (and implications of) the large amount of excluded participants (>25%).

      We have addressed the data accessibility issue and added justifications for the methodological choices as well as for the excluded participants. As we mentioned in the manuscript and the supplementary materials, including the non-learners in the data analysis did not change the results. Since the non-responders discontinued after Day 1 because they showed no measurable SCR signals towards the different CSs, it is hard to speculate whether or how the results might have changed. However, participant exclusion rates in SCR studies are typically relatively high (Hu et al., 2018, Liu et al., 2014, Raio et al., 2017, Schiller et al., 2010, Schiller et al., 2012, Wang et al., 2021). In our tasks, the non-responders were mostly participants tested in the winter; cold weather and dry skin in the winter are likely to have made the SCR hard to measure (Bauer et al., 2022, Vila, 2004). Different intervals between the reinstating US (electric shock) and the test trials have been used in the previous literature, such as 10 min (Schiller et al., 2010, Schiller et al., 2013) and 18 or 19 s (Kindt and Soeter, 2018, Kindt et al., 2009, Wang et al., 2021). We used the 18 s reinstatement interval in the current experiment. For the cTBS stimulation, since the stimulation itself lasted less than 2 min, we started the cTBS 8 min after the onset of the reminder cue to ensure that any effect caused by the cTBS occurred during the hypothesized time window in which the old fear memory becomes labile after memory retrieval. All stimulation parameters were determined based on previous literature, which showed that transcranial magnetic stimulation (TMS) over the human dorsolateral prefrontal cortex could disrupt fear memory reconsolidation (Borgomaneri et al., 2020, Su et al., 2022).

      Finally, I think several statements made in the paper are overly strong in light of the existing literature (or the evidence obtained here) or imply causal relationships that were not directly tested.

      We have revised the texts accordingly.

      Reviewer #2 (Recommendations For The Authors):

      On numerous occasions there are typos and the autocorrect has changed "amnesia" for "dementia".

      We are sorry about this mistake and have revised the text accordingly.

      Reviewer #3 (Recommendations For The Authors):

      *"Neither of the studies reported in this article was preregistered. The data for both studies are publicly accessible at https://osf.io/9agvk". This excerpt from the text suggests that there are 2 studies, but there are 3 in the paper. Also, the data are only accessible upon request, not publicly available. I haven't requested them, as this could de-anonymize me as a reviewer.

      We apologize that the link was not accessible. The data should now be publicly available.

      *Please refrain from causal interpretations when they are not supported by the data:

      - Figure 3 "thought-control ability only affected fear recovery"; a correlation does not provide causal evidence.

      - "establishing a causal link between the dlPFC activity and short-term fear amnesia." I feel this statement is too strong; to what extent do we know for sure what the applied stimulation of (or more correct: near) the dlPFC does exactly?

      We thank the reviewer for the suggestion and have changed the wording related to Figure 3. On the other hand, we would argue that the causal relationship between dlPFC activity and short-term fear amnesia is supported by the results of study 3. Although the exact functional effect of TMS on the dlPFC can be debated, the fact that TMS over the dlPFC (compared to the vertex group) brought back the otherwise diminished fear memory expression can be viewed as causal evidence linking dlPFC activity to short-term fear amnesia.

      *The text would benefit from language editing, as it contains spelling and grammar mistakes, as well as wording that is vague or inappropriate. I suggest the authors check the whole text, but below are already some excerpts that caught my eye:

      "preludes memory reconsolidation"; "old fear memory can be updated"; "would cause short-term memory deficit"; "the its functional coupling"; "Subjects (...) yielded more severe amnesia in the memory suppression tasks"; "memory retrieval might also precipitate a short-term amnesia effect"; "more SEVERE amnesia in the memory suppression tasks"; "the effect size of reinstatement effect"; "the previous literatures"; "towards different CS"; "failed to show SCR response to the any stimuli"; "significant effect of age of TMS"; "each subject' left hand"; "latter half trials"; "Differntial fear recovery"; "fear dementia"; "the fear reinstatement effects at different time scale is related to"; "fear reocery index"; "thought-control abiliites"; "performed better in motivated dementia"; "we tested that in addition to the memory retrieval cue (reminder), whether the"; "during reconsolidation window"; "consisitent with the short-term dementia"; "low level of shock (5v)"

      We thank the reviewer for the thorough reading and apologize for the typos in the manuscript. We have corrected all the typos and grammatical mistakes we could find.

      *In line with the remark above, there are several places where the text could still be improved.

      - The last sentence of the Abstract is rather vague and doesn't really add anything.

      - Please reword or clarify: "the exact functional role played by the memory retrieval remains unclear".

      - Please reword or clarify: "the unbinding of the old memory trace".

      - "suggesting that the fear memory might be amenable to a more immediate effect, in addition to what the memory reconsolidation theory prescribes" shouldn't this rather read "in contrast with"?

      We have modified the manuscript.

      - In the Introduction, the authors state: "Specifically, memory reconsolidation effect will only be evident in the long-term (24h) memory test due to its requirement of new protein synthesis and is cue-dependent". They then continue about the more immediate memory update mechanisms that they want to study, but it is unclear from how the rationale is presented whether (and why (not)) they also expect this mechanism to be cue-dependent.

      Most of the previous studies on the fear memory reconsolidation using CS as the memory retrieval cues have demonstrated that the reconsolidation effect is cue-dependent (Kindt and Soeter, 2018, Kindt et al., 2009, Monfils et al., 2009, Nader et al., 2000, Schiller et al., 2013, Schiller et al., 2010, Xue et al., 2012). However, other studies using unconditioned stimulus retrieval-extinction paradigm showed that such protocol was able to prevent the return of fear memory expression associated with different CSs (Liu et al., 2014, Luo et al., 2015). In our task, we used CS+ as the memory retrieval cues and our results were consistent with results from previous studies using similar paradigms.

      - "The effects of cTBS over the right dlPFC after the memory reactivation were assessed using the similar mixed-effect four-way ANOVA". Please clarify what was analyzed here.

      - "designing novel treatment of psychiatric disorders". Please make this more concrete or remove the statement.

      This sentence came right after a similar analysis reported in the previous paragraph. While the previous paragraph focused on how the SCRs in the acquisition phase were modulated by factors such as CS+ (CS1+ and CS2+), reminder (reminder vs. no-reminder), cTBS site (right dlPFC vs. vertex) and trial number, this analysis focused instead on the SCRs in the extinction training phase. We have made the modifications as the reviewer suggested.

      *I have several concerns related to the (presentation) of the statistical analyses/results:

      - Some statistical analyses, as well as calculation of certain arbitrary indices (e.g., differential fear recovery index) are not mentioned nor explained in the Methods section, but only mentioned in the Results section.

      We have added the explanation of the differential fear recovery index into the methods section:

      “To measure the extent to which fear returns after the presentation of unconditioned stimuli (US, electric shock) in the test phase, we defined the fear recovery index as the SCR difference between the first test trial and the last extinction trial for a specific CS for each subject. Similarly, in studies 2 and 3, differential fear recovery index was defined as the difference between fear recovery indices of CS+ and CS- for both CS1+ and CS2+.”
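
      As a minimal sketch of the definition quoted above (not the authors' analysis code), the fear recovery index and its differential form could be computed as follows. The function names and the SCR values are hypothetical placeholders chosen only for illustration.

```python
def fear_recovery_index(first_test_scr, last_extinction_scr):
    """FRI for one CS: SCR on the first test trial minus SCR on the last extinction trial."""
    return first_test_scr - last_extinction_scr


def differential_fri(cs_plus, cs_minus):
    """Differential FRI: FRI of a CS+ minus FRI of the CS-.

    Each argument is a (first_test_scr, last_extinction_scr) pair for one subject.
    """
    return fear_recovery_index(*cs_plus) - fear_recovery_index(*cs_minus)


# Hypothetical SCR values for one subject (arbitrary units, illustration only).
cs1_plus = (0.75, 0.25)  # (first test trial, last extinction trial)
cs_minus = (0.5, 0.25)

print(differential_fri(cs1_plus, cs_minus))
```

      Under this definition, a differential FRI near 0 means that the CS+ recovered no more than the CS- did between the end of extinction and the first test trial.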

      - Figure 1C-E: It is unclear what the triple *** mean. Do they have the same meaning in Figure 1C and Figure 1E? I am not sure that that makes sense. The meaning is not explained in the figure caption (I think it is different from the single asterisk*) and is not crystal clear from the main text either.

      We explained the triple *** in the figure legend (Fig. 1): ***P < 0.001. The asterisk placed within each bar in Figure 1C-E indicates the statistical results of the post-hoc test of whether each bar was significant. For example, the *** placed inside bars in Figure 1E indicates that the differential fear recovery index is statistically significant in the no-reminder group (P < 0.001).

      - Supplemental Figure 1: "with all responded participants" Please clarify how you define 'responded participants' and include the n's.

      We presented the criteria for both the responder/non-responder and the learner/non-learner in the table of the supplementary materials and reported the number of subjects in each category (please see supplement Table 1).

      - "the differential SCRs (difference between CS+ and CS-) for the CS+". Please clarify what this means and/or how it is calculated exactly.

      We apologize for the confusion. It means the difference between the SCRs evoked by CS+ and CS- for both CS1+ (CS1+ minus CS-) and CS2+ (CS2+ minus CS-).

      *I suggest that the authors provide a bit more explanation about the thought-control ability questionnaire. For example, the type of items, etc, as this is not a very commonly used questionnaire in the fear conditioning field.

      We provided a brief introduction to the thought-control ability questionnaire in the methods section:

      “The control ability over intrusive thoughts was measured by the 25-item Thought-Control Ability Questionnaire (TCAQ) scale (30). Participants were asked to rate on a five-point Likert-type scale the extent to which they agreed with each statement, from 1 (completely disagree) to 5 (completely agree). At the end of the experiments, all participants completed the TCAQ scale to assess their perceived control abilities over intrusive thoughts in daily life (17).”

      We have added further description of the item types to the TCAQ scale.

      *The authors excluded more than 25% of the participants. It would be interesting to hear reasons for this relatively large number and some reflection on whether they think this selection affects their results (e.g., could being a (non)responder in skin conductance influence the susceptibility to reactivation-extinction in some way?).

      Participant exclusion rates in SCR studies are typically relatively high (Hu et al., 2018, Liu et al., 2014, Raio et al., 2017, Schiller et al., 2010, Schiller et al., 2012, Wang et al., 2021). In our tasks, the non-responders were mostly participants tested in the winter; cold weather and dry skin in the winter are likely to have made the SCR hard to measure (Bauer et al., 2022, Vila, 2004).

      *Minor comments that the authors may want to consider:

      - Please explain abbreviations upon first use, e.g., TMS.

      - In Figure 6, it is a bit counterintuitive that the right Y-axis goes from high to low.

      We added the explanation of TMS:

      “Continuous theta burst stimulation (cTBS), a specific form of repetitive transcranial magnetic stimulation (rTMS)…”

      We agree that the right Y-axis is rather counterintuitive. However, since the fear recovery index (which is what we measured in the experiment) and the short/long-term amnesia effect run in opposite directions, plotting one index from low to high would inevitably cause the other to go from high to low.

      References:

      Anderson, M. C. and Floresco, S. B. 2022. Prefrontal-hippocampal interactions supporting the extinction of emotional memories: The retrieval stopping model. Neuropsychopharmacology, 47, 180-195.

      Anderson, M. C. and Green, C. 2001. Suppressing unwanted memories by executive control. Nature, 410, 366-9.

      Bauer, E. A., Wilson, K. A. and Macnamara, A. 2022. 3.03 - cognitive and affective psychophysiology. In: ASMUNDSON, G. J. G. (ed.) Comprehensive clinical psychology (second edition). Oxford: Elsevier.

      Baum, M. 1968. Reversal learning of an avoidance response and the kamin effect. J Comp Physiol Psychol, 66, 495-7.

      Borgomaneri, S., Battaglia, S., Garofalo, S., Tortora, F., Avenanti, A. and Di Pellegrino, G. 2020. State-dependent tms over prefrontal cortex disrupts fear-memory reconsolidation and prevents the return of fear. Curr Biol, 30, 3672-3679.e4.

      Cain, C. K., Blouin, A. M. and Barad, M. 2003. Temporally massed cs presentations generate more fear extinction than spaced presentations. J Exp Psychol Anim Behav Process, 29, 323-33.

      Carroll, M., Campbell-Ratcliffe, J., Murnane, H. and Perfect, T. 2007. Retrieval-induced forgetting in educational contexts: Monitoring, expertise, text integration, and test format. European Journal of Cognitive Psychology, 19, 580-606.

      Chan, J. C. K. 2009. When does retrieval induce forgetting and when does it induce facilitation? Implications for retrieval inhibition, testing effect, and text processing. Journal of Memory and Language, 61, 153-170.

      Gagnepain, P., Henson, R. N. and Anderson, M. C. 2014. Suppressing unwanted memories reduces their unconscious influence via targeted cortical inhibition. Proc Natl Acad Sci U S A, 111, E1310-9.

      Gershman, S. J., Jones, C. E., Norman, K. A., Monfils, M. H. and Niv, Y. 2013. Gradual extinction prevents the return of fear: Implications for the discovery of state. Front Behav Neurosci, 7, 164.

      Gershman, S. J., Monfils, M. H., Norman, K. A. and Niv, Y. 2017. The computational nature of memory modification. eLife, 6.

      Hu, J., Wang, W., Homan, P., Wang, P., Zheng, X. and Schiller, D. 2018. Reminder duration determines threat memory modification in humans. Sci Rep, 8, 8848.

      Kamin, L. J. 1957. The retention of an incompletely learned avoidance response. J Comp Physiol Psychol, 50, 457-60.

      Kindt, M. and Soeter, M. 2018. Pharmacologically induced amnesia for learned fear is time and sleep dependent. Nat Commun, 9, 1316.

      Kindt, M., Soeter, M. and Vervliet, B. 2009. Beyond extinction: Erasing human fear responses and preventing the return of fear. Nat Neurosci, 12, 256-8.

      Liu, J., Zhao, L., Xue, Y., Shi, J., Suo, L., Luo, Y., Chai, B., Yang, C., Fang, Q., Zhang, Y., Bao, Y., Pickens, C. L. and Lu, L. 2014. An unconditioned stimulus retrieval extinction procedure to prevent the return of fear memory. Biol Psychiatry, 76, 895-901.

      Luo, Y.-X., Xue, Y.-X., Liu, J.-F., Shi, H.-S., Jian, M., Han, Y., Zhu, W.-L., Bao, Y.-P., Wu, P., Ding, Z.-B., Shen, H.-W., Shi, J., Shaham, Y. and Lu, L. 2015. A novel ucs memory retrieval-extinction procedure to inhibit relapse to drug seeking. Nature Communications, 6, 7675.

      Monfils, M. H., Cowansage, K. K., Klann, E. and Ledoux, J. E. 2009. Extinction-reconsolidation boundaries: Key to persistent attenuation of fear memories. Science, 324, 951-5.

      Nader, K., Schafe, G. E. and Le Doux, J. E. 2000. Fear memories require protein synthesis in the amygdala for reconsolidation after retrieval. Nature, 406, 722-6.

      Raio, C. M., Hartley, C. A., Orederu, T. A., Li, J. and Phelps, E. A. 2017. Stress attenuates the flexible updating of aversive value. Proc Natl Acad Sci U S A, 114, 11241-11246.

      Schiller, D., Kanen, J. W., Ledoux, J. E., Monfils, M. H. and Phelps, E. A. 2013. Extinction during reconsolidation of threat memory diminishes prefrontal cortex involvement. Proc Natl Acad Sci U S A, 110, 20040-5.

      Schiller, D., Monfils, M. H., Raio, C. M., Johnson, D. C., Ledoux, J. E. and Phelps, E. A. 2010. Preventing the return of fear in humans using reconsolidation update mechanisms. Nature, 463, 49-53.

      Schiller, D., Raio, C. M. and Phelps, E. A. 2012. Extinction training during the reconsolidation window prevents recovery of fear. J Vis Exp, e3893.

      Su, S., Deng, J., Yuan, K., Gong, Y., Zhang, Y., Li, H., Cao, K., Huang, X., Lin, X., Wu, P., Xue, Y., Bao, Y., Shi, J., Shi, L. and Lu, L. 2022. Continuous theta-burst stimulation over the right dorsolateral prefrontal cortex disrupts fear memory reconsolidation in humans. iScience, 25, 103614.

      Vila, J. 2004. Psychophysiological assessment. In: SPIELBERGER, C. D. (ed.) Encyclopedia of applied psychology. New York: Elsevier.

      Wang, Y., Zhu, Z., Hu, J., Schiller, D. and Li, J. 2021. Active suppression prevents the return of threat memory in humans. Commun Biol, 4, 609.

      Xue, Y. X., Luo, Y. X., Wu, P., Shi, H. S., Xue, L. F., Chen, C., Zhu, W. L., Ding, Z. B., Bao, Y. P., Shi, J., Epstein, D. H., Shaham, Y. and Lu, L. 2012. A memory retrieval-extinction procedure to prevent drug craving and relapse. Science, 336, 241-5.

      Zhu, Z., Anderson, M. C. and Wang, Y. 2022. Inducing forgetting of unwanted memories through subliminal reactivation. Nature Communications, 13, 6496.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Many drugs have off-target effects on the gut microbiota but the downstream consequences for drug efficacy and side effect profiles remain unclear. Herein, Wang et al. use a mouse model of liver injury coupled to antibiotic and microbiota transplantation experiments. Their results suggest that metformin-induced shifts in gut microbial community structure and metabolite levels may contribute to drug efficacy. This study provides valuable mechanistic insights that could be dissected further in future studies, including efforts to identify which specific bacterial species, genes, and metabolites play a causal role in drug response. Importantly, although some pilot data from human subjects is shown, the clinical relevance of these findings for liver disease remains to be determined.

      Thank you for reviewing our manuscript. We appreciate your valuable feedback. We agree that the downstream consequences of off-target effects on the gut microbiota by various drugs remain unclear. Our study aimed to shed light on this aspect by utilizing a mouse model of liver injury and conducting antibiotic and microbiota transplantation experiments. Our findings suggest that shifts in the structure and metabolite levels of the gut microbial community induced by metformin play a role in the drug’s efficacy. We believe that these mechanistic insights provide a strong foundation for further investigations. Specifically, future studies could focus on identifying the specific bacterial species, genes, and metabolites that have a causal role in drug response. While we have included some pilot data from human subjects, we acknowledge that the clinical relevance of our findings in the context of liver disease still requires further determination. In fact, we focused on the alteration of microbiota and metabolism caused by metformin in human bodies, which could capture the characteristics of changes in a more composite clinical direction, elucidating the potential role of metformin. We appreciate your attention to this aspect and thank you again for your thoughtful review and valuable suggestions.

      The major strength of this work is its scope, including detailed mouse phenotyping, inter-disciplinary methods, and numerous complementary experiments. The antibiotic depletion and FMT experiments provide support for a role of the gut microbiota in this mouse model.

      A major limitation is the lack of studies narrowing down which microbes are responsible. Sequencing data is shown, but no follow-up studies are done with bacterial isolates or defined communities.

      We acknowledge the limitation of our study in not narrowing down the specific microbes responsible for the observed effects. In our view, metformin exerts its effects by modulating specific metabolic pathways of the microbial community as a whole. A previous study has shown that metformin can inhibit microbial folate metabolism, leading to longevity-promoting effects that are not attributed to a single colony or strain[1]. Similarly, the impact of metformin on amino acid metabolism in the microbial community appears to be widespread. While further investigations with bacterial isolates or defined communities are needed, our findings suggest that metformin's effects on microbial metabolism are complex and involve multiple members of the microbial community.

      The link to GABA is also somewhat tenuous. While it does match the phenotypic data, there are no targeted experiments in which GABA producing microbial communities/strains are compared to a control community/strain. As such, it seems difficult to know how much of the effects in this model are due to GABA vs. other metabolites.

      We agree with your point regarding the tenuous link to GABA in our study. While GABA was the only amino acid that we observed to increase following metformin treatment, a finding that has not been reported previously, we acknowledge the need for targeted experiments comparing GABA-producing microbial communities/strains to control communities/strains. Previous literature suggests that metformin's modulation of the microbiota can vary significantly depending on the disease context, with different microbial populations exhibiting differential responses[2-4]. Given this complexity, we opted to study the overall microbial community response to metformin rather than focusing on specific strains. Additionally, our detection of key enzymes involved in GABA synthesis at the community level further supports our findings.

      My major recommendation would be to revise the title, abstract, and discussion to provide more qualification and to consider alternative interpretations.

      We appreciate your feedback and understand your concern regarding the need for more qualification and consideration of alternative interpretations. We would welcome any more specific and detailed suggestions you may have for improving the clarity and qualification of our title and abstract. Furthermore, we have revised the discussion to enhance the scientific rigor and logical coherence of our study. If you have any specific recommendations or insights, we would be more than willing to make further revisions to address those concerns.

      Some key controls are also missing, which could be addressed by repeat experiments in the mouse model.

      We appreciate your suggestion to include additional key controls in the mouse model experiments. We have conducted repeat experiments to test the effect of antibiotics in the absence of metformin to differentiate between the effects of the model itself and the interaction of metformin with antibiotics. As the liver injury indicators show, there were no significant differences among the Control, Control+Met, Control+FMT and Control+Abx groups, indicating that metformin, feces from metformin-treated mice, and antibiotics had no effect on liver function in normal mice (Figure 1).

      Author response image 1.

      Figure 1. a: Liver MDA detection; b: Serum ALT level; c: Serum AST level.

      The antibiotic depletion experiment would be improved by testing the effect of antibiotics in the absence of metformin, to see if the effect is just driven by the model itself as opposed to an interaction between metformin and antibiotics.

      For the antibiotic depletion experiment, we administered antibiotics (Abx) to the model mice, and the survival rate and liver function tests suggested that Abx had no additional effect on the liver, which indicates that the effect is driven by the model itself rather than by an interaction between metformin and antibiotics (Figure 2).

      Author response image 2.

      Figure 2 a: Survival rate between IR and IR + Abx group; b: Serum ALT level; c: Serum AST level.

      References

      [1] CABREIRO F, AU C, LEUNG K Y, et al. Metformin Retards Aging in C. elegans by Altering Microbial Folate and Methionine Metabolism [J]. Cell, 2013, 153(1): 228-39.

      [2] LIANG H, SONG H, ZHANG X, et al. Metformin attenuated sepsis-related liver injury by modulating gut microbiota [J]. Emerg Microbes Infect, 2022, 11(1): 815-28.

      [3] SUN L, XIE C, WANG G, et al. Gut microbiota and intestinal FXR mediate the clinical benefits of metformin [J]. Nat Med, 2018, 24(12): 1919-29.

      [4] ZHAO H Y, LYU Y J, ZHAI R Q, et al. Metformin Mitigates Sepsis-Related Neuroinflammation via Modulating Gut Microbiota and Metabolites [J]. Frontiers in Immunology, 2022, 13:797312.

      Reviewer #2 (Public Review):

      The authors examine the use of metformin in the treatment of hepatic ischemia/reperfusion injury (HIRI) and suggest the mechanism of action is mediated in part by the gut microbiota and changes in hepatic ferroptosis. While the concept is intriguing, the experimental approaches are inadequate to support these conclusions.

      The histological and imaging studies were considered a strength and reveal a significant impact of metformin post-HIRI.

      Thank you for reviewing our paper titled “Gut microbiota-derived gamma-aminobutyric acid from metformin treatment reduces hepatic ischemia/reperfusion injury through inhibiting ferroptosis”. We appreciate your insightful comments and suggestions, which have helped us improve the quality and credibility of our research. We agree with your assessment that the experimental approaches used in this study may have limitations in supporting the conclusions drawn, and we appreciate your recognition of the strength of our histological and imaging studies, which clearly demonstrate the impact of metformin post-HIRI.

      Weaknesses largely stem from the experimental design. First, use of the iron chelator DFO would be strengthened using the ferroptosis inhibitor, liproxstatin.

      Your suggestion to employ the ferroptosis inhibitor, liproxstatin, in addition to the iron chelator DFO is well-taken. Incorporating liproxstatin into our experimental setup would provide a more comprehensive understanding of the involvement of hepatic ferroptosis in the mechanism of action of metformin. We therefore treated HIRI mice with liproxstatin and measured several core indicators of liver injury. As shown in Figure 3, liproxstatin reduced liver injury, restored liver GSH levels and inhibited Fe accumulation, suggesting that ferroptosis plays an important role in HIRI. We hope this modification will enhance the credibility of our conclusions.

      Author response image 3.

      Figure 3 a: Liver MDA detection; b: Serum ALT level; c: Serum AST level; d: Liver GSH level; e: Liver Fe level.

      Second, the impact of metformin on the microbiota is profound resulting in changes in bile acid, lipid, and glucose homeostasis. Throughout the manuscript no comparisons are made with metformin alone which would better capture the metformin-specific effects.

      Thank you for raising an important point regarding the impact of metformin on the microbiota and its potential effects on bile acid, lipid, and glucose homeostasis. It is well known that the effects of metformin on normal blood glucose and lipid metabolism are minimal. Metformin primarily exerts its effects in cases of impaired glucose tolerance, which is why it is widely used for non-diabetic conditions. Changes in bile acid metabolism and chronic elevation of cholesterol and lipids are typically observed in chronic liver disease models. Since our study focuses on an acute model of HIRI, we did not specifically investigate these changes.

      Lastly, the absence of proper controls including germ free mice, metformin treated mice, FMT treated mice, etc make it difficult to understand the outcomes and to properly reproduce the findings in other labs.

      Lastly, we acknowledge your concern regarding the absence of proper controls, including germ-free mice, metformin-treated mice, and FMT-treated mice. We understand that these controls are essential for robustly interpreting and reproducing our findings. Therefore, we have added a batch of experiments for verification. The results show no significant differences among the Control, Control+Met, Control+FMT and Control+Abx groups, indicating that metformin, feces from metformin-treated mice, and antibiotics had no effect on liver function in normal mice (Figure 1). We hope the results of these controls address your valid point and provide a more comprehensive framework for understanding the outcomes.

      Author response image 4.

      Figure 1 a: Liver MDA detection; b: Serum ALT level; c: Serum AST level.

      Overall, while the concept is interesting and has the potential to better understand the pleiotropic functions of metformin, the limitations with the experimental design and lack of key controls make it challenging to support the conclusions.

      We genuinely appreciate your constructive criticism and the time you have taken to evaluate our work. Your feedback has shed light on the limitations of our experimental design and the need for key controls, which we have addressed in the revised manuscript. If you have any further recommendations or concerns, we would be more than willing to incorporate them into our future work.

      Reviewer #3 (Public Review):

      The study presented in this paper explores the role of gut microbiota in the therapeutic effect of metformin on HIRI, as supported by fecal microbiota transplantation (FMT) experiments. Through high throughput sequencing and HPLC-MS/MS, the authors have successfully demonstrated that metformin administration leads to an increase in GABA-producing bacteria. Moreover, the study provides compelling evidence for the beneficial impact of GABA on HIRI.

      Thank you for your valuable feedback on our paper exploring the role of gut microbiota in the therapeutic effect of metformin on hepatic ischemia-reperfusion injury (HIRI). We appreciate your positive remarks and suggestions for improvement. In response to your comments, we have revised the manuscript accordingly. We have included additional details on the high throughput sequencing and HPLC-MS/MS methods used to analyze the gut microbiota and GABA levels. This should provide readers with a clearer understanding of our experimental approach and the evidence supporting our findings.

      Regarding your suggestion to further investigate the mechanisms underlying the beneficial impact of GABA on HIRI, we agree that this is an important direction for future research. We plan to conduct additional studies to explore the specific mechanisms by which GABA exerts its protective effects on HIRI. We have also added a discussion of potential therapeutic strategies targeting GABAergic pathways to the Discussion section.

      Thank you once again for your insightful comments. We believe that these revisions have strengthened the manuscript and improved its scientific rigor. We hope that you find the revised version to be satisfactory and look forward to your further feedback.

      Reviewer #1 (Recommendations For The Authors):

      The writing could be improved. Multiple typos are found throughout and there is an overuse of adverbs like "expectedly". You should let the reader decide what is or is not expected. Try to avoid terms like "confirmed" or "validated", which only applies if you knew the result a priori. Remove underscores in species names. The Results section is also very difficult to interpret given the lack of explanation of experimental design. For example, the human study is only briefly mentioned within a larger paragraph on mouse data, without any explanation as to the study design. Similar issues are true for the transcriptomics and amplicon sequencing - it would help the reader to explain what samples were processed, the timepoints, etc.

      Thank you for your valuable feedback on our manuscript entitled “Gut microbiota-derived gamma-aminobutyric acid from metformin treatment reduces hepatic ischemia/reperfusion injury through inhibiting ferroptosis”. We appreciate your constructive comments and insightful suggestions for improvement.

      We have carefully reviewed your comments and have made several revisions to enhance the clarity and readability of the manuscript. We have addressed the issue of multiple typos and have removed the overuse of adverbs, such as “expectedly,” to allow readers to draw their own conclusions from the results. Additionally, we have eliminated terms like “confirmed” or “validated” that may imply a priori knowledge of the results.

      We apologize for the lack of clarity regarding the experimental design in the Results section. We have now provided a more detailed explanation of the study design for the human study, transcriptomics, and amplicon sequencing experiments. This includes information on the samples processed, timepoints, and other relevant details, to aid readers in understanding the experimental procedures.

      In response to your comment about removing underscores in species names, we have revised the text accordingly to ensure consistency and accuracy in the species nomenclature used throughout the manuscript.

      Once again, we sincerely appreciate your valuable input, which has helped us improve the quality of our manuscript. We hope that the revised version now meets your expectations and look forward to any further feedback you may have.

      Thank you for your time and attention.

      Line 53 - prebiotics aren't "microbial agents"

      We apologize for this error, which we have corrected. (line 55: “Microbial agents, such as synbiotics and probiotics…”)

      Line 88 - sequencing doesn't "verify the critical role of gut microbiota"

      We apologize for this error, which we have corrected. (line 90: “In order to clarify the critical role of gut microbiota in the pleiotropic actions of metformin,22-24 fecal samples were collected from the mice to perform 16S rRNA sequencing.”)

      Line 92 - missing a citation for the "microbiota-gut-liver axis theory"

      We have corrected this in the manuscript. (line 93: “Next, as the microbiota-gut-liver axis theory indicates,25 HIRI-induced dysfunction of the gut barrier may aggravate liver damage by disrupting the gut microbiota.”)

      Line 112 - it's very surprising to me that FMT led to lower alpha diversity, which seems impossible.

      We understand your surprise regarding the observed decrease in alpha diversity after FMT. Our findings indeed deviate from the commonly observed pattern of increased alpha diversity post-FMT. We have carefully re-examined our data and conducted additional analyses to ensure the accuracy of our results. After thorough investigation, we have identified a potential reason for this unexpected outcome, which we believe could shed light on this phenomenon. We hypothesize that the lower alpha diversity observed in our study might be attributed to the specific characteristics of the donor microbiota used for FMT. While the donor microbiota exhibited certain beneficial properties associated with the therapeutic effect on HIRI, it could have presented a limited diversity compared to the recipient’s original gut microbiota. This discrepancy in diversity could have contributed to the observed decrease in alpha diversity following FMT.

      To further support our hypothesis, we have included a discussion on this unexpected finding in the revised manuscript. We believe that this addition will provide a more comprehensive understanding of the results and help contextualize the observed decrease in alpha diversity following FMT.

      Line 117 - Antibiotics don't "identify the function of gut microbes." Need to specify which antibiotics were used and for how long.

      We have corrected this in the manuscript. (line 119: “To further identify the function of gut microbes, experiments were designed, and combination treatment of antibiotics (1 mg/mL penicillin sulfate, 1 mg/mL neomycin sulfate, 1 mg/mL metronidazole and 0.16 mg/mL gentamicin) and metformin were employed for 1 week before IR treated.”)

      Line 120 - this experiment shows that the gut microbiota (or antibiotics more precisely) matters, not the "reshaped gut microbiota"

      We have corrected this in the manuscript. (line 124: “The results confirmed that reshaped gut microbiota is critical for the effect of metformin against HIRI.”)

      Line 122 - need to reword this subheading and the concluding sentence. The main takeaway is that the FMT improved markers of ferroptosis, but no additional causal links are provided here.

      We have revised this in the manuscript. (line 125: “FMT alleviates HIRI-induced ferroptosis through reshaped fecal microbiota.”)

      Line 141 - need to explain what transcriptomics data was generated and how it was analyzed.

      We have revised this in the manuscript. (line 144: “To elucidate the molecular mechanisms through which pathway participates metformin-treated IR injury, we analysed gene expression profiles of each group mice. Transcriptome sequencing analysis revealed that 9697 genes were in common among four groups (Supplementary Figure 6). Therefore, we used these common genes for KEGG analysis, showing that similar mRNA changes between Met group and FMT group are mainly concentrated in the three top pathways: lipid metabolism, carbohydrate metabolism, and amino acid metabolism (Fig 4a).”)

      Line 150 - change to "16S rRNA gene sequencing". Typo: "mice microbes".

      We have revised this in the manuscript. (line 156: “Moreover, it was observed that the genus of Bacteroides had a significant increase based on the 16S rRNA gene sequencing of metformin-treated mice microbes.”)

      Line 152 - upregulated refers to gene expression, change to enriched.

      We have revised this in the manuscript. (line 171: “Detailedly, the species of Bacteroides containing Bacteroides thetaiotaomicron, Bacteroides uniformis, and Bacteroides salyersiae, were enriched in human gut after metformin administration (Fig. 4i).”)

      Line 159 - typo: "prokaryotes"

      We have revised this in the manuscript. (line 165: “In order to further identify the increased GABA originates from gut microbiota, two key enzymes of prokaryotic GABA synthesis, GAD and PAT, were detected on DNA level, finding that both of them are significantly increased in the feces from IR+Met and IR+FMT groups (Fig. 4h).”)

      Line 161 - the human study should be under a new sub-heading and provide more details.

      We have revised this in the manuscript. (line 168: “In order to clarify the specific effects of metformin on microbiota, given the big safety margin, healthy volunteers were recruited for a 1 week of daily oral 500 mg dose of metformin trial. Fecal samples were collected before and after oral administration of metformin for metagenomic analysis.”)

      Line 197 - It's unclear why the current study conflicts with prior literature. Is it due to the disease model, the starting microbiota, something else? Please add more discussion.

      Thank you for bringing this important point to our attention, and we appreciate your valuable input. We agree that it is important to discuss the potential reasons for the discrepancy between our findings and prior literature on metformin-reshaped microbiota. In our study, we used a disease model of HIRI, which may have unique characteristics compared to other disease models. It is possible that the specific disease model influenced the response of the gut microbiota. Additionally, the starting microbiota of the recipients and the characteristics of the donor microbiota used for FMT could also play a role in the disparity. We have expanded the discussion section of our revised manuscript to further address these potential factors and their implications. We hope that this additional information will provide a more comprehensive explanation for the discrepancy between our study and prior literature.

      Figure 1a - change to Kaplan Meier not ANOVA. Specify the contrast - which groups are being compared?

      We have revised Figure 1a.
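
      The switch from ANOVA to Kaplan-Meier requested for the survival panels can be illustrated with a minimal product-limit estimator. This is only a sketch: the follow-up times and event flags below are hypothetical, not data from the study, and the function name `kaplan_meier` is introduced here for illustration. A between-group comparison would normally be added on top of these curves, for example with a log-rank test.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate for right-censored data.

    times  : follow-up time of each animal
    events : 1 if death was observed at that time, 0 if censored
    Returns the distinct event times and the survival probability after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                    # animals still followed at t
        deaths = np.sum((times == t) & (events == 1))   # deaths observed at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical group: four animals, one censored at time 2 (assumed values).
example_times = [1, 2, 2, 3]
example_events = [1, 0, 1, 1]
km_t, km_s = kaplan_meier(example_times, example_events)
```

      Unlike ANOVA on a single end-point, this estimate uses the time of each death and handles animals that leave the study early.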

      Figure 1e, alpha diversity - relabel "sobs" with "observed OTUs". Change to 3 bars with error and add statistics.

      We have revised Figure 1e.
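
      For readers unfamiliar with the "sobs" label, the following sketch shows how observed OTU richness and a per-group mean with error could be computed from count tables. The function names and the example counts are assumptions made for illustration; they do not come from the manuscript.

```python
import numpy as np

def observed_otus(counts):
    """Observed richness: number of OTUs with a non-zero count in one sample."""
    return int(np.count_nonzero(np.asarray(counts)))

def shannon(counts):
    """Shannon index (natural log) of one sample's OTU counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def summarize(groups):
    """Mean and sample SD of observed OTUs for each group (one bar per group)."""
    out = {}
    for name, samples in groups.items():
        rich = np.array([observed_otus(s) for s in samples], dtype=float)
        out[name] = (rich.mean(), rich.std(ddof=1))
    return out

# Hypothetical count vectors for two samples in one group (assumed values).
summary = summarize({"group_A": [[1, 0, 2], [1, 1, 1]]})
```

      Each entry of `summary` gives the bar height and error bar for one group; a statistical test across groups would be applied to the per-sample richness values.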

      Figure 1e, PCA - this should be a separate panel (1f). Color of big red circle doesn't match the points. Add PERMANOVA p-value/R2. Change to OTUs not genera. Better yet, use amplicon sequence variants from DADA2.

      We have revised Figure 1e.

      Figure 2a - Change to Kaplan Meier. Also, it's unclear if residual metformin could be in the donor samples.

      We have revised Figure 2a.

      Figure 2f, alpha diversity - relabel "sobs" with "observed OTUs". Change to 3 bars with error and add statistics.

      We have revised Figure 2f.

      Figure 2f, PCA - this should be a separate panel (2g). Color of big orange circle doesn't match the points. Add PERMANOVA p-value/R2. Change to OTUs not genera. Better yet, use amplicon sequence variants from DADA2.

      We have revised Figure 2f.

      Figure 4b - check units, shouldn't this be ng/mg (i.e. weight not volume).

      We have revised Figure 4b.

      Figure 4c,d - need more explanation in the legend and Results as to what is shown here.

      We have revised Figure 4c,d.

      Figure 4d - unclear why only Bacteroides are shown here or if the p-values are adjusted for multiple comparisons.

      Thank you for your comment regarding Figure 4d in our manuscript. We apologize for the confusion. Only Bacteroides is shown in Figure 4d because we specifically wanted to investigate the changes in Bacteroides abundance following metformin treatment.

      In the mouse experiments, we observed a significant increase in Bacteroides after metformin treatment. To investigate if a similar change occurs in healthy volunteers, we examined the levels of Bacteroides in fecal samples before and after oral administration of metformin. We found that the abundance of Bacteroides also increased in the human gut after metformin administration, consistent with the results from the animal experiments. Regarding the p-values, we apologize for not mentioning whether they were adjusted for multiple comparisons in the figure legend. In our revised manuscript, we have provided a clarification stating that the p-values were adjusted using the appropriate method. We appreciate your feedback and hope that this explanation clarifies the rationale behind Figure 4d. Thank you for your valuable input.
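
      The response does not name the adjustment method. As an illustration only, the sketch below implements the Benjamini-Hochberg false discovery rate procedure, one common choice when several taxa are tested at once. The choice of method and the example p-values are assumptions, not details taken from the manuscript.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Hypothetical raw p-values for four taxa (assumed values).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      Reporting the method by name in the figure legend, together with the number of comparisons, lets readers judge the adjusted values directly.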

      Reviewer #2 (Recommendations For The Authors):

      Below I've listed several suggestions to improve the paper.

      1. Controls - the authors should include metformin only treated mice, FMT only treated mice, etc. Additionally, germ free mice treated with metformin and HIRI would be helpful to better implicate the gut microbiome in these beneficial effects.

      Thank you for your suggestion regarding the inclusion of additional control groups in our study. We agree that including metformin only treated mice, FMT only treated mice, and germ-free mice treated with metformin and HIRI would provide valuable insights into the role of the gut microbiome in the observed beneficial effects.

      Therefore, we have included metformin-only treated mice, FMT-only treated mice and Abx-only treated mice as supplementary controls to better assess their specific contributions to the observed effects. The results show no significant differences among the Control, Control+Met, Control+FMT and Control+Abx groups, indicating that metformin, feces from metformin-treated mice, and antibiotics had no effect on liver function in normal mice (Figure 1).

      We appreciate your input and believe that the inclusion of these additional control groups will strengthen our study and provide a more comprehensive understanding of the role of the gut microbiome in the therapeutic effects observed.

      Author response image 5.

      Figure 1 a: Liver MDA detection; b: Serum ALT level; c: Serum AST level.

      1. More thorough characterization of metabolite pools. Metformin is known to influence many pathways including bile acids and lipids. These important molecules should be measured as they likely play a key role in the observed protective effect. In fact, many of the key changes displayed in Figure 3H are involved in lipid metabolism.

      Thank you for your valuable feedback regarding the characterization of metabolite pools in our study. We appreciate your suggestion to measure the influence of metformin on bile acids and lipid metabolism, as they are crucial pathways that may play a significant role in the observed protective effect.

      Regarding bile acids, we agree that they are important in the context of metformin’s influence on metabolic pathways. However, it is important to note that the impact of metformin on bile acids appears to be more prominent in chronic liver disease models. In our acute model, the changes in bile acids were not as significant. Instead, our results primarily indicate a close association between lipid changes and hepatic ferroptosis. Metformin significantly modulates lipid metabolism, thereby alleviating liver ferroptosis.

      Additionally, we have conducted metagenomic sequencing on the gut microbiota of healthy volunteers before and after oral administration of metformin. While analyzing the data, we did not observe significant changes in key genes involved in regulating bile acid variations. This might be attributed to the healthy volunteers used in our study, where significant changes in bile acids were not induced.

      We appreciate your insightful comments and suggestions, which have shed light on the importance of characterizing bile acids and lipid metabolism in our study. While the impact of bile acids may be more evident in chronic liver disease models, our findings highlight the significant influence of metformin on lipid metabolism, closely related to hepatic ferroptosis. We will take your suggestions into account for future studies to further explore the role of bile acids and their regulation by metformin.

      1. Imaging of lipid ROS is not quantitative. The authors should conduct more standard assays with BODIPY 581/591 C11 using cell lysates.

      We appreciate your suggestion to conduct more standard assays using BODIPY 581/591 C11 with cell lysates.

      We would like to clarify that we did indeed utilize assays with BODIPY 581/591 C11 to detect and measure lipid ROS in our study. The detailed description of these assays can be found in the Methods section of our paper. We followed established protocols and guidelines to ensure accurate and reliable measurements of lipid ROS levels.

      We acknowledge that imaging techniques may have limitations in providing quantitative data. However, we employed BODIPY 581/591 C11 assays as a widely accepted and commonly used method to assess lipid ROS levels. This allowed us to obtain qualitative and semi-quantitative information on the changes in lipid ROS levels in response to metformin treatment.

      1. Liproxstatin may be a better drug choice or at the very least should be used to compare with the DFO data

      Thank you for your suggestion. We have taken your advice into consideration and evaluated liproxstatin as a ferroptosis inhibitor. Our findings indicate that liproxstatin significantly improves HIRI (Figure 3). We believe that incorporating liproxstatin in our research will provide valuable insights and allow for a comprehensive comparison with the DFO data.

      Author response image 6.

      Figure 3 a: Liver MDA detection; b: Serum ALT level; c: Serum AST level; d: Liver GSH level; e: Liver Fe level.

      1. The rationale for how GABA was selected is not clear. I am surprised that there were not more significant metabolite changes. It might be better to show a volcano plot or heatmap of the significantly changed features.

      Thank you for raising an important question regarding the rationale for selecting GABA as the focus metabolite in our study. Initially, we also had concerns about the limited number of significant metabolite changes observed. However, through our comprehensive metabolomic profiling, we identified GABA as the most significantly altered metabolite following HIRI.

      It is worth noting that we specifically focused on the measurement of 22 essential amino acids in our analysis. While it is possible that changes in non-essential amino acids may have occurred, we did not examine them in this study. Nevertheless, we have since used additional methods to validate the upregulation of GABA levels, and the biological effects observed support the specific role of GABA in protecting against HIRI. Because GABA was the only amino acid that changed significantly, a volcano plot would add little information, so we did not include one.

      We appreciate your valuable input and thank you for bringing up this important issue.

      1. The manuscript needs to be proofread and edited. There are a variety of typos and grammar issues throughout.

      Thank you for your feedback. We acknowledge that the manuscript requires proofreading and editing, as we have identified several typos and grammar issues. We will try to ensure that the necessary revisions are made to improve the overall quality of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      However, I have some major concerns for the manuscript.

      1. Line 26 16S rRNA and metagenomic sequencing alone can't accurately confirm the improvement effect of GABA producing bacteria on HIRI. In fact, transcriptome analysis, HPLC-MS/MS and other methods were also used in this paper, so the language expression here is not appropriate

      Thank you for pointing out the language expression issue in line 26 of the manuscript. We apologize for any confusion caused. You are correct in stating that 16S rRNA and metagenomic sequencing alone may not accurately confirm the improvement effect of GABA-producing bacteria on HIRI. In our study, we employed a combination of methods, including transcriptome analysis, HPLC-MS/MS, and in particular detection of the key bacterial GABA synthesis enzymes PAT and GAD, to comprehensively investigate the impact of GABA-producing bacteria on HIRI.

      We have revised the language in line 26 to reflect the broader range of methods used in our study to support the conclusions regarding the improvement effect of GABA-producing bacteria on HIRI.

      1. The Introduction section needs to add a description of the previous research on the association between HIRI and ferroptosis

      Thank you for your suggestion regarding the inclusion of a description of the association between HIRI and ferroptosis in the Introduction section. We agree that this is an important aspect to address. However, upon further consideration, we have decided to move the discussion of ferroptosis and its potential role in HIRI to the Discussion section, as it aligns better with the logical flow of the manuscript. This allows us to discuss the potential implications and future directions in a more organized and coherent manner.

      1. Authors should provide quantified figure or table next to the results of western blot that are more convenient to understand.

      We have revised this in the manuscript (see Supplementary Figure 7).

      1. In this paper, FMT experiments are used to verify that metformin remodeled gut microbiota can play a role in improving HIRI. The operation steps of FMT should be described more specifically in the method part

      *What is the fecal donor information for FMT?

      *Line 272 Did the IR + FMT group put the transplanted microbiota of FMT directly into the drinking water like the other treatment groups? Will such an operation affect the quality and quantification of the transplanted microbiota and lead to the loss of microbiota species? It is crucial for the authors to provide a clear and thorough clarification regarding these matters within the context of their FMT experiment.

      Thank you for your feedback regarding the need for a more detailed description of the fecal microbiota transplantation (FMT) procedure and clarification regarding the IR + FMT group in our manuscript. We appreciate your suggestions and we have taken them into consideration.

      In our study, the fecal donor for FMT was obtained from mice that had been orally administered metformin. The fecal microbiota was collected and processed to remove any residual metformin before transplantation. Specifically, the microbiota for the IR + FMT group was administered through gavage, as stated in line 272. This method does not affect the quality or quantity of the transplanted microbiota, nor does it lead to a loss of microbiota species. We understand the importance of providing clear and thorough clarification regarding these matters. Therefore, we have included additional specific details of the FMT procedure in the revised version of the manuscript. We hope that this clarification addresses your concerns and provides a more comprehensive understanding of our FMT experiment.

      1. The presentation of transcriptomic analysis results in the manuscript is insufficiently comprehensive and specific, as they are solely depicted through Fig 4a. Relying solely on Fig 4a is inadequate to establish the definitive roles of the met group and FMT group in ferroptosis compared to other groups. Therefore, the authors should provide additional transcriptomic analysis results to ascertain the specific effects of the met group and FMT group in ferroptosis, as well as their comparison with other groups.

      Thank you for your feedback regarding the comprehensiveness of our transcriptomic analysis results in the manuscript. We understand your concerns and appreciate your suggestion. In our study, we have provided additional data beyond Fig 4a to support the specific effects of the met group and FMT group in ferroptosis, as well as their comparison with other groups. Specifically, in Figure 3, we have included Western blot (WB) and quantitative real-time polymerase chain reaction (qRT-PCR) data to confirm the involvement of ferroptosis in HIRI and the role of metformin in attenuating ferroptosis. Moreover, we have presented transcriptomic analysis results in Figure 3h, which includes a heatmap of genes related to lipid metabolism. These findings can strengthen our conclusions regarding the importance of ferroptosis in HIRI and the protective effects of metformin against ferroptosis. We hope that these data address your concerns and provide a more comprehensive understanding of our research findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Summary:

      The authors examine the eigenvalue spectrum of the covariance matrix of neural recordings in the whole-brain larval zebrafish during hunting and spontaneous behavior. They find that the spectrum is approximately power law, and, more importantly, exhibits scale-invariance under random subsampling of neurons. This property is not exhibited by conventional models of covariance spectra, motivating the introduction of the Euclidean random matrix model. The authors show that this tractable model captures the scale invariance they observe. They also examine the effects of subsampling based on anatomical location or functional relationships. Finally, they briefly discuss the benefit of neural codes which can be subsampled without significant loss of information.

      Strengths:

      With large-scale neural recordings becoming increasingly common, neuroscientists are faced with the question: how should we analyze them? To address that question, this paper proposes the Euclidean random matrix model, which embeds neurons randomly in an abstract feature space. This model is analytically tractable and matches two nontrivial features of the covariance matrix: approximate power law scaling, and invariance under subsampling. It thus introduces an important conceptual and technical advance for understanding large-scale simultaneously recorded neural activity.

      Weaknesses:

      The downside of using summary statistics is that they can be hard to interpret. Often the finding of scale invariance, and approximate power law behavior, points to something interesting. But here caution is in order: for instance, most critical phenomena in neural activity have been explained by relatively simple models that have very little to do with computation (Aitchison et al., PLoS CB 12:e1005110, 2016; Morrell et al., eLife 12, RP89337, 2024). Whether the same holds for the properties found here remains an open question.

      We are grateful for the thorough and constructive feedback provided on our manuscript. We have addressed each point raised by you.

      Regarding the main concern about power law behavior and scale invariance, we would like to clarify that our study does not aim to establish criticality. Instead, we focus on describing and understanding a specific scale-invariant property in terms of collapsed eigenspectra in neural activity. We tested Morrell et al.’s latent-variable model (eLife 12, RP89337, 2024, [1]), where a slowly varying latent factor drives population activity. Although it produces a seemingly power-law-like spectrum, random sampling does not replicate the strict spectral collapse observed in our data (second row in Fig. S23). This highlights that simply adding latent factors does not fully recapitulate the scale invariance we measure, suggesting richer or more intricate processes may be involved in real neural recordings.

      Specifically, we have incorporated five key revisions.

      • As mentioned, we evaluated the latent variable model proposed by Morrell et al. and found that it fails to reproduce the scale-invariant eigenspectra observed in our data; these results are now presented in the Discussion section and supported by a new Supplementary Figure (Fig. S23).

      • We included a comparison with the findings of Manley et al. (2024 [2]) regarding the issue of saturating dimension in the Discussion section, highlighting the methodological differences and their implications.

      • We added a new mathematical derivation in the Methods section, elucidating the bounded dimensionality using the spectral properties of our model.

      • We have added a sentence in the Discussion section to further emphasize the robustness of our findings by demonstrating their consistency across diverse datasets and experimental techniques.

      • We have incorporated a brief discussion on the implications for neural coding (lines 330-332). In particular, Fisher information can become unbounded when the slope of the power-law rank plot is less than one, as highlighted in the recent work by Moosavi et al. (bioRxiv 2024.08.23.608710, Aug, 2024 [3]).

      We believe these revisions address the concerns raised during the review process and collectively strengthen our manuscript, providing a more comprehensive and robust understanding of the geometry and dimensionality of brain-wide activity. We appreciate your consideration of our revised manuscript and look forward to your feedback.

      Recommendations for the authors:

      In particular, in our experience replies to the reviewers are getting longer than the paper, and we (and I’m sure you!) want to avoid that. Maybe just reply explicitly to the ones you disagree with? We’re pretty flexible on our end.

      (1) The main weakness, from our point of view, is whether the finding of scale invariance means something interesting, or should be expected from a null model. We can suggest such a model; if it is inconsistent with the data, that would make the results far more interesting.

      Morrell et al. (eLife 12, RP89337, 2024 [1]) suggest a very simple model in which the whole population is driven by a slowly time-varying quantity. It would be nice to determine whether it matched this data. If it couldn’t, that would add some evidence that there is something interesting going on.

      We appreciate your insightful suggestion to consider the model proposed by Morrell et al. (eLife 12, RP89337, 2024 [1]), where a slowly time-varying quantity drives the entire neural population. We conducted simulations using parameters from Morrell et al. [4, 1], as detailed below.

      Our simulations show that Morrell’s model can replicate a degree of scale invariance under functional sampling (FSap; referred to as RG in Morrell et al., 2021, PRL [4]) (Fig. S23A-D, Author response image 1). However, it fails to fully capture the collapse of eigenspectra that we observed in data under random sampling (RSap, Fig. S23E-H). This discrepancy suggests that additional dynamics or structures in the neural activity are not captured by this simple model, indicating the presence of potentially novel and interesting features in the data that merit further investigation.

      Unlike random sampling, the collapse of eigenspectra under functional sampling does not require a stringent condition on the kernel function f(x) in our ERM theory (see Discussion lines 269-275), potentially explaining the differing results between Fig. S23A-D and Fig. S23E-H.

      We have incorporated these findings into the Results section 2.1 (lines 100-101) and Discussion section (lines 277-282, quoted below):

      “Morrell et al. [4, 1] suggested a simple model in which a slow time-varying factor influences the entire neural population. To explore the effects of latent variables, we assessed if this model explains the scale invariance in our data. The model posits that neural activity is primarily driven by a few shared latent factors. Simulations showed that the resulting eigenspectra differed considerably from our findings (Fig. S23). Although the Morrell model demonstrated a degree of scale invariance under functional sampling, it did not align with the scale-invariant features under random sampling observed in our data, suggesting that this simple model might not capture all crucial features in our observations.”

      Author response image 1:

      Morrell’s latent model. A: We reproduce the results as presented in Morrell et al., PRL 126(11), 118302 (2021) [4]. Parameters are the same as in Fig. S23A. Sampled 16 to 256 neurons. Unlike in our study, the mean eigenvalues are not normalized to one. Dashed line: eigenvalues fitted to a power law. See also Morrell et al. [4] Fig. 1C. Parameters are the same as in Author response image 1. µ is the power-law exponent (black) of the fit, which is different from the µ parameter used to characterize the slow decay of the spatial correlation function, but corresponds to the parameter α in our study.

      (2) The quantification of the degree of scale invariance is done using a ”collapse index” (CI), which could be better explained/motivated. The fact that the measure is computed only for the non-leading eigenvalues makes sense but it is not clear when originally introduced. How does this measure compare to other measures of the distance between distributions?

      We thank you for raising this important point regarding the explanation and motivation for our Collapse Index (CI). We defined the CI instead of using other measures of distance between distributions for two main reasons. First, the CI provides an intuitive quantification of the shift of the eigenspectrum motivated by our high-density theory for the ERM model (Eq. 3, Fig. 4A). This high-density theory is only valid for large eigenvalues excluding the leading ones, and hence we compute the CI measure with a similar restriction of the range of area integration. Second, assessing the collapse with a distance between distributions (for example, estimating the eigenvalue distributions with a kernel density method and then calculating the KL divergence between them) requires first estimating the distributions. This estimation step introduces errors, such as inaccuracies in estimating the probability of large eigenvalues.

      We agree that a clearer explanation would enhance the manuscript and thus have made modifications accordingly. The CI is now introduced more clearly in the Results section (lines 145-148) and further detailed in the Methods section (lines 630-636). We have also revised the CI diagram in Fig. 4A to better illustrate the shift concept using a more intuitive cartoon representation.

      (3) The paper focuses on the case in which the dimensionality saturates to a finite value as the number of recorded neurons is increased. It would be useful to contrast with a case in which this does not occur. The paper would be strengthened by a comparison with Manley et al. 2024, which argued that, unlike this study, dimensionality of activity in spontaneously behaving head-fixed mice did not saturate.

      Thank you for highlighting this comparison. We have included a discussion (lines 303-309) comparing our approach with Manley et al. (2024) [2]. While Manley et al. [2] primarily used shared variance component analysis (SVCA) to estimate neural dimensionality, they observed that using PCA led to dimensionality saturation (see Figure S4D, Manley et al. [2]), consistent with our findings (Fig. 2D). We acknowledge the value of SVCA as an alternative approach and agree that it is an interesting avenue for future research. In our study, we chose to use PCA for several reasons. PCA is a well-established and widely trusted method in the neuroscience community, with a proven track record of revealing meaningful patterns in neural data. Its mathematical properties are well understood, making it particularly suitable for our theoretical analysis. While we appreciate the insights that newer methods like SVCA can provide, we believe PCA remains the most appropriate tool for addressing our specific research questions.

      (4) More importantly, we don’t understand why dimensionality saturates. For the rank plot given in Eq. 3,

      where k is rank. Using this, one can estimate sums over eigenvalues by integrals. Focusing on the N-dependence, we have

      This gives

      We don’t think you ever told us what mu/d was (see point 13 below), but in the discussion you implied that it was around 1/2 (line 249). In that case, D<sub>PR</sub> should be approximately linear in N. Could you explain why it isn’t?

      Thank you for your careful derivation. Following the line of calculation you suggested, we have now added a derivation that uses the ERM spectrum to estimate the upper bound of the dimension in the Methods (section 4.14.4). To deduce D<sub>PR</sub> from the spectrum, we focus on the high-density region, where an analytical expression for large eigenvalues λ is given by:

      Here, d is the dimension of the functional space, L is the linear size of the functional space, ρ is the neuron density, and γ is the coefficient in Eq. (3), which depends only on d, µ and E(σ<sup>2</sup>). The primary difference between your derivation and ours is that the eigenvalue λ<sub>r</sub> decays rapidly after the threshold r = β(N), which significantly affects the summations Σ<sub>r</sub> λ<sub>r</sub> and Σ<sub>r</sub> λ<sub>r</sub><sup>2</sup>. Since we did not discuss the small eigenvalues in the article, we represent them here as an unknown function η(r,N,L).

      The sum Σ<sub>r</sub> λ<sub>r</sub> is the trace of the covariance matrix C. As emphasized in the Methods section, without changing the properties of the covariance spectrum, we always consider a normalized covariance matrix such that the mean neural activity variance E(σ<sup>2</sup>) = 1. Thus

      rather than

      The issue stems from overlooking that Eq. (3) is valid only for large eigenvalues (λ > 1).

      Using the Cauchy–Schwarz inequality, we have an upper bound of

      Conversely, provides a lower bound of :

      As a result, we must have

      In random sampling (RSap), L is fixed. We thus must have a bounded dimensionality that is independent of N for our ERM model. In functional sampling (FSap), L varies while the neuronal density ρ is fixed, leading to a different scaling relationship of the upper bound, see Methods (section 4.14.4) for further discussion.
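      The saturation argument above can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes the standard definition D<sub>PR</sub> = (Tr C)<sup>2</sup>/Tr(C<sup>2</sup>), a power-law-like kernel of the form ε<sup>µ</sup>(ε<sup>2</sup> + x<sup>2</sup>)<sup>−µ/2</sup>, log-normal variances, and illustrative parameter values (`d`, `length`, `mu`, `eps`) that are assumptions chosen for the example. Random sampling keeps the box size L fixed and only lowers the density ρ.

```python
import numpy as np

def participation_ratio(cov):
    """D_PR = (Tr C)^2 / Tr(C^2) for a symmetric covariance matrix."""
    trace = np.trace(cov)
    trace_sq = np.sum(cov * cov)  # equals Tr(C^2) when C is symmetric
    return trace**2 / trace_sq

def erm_covariance(n, d=2, length=10.0, mu=0.5, eps=0.03, rng=None):
    """Toy ERM covariance: points uniform in [0, length]^d, slowly decaying kernel.

    The kernel form and all parameter values are assumptions for illustration.
    """
    rng = np.random.default_rng(rng)
    x = rng.uniform(0.0, length, size=(n, d))
    diff = x[:, None, :] - x[None, :, :]
    dist2 = np.sum(diff**2, axis=-1)
    kernel = eps**mu * (eps**2 + dist2) ** (-mu / 2.0)  # equals 1 at distance 0
    sigma = np.exp(rng.normal(0.0, 0.5, size=n))        # heterogeneous scales
    cov = kernel * np.outer(sigma, sigma)
    return cov / np.mean(np.diag(cov))                  # normalize E(sigma^2) = 1

def dpr_under_random_sampling(cov, sizes, rng=None):
    """D_PR of random sub-populations (fixed L, decreasing density)."""
    rng = np.random.default_rng(rng)
    result = {}
    for n in sizes:
        idx = rng.choice(cov.shape[0], size=n, replace=False)
        result[n] = participation_ratio(cov[np.ix_(idx, idx)])
    return result

cov = erm_covariance(512, rng=0)
dprs = dpr_under_random_sampling(cov, [64, 128, 256, 512], rng=1)
for n, value in dprs.items():
    print(n, round(value, 2))
```

      Printing D<sub>PR</sub> against the sample size gives a quick check of whether the dimensionality levels off as N grows for a given choice of kernel parameters.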

      (5) The authors work directly with ROIs rather than attempting to separate the signals from each neuron in an ROI. It would be worth discussing whether this has a significant effect on the results.

      We appreciate your thoughtful question on the potential impact of using ROIs. The use of ROIs likely does not impact our key findings since they are validated across multiple datasets with various recording techniques and animal models, from zebrafish calcium imaging to mouse brain multi-electrode recordings (see Figure S2, S24). The consistency of the scale-invariant covariance spectrum in diverse datasets suggests that ROIs in zebrafish data do not significantly alter the conclusions, and they together enhance the generalizability of our results. We highlight this in the Discussion section (lines 319-323).

      (6) Does the Euclidean random matrix model allow the authors to infer the value of D or µ? Since the measured observables only depend on µ/D it seems that one cannot infer the latent dimension where distances between neurons are computed. Are there any experiments that one could, in principle, perform to measure D or mu? Currently the conclusion from the model and data is that D/µ is a large number so that the spectrum is independent of neuron density rho. What about the heterogeneity of the scales σ<sub>i</sub>, can this be constrained by data?

      Measuring d and µ in the ERM Model

      We agree with you that the individual values of d and µ cannot be determined separately from our analysis. In our analysis using the Euclidean Random Matrix (ERM) model, we fit the ratio µ/d, rather than the individual values of d (dimension of the functional space) or µ (exponent of the distance-dependent kernel function). This limitation is inherent because the model’s predictions for observable quantities, such as the distribution of pairwise correlation, are dependent solely on this ratio.

      Currently there are no directly targeted experiments to measure d. The dimension of the functional space is largely a theoretical construct: it could serve to represent latent variables encoding cognitive factors that are distributed throughout the brain or specific sensory or motor feature maps within a particular brain region. It may also be viewed as the embedding space to describe functional connectivity between neurons. Thus, a direct experimental measurement of the dimension of the functional space could be challenging. Although there are variations in the biological interpretation of the functional space, the consistent scale invariance observed across various brain regions indicates that the neuronal relationships within the functional space can be described by a uniform slowly decaying kernel function.

      Regarding the Heterogeneity of σ<sub>i</sub>

      The heterogeneity of neuronal activity variances (σ<sub>i</sub>) is a critical factor in our analysis. Our findings indicate that this heterogeneity:

      (1) Enhances scale invariance: The covariance matrix spectrum, which incorporates the heterogeneity of σ<sub>i</sub>, exhibits stronger scale invariance compared to the correlation matrix spectrum, which imposes σ<sub>i</sub> = 1 for all neurons. This observation is supported by both experimental data and theoretical predictions from the ERM model, particularly in the intermediate density regime.

      (2) Can be constrained by data: We fit a log-normal distribution to the experimentally observed σ<sup>2</sup> values to capture the heterogeneity in our model which leads to excellent agreement with data (section 4.8.1). Figure S10 provides evidence for this by directly comparing the eigenspectra obtained from experimental data (Fig S10A-F) with those generated by the fitted ERM model (Fig S10M-R). These results suggest that the data provides valuable information about the distribution of neuronal activity variances.

      In conclusion, the ERM model and our analysis cannot separately determine d and µ. We also highlight that the neuronal activity variance heterogeneity, constrained by experimental data, plays a crucial role in improving the scale invariance.

      (7) Does the fitting procedure for the positions x in the latent space recover a ground truth in your statistical regime (for the number of recorded neurons)? Suppose you sampled some neurons from a Euclidean random matrix theory. Does the MDS technique the authors use recover the correct distances?

      While sampling neurons from a Euclidean random matrix model, we demonstrated numerically that the MDS technique can accurately recover the true distances, provided that the true parameter f(x) is known. To quantify the precision of recovery, we applied the CCA analysis (Section 4.9) and compared the true coordinates from the original Euclidean random matrix with the fitted coordinates obtained through our MDS procedure. The CCA correlation between the true and fitted coordinates in each spatial dimension is nearly 1 (the difference from 1 is less than 10<sup>−7</sup>). When fitting with experimental data, one source of error arises from parameter estimation. To evaluate this, we assess the estimation error of the fitted parameters. When we choose µ = 0.5 in our ERM model and then fit the distribution of the pairwise correlation (Eq. 21), the estimated parameter is 0.503 ± 0.007 (standard deviation). Then, we use the MDS-recovered distances to fit the coordinates with the fitted kernel function, which is determined by the fitted parameter. The CCA correlation between the true and fitted coordinates in each direction remains nearly 1 (the difference from 1 is less than 10<sup>−5</sup>).
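      The recovery test described above can be illustrated with a small self-contained sketch. It is an assumption-laden stand-in rather than the authors' pipeline: it uses classical (Torgerson) MDS on exactly known Euclidean distances, whereas the manuscript infers distances from correlations through the kernel function, and it computes canonical correlations with a QR-plus-SVD construction. The function names and sample size are hypothetical.

```python
import numpy as np

def classical_mds(dist, d):
    """Embed points in d dimensions from pairwise Euclidean distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist**2) @ centering  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:d]
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))

def canonical_correlations(a, b):
    """Canonical correlations between two coordinate sets (rows are neurons)."""
    qa, _ = np.linalg.qr(a - a.mean(axis=0))
    qb, _ = np.linalg.qr(b - b.mean(axis=0))
    return np.linalg.svd(qa.T @ qb, compute_uv=False)

rng = np.random.default_rng(0)
true_x = rng.uniform(0.0, 10.0, size=(200, 2))  # ground-truth functional coordinates
dist = np.linalg.norm(true_x[:, None, :] - true_x[None, :, :], axis=-1)
fitted_x = classical_mds(dist, d=2)             # recovered up to rotation/reflection
r_cca = canonical_correlations(true_x, fitted_x)
print(r_cca)
```

      Because MDS recovers coordinates only up to a rigid transformation, canonical correlations are a natural way to compare the two coordinate sets, which matches the role CCA plays in the response above.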

      (8) l. 49: ”... both the dimensionality and covariance spectrum remain invariant ...”. Just to be clear, if the spectrum is invariant, then the dimensionality automatically is too. Correct?

      Thanks for the question. In fact, there is no direct causal relationship between eigenvalue spectrum invariance and dimensionality invariance, as we elaborate below and in the discussion added in lines 311-317. For eigenvalue spectrum invariance, we focus on the large eigenvalues, whereas dimensionality invariance considers the second-order statistics of all eigenvalues. Consequently, the invariance results for these two concepts may differ. Dimensional and spectral invariance also have different requirements:

      (1) The condition for dimensional saturation is finite mean square covariance

      The participation ratio D<sub>PR</sub> for random sampling (RSap) is given by Eq. 5:

      This expression becomes invariant as N → ∞ if the mean square covariance is finite. In contrast, neural dynamics models, such as the balanced excitatory-inhibitory (E-I) neural network [5], exhibit a different behavior, where , leading to unbounded dimensionality (see discussion lines 291-295, section 6.9 in SI).

      (2) The requirements for spectral invariance involving the kernel function

      In our Euclidean Random Matrix (ERM) model, the eigenvalue distribution follows:

      For spectral invariance to emerge, the eigenvalue distribution must remain unchanged after sampling. Since sampling reduces the neuronal density ρ, the ratio µ/d must approach 0 to maintain invariance.

      We can also demonstrate that D<sub>PR</sub> is independent of density ρ in the large N limit (see our answer to question 4).

      In conclusion, there is no causal relationship between spectral invariance and dimensionality invariance. This is also the reason why we need to consider both properties separately in our analysis.

      (9) In Eq. 1, the exact expression, which includes i=j, isn’t a lot harder than the one with i=j excluded. So why i≠j?

      The choice is for illustration purposes. In Eq. 1, we wanted to demonstrate that the dimension saturates to a value independent of N. When dividing the numerator and denominator of this expression by N<sup>2</sup>, the term associated with the off-diagonal entries (i ≠ j) is independent of the neuron number N, but the term associated with the diagonal entries is of order O(1/N) and can be ignored for large N.
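      For reference, the decomposition behind this argument can be written out explicitly. The sketch below assumes the standard definition D<sub>PR</sub> = (Tr C)<sup>2</sup>/Tr(C<sup>2</sup>), writes C<sub>ii</sub> = σ<sub>i</sub><sup>2</sup>, and treats E(·) as an empirical average over neurons or neuron pairs; the exact notation of Eq. 1 and Eq. 5 in the manuscript may differ.

```latex
\begin{aligned}
D_{PR} &= \frac{(\operatorname{Tr} C)^2}{\operatorname{Tr}(C^2)}
        = \frac{\left(\sum_i C_{ii}\right)^2}{\sum_i C_{ii}^2 + \sum_{i \neq j} C_{ij}^2} \\
       &= \frac{N^2\, E(\sigma^2)^2}{N\, E(\sigma^4) + N(N-1)\, E_{i \neq j}(C_{ij}^2)}
        = \frac{E(\sigma^2)^2}{\frac{1}{N} E(\sigma^4) + \left(1 - \frac{1}{N}\right) E_{i \neq j}(C_{ij}^2)} .
\end{aligned}
```

      The diagonal contribution carries the factor 1/N, so for large N the expression approaches E(σ<sup>2</sup>)<sup>2</sup>/E<sub>i≠j</sub>(C<sub>ij</sub><sup>2</sup>) whenever E(σ<sup>4</sup>) is finite and the mean square off-diagonal covariance does not vanish.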

      (10) Fig. 2D: Could you explain where the theory line comes from?

      We first estimate the covariance statistics entering Eq. 5 (such as the mean square covariance) from all neurons, and then compute D<sub>PR</sub> for different neuron numbers N using Eq. 5. This is further clarified in lines 511-512.

      (11) l 94-5: ”It [scale invariance] is also absent when replacing the neural covariance matrix eigenvectors with random ones, keeping the eigenvalues identical (Fig. 2H).” If eigenvalues are identical, why does the spectrum change?

      The eigenspectra of the covariance matrices in full size are the same by construction, but the eigenspectra of the sampled covariance matrices are different because the eigenvectors affect the sampling results. Please also refer to the construction process described in section 4.3 where this is also discussed: “The composite covariance matrix with substituted eigenvectors in (Fig. 2H) was created as described in the following steps. First, we generated a random orthogonal matrix U<sub>r</sub> (based on the Haar measure) for the new eigenvectors. This was achieved by QR decomposition A=U<sub>r</sub>R of a random matrix A with i.i.d. entries A<sub>ij</sub> ∼ N(0, 1/N). The composite covariance matrix C<sub>r</sub> was then defined as C<sub>r</sub> = U<sub>r</sub>ΛU<sub>r</sub><sup>T</sup>, where Λ is a diagonal matrix that contains the eigenvalues of C. Note that since all the eigenvalues are real and U<sub>r</sub> is orthogonal, the resulting C<sub>r</sub> is a real and symmetric matrix. By construction, C<sub>r</sub> and C have the same eigenvalues, but their sampled eigenspectra can differ.”
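      The construction quoted above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: the toy covariance (exponentially decaying with index distance) is a made-up stand-in for data, and the sign correction applied after the QR step is a common convention for obtaining a Haar-distributed matrix that the quoted text does not spell out.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix with entries ~ N(0, 1/n)."""
    a = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
    q, r = np.linalg.qr(a)
    # Resolve the sign ambiguity of QR so that q is Haar distributed (assumed convention).
    return q * np.sign(np.diag(r))

def substitute_eigenvectors(cov, rng):
    """Keep the eigenvalues of cov but replace its eigenvectors with random ones."""
    evals = np.linalg.eigvalsh(cov)
    u = haar_orthogonal(cov.shape[0], rng)
    return (u * evals) @ u.T  # U diag(evals) U^T

def sampled_spectrum(cov, n, rng):
    """Eigenvalues (descending) of a randomly sampled n x n sub-block."""
    idx = rng.choice(cov.shape[0], size=n, replace=False)
    return np.sort(np.linalg.eigvalsh(cov[np.ix_(idx, idx)]))[::-1]

rng = np.random.default_rng(0)
positions = np.arange(64)
cov = np.exp(-np.abs(positions[:, None] - positions[None, :]) / 10.0)  # toy covariance
cov_r = substitute_eigenvectors(cov, rng)

# Full-size spectra agree by construction; sampled spectra generally do not.
print(np.allclose(np.linalg.eigvalsh(cov), np.linalg.eigvalsh(cov_r)))
print(sampled_spectrum(cov, 16, rng)[:3], sampled_spectrum(cov_r, 16, rng)[:3])
```

      Comparing the sampled spectra of `cov` and `cov_r` makes the point of the response concrete: identical full-size eigenvalues do not imply identical spectra after subsampling, because the sub-blocks depend on the eigenvectors.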

      (12) Eq 3: There’s no dependence on the distribution of sigma. Is that correct?

      Indeed, this is true in the high-density regime when the neuron density ρ is large. The p(λ) depends only on E(σ<sup>2</sup>) rather than the distribution of σ (see Eq. 8). However, in the intermediate density regime, p(λ) depends on the distribution of σ (see Eq.9 and Eq.10). In our analysis, we consider E(σ<sup>4</sup>) as a measure of heterogeneity.

      (13) Please tell us the best fit values of µ/d.

      This information is now added in the figure caption of Fig. S10: µ/d = [0.456, 0.258, 0.205, 0.262, 0.302, 0.308] for fish 1-6.

      (14) l 133: ”The eigenspectrum is rho-independent whenever µ/d ≈ 0.”

      It looks to me like rho sets the scale but not the shape. Correct? If so, why do we care about the overall scale – isn’t it the shape that’s important?

      Yes, our study focuses on the overall scale and not only the shape, because many models, such as the ERM with other kernel functions, random RNNs, and Morrell’s latent model [4, 1], can exhibit a power-law spectrum. However, these models do not exhibit scale invariance in the sense of collapsing spectrum curves. Therefore, considering the overall scale reveals an additional non-trivial phenomenon.

      (15) Figs. 3 and 4: Are the grey dots the same as in previous figures? Either way, please specify what they are in the figure caption.

      Yes, they are the same, and thank you for pointing it out. It has been specified in the figure caption now.

      (16) Fig. 4B: Top is correlation matrix, bottom is covariance matrix, correct? If so, that should be explicit. If not, it should be clear what the plots are.

      That is correct. Both matrices (correlation - top, covariance - bottom) are labeled in the figure caption and plot (text in the lower left corner).

      (17) l 158: ”First, the shape of the kernel function f(x) over a small distance ...”. What does ”over a small distance” mean?

      We thank you for seeking clarification on this point. We understand that the phrase ”over a small distance” could be made clearer, and we have revised the explanation in lines 164-165. Here, “over a small distance” refers to modifications of the particular kernel function f(x) we use (Eq. 11) near x = 0 in the functional space, while preserving the overall power-law decay at larger distances. The t-distribution based f(x) (Eq. 11) has a natural parameter ϵ that describes the transition to near 0. So we modified f(x) in different ways, all within this interval of |x| ≤ ϵ, and considered different values of ϵ. Table S3 and Figure S7 provide a summary of these modifications. Figure S7 visually compares these modifications to the standard power-law kernel function, highlighting the differences in shape near x = 0.

      Our findings indicate that these alterations to the kernel function at small distances do not significantly affect the distribution of large eigenvalues in the covariance spectrum. This supports our conclusion that the large eigenvalues are primarily determined by the slow decay of the kernel function at larger distances in the functional space, as this characteristic governs the overall correlations in neural activity.

      (18) l390 . This x<sub>i</sub> is, we believe, different from the x<sub>i</sub> which is position in feature space. Given the difficulty of this paper, it doesn’t help to use the same symbol to mean two different things. But maybe we’re wrong?

      Thank you for your careful reading and suggestion. Indeed here x<sub>i</sub> was representing activity rather than feature space position. We have thus revised the notation (Line 390 has been updated to line 439 as well.):

      In this revised notation, a<sub>i</sub>(t) represents the neural activity of neuron i at time t (typically the firing rate we infer from calcium imaging), and its time average is simply the mean activity of neuron i across time. Meanwhile, we keep x<sub>i</sub> exclusively for denoting positions in the functional space.

      This change should make it much easier to distinguish between neural activity measurements and spatial coordinates in the functional space.

      (19) Eq. 19: is it correct that g(u) is not normalized to 1? If so, does that matter?

      It is correct that the approximation of g(u) is not normalized to 1, as Eq. 19 provides an approximation suitable only for small pairwise distances (i.e., large correlation). Therefore, we believe this does not pose an issue. We have newly added this note in lines 691-693.

      (20) I get a different answer in Eq. 20:

      Whereas in Eq. 20,

      µ

      Which is correct?

      Thank you for your careful derivation. We believe the difference arises in the calculation of g(u). In our calculations:

      ,

      (Your first equation seems to have missed a factor of 1/µ in R’s exponent.)

      ,

      That is, Eq. 20 is correct. From these, we obtain

      rather than

      We hope this clarifies the question.

      (21) I’m not sure we fully understand the CCA analysis. First, our guess as to what you did: After sampling (either Asap or Fsap), you used ERM to embed the neurons in a 2-D space, and then applied canonical correlation analysis (CCA). Is that correct? If so, it would be nice if that were more clear.

      We first used ERM to embed all the neurons in a 2-D functional space, before any sampling. Once we have the embedding, we can quantify how similar the functional coordinates are with the anatomical coordinates using R<sub>CCA</sub> (section 2.4). We can then use the anatomical and functional coordinates to perform ASap and FSap, respectively. Our theory in section 2.4 predicts the effect on dimension under these samplings given the value of R<sub>CCA</sub> estimated earlier (Fig. 5D). The detailed description of the CCA analysis is in section 4.9, where we explain how CCA is used to find the axes in both anatomical and functional spaces that maximize the correlation between projections of neuron coordinates.

      As to how you sampled under Fsap, I could not figure that out – even after reading supplementary information. A clearer explanation would be very helpful.

      Thank you for your feedback. Functional sampling (FSap) entails the expansion of regions of interest (ROIs) within the functional space, as illustrated in Figure 5A, concurrently with the calculation of the covariance matrix for all neurons contained within the ROI. Technically, we implemented the sampling using the RG approach [6], which is further elaborated in Section 4.12 (lines 852-899), quoted below.

      Stage (i): Iterative Clustering We begin with N<sub>0</sub> neurons, where N<sub>0</sub> is assumed to be a power of 2. In the first iteration, we compute Pearson’s correlation coefficients for all neuron pairs. We then search greedily for the most correlated pairs and group the half pairs with the highest correlation into the first cluster; the remaining neurons form the second cluster. For each pair (a,b), we define a coarse-grained variable according to:

      ,

      Where normalizes the average to ensure unit nonzero activity. This process reduces the number of neurons to N<sub>1</sub> = N<sub>0</sub>/2. In subsequent iterations, we continue grouping the most correlated pairs of the coarse-grained neurons, iteratively reducing the number of neurons by half at each step. This process continues until the desired level of coarse-graining is achieved.

      When applying the RG approach to ERM, instead of combining neural activity, we merge correlation matrices to traverse different scales. During the kth iteration, we compute the coarse-grained covariance as:

      and the variance as:

      Following these calculations, we normalize the coarse-grained covariance matrix to ensure that all variances are equal to one. Note that these coarse-grained covariances are only used in stage (i) and not used to calculate the spectrum.

      Stage (ii): Eigenspectrum Calculation The calculation of eigenspectra at different scales proceeds through three sequential steps. First, for each cluster identified in Stage (i), we compute the covariance matrix using the original firing rates of neurons within that cluster (not the coarse-grained activities). Second, we calculate the eigenspectrum for each cluster. Finally, we average these eigenspectra across all clusters at a given iteration level to obtain the representative eigenspectrum for that scale.

      In stage (ii), we calculate the eigenspectra of the sub-covariance matrices across different cluster sizes as described in [6]. Let N<sub>0</sub> = 2<sup>n</sup> be the original number of neurons. To reduce it to size N = N<sub>0</sub>/2<sup>k</sup> = 2<sup>n-k</sup>, where k indexes the reduction step, consider the coarse-grained neurons in step n − k in stage (i). Each coarse-grained neuron is a cluster of 2<sup>n-k</sup> neurons. We then calculate the spectrum of the block of the original covariance matrix corresponding to the neurons of each cluster (there are 2<sup>k</sup> such blocks). Lastly, an average of these 2<sup>k</sup> spectra is computed.

      For example, when reducing from N<sub>0</sub> = 2<sup>3</sup> = 8 to N = 2<sup>3−1</sup> = 4 neurons (k = 1), we would have two clusters of 4 neurons each. We calculate the eigenspectrum for each 4x4 block of the original covariance matrix, then average these two spectra together. To better understand this process through a concrete example, consider a hypothetical scenario where a set of eight neurons, labeled 1,2,3,...,7,8, are subjected to a two-step clustering procedure. In the first step, neurons are grouped based on their maximum correlation pairs, for example, resulting in the formation of four pairs: {1,2},{3,4},{5,6}, and {7,8} (see Fig. S22). Subsequently, the neurons are further grouped into two clusters based on the results of the RG step mentioned above. Specifically, if the correlation between the coarse-grained variables of the pair {1,2} and the pair {3,4} is found to be the largest among all other pairs of coarse-grained variables, the first group consists of neurons {1,2,3,4}, while the second group contains neurons {5,6,7,8}. Next, take the size of the cluster N = 4 for example. The eigenspectra of the covariance matrices of the four neurons within each cluster are computed. This results in two eigenspectra, one for each cluster. The correlation matrices used to compute the eigenspectra of different sizes do not involve coarse-grained neurons. It is the real neurons 1,2,3,...,7,8, but with expanding cluster sizes. Finally, the average of the eigenspectra of the two clusters is calculated.
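      The two-stage procedure described above can be sketched as follows. This is a simplified illustration and not the implementation used in the manuscript: coarse-grained variables are formed by summing the paired activities, and the normalization constant is omitted because Pearson correlations are invariant to rescaling. The function names and the random toy activity are hypothetical.

```python
import numpy as np

def greedy_pairs(corr):
    """Pair units greedily, taking the most correlated unpaired pair first."""
    n = corr.shape[0]
    c = corr.copy()
    np.fill_diagonal(c, -np.inf)                 # never pair a unit with itself
    flat = np.argsort(c, axis=None)[::-1]        # pair indices, most correlated first
    rows, cols = np.unravel_index(flat, c.shape)
    used, pairs = set(), []
    for a, b in zip(rows.tolist(), cols.tolist()):
        if a != b and a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
        if len(pairs) == n // 2:
            break
    return pairs

def rg_clusters(activity, n_steps):
    """Stage (i): iteratively merge the most correlated pairs.

    activity has shape (neurons, time); returns lists of original neuron indices.
    """
    clusters = [[i] for i in range(activity.shape[0])]
    coarse = activity.copy()
    for _ in range(n_steps):
        pairs = greedy_pairs(np.corrcoef(coarse))
        clusters = [clusters[a] + clusters[b] for a, b in pairs]
        coarse = np.array([coarse[a] + coarse[b] for a, b in pairs])
    return clusters

def cluster_averaged_spectrum(cov, clusters):
    """Stage (ii): average the eigenspectra of the original covariance blocks."""
    spectra = [np.sort(np.linalg.eigvalsh(cov[np.ix_(c, c)]))[::-1] for c in clusters]
    return np.mean(spectra, axis=0)

rng = np.random.default_rng(0)
activity = rng.normal(size=(8, 500))              # toy data: 8 neurons, 500 time points
clusters = rg_clusters(activity, n_steps=2)       # 8 -> 4 pairs -> 2 clusters of 4
spectrum = cluster_averaged_spectrum(np.cov(activity), clusters)
print(clusters, spectrum)
```

      With eight neurons and two pairing steps, the sketch produces two clusters of four original neurons each, and the spectrum is computed from the original covariance blocks rather than from the coarse-grained variables, mirroring the worked example in the text.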

      (22) Line 37: ”even if two cell assemblies have the same D<sub>PR</sub>, they can have different shapes.” What is meant by shape here isn’t clear.

      Thank you for pointing out this potential ambiguity. The “shape” here refers to the geometric configuration of the neural activity space, characterized by the covariance as a high-dimensional ellipsoid. Specifically, if we denote the eigenvalues of the covariance matrix as λ<sub>1</sub>,λ<sub>2</sub>,...,λ<sub>N</sub>, then √λ<sub>i</sub> corresponds to the length of the i-th semi-axis of this ellipsoid (Figure 1B). As shown in Figure 1C, two neural populations with the same dimensionality (D<sub>PR</sub> = 25/11 ≈ 2.27) exhibit different eigenvalue spectra, leading to differently shaped ellipsoids. This clarification is now included in lines 39-40.
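
      As a small numerical illustration of this point, the sketch below assumes the standard participation-ratio definition D<sub>PR</sub> = (Σλ<sub>i</sub>)<sup>2</sup> / Σλ<sub>i</sub><sup>2</sup>. The two spectra are hypothetical values chosen so that both give 25/11; they are not the spectra plotted in Figure 1C.

```python
from fractions import Fraction
from math import sqrt


def participation_ratio(eigenvalues):
    """D_PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues), computed exactly."""
    total = sum(Fraction(v) for v in eigenvalues)
    total_sq = sum(Fraction(v) ** 2 for v in eigenvalues)
    return total ** 2 / total_sq


def semi_axes(eigenvalues):
    """Semi-axis lengths sqrt(lambda_i), normalised so the longest axis has length 1."""
    lengths = [sqrt(v) for v in eigenvalues]
    longest = max(lengths)
    return [length / longest for length in lengths]


# Hypothetical spectra (assumption, for illustration only).
spectrum_a = [3, 1, 1]  # one dominant axis, two short ones
spectrum_b = [7, 7, 1]  # two equally long axes, one short one

d_a = participation_ratio(spectrum_a)
d_b = participation_ratio(spectrum_b)
print(d_a, d_b)                 # 25/11 25/11
print(semi_axes(spectrum_a))    # cigar-like ellipsoid
print(semi_axes(spectrum_b))    # pancake-like ellipsoid
```

      Both spectra give the same D<sub>PR</sub>, yet one ellipsoid is elongated along a single axis and the other along two, which is the sense in which the shapes differ.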

      (23) Please discuss if any information about the latent dimension or kernel function can be inferred from the measurements.

      As in our response to comment (6), we would like to clarify that in our analysis using the Euclidean Random Matrix (ERM) model, we fit the ratio µ/d, rather than the individual values of d (dimension of the functional space) or µ (exponent of the distance-dependent kernel function). This limitation is inherent because the model’s predictions for observable quantities, such as the eigenvalue spectrum of the covariance matrix, depend solely on this ratio.

      For the kernel function, once d is chosen, we can infer the general shape of the kernel function from data (Figs S12 and S13), up to a certain extent (see also lines 164-166). In particular, we can compare the eigenspectrum of the simulation results for different kernel functions with the eigenspectrum of our data. This allows us to qualitatively exclude certain kernel functions, such as the exponential and Gaussian kernels (Fig. S4), which show clear differences from our data.

      References

      (1) M. C. Morrell, I. Nemenman, A. Sederberg, Neural criticality from effective latent variables. eLife 12, RP89337 (2024).

      (2) J. Manley, S. Lu, K. Barber, J. Demas, H. Kim, D. Meyer, F. M. Traub, A. Vaziri, Simultaneous, cortex-wide dynamics of up to 1 million neurons reveal unbounded scaling of dimensionality with neuron number. Neuron (2024).

      (3) S. A. Moosavi, S. S. R. Hindupur, H. Shimazaki, Population coding under the scale-invariance of high-dimensional noise (2024).

      (4) M. C. Morrell, A. J. Sederberg, I. Nemenman, Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126, 118302 (2021).

      (5) A. Renart, J. De La Rocha, P. Bartho, L. Hollender, N. Parga, A. Reyes, K. D. Harris, The asynchronous state in cortical circuits. Science 327, 587–590 (2010).

      (6) L. Meshulam, J. L. Gauthier, C. D. Brody, D. W. Tank, W. Bialek, Coarse graining, fixed points, and scaling in a large population of neurons. Physical Review Letters 123, 178103 (2019).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study explores the sequence characteristics and features of high-occupancy target (HOT) loci across the human genome. The computational analyses presented in this paper provide insight into the correlation of TF binding and regulatory networks at HOT loci that were regarded as lacking sequence specificity.

      By leveraging hundreds of ChIP-seq datasets from the ENCODE Project to delineate HOT loci in HepG2, K562, and H1-hESC cells, the investigators identified the regulatory significance and participation in 3D chromatin interactions of HOT loci. Subsequent exploration focused on the interaction of DNA-associated proteins (DAPs) with HOT loci using computational models. The models established that the potential formation of HOT loci is likely embedded in their DNA sequences and is significantly influenced by GC contents. Further inquiry exposed contrasting roles of HOT loci in housekeeping and tissue-specific functions spanning various cell types, with distinctions between embryonic and differentiated states, including instances of polymorphic variability. The authors conclude with a speculative model that HOT loci serve as anchors where phase-separated transcriptional condensates form. The findings presented here open avenues for future research, encouraging more exploration of the functional implications of HOT loci.

      Strengths:

      The concept of using computational models to define characteristics of HOT loci is refreshing and allows researchers to take a different approach to identifying potential targets. The major strength of the study lies in the very large number of datasets analyzed, with hundreds of ChIP-seq data sets for both HepG2 and K562 cells as part of the ENCODE project. Such quantitative power allowed the authors to delve deeply into HOT loci, which were previously thought to be artifacts.

      Weaknesses:

      While this study contributes to our knowledge of HOT loci, there are critical weaknesses that need to be addressed. There are questions on the validity of the assumptions made for certain analyses. The speculative nature of the proposed model involving transcriptional condensates either needs further validation or should be toned down. Furthermore, some apparent contradictions exist among the main conclusions, and these either need to be better explained or corrected. Lastly, several figure panels could be better explained or described in the figure legends.

      We thank the reviewer for their valuable comments.

      - We have extended the study and included a new chapter focusing on the condensate hypothesis, added more supporting evidence (including the ones suggested by the reviewer), and made explicit statements on the speculative nature of this model.

      - We have restructured the text to remove the sentences which might be construed as contradictory.

      Reviewer #2 (Public Review):

      Summary:

      The paper 'Sequence characteristic and an accurate model of abundant hyperactive loci in human genome' by Hudaiberdiev and Ovcharenko offers comprehensive analyses and insights about the 'high-occupancy target' (HOT) loci in the human genome. These are considered genomic regions that overlap with transcription factor binding sites. The authors provided very comprehensive analyses of the TF composition characteristics of these HOT loci. They showed that these HOT loci tend to overlap with annotated promoters and enhancers, GC-rich regions, open chromatin signals, and highly conserved regions, and that these loci are also enriched with potentially causal variants with different traits.

      Strengths:

      Overall, the definition of HOT loci is clear and the data of HOT regions across the genome can be a useful dataset for studies that use HepG2 or K562 as a model. I appreciate the authors' efforts in presenting many analyses and plots backing up each statement.

      Weaknesses:

      It is noteworthy that the HOT concept and their signature characteristics as being highly functional regions of the genome are not presented for the first time here. Additionally, I find the main manuscript, though very comprehensive, long-winded and can be put in a shorter, more digestible format without sacrificing scientific content.

      The introduction's mention of the blacklisted region can be rather misleading because when I read it, I was anticipating that we are uncovering new regulatory regions within the blacklisted region. However, the paper does not seem to address the question of whether the HOT regions overlap, if any, with the ENCODE blacklisted regions afterward. This plays into the central assessment that this manuscript is long-winded.

      The introduction also mentioned that HOT regions correspond to 'genomic regions that seemingly get bound by a large number of TFs with no apparent DNA sequence specificity' (this point of 'no sequence specificity' is reiterated in the discussion lines 485-486). However, later on in the paper, the authors also presented models such as convolutional neural networks that take in one-hot-encoded DNA sequence to predict HOT performed really well. It means that the sequence contexts with potential motifs can still play a role in forming the HOT loci. At the same time, lines 59-60 also cited studies that "detected putative drive motifs at the core segments of the HOT loci". The authors should edit the manuscript to clarify (or eradicate) contradictory statements.

      We thank the reviewer for their valuable comments. Below are our responses to each paragraph in the given order:

      We added the following sentence to the introduction, summarizing other publications that studied the functional aspects of HOT loci:

      “Other studies have concluded that these regions are highly functionally consequential regions enriched in epigenetic signals of active regulatory elements such as histone modification regions and high chromatin accessibility”.

      We significantly shortened the manuscript by a) moving the detailed analyses of the computational model to the supplemental materials, and b) shortening the discussions by around half, focusing on core analyses that would be most beneficial to the field.

      Given that the ENCODE guidelines recommend avoiding the blacklisted regions when mapping ChIP-seq (and other NGS) data, we excluded them from our analyzed regions before mapping to the genome. Instead, we relied on the conclusions of other publications on HOT loci that a fraction of the initially reported HOT loci resulted from including regions that were later blacklisted.

      We addressed the potential confusion caused by the expression “no sequence specificity” by a) changing the sentence in the introduction to add the clarification “... with no apparent DNA sequence specificity in terms of detectible binding motifs of corresponding motifs” and b) removing that part from the sentence in the discussion.

      Reviewer #3 (Public Review):

      Summary:

      Hudaiberdiev and Ovcharenko investigate regions within the genome where a high abundance of DNA-associated proteins are located and identify DNA sequence features enriched in these regions, their conservation in evolution, and variation in disease. Using ChIP-seq binding profiles of over 1,000 proteins in three human cell lines (HepG2, K562, and H1) as a data source they're able to identify nearly 44,000 high-occupancy target loci (HOT) that form at promoter and enhancer regions, thus suggesting these HOT loci regulate housekeeping and cell identity genes. Their primary investigative tool is HepG2 cells, but they employ K562 and H1 cells as tools to validate these assertions in other human cell types. Their analyses use RNA pol II signal, super-enhancer, regular-enhancer, and epigenetic marks to support the identification of these regions. The work is notable, in that it identifies a set of proteins that are invariantly associated with high-occupancy enhancers and promoters and argues for the integration of these molecules at different genomic loci. These observations are leveraged by the authors to argue HOT loci as potential sites of transcriptional condensates, a claim that they are well poised to provide information in support of. This work would benefit from refinement and some additional work to support the claims.

      Comments:

      (1) Condensates are thought to be scaffolded by one or more proteins or RNA molecules that are associated together to induce phase separation. The authors can readily provide from their analysis a check of whether HOT loci exist within different condensate compartments (or a marker for them). Generally, ChIPSeq signal from MED1 and Ronin (THAP11) would be anticipated to correspond with transcriptional condensates of different flavors, other coactivator proteins (e.g., BRD4), would be useful to include as well. Similarly, condensate scaffolding proteins of facultative and constitutive heterochromatin (HP1a and EZH2/1) would augment the authors' model by providing further evidence that HOT Loci occur at transcriptional condensates and not heterochromatin condensates. Sites of splicing might be informative as well, splicing condensates (or nuclear speckles) are scaffolded by SRRM/SON, which is probably not in their data set, but members of the serine arginine-rich splicing factor family of proteins can serve as a proxy-SRSF2 is the best studied of this set. This would provide a significant improvement to their proposed model and be expected since the authors note that these proteins occur at the enhancers and promoter regions of highly expressed genes.

      (2) It is curious that MAX is found to be highly enriched without its binding partner Myc, is Myc's signal simply lower in abundance, or is it absent from HOT loci? How could it be possible that a pair of proteins, which bind DNA as a heterodimer are found in HOT loci without invoking a condensate model to interpret the results?

      (3) Numerous studies have linked the physical properties of transcription factor proteins to their role in the genome. The authors here provide a limited analysis of the proteins found at different HOT-loci by employing go terms. Is there evidence for specific types of structural motifs, disordered motifs, or related properties of these proteins present in specific loci?

      (4) Condensates themselves possess different emergent properties, but it is a product of the proteins and RNAs that concentrate in them and not a result of any one specific function (condensates can have multiple functions!)

      (5) Transcriptional condensates serve as functional bodies. The notion the authors present in their discussion is not held by practitioners of condensate science, in that condensates exist to perform biochemical functions and are dissolved in response to satisfying that need, not that they serve simply as reservoirs of active molecules. For example, transcriptional condensates form at enhancers or promoters that concentrate factors involved in the activation and expression of that gene and are subsequently dissolved in response to a regulatory signal (in transcription this can be the nascently synthesized RNA itself or other factors). The association reactions driving the formation of active biochemical machinery within condensates are materially changed, as are the kinetics of assembly. It is unnecessary and inaccurate to qualify transcriptional condensates as depots for transcriptional machinery.

      (6) This work has the potential to advance the field forward by providing a detailed perspective on what proteins are located in what regions of the genome. Publication of this information alongside the manuscript would advance the field materially.

      We thank the reviewer for constructive comments and suggestions. Below are our point-by-point responses:

      (1) We added a new short section “Transcriptional condensates as a model for explaining the HOT regions” with additional support for the condensate hypothesis, wherein some of the points raised here were addressed. Specifically, we used a curated database of LLPS proteins (CD-CODE) and provided statistics on the DAPs annotated as condensate-related.

      Regarding the DAPs mentioned in this question, we observed that the distributions of the corresponding ChIP-seq peaks confirm the patterns expected by the reviewer (Author response image 1). Namely:

      - MED1 and Ronin (THAP11) are abundant in the HOT loci, being present in 67% and 64% of HOT loci, respectively.

      - While BRD4 is present in 28% of the HOT loci, we observed that the DAPs with annotated LLPS activity ranged from 3% to 73%, providing further support for the condensate hypothesis.

      - The ENCODE database does not contain a ChIP-seq dataset for HP1A. EZH2 peaks were absent in the HOT loci (0.4% overlap), suggesting the lack of heterochromatin condensate involvement.

      - Serine-rich splicing factor family proteins were present only in 7.7% of the HOT loci, suggesting the absence or limited overlap with splicing condensates or nuclear speckles.

      Author response image 1.

      (2) In this study we selected the TF ChIP-seq datasets with stringent quality metrics, excluding those with audit warnings or errors attached. As a result, the set of DAPs analyzed in HepG2 did not include MYC, since the corresponding ChIP-seq dataset had the audit warning tags of "borderline replicate concordance, insufficient read length, insufficient read depth, extremely low read depth". Analyses in K562 and H1 did include MYC (alongside MAX) ChIP-seq datasets.

      To address this question, we added the mentioned ChIP-seq dataset (ENCODE ID: ENCFF800JFG) and analyzed the colocalization patterns of MYC and MAX. We observed that the MYC ChIP-seq peaks in HepG2 display spurious results, overlapping with only 5% of HOT loci. Meanwhile in K562 and H1, MYC and MAX are jointly present in 54% and 44% of the HOT loci, respectively (Author response image 2).

      Author response image 2.

      These observations were also supported by Jaccard indices between the MYC and MAX ChIP-seq peaks. For this analysis, we calculated the pairwise Jaccard index between MYC and MAX and divided it by the average Jaccard index of 2000 randomly selected DAP pairs. In K562 and H1, the Jaccard indices between MYC and MAX are 5.72x and 2.53x greater than the random background, respectively. For HepG2, the ratio was 0.21x, clearly indicating that the HepG2 MYC ChIP-seq dataset is likely erroneous.

      Author response image 3.
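
      The background-normalized Jaccard ratio described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: peaks are represented as sets of locus identifiers (the actual comparison may have been done on base-pair overlap), and the function names, the toy peak sets, and the fixed random seed are all hypothetical.

```python
import random


def jaccard(peaks_a, peaks_b):
    """Jaccard index of two collections of locus identifiers."""
    a, b = set(peaks_a), set(peaks_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fold_over_background(target_pair, peak_sets, n_pairs=2000, seed=0):
    """Jaccard index of target_pair divided by the mean over random DAP pairs."""
    rng = random.Random(seed)
    names = sorted(peak_sets)
    background = []
    for _ in range(n_pairs):
        first, second = rng.sample(names, 2)  # two distinct DAPs
        background.append(jaccard(peak_sets[first], peak_sets[second]))
    mean_background = sum(background) / len(background)
    if mean_background == 0:
        raise ValueError("background pairs never overlap; ratio is undefined")
    target = jaccard(peak_sets[target_pair[0]], peak_sets[target_pair[1]])
    return target / mean_background


# Toy data (assumption): integers stand in for 400 bp loci.
toy_peaks = {
    "MYC": {1, 2, 3, 4},
    "MAX": {2, 3, 4, 5},
    "DAP_A": {1, 10},
    "DAP_B": {2, 11},
    "DAP_C": {3, 12},
    "DAP_D": {1, 13},
}
ratio = fold_over_background(("MYC", "MAX"), toy_peaks)
print(ratio)  # > 1 means the pair co-localizes more than a typical DAP pair
```

      A ratio well above 1, as reported for K562 and H1, indicates stronger co-localization than expected for an arbitrary pair; a ratio below 1, as reported for HepG2, indicates weaker co-localization.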

      (3) Despite numerous publications focusing on different structural domains in transcription factors, we could not find an extensive database or a survey study focusing on annotations of structural motifs in human TFs. A survey at such a scale would therefore be outside the scope of this study. We added only the analysis of intrinsically disordered regions, as it pertains to the condensate hypothesis. To emphasize this shortcoming, we added the following sentence to the end of the discussion section.

      “Further, one of the hallmarks of LLPS proteins that have been associated with their abilities to phase-separate is the overrepresentation of certain structural motifs, which we did not pursue due to size limitations.”

      (4, 5) We agree with these statements and thank the reviewer for pointing out this faulty statement. We modified the parts of the discussion related to condensates and removed the passage implying that condensates mostly serve a single function as a TF reservoir.

      (6) We added a table to the supplemental materials (Zenodo repository) with detailed annotation of HOT and non-HOT DAP-bound loci in the genome.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The clause with "inadequate" would be dropped if the authors sufficiently address reviewer concerns about clarity of writing, including:

      (1) Editing the title to better reflect the findings of the paper.

      (2) Making clear that the condensate model is speculative and not explicitly tested in this study (and may be better described as a hypothesis).

      (3) Resolving apparent contradictions regarding DNA sequence specificity and the interpretation of ChIP-seq signal intensity.

      (4) Better specifying and justifying model parameters, thresholds, and assumptions.

      (5) Shortening the manuscript to emphasize the main, well-supported claims and to enhance readability (especially the discussion section).

      We thank the Editor for their work. We followed their advice and implemented changes and additions to address all 5 points.

      Reviewer #1 (Recommendations For The Authors):

      (1) The title "Sequence characteristics and an accurate model of abundant hyperactive loci in the human genome" does not accurately reflect the findings of the paper. We are unclear as to what the 'accurate model' refers to. Is it the proposed model 'based on the existence of large transcriptional condensates' (abstract)? If so, there are concerns below regarding this statement (see comment 2). If the authors are referring to the computational modeling presented in Figure 5, it is unclear that any one of them performed that much better than the others and the best single model was not identified. Furthermore, the models being developed in the study constitute only a portion of the paper and lacked validation through additional datasets. Additionally, sequence characteristics were not a primary focus of the study. Only figure 5 talks about the model and sequence characteristics, the rest of the figures are left out of the equation.

      We agree with and thank the reviewer for this idea of clarifying the intended meaning.

      (1) We changed the title and clarified that the computational model is meant:

      “Functional characteristics and a computational model of abundant hyperactive loci in the human genome”.

      (2) We shortened the part of the manuscript discussing the computational models and identified the CNNs as “the best single model”.

      (2) The abstract and discussion (and perhaps the title) propose a model of transcriptional condensates in relation to HOT loci. However, there is no data provided in the manuscript that relates to condensates. Therefore, anything relating to condensates is primarily speculative. This distinction needs to be properly made, especially in the abstract (and cannot be included in the title). Otherwise, these statements are misleading. Although the field of transcriptional condensates is relatively new, there have been several factors studied. The authors could include in Figure 2d which factors have been shown to form transcriptional condensates. This might provide some support for the model, though it would still largely remain speculative unless further testing is done.

      We added a new short chapter “Transcriptional condensates as a model for explaining the HOT regions”, with additional analyses testing the condensates hypothesis. We provided supportive evidence by analyzing the metrics used as hallmarks of condensates including the distributions of annotated condensate-related proteins, nascent transcription, and protein-RNA interaction levels in HOT loci. Still, we acknowledge that this is a speculative hypothesis and we clarified that with the following statement in the discussions:

      “It is important to note here that our proposed condensate model is a speculative hypothesis. Further experimental studies in the field are needed to confirm or reject it.”

      (3) Several apparent contradictions exist throughout the manuscript. For example, "HOT locus formation are likely encoded in their DNA sequences" (lines 329-330) vs the proposed model of formation through condensates (abstract). These two statements do not seem compatible, or at the very least, the authors can explain how they are consistent with each other. Another example: "ChIP-seq signal intensity as a proxy for... binding affinity" (line 229) vs. "ChIP-seq signal intensities do not seem to be a function of the DNA-binding properties of the DAPs" (lines 259-260). The first statement is the assumption for subsequent analyses, which has its own concerns (see comment 4). But the conclusion from that analysis seems to contradict the assumption, at least as it is stated.

      In this study, we argue that the two statements may not necessarily contradict each other. We aimed to a) demonstrate that the observed intensity of DAP-DNA interactions as measured by ChIP-seq experiments at HOT loci cannot be explained with direct DNA-binding events of the DAPs alone and b) propose a hypothesis that this observation can be at least partially explained if the HOT loci have the propensity to either facilitate or take part in the formation of transcriptional condensates.

      One of the conditions for condensates to form at enhancers was shown to be the presence of strong binding sites of key TFs (Shrinivas et al. 2019 “Enhancer features that drive the formation of transcriptional condensates”), where the study was conducted using only one TF (OCT4) and one coactivator (MED1). To the best of our knowledge, no such study has been conducted involving many TFs and cofactors simultaneously. We also know that the factors that lead to liquid-liquid phase separation include weak multivalent IDR-IDR, IDR-DNA, and IDR-RNA interactions. As a result, the observed total sum of ChIP-seq peaks in HOT loci reflects direct DNA-binding events combined with indirect DAP-DNA interactions, some of which may be facilitated by condensates. Moreover, the fact that CNNs can recognize the HOT loci with high accuracy suggests that there must be an underlying motif grammar specific to HOT loci.

      We emphasized this conclusion in the discussions.

      The comment on using the ChIP-seq signal as a proxy for DNA-binding affinity is addressed under comment 4.

      (4) In lines 229-230, the authors used "the ChIP-seq signal intensity as a proxy for the DAP binding affinity." What is the basis for this assumption? If there is a study that can be referenced, it should be added. However, ChIP-seq signal intensity is generally regarded as a combination of abundance, frequency, or percentage of cells with binding. RNA Pol2 is a good example of this as it has no specific binding affinity but the peak heights indicate level of expression. Therefore, the analyses and conclusions in Figure 4, particularly panel A, are problematic. In addition, clarification from lines 258-260 is needed as it contradicts the earlier premise of the section (see comment 3).

      We thank the reviewer for pointing out this error. The main conclusion of the paragraph is that the average ChIP-seq signal values at HOT loci do not correlate well with the sequence-specificity of TFs. We reworded the paragraph stating that we are analyzing the patterns of ChIP-seq signals across the HOT loci, removing the part that we use them as a proxy for sequence-specific binding affinity.

      (5) In Figure 1A, the authors show that "the distribution of the number of loci is not multimodal, but rather follows a uniform spectrum, and thus, this definition of HOT loci is ad-hoc" (lines 92-95). The threshold to determine how a locus is considered to be HOT is unclear. How did the authors decide to use the current threshold given the uniform spectrum observed? How does this method of calling HOT loci compare to previous studies? How much overlap is there in the HOT loci in this study versus previous ones?

      We moved the corresponding explanation from the supplemental methods to the main methods section of the manuscript.

      Briefly, our reasoning was as follows: assuming that an average TFBS is 8bp long and given that we analyze the loci of length 400bp, we can set the theoretical maximum number of simultaneous binding events to be 50. Hence, if there are >50 TF ChIP-seq peaks in a given 400bp locus, it is highly unlikely that the majority of ChIP-seq peaks can be explained by direct TF-DNA interactions. The condition of >50 TFs corresponded to the last four bins of our binning scale, which was used as an operational definition for HOT loci.
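
      The arithmetic behind this operational threshold is simple enough to state as code. The sketch below only restates the reasoning above; the function names are hypothetical, and the default values (400 bp loci, 8 bp average TFBS) are the ones quoted in the text.

```python
def max_direct_binding_sites(locus_length_bp=400, tfbs_length_bp=8):
    """Upper bound on non-overlapping binding sites that fit in one locus."""
    return locus_length_bp // tfbs_length_bp


def is_hot(n_bound_daps, locus_length_bp=400, tfbs_length_bp=8):
    """A locus is called HOT when it carries more DAP peaks than could bind directly."""
    return n_bound_daps > max_direct_binding_sites(locus_length_bp, tfbs_length_bp)


print(max_direct_binding_sites())  # 50
print(is_hot(50), is_hot(51))      # False True
```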

      We have compared our definition of HOT loci to those reported in previous studies by Ramaker et al. and Boyle et al. The results of our analyses are in lines 147-154.

      (6) In Figure 3B, the authors state that of "the loop anchor regions with >3 overlapping loops, 51% contained at least one HOT locus, suggesting an interplay between chromatin loops and HOT loci." However, it is unclear how "51%" is calculated from the figure. Similarly, in the following sentence, "94% of HOT loci are located in regions with at least one chromatin interaction". It is unclear as to how the number was obtained based on the referenced figure.

      Initially, the x-axis labels in Figure 3B were missing, making it hard to understand what we meant. We added the x-axis numbers and changed the “51%” to “more than half”. We intended to say that, of the loci with 4 and 5 overlapping loops, exactly 50% contain at least one HOT locus. However, since for x=6 the percentage is 100% (there is only one such locus), the percentage is technically “more than half”.

      The percentage of HOT loci engaging in chromatin interaction regions (91%) was calculated by simply overlapping the HOT regions with Hi-C long-range contact anchors. The details of extracting these regions using FitHiChIP are described in Supplemental Methods 1.3.
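
      The overlap calculation can be illustrated with a short sketch. It is not the pipeline used in the study and does not call FitHiChIP; it assumes loci and anchors are given as half-open (chromosome, start, end) intervals, and the coordinates below are made up.

```python
def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(loci, anchors):
    """Fraction of loci that overlap at least one anchor."""
    if not loci:
        return 0.0
    hits = sum(
        1 for locus in loci if any(overlaps(locus, anchor) for anchor in anchors)
    )
    return hits / len(loci)


# Made-up coordinates (assumption): three 400 bp loci, two contact anchors.
hot_loci = [("chr1", 1000, 1400), ("chr1", 5000, 5400), ("chr2", 200, 600)]
anchors = [("chr1", 1200, 6200), ("chr2", 600, 5600)]

# The chr2 locus ends where the chr2 anchor starts, so it does not overlap.
fraction = fraction_overlapping(hot_loci, anchors)
print(fraction)
```

      For genome-scale inputs an interval tree or a sorted sweep would replace the quadratic scan used here for clarity.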

      (7) While we have a limited basis to evaluate computational models, we would like to see a clearer explanation of the model set-up in terms of the number of trained vs. test datasets. In addition, it would be interesting to see if the models can be applied to data from different cell lines.

      We added the table with the sizes of the datasets used for classification in Supplemental Methods 1.6.1.

      Evaluating the models trained on the HOT loci of HepG2 and K562 on other cell lines would pose challenges, since far fewer ENCODE TF ChIP-seq datasets are available for them than for the mentioned cell lines. Therefore, we conducted the proposed analysis between the studied cell lines. Specifically, we used the CNN models trained on HOT and regular enhancers of HepG2 and K562. Then, we evaluated each model on the test sets of each classification experiment (Author response image 4). We observed that the classification results of the HOT loci demonstrated a higher level of tissue-specificity compared to the same classification results of the regular enhancers.

      Author response image 4.

      (8) Lines 349-351. The significance of highly expressed genes being more prone to having multiple HOT loci, and vice versa, appears conventional and remains unclear. Intuitively, it makes sense for higher expressed genes to have more of the transcriptional machinery bound, and would bias the analysis. One way to circumvent this is to only analyze sequence-specific TFs and remove ones that are directly related to transcription machinery.

      We thank the reviewer for this suggestion. Our attempt to re-annotate the HOT loci with only sequence-specific TFs led to a significantly different set of loci, which would not be strictly comparable to the HOT loci defined by this study. Analyzing these new sets of loci would create a noticeable departure from the flow of the manuscript and further extend the already long scope of the study.

      Moreover, numerous studies have shown that super-enhancers recruit large numbers of TFs via transcriptional condensates (Boija et al., 2018; Cho et al., 2018; Sabari et al., 2018). We hope that our results can serve as data-driven supportive evidence for those studies.

      (9) Lines 393-396. We would like to see a reference to the models shown in the figures, if these models have been published previously.

      We could not understand the question. Lines 393-396 contain the following sentence:

      “However, many of the features of the loci that we’ve analyzed so far demonstrated similar patterns (GC contents, target gene expressions, ChIP-seq signal values etc.) when compared to the DAP-bound loci in HepG2 and K562, suggesting that albeit limited, the distribution of the DAPs in H1 likely reflects the true distribution of HOT loci.”

      In case the question was about the models that we trained to classify the HOT loci, we deposited the models and codebase in the Zenodo and GitHub repositories.

      (10) Values in Figure 7D are not reflected in the text. Specifically, the text states "Average ... phastCons of the developmental HOT loci are 1.3x higher than K562 and HepG2 HOT loci (Figure 7D)" (lines 408-409). Figure 7D shows conservation scores between HOT enhancers vs promoters for each cell line, and does not seem to reflect the text.

      We modified the figure to reflect the statement appropriately.

      (11) Methodology should include a justification for the use of the Mann-Whitney U-test (non-parametric) over other statistical tests.

      We added the following description to the methods section:

      “For calculating the statistical significance, we used the non-parametric Mann-Whitney U-test when the compared data points are non-linearly correlated and multi-modal. When the data distributions are bell-curve shaped, the Student’s t-test was used.”
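
      As an illustration of the non-parametric test named above, the following is a minimal sketch of a two-sided Mann-Whitney U test using the normal approximation. It is not the authors' code: the function name, the sample values, and the omission of a tie correction are all assumptions made for this example.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties count as half a win; no tie correction is applied to the
    variance, so this is only a rough sketch for illustrative inputs.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs (a, b) in which the value from x exceeds the value from y.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p

# Hypothetical skewed samples (not data from the study).
group_a = [1.2, 1.5, 1.7, 2.1, 9.8]
group_b = [0.3, 0.4, 0.6, 0.9, 1.1]
u_stat, p_value = mann_whitney_u(group_a, group_b)
```

      Because the test depends only on the ordering of the values, the outlier in the first group does not dominate the result as it would in a t-test.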

      Minor:

      (1) Figure 2b was never mentioned in the paper. This can be added alongside Figure S6C, line 148.

      Indeed, Figure 2B was supposed to be listed together with Figure S6C, which was omitted by mistake. It was corrected.

      (2) Supplementary Figure 8 has two Cs. Needs to be corrected to D.

      Fixed.

      (3) Figure 3B is missing labels on the x-axis.

      Fixed.

      (4) The horizontal bar graph on the bottom left of Figure 1E needs to be described in the figure legend.

      Description added to the figure caption.

      (5) Line 345, Fig 15A should be Fig S15A.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      I listed all my concerns about the paper in the public comments. I think the manuscript is very comprehensive and it is valuable, but it should be cut short and presented in a more digestible way.

      We thank the reviewer for their valuable comments and suggestions. We addressed all the concerns listed in the public comments. We shortened the manuscript by reducing the paragraph that focuses on computational classification models and reduced the discussions by about half in length.

      Line 55: What are chromatin-associated proteins, i.e. are they histone modifications?

      To clarify the definition used from the citation we changed the sentence to the following:

      “For instance, Partridge et al. studied the HOT loci in the context of 208 proteins including TFs, cofactors, and chromatin regulators which they called chromatin-associated proteins.”

      Though most of the paper can be cut short to avoid analysis paralysis for readers, there are details that still need filling in. For example, how did the authors perform PCA analysis, i.e. what are the features of each data point in the PCA analysis? Lines 214-215: How do we calculate the number of multi-way contacts in Hi-C data?

      We added clarifying descriptions and changed the mentioned sentences to the following:

      PCA:

      “To analyze the signatures of unique DAPs in HOT loci, we performed a PCA analysis where each HOT locus is represented by a binary (presence/absence) vector of length equal to the total number of DAPs analyzed.”
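
      The feature representation described above can be sketched as follows: each locus becomes one binary row with one column per DAP. The locus names, the DAP list, and both helper functions are hypothetical. The centered matrix is what a standard PCA routine would then decompose, and that step is left out here.

```python
def presence_matrix(locus_to_daps, all_daps):
    """Return one binary row per locus; column j is 1 if all_daps[j] is bound."""
    index = {dap: j for j, dap in enumerate(all_daps)}
    rows = []
    for locus in sorted(locus_to_daps):
        row = [0] * len(all_daps)
        for dap in locus_to_daps[locus]:
            row[index[dap]] = 1
        rows.append(row)
    return rows

def center_columns(rows):
    """Subtract each column mean, the usual first step of PCA."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

# Hypothetical loci and DAP names, for illustration only.
daps = ["MAX", "CTCF", "POLR2A", "EP300"]
bound = {
    "locus_1": {"MAX", "POLR2A"},
    "locus_2": {"MAX", "CTCF", "EP300"},
    "locus_3": {"CTCF"},
}
matrix = presence_matrix(bound, daps)
centered = center_columns(matrix)
```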

      Multi-way contacts on loop anchors:

      “To investigate further, we analyzed the loop anchor regions harboring HOT loci and observed that the number of multi-way contacts on loop anchors (i.e. loci which serve as anchors to multiple loops) correlates with the number of bound DAPs (rho=0.84, p-value<10E-4; Pearson correlation).”
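
      One possible reading of this analysis is to count how many loops each anchor takes part in and then correlate that count with the number of bound DAPs. The sketch below follows that reading. The loop list, the anchor identifiers, and the DAP counts are invented for illustration, and the authors' actual pipeline may differ.

```python
import math
from collections import Counter

def loops_per_anchor(loops):
    """Count how many loops each anchor participates in."""
    counts = Counter()
    for anchor_a, anchor_b in loops:
        counts[anchor_a] += 1
        counts[anchor_b] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical loops (pairs of anchor IDs) and bound-DAP counts per anchor.
loops = [("a1", "a2"), ("a1", "a3"), ("a1", "a4"), ("a2", "a3")]
dap_counts = {"a1": 120, "a2": 60, "a3": 70, "a4": 15}

contacts = loops_per_anchor(loops)
anchors = sorted(dap_counts)
r = pearson([contacts[a] for a in anchors], [dap_counts[a] for a in anchors])
```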

      - Lines 251-252: How did the referenced study categorize DAPs? It is important for any manuscript to be self-contained.

      We added the explanation and changed the sentence to the following:

      “To test this hypothesis, we classified the DAPs into those two categories using the definitions provided in the study (Lambert et al. 2018) 28, where the TFs are classified by manual curation through extensive literature review and supported by annotations such as the presence of DNA-binding domains and validated binding motifs. Based on this classification, we categorized the ChIP-seq signal values into these two groups.”

      - Lines 181-185, sentences starting with 'To test' can be moved to the methods, leaving only brief mentions of the statistic tests if needed.

      We removed the mentioned sentence from the main text and moved it to the Supplemental Methods (1.4).

      - Lines 217-220: I find this sentence extremely redundant unless it can offer more specific insights about a particular set of DAPs or if the DAPs are closer/or a proven distal enhancer to a confirmed causal gene.

      We removed the mentioned sentence from the text.

      - Lines 243-246: How did the authors determine the set DAPs that have stabilizing effects, and how exactly are the 'stabilizing effects' observed/measured?

      We added explanations to Supplemental Methods 3.1 and Fig S18, S19.

      While addressing this comment we realized that the reported value of the ratio is 1.91x, not 1.7x. We corrected that value in the main text and added the p-value.

      - When discussing the phastCons scores analyses, such as in lines 268-271, how did the authors calculate the relationship between phastCons scores and HOT loci, i.e. was the score averaged across the 400-bp locus to obtain a locus-specific conservation score?

      Yes, per-locus conservation scores were averaged over the bps of loci. We added this clarification to the methods.
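
      The averaging step can be sketched as below, assuming per-base scores are available as a position-to-score mapping. The function name, the half-open interval convention, and the handling of missing positions are assumptions of this sketch and not details confirmed by the authors.

```python
def locus_conservation(scores_by_position, start, end):
    """Mean per-base score over the half-open interval [start, end).

    Positions without a score are skipped; this is an assumption of the
    sketch, not necessarily how missing values were handled in the study.
    """
    values = [scores_by_position[p] for p in range(start, end) if p in scores_by_position]
    return sum(values) / len(values) if values else float("nan")

# Hypothetical per-base scores for a toy 4-bp locus (the loci discussed are 400 bp).
scores = {100: 0.9, 101: 0.7, 102: 0.8, 103: 0.2}
mean_score = locus_conservation(scores, 100, 104)
```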

      - Line 311: What is the role of the 'control sets' in the analyses of the sequence's relationship with HOT?

      In this specific case, the control sets are used as background or negative sets to set up the classification tasks. In other words, we are asking whether the HOT loci can be distinguished when compared to random chromatin-accessible regions, promoters, or regular enhancers. We clarified this in the text.

      - I also find the discussion about different machine learning methods that classify HOT loci based on sequence contexts quite redundant UNLESS the authors decide to go further into the features' importance (such as motifs) in the models that predict/ are associated with HOT loci, which in itself can constitute another study.

      We agree with the reviewer, and shortened the part with the discussions of models by limiting it to only 3 main models and moved the rest to the supplemental materials.

      - Can the authors clarify where they obtain data on super-enhancers?

      We obtained the super-enhancer definitions from the original study (Hnisz et al. 2013, PMID: 24119843) where the super-enhancers were defined for multiple cell lines. We clarified this in the methods.

      - Figure 1B, the x and y axis should be clarified.

      We clarified it by using MAX as an example case in the figure caption as follows:

      “Prevalence of DAPs in HOT loci. Each dot represents a DAP. X-axis: percentage of HOT loci in which DAP is present (e.g. MAX is present in 80% of HOT loci). Y-axis: percentage of total peaks of DAPs that are located in HOT loci (e.g. 45% of all the ChIP-seq peaks of MAX are located in the HOT loci). Dot color and size are proportional to the total number of ChIP-seq peaks of DAP.”
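
      The two axis quantities in this caption can be computed as in the sketch below. The toy inputs are hypothetical and were chosen only so that MAX reproduces the 80% and 45% figures quoted in the caption; the function name and data layout are likewise assumptions.

```python
def dap_prevalence(hot_loci_daps, peaks_in_hot, total_peaks):
    """Return {dap: (x, y)} where
    x = percentage of HOT loci in which the DAP is present,
    y = percentage of the DAP's peaks that fall in HOT loci."""
    n_loci = len(hot_loci_daps)
    result = {}
    for dap, total in total_peaks.items():
        present = sum(1 for daps in hot_loci_daps if dap in daps)
        x = 100.0 * present / n_loci
        y = 100.0 * peaks_in_hot.get(dap, 0) / total
        result[dap] = (x, y)
    return result

# Hypothetical toy inputs: five HOT loci and per-DAP peak counts.
hot_loci = [{"MAX", "CTCF"}, {"MAX"}, {"MAX"}, {"MAX", "CTCF"}, {"CTCF"}]
peaks_in_hot = {"MAX": 45, "CTCF": 10}
total_peaks = {"MAX": 100, "CTCF": 200}
coords = dap_prevalence(hot_loci, peaks_in_hot, total_peaks)
```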

      Reviewer #3 (Recommendations For The Authors):

      The list of proteins associated with different types of genomic loci at a meta level (enhancers, promoters, and gene body etc.), and an annotation of the genome at the specific loci level.

      The authors use a wide range of acronyms throughout the text and figure legends, they do a reasonably good job, but the main text section "HOT-loci are enriched in causal variants" and Figure 8 would be materially improved if they held it to the same standard.

      Size is a physical property and not a physicochemical property.

      We thank the reviewer for their comments and suggestions. We added a table to supplemental files with detailed annotations of analyzed loci.

      We reviewed the section “HOT loci are enriched in causal variants” and corrected a few mismatches in the acronyms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      In this paper, Kalidindi and Crevecoeur ask why sequential movements are sometimes coarticulated. To answer this question, first, they modified a standard optimal controller to perform consecutive reaches to two targets (T1 and T2). They investigated the optimal solution with and without a constraint on the endpoint's velocity in the via target (T1). They observed that the controller coarticulates the movements only when there is no constraint on the speed at the via-point. They characterized coarticulation in two ways: First, T2 affected the curvature of the first reach in unperturbed reaches. Second, T2 affected corrective movements in response to a mechanical perturbation of the first reach. 

      Parallel to the modeling work, they ran the same experiment on human participants. The participants were instructed to either consider T1 as via point (go task) or to slow down in T1 and then continue to T2 (stop task). Mirroring the simulation results, they observed coarticulation only in the go task. Interestingly, in the go task, when the initial reach was occasionally perturbed, the long-latency feedback responses differed for different T2 targets, suggesting that the information about the final target was already present in the motor circuits that mediate the long-latency response. In summary, they conclude that coarticulation in sequential tasks depends on instruction, and when coarticulation happens, the corrections in earlier segments of movement reflect the entirety of the coarticulated sequence.

      Evaluation 

      Among many strengths of this paper, most notably, the results and the experiment design are grounded in, and guided by the optimal control simulation. The methods and procedures are appropriate and standard. The results and methods are explained sufficiently and the paper is written clearly. The results on modulation of long-latency response based on future goals are interesting and of broad interest for future experiments on motor control in sequential movement. However, I find the authors' framing of these results, mostly in the introduction section, somewhat complicated.

      The current version of the introduction motivates the study by suggesting that "coarticulation and separation of sub-movement [in sequential movements] have been formulated as distinct hypotheses" and this apparent distinction, which led to contradictory results, can be resolved by Optimal Feedback Control (OFC) framework in which task-optimized control gains control coarticulation. This framing seems complicated for two main reasons. First, the authors use chunking and coarticulation interchangeably. However, as originally proposed by (Miller 1956), the chunking of the sequence items may fully occur at an abstract level like working memory, with no motoric coarticulation of sequence elements at the level of motor execution. In this scenario, sequence production will be faster due to the proactive preparation of sequence elements. This simple dissociation between chunking and coarticulation may already explain the apparent contradiction between the previous works mentioned in the introduction section. Second, the authors propose the OFC as a novel approach for studying neural correlates of sequence production. While I agree that OFC simulations can be highly insightful as a normative model for understanding the importance of sequence elements, it is unclear to me how OFCs can generate new hypotheses regarding the neural implementation of sequential movements. For instance, if the control gains are summarizing the instruction of the task and the relevance of future targets, it is unclear in which brain areas, or how these control gains are implemented. I believe the manuscript will benefit from making points more clear in the introduction and the discussion sections. 

      We agree that chunking may occur at different levels that do not necessarily involve motor coarticulation. We clarified that our contribution is towards answering why sequence movements sometimes coarticulate, and how the way sequences are executed influences the representation of future goals in the sensorimotor system.

      To address this point, we made the following modifications in the introduction:

      Line 44:

      “It remains unclear how future goals are integrated in the sensorimotor system. For rapid execution of a sequence, one possible solution is to represent multiple goals within low-level control circuits (3, 16), enabling the execution of several elements as a single entity, called “motor chunk”. Note that chunking can also occur at a higher level such as in working memory-guided sequences, which in this case may or may not involve the production of a movement (17, 18).”

      Lines 50:

      “Recent neural recordings in the primary motor cortex (M1) have shown no specific influence of future goals on the population responses governing ongoing action (19, 20). Specifically, Zimnik and Churchland (20) observed in a two-reach sequence task that, there was no coarticulation in sub-movement kinematics although the execution got faster with practice. Notably, M1 displayed separate phases of execution related activity for each sub-movement. Using a neural network model, they interpreted that sequence goals could be separated and serially specified to the controller from regions upstream of M1 (Figure 1A). These findings contrast with earlier studies showing coarticulation of sub-movements and whole sequence representations in M1 (21–23). As a result, it has been suggested that coarticulation and separation in rapid sequences may involve distinct computations: coarticulation possibly involves replacing sub-movements with a motor chunk, while separation possibly indicates independent control of each sub-movement with chunking at a higher-level (4, 20).  Thus, there are unresolved questions regarding why sequential movements sometimes coarticulate, and how the representation of future goals in the sensorimotor system influences the way sequences are executed.”

      With respect to the second part of your concern about OFC, we agree that this framework does not make direct predictions about the neural implementation and that our statements required clarification. The first link between the model and predictions about neural data follows from the observation that long-latency circuits participate in task-dependent sequence production, thus indicating that transcortical pathways must express this task dependency. The second link between our work and neural activity is that it provides a counterargument to a previous interpretation: Zimnik and Churchland argued that independent or “holistic” sequence production should be associated with different representations in the monkey brain. In contrast, we suggest that the same controller can flexibly generate both kinds of sequences, without implying a different structure in the controller, only a different cost-function. We thus refine the expectation about neural correlates of sequence representations by showing that it potentially relates to the encoding of task constraints.

      To address this point, we added the following changes in the introduction and discussion:

      Line 69 in Introduction: 

      “The theory of optimal feedback control (OFC) has been particularly useful in predicting the influence of numerous task parameters on the controller (27–34), thus reproducing goal-directed motor commands during both unperturbed movements and feedback responses to disturbances (30). OFC has been used in numerous studies to interpret flexible feedback responses occurring in the long-latency response period (30, 35).” 

      Line 454 in Discussion:

      “Although OFC has been predominantly used as a behavioral level framework agnostic to neural activity patterns, it can shed light on the planning, state estimation and execution related computations in the transcortical feedback pathway (Takei et al.,). Using OFC, our study proposes a novel and precise definition of the difference to expect in neural activities in order to identify coarticulated versus independent sequence representations from a computational point of view. Because each condition (i.e., overlapping versus non-overlapping controllers as in Figure 2) was associated with different cost-functions and time-varying control gains, it is the process of deriving these control gains, using the internal representation of the task structure, that may differ across coarticulated and separated sequence conditions. To our knowledge, how and where this operation is performed is unknown. A corollary of this definition is that the preparatory activity (20, 50) may not discern independently planned or coarticulated sequences because these situations imply different control policies (and cost functions), as opposed to different initial states. Moreover, the nature of the sequence representation is potentially not dissociable from its execution for the same reason.”

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors examine the question of whether discrete action sequences and coarticulated continuous sequential actions can be produced from the same controller, without having to derive separate control policies for each sequential movement. Using modeling and behavioral experiments, the authors demonstrate that this is indeed possible if the constraints of the policy are appropriately specified. These results are of interest to those interested in motor sequences, but it is unclear whether these findings can be interpreted to apply to the control of sequences more broadly (see weaknesses below). 

      Strengths: 

      The authors provide an interesting and novel extension of the stochastic optimal control model to demonstrate how different temporal constraints can lead to either individual or coarticulated movements. The authors use this model to make predictions about patterns of behavior (e.g., in response to perturbations), which they then demonstrate in human participants both by measuring movement kinematics as well as EMG. Together this work supports the authors' primary claims regarding how changes in task instructions (i.e., task constraints) can result in coarticulated or separated movement sequences and the extent to which the subsequent movement goal affects the planning and control of the previous movement. 

      Weaknesses: 

      I reviewed a prior version of this manuscript, and appreciate the authors addressing many of my previous comments. However, there are some concerns, particularly with regard to how the authors interpret their findings. 

      We thank the reviewer for their continued assessment of our work and for helping us to improve the paper. We are convinced that this and the previous review helped us clarify our work considerably.

      (1) It would be helpful for the authors to discuss whether they think there is a fundamental distinction between a coarticulated sequence and a single movement passing through a via point (or equivalently, avoiding an obstacle). The notion of a coarticulated sequence brings with it the notion of sequential (sub)movements and temporal structure, whereas the latter can be treated as more of a constraint on the production of a single continuous movement. If I am interpreting the authors' findings correctly it seems they are suggesting that these are not truly different kinds of movements at the level of a control policy, but it would be helpful for the authors to clarify this claim. 

      Indeed, this is our interpretation of the results/simulations. This suggestion can also be found in the article on chunking by Ramkumar et al. To clarify this, we added a statement in the discussion as follows: 

      Line 449: 

      “Notably, in the framework of optimal feedback control, an intermediate goal is equivalent to a via-point that constrains the execution of the sequence (similar to (13)). It is thus possible that coarticulation in motor systems be processed similarly as other kinds of movement constraints, such as via-points, avoiding obstacles, or changes in control policies.”

      (2) The authors' model clearly shows that each subsequent target only influences the movement of one target back, but not earlier ones (page 7 lines 199-204). This stands in contrast to the paper they cite from Kashefi 2023, in which those authors clearly show that people account for at least 2 targets in the future when planning/executing the current movement. It would be useful to know whether this distinction arises because of a difference in experimental methodology, or because the model is not capturing something about human behavior.  

      Thank you for raising this point. There are some differences between the study of Kashefi and colleagues (2023), and ours. Both studies looked into planning of more than one reach. In the study of Kashefi et al., the results of Figure 6 showed that in H2 condition, there was no significant curvature, and the curvature increases in H3 and H4 conditions (only in the 75ms dwell-time scenario). Note that H2 condition in their work meant the presentation of +2 target after the initiation of +1 reach. Hence, we think the GO task in our case should be compared to the H3 condition, resulting in similar curvature as in our study. These authors also showed that curvature increased even in the H4 condition (75 ms dwell). OFC also accommodates this observation, if we consider the relationship between the cost of intermediate goals and spatial location of the targets (see figure below, also added to Supplementary Figure 4). To see this, we performed additional 3 target simulations where the constraint on intermediate goal velocity (at T1 and T2) was varied to achieve similar dwell velocity at the intermediate targets (Supplementary Figure 4C). In this case, the hand curvature of the first reach differed while the dwell velocity was similar across T3 up and T3 down conditions, as may be instructed experimentally. Again, the task instructions and the spatial location of the future goals together determine how much the first reach components are influenced by the next ones, and this may impact several reaches ahead. 

      We added the following clarification in the result to describe this. 

      Line 199:

      “It is worth noting that the OFC model can be generalized to longer sequences (10) through the incorporation of additional cost terms (in Equation 10 of Methods) and targets, enabling simultaneous planning for more than two targets. Simulations of a sample three-reach sequence (Supplementary Figure S4) revealed that, varying the cost of dwell velocity at intermediate targets (w2 and w3 parameters in Methods) caused a variation in control gains. Different amount of change in control gains can be expected for intermediate versus late targets (Supplementary Figure 4A). Notably, even when we used the same dwell velocity cost (w2 = w3 = 0), the observed velocity profiles were different between the two sequences towards different final targets (T3 up and T3 down) (Supplementary Figure 4B). We tested a condition in which both sequence reaches were forced to have similar dwell velocity profiles by increasing the dwell velocity costs in the sequence towards one of the targets (T3 down), while leaving this parameter unchanged for the other target (T3 up). In this scenario, T3 up sequence had the parameters (w2, w3) = (0, 0), while T3 down sequence had the parameters (0.8, 0.8). In this case, the curvature of the first reach was different, and predominantly occurred due to differences in K2 between the two sequence reaches (Supplementary Figure S4C). These simulations highlight that, planning for a longer horizon sequence can indirectly influence the curvature of early reaches, due to the interaction between intermediate dwell constraints, spatial arrangement of targets, and sequence horizon in a task dependent manner.”

      (3) In my prior review I raised a concern that the authors seem to be claiming that because they can use a single control policy for both coarticulated and separated movement sequences, there need not be any higher-level or explicit specification of whether the movements are sequential. While much of that language has been removed, it still appears in a few places (e.g., p. 13, lines 403-404). As previously noted, the authors' control policy can generate both types of movements as long as the proper constraints are provided to the model. However, these constraints must be specified somewhere (potentially explicitly, as the authors do by providing them as task instructions). Moreover, in typical sequence tasks, although some movements become coarticulated, people also tend to form chunks with distinct chunk boundaries, which presumably means that there is at least some specification of the sequential ordering of these chunks that must exist (otherwise the authors' model might suggest that people can coarticulate forever without needing to exhibit any chunk boundaries). Hence the authors should limit themselves to the narrow claim that a single control policy can lead to separated or coarticulated movements given an appropriate set of constraints, but acknowledge that their work cannot speak to where or how those constraints are specified in humans (i.e., that there could still be an explicit sequence representation guiding coarticulation). 

      We thank the reviewer for raising this point. We do not dispute the statement that the controller needs to be set dependent on the constraints of the task that must be specified somewhere. In our view, this problem is similar to the question of how a cost-function (or a task representation) is transformed into a control policy in the brain, which is unknown in general. In the earlier version, our intention was to stress that separation can occur without necessarily implying that the goals be processed independently (as in Figure 1A and Zimnik 2021). To avoid confusion on this point, we modified this statement in the new version as follows:

      Line 405: 

      “A straightforward interpretation could be that the stopping at the first target invoked a completely different strategy in which the control of the two reaches was performed independently (Figure 1A), effectively separating the two movements, whereas executing them rapidly could produce the merging of the two sub-movements into a coarticulated sequence. While this is conceptually valid, it is not necessary and the model provides a more nuanced view: both apparent separation or coarticulation of the two motor patterns can be explained within the same framework of flexible feedback control. These different modes of sequence execution still require proper specification of the task constraints in the model, such as number of intermediate steps, dwell-time, or velocity limit. Such specifications must be considered as input to the controller.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Line 57: Distinct hypotheses. 

      Line 209, The term "planned holistically" is confusing here. Seems like the authors suggest that the sequence is "planned holistically" as long as all sequence elements are given during the optimization process. 

      We changed the sentence as follows.

      Line 218: 

      “Overall, the model predicted that even if a feedback control policy was computed by optimizing the whole sequence over a long time-horizon, the requirements associated with intermediate goals determine how early in the sequence the second (future) target can influence the feedback controller”

      Line 336, It was not clear to me why the authors explained "the weak significant" results of PEC shortening in R0 given the nonsignificant values in R1. 

      We wanted to be transparent about whether changing the statistical analysis will lead to different interpretations, such as the sequence encoding even before long latency epochs. But we realized that it could lead to confusion and we deleted this sentence in the updated manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      About Weakness #2, to clarify this point the authors should either model and discuss what it would take for their model to account for multiple targets ahead, or else run a study to show that in this task people indeed only ever plan 1 target ahead.  

      Please see our response above (in Weakness #2).

      I am still puzzled by why people would resist the perturbation more when they eventually have to move in the direction of the perturbation (e.g., p 10 lines 313-314). Perhaps this is simply due to the geometry of the task, but it could also depend on what participants were trying to accomplish in the experiment. To help clarify this, the authors should report exactly what instructions were given to participants in each task condition.  

      The simulations suggest that the observed perturbation movements are an optimal way to perform the task given the task constraints on accuracy, control effort and constraints at intermediate goals. The intuition is that modulating the acceleration at the intermediate goal is preferred rather than missing it. This however depends on the cost parameter. 

      Below, in Author response image 1, we show the simulations obtained by varying the accuracy requirements at the intermediate goal and the total motor cost parameters. Clearly, as expected, increasing the cost on accuracy of the intermediate reach, or decreasing the cost on motor output, modulated the hand deviation (simulations not included in the article).

      Author response image 1.

      Impact of movement costs (motor effort and intermediate goal reach errors) on the hand path following a mechanical perturbation   

      Our observation suggests that participants’ behaviour agreed with the interpretation that can result from the model. We clarified the exact instructions in the methods section. Note that the instructions were given at the beginning of the task and did not differ across the different conditions involving changes in the location of T2 or perturbation direction:

      Line 594:

      Participants were given the following instructions verbally: “Wait in the starting circle until you receive a GO signal, where the target circles turn red and you will simultaneously hear a beep sound. When the circles turn red, react quickly, move as soon, and as straight as possible to target 1 and then move to target 2. You will get two points at the end of the trial if you reach T1 in the prescribed time window and then move to T2, and in all other cases you will not receive any points. Importantly, once you reach T1 you should try to come out of it quickly. If you stay in T1 for more than 150 ms then T2 will disappear and you will receive only one point. Additionally, in some trials, a force will perturb your hand towards the right or left direction randomly while moving towards T1. The instructions remain the same in the presence of perturbations. Try to score as many points as you can.”

      Additionally, we added the following lines in the results description:

      Line 284:

      “The influence of second target on the lateral hand deviation was qualitatively similar to that observed in model simulations, and counterintuitive to what we might expect without the help of the model simulations. As observed in the model simulations (see also Supplementary Figure S2), lateral hand deviation was smaller when the perturbation was in the direction of the second target (T2) and vice-versa. This was consistent for both rightward and leftward perturbation conditions. Both the model and humans expressed this strategy that can be seen as an emergent feature of efficient feedback control during production of movement sequences. Additionally, even though behavior was reproduced in simulations, changing the cost on control effort and/or accuracy of intermediate reaches could modulate the sequence-dependent changes in curvature.”

      I am not sure if "the data and code for simulations can be provided by the corresponding author" satisfies the eLife/PLoS software guidelines (i.e., that it be deposited in a public repository).

      Thank you for pointing this out. This sentence was added by mistake.

      We modified this statement in the updated manuscript. 

      “The data and code from simulations and experiments are available in the public repository ‘figshare’ at the following link (https://figshare.com/s/865a8b77c264ef17a181).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Recommendation 1: The authors reasoned upon the presence of a differential basal hydraulic stress in waves' valleys vs hills at first from the observation of "domes" formation upon 48h cultivation. I suggest performing a quantification to support the statement as a good scientific practice. Furthermore, it would strengthen the concept when the formation of domes was compared between the waves' dimensions as a different grade of cell extrusion was quantified. i.e., 50, 100, and 200 µm.

      Response 1: Upon seeing the phenomenon (Author response image 1 A), we performed a count for domes on the 100 µm and saw a significant effect. We refrained from including the results as it is the subject of ongoing research in our lab. In response to the reviewer’s suggestion, we have included a graph (Author response image 1 B) showing the increasing number of domes over 48 hours from three 100 µm wave samples.

      We have updated Figure 2A and B in the manuscript to include the new graph.

      Author response image 1.

      (A) shows dome (white arrows) over a 100 µm wave substrate. (B) is the number of accumulated domes in valley and hill regions, for 3 independent samples, over 48 hours.

      Recommendation 2: Using RICM microscopy to quantify the cell basal separation with the substrate and hydraulic stress is very clever. Nevertheless, I am in doubt if the different intensity reported for the hills vs valley (Fig. 2G and H) is a result of the signal reduction at deeper Z levels. Since there is no difference in extrusion and forces between valleys and hills in the 200 µm waves but only in 50µm and 100µm, I would add this to the quantification. I would expect no intensity difference from RICM for the 200 µm sample if this is not an artefact of imaging.

      Response 2: We performed additional experiments on blank wave substrates (both 100 and 200 µm) to ascertain the extent of reflection intensity drop (Author response image 2A). And, as correctly pointed out by Reviewer #1, there was a drop in intensity even without cells. On the 100 µm waves, hill reflections are on average ~27 % dimmer than valley reflections. Whereas, on the 200 µm waves, hill reflections are on average ~39 % dimmer.

      Using this information, we performed a calibration on the RICM results obtained from both the 100 and 200 µm waves (Author response image 3B). The calibrated 100 µm data showed residual signatures of difference, whereas the calibrated 200 µm distributions appeared very similar. We noticed large cross-sample variations in the registered intensities, which will negatively impact effect size if not accounted for. To account for this, we subsequently normalized both hill and valley intensities against planar region intensities for each sample. As shown by the final output (Author response image 3C), we were able to remove the skewness in the distributions. Moreover, 1-way ANOVA followed by a post hoc analysis with BH correction revealed a significant reduction in the 100 µm hill/flat intensity ratio compared to the 100 µm valley/flat intensity ratio (Δ~-23 %). Conversely, no significance was observed for the same comparison on the 200 µm waves.
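      The two-step correction described above can be sketched as follows. This is only an illustration: the response does not state the exact form of the calibration, so the sketch assumes the blank-substrate dimming is a multiplicative factor that can be divided out, and that normalization means dividing by the mean flat-region intensity of the same sample. The function names and all intensity values are hypothetical; only the ~27 % drop for 100 µm hills comes from the text.

```python
from statistics import mean

def calibrate_hill(raw_hill, blank_drop):
    """Undo the substrate-only dimming of hill reflections.

    Assumption: the drop measured on blank waves acts as a
    multiplicative factor (1 - blank_drop) on the hill signal.
    """
    return [v / (1.0 - blank_drop) for v in raw_hill]

def normalize_to_flat(values, flat_values):
    """Express intensities relative to the sample's mean flat-region intensity."""
    flat_mean = mean(flat_values)
    return [v / flat_mean for v in values]

# Hypothetical intensities for one 100 µm sample (arbitrary units).
blank_drop_100um = 0.27           # ~27 % dimmer hills on blank waves (from the text)
raw_hill = [73.0, 70.0, 76.0]     # made-up hill readings
flat = [100.0, 98.0, 102.0]       # made-up flat-region readings

calibrated = calibrate_hill(raw_hill, blank_drop_100um)
hill_over_flat = normalize_to_flat(calibrated, flat)
```

      With these made-up numbers the calibrated hill signal matches the flat signal, so the hill/flat ratio averages to 1; any residual departure from 1 in real data would then reflect the cells rather than the substrate geometry.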

      Author response image 2.

      (A). RICM from blank wave samples reveal a reduction in reflection intensity in hill regions compared to flat and valley regions.

      Author response image 3.

      (B) shows the RICM intensities after adjusting for the inherent reflection intensity drop shown in (A). (C) shows the RICM intensities after normalization against planar region signals; this removes cross-sample variations and improves the effect size of differences.

      We have updated the manuscript Figure 2I and text accordingly. The blank wave results are included in Figure 2-figure supplement 1 along with updated text and summary data table in Supplementary File 4.

      Recommendation 3: To measure 3D forces on top of the hills and valleys, the use of PAA gels is necessary. Since in Fig 3B, the authors show a difference in cell extrusion number between substrates and stiffnesses, I think it is necessary to confirm the presence of more extrusion in valleys vs hills on PAA gels. This would ensure the conclusion between normal forces and extrusion.

      Response 3: We do have time-lapse data with monolayers on the PAA waves. However, we felt results from the flat regions were sufficient in supporting the point being made in the text. Specifically, our original intention with PAA gels was to show that the extrusion reductions seen in osmotic perturbations were by virtue of removing basal stress and not some cryptic osmotic response. Hydrogels were chosen because they can effectively dilute basal solute concentration and thereby reduce the osmotically induced water transport. Moreover, as fluid could freely move within the gel, the fluid stress can quickly equilibrate across the basal surface. In contrast, poorly water/solute permeable substrates could lead to localized spikes in solute concentration and transient basal regions with high fluid stress.

      To get a sense of the potential difference in basal solute concentration between the two materials, we can do a quick hand-waving estimation. For monolayers on non-water/solute permeable PDMS of 20x20 mm and using the laser wavelength (640 nm) for RICM as an extreme estimate of basal separation, we should expect ~0.25 µl of total basal water content. On the other hand, we typically produce our PAM gel slabs using ~150 µl of precursor solutions. This means that, given similar amounts of solute, PAM gels will lead to monolayer basal osmolarity that is around 3 orders of magnitude lower than monolayers on PDMS, producing significantly lower osmotic potential. This implies from the outset that we should expect high survivability of cells on these substrates irrespective of curvature domains. Indeed, later immunoblotting experiments showed MDCKs exhibiting hyper activated FAK and Akt on PAM gels.
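      The estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (a 20x20 mm monolayer, a 640 nm basal gap, 150 µl of gel precursor); the variable and function names are illustrative, and treating the gel volume as the dilution volume is the same simplification the estimate itself makes.

```python
import math

side_mm = 20.0          # monolayer footprint: 20 x 20 mm (from the text)
gap_nm = 640.0          # upper bound on basal separation: RICM laser wavelength
gel_volume_ul = 150.0   # typical PAM precursor volume (from the text)

def basal_volume_ul(side_mm, gap_nm):
    """Volume of the basal fluid layer; 1 mm^3 equals 1 µl."""
    gap_mm = gap_nm * 1e-6          # 1 nm = 1e-6 mm
    return side_mm * side_mm * gap_mm

basal_ul = basal_volume_ul(side_mm, gap_nm)   # basal water under the monolayer on PDMS
dilution = gel_volume_ul / basal_ul           # how much more volume the gel offers
orders = math.log10(dilution)                 # the same ratio in orders of magnitude
```

      The basal volume comes out at about 0.26 µl and the volume ratio at roughly 600, a little under three orders of magnitude, in line with the figure quoted above.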

      In response to Reviewer #1’s suggestion then, we have added another supporting time-lapse (Video 19) showing typical response of MDCK monolayers on 100 µm PAA waves (Author response image 4). Evident from the time-lapses, like the planar regions, cell extrusions were very rare. This supports the idea that on PAM gels the effects of basal hydraulic stress and asymmetric forces are marginal against the strong survival signals. And the response is similar to hyper-osmotic perturbations; there, we did not see a significant difference between valley and hill extrusions.

      Author response image 4.

      Time-lapse snapshot showing negligible MDCK extrusions 24 hours after confluency over PAM gel wave substrates.

      Recommendation 4: Before proceeding with the FAK inhibitor experiment, the authors should better justify why the 4.1 wt % sucrose vs DMSO or NaCl is the most inert treatment. This can be done by citing relevant papers or showing time-lapses (as it is done for the higher FAKI14 dose).

      Response 4: Although some cells have recently been shown to be able to transport and utilize sucrose, mammalian cells generally cannot directly take up polysaccharides for metabolism and this is frequently mentioned in literature: see (Ref. R1) for example. Without special enzymes to break sucrose down into monosaccharides, such as sucrase found in the gut, the sugars should remain spectators in the culture medium, contributing only to osmotic effects.

      DMSO on the other hand, besides changing osmolarity, can also be integrated into cell membrane and pass through cells over time. It has been reported to chronically affect cell membrane properties and gene expressions (Ref. R2).

      Finally, it is well known that both sodium and chloride ions are readily taken up and transported by cells (Ref. R3). They help to regulate the transmembrane potential, which in turn can affect membrane bound proteins and biochemical reactions within a cell.

      Hence, comparing the 3 hyper-osmotic perturbations, adding sucrose should have the least off-target effects on both the inhibitor study and the subsequent immunoblotting. And, in response to the reviewer’s recommendation, we have updated the text accordingly and included new references to support our statement.

      Ref R1. H. Meyer, O. Vitavska, H. Wieczorek; Identification of an animal sucrose transporter. Journal of Cell Science 124, 1984–1991 (2011). Doi: 10.1242/jcs.082024

      Ref R2. B. Gironi, Z. Kahveci, B. McGill, B.-D. Lechner, S. Pagliara, J. Metz, A. Morresi, F. Palombo, P. Sassi, P. G. Petrov; Effect of DMSO on the Mechanical and Structural Properties of Model and Biological Membranes. Biophysical Journal 119, 274-286 (2020). Doi: 10.1016/j.bpj.2020.05.037

      Ref R3. X. Zhang, H. Li; Interplay between the electrostatic membrane potential and conformational changes in membrane proteins. Protein Science 28, 502-512 (2019). Doi: 10.1002/pro.3563

      Recommendation 5: The data showing a FAK-dependent phosphorylation of AKT responsible for a higher cell survival rate in the hills is not yet completely convincing. Please show a reduced AKT phosphorylation level after FAK inhibition in high osmolarity levels. Furthermore, the levels of AKT activation seem to increase slightly upon substrate softening independently of FAK activation or osmotic pressure (i.e., Fig. 4E, Soft PDMS). The authors should comment on this in connection with the results shown for PAA gels.

      Response 5: For the additional immunoblotting experiments, work is currently underway. We could not, however, complete these experiments in time for this revision, as both Cheng-Kuang and Xianbin will shortly be taking on new jobs elsewhere. David will continue with the immunoblotting studies and should be able to include the results in an update in the coming months. As for the apparent elevated levels of AKT seen on soft silicones, we speculate that it is because we cannot immunoblot cells that have died and were inevitably washed out at the start of the procedure. Inferring from the higher extrusion rates on these soft substrates, we could be missing a significant portion of the statistics. Specifically, we are missing all the cells that would have had lower AKT activation but died; had we been able to collect those statistics, perhaps both FAK and AKT would have shown lower levels. We risk introducing survivorship bias into the results if we read too much into the data as is.

      Alternatively, another explanation could be that, by virtue of survival of the fittest, we might have effectively selected a subpopulation of cells that were able to survive on lower FAK signals, or entirely independently of them.

      At any rate, to prove our foregoing hypothesis would require us to perform comprehensive immunoblotting and total transcriptome analysis over different duration conditions. Unfortunately, we do not have the time to do that for the current article, but it could be developed into a stand-alone molecular biology investigation in future. We have included similar discussion in the main text.

      Recommendation 6: In the discussion, the authors suggest the reported findings be especially relevant for epithelia that significantly separate compartments and regulate water and soluble transport. These are for example kidney epithelia (i.e., MDCK is the best experimental choice), retinal epithelium or intestinal epithelium. I would suggest that some proof-of-concept experiments could be done to support this concept. For example, I would expect keratinocytes (i.e., HaCaT) not to show a strong difference in extrusion rate between valleys and hills since the monolayer is not so sealed as kidney epithelium. In general, this kind of experiment would significantly strengthen the finding of this work.

      Response 6: As recommended, we tracked the behavior of retina pigment epithelial cells (hTERT RPE-1 from ATCC) which do not form tight monolayers like MDCKs (Ref. R4). We did not detect extrusion events occurring from monolayers of these cells (Author response image 5). This is true even for portions of monolayers over waved regions.

      Author response image 5.

      Time-lapse snapshot showing non-existent cell extrusions from RPE monolayers confluent for over 21 hours.

      We have updated these findings in the main text discussions and included a new supporting time-lapse (Video 15) in our article.

      Ref R4. F. Liu, T. Xu, S. Peng, R. A. Adelman, L. I. Rizzolo; Claudins regulate gene and protein expression of the retinal pigment epithelium independent of their association with tight junctions. Experimental Eye Research 198, 108157 (2020). Doi: 10.1016/j.exer.2020.108157

      Recommendation 7 (minor point): Figure S1 needs to have clear notes indicating in each step what is what. i.e., where is glass, PDMS, NOA73, etc? A more detailed caption will help the figure's comprehension. Also "Cy52" should be changed to "soft silicone" to be consistent with the text (or Cy52 should be mentioned in the text).

      Response 7 (minor point): Changes were made to Figure 1-figure supplement 1 to improve comprehension accordingly. CY52 was added to the main-text, next to the first appearance of the word soft silicone, to be consistent with the figures.

      Recommendation 8 (minor point): The authors often mentioned that epithelial monolayers are denser on PAA gels. Please add a reference(s) to this statement.

      Response 8 (minor point): The statement is an inference from visually comparing monolayers on PAM gels and PDMS. The difference is quite evident (Author response image 6). The density difference is in spite of the fact that the substrates share similar starting cell numbers.

      To address the reviewer’s comment, we have combined time-lapses of monolayers on silicones and PAM gels side-by-side in Video 17 to facilitate convenient comparisons.

      Author response image 6.

      Time-lapse snapshot at 24 hours after confluence, showing conspicuously higher density of MDCK monolayers on PAM gel compared to those on silicone elastomer.

      Reviewer #2

      Recommendation 1: The sinusoidal wavy substrate that the authors use in their investigation is interesting and relevant, but it is important to realize that this is a single-curved surface (also known as a developable surface). This means that the Gaussian curvature is zero and that monolayers need to undergo (almost) no stretching to conform to the curvature. The authors should at least discuss other curved surfaces as an option for future research, and highlight how the observations might change. Convex and concave hemispherical surfaces, for example, might induce stronger differences than observed on the sinusoidal substrates, due to potentially higher vertical resultant forces that the monolayer would experience. The authors could discuss this geometry aspect more in their manuscript and potentially link it to some other papers exploring cell-curvature interactions in more complex environments (e.g. non-zero Gaussian curvature).

      Response 1: In response to reviewer #2’s recommendation we have highlighted in the discussion of our text that our waves constitute a developable surface and that cells will experience little stretching for the most part. Based on our knowledge of how curvature can modulate forces and thus osmotic effects, we included some rudimentary analysis of what one would expect on hemispherical surfaces of two types: one that is periodic and contiguous (Ref. R5), and another with delineating flat regions (Ref. R6).

      For epithelial monolayers in the first scenario, and on poorly solute/water permeable substrates, we should also expect to see a relatively higher likelihood of extrusions from concave regions compared to convex ones. Moreover, as the surfaces are now curved in both principal directions (producing larger out-of-plane forces), we should see the onset of differential extrusions seen in this study, but at larger length scales. For example, the effects seen on 100 µm hemicylindrical waves might now happen at larger feature size for hemispherical waves. Furthermore, as this kind of surface would invariably contain hyperbolic regions (saddle points), we might expect an intermediate response from these locations. If the forces in both principal directions offset each other, the extrusion response may parallel planar regions. On the other hand, if one dominates over the other, we may see extrusion responses tending to the dominating curvature (concave or convex).

      On the other hand, on curved landscapes with discrete convex or concave regions, we should expect, within the curved surface, extrusion behaviors paralleling findings in this study. What would be interesting would be to see what happens at the rims (or skirt regions) of the features. At these locations we effectively have hyperbolically curved surfaces, and like before, we should expect some sort of competing effect between the forces generated from the principal directions. So, for dome skirts, we should see fewer extrusions when the domes are small, and vice versa, when they are larger. Meanwhile, for pit rims, we should see a reversed behavior. It should also be noted that the transitioning curvature between convex/concave and planar regions would also modulate the effect.

      These effects might have interesting developmental implications. For instance, in developing pillar-like tissue structures (e.g., villi), the strong curvatures of nascent lumps would favor accumulation of cell numbers. However, once the size of the lumps reaches some critical value, epithelial cell extrusions might begin to appear at the roots of the developing structures, offsetting cell division, and eventually halting growth.

      Ref R5. L. Pieuchot, J. Marteau, A. Guignandon, T. Dos Santos, I. Brigaud, P. Chauvy, T. Cloatre, A. Ponche, T. Petithory, P. Rougerie, M. Vassaux, J. Milan, N. T. Wakhloo, A. Spangenberg, M. Bigerelle, K. Anselme, Curvotaxis directs cell migration through cell-scale curvature landscapes. Nature Communications 9, 3995 (2018). Doi: 10.1038/s41467-018-06494-6

      Ref R6. M. Werner, S. B.G. Blanquer, S. P. Haimi, G. Korus, J. W. C. Dunlop, G. N. Duda, D. W. Grijpma, A. Petersen, Surface curvature differentially regulates stem cell migration and differentiation via altered attachment morphology and nuclear deformation. Advanced Science 4, 1–11 (2017). Doi: 10.1002/advs.201600347

      Recommendation 2: The discussion of the experiments on PAM gels is rather limited. The authors describe that cells on the PAM gels experience fewer extrusions than on the PDMS substrates, but this is not discussed in sufficient detail (e.g. why is this the case). Additionally, the description of the 3D traction force microscopy and its validation is quite limited and should be extended to provide more convincing evidence that the measured force differences are not an artefact of the undulations of the surface.

      Response 2: We first saw a significant reduction in cell extrusions when we performed hyper-osmotic perturbations, and to eliminate possible off-target effects of the compounds used to increase osmolarity, we used three different compounds to be sure. In spite of this, we felt it would further support our argument, that basal accumulation of fluid stress was responsible for the extrusions, if we had some other independent means of removing fluid stress without directly tuning osmolarity through addition of extraneous solutes. We hence thought of culturing MDCK monolayers on hydrogels.

      Hydrogels were chosen because they can effectively dilute basal solute concentration (for reference ions (Na+) are continuously pumped out basally by the monolayer) and thereby reduce the associated osmotically induced water transport. Moreover, as fluid could freely move within the gel, the fluid stress can quickly equilibrate across the basal surface. In contrast, poorly water/solute permeable substrates will lead to localized spikes in solute concentration and transient basal regions with high fluid stress.

      To get a sense of the extent of difference in basal solute concentration between the two materials, we can do a quick hand-waving estimation. For monolayers on non-water-permeable PDMS of 20x20 mm, and using the laser wavelength (640 nm) for RICM as an extreme estimate of basal separation, we should expect ~0.25 µl of total basal water content. On the other hand, we typically produce our PAM gel slabs using ~150 µl of precursor solutions. This means that, given similar amounts of solute, PAM gels will lead to monolayer basal osmolarity that is around 3 orders of magnitude lower than monolayers on PDMS, producing significantly lower osmotic potential. This implies from the outset that we should expect high survivability of cells on these substrates. Indeed, later immunoblotting experiments showed MDCKs exhibiting hyper activated FAK and Akt on PAM gels.

      As for the 3D TFM used in this study, it is actually implemented from a well-established finite element method to solve inverse problems in engineering and has been repeatedly validated in larger scale engineering contexts (Ref. R7). The novelty and contribution of our article is in its adaptation to reconstruct cellular forces at microscopic scales.

      In brief, soft materials, such as hydrogels used in our case, are doped with fluorescent particles, coated with ECM, and then seeded with cells. The cells would exert forces that deform the soft substrate, thereby displacing the fluorescent particles from their equilibrium positions. This particle displacement can be extracted by producing an image pair with microscopy; first one with the cells, and subsequent one of relaxed gel after removal of cells with acutely cytotoxic reagents, such as SDS. There are several ways in which the displacement field can be extracted from the image pair. These include particle tracking velocimetry, particle image velocimetry, digital volume correlation, and optical flow.

      We employed 3D Farneback optical flow in our study for its superior computational performance. The method was validated using synthetically generated images from Sample 14 of the Society for Experimental Mechanics DIC challenge. The accuracy of the calculated displacements using the 3D Farneback optical flow was then compared to the provided ground truth displacements. For the highest frequency displacement image pairs, an x-component root-mean-square-error (RMSE) value of 0.0113 was observed. This was lower than the 0.0141 RMSE value for the Augmented Lagrangian Digital Volume Correlation method. This suggested that the 3D Farneback optical flow is capable of accurately calculating the displacement between two bead images.
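      The error metric used in this comparison is the standard root-mean-square error between estimated and ground-truth displacement components. A minimal sketch of that metric is given below; the function name and the flat-list data layout are illustrative assumptions, not the authors' implementation.

```python
import math

def rmse(estimated, truth):
    """Root-mean-square error between two equally long sequences of values,
    e.g. the x-components of estimated and ground-truth displacements."""
    if len(estimated) != len(truth):
        raise ValueError("sequences must have the same length")
    if not truth:
        raise ValueError("sequences must not be empty")
    squared_error = sum((e - t) ** 2 for e, t in zip(estimated, truth))
    return math.sqrt(squared_error / len(truth))

# Toy usage with made-up x-displacements.
example_error = rmse([0.10, 0.20, 0.30], [0.11, 0.19, 0.30])
```

      A lower value means the estimated displacement field lies closer to the ground truth, which is the sense in which the two methods are compared above.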

      The displacement fields are then fed into a finite element suite (ANSYS in our case) along with the model and mesh of the underlying substrate structure to obtain node-specific displacements. This is required because mesh nodes do not typically align with voxel positions of displacements. With these node-specific displacements, we subsequently solve the inverse problem for the forces using Tikhonov regularization (Ref. R8). The outcome is a vector of node-specific forces.

      In light of the above, to physically validate the method in our context would require the generation of a known ground truth force on the scale of pico- to nano-newtons and subsequently image the particle displacements from this force using confocal microscopy. The force must then be released in situ in order for the relaxed gel to be imaged again. This is not a straightforward feat at this scale, and a method that immediately springs to mind is magnetic tweezers. Unfortunately, this is a tool that we cannot develop within reasonable timeframes, as the method will have to be seamlessly integrated with our spinning-disk confocal. However, as a compromise, we have included an in-silico validation with our revised manuscript.

      Specifically, given a finite element model with a predefined curvature, a known force was applied to the surface of the model (Author response image 7A). The resulting displacements were then calculated from the finite element solution, and 10% random noise was added to them. The traction force recovery (Author response image 7B) was then performed using the noisy in-silico displacements. To evaluate the accuracy of the recovery, the cosine similarity along with the mean norm of the force vectors were calculated. A value closer to 1 for both evaluation metrics indicates a more accurate reconstruction of the simulated traction force. The cosine similarity of the recovered traction forces to the original applied force was 0.977±0.056 while the norm of the recovered traction forces as a proportion of the original applied force was 1.016±0.165. As both values are close to 1 (i.e., identical), this suggested that the traction forces could be satisfactorily recovered using the finite-element based method.
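      The two evaluation metrics named above can be sketched for a single pair of force vectors as follows. The function names are illustrative, and computing both metrics per node before averaging over nodes is an assumption about how the reported mean and spread were obtained.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(recovered, applied):
    """1.0 when the recovered force points exactly along the applied force."""
    dot = sum(r * a for r, a in zip(recovered, applied))
    return dot / (norm(recovered) * norm(applied))

def norm_ratio(recovered, applied):
    """1.0 when the recovered force has the same magnitude as the applied force."""
    return norm(recovered) / norm(applied)

# Made-up example: a recovered force slightly tilted and slightly too large.
applied = (0.0, 0.0, -1.0)
recovered = (0.1, 0.0, -1.02)
direction_score = cosine_similarity(recovered, applied)
magnitude_score = norm_ratio(recovered, applied)
```

      The cosine similarity captures only the direction error and the norm ratio only the magnitude error, which is why both need to be close to 1 for a faithful reconstruction.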

      In response to the reviewer’s recommendations then, additional content has been included in the main text to explain the use of PAM gels and the workings of our 3D TFM pipeline.

      Ref R7. James F. Doyle, Modern Experimental Stress Analysis: Completing the Solution of Partially Specified Problems (John Wiley & Sons, Chichester, 2004).

      Ref R8. Per Christian Hansen, Discrete Inverse Problems: Insight and Algorithms (SIAM, Philadelphia, 2010).

      Author response image 7.

      (A) shows simulated force field to generate simulated displacements. (B) shows force field reconstructed from simulated displacements with noise.

      Recommendation 3: The authors show nuclear deformation on the hills and use this as evidence for a resultant downward-pointing force vector. This has, indeed, also been observed in other works referenced by the authors (e.g. Werner et al.), and could be interesting evidence to support the current observations, provided the authors also show a nuclear shape on the concave and flat regions. The authors could potentially also characterize this shape change better using higher-resolution data.

      Response 3: We characterized nucleus deformation using Hoechst-stained samples as per recommendation. The deformation is estimated by dividing segmented nuclei volumes by best-fit ellipsoid volumes of the same objects. In this way, objects exhibiting minimal bending will lead to values close to 1.0. The obtained graph is shown in Author response image 8B (and manuscript Figure 3D).

      Author response image 8.

      (A) an example of deformed nuclei on 50 µm wave hill region. (B) a Violin plot of calculated nuclear deformations across dimensions and features using segmented volume normalized against best-fit ellipsoid volume.

      Our quantifications show a statistically significant difference in nuclei deformation measure medians between hill and valley cells on the 50 µm (0.973 vs 0.982) and 100 µm (0.971 vs 0.979) waves; this indicates that cells on the hills tend to have more deformed nuclei compared to cells in the valleys. Meanwhile, no significant difference was found for a similar comparison on 200 µm (0.978 vs 0.978) samples. For reference, the median found for cells pooled from planar regions was 0.975.

      In response to the reviewer’s suggestions Figure 3 of our manuscript has been updated to include the new results on nuclei deformation. The text has also been updated to account for the new information to support our claims. The statistics are included in a new summary data table in Supplementary File 6.

      Recommendation 4: The U-net for extrusion detection is a central tool used within this study, though the explanation and particularly validation of the tool are somewhat lacking. More clarity in the explanation and more examples of good (or bad) detections would help establish this tool as a more robust component of the data collection (on all geometries).

      Response 4: The architecture of the neural network used in this study is outlined in supplementary figure S5a. To validate the performance of the model, a test dataset consisting of 200 positive examples and 100 negative examples was fed into the network and the resulting predictions were obtained from the model. The confusion matrix of the model is shown in supplementary figure S5c. The weighted precision and recall of the model are 0.958 and 0.953 respectively.

      Additionally, we have included examples of false positive and false negative detections in Figure 1-figure supplement 5 (Author response image 9). For false positive detections, these were typically observed to be extrusions that were labelled to have occurred the frame prior to the frame of interest (Author response image 9 bottom sequence). However, as the extrusion process is incomplete in the prior frame, there are still changes in the extruded cell body and the network falsely predicts this as a detection.
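      For readers unfamiliar with weighted metrics, the sketch below shows one common way to derive them from a confusion matrix: per-class precision and recall averaged with weights proportional to each class's number of true examples. Whether the authors used exactly this weighting is an assumption, and the counts are hypothetical; they only respect the stated 200 positive and 100 negative test examples and are not the values in supplementary figure S5c.

```python
def weighted_precision_recall(cm):
    """Support-weighted precision and recall.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    precision = 0.0
    recall = 0.0
    for k in range(n_classes):
        support = sum(cm[k])                              # true members of class k
        predicted = sum(cm[i][k] for i in range(n_classes))  # samples predicted as k
        true_pos = cm[k][k]
        p_k = true_pos / predicted if predicted else 0.0
        r_k = true_pos / support if support else 0.0
        weight = support / total
        precision += weight * p_k
        recall += weight * r_k
    return precision, recall

# Hypothetical counts. Row 0: true negatives (100), row 1: true positives (200).
hypothetical_cm = [[90, 10],
                   [5, 195]]
w_precision, w_recall = weighted_precision_recall(hypothetical_cm)
```

      With this weighting, the weighted recall is identical to overall accuracy (here 285 of 300 samples), while the weighted precision also accounts for how often each predicted label is correct.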

      Author response image 9.

      Examples of false negative and false positive extrusion registrations.

      Recommendation 5: The authors study the involvement of FAK in the observed curvature-dependent and hydraulic stress-dependent spatial regulation of cell extrusion. In one of the experiments, the authors supplement the cell medium with FAK inhibitors, though only in a hyper-osmotic medium. They show that FAK inhibition counteracts the extrusion-suppressing effect of a hyper-osmotic medium. However, no data is shown on the effect of FAK inhibitors within the control medium. Would the extrusion rates be even higher then?

      Response 5: We proceeded, as suggested by the reviewer, to explore the effects of the FAK inhibitor on MDCK monolayers in our control medium. The results revealed that, at the 3 µM FAK inhibitor concentration, where cells in sucrose media showed an elevated extrusion rate, monolayers in control medium quickly suffered massive cell death (Author response image 10) similar to what was seen when 6 µM FAK inhibitor was introduced to sucrose medium.

      This finding suggests that osmolarity protects against FAK inhibitors in a dose dependent manner. Moreover, as cell extrusions require an intact monolayer, its rates cannot increase indefinitely: a point will be reached where an intact monolayer can no longer be maintained.

      We have updated the main text of our article to mention this observation, and also included a new time-lapse (Video 22) to demonstrate the effect.

      Author response image 10.

      Time-lapse snapshot of MDCK monolayers over waves 4 hours after inclusion of focal adhesion kinase inhibitor.

      Recommendation 6: The supplementary videos show two fields of view next to each other, which is not immediately clear to the viewer. I strongly advise the authors to add a clear border between the two panels, so that it is clear that the cells from one panel are not migrating into the next panel.

      Response 6: A distinctive border has been added to the movies to separate panels showing different focal planes of the same stack.

      Recommendation 7: The general quality and layout of the figures could be improved. Some figures would benefit from higher-resolution or larger cell images (e.g. Figure 2A, C, D), and the organisation of subpanels could be improved (e.g. especially in Figure 2). The box plots and bar graphs are also not consistent throughout the manuscript in terms of colouring and style, which should be improved.

      Response 7: We have enlarged the figures in question accordingly, at the cost of reducing some information. However, the full scope of the sub-figures remains accessible in the supplementary movies. We have also tried to change the placement of the panels to improve readability. We have also adjusted the valley, hill, and flat coloring scheme for the extrusion boxplots in Figures 1 and 2 to make them consistent.

      Recommendation 8: The graphs in Figures 3E and F are confusing and difficult to interpret. The x-axis states "Position along curve in radians" but it is unclear how to relate this to the position on the wavy substrate. The graphs also have a second vertical axis on the right ("valley-interface-hill"), which adds to the confusion. I would recommend the authors provide more explanation and consider a different approach of plotting this.

      Response 8: We have removed the confusing plot of cross-sectional profile from the force graphs. To indicate positions on the waves, we have augmented radian values with Hill, Interface, and Valley accordingly.

      Recommendation 9: Specify which silicone was used for the low-stiffness silicone substrates in the methods and in the main text.

      Response 9: CY52 has been added to the main text, next to the first appearance of the term "soft silicone", to be consistent with the figures.

      Recommendation 10: The flow lines that are plotted over the RICM data make it difficult to see the underlying RICM images. I would advise to also show the RICM images without the flow lines.

      Response 10: The original movie S15 (now Video 16) showing the RICM overlapped with optical flow paths has now been replaced by a movie showing the same, but with the flow paths and RICM in separate panels.

      Recommendation 11: In the first paragraph of the discussion, the authors write: "And this difference was both dependent on the sense (positive or negative)...". This is superfluous since the authors already mentioned earlier in the paragraph that the convex and concave regions (i.e. different signs of curvature) show differences in extrusion rates.

      Response 11: The sentence has been changed to “And this difference was also dependent on the degree of curvature.”

      Recommendation 12: In the second paragraph of the discussion, the authors mention that "basal fluid spaces under monolayers in hill regions were found consistently smaller than those in valley regions". Is this data shown in the figures of the manuscript? If so, a reference should be made because it was unclear to me.

      Response 12: This statement is an inference from the comparison of the hill and valley RICM grey values. Specifically, by virtue of the physics underlying the effect, RICM intensities are direct surrogates for basal separations, which must be fluid-filled spaces since there cannot be a vacuum. To be more precise, “inferred from RICM intensity differences (Figure 2I)” has been added to support the statement.

      Recommendation 13: On page 7 of the discussion, the authors talk about positively and negatively curved surfaces. This type of description should be avoided, as this depends on the definition of the surface normal (i.e. is positive convex or concave?). Rather use convex and concave in this context.

      Response 13: The wording has been changed accordingly.

      Recommendation 14: The label of Table 8 reads "Table 2".

      Response 14: The error has been corrected.

      Reviewer #3

      Recommendation 1: The central finding seems to be opposite to an earlier report (J Cell Sci (2019) 132, jcs222372), where MDCK cells in curved alginate tubes exhibit increased extrusion on a convex surface. I suggest that you comment on possible explanations for the different behaviors.

      Response 1: The article in question primarily reported the phenomenon of MDCK and J3B1A monolayers detaching from the concave alginate tube walls coated with Matrigel. The authors attributed this to the curvature-induced out-of-plane forces towards the center of the tubes. Up to this point, the findings and interpretation are consistent with our current study, where we also find a similar force trend in concave regions.

      To further lend support to the importance of curvature in inducing detachment, the authors cleverly bent the tubes to introduce asymmetry in curvature between outer and inner surfaces. Specifically, the outside bend is concave in both principal directions, whereas the inside bend is convex in one of its principal directions. As expected, the authors found that detachment rates from the outer surface were much larger compared to the inner one. Again, the observations and interpretations are consistent with our own findings; the convex direction will generate out-of-plane forces pointing into the surface, serving to stabilize the monolayer against the substrate. It should be noted, however, that since the inner side of the tube is characterized by both convex and concave curvatures in its two principal directions, the behavior of the overlying monolayer will depend on which of the two resulting forces becomes dominant. So, for gradual bends, one should expect the monolayers to still be able to detach from the inner tube surface. This is what was reported in their findings.

      For their extrusion observations, I am surprised. Because their whole material (hydrogels) is presumably both solute and water permeable, I would be more inclined to expect very few extrusions irrespective of curvature. This is indeed the case with our study of MDCKs on PAM hydrogels, where the hydrogel substrate effectively buffers against the quick build-up of solute concentration and basal hydraulic stress. Without the latter, concave monolayer forces alone are unlikely to be able to disrupt cell focal adhesions. Indeed, the detachments seen in their study are more likely due to exfoliation of the Matrigel than to cells being pulled off the Matrigel matrix entirely.

      My guess is that the extrusions seen in their study are due solely to the canonical crowding effect. If this were the case, then the detached monolayer on the outside bend could buffer against crowding pressure by buckling. Meanwhile, the monolayer on the inside bend, being attached to the surface, can only regulate crowding pressure by removing cells through extrusions. This phenomenon should be particular to soft matrices such as Matrigel. Using stiffer and covalently bonded ECM should be sufficient to prevent monolayers from detaching, leading to similar extrusion behaviors. In response to the reviewer’s recommendation, we have included a short paragraph to state the points discussed in this response.

      Recommendation 2: Fig 3E, F: The quantities displayed on the panels are not forces, but have units of pressure (or stress).

      Response 2: We have changed “force” to “stress” according to the reviewer’s suggestion. We kept the term “force” in the original text because we were reconstructing forces. Due to discretization, the resulting forces are inevitably assigned to element nodes. In between the nodes, on the faces, there is no information. So, in order to have some form of continuity to plot, the face forces are obtained by averaging the 4 nodes around the element face. Unfortunately, element face areas are not typically of the same size; therefore, the average forces obtained need to be further normalized by the face area, leading to a quantity that has units of stress.
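
      The node-to-face conversion described in this response can be sketched as follows. This is only an illustration under stated assumptions, not the authors' reconstruction code: the function names (`quad_area`, `face_stress`), the data layout (lists of 3D coordinates and force vectors indexed by node), and the assumption that each face is a planar 4-node quadrilateral are all hypothetical.

```python
import math


def quad_area(p0, p1, p2, p3):
    """Area of a planar quadrilateral, computed as 0.5 * |d1 x d2| from its diagonals."""
    d1 = [p2[i] - p0[i] for i in range(3)]
    d2 = [p3[i] - p1[i] for i in range(3)]
    cross = (
        d1[1] * d2[2] - d1[2] * d2[1],
        d1[2] * d2[0] - d1[0] * d2[2],
        d1[0] * d2[1] - d1[1] * d2[0],
    )
    return 0.5 * math.sqrt(sum(c * c for c in cross))


def face_stress(node_xyz, node_force, face):
    """Average the force vectors of the 4 nodes of one face, then divide by the face area.

    node_xyz and node_force are hypothetical containers indexed by node id;
    face is a sequence of 4 node ids listed in order around the face.
    """
    pts = [node_xyz[n] for n in face]
    mean_force = [sum(node_force[n][i] for n in face) / 4.0 for i in range(3)]
    area = quad_area(*pts)
    return [f / area for f in mean_force]


# Toy example: a 2 x 1 rectangular face with identical nodal forces.
xyz = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
forces = [(0.0, 0.0, 4.0)] * 4
stress = face_stress(xyz, forces, (0, 1, 2, 3))
```

      With forces in nN and lengths in µm, the resulting quantity is in nN/µm², which equals kPa. This is why the plotted values carry units of stress rather than force.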

      Recommendation 3: Fig 2D: Asterisks are hard to see.

      Response 3: The color of the asterisks has been changed to green for better clarity against a B&W background.

      Recommendation 4: p 19, l 7: Word missing in "the of molding"

      Response 4: The typo has been amended to “the molding of”.

    1. Author Response

      We thank you for the time you took to review our work and for your feedback!

      The major changes to the manuscript are:

      1. We have extended the range of locomotion velocity over which we compare its dependence with cholinergic activity in Figures 2E and S2H.

      2. We have quantified the contributions of cholinergic stimulation on multiplicative and additive gains on visual responses (Figure S7).

      3. We have provided single cell examples for the change in latency to visual response (Figure S12).

      4. We have added an analysis to compare layer 2/3 and layer 5 locomotion onset responses as a function of visuomotor condition (Figure S8).

      A detailed point-by-point response to all reviewer concerns is provided below.  

      Reviewer #1 (Public Review):

      The paper submitted by Yogesh and Keller explores the role of cholinergic input from the basal forebrain (BF) in the mouse primary visual cortex (V1). The study aims to understand the signals conveyed by BF cholinergic axons in the visual cortex, their impact on neurons in different cortical layers, and their computational significance in cortical visual processing. The authors employed two-photon calcium imaging to directly monitor cholinergic input from BF axons expressing GCaMP6 in mice running through a virtual corridor, revealing a strong correlation between BF axonal activity and locomotion. This persistent activation during locomotion suggests that BF input provides a binary locomotion state signal. To elucidate the impact of cholinergic input on cortical activity, the authors conducted optogenetic and chemogenetic manipulations, with a specific focus on L2/3 and L5 neurons. They found that cholinergic input modulates the responses of L5 neurons to visual stimuli and visuomotor mismatch, while not significantly affecting L2/3 neurons. Moreover, the study demonstrates that BF cholinergic input leads to decorrelation in the activity patterns of L2/3 and L5 neurons.

      This topic has garnered significant attention in the field, drawing the interest of many researchers actively investigating the role of BF cholinergic input in cortical activity and sensory processing. The experiments and analyses were thoughtfully designed and conducted with rigorous standards, leading to convincing results which align well with findings in previous studies. In other words, some of the main findings, such as the correlation between cholinergic input and locomotor activity and the effects of cholinergic input on V1 cortical activity, have been previously demonstrated by other labs (Goard and Dan, 2009; Pinto et al., 2013; Reimer et al., 2016). However, the study by Yogesh and Keller stands out by combining cutting-edge calcium imaging and optogenetics to provide compelling evidence of layer-specific differences in the impact of cholinergic input on neuronal responses to bottom-up (visual stimuli) and top-down inputs (visuomotor mismatch).

      We thank the reviewer for their feedback.

      Reviewer #2 (Public Review):

      The manuscript investigates the function of basal forebrain cholinergic axons in mouse primary visual cortex (V1) during locomotion using two-photon calcium imaging in head-fixed mice. Cholinergic modulation has previously been proposed to mediate the effects of locomotion on V1 responses. The manuscript concludes that the activity of basal forebrain cholinergic axons in visual cortex provides a signal which is more correlated with binary locomotion state than locomotion velocity of the animal. Cholinergic axons did not seem to respond to grating stimuli or visuomotor prediction error. Optogenetic stimulation of these axons increased the amplitude of responses to visual stimuli and decreased the response latency of layer 5 excitatory neurons, but not layer 2/3 neurons. Moreover, optogenetic or chemogenetic stimulation of cholinergic inputs reduced pairwise correlation of neuronal responses. These results provide insight into the role of cholinergic modulation to visual cortex and demonstrate that it affects different layers of visual cortex in a distinct manner. The experiments are well executed and the data appear to be of high quality. However, further analyses are required to fully support several of the study's conclusions.

      We thank the reviewer for their feedback.

      1) In experiments analysing the activity of V1 neurons, GCaMP6f was expressed using a ubiquitous Ef1a promoter, which is active in all neuronal cell types as well as potentially non-neuronal cells. The manuscript specifically refers to responses of excitatory neurons but it is unclear how excitatory neuron somata were identified and distinguished from that of inhibitory neurons or other cell types.

      This might be a misunderstanding. The Ef1α promoter has been reported to drive highly specific expression in neurons (Tsuchiya et al., 2002) with 99.7% of labeled cells in layer 2/3 of rat cortex being NeuN+ (a neuronal marker), and only 0.3% of labeled cells being GFAP+ (a glial marker) (Yaguchi et al., 2013). This bias was even stronger in layer 5 with 100% of labeled cells being NeuN+ and none GFAP+ (Yaguchi et al., 2013). The Ef1α promoter in an AAV vector, as we use it here, also biases expression to excitatory neurons. In layer 2/3 of mouse visual cortex, we have found that 96.8% ± 0.7% of labeled neurons are excitatory three weeks after viral injection (Attinger et al., 2017). Similar results have also been found in rats (Yaguchi et al., 2013), where, when GFP was expressed under the Ef1α promoter delivered using lentivirus, 95.2% of labeled neurons in layer 2/3 and 94.1% in layer 5 were excitatory. These numbers are comparable to the ones obtained with promoters commonly used to target expression to excitatory neurons. For this purpose, two variants of promoters based on the transcription start region of the CaMKIIα gene have typically been used. The first, the CaMKIIα-0.4 promoter, results in 95% excitatory specificity (Scheyltjens et al., 2015). The second, the CaMKIIα-1.3 promoter, results in only 82% excitatory specificity (Scheyltjens et al., 2015), and is thus not far from chance. We have clarified this in the manuscript. Nevertheless, we have removed the qualifier “excitatory” when talking about neurons in most instances, throughout the manuscript.

      2) The manuscript concludes that cholinergic axons convey a binary locomotion signal and are not tuned to running speed. The average running velocity of mice in this study is very slow - slower than 15 cm/s in the example trace in Figure 1D and speeds <6 cm/s were quantified in Figure 2E. However, mice can run at much faster speeds both under head-fixed and freely moving conditions (see e.g. Jordan and Keller, 2020, where example running speeds are ~35 cm/s). Given that the data in the present manuscript cover such a narrow range of running speeds, it is not possible to determine whether cholinergic axons are tuned to running speed or convey a binary locomotion signal.

      Our previous analysis window of 0-6.25 cm/s covered approximately 80% of all data. We have increased the analysis window to 0-35 cm/s that now covers more than 99% of the data (see below). Also, note that very high running speeds are probably overrepresented in the Jordan and Keller 2020 paper as mice had to be trained to run reliably before all experiments given the relatively short holding times of the intracellular recordings. The running speeds in our current dataset are comparable to other datasets we have acquired in similar experiments.

      Figure 2E has now been updated to reflect the larger range of data. Please note, as the number of mice that contribute to the data now differs as a function of velocity (some mice run faster than others), we have now switched to a variant of the plot based on hierarchical bootstrap sampling (see Methods). This does not overtly change the appearance of the plot. See Author response image 1 for a comparison of the original plot, the extended range without bootstrap sampling, and the extended range with bootstrap sampling currently used in the paper.

      Author response image 1.

      Average activity of cholinergic axons as a function of locomotion velocity. (A) As in the previous version of the manuscript. (B) As in A, but with the extended velocity range. (C) As in B, but using hierarchical bootstrap sampling to estimate median (red dots) and 95% confidence interval (shading) for each velocity bin.
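
      For readers unfamiliar with the approach named in panel C, a hierarchical bootstrap for one velocity bin can be sketched roughly as below. This is a minimal illustration, not the authors' implementation (which is described in their Methods): it assumes only two levels (mice, then samples within each mouse), and the function name, the `data_by_mouse` layout, and the default parameters are hypothetical.

```python
import random
import statistics


def hierarchical_bootstrap_median(data_by_mouse, n_boot=1000, seed=0):
    """Estimate the median and a 95% interval for one velocity bin.

    data_by_mouse maps a mouse id to a non-empty list of activity values
    that fell into this bin (hypothetical layout). Each bootstrap iteration
    resamples mice with replacement, then resamples values within each
    selected mouse with replacement.
    """
    rng = random.Random(seed)
    mice = list(data_by_mouse)
    medians = []
    for _ in range(n_boot):
        sample = []
        for mouse in rng.choices(mice, k=len(mice)):
            values = data_by_mouse[mouse]
            sample.extend(rng.choices(values, k=len(values)))
        medians.append(statistics.median(sample))
    medians.sort()
    lower = medians[int(0.025 * (n_boot - 1))]
    upper = medians[int(0.975 * (n_boot - 1))]
    return statistics.median(medians), lower, upper


# Toy example with made-up values for three mice in a single bin.
toy_bin = {"m1": [0.1, 0.2, 0.3], "m2": [0.2, 0.4], "m3": [0.0, 0.1, 0.1, 0.5]}
med, lower, upper = hierarchical_bootstrap_median(toy_bin, n_boot=200)
```

      Resampling at the level of mice first means that bins to which only a few fast-running mice contribute get wider intervals, which is the situation created by extending the velocity range.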

      3) The analyses in Figure 4 only consider the average response to all grating orientations and directions. Without further analysing responses to individual grating directions it is unclear how stimulation of cholinergic inputs affects visual responses. Previous work (e.g. Dadarlat and Stryker, 2017) has shown that locomotion can have both additive and multiplicative effects and it would be valuable to determine the type of modulation provided by cholinergic stimulation.

      We thank the reviewer for this suggestion. To address this, we quantified how cholinergic stimulation influenced the orientation tuning of V1 neurons. The stimuli we used were full-field sinusoidal drifting gratings of 4 different orientations (2 directions each). For each neuron, we identified the preferred orientation and plotted responses relative to this preferred orientation as a function of whether the mouse was running or we were stimulating cholinergic axons. Consistent with previous work, we found a mixture of multiplicative and additive components during running. With cholinergic axon stimulation, the multiplicative effect was stronger than the additive effect. This is now quantified in Figure S7.
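
      One common way to separate multiplicative from additive modulation, shown here only as a sketch and not necessarily the analysis used for Figure S7, is to regress the modulated tuning curve on the baseline tuning curve: the slope estimates the multiplicative gain and the intercept the additive offset. The function name and the toy numbers below are hypothetical.

```python
def fit_gain(baseline, modulated):
    """Least-squares fit of modulated = gain * baseline + offset.

    baseline and modulated are responses of one neuron to the same set of
    orientations (aligned to the preferred orientation), without and with
    the modulation (running or cholinergic stimulation).
    """
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(modulated) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, modulated))
    gain = sxy / sxx            # multiplicative component (slope)
    offset = mean_y - gain * mean_x  # additive component (intercept)
    return gain, offset


# Toy tuning curve over 4 orientations, scaled by 1.5 and shifted by 0.1.
base = [1.0, 0.5, 0.2, 0.5]
mod = [1.5 * r + 0.1 for r in base]
gain, offset = fit_gain(base, mod)
```

      A gain above 1 with an offset near 0 would indicate a predominantly multiplicative effect, whereas a gain near 1 with a positive offset would indicate a predominantly additive one.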

      4) The difference between the effects of locomotion and optogenetic stimulation of cholinergic axons in Figure 5 may be confounded by differences in the visual stimulus. These experiments are carried out under open-loop conditions, where mice may adapt their locomotion based on the speed of the visual stimulus. Consequently, locomotion onsets are likely to occur during periods of higher visual flow. Since optogenetic stimulation is presented randomly, it is likely to occur during periods of lower visual flow speed. Consequently, the difference between the effect of locomotion and optogenetic stimulation may be explained by differences in visual flow speed and it is important to exclude this possibility.

      We find that in general locomotion is unaffected by visual flow in open loop conditions in this type of experiment (in this particular dataset, there was a small negative correlation between locomotion and visual flow in the open loop condition, Author response image 2).

      Author response image 2.

      Correlation between visual flow and locomotion in open loop conditions. Average correlation of locomotion velocity and visual flow speed in open loop for all mice in Figure 5. Each dot is an imaging site. In the open loop, the correlation between locomotion and visual flow speed is close to zero, but significantly negative in this dataset.

      However, to directly address the concern that our results are influenced by visual flow, we can restrict our analysis only to locomotion onsets that occurred in the absence of visual flow (Author response image 3A and 3B). These responses are not substantially different from those when including all data (Figures 5A and 5B). Thus, the difference between the effect of locomotion and optogenetic stimulation cannot be explained by differences in visual flow speed.

      Author response image 3.

      Open loop locomotion onset responses without visual flow. (A) Average calcium response of layer 2/3 neurons in visual cortex to locomotion onset in open loop in the absence of visual flow. Shading indicates SEM. (B) As in A, but for layer 5 neurons.

      5) It is unclear why chemogenetic manipulations of cholinergic inputs had no effect on pairwise correlations of L2/3 neuronal responses while optogenetic stimulation did.

      This is correct – we do not know why that is the case and can only speculate. There are at least two possible explanations for this difference:

      1) Local vs. systemic. The optogenetic manipulation is relatively local, while the chemogenetic manipulation is systemic. It is not clear how cholinergic release in other brain regions influences the correlation structure in visual cortex. It is conceivable that a cortex-wide change in cholinergic release results in a categorically different state with a specific correlation structure in layer 2/3 neurons different from the one induced by the more local optogenetic manipulation.

      2) Layer-specificity of activation. Cholinergic projections to visual cortex arrive both in superficial and deep layers. We activate the axons in visual cortex optogenetically by illuminating the cortical surface. Thus, in our optogenetic experiments, we are primarily activating the axons arriving superficially, while in the chemogenetic experiment, we are likely influencing superficial and deep axons similarly. Thus, we might expect the optogenetic activation to be biased toward influencing superficial layers more strongly than the chemogenetic activation does.

      6) The effects of locomotion and optogenetic stimulation on the latency of L5 responses in Figure 7 are very large - ~100 ms. Indeed, typical latencies in mouse V1 measured using electrophysiology are themselves shorter than 100 ms (see e.g. Durand et al., 2016). Visual response latencies in stationary conditions or without optogenetic stimulation appear surprisingly long - much longer than reported in previous studies even under anaesthesia. Such large and surprising results require careful analysis to ensure they are not confounded by artefacts. However, as in Figure 4, this analysis is based only on average responses across all gratings and no individual examples are shown.

      This is correct and we speculate this is the consequence of a combination of different reasons.

      1) Calcium imaging is inherently slower than electrophysiological recordings. When spiking responses are measured using electrophysiology, response latencies on the order of 100 ms have indeed been reported, as the reviewer points out. Using calcium imaging, these latencies are typically 4 times longer (Kuznetsova et al., 2021). This is likely a combination of a) calcium signals that are slower than electrical changes, b) delays in the calcium sensor itself, and c) temporal sampling used for imaging that is about 3 orders of magnitude slower than what is typically used for electrophysiology.

      2) Different neurons included in analysis. Calcium imaging likely has very different biases than electrophysiological recordings. Historically, the fraction of visually responsive neurons in visual cortex based on extracellular electrophysiological recordings has been systematically overestimated (Olshausen and Field, 2005). One key contributor to this is the fact that recordings are biased to visually responsive neurons. The criteria for inclusion of “responsive neurons” strongly influence the “average” response latency. In addition, calcium imaging has biases that relate to the vertical position of the somata in cortex. Both layer 2/3 and layer 5 recordings are likely biased to superficial layer 2/3 and superficial layer 5 neurons. Conversely, electrical recordings are likely biased to layer 4 and layer 5 neurons. Thus, comparisons at this level of resolution between data obtained with these two methods are difficult to make.

      We have added example neurons as Figure S12, as suggested.  

      Reviewer #1 (Recommendations For The Authors):

      While the study showcases valuable insights, I have a couple of concerns regarding the novelty of their research and the interpretation of results. By addressing these concerns, the authors can clarify the positioning of their research and strengthen the significance of their findings.

      (Major comments)

      1) Page 1, Line 21: The authors claim, "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, enabling faster switching between internal representations during locomotion." However, it is not clear which specific data or results support the claim of "switching between internal representations." Overall, their study primarily presents responses averaged across all neurons imaged, lacking a detailed exploration of individual neuron response patterns. Population analysis, such as PCA and decoding, can be used to assess the encoding of each stimulus by V1 neurons - "internal representation."

      To strengthen their claim regarding "switching between internal representations," the authors could consider an experiment measuring the speed at which the population activity pattern A transitions to the population activity pattern B when the visual stimulus switches from A to B. Such experiments would significantly enhance the impact of their study, providing a clearer understanding of how BF cholinergic input influences the dynamic representation of stimuli during locomotion.

      We thank the reviewer for bringing this up. That acetylcholine enables a faster switching between internal representations in layer 5 is a speculation. We have attempted to make this clearer in the discussion. Our speculation is based on the finding that the population response in layer 5 to sensory input is faster under high levels of acetylcholine (Figures 4D and 7B). In line with the reviewer’s intuition, the neuronal response to a change in visual stimulus, in our experiment from a uniform grey visual stimulus to a sinusoidal grating stimulus, is indeed faster. Based on evidence in favor of layer 5 encoding internal representation (Heindorf and Keller, 2023; Keller and Mrsic-Flogel, 2018; Suzuki and Larkum, 2020), we interpret the decrease in latency of the population response as a faster change in internal representation. We are not sure a decoding analysis would add much to this, given that a trivial decoder simply based on mean population response would already find a faster transition. We have expanded on our explanation of these points in the manuscript.

      2) Page 4, Line 103: "..., a direct measurement of the activity of cholinergic projection from basal forebrain to the visual cortex during locomotion has not been made." This statement is incorrect. An earlier study by Reimer et al. indeed imaged cholinergic axons in the visual cortex of mice running on a wheel. They found that "After walking onset, ... ACh activation, and a large pupil diameter, were sustained throughout the walking period in both cortical areas V1 and A1." Their findings are very similar to the results presented by Yogesh and Keller - that is, BF cholinergic axons exhibited locomotion state-dependent activity. The authors should clarify the positioning of this study relative to previous studies.

      Reimer, J., McGinley, M., Liu, Y. et al. Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex. Nat Commun 7, 13289 (2016). https://doi.org/10.1038/ncomms13289

      We have clarified this as suggested. However, we disagree slightly with the reviewer here. The key question is whether the cholinergic axons imaged originate in basal forebrain. While Reimer et al. 2016 did set out to do this, we believe a number of methodological considerations prevent this conclusion:

      1) In their analysis, Reimer et al. 2016 combine data from mice with cholinergic axons labeled either by viral injection into basal forebrain or by a germline cross of ChAT-cre mice with a reporter line. Unfortunately, it is unclear what the exact number of mice labeled with either strategy was. Based on the information in the paper, we can conclude that of the 6 mice used for experiments, between 2 and 5 were germline crosses. The problem with germline labeling of ChAT positive neurons is that when using a cross, VIP-ChAT+ neurons in cortex are also labeled. Reimer et al. 2016 find an anticipatory increase in activity on locomotion onset, which is also seen by Larsen et al. 2018 (who use a germline cross strategy) but which we do not see in our data. Based on this, we speculate that a significant part of the signals reported in the Reimer et al. 2016 paper are from local VIP-ChAT+ neurons.

      2) In their analysis, Reimer et al. 2016 also combine all imaging data obtained from both primary auditory cortex and primary visual cortex. Given the heterogeneity in the basal forebrain cholinergic neuronal population and their projection selectivity, to better understand these signals, it’s important to acquire the signals from cholinergic axons selectively in specific cortical regions, which we do in visual cortex. Based on the information provided in their paper, we were unfortunately not able to discern the injection location for their viral labeling strategy. Given the topographic selectivity in projection from basal forebrain, this could give hints as to the relative contribution of cholinergic projections to A1 vs V1 in their data. The injection coordinates given in the methods of the Reimer paper, of 4 mm lateral and 0.5 mm posterior to bregma to target basal forebrain, are likely wrong (they fall outside the head of the mouse).

      Given the heterogeneity in the basal forebrain cholinergic neuronal population and their projection selectivity, to better understand these signals, it’s important to acquire the signals from cholinergic axons both selectively in a cortical region, as we do in visual cortex, and purely originating from basal forebrain. Collins et al. 2023 inject more laterally and thus characterize cholinergic input to S1 and A1, while Lohani et al. 2022 use GRAB sensors which complement our findings. Please note, we don’t think there is any substantial disagreement in the results of previous studies and ours, with very few exceptions, like the anticipatory increase in cholinergic activity that precedes locomotion onset in the Reimer et al. 2016 data, but not in ours. This is a rather critical point in the context of the literature of motor-related neuronal activity in mouse V1. Based on early work on the topic, it is frequently assumed that motor-related activity in V1 is driven by a cholinergic input. This is very likely incorrect given our results, hence we feel it is important to highlight this methodological caveat of earlier work.

      3) Fig. 4H: The authors found that L5 neurons exhibit positive responses at the onset of locomotion in a closed-loop configuration. Moreover, these responses are further enhanced by photostimulation of BF axons.

      In a previous study from the same authors' group (Heindorf and Keller, 2023), they reported 'negative' responses in L5a IT neurons during closed-loop locomotion. This raises a question about the potential influence of different L5 neuron types on the observed results between the two studies. Do the authors think that the involvement of the other neuronal type in L5, the PT neurons, might explain the positive responses seen in the present study? Discussing this point in the paper would provide valuable insights into the underlying mechanisms.

      Yes, we do think the positive response observed on locomotion onset in closed loop is due to non-Tlx3+ neurons. Given that Tlx3-cre only labels a subset of intratelencephalic (IT) neurons (Gerfen et al., 2013; Heindorf and Keller, 2023), it’s not clear whether the positive response is explained by the pyramidal tract (PT) neurons, or the non-Tlx3+ IT neurons. Dissecting the response profiles of different subsets of layer 5 neurons is an active area of research in the lab and we hope to be able to answer these points more comprehensively in future publications. We have expanded on this in the discussion as suggested.

      Furthermore, it would be valuable to investigate whether the effects of photostimulation of BF axons vary depending on neuronal responsiveness. This could help elucidate how neurons with positive responses, potentially putative PT neurons, differ from neurons with negative responses, putative IT neurons, in their response to BF axon photostimulation during locomotion.

      We have attempted an analysis of the form suggested. In short, we found no relationship between a neuron’s response to optogenetic stimulation of ChAT axons and its response to locomotion onset, or its mean activity. Based on their response to locomotion onset in closed loop, we split layer 5 neurons into three groups, 30% most strongly decreasing (putative Tlx3+), 30% most strongly increasing, and the rest. We did not see a response to optogenetic stimulation of basal forebrain cholinergic axons in any of the three groups (Author response image 4A). We also found no obvious relationship between the mean activity of neurons and their response to optogenetic stimulation (Author response image 4B).

      Author response image 4.

      Neither putative layer 5 cell types nor neuronal responsiveness correlates with the response to optogenetic stimulation of cholinergic axons. (A) Average calcium response of layer 5 neurons split into putative Tlx3 (closed loop locomotion onset suppressed) and non-Tlx3 like (closed loop locomotion onset activated) to optogenetic stimulation of cholinergic axons. (B) Average calcium response of layer 5 neurons to optogenetic stimulation of cholinergic axons as a function of their mean response throughout the experimental session. Left: Each dot is a neuron. Right: Average correlation in the response of layer 5 to optogenetic stimulation and mean activity over all neurons per imaging site. Each dot is an imaging site.
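
      The three-way split described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the example values, and the choice to rank neurons by a single scalar locomotion-onset response are all assumptions made for the sketch.

```python
def split_by_onset_response(onset_responses, fraction=0.3):
    """Split neurons into (most decreasing, most increasing, rest).

    onset_responses: one locomotion-onset response value per neuron.
    fraction: share of neurons assigned to each extreme group (0.3 in the text).
    Returns three lists of neuron indices.
    """
    if not 0 <= fraction <= 0.5:
        raise ValueError("fraction must be between 0 and 0.5")
    n = len(onset_responses)
    k = int(round(fraction * n))
    # Rank neurons from most negative to most positive onset response.
    order = sorted(range(n), key=lambda i: onset_responses[i])
    decreasing = order[:k]
    increasing = order[n - k:] if k > 0 else []
    rest = order[k:n - k]
    return decreasing, increasing, rest


def group_mean(values, indices):
    """Mean of values over the given neuron indices (NaN for an empty group)."""
    if not indices:
        return float("nan")
    return sum(values[i] for i in indices) / len(indices)


# Hypothetical per-neuron values, for illustration only.
onset = [-0.8, -0.5, -0.1, 0.0, 0.05, 0.1, 0.2, 0.4, 0.7, 0.9]
opto = [0.01, -0.02, 0.00, 0.01, 0.02, -0.01, 0.00, 0.01, -0.01, 0.02]

dec, inc, rest = split_by_onset_response(onset)
for name, idx in (("decreasing", dec), ("increasing", inc), ("rest", rest)):
    print(name, idx, group_mean(opto, idx))
```

      Comparing the mean optogenetic response across the three groups then gives the kind of summary shown in panel A.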

      (Minor comments)

      1) It is unclear which BF subregion(s) were targeted in this study.

      Thanks for pointing this out. We targeted the entire basal forebrain (medial septum, vertical and horizontal limbs of the diagonal band, and nucleus basalis) with our viral injections. All our axonal imaging data comes from visual cortex and given the sensory modality-selectivity of cholinergic projections to cortex, the labeled axons originate from medial septum and the diagonal bands (Kim et al., 2016). We have now added the labels for basal forebrain subregions targeted next to the injection coordinates in the manuscript.

      2) Page 43, Line 818: The journal name of the cited paper Collins et al. is missing.

      Fixed.

      3) In the optogenetic experiments, how long is the inter-trial interval? Stimulation of BF is known to have long-lasting effects on cortical activity and plasticity. It is, therefore, important to have a sufficient interval between trials.

      The median inter-trial intervals for the different stimulation events are as follows:

      • Optogenetic stimulation only: 15 s

      • Optogenetic stimulation + grating: 12 s

      • Optogenetic stimulation + mismatch: 35 s

      • Optogenetic stimulation + locomotion onset: 45 s

      We have added this information to the methods in the manuscript.

      Assuming locomotion is the primary driver of acetylcholine release (as we argue in Figures 1 and 2), the frequency of stimulation roughly corresponds to the frequency of acetylcholine release experienced endogenously. It is of course possible that being awake and mobile puts the entire system in a long-lasting acetylcholine-driven state different from what would be observed during long-term quiet wakefulness or during sleep. But the main focus of the optogenetic stimulation experiments we performed was to investigate the consequences of the rapid acetylcholine release driven by locomotion.

      4) Page 11, Line 313: "..., we cannot exclude the possibility of a systemic contribution to the effects we observe through shared projections between different cortical and subcortical target." This possibility can be tested by examining the effect of optogenetic stimulation of cholinergic axons on locomotor activity, as they did for the chemogenetic experiments (Fig. S7). If the optogenetic manipulation changes locomotor activity, it is likely that this manipulation has some impact on subcortical activity and systemic contribution to the changes in cortical responses observed.

      Based on the reviewer's suggestion, we tested this and found no change in the locomotor activity of the mice on optogenetic stimulation of cholinergic axons locally in visual cortex (we have added this as Figure S5 to the manuscript). Please note, however, that we of course cannot exclude a systemic contribution based on this.

      5) Fig. 4 and 5: In a closed-loop configuration, L2/3 neurons exhibit a transient increase in response at the onset of locomotion, while in an open-loop configuration, their response is more prolonged. On the other hand, L5 neurons show a sustained response in both configurations. Do the authors have any speculation on this difference?

      This is correct. Locomotion onset responses in layer 2/3 are strongly modulated by whether the locomotion onset occurs in closed loop or open loop configurations (Widmer et al., 2022). This difference is absent in our layer 5 data here. We suspect this is a function of a differential within-layer cell type bias in the different recordings. In the layer 2/3 recordings we are likely biased strongly towards superficial L2/3 neurons that tend to be negative prediction error neurons (top-down excited and bottom-up inhibited), see e.g. (O’Toole et al., 2023). A reduction of locomotion onset responses in closed loop is what one would expect for negative prediction error neurons. While layer 5 neurons exhibit mismatch responses, they do not exhibit opposing top-down and bottom-up input that would result in such a suppression (Jordan and Keller, 2020).

      We can illustrate this by splitting all layer 2/3 neurons based on their response to gratings and to visuomotor mismatch into a positive prediction error (PE) type (top 30% positive grating response), a negative prediction error type (top 30% positive visuomotor mismatch response), and the rest (remaining neurons and neurons responsive to both grating and visuomotor mismatch). Plotting the response of these neurons to locomotion onset in closed loop and open loop, we find that negative PE neurons have a transient response to locomotion onset in closed loop while positive PE neurons have a sustained increase in response in closed loop. In open loop the response of the two populations is indistinguishable. Splitting the layer 5 neurons using the same criteria, we don’t find a striking difference between closed and open loop between the two groups of neurons. We have added this as Figure S8.
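
      A minimal sketch of the classification rule described above, assuming each neuron is summarized by one grating response and one visuomotor mismatch response. The function name, labels, and example values are hypothetical and serve only to illustrate how neurons responsive to both stimuli end up in the remaining group.

```python
def classify_prediction_error_types(grating, mismatch, fraction=0.3):
    """Label each neuron as 'positive_PE', 'negative_PE', or 'rest'.

    positive_PE: among the top `fraction` of grating responses only.
    negative_PE: among the top `fraction` of mismatch responses only.
    rest: all remaining neurons, including those in both top groups.
    """
    if len(grating) != len(mismatch):
        raise ValueError("grating and mismatch must have the same length")
    n = len(grating)
    k = int(round(fraction * n))
    top_grating = set(sorted(range(n), key=lambda i: grating[i], reverse=True)[:k])
    top_mismatch = set(sorted(range(n), key=lambda i: mismatch[i], reverse=True)[:k])

    labels = []
    for i in range(n):
        in_grating = i in top_grating
        in_mismatch = i in top_mismatch
        if in_grating and not in_mismatch:
            labels.append("positive_PE")
        elif in_mismatch and not in_grating:
            labels.append("negative_PE")
        else:
            labels.append("rest")
    return labels


# Hypothetical responses for ten neurons, for illustration only.
grating = [0.9, 0.8, 0.7, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.05]
mismatch = [0.0, 0.1, 0.9, 0.8, 0.7, 0.2, 0.1, 0.0, 0.05, 0.3]

labels = classify_prediction_error_types(grating, mismatch)
print(labels)
```

      In this example the third neuron responds strongly to both stimuli and is therefore assigned to the remaining group rather than to either prediction error type.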

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      1) As a ubiquitous promoter was used to drive GCaMP expression, please explain how excitatory neurons were identified.

      2) As the data cover a very small range of running speeds, it is important to confirm that the binary locomotion signal model still applies when mice run at higher speeds - either by selecting recordings where mice have a wider range of running speeds or conducting additional experiments. In addition, please show the running speed tuning of individual axons.

      3) Please provide a more detailed analysis of the effects of locomotion and cholinergic modulation on visual responses. How does cholinergic modulation affect orientation and direction tuning? Are the effects multiplicative or additive? How does this compare to the effects of locomotion on single neurons?

      4) To ensure that the analyses in Figure 5 are not confounded by differences in the visual stimulus, please include average visual flow speed traces for each condition.

      5) Please clarify why chemogenetic manipulations of cholinergic inputs had no effect on pairwise correlations in L2/3.

      6) The latency effect is quite an extraordinary claim and requires careful analysis. Please provide examples of single neurons illustrating the latency effect - including responses across individual grating orientations/directions. One possible confound is that grating presentation could itself trigger locomotion or other movements. In the stationary / noOpto conditions, the grating response might not be apparent in the average trace until the animal begins to move. Thus the large latency in the stationary / noOpto conditions may reflect movement-related rather than visual responses.

      Please see our responses to these points in the public review part above.

      There are some minor points where text and figures could be improved:

      1) When discussing the decorrelation of neuronal responses by cholinergic axon activation, it is important to make it clear that Figure 6D quantifies the responses of layer 5 apical dendrites rather than neurons.

      We have added this information to the results section.

      2) In Figure S7, please clarify why velocity is in arbitrary units.

      This was an oversight and has been fixed.

      3) Please clarify how locomotion and stationary trials are selected in Figure 4.

      We thank the reviewers for pointing this out. Trials were classified as occurring during locomotion or while mice were stationary as follows. We used a time-window of -0.5 s to +1 s around stimulus onset. If mice exhibited uninterrupted locomotion above a threshold of 0.25 cm/s in this time-window, we considered the stimulus as occurring during locomotion; otherwise, it was defined as occurring while the mice were stationary. Note that the same criteria for defining locomotion state were used to isolate visuomotor mismatch events, and also during control optogenetic stimulation experiments. We have added this information to the methods.
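
      The classification rule above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the speed trace is assumed to be uniformly sampled, the sampling rate and example traces are hypothetical, and the handling of trials whose window extends beyond the recording (returning None) is a choice made for the sketch.

```python
def classify_trial(speed_cm_s, sample_rate_hz, onset_index,
                   threshold=0.25, pre_s=0.5, post_s=1.0):
    """Classify one stimulus presentation as 'locomotion' or 'stationary'.

    speed_cm_s: running speed trace in cm/s, uniformly sampled.
    onset_index: sample index of stimulus onset.
    A trial counts as locomotion only if speed stays above `threshold`
    for every sample from -pre_s to +post_s around onset.
    Returns None if the window is not fully covered by the trace.
    """
    start = onset_index - int(round(pre_s * sample_rate_hz))
    stop = onset_index + int(round(post_s * sample_rate_hz))
    if start < 0 or stop > len(speed_cm_s):
        return None
    window = speed_cm_s[start:stop]
    if all(v > threshold for v in window):
        return "locomotion"
    return "stationary"


# Hypothetical 3 s traces sampled at 10 Hz, for illustration only.
rate = 10
running = [5.0] * 30
interrupted = [5.0] * 30
interrupted[12] = 0.1  # brief stop shortly after stimulus onset

print(classify_trial(running, rate, onset_index=10))
print(classify_trial(interrupted, rate, onset_index=10))
print(classify_trial(running, rate, onset_index=2))
```

      Requiring every sample in the window to exceed the threshold is what makes the criterion "uninterrupted": a single sub-threshold sample moves the trial into the stationary group.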

      4) When testing whether cholinergic activation is sufficient to explain locomotion-induced decorrelation in Figure 6G-H, please show pre-CNO and post-CNO delta-correlation, not just their difference.

      We can do that, but the results are harder to parse this way. We have added this as Figure S11 to the manuscript. The problem with parsing the figure is that the pre-CNO levels are different in different groups. This is likely a function of mouse-to-mouse variability and makes it harder to identify what the CNO induced changes are. Using the pre-post difference removes the batch influence. Hence, we have left this as the main analysis in Figure 6G and 6H.
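
      The reasoning for using the pre-post difference can be illustrated with a small sketch. All values below are hypothetical and chosen only to show that two groups with different pre-CNO baselines but the same CNO-induced change give matching differences; the function names are assumptions for the illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def pre_post_change(pre, post):
    """Paired post-minus-pre change, one value per imaging site."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    return [b - a for a, b in zip(pre, post)]


# Hypothetical delta-correlation values per imaging site.
group_a_pre = [0.10, 0.12, 0.11]
group_a_post = [0.06, 0.08, 0.07]
group_b_pre = [0.20, 0.22, 0.21]   # higher baseline, e.g. a different batch of mice
group_b_post = [0.16, 0.18, 0.17]

change_a = pre_post_change(group_a_pre, group_a_post)
change_b = pre_post_change(group_b_pre, group_b_post)

print("baseline offset:", mean(group_b_pre) - mean(group_a_pre))
print("mean change A:", mean(change_a))
print("mean change B:", mean(change_b))
```

      The baseline offset between groups is visible in the raw pre-CNO and post-CNO values but cancels in the paired difference.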

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang et al. generate XAP5 and XAP5L knockout mice and find that they are male infertile due to meiotic arrest and reduced sperm motility, respectively. RNA-Seq was subsequently performed and the authors concluded that XAP5 and XAP5L are antagonistic transcription factors of ciliogenesis (in XAP5-KO P16 testis: 554 genes were upregulated and 1587 genes were downregulated; in XAP5L-KO sperm: 2093 genes were upregulated and 267 genes were downregulated).

      We are grateful for the comprehensive summary.

      Strengths:

      Knockout mouse models provided strong evidence to indicate that XAP5 and XAP5L are critical for spermatogenesis and male fertility.

      Thank you for your positive comment.

      Weaknesses:

      The key conclusions are not supported by evidence. First, the authors claim that XAP5 and XAP5L transcriptionally regulate sperm flagella development; however, detailed molecular experiments related to transcription regulation are lacking. How do XAP5 and XAP5L regulate their targets? Only RNA-Seq is not enough. Second, the authors declare that XAP5 and XAP5L are antagonistic transcription factors; however, how do XAP5 and XAP5L regulate sperm flagella development antagonistically? Only RNA-Seq is not enough. Third, I am concerned about whether XAP5 really regulates sperm flagella development. XAP5 is specifically expressed in spermatogonia and XAP5-cKO mice are in meiotic arrest, indicating that XAP5 regulates meiosis rather than sperm flagella development.

      Thank you for the critical comments. To strengthen our conclusions, we have included XAP5/XAP5L CUT&Tag data in our revised manuscript. This highly sensitive method has allowed us to identify direct target genes of XAP5 and XAP5L (Table S1, Figure S6). Notably, our results demonstrate that both FOXJ1 and RFX2 are occupied by XAP5 (Figure 4G). Additionally, real-time PCR validation confirmed that RFX2 is also associated with XAP5L, even though enriched peaks for the RFX2 gene were not detected in the initial CUT&Tag data (Figure 4G). These findings indicate that XAP5 and XAP5L regulate the expression of FOXJ1 and RFX2 by directly binding to these genes. De novo motif analyses revealed that XAP5 and XAP5L shared a conserved binding sequence (CCCCGCCC/GGGCGGGG) (Figure S6C), and the bound regions of FOXJ1 and RFX2 contain this sequence. Further analysis shows that many XAP5L target genes are also targets of XAP5 (Figure S6G), despite the limited number of identified XAP5L target genes. This differential binding and regulation of shared target genes underscore the antagonistic relationship between XAP5 and XAP5L. Collectively, these findings provide additional support for the idea that XAP5 and XAP5L function as antagonistic transcription factors, acting upstream of transcription factor families, including FOXJ1 and RFX factors, to coordinate ciliogenesis during spermatogenesis.

      While we agree that XAP5 primarily regulates meiosis during spermatogenesis, our data also indicate that many cilia-related genes, including key transcription regulators of spermiogenesis such as RFX2 and SOX30, are downregulated in XAP5-cKO mice and are bound by XAP5 (Figure 4, Figures S4 and S6). It is important to note that genes coding for flagella components are expressed sequentially and in a germ cell-specific manner during development. When we refer to "regulating sperm flagella development", we mean the spatiotemporal regulation. We have revised the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      In this study, Wang et al., report the significance of XAP5L and XAP5 in spermatogenesis, involved in transcriptional regulation of the ciliary gene in testes. In previous studies, the authors demonstrate that XAP5 is a transcription factor required for flagellar assembly in Chlamydomonas. Continuing from their previous study, the authors examine the conserved role of the XAP5 and XAP5L, which are the orthologue pair in mammals.

      XAP5 and XAP5L are expressed ubiquitously and testis-specifically, respectively, and their absence in the testes causes male infertility with defective spermatogenesis. Interestingly, XAP5 deficiency arrests germ cell development at the pachytene stage, whereas XAP5L absence causes impaired flagellar formation. RNA-seq analyses demonstrated that XAP5 deficiency suppresses ciliary gene expression, including Foxj1 and Rfx family genes, in early testis. By contrast, in XAP5L deficiency, Foxj1 and Rfx gene expression abnormally persists in mature sperm. From the results, the authors conclude that XAP5 and XAP5L are antagonistic transcription factors that function upstream of Foxj1 and Rfx family genes.

      This reviewer thinks the overall experiments are performed well and that the manuscript is clear. However, the current results do not directly support the authors' conclusion. For example, the transcriptional function of XAP5 and XAP5L requires more evidence. In addition, this reviewer wonders about the conserved XAP5 function of ciliary/flagellar gene transcription in mammals - the gene is ubiquitously expressed despite its functional importance in flagellar assembly in Chlamydomonas. Thus, this reviewer thinks authors are required to show more direct evidence to clearly support their conclusion with more descriptions of its role in ciliary/flagellar assembly.

      Thank you for your thoughtful review of our work. We appreciate your positive feedback on the overall quality of the experiments and the clarity of the manuscript. In response to your concerns, we have included new experimental data and made revisions to the manuscript (lines 193-217) to better support our conclusions, particularly regarding the transcriptional function of XAP5 and XAP5L. Additionally, we have expanded on the role of XAP5 in ciliary and flagellar assembly to provide more direct evidence for its functional importance. Thank you for your insights.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The title (Control of ciliary transcriptional programs during spermatogenesis by antagonistic transcription factors) is not specific and does tend to exaggerate.

      Thank you for the comment, and we appreciate the opportunity to clarify the appropriateness of the title. Our paper extensively investigates the transcriptional regulation of ciliary genes during spermatogenesis. It demonstrates that XAP5/XAP5L are key transcription factors involved in this process. The title reflects our primary focus on the transcriptional programs that govern ciliary gene expression. Moreover, our paper shows that XAP5 positively regulates the expression of ciliary genes, particularly during the early stages of spermatogenesis, while XAP5L negatively regulates these genes. This antagonistic relationship is a crucial aspect of the study and is effectively conveyed in the title. In addition, our revised paper provides detailed insights into how XAP5/XAP5L control ciliary gene expression during spermatogenesis.

      Figure 4C: FOXJ1 and RFX2 are absent in sperm from WT mice. Are you sure? They are highly expressed in WT testes.

      Thank you for your careful review. While FOXJ1 and RFX2 are indeed highly expressed in the testes of wild-type (WT) mice, our data show that they are not detectable in mature sperm. This observation is consistent with published single-cell RNA-seq data (Jung et al., 2019), which indicate that FOXJ1 and RFX2 are primarily expressed in spermatocytes but not in spermatids (Figure S7). This expression pattern aligns with that of IFT-particle proteins, which are essential for the formation but not the maintenance of mammalian sperm flagella (San Agustin, Pazour, & Witman, 2015).

      XAP5 is specifically expressed in spermatogonia and XAP5-cKO mice are in meiotic arrest, indicating that XAP5 regulates meiosis rather than sperm flagella development.

      We appreciate your insightful comments. As mentioned above, we agree that XAP5 primarily regulates meiosis during spermatogenesis. When we mentioned "regulating sperm flagella development," we were referring to the spatiotemporal regulation of these processes. We have revised the manuscript to clarify this distinction. Thank you for your understanding.

      The title of Figure 2 (XAP5L is required for normal sperm formation) is not accurate because the progress of spermatogenesis and sperm count is normal in XAP5L-KO mice (only sperm motility is reduced).

      We apologize for any confusion caused by the previous figure. It did not accurately convey the changes in sperm count. In the revised Figure 2B, we clearly demonstrate that the sperm count in XAP5L-KO mice is indeed lower than that in WT mice. This revision aims to provide a more accurate representation of the effects of XAP5L deficiency on spermatogenesis. Thank you for bringing this to our attention.

      Reviewer #2 (Recommendations For The Authors):

      (1) Although XAP5 and XAP5L deficiency alters the transcription of Foxj1 and Rfx family genes, which are the essential transcription factors for ciliogenesis, current data do not directly support that XAP5 and XAP5L are the upstream transcription factors. The authors need to show more direct evidence such as ChIP-Seq data.

      Thank you for your valuable feedback! In this revised manuscript, we have included data identifying candidate direct targets of XAP5 and XAP5L using the highly sensitive CUT&Tag method (Kaya-Okur et al., 2019). Our results show that XAP5 occupies both FOXJ1 and RFX2 (Figure 4G). Furthermore, real-time PCR validation of the CUT&Tag experiments confirmed that RFX2 is also occupied by XAP5L (Figure 4G), despite the initial CUT&Tag data not revealing enriched peaks for the RFX2 gene (Table S1). Unfortunately, the limited number of enriched peaks identified for XAP5L (Table S1) suggests that the XAP5L antibody used in the CUT&Tag experiment might have suboptimal performance, which prevented us from detecting occupancy on the FOXJ1 promoter. Nevertheless, these additional data provide strong evidence that XAP5 and XAP5L function as upstream transcription factors for FOXJ1 and RFX family genes, supporting their essential roles in ciliogenesis.

      (2) Shared transcripts that are altered by the absence of either XAP5 or XAP5L do not clearly support they are antagonistic transcription factors.

      Thank you for your insightful comment. In our revised manuscript, we performed CUT&Tag analysis to identify target genes of XAP5 and XAP5L. Motif enrichment analysis revealed conserved binding sequences for both factors (Figure S6C), indicating a subset of shared downstream genes between XAP5 and XAP5L. Among the downregulated genes in XAP5 cKO germ cells, 891 genes were bound by XAP5 (Figure S6D). Although the number of enriched peaks identified for XAP5L was limited, 75 of the upregulated genes in XAP5L KO sperm were bound by XAP5L (Figure S6E). Importantly, of these 75 XAP5L target genes, approximately 30% (22 genes) were also identified as targets of XAP5 (Figure S6G), further supporting the idea that XAP5 and XAP5L function as antagonistic transcription factors.
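
      The overlap calculation behind the "approximately 30%" figure can be sketched as below. The gene identifiers are placeholders invented for the illustration, not genes from Table S1; only the counts 22 and 75 come from the response above.

```python
def shared_target_fraction(targets_a, targets_b):
    """Return (shared genes, fraction of targets_a also present in targets_b)."""
    set_a, set_b = set(targets_a), set(targets_b)
    if not set_a:
        raise ValueError("targets_a must not be empty")
    shared = set_a & set_b
    return shared, len(shared) / len(set_a)


# Placeholder gene lists (hypothetical names), sized to match the reported counts:
# 75 XAP5L targets, of which 22 are also XAP5 targets.
xap5l_targets = [f"gene_{i}" for i in range(75)]
xap5_targets = [f"gene_{i}" for i in range(22)] + [f"other_{i}" for i in range(100)]

shared, fraction = shared_target_fraction(xap5l_targets, xap5_targets)
print(len(shared), round(100 * fraction, 1))
```

      With the reported counts, 22 of 75 corresponds to about 29.3%, consistent with the rounded value quoted above.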

      (3) XAP5 seems to be an ancient transcription factor for cilia and flagellar assembly. However, XAP5 is expressed ubiquitously in mice. How can this discrepancy be explained? Is it also required for primary cilia assembly? Is its expression also directly linked to ciliogenesis in other types of cells?

      Thank you for the thoughtful questions. The ubiquitous expression of XAP5 in mice can be understood in light of its role as an ancient transcription factor for cilia and flagellar assembly. Given that cilia are present on nearly every cell type in the mammalian body (O'Connor et al., 2013), this broad expression pattern makes sense. In fact, XAP5 serves not only as a master regulator of ciliogenesis but also as a critical regulator of various developmental processes (Kim et al., 2018; Lee et al., 2020; Xie et al., 2023).

      Our current unpublished work demonstrates that XAP5 is essential for primary cilia assembly in different cell lines. The loss of XAP5 protein results in abnormal ciliogenesis, further supporting its vital role in ciliary formation across different cell types.

      We believe that the widespread expression of XAP5 reflects its fundamental importance in multiple cellular processes, including ciliogenesis, development, and potentially other cellular functions yet to be discovered.

      (4) XAP5L deficiency impairs flagellar assembly. Have the authors observed any other physiological defects in the absence of XAP5L in mouse models, such as hydrocephalus and/or tracheal defects?

      Thank you for the questions. We have carefully examined XAP5L KO mice for other physiological defects. To date, we have not observed any additional physiological abnormalities. Specifically, we assessed the condition of tracheal cilia in XAP5L KO mice and found no significant differences compared to wild-type (WT) mice, as illustrated in Author response image 1 below.

      Author response image 1.

      References

      Jung, M., Wells, D., Rusch, J., Ahmad, S., Marchini, J., Myers, S. R., & Conrad, D. F. (2019). Unified single-cell analysis of testis gene regulation and pathology in five mouse strains. eLife, 8. doi:10.7554/eLife.43966

      Kaya-Okur, H. S., Wu, S. J., Codomo, C. A., Pledger, E. S., Bryson, T. D., Henikoff, J. G., . . . Henikoff, S. (2019). CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun, 10(1), 1930. doi:10.1038/s41467-019-09982-5

      Kim, Y., Hur, S. W., Jeong, B. C., Oh, S. H., Hwang, Y. C., Kim, S. H., & Koh, J. T. (2018). The Fam50a positively regulates ameloblast differentiation via interacting with Runx2. J Cell Physiol, 233(2), 1512-1522. doi:10.1002/jcp.26038

      Lee, Y.-R., Khan, K., Armfield-Uhas, K., Srikanth, S., Thompson, N. A., Pardo, M., . . . Schwartz, C. E. (2020). Mutations in FAM50A suggest that Armfield XLID syndrome is a spliceosomopathy. Nature Communications, 11(1). doi:10.1038/s41467-020-17452-6

      O'Connor, A. K., Malarkey, E. B., Berbari, N. F., Croyle, M. J., Haycraft, C. J., Bell, P. D., . . . Yoder, B. K. (2013). An inducible CiliaGFP mouse model for in vivo visualization and analysis of cilia in live tissue. Cilia, 2(1), 8. doi:10.1186/2046-2530-2-8

      San Agustin, J. T., Pazour, G. J., & Witman, G. B. (2015). Intraflagellar transport is essential for mammalian spermiogenesis but is absent in mature sperm. Mol Biol Cell, 26(24), 4358-4372. doi:10.1091/mbc.E15-08-0578

      Xie, X., Li, L., Tao, S., Chen, M., Fei, L., Yang, Q., . . . Chen, L. (2023). Proto-Oncogene FAM50A Can Regulate the Immune Microenvironment and Development of Hepatocellular Carcinoma In Vitro and In Vivo. Int J Mol Sci, 24(4). doi:10.3390/ijms24043217

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Previously, this group showed that Tgfbr1 regulates the reorganization of the epiblast and primitive streak into the chordo-neural hinge and tailbud during the trunk-to-tail transition. Gdf11 signaling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos, including the reallocation of axial progenitors into the tailbud and Tgfbr1 plays a key role in mediating its signaling activity. Progenitors that contribute to the extension of the neural tube and paraxial mesoderm into the tail are located in this region. In this work, the authors show that Tgfbr1 also regulates the reorganization of the posterior primitive streak/base of allantois and the endoderm as well. 

      By analyzing the morphological phenotypes and marker gene expression in Tgfbr1 mutant mouse embryos, they show that it regulates the merger of somatic and splanchnic layers of the lateral plate mesoderm, the posterior streak derivative. They also present evidence suggesting that Tgfbr1 acts upstream of Isl1 (key effector of Gdf11 signaling for controlling differentiation of lateral mesoderm progenitors) and regulates the remodelling of the major blood vessels, the lateral plate mesoderm and endoderm associated with the trunk-to-tail transition. Through a detailed phenotypic analysis, the authors observed that, similarly to Isl1 mutants, the lack of Tgfbr1 in mouse embryos hinders the activation of hindlimb and external genitalia marker genes and results in a failure of lateral plate mesoderm layers to converge during tail development. As a result, they interpret that ventral lateral mesoderm, which generates the pericloacal mesenchyme and genital tubercle, fails to specify.

      They also show defects in the morphogenesis of the dorsal aorta at the trunk/tail juncture, resulting in an aberrant embryonic/extraembryonic vascular connection. Endoderm reorganization defects following abnormal morphogenesis of the gut tube in the Tgfbr1 mutants cause failure of tailgut formation and cloacal enlargement. Thus, Tgfbr1 activity regulates the morphogenesis of the trunk/tail junction and the morphogenetic switch in all germ layers required for continuing post-anal tail development. Taken together with the previous studies, this work places Gdf11/8 - Tgfbr1 signaling at the pivot of trunk-to-tail transition and the authors speculate that critical signaling through Tgfbr1 occurs in the posterior-most part of the caudal epiblast, close to the allantois. 

      Strengths: 

      The data shown is solid with excellent embryology/developmental biology. This work demonstrates meticulous execution and is presented in a comprehensive and coherent manner. Although not completely novel, the results/conclusions add to the known function of Gdf11 signaling during the trunk-to-tail transition. 

      Weaknesses: 

      The authors rely on the expression of a small number of key regulatory genes to interpret the developmental defects. The alternative possibilities remain to be ruled out thoroughly. The manuscript is also quite descriptive and would benefit from more focused highlighting of the novelty regarding the absence of Tgfbr1 in the mouse embryo. They should also strengthen some of their conclusions with more details in the results.

      Although we used a limited number of key regulatory genes to interpret the phenotype, these genes were carefully chosen to focus on specific processes involving the lateral mesoderm, its derivatives, and the endoderm. In addition to these markers, we included references to other relevant markers that were previously analyzed and initially led us to examine the lateral plate mesoderm and tail gut in Tgfbr1 mutants. To strengthen our analysis, we have now incorporated additional data to clarify specific phenotypes. For instance, in situ hybridization (ISH) for Shh further confirms abnormalities at the caudal end of the endoderm in mutant embryos, while no endodermal defects are observed in the trunk region. We also included an analysis of the intermediate mesoderm, which shows abnormalities at the same level as those found in the lateral plate mesoderm and endoderm of Tgfbr1 mutants.

      It’s important to note that using additional markers to assess the epiblast/primitive streak of Tgfbr1 mutants at E7.5–E8.5, as suggested by a reviewer, is unlikely to yield new insights. At these early stages, Tgfbr1 mutant embryos do not display observable phenotypes in the main body axis. Data in this manuscript already demonstrate the absence of abnormalities at this stage, as shown in Figure 3 and Supplementary Figure 6. Additionally, certain genes whose expression becomes abnormal at the stage when the embryo would enter tail development remain unaffected in the trunk, indicating that trunk extension is not significantly impacted by Tgfbr1 deficiency. While transcriptomic analysis of these Tgfbr1 mutants could provide interesting insights, it would be more appropriate to focus on later developmental stages, which would be beyond the scope of the current study.

      The second major critique was that the manuscript is primarily descriptive. We disagree with this assessment. Several hypotheses were rigorously tested using genetic approaches, including Isl1 knockout experiments, cell tracing from the primitive streak with a newly generated Cre driver to activate a reporter from the ROSA26 locus, and assessment of extraembryonic endoderm fate in Tgfbr1 mutants by introducing the Afp-GFP transgene into the Tgfbr1 mutant background. Additionally, we conducted tracing analyses of tail bud cell contributions to the tail gut via DiI injection and embryo incubation. To address potential concerns regarding this experiment, we have included data showing the DiI position immediately after injection to confirm that it does not contact the tail gut. We also considered and accounted for potential DiI leakage into neuromesodermal progenitors to clarify the endodermal results.

      Our genetic and DiI experiments were specifically designed to differentiate between alternative hypotheses and to confirm hypotheses generated from other analyses. Additionally, improvements in some of the imaging data have helped address remaining concerns.

      Reviewer #1 (Recommendations For The Authors): 

      I have listed my suggestions as queries. The authors may perform experiments or clarify by editing the text to address them. 

      The authors state on Page 11 and elsewhere that the ventral lateral mesoderm is absent in the Tgfbr1 mutant. What is the basis for this conclusion? Are there specific markers for PCM or GT primordium? 

      The specific marker of PCM and GT primordium is Isl1. The absence of this marker in the Tgfbr1 mutants is shown in Dias et al. (2020). The reference is introduced in the manuscript.

      A schematic illustrating the VLM and the expression patterns of Tgfbr1, Gdf11, etc., would be helpful. 

      Characterization of Gdf11 expression has been previously reported (e.g. McPherron et al 1999, cited in our manuscript). It is expressed in the region containing axial progenitors before the trunk to tail transition and is not expressed in the VLM. As for Tgfbr1, its expression is hard to detect, likely because it is ubiquitously expressed at a low level. We include in this document some pictures of an ISH, including a control using the Tgfbr1 mutants to illustrate that the staining resembling background actually represents Tgfbr1 expression. If the reviewers find it important, we can also incorporate these data into the manuscript. Under these circumstances, we feel that a schematic might not be very informative.

      Author response image 1.

      Image showing an example of an ISH procedure with a probe against Tgfbr1, showing widespread and low expression. The lower picture shows a ventral view of a stained wild type E10.5 embryo.

      Foxf1+ cells in the 'extended LPM' of Tgfbr1 mutants suggest fate transformation, or does it indicate the misexpression of marker gene otherwise suppressed by Tgfbr1 activity? The authors suggest that Foxf1+ cells are VLM progenitors from posterior PS trapped in the extended LPM. Do they continue to express PS markers? 

      The observation that, in both wild type and Tgfbr1 mutant embryos, Foxf1 expression in the trunk is restricted to the splanchnic LPM indicates that the absence of this marker in the somatic LPM is not the result of a suppression of its expression by Tgfbr1. In wild type embryos Foxf1 is also expressed in the posterior PS, regulated independently of its expression in the LPM (i.e. Shh-independent), and later in the pericloacal mesoderm (our supplementary figure 2). As Foxf1 expression in the posterior PS was not suppressed in the Tgfbr1 mutants, and the pericloacal mesoderm is absent, we interpret the Foxf1-positive cells in the two layers around the extended celomic cavity in the posterior end of the mutant embryos as deriving from the posterior PS, as a result of the absence of its normal progression through the embryonic tissues.

      We did not find expression of markers of the PS cells that give rise to paraxial mesoderm, like Tbxt, further suggesting that those cells could derive from the restricted set of cells within the posterior PS that contribute to the pericloacal mesoderm.

      For example, the misexpression of Apela is interpreted as mis-localized endoderm cells. They show scattered Keratin 8 misexpression to support the interpretation. It would be more convincing if the authors tested the expression of other endoderm markers. 

      As indicated in the manuscript, we suggest that these cells are endoderm progenitors (p. 13), like those present at the posterior end of the gut tube at E9.5 and E10.5, that are unable to incorporate into the gut tube. Apela is not a general endodermal marker: it is expressed in the foregut pocket and the nascent cells of the hindgut/tail gut, becoming downregulated as cells acquire typical endodermal signatures. The presence of ectopic Apela expression in the extended LPM of the mutant embryos might indeed indicate the presence of progenitors that failed to downregulate Apela, resulting from the lack of differentiation-associated downregulation. This would also imply the absence of definitive endodermal markers.

      The Nodal signaling pathway in the anterior PS drives endoderm development. It acts through Alk7. Does Tgfbr1 (Alk5) mutation impact endoderm development, in general? It isn't easy to assess this from the Foxa2 in situ RNA hybridization shown in Figures 6A and B. It would be helpful for the readers if the authors clarified this point. 

      In the pictures shown in Figure 7D-D’ it is already shown that the endoderm is mostly preserved until the region of the trunk to tail transition. The presence of a rather normal endoderm in the embryonic trunk can also be seen with Shh, a figure added as Supplementary Fig.5.

      Reviewer #2 (Recommendations For The Authors): 

      The authors mention two interesting novel points which they should develop in the discussion, and probably also in the results. 

      (1) The authors speculate about the possible involvement of the posterior PS as a mediator of Gdf11/Tgfbr1 signaling activity. However, as mentioned in the manuscript, their experiments do not allow regional sublocalization within the PS... Here it would be important to assess/discuss in more detail which progenitors respond to this signaling activity and when they do it. At the very least, the authors should provide high-resolution spatiotemporal data of the expression of Tgfbr1 in the PS. 

      Tgfbr1 expression at this embryonic stage does not give clear differential patterns. The data reported for this expression in Andersson et al 2006 are of very low quality and we have not been able to reproduce the reported pattern. On the contrary, all our efforts over the years provided a very general staining that could even be interpreted as background. When we now included Tgfbr1 mutants as controls, it became clear that the ubiquitous and low level signal observed in wild type embryos indeed represents the Tgfbr1 expression pattern: low level and ubiquitous. We are attaching a figure to this document illustrating these observations. If required, this can also be included in the manuscript as a supplementary figure. 

      Also, the work of Wymeersch et al., 2019 regarding the lateral plate mesoderm progenitors (LPMPs) should be referred to and discussed here. 

      This was now added in the results (page 11) and in discussion (page 16). 

      For instance, are the LPMP transcriptomic differences detected between E7.5 and E8.5 caused by Tgfbr1 signaling activity? This question could be easily answered through a comparative bulk RNAseq analysis of the posterior-most region of the PS of mutant and WT embryos. The possible colocalization of Tgfb1 (Wymeersch et al., 2019) and Tgfbr1 in the LPMPs should also be addressed. 

      We agree with the suggestion that RNA-seq in the posterior PS of WT and mutant embryos might be informative. However, it is very likely that within the proposed timeframe (E7.5 to E8.5) there are no significant differences between the wild type and the Tgfbr1 mutant embryos because there is no apparent axial phenotype in Tgfbr1 mutant embryos before the trunk to tail transition. Therefore, at this stage, we think that this experiment is out of the scope of the present manuscript. 

      (2) The activity of Tgfbr1 during the trunk-to-tail transition is critical for the development of tail endodermal tissues. Here the authors suggest again the involvement of the posterior PS/allantois region, but a similar phenotype can also be observed for instance in the absence of Snai1 in the caudal epiblast (Dias et al., 2020)... It would be important to assess/discuss the origin of those morphogenetic problems in the gut. Is it due to the reallocation of NMC cells into the CNH? The tailbud-EMT process? LPMPs specification?... Regional mutations or gain of functions of Snai1 or Tgfbr1 in the caudal epiblast would help answer the question.  

      The endodermal phenotype in the Snai1 mutants is different to that observed in the Tgfbr1 mutants. As can be observed in Figures 3, 4 and 5 of Dias et al., the tailbud is absent and replaced by a structure that extends the epiblast. As a consequence, the endoderm finishes at the base of that structure, even expanding to make a structure resembling the cloaca, which is different to what is seen in the Tgfbr1 mutants. In this case, the lack of tail gut is likely to result either from the lack of formation of the progenitors of the gut endoderm or from the dissociation of what would be the tail bud from the LPM. Actually, hindlimb/pericloacal mesoderm markers, like Tbx4, are preserved in the Snai1 mutant. As for the Snai1 gain of function experiment, also already reported in Dias et al 2020, the destiny of these cells is not clear. The ISH for Foxa2 showed extra signals, but as it is not an exclusive marker for endoderm it is not possible to know whether any of these signals correspond to endodermal tissues.

      Regarding the development of tail endodermal tissues, the authors suggest that it occurs from a structure derived from the PS that is located posteriorly, in the tailbud, after the tip of the growing gut. This is an important and novel point as it suggests that the primordia of the endoderm is not wholly specified during gastrulation. So the observation should be well supported. How can Anastasiia et al. distinguish such "structure" from the actual developing gut? Does it have a distinct molecular signature or any morphological landmark that enables its separation from the actual gut? The data suggests that the region highlighted in Supplementary Figure 4Ab contains part of the actual gut tube (the same is suggested in Figure 5B). If the authors think otherwise, they must characterize that region of the tailbud by doing a thorough morphological and gene/protein expression analysis and assess its potency, via transplantation experiments. Also, the authors' claim mostly relies on the DiI experiments and those have three problems: #1 Anastasiia et al. assess "tail" endodermal growth at E9.5 when the correct stage to do it is after E10.5 (after tailbud formation). 2# Incongruencies, low number (only three embryos), and diversity in the results shown in Figure 8 and Supplementary Figure 4. For instance, despite similar staining at 0h, the extension and amount of DiI present in the gut tube after 20h varies significantly amongst the differently labeled embryos. A possible explanation lies in the abnormal leakiness of the DiI labelings and that is confirmed by the observations shown in Supplementary Figure 4M-O; the same for Supplementary Figure 4G, which shows a substantial amount of DiI in the neural tube. 3# The authors must provide high-quality data showing which tissues/regions were labelled at time 0h, including transversal and sagittal sections as they did for the 20h time-point. 
      Additionally, it is important to re-orient the sagittal optical sections to a position that also shows the neural tube (like a mid-sagittal section) and include information concerning the AP/DV axis, as well as the location of the transversal optical sections in the sagittal image. 

      As described in the reply to reviewer 1, Apela is expressed in the nascent tail gut endoderm but not in more anterior areas except for a foregut pocket, and becomes downregulated as the tube acquires endodermal signatures. Therefore, the structure to which the reviewer refers might indeed represent a group of progenitors that extend the tail gut. The observation that this property is seen only in the tail gut as it grows already separates this region of the gut, which in the end does not contribute to mature organs, from more anterior areas of the endoderm (essentially anterior to the cloaca) that will become a relevant tissue of the intestinal organs. Our DiI labelling experiment was aimed at testing whether this pool of cells contributes to the gut but does not allow us to determine the nature of those cells, a question that will require further research (discussed on p. 17) and that we think is beyond the scope of the present manuscript.

      Regarding the suggestion to label at E10.5, we agree that at E9.5 the tail bud, in terms of NMCs, is not completely formed; for example, the neuropore is not yet closed. However, we are more interested in regression of the epiblast, which is complete by E9.5. Injecting at E9.5 also has technical advantages for us: first, because in our hands earlier embryos grow better in culture, and second, because it is easier to inject in the tailbud at E9.5, as it is a little bit bigger than at E10.5. Therefore, injecting at E9.5 is less prone to technical artifacts due to injection inaccuracy and compromised growth in culture.

      We agree that the injected DiI could also leak into NMPs, which might be located in the same area. However, while this could result in labeling of the neural tube, it would not affect the interpretation of the finding of labeled cells in the tail gut. Indeed, the presence of this label in the gut epithelium indicates the presence of progenitors in the injected region of the tail gut. We added some considerations of this possible leakage to the results section of the manuscript (p. 15). We thank the reviewer for drawing our attention to this issue. 

      We also now provide high quality data showing labelled tissue at 0h in Supplementary figure 8A-c’, higher magnification images in Fig. 8, and reoriented optical sections in Fig.6 and in Supplementary Fig. 7, including axis and location of the sections as suggested by the reviewer.

      Minor concerns/comments: 

      (1) The abstract is quite long, though this might be fine for this journal. 

      (2) In relation to the comment on the abstract, the manuscript needs an initial Figure describing the events that are described in the introduction. Otherwise, the manuscript will only be accessible to mouse embryologists.

      We have a figure summarizing the results at the end of the manuscript, so we think that including a similar figure at the beginning might be redundant. What we could do, if required, is to include this type of schematic as a graphical abstract.

      (3) The authors need to clarify what they mean when they use the following expressions "PS fate" and "fate of the posterior PS".

      I do not think that we have used such expressions. Indeed, they did not come up when we ran a “find” in the Word document. However, they would mean the tissue that would come out from them at later developmental stages.

      (4) The assessment of Isl1 expression in Tgfbr1 mutant and transgenic mouse embryos would be better indicative of their molecular relationship than a comparative phenotypic analysis. 

      These data have been reported in Dias et al 2020 and Jurberg et al 2013, both cited in the manuscript.  

      (5) The authors should explain or discuss what the upregulation of Foxa2 in the posterior end of Tgfbr1 mutants means.

      While an upregulation is apparent in the figure, looking at other pictures we cannot be sure of this being a significantly quantifiable up-regulation. We therefore removed the statement from the text.

      (6) What happens to the intermediate mesoderm during the trunk-to-tail transition? Is Tgfbr1 involved in the regulation of its development?

      We have tested this using Pax2, added the relevant data in Supplementary Fig. 1, and described them in the results.

      (7) The term "potential" should not be used during the description of DiI labeling experiments as this technique only assesses cell fate.

      Corrected

      (8) Some figures lack AP/DV axis information (e.g. Figures 6, C, and D).

      Corrected

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to extend our gratitude to the reviewers for their meticulous analysis and constructive feedback on our manuscript. We have revised our paper based on the suggestions regarding supporting literature and the theory behind CAPs along with detailed insights regarding our methods. Their suggestions have been extremely useful in strengthening the clarity and rigor of our manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) There are no obvious problems with this paper and it is relatively straightforward. There are some challenges that I would like to suggest. These variants have multiple mutations, so it would be interesting if you could drill down to find out which mutation is the most important for the collective changes reported here. I would like to see a sequence alignment of these variants, perhaps in the supplemental material, just to get some indication of the extent of mutations involved.

      Finding the most important mutation within a set is a tricky question, as each mutation changes the way future mutations will affect function due to epistasis. Indeed, this is what we aim to explore in this work. To illustrate this point, we included a new supplementary figure S5A. Three critical mutations that emerged quickly, and were frequently observed in other dominant variants, were S477N, T478K, and N501Y. Thus, we computed the EpiScore values of these three mutations, with several critical residues contributing to hACE2 binding. The EpiScore distribution indicates that residues 477, 478, and 501 have strong epistatic (i.e., non-additive) interactions, as indicated by EpiScore values above 2.0.

      To further investigate these epistatic interactions, we first conducted MD simulations and computed the DFI profile of these three single mutants. We analyzed how different the DFI scores of the hACE2 binding interface residues of the RBD are, across three single mutants with Omicron, Delta, and Omicron XBB variants (Fig S5B). Fig S5B shows how mutations at these particular sites affect the binding interface DFI in various backgrounds, as the three mutations are also observed in the Omicron, XBB, and XBB 1.5 variants. If the difference in the DFI profile of the mutant and the given variant is close to 0, then we could safely state that this mutation affected the variant the most. However, what we observe is quite the opposite: the DFI profile of the mutation is significantly different in different variant backgrounds. While these mutations may change overall behavior, their individual contributions to overall function are more difficult to pin down because overall function is dependent on the non-additive interactions between many different residues.
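      The comparison described above can be sketched in a few lines. This is only an illustrative sketch, not the analysis code used for Fig S5B: the function names, the residue numbers in the usage note, and the DFI values are hypothetical placeholders, and each DFI profile is assumed to be available as a mapping from residue number to DFI score.

```python
from typing import Dict, Iterable


def dfi_difference(
    mutant: Dict[int, float],
    variant: Dict[int, float],
    interface: Iterable[int],
) -> Dict[int, float]:
    """Per-residue DFI difference (single mutant minus variant), restricted
    to the interface residues. Profiles are assumed to be keyed by residue number."""
    return {res: mutant[res] - variant[res] for res in interface}


def mean_abs_difference(diff: Dict[int, float]) -> float:
    """Average magnitude of the difference profile. A value near 0 would mean
    the single mutant alone reproduces the variant's interface flexibility."""
    return sum(abs(v) for v in diff.values()) / len(diff)


# Hypothetical placeholder values, not data from the study.
single_mutant = {493: 0.50, 498: 0.75, 501: 0.25}
variant_background = {493: 0.25, 498: 0.25, 501: 0.75}
difference = dfi_difference(single_mutant, variant_background, [493, 498, 501])
```

      Under these assumptions, a difference profile that stays far from zero in one background and takes a different shape in another is what the non-additive reading above would predict.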

      Author response image 1.

      (A) Three critical mutations that emerged quickly, and were frequently observed in other dominant variants, were S477N, T478K, and N501Y. EpiScores of sites 477, 478, and 501 with one another are shown with k = the binding interface of the open chain. These residues are highly epistatic, producing higher responses than expected when perturbed together. (B) The difference in the dynamic flexibility profiles between the single mutants and the most common variants for the hACE2 binding residues of the RBD. DFI profiles exhibit significant variation from zero, and also show different flexibility in each background variant, highlighting the critical non-additive interactions of the other mutation in the given background variant. Thus, these three critical mutations, impacting binding affinity, do not solely contribute to the binding. There are epistatic interactions with the other mutations in VOCs that shape the dynamics of the binding interface to modulate binding affinity with hACE2.

      As we discussed above, while the epistatic interactions are crucial and the collective impact of the mutations shapes the mutational landscape of the spike protein, we would like to note that mutation S486P is one of the critical mutations we identify, modulating both antibody and hACE2 binding, and our analysis reveals its strong non-additive interactions with the other mutational sites. This mutational site appears in both XBB1.5 and earlier Omicron strains, which highlights its importance in the functional evolution of the spike protein. CAPs 346R, 486F, and 498Q may also be important, as they have a high EpiScore, indicating critical epistatic interaction with many mutation sites.

      Regarding the suggestion about presenting the alignment of the different variants, we have attached a mutation table, highlighting the mutated residues for each strain compared to the reference sequence, as supplemental Figure S1 along with the full alignment file.

      (2) Also, I am wondering if it would be possible to insert some of these flexibilities and their correlations directly into the elastic network models to enable a simpler interpretation of these results. I realize this is beyond the scope of the present work, but such an effort might help in understanding these relatively complex effects.

      This is a great suggestion. A similar analysis has been performed for different proteins by Mcleash (See doi: 10.1016/j.bpj.2015.08.009) by modulating the spring constants of specific positions to alter specific flexibility and evaluating the change in elastic free energy to identify critical mutation (in particular, allosteric mutation) sites. We will be happy to pursue this as future work.

      Minor

      (3) 1 typo on line 443 - should be binding instead of biding.

      Fixed, thanks for spotting that.

      (4) The two shades of blue in Fig. 4B were not distinguishable in my version.

      To fix this, we have changed the overlapping residues between Delta and Omicron to a higher contrast shade of blue.

      (5) Compensatory is often used in an entirely different way - additional mutations that help to recover native function in the presence of a deleterious mutation.

      Although our previous study (Ose et al. 2022, Biophysical Journal) shows that compensatory mutations were generally additive, the two ideas are not one and the same. We thank the reviewer for pointing this out. Therefore, to clarify, we have now described our results in terms of dynamic additivity, rather than compensation.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors note that the identified CAPs overlap with those of others (Cagliani et al. 2020; Singh and Yi 2021; Starr, Zepeda, et al. 2022). In itself, this merits a deeper discussion and explicit indication of which positions are not identified. However, there is one point that I believe may represent a fundamental flaw in this study in that the calculation of EP from the alignment of S proteins ignores entirely the differences in the interacting interface with which S for different coronaviruses in the alignment interact in the different receptors in each host species. This may be the reason why so many "CAPs" are in the RBD. The authors should at the very least make a convincing case of why they are not simply detecting constraints imposed by the different interacting partners, at least in the case of positions within the RBD interface with ACE2. Another point that the authors should discuss is that ACE2 is not the only receptor that facilitates infection, TMPRSS2 and possibly others have been identified as well. The results should be discussed in light of this.

      To begin with, we have now explicitly noted (on line 135) that “sites 478, 486, 498, and 681 have already been implicated in SARS-CoV-2 evolution, leaving the remaining 11 CAPs as undiscovered candidate sites for adaptation.” Evolutionary analyses are done using orthologous protein sequences, so there is no way to integrate information on different receptors in each host species in the calculation of EPs. However, we appreciate that the preponderance of CAPs in the RBD is likely due to different binding environments. We have added the following text (on line 83) to clarify our point: “Adaptation in this case means a virus which can successfully infect human hosts. As CAPs are unexpected polymorphisms under neutral theory, their existence implies a non-neutral effect. This can come in the form of functional changes (Liu et al. 2016) or compensation for functional changes (Ose et al. 2022). Therefore, we suspect that these CAPs, being unexpected changes from coronaviruses across other host species with different binding substrates, may be partially responsible for the functional change of allowing human infection.” This hypothesis is supported by the overlap of CAPs we identified with the positions identified in other studies (e.g., 478, 486, 498, and 681). Binding to TMPRSS2 and other substrates is also covered by this analysis as it is a measure of overall evolutionary fitness, rather than binding to any specific substrate. Our paper does focus on discussing hACE2 binding and mentions furin cleavage, but indeed lacks discussion on the role of TMPRSS2. We have added the following text to line 157: “Another host cell protease, TMPRSS2, facilitates viral attachment to the surface of target cells upon binding either to sites Arg815/Ser816, or Arg685/Ser686 which overlaps with the furin cleavage site 676-689, further emphasizing the importance of this area (Hoffmann et al. 2020b; Fraser et al. 2022).”

      (2) Turning now to the computational methods utilized to study dynamics, I have serious reservations about the novelty of the results as well as the validity of the methodology. First of all, the authors mention the work of Teruel et al. (PLOS Comp Bio 2021) in an extremely superficial fashion and do not mention at all a second manuscript by Teruel et al. (Biorxiv 2021.12.14.472622 (2021)). However, the work by Teruel et al. identifies positions and specific mutations that affect the dynamics of S and the evolution of the SARS-CoV-2 virus in light of immune escape, ACE2 binding, and open and closed state dynamics. The specific differences in approach should be noted but the results specifically should be compared. This omission is evident throughout the manuscript. Several other groups have also published on the use of normal-mode analysis methods to understand the Spike protein, among them Verkhivker et al., Zhou et al., Majumder et al., etc.

      Thank you for your suggestions. Upon further examination of the listed papers, we have added citations to other groups employing similar methods. However, it's worth noting that the results of Teruel et al.'s studies are generally not directly comparable to our own. Particularly, they examine specific individual mutations and overall dynamical signatures associated with them, whereas our results are always considered in the context of epistasis and joint effects with CAPs, and all mutations belong to the common variants. Although important mutations may be highlighted in both cases, it is for very different reasons. Nevertheless, we provide a more detailed mention of the results of both studies. See lines 178, 255, and 393.

      (3) The last concern that I have is with respect to the methodology. The dynamic couplings and the derived index (DCI) are entirely based on the use of the elastic network model presented which is strictly sequence-agnostic. Only C-alpha positions are taken into consideration and no information about the side-chain is considered in any manner. Of course, the specific sequence of a protein will affect the unique placement of C-alpha atoms (i.e., mutations affect structure), therefore even ANM or ENM can to some extent predict the effect of mutations in as much as these have an effect on the structure, either experimentally determined or correctly and even incorrectly modelled. However, such an approach needs to be discussed in far deeper detail when it comes to positions on the surface of a protein such that the reader can gauge if the observed effects are the result of modelling errors.

      We would like to clarify that most of our results do not involve simulations of different variants, but rather how characteristic mutation sites for those variants contribute to overall dynamics. For the full spike, we operate on only two simulations: open and closed. When we do analyze different variants, starting on line 438, the observed difference does not come from the structure, but from the covariance matrix obtained from molecular dynamics (MD) simulations, which are sensitive to single amino acid changes.

      Reviewer #3 (Recommendations For The Authors):

      (1) On line 99 there is a misspelling, 'withing'.

      It has been fixed. Thanks for spotting that.

      (2) Some graphical suggestions to make the figures easier to read:

      In Figure 1C, a labeled circle around the important sites, the receptor binding domain, and the Furin cleavage site, would help the reader orient themselves. Moreover, it would make clear which CAPs are NOT in the noteworthy sites described in the text.

      Good idea. We have added transparent spheres and labels to show hACE2 binding sites and Furin cleavage sites.

      In Figure 2C the colors are a bit low contrast; moreover, there are multiple text sizes on the same figure which should perhaps be avoided to ensure legibility.

      We have made yellow brighter and standardized font sizes.

      Figure 3 is a bit dry, perhaps indicating in which bins the 'interesting' sites could be informative.

      Thank you for the suggestion, but the overall goal of Figure 3 is to illustrate that the mutational landscape is governed by the equilibrium dynamics in which flexible sites undergo more mutations during the evolution of the CoV2 spike protein. Therefore, adding additional positional information may complicate our message.

      Figure 4, the previous suggestions about readability apply.

      We ensured same-sized text and higher-contrast colors.

      Figure 5B, the residue labels are too small.

      We increased the font size of the residue labels.

      In Figure 8 maybe adding Delta to let the reader orient themselves would be helpful to the discussion.

      Unfortunately, there is no single work that has experimentally quantified binding affinities towards hACE2 for all the variants. When we conducted the same analysis for the Delta variant in Figure 8, the experimental values were obtained from a different source (doi: 10.1016/j.cell.2022.01.001) and the values were significantly different from the experimental work we used for Omicron (Yue et al. 2023). When we adjusted for the difference in the experimentally measured binding affinity values of the original Wuhan strain between these two separate studies, we observed a similar correlation, as seen below. However, we think this might not be a proper representation. Therefore, we chose to keep the original figure.

      Author response image 2.

      The %DFI calculations for variants Delta, Omicron, XBB, and XBB 1.5. (A) %DFI profile of the variants are plotted in the same panel. The grey shaded areas and dashed lines indicate the ACE2 binding regions, whereas the red dashed lines show the antibody binding residues. (B) The sum of %DFI values of RBD-hACE2 interface residues. The trend of total %DFI with the log of Kd values overlaps with the one seen with the experiments. (C) The RBD antibody binding residues are used to calculate the sum of %DFI. The ranking captured with the total %DFI agrees with the susceptibility fold reduction values from the experiments.

      (3) Replicas of the MD simulations would make the conclusions stronger in my opinion.

      We ran a 1 µs simulation and performed convergence analysis for the MD simulations following prior work (Sawle and Ghosh 2016). More importantly, we also evaluated the statistical significance of the computed DFI values, as explained in detail below (please see the answer to question 3 of Reviewer #3 (Public Review)).

      Reviewer #3 (Public Review):

      (1) A longer discussion of how the 19 orthologous coronavirus sequences were chosen would be helpful, as the rest of the paper hinges on this initial choice.

      The following explanation has been added on line 114: EP scores of the amino acid variants of the S protein were obtained using a Maximum Likelihood phylogeny (Kumar et al. 2018) built from 19 orthologous coronavirus sequences. Sequences were selected by examining available non-human sequences with a sequence identity of 70% or above to the human SARS-CoV-2 S protein sequence. This cutoff allows for divergence over evolutionary history such that each amino acid position had ample time to experience purifying selection, whilst limiting ourselves to closely related coronaviruses (Figure 1A).
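      For readers unfamiliar with an identity cutoff, the selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to assemble the 19 sequences: it assumes the sequences are already aligned to the reference (equal length, gaps written as "-"), it computes identity only over columns where neither sequence has a gap (one of several common conventions), and the function names and toy sequences are hypothetical.

```python
from typing import Dict


def percent_identity(ref: str, other: str) -> float:
    """Fraction of identical residues over aligned columns where neither
    sequence has a gap. Both inputs must come from the same alignment."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ref, other) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def select_orthologs(
    ref: str, candidates: Dict[str, str], cutoff: float = 0.70
) -> Dict[str, str]:
    """Keep candidate sequences whose identity to the reference is at or
    above the cutoff (0.70 mirrors the 70% threshold quoted above)."""
    return {
        name: seq
        for name, seq in candidates.items()
        if percent_identity(ref, seq) >= cutoff
    }


# Toy aligned sequences, purely illustrative.
reference = "ACDEFGHIKL"
toy_candidates = {
    "close_relative": "ACDEFGHIKV",    # 9 of 10 columns identical
    "distant_relative": "ACDWWWWWWW",  # 3 of 10 columns identical
    "gapped_relative": "ACDEFGH---",   # 7 of 7 ungapped columns identical
}
selected = select_orthologs(reference, toy_candidates)
```

      With these toy inputs, only the distant sequence falls below the cutoff. How gapped columns are counted changes the identity value, so the convention would need to match whatever tool produced the reported percentages.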

      (2) The 'reasonable similarity' with previously published data is not well defined, nor was there any comment about some of the residues analyzed (namely 417-484). We have revised this part of the manuscript and added this to the revised version.

      We removed the line about reasonable similarity as it was vague, added a line about residues 417-484, and revised the text accordingly, starting on line 354.

      (3) There seem to be no replicas of the MD simulations, nor a discussion of the convergence of these simulations. A more detailed description of the equilibration and production schemes used in MD would be helpful. Moreover, there is no discussion of how the equilibration procedure is evaluated, in particular for non-experts this would be helpful in judging the reliability of the procedure.

      We opted for a single, extended equilibrium simulation to comprehensively explore the long-term behavior of the system. Given the specific nature of our investigation and resource constraints, a well-converged, prolonged simulation was deemed a practical and scientifically valid approach, providing a thorough understanding of the system's dynamics (doi: 10.33011/livecoms.1.1.5957, https://doi.org/10.1146/annurev-biophys-042910-155255).

      We updated our methods section starting on line 605 with extended information about the MD simulations and the convergence criteria for the equilibrium simulations. We also added a section that explains our analysis to check the statistical significance of the obtained DFI values.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate the recommendations of the reviewers and have performed further analyses with existing data where requested. 

      Below are our responses to each of the individual points. 

      Reviewer #1 (Recommendations For The Authors):

      (1) P11 mouse retina is still quite young, would MG isolated from adult retina be more interesting and relevant to disease-oriented cell replacement therapy? How efficiently would the sci-Plex system work for in vitro screen of mature murine MG?

      Thank you for bringing this up. While a protocol for the conversion of MG to neurons with adult mice in vivo exists, it has proven to be more difficult to maintain adult MG in dissociated cell cultures, due to their more limited proliferation in vitro. This makes it difficult to use the sci-Plex assay, since cell number is limiting for treatment conditions. Therefore, we have chosen the strategy of screening on P11, where MG undergo proliferative cell divisions in dissociated cultures, allowing us to grow the millions of cells needed for this assay, and then to test the efficacy of the compounds we find from the screen with an adult in vivo assay.

      (2) The study identified and tested the compounds individually, how would a combination of the compounds work in vivo? It would be interesting to examine how different combinations may affect the reprogramming efficiency and neuronal compositions.

      We agree that this would be very interesting to investigate.  However, the number of treatment conditions then expands beyond the scale of the current sci-Plex technology with the number of MG that we are able to collect.  We instead adopted the strategy of casting a very wide net to identify additional molecular pathways that might be important in the reprogramming process.

      (3) In-depth mechanistic and/or functional studies of the reprogrammed MG are highly desirable to improve the quality and significance of the study and to better understand how the compounds may influence the signaling and the reprogramming process.

      While we agree that this would strengthen the study, this would increase the scope of the required revisions considerably. We are very interested in following up on some of the hits and look forward to providing additional details of mechanisms in future publications.  However, we feel that reporting this method and the results will stimulate those interested in reprogramming glia in other areas of the nervous system to test the compounds we identified in this assay.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors employed two protocols to initiate direct reprogramming of MG into retinal neurons in vitro. These protocols, referred to as "Timecourse" and "Pulse," involved short-term treatments lasting no more than 5 days. However, the findings obtained indicate that these brief treatments were insufficient to achieve a stable conversion. This conclusion is supported by the comparison between the "4 days (Timecourse)" and "4 days (Pulse)" conditions, as depicted in Figure 1 (D and E). In this set of experiments, labeling cells that express specific neuronal markers as neurons raises concerns, as these cells may have multiple fates, either died, reverted, arrested in certain intermediate stages, or converted to functional neurons. It is thus critical to determine whether the conversion to functional neurons is enhanced.

      We thank you for your concern about this. We aimed to be very careful in our naming. In our naming scheme for this figure, we only consider the small number of cells with specific Bipolar markers (Trpm1, Grm6, Cabp5, Otx2) to be neurons, based on previous publications (Jorstad et al. 2017; Todd et al. 2021; Todd et al. 2022; Todd et al. 2020). The other cells that have some neuronal markers are identified as neuronal precursors (NeuPre) and are, as you mentioned, not necessarily mature/functional. While these NeuPre cells may eventually have multiple fates, may die, or may revert to more ProL cells at some rate, we believe it’s fair to define them as Neuronal Precursors due to the genes they are expressing (Dcx, Snap25, Elavl3, Gap43) at the moment of collection.

      Furthermore, your statement indicating that “the findings obtained indicate that these brief treatments were insufficient to achieve a stable conversion” is not what we intended to demonstrate. The text will be reworked to reflect what we hoped to convey. We acknowledge that 1) the majority of cells are not stably converted, and 2) the levels of NeuPre cells are lower in the Pulse experiment overall, but this is true even at Day 5 when the conditions should be the same across experiments. The Pulse and Timecourse experiments were done on different days, and having previously found that there are differences in MG to BP conversion rate from experiment to experiment, these results were not unexpected. Of more note to us was that while ProL cells, Transition cells, and MG have very different patterns of abundance across time when comparing the experiments, the NeuPre cells accumulate at a similar time and pattern across the two experiments. This indicated to us that they uniquely have some amount of Ascl1-independent stability in their cell fate even when exposed to Ascl1 for as little as 3 days. See Author response image 1 below. This plot will be added to Fig. S1.

      Author response image 1.

      (2) The authors made a claim that a pseudo time value of 15 represents a crucial timepoint where the transition in cell fate becomes stable and ceases to rely on ectopic Ascl1 expression. However, it is essential to provide concrete evidence to substantiate this assertion. It is prudent to perform quantitative analyses rather than relying solely on the deduced trajectory to make this claim.

      This is a fair point; the value of 15 was estimated by eye. We have returned to the data and estimated a density function for the pseudotime scores of the cells from the 1, 2, 3, and 4 day conditions in both the Pulse and Timecourse experiments (Author response image 2A-B below). We then calculated 16 to be the local minimum between the pseudotime values of 10-20 for the Pulse experiment (Blue line). When comparing the two experiments, it’s apparent that there is a massive accumulation of cells with a pseudotime value just lower than 16 in the Timecourse experiment (values 10-15), and very few cells across the same region for the Pulse experiment, indicating some dependence on continued Ascl1 expression for the cell fate that exists from pseudotime 10-16 (mostly ProL cells). To the contrary, cells with greater pseudotime values exist across both experiments at similar levels.
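
      As an illustration of this kind of threshold estimate, the sketch below fits a Gaussian kernel density to a list of pseudotime scores and returns the deepest interior local minimum within a window. The kernel, bandwidth, grid step, and toy data are assumptions chosen for illustration only; the response does not specify which density estimator or software was used.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def local_minimum_in_window(values, lo, hi, bandwidth=1.0, step=0.1):
    """Return the position of the deepest interior local minimum of the
    estimated density on [lo, hi], or None if the density has no interior
    local minimum in that window."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    dens = gaussian_kde(values, grid, bandwidth)
    minima = [
        i
        for i in range(1, n - 1)
        if dens[i] < dens[i - 1] and dens[i] <= dens[i + 1]
    ]
    if not minima:
        return None
    best = min(minima, key=lambda i: dens[i])
    return grid[best]


# Toy pseudotime scores (made up): two groups of cells placed symmetrically
# around 16, so the density dips between them.
toy_pseudotime = [11.0, 12.0, 12.0, 13.0, 19.0, 20.0, 20.0, 21.0]
threshold = local_minimum_in_window(toy_pseudotime, 10.0, 20.0)
```

      The location of the minimum depends on the bandwidth, so in practice it would be worth checking that the estimate is stable across a range of reasonable bandwidths.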

      We have also looked at the expression of Ascl1 along the pseudotime trajectory in the Timecourse experiment. Interestingly, and consistent with experiments in previous studies, both in vitro and in vivo (Todd et al. 2021; Todd et al. 2022; Todd et al. 2020), we see a decrease in Ascl1 expression as the cells move towards the end of the pseudotime trajectory (C below). It’s intriguing to us that the downregulation also happens right after a pseudotime value of 16. The temporal coalescence of the loss of Ascl1 expression in the Timecourse experiment with the persistence of cells with pseudotime values > 16 in the Pulse experiment provides strong evidence that we have identified the point at which cells stop expressing Ascl1 while maintaining more mature cell fates. The plots below will be added to the manuscript.

      Author response image 2.

      (3) It is intriguing to observe that the expression of Ascl1 was down-regulated in both neuronal precursors and bipolar cells in the mouse retina following tamoxifen and NMDA treatment (refer to Fig. 3C). However, the expression of ectopic Ascl1 should have been constitutively activated by tamoxifen. Therefore, if the GFP+ bipolar cells and neuronal precursors were indeed converted from Müller cells, we would expect to capture a high level of Ascl1 expression. How to account for this discrepancy? How is the expression of exogenous Ascl1 from a constitutive promoter attenuated?

      As discussed above, this has been observed previously. Ascl1 driven from the TTA transgenic mouse line is high in the MG, but declines as these cells are reprogrammed into neurons in vivo or in vitro. One possibility is that the TTA is not as active in neurons as in MG, but in other lines of transgenic mice, e.g., TRE-Atoh1 mice, the transgene continues to be expressed at a high level even in the differentiating neurons, so this downregulation appears to be unique to Ascl1. We do not understand why Ascl1 levels decline in the differentiating neurons, but this has been a consistent finding across several studies of in vivo and in vitro reprogramming.

      (4) Exogenous Ascl1 was shut down after other neuronal specific genes were induced during MG reprogramming in vitro. Is this also the case during Ascl1-mediated reprogramming in vivo? If so, do converting cells show a distinct gene expression program if exogenous Ascl1 is constitutively overexpressed?

      Yes, as can be seen in Fig 3C, Ascl1 expression is high in the MG and Transition cell populations, but decreases in the NeuPre and Bipolar cells. As stated above, continued high Ascl1 expression keeps cells in a more progenitor-like state. This is true in vivo and in vitro. This has been addressed more clearly in the revision.

      (5) As previously documented in their Science Advances publication, the authors have established the requirement of NMDA injury for facilitating the successful induction of neuronal conversion through Ascl1 over-expression. Why is injury required for MG conversion in vivo, but not in vitro? This is related to question #1 above that certain signals may be required for the full conversion process, not just the initial induction of a few neuronal specific genes.

      While the in vitro and in vivo systems share similarities, there are key differences, which affect what must be done to the cells in order to produce converted neurons. In our initial publication demonstrating that Ascl1 can reprogram mouse MG to a neurogenic state, we carried out our experiments in dissociated cell cultures (Pollak et al 2013) like those described in this report.  At that time, we did not need to add either NMDA or TSA to the cultures to induce neurogenesis from Ascl1.  However, when we attempted the reprogramming in vivo, we found that after postnatal day 8, injury and TSA were required in vivo (Ueki et al; Jorstad et al). We surmise that the massive neuronal loss that occurs in establishing dissociated MG cultures replaces the NMDA injury we carry out in vivo.   

      Regarding your second point about the requirement for more than “just the initial induction of a few neuronal specific genes”: this is definitely true. When we carry out reprogramming in vivo with Ascl1 or other transcription factors, the MG-derived neurons acquire neuronal morphology, develop neuron-like electrophysiological properties, integrate into the retinal circuit and respond to light stimulus; however, they are still not identical in gene expression or morphology to normal retinal neurons. This is why we are continuously looking for more compounds or conditions that can help improve the process.

      (6) The discovery that Metformin acts as a stimulator for MG-to-neuron conversion is interesting.

      However, before drawing definitive conclusions, several questions need to be addressed:

      (a) As specific small molecules have been identified to change cell fates, the question is whether Metformin and other effective compounds can function alone or have to act in conjunction with Ascl1. This can and should be tested in vitro by simply treating MG with Metformin but not doxycycline.

      To our knowledge there are no convincing in vivo trials in which neurons have been generated from MG using only combinations of small molecules. Because Metformin was identified in vitro due to the increase in recovered cells and not an increase in % neurons, we especially doubt it would have the desired increase in neurons without expression of a transcription factor.  

      (b) Metformin is known to target AMPK, but this is unlikely the only target of the drug. Does AMPK knockdown have the same enhancement effect?

      In the drug screen, we also tested the AMPK inhibitor Dorsomorphin dihydrochloride, but it didn’t have any effect. However, Metformin is an activator, so it would be interesting to see in future studies if Dorsomorphin dihydrochloride could inhibit the effect of Metformin or if the enhancement is acting independently.  

      (c) Is the effect of Metformin specific for Ascl1 or any TF(s) that stimulates MG-to-neuron conversion?

      We would like to follow up with this in future.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study advances the understanding of physiological mechanisms in deep-sea Planctomycetes bacteria, revealing unique characteristics such as being the only known Phycisphaerae using a budding mode of division, extensive involvement in nitrate assimilation, and the release of phage particles without cell death. The study uses convincing evidence, based on experiments using growth assays, phylogenetics, transcriptomics, and gene expression data. The work will be of interest to bacteriologists and microbiologists in general.

      Response: Thanks for the Editor’s and Reviewers’ positive comments, which help us improve the quality of our manuscript entitled “Physiological and metabolic insights into the first cultured anaerobic representative of deep-sea Planctomycetes bacteria” (paper#eLife-RP-RA-2023-89874). The comments are all valuable, and we have studied the comments carefully and have made corresponding revisions according to the suggestions. Revised portions are marked in blue in the modified manuscript.

      Please find the detailed responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors of the manuscript cultivated a Planctomycetes strain affiliated with Phycisphaerae. The strain was one of the few Planctomycetes from deep-sea environments and demonstrated several unique characteristics, such as being the only known Phycisphaerae using a budding mode of division, extensive involvement in nitrate assimilation, and being able to release phage particles without cell death. The manuscript is generally well-written. However, a few issues need to be more clearly addressed, especially regarding the identification and characterization of the phage.

      Response: Thanks for your positive comments. Please find the detailed responses below.

      Reviewer #1 (Recommendations For The Authors):

      • Line 75-77, add a reference for this statement.

      Response: Thanks for your suggestion. We have added a reference (Fuerst and Sagulenko, 2011) for this statement in the revised manuscript (Line 77).

      References related to this response:

      Fuerst, J.A., and Sagulenko, E. Beyond the bacterium: planctomycetes challenge our concepts of microbial structure and function. Nat Rev Microbiol. 2011;9:403-413.

      • Line 124-134, add key statistics (such as ANI) of strain ZRK32 and KS4 to this section.

      Response: Thanks for your suggestion. We added the key statistics of strain ZRK32 and KS4, described as “Based on the 16S rRNA sequence of strain ZRK32, a sequence similarity calculation using the NCBI server indicated that the closest relatives of strain ZRK32 were Poriferisphaera corsica KS4T (98.06%), Algisphaera agarilytica 06SJR6-2T (88.04%), Phycisphaera mikurensis NBRC 102666T (85.28%), and Tepidisphaera mucosa 2842T (82.94%). The recently proposed taxonomic threshold for species based on 16S rRNA gene sequence identity is 98.65% (Kim et al., 2014). Based on these criteria, we proposed that strain ZRK32 might be a novel representative of the genus Poriferisphaera. In addition, to clarify the phylogenetic position of strain ZRK32, the genome relatedness values were calculated by the average nucleotide identity (ANI), the tetranucleotide signatures (Tetra), and in silico DNA-DNA similarity (isDDH), between the genomes of strains ZRK32 and KS4. The ANIb, ANIm, Tetra, and isDDH values were 72.89%, 85.34%, 0.97385, and 20.90%, respectively (Table S1). These results together demonstrated that the genome relatedness values of strain ZRK32 were clearly below the established ‘cut-off’ values (ANIb: 95%, ANIm: 95%, Tetra: 0.99, isDDH: 70%) for defining bacterial species, suggesting strain ZRK32 represents a novel strain within the genus Poriferisphaera.” in the revised manuscript (Lines 124-139).
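
      The comparison against the cut-off values quoted above can be written out as a small check. This is only an illustrative sketch: the numbers are those reported in this response, while the dictionary and function names are hypothetical.

```python
# Species-level cut-offs and measured values as quoted in the response above.
CUTOFFS = {"16S rRNA": 98.65, "ANIb": 95.0, "ANIm": 95.0, "Tetra": 0.99, "isDDH": 70.0}
ZRK32_VS_KS4 = {"16S rRNA": 98.06, "ANIb": 72.89, "ANIm": 85.34, "Tetra": 0.97385, "isDDH": 20.90}


def below_species_cutoffs(measured, cutoffs=CUTOFFS):
    """Map each metric to True when the measured value is below its cut-off."""
    return {metric: measured[metric] < cutoff for metric, cutoff in cutoffs.items()}


flags = below_species_cutoffs(ZRK32_VS_KS4)
for metric, is_below in flags.items():
    status = "below" if is_below else "at or above"
    print(f"{metric}: {ZRK32_VS_KS4[metric]} is {status} the cut-off of {CUTOFFS[metric]}")
```

      Every metric falls below its cut-off, which is the basis for the authors' conclusion that strain ZRK32 is distinct from strain KS4.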

      • Fig. 2A missing description for figure key.

      Response: Thanks for your comments. We modified the Figure 2A, shown as below:

      Author response image 1.

      Figure 2. Growth assay and transcriptomic analysis of P. heterotrophicis ZRK32 strains cultivated in basal medium and rich medium.

      • Regarding the phage released, could this be a membrane vesicle-engulfed phage? I would recommend checking "Spontaneous Prophage Induction Contributes to the Production of Membrane Vesicles by the Gram-Positive Bacterium Lacticaseibacillus casei BL23" and "Chronic Release of Tailless Phage Particles from Lactococcus lactis" for further references.

      Response: Thanks for your valuable comments. We carefully read these two papers and found that phage ZRK32 is most likely a membrane vesicle-engulfed phage. We added the corresponding description as “Moreover, it has recently been reported that the tailless Caudoviricetes phage particles are enclosed in lipid membrane and are released from the host cells by a nonlytic mechanism (Liu et al., 2022), and the prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Considering that strain ZRK32 has a large number of membrane vesicles during cell growth (Figure S9), we speculated that Phage-ZRK32 might be a membrane vesicle-engulfed phage and its release should be related to membrane vesicles.” in the revised manuscript (Lines 381-388).

      References related to this response:

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. Chronic release of tailless phage particles from Lactococcus lactis. Appl Environ Microbiol. 2022; 88: e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23. mBio. 2022;13:e0237522.

      • How were the reference sequences for Fig. S10-S13 retrieved, was it by blasting the phage gene against the entire NCBI database, or only the virus sequence within the NCBI? Please clarify this.

      Response: Thanks for your comments. The reference sequences for Fig. S10-S13 were retrieved by blasting the phage gene against the entire NCBI database. We clarified this as “The reference sequences of four AMGs encoding amidoligase, glutamine amidotransferase, gamma-glutamylcyclotransferase, and glutathione synthase were retrieved by blasting the phage gene against the entire NCBI database, respectively.” in the revised manuscript (Lines 444-447).

      Reviewer #2 (Public Review):

      Summary:

      Planctomycetes encompass a group of bacteria with unique biological traits; their compartmentalized cells make them appear to be organisms in between prokaryotes and eukaryotes. However, only a few of the Planctomycetes bacteria have been cultured thus far, and this hampers insight into the biological traits of these evolutionarily important organisms. This work reports the methodological details of how to isolate deep-sea bacteria that could be recalcitrant to laboratory cultivation, and further reveals the distinct characteristics of a new species of deep-sea Planctomycetes bacterium, such as chronic phage release without breaking the host and the promotion of nitrogen utilization in the host and related bacteria. Therefore, the findings of this work are of importance in extending our knowledge of bacteria.

      Response: Thanks for your positive comments.

      Strengths:

      Through the combination of microscopic, physiological, genomic, and molecular biological approaches, this work reports the isolation and comprehensive investigation of the first anaerobic representative of deep-sea Planctomycetes bacteria, in particular its budding division and its release of phage without lysis of the cells. Most of the results and conclusions are supported by the experimental evidence.

      Response: Thanks for your positive comments.

      Weaknesses:

      1. While EMP glycolysis is predicted to be involved in energy conservation, no experimental evidence indicated any sugar utilization by the bacterium.

      Response: Thanks for your comments. We have previously tested the sugar utilization of strain ZRK32, and now added this description as “Consistent with the presence of EMP glycolysis pathway in strain ZRK32, we found that it could use a variety of sugars including glucose, maltose, fructose, isomaltose, galactose, D-mannose, and rhamnose (Table S2).” in the revised manuscript (Lines 281-284).

      2. "Anaerobic representative" is indicated in the Title; on the contrary, the bacterium is predicted to use the TCA cycle in energy metabolism.

      Response: Thanks for your valuable comments. Anaerobic microorganisms can use alternative electron acceptors (such as sulfate, nitrate, or iron) in place of oxygen for the TCA cycle. For example, Proteus mirabilis uses the whole oxidative TCA cycle without using oxygen as the final electron acceptor when it performs multicellular swarming (Alteri et al., 2012). In this study, all the genes involved in the TCA cycle were present in anaerobic strain ZRK32 and most of them were upregulated, thus we speculate that it might function through the complete TCA metabolic pathway to obtain energy. We added the related description as “Notably, when growing in the rich medium, the expressions of most genes involved in the TCA cycle and EMP glycolysis pathway in strain ZRK32 were upregulated (Figure 2B-D, Figure S5B and Figure S6), suggesting that strain ZRK32 might function through the complete TCA metabolic pathway and EMP glycolysis pathway to obtain energy for growth (Figure S8) (Zheng et al., 2021b). Consistent with the presence of EMP glycolysis pathway in strain ZRK32, we found that it could use a variety of sugars including glucose, maltose, fructose, isomaltose, galactose, D-mannose, and rhamnose (Table S2). As for the presence of TCA cycle in the anaerobic strain ZRK32, we propose that it might use alternative electron acceptors (such as sulfate, nitrate, or iron) in place of oxygen for the TCA cycle, as shown in other anaerobic bacteria (Alteri et al., 2012).” in the revised manuscript (Lines 277-287).

      References related to this response:

      Alteri CJ, Himpsl SD, Engstrom MD, Mobley HL. Anaerobic respiration using a complete oxidative TCA cycle drives multicellular swarming in Proteus mirabilis. mBio. 2012; 3(6): e00365-12.

      3. The possible mechanisms of the chronic phage release without breaking the host are not discussed.

      Response: Thanks for your valuable comments. The possible mechanism of the chronic phage release without breaking the host might be that it was enclosed in lipid membrane and released from the host cells by a nonlytic mechanism. We added the corresponding description as “Moreover, it has recently been reported that the tailless Caudoviricetes phage particles are enclosed in lipid membrane and are released from the host cells by a nonlytic mechanism (Liu et al., 2022), and the prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Considering that strain ZRK32 has a large number of membrane vesicles during cell growth (Figure S9), we speculated that Phage-ZRK32 might be a membrane vesicle-engulfed phage and its release should be related to membrane vesicles.” in the revised manuscript (Lines 381-388).

      References related to this response:

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. Chronic release of tailless phage particles from Lactococcus lactis. Appl Environ Microbiol. 2022; 88: e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23. mBio. 2022;13:e0237522.

      Reviewer #2 (Recommendations For The Authors):

      • Have you tested whether strain ZRK32 uses any sugars? If not, why does it use the EMP pathway to obtain energy?

      Response: Thanks for your comments. We have previously tested the sugar utilization of strain ZRK32, and now added this description as “Consistent with the presence of EMP glycolysis pathway in strain ZRK32, we found that it could use a variety of sugars including glucose, maltose, fructose, isomaltose, galactose, D-mannose, and rhamnose (Table S2).” in the revised manuscript (Lines 281-284).

      • Further discussion on possible mechanisms of the chronic phage release without breaking the host is expected.

      Response: Thanks for your valuable comments. The possible mechanism of the chronic phage release without breaking the host might be that it was enclosed in lipid membrane and released from the host cells by a nonlytic mechanism. We added the corresponding description as “Moreover, it has recently been reported that the tailless Caudoviricetes phage particles are enclosed in lipid membrane and are released from the host cells by a nonlytic mechanism (Liu et al., 2022), and the prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Considering that strain ZRK32 has a large number of membrane vesicles during cell growth (Figure S9), we speculated that Phage-ZRK32 might be a membrane vesicle-engulfed phage and its release should be related to membrane vesicles.” in the revised manuscript (Lines 381-388).

      References related to this response:

      Liu Y, Alexeeva S, Bachmann H, Guerra Martínez J.A, Yeremenko N, Abee T et al. Chronic release of tailless phage particles from Lactococcus lactis. Appl Environ Microbiol. 2022; 88: e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23. mBio. 2022;13:e0237522.

      • It is recommended that the writing is improved, including presentation style and grammar.

      Response: Thanks for your comments. We have invited a native English speaker (Dr. Diana Walsh from Life Science Editors, USA) to revise our manuscript, which we hope will meet with your approval.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Millard and colleagues investigated if the analgesic effect of nicotine on pain sensitivity, assessed with two pain models, is mediated by Peak Alpha Frequency (PAF) recorded with resting state EEG. The authors found indeed that nicotine (4 mg, gum) reduced pain ratings during phasic heat pain but not cuff pressor algometry compared to placebo conditions. Nicotine also increased PAF (globally). However, mediation analysis revealed that the reduction in pain ratings elicited by the phasic heat pain after taking nicotine was not mediated by the changes in PAF. Also, the authors only partially replicated the correlation between PAF and pain sensitivity at baseline (before nicotine treatment). At the group-level no correlation was found, but an exploratory analysis showed that the negative correlation (lower PAF, higher pain sensitivity) was present in males but not in females. The authors discuss the lack of correlation.

      In general, the study is rigorous, methodology is sound and the paper is well-written. Results are compelling and sufficiently discussed.

      Strengths:

      Strengths of this study are the pre-registration, proper sample size calculation, and data analysis, as well as the presence of the analgesic effect of nicotine and the change in PAF.

      Weaknesses:

      It would be even more convincing if they had manipulated PAF directly.

      We thank Reviewer #1 for their positive and constructive comments regarding our study. We appreciate the view that the study was rigorous and methodologically sound, that the paper was well-written, and that the strengths included our pre-registration, sample size calculation, and data analysis.

      In response to the reviewer's comment about more directly manipulating Peak Alpha Frequency (PAF), we agree that such an approach could provide a more direct investigation of the role of PAF in pain processing. We chose nicotine to modulate PAF as the literature suggested it was associated with a reliable increase in PAF speed. As mentioned in our Discussion, there are several alternative methods to manipulate PAF, such as non-invasive brain stimulation techniques (NIBS) like transcranial alternating current stimulation (tACS) or neurofeedback training. These approaches could help clarify whether a causal relationship exists between PAF and pain sensitivity. However, methods such as NIBS still require further investigation, as there is little evidence that these approaches change PAF (Millard et al., 2024).

      Reviewer #2 (Public Review):

      Summary:

      The study by Millard et al. investigates the effect of nicotine on alpha peak frequency and pain in a very elaborate experimental design. According to the statistical analysis, the authors found a factor-corrected significant effect for prolonged heat pain but not for alpha peak frequency in response to the nicotine treatment.

      Strengths:

      I very much like the study design and that the authors followed their research line by aiming to provide a complete picture of the pain-related cortical impact of alpha peak frequency. This is very important work, even in the absence of any statistical significance. I also appreciate the preregistration of the study and the well-written and balanced introduction. However, it is important to give access to the preregistration beforehand.

      Weaknesses:

      The weakness of the study revolves around three aspects:

      (1) I am not entirely convinced that the authors' analysis strategy provides a sufficient signal-to-noise ratio to estimate the peak alpha frequency in each participant reliably. A source separation (ICA or similar) would have been better suited than electrode ROIs to extract the alpha signal. By using a source separation approach, different sources of alpha (mu, occipital alpha, laterality) could be disentangled.

      (2) Also, there's a hint in the literature (reference 49 in the manuscript) that the nicotine treatment may not work as intended. Instead, the authors' decision to use nicotine to modulate the peak alpha frequency and pain relied on other, not suitable work on chronic pain and permanent smokers. In the present study, the authors use nicotine treatment and transient painful stimulation on nonsmokers.

      (3) In my view, the discussion could be more critical for some aspects and the authors speculate towards directions their findings can not provide any evidence. Speculations are indeed very important to generate new ideas but should be restricted to the context of the study (experimental pain, acute interventions). The unfortunate decision to use nicotine severely hampered the authors' aim of the study.

      Impact:

      The impact of the study could be to show what has not worked to answer the research questions of the authors. The authors claim that their approach could be used to define a biomarker of pain. This is highly desirable but requires refined methods and, in order to make the tool really applicable, more accurate approaches at subject level.

      We thank reviewer #2 for their recognition of the study’s design, the importance of this research area, and the pre-registration of our study. In response to the weaknesses highlighted:

      (1) We appreciate the reviewer’s suggestion to improve the signal-to-noise ratio by applying source separation techniques, such as ICA, which have now been performed and incorporated into the manuscript. Our original decision to use sensor-level ROIs followed the precedent set in previous studies, our rationale being to improve reproducibility and avoid biases from picking individual electrodes or manually picking sources. We have added analyses using an automated pipeline that selects components based on the presence of a peak in the alpha range and alignment with a predefined template topography representing sensorimotor sites. Here again we found no significant differences in the mediation results that used a sensor space sensorimotor ROI, further supporting the robustness of the chosen approach. ICA could still potentially disentangle different sources of alpha, such as occipital alpha and mu rhythm, and provide new insights into the PAF-pain relationship. We have now added a discussion in the manuscript about the potential advantages of source separation techniques and suggest that the possible contributions of separate alpha sources be investigated and compared to sensor space PAF as a direction for future research.

      (2) We recognise the reviewer's concern regarding our choice of nicotine as a modulator of pain and alpha peak frequency (PAF). The meta-analysis by Ditre et al. (2016) indeed points to small effect sizes for nicotine's impact on experimental pain and highlights the potential for publication bias. However, our decision to use nicotine in this study was not primarily based on its direct analgesic effects, but rather on its well-documented ability to modulate PAF, in smoking and non-smoker populations, as outlined in our study aims.

      In this regard, the intentional use of nicotine was to assess whether changes in PAF could mediate alterations in pain. This approach aligns with the broader concept that a direct effect of an intervention is not necessary to observe indirect effects (Fairchild & McDaniel, 2017). We have, however, revised our introduction to further clarify this rationale, highlighting that nicotine was used as a tool for PAF modulation, not solely for its potential analgesic properties.

      (3) We agree with the reviewer’s observation that certain aspects of the Discussion could be more cautious, particularly regarding speculations about nicotine’s effects and PAF as a biomarker of pain. We have revised the Discussion to ensure that our interpretations are better grounded in the data from this study, clearly stating the limitations and avoiding overgeneralization. This revision focuses on a more critical evaluation of the potential relationships between PAF, nicotine, and pain sensitivity based solely on our experimental context.

      Finally, we also apologize for not providing access to the preregistration earlier. This was an oversight on our end, and we will ensure that future preregistrations are made available upfront.

      Reviewer #3 (Public Review):

      In this manuscript, Millard et al. investigate the effects of nicotine on pain sensitivity and peak alpha frequency (PAF) in resting state EEG. To this end, they ran a pre-registered, randomized, double-blind, placebo-controlled experiment involving 62 healthy adults who received either 4 mg nicotine gum (n=29) or placebo (n=33). Prolonged heat and pressure were used as pain models. Resting state EEG and pain intensity (assessed with a visual analog scale) were measured before and after the intervention. Additionally, several covariates (sex at birth, depression and anxiety symptoms, stress, sleep quality, among others) were recorded. Data was analyzed using ANCOVA-equivalent two-wave latent change score models, as well as repeated measures analysis of variance. Results do not show *experimentally relevant* changes of PAF or pain intensity scores for either of the prolonged pain models due to nicotine intake.

      The main strengths of the manuscript are its solid conceptual framework and the thorough experimental design. The researchers make a good case in the introduction and discussion for the need to further investigate the association of PAF and pain sensitivity. Furthermore, they proceed to carefully describe every aspect of the experiment in great detail, which is excellent for reproducibility purposes. Finally, they analyse the data from almost every possible angle and provide an extensive report of their results.

      The main weakness of the manuscript is the interpretation of these results. Even though some of the differences are statistically significant (e.g., global PAF, pain intensity ratings during heat pain), these differences are far from being experimentally or clinically relevant. The effect sizes observed are not sufficiently large to consider that pain sensitivity was modulated by the nicotine intake, which puts into question all the answers to the research questions posed in the study.

      We would like to express our gratitude to Reviewer #3 for their thoughtful and constructive review, including the positive feedback on the strengths of our study's conceptual framework, experimental design, and thorough methodological descriptions.

      We acknowledge the concern regarding the experimental and clinical relevance of some statistically significant results (e.g., global PAF and pain intensity during heat pain) and agree that small effect sizes may limit their practical implications. However, our primary goal was to assess whether nicotine-induced changes in PAF mediate pain changes, rather than to demonstrate large direct effects on pain sensitivity. Nicotine was chosen for its known ability to modulate PAF, and our focus was on the mechanistic role of PAF in pain perception. To clarify this, we have revised the discussion to better differentiate between statistical significance, experimental relevance, and clinical applicability. We emphasize that this study represents a preliminary step towards understanding PAF’s mechanistic role in pain, rather than a direct clinical application.

      We appreciate the suggestion to refine our interpretation. We have adjusted our language to ensure it aligns with the effect sizes observed and made recommendations for future research, such as testing different nicotine doses, to potentially uncover stronger or more clinically relevant effects.

      Although modest, we believe these findings offer valuable insights into the potential mechanisms by which nicotine affects alpha oscillations and pain. We have also discussed how these small effects could become more pronounced in different populations (e.g., chronic pain patients) and over time, offering guidance for future research on PAF modulation and pain sensitivity.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have a number of points that the authors may want to consider for this or future work.

      (1) By reviewing the literature provided by the authors in the introduction I think that using nicotine as a means to modulate pain and alpha peak frequency was a mistake. The only work that may give a hint on whether nicotine can modulate experimental pain is the meta-analysis by Ditre and colleagues (2016). They suggest that their small effect may contain a publication bias. I think the other "large body of evidence" is testing something else than analgesia.

      Thank you for your consideration of our choice of nicotine in the study. The meta-analysis by Ditre and colleagues (2016) suggests small effect sizes for nicotine's impact on experimental pain, compared to the moderate effects claimed in some papers, especially when accounting for the potential publication bias you mentioned. However, our selection of nicotine was primarily driven by its documented ability to modulate PAF rather than its direct analgesic effects, as clearly stated in our aims. Therefore, we do not view our decision to use nicotine as a mistake; instead, it was aligned with our goal of assessing whether changes in PAF mediate alterations in pain and thus served as a valuable tool. This perspective aligns with the broader concept that a direct effect is not a prerequisite for observing indirect effects of an intervention on an outcome (Fairchild & McDaniel, 2017). To further enhance clarity, we've revised the introduction to emphasize the role of nicotine in manipulating PAF in relation to our study's aims.

      Previously we wrote: “A large body of evidence suggests that nicotine is an ideal choice for manipulating PAF, as both nicotine and smoking increase PAF speed [37,40–47] as well as pain thresholds and tolerance [48–52].” This has been changed to read: “Because evidence suggests that nicotine can modulate PAF, where both nicotine and smoking increase PAF speed [37,40–47], we chose nicotine to assess our aim of whether changes in PAF mediate changes in pain in a ‘mediation by design’ approach [48]. In addition, given evidence that nicotine may increase experimental pain thresholds and tolerance [49–53], nicotine could also influence pain ratings during tonic pain.”

      (2) As mentioned above, the OSF page is not accessible.

      We apologise for this. We had not realised that the pre-registration was under embargo, but we have now made it available.

      (3) I generally struggle with the authors' approach to investigating alpha. With the approach the authors used to detect peak alpha frequency it might be that the alpha signal may just show such a low amplitude that it is impossible to reliably detect it at electrode level. In my view, the approach is not accurate enough, which can be seen by the "jagged" shape of the individual alpha peak frequency. In my view, a source separation technique would have been more useful. I wonder which of the known cortical alphas contributes to the effects the authors have reported previously: occipital, mu rhythms projections or something else? A source separation approach disentangles the different alphas and will increase the SNR. My suggestion would be to work on ICA components or similar approaches. The advantage is that the components are almost completely free of any artefacts. ICAs could be run on the entire data or separately for each individual. In the latter case, it might be that some participants do not exhibit any alpha component.

      We appreciate your thoughtful consideration of our approach to investigating alpha. The calculation of PAF involves various methods and analysis steps across the literature (Corcoran et al., 2018; Gil Avila et al., 2023; McLain et al., 2022). Your query about which known cortical alphas contribute to reported effects is important. After an initial focus on a sensorimotor component from an ICA in Furman et al., 2018, subsequent work from our labs suggested a broader relationship between PAF and pain across the scalp (Furman et al., 2019; Furman et al., 2020; Millard et al., 2022), and motivated a desire to conduct analyses at the sensor level in order to improve the reproducibility of the methods (Furman et al., 2020). However, based on your comment we have made several additions to the manuscript: we now explain why we did not use manual ICA methods, suggest this for future research, and have added an exploratory analysis using a recently developed automated pipeline that selects components based on the presence of a peak in the alpha range and alignment with a predefined template topography representing activity from occipital or motor sites.

      While we acknowledge that ICA components can offer a better signal-to-noise ratio (SNR) and possibly smoother spectral plots, we opted for our chosen method to avoid potential bias inherent in deciding on a component following source separation. The desire for a quick, automated, replicable, and unbiased pipeline, crucial for potential clinical applications of PAF as a biomarker, influenced this decision. At the time of analysis registration, automated methods for deciding which alpha components to extract following ICA were not apparent. We have now added this reasoning to Methods.

      “Contrary to some previous studies that used ICA to isolate sensory region alpha sources (Furman et al., 2018; De Martino et al., 2021; Valentini et al., 2022), we used pre-determined sensor level ROIs to improve reproducibility and reduce the potential for bias when individually selecting ICA components. Using sensor level ROIs may decrease the signal-to-noise ratio of the data; however, this approach has still been effective for observing the relationship between PAF and experimental pain (Furman et al., 2019; Furman et al., 2020).”

      We have also added use of ICA and development of methods as a suggestion for future research in the discussion:

      “Additionally, the use of global PAF may have introduced mediation measurement error into our mediation analysis. The spatial precision used in the current study was based on previous literature on PAF as a biomarker of pain sensitivity, which have used global and/or sensorimotor ROIs (Furman et al., 2018; Furman et al., 2020). Identification and use of the exploratory electrode clusters found in this study could build upon the current work (e.g., Furman et al., 2021). However, exploratory analysis of the clusters found in the present analysis demonstrated no influence on mediation analysis results (Supplementary Materials 3.8-3.10). Alternatively, independent component analysis (ICA) could be used to identify separate sources of alpha oscillations (Choi et al., 2005), as used in other experimental PAF-pain studies (Furman et al., 2018; Valentini et al., 2022), which could aid to disentangle the potential relevance of different alpha sources in the PAF-pain relationship. Although this comes with the need to develop more reproducible and automated methods for identifying such components.”

      The specific location or source of PAF that relates to pain remains unclear. Because of this, we did employ an exploratory cluster-based permutation analysis to assess the potential for variations in the presence of PAF changes across the scalp at sensor level, and emphasise that location of PAF change could be explored in future. However, we have now conducted the mediation analysis (difference score 2W-LCS model) using averages from the data-driven parietal cluster, frontal cluster, and both clusters together. For these we see a stronger effect of gum on PAF change, which was expected given the data driven approach of picking electrodes. There was still a total and direct effect of nicotine on pain during the PHP model, but still no indirect effect via change in PAF. For the CPA models, there were still no significant total, direct, or indirect effects of nicotine on CPA ratings. Therefore, using these data-driven clusters did not alter results compared to the model using the global PAF variable.

      The reader has been directed to this supplementary material so:

      “The potential mediating effect of this change in PAF on change in PHP and CPA was explored (not pre-registered) by averaging within each cluster (central-parietal: CP1, CP2, CPz, P1, P2, P3, P4, Pz, POz; right-frontal: F8, FT8, FT10) and across both clusters. This averaging across electrodes produced three new variables, each assessed in relation to mediating effects on PHP and CPA ratings. The resulting six exploratory mediation analysis (difference score 2W-LCS) models demonstrated minimal differences from the main analysis of global PAF (8-12 Hz), except for the expected stronger effect of nicotine on change in PAF (bs = 0.11-0.14, ps < .003; Supplementary Materials 3.8-3.10).”

      Moreover, our team has been working on an automated method for selecting ICA components, so in response to your comment we assessed whether using this method altered the results of the current analysis. The in-depth methodology behind this new automatic pipeline will be published with a validation from some co-authors in the current collaboration in due course. At present, in summary, this automatic pipeline conducts independent component analysis (ICA) 10 times for each resting state, and selects the component with the highest topographical correlation to a template created from a sensorimotor alpha component in Furman et al. (2018).
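      The template-matching step described above can be sketched as follows. This is only a minimal illustration and not the authors' pipeline, which is not yet published: the function name `select_component`, the use of the absolute Pearson correlation (to ignore the arbitrary sign of ICA components), and the random toy data are all assumptions made for the example. Repeating ICA 10 times per resting state and checking for an alpha-range peak are not shown.

```python
import numpy as np

def select_component(topographies, template):
    """Pick the component whose scalp map best matches a template map.

    topographies: array of shape (n_channels, n_components), e.g. the
                  columns of an ICA mixing matrix (hypothetical input).
    template:     array of shape (n_channels,), a reference scalp map.
    Returns the index of the best component and all correlations.
    """
    topographies = np.asarray(topographies, dtype=float)
    template = np.asarray(template, dtype=float)
    # Centre across channels so the normalised dot product is a Pearson r.
    t = template - template.mean()
    m = topographies - topographies.mean(axis=0, keepdims=True)
    r = (m.T @ t) / (np.linalg.norm(m, axis=0) * np.linalg.norm(t))
    # Assumption: ICA polarity is arbitrary, so compare |r| rather than r.
    best = int(np.argmax(np.abs(r)))
    return best, r

# Toy data: 64 channels, 10 random components, one of which is a
# sign-flipped, noisy copy of the template.
rng = np.random.default_rng(0)
template = rng.normal(size=64)
maps = rng.normal(size=(64, 10))
maps[:, 3] = -2.0 * template + 0.1 * rng.normal(size=64)

best, r = select_component(maps, template)
print(best, round(float(r[best]), 3))
```

      Using the absolute correlation is a design choice for the sketch; a pipeline that fixes component polarity beforehand could compare signed correlations instead.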

      The results of the PHP or CPA mediation models were not substantially different using the PAF calculated from independent components than that using the global PAF. For the PHP model, the total effect (b = -0.648, p = .033) and direct effects (b = -0.666, p = .035) were still significant, and there was still no significant indirect effect (b = 0.018, p = .726). The general fit was reduced, as although the CFI was above 0.90, akin to the original model, the RMSEA and SRMR were not below 0.08, unlike the original models (Little, 2013). For the CPA model, there were still no significant total (b = -0.371, p = .357), direct (b = -0.364, p = .386), or indirect effects (b = -0.007, p = .906), and the model fit also decreased, with CFI below 0.90 and RMSEA and SRMR above 0.08. See supplementary material (3.11). Note that still no correlations were seen between this IC sensorimotor PAF and pain (PHP: r = 0.11, p = .4; CPA: r = -0.064, p = .63).

      Interestingly, in both models, there was now no longer a significant a-path (PHP: b = 0.08, p = 0.292; CPA: b = 0.039, p = 0.575), unlike previously observed (PHP: b = 0.085, p = 0.018; CPA: b = 0.089, p = 0.011). We interpret this as supporting the previously highlighted difference between finding an effect on PAF globally but not in a sensorimotor ROI (and now a sensorimotor IC), justifying the exploratory CBPA and the suggestion in the discussion to explore methodology.
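      For readers less familiar with the path terminology, the relationship between the total, direct, and indirect effects can be sketched as below. This is a simplified illustration under stated assumptions: it uses ordinary least squares on simulated data, whereas the analysis reported here used two-wave latent change score models with covariates and bootstrapped intervals. The helper `ols`, the variable names, and all simulated values are hypothetical. The first two checks use only the IC-based estimates reported above.

```python
import numpy as np

# Reported IC-based estimates: direct + indirect should equal total.
assert abs((-0.666 + 0.018) - (-0.648)) < 1e-9   # PHP model
assert abs((-0.364 + -0.007) - (-0.371)) < 1e-9  # CPA model

def ols(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept fitted, not returned)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# Simulated data only; none of these numbers come from the study.
rng = np.random.default_rng(1)
n = 60
group = np.repeat([0.0, 1.0], n // 2)            # 0 = placebo, 1 = nicotine
d_paf = 0.09 * group + rng.normal(0, 0.1, n)     # change in mediator (PAF)
d_pain = -0.6 * group + 0.2 * d_paf + rng.normal(0, 1.0, n)  # change in pain

a = ols(d_paf, group)[0]                  # a-path: group -> mediator
c_total = ols(d_pain, group)[0]           # total effect of group on outcome
c_direct, b = ols(d_pain, group, d_paf)   # direct effect and b-path
indirect = a * b                          # product-of-coefficients estimate

# In linear OLS models fitted to the same sample, total = direct + a*b.
print(np.isclose(c_total, c_direct + indirect))
```

      The sketch also shows why a non-significant a-path matters: the indirect effect is the product a*b, so it shrinks towards zero whenever the intervention barely moves the mediator, regardless of the size of the direct effect.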

      We understand that this analysis does not fully answer the reviewer’s question of which of the known cortical alphas contributes to the effects reported in our previous work. However, we consider this exploration to be beyond the scope of the current paper, as it would be more appropriately addressed with larger datasets or combinations of datasets, potentially incorporating MEG to better disentangle oscillatory sources. The highlighted differences seen between global PAF, sensorimotor ROI PAF, sensorimotor IC PAF, as well as the CBPA of PAF changes provide ample directions for future research to build upon: 1) which alpha signals (sensor or source space) are related to pain, 2) how these alpha signals can be represented robustly in a replicable way, and 3) which alpha signals (sensor or source space) are manipulable through interventions. These are all excellent questions for future studies to investigate.

      The below text has been added to the Discussion:

      In-house code was developed to compare a sensorimotor component to the results presented in this manuscript (Supplementary Material 3.11), showing similar results to the sensorimotor ROI mediation analysis presented here. However, examination of which alpha - be it sensor or source space - are related to pain, how they can be robustly represented, and how they can be manipulated are ripe avenues for future study.

      (4) I have my doubts that you can get a reliable close to bell-shaped amplitude distribution for every participant. The argument that the peak detection procedure is hampered by the high-amplitude lower frequency can be easily solved by subtracting the "slope" before determining the peak. My issue is that the entire analysis is resting on the assumption that each participant has a reliable alpha effect at electrode level. This is not the case. Non-alpha participants can severely distort the statistics. ICA-based analyses would be more sensitive but not every participant will show alpha. You may want to argue with robust group effects but in my view, every single participant counts, particularly for this type of data analysis, where in the case of a low SNR the "peak" can easily shift to the extremes. In case there is an alpha effect for a specific subject, we should see a smooth bump in the frequency spectrum between 8 and 12 Hz. Anything beyond that is hard to believe. The long stimulation period allows a broad FFT analysis window with a good frequency resolution in order to detect the alpha frequency bump.

      The reviewer is correct that non-alpha participants can distort the statistics. We did visually assess each individual’s EEG spectra at baseline to establish the presence of global peaks, as we believe this is good practice to aid understanding of the data. Please see Author response image 1 for individual spectra seen at baseline. Although not all participants had a ‘smooth bump in the frequency spectrum between 8 and 12 Hz’, we prefer not to impose this assumption on our data. Chiang et al. (2011) suggest that ~3% of individuals do not have a discernible alpha peak, and in our data we observed only one participant without a very obvious spectral peak (px-39). However, this participant does have enough activity within the alpha range to identify PAF by the CoG method (i.e. not just flat spectra and activity on top of 1/f characteristics). Without a pre-registered and standardised decision process in place to remove such a participant, we opted to not remove any participants to avoid curation of our data.

      Author response image 1.
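      The centre-of-gravity (CoG) estimate referred to in the response above can be sketched as follows. This is a minimal illustration assuming the usual definition of CoG as the power-weighted mean frequency within the alpha band (8-12 Hz here); it is not the authors' analysis code, and the function name `peak_alpha_cog` and the simulated spectra are hypothetical.

```python
import numpy as np

def peak_alpha_cog(freqs, power, band=(8.0, 12.0)):
    """Centre-of-gravity PAF: power-weighted mean frequency within the band."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(freqs[mask] * power[mask]) / np.sum(power[mask]))

# Simulated spectra at 0.25 Hz resolution (illustrative values only).
freqs = np.arange(2.0, 40.0, 0.25)

# Case 1: a Gaussian alpha "bump" at 10 Hz on top of a 1/f background.
bump = 1.0 / freqs + 2.0 * np.exp(-0.5 * ((freqs - 10.0) / 1.0) ** 2)
paf_bump = peak_alpha_cog(freqs, bump)

# Case 2: a completely flat spectrum with no alpha peak at all.
paf_flat = peak_alpha_cog(freqs, np.ones_like(freqs))

print(paf_bump, paf_flat)
```

      Note that CoG returns a value inside the band even for a flat spectrum (the band centre), and that an unremoved 1/f background pulls the estimate slightly towards lower frequencies. Both properties are relevant to the reviewer's concern about participants without a clear alpha peak.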

      (5) I find reports on frequent channel rejections reflect badly on the data quality. Bad channels can be avoided with proper EEG preparation. EEG should be continuously monitored during recording in order to obtain best data quality. Have any of the ROI channels been rejected?

      We appreciate your attention to the channel rejection. We believe that the average channels removed (0.94, 0.98, 0.74, and 0.87 [range: 0-4] for each of the four resting states out of 64 channels) does not suggest overly frequent rejection, as it was less than one electrode on average and the numbers are below the accepted number of bad channels to remove/interpolate (i.e. 10%) in EEG pipelines (Debnath et al., 2020; Kayhan et al., 2022). To maintain data quality, consistently poor channels were identified and replaced over time. We hope you will accept our transparency on this issue and note that by stating how channel removal decisions were made (i.e. 8 or more deviations) and reporting the number of channels removed, we adhere to the COBIDAS guidelines (Pernet et al., 2018; 2020).

      During analysis, cases of sensorimotor ROI channels being rejected were noted and are now specified in our manuscript. “Out of 248 resting states recorded, 14 resting states had 4 ROI channels instead of 5. Importantly, no resting state had fewer than 4 channels for the sensorimotor ROI.”

      Note, we also realised that we had not specified that we did interpolate channels for the cluster based permutation analysis. This has been corrected with the following sentence:

      “Removed channels were not interpolated for the pre-registered global and sensorimotor ROI averaged analyses, but were interpolated for an exploratory cluster based permutation analysis using the nearest neighbour average method in `Fieldtrip`.”

      (6) I have some issues buying the authors' claims that there is an effect of nicotine on prolonged pain. By looking at the mean results for the nicotine and placebo condition, this can not be right. What was the point in including the variables in the equation? In my view, in this within-subject design the effect of nicotine should be universal, no matter what gender, age, or depression. The unconditional effect of nicotine is close to zero. I can not get my head around how any of the variables can turn the effects into significance. There must be higher or lower variable scores that might be related to a higher or lower effect on nicotine. The question is not to consider these variables as a nuisance but to show how they modulate the pain-related effect of nicotine treatment. Still, the overall nicotine effect of the entire group is basically zero.

      Another point is that for within-subject analyses even tiny effects can become statistically significant if they are systematically in one direction. This might be the case here. There might be a significant effect of nicotine on pain but the actual effect size (5.73 vs. 5.78) is actually not interpretable. I think it would be interesting for the reader how (in terms of pain rating difference) each of the variables can change the effect of nicotine.

      Thank you for your comments. We recognize the concern about interpreting the effect of nicotine on prolonged pain solely based on mean results, and in fact wish to discourage this approach. It's crucial to note that both PAF and pain are highly individual measures (i.e. high inter-individual variance), necessitating the use of random intercepts for participants in our analyses to acknowledge the inherent variability at baseline across participants. Including random intercepts rather than only considering the means helps address the heterogeneity in baseline levels among participants. We also recognise that displaying the mean PHP ratings for all participants in Table 2 could be misleading, firstly because these means do not have weight in an analysis that takes into account a random-effects intercept for participants, and secondly because two participants (one from each group) did not have post-gum PHP assessments and were not included in the mediation analysis due to list-wise deletion of missing data. Therefore, to reduce the potential for misinterpretation, we have added extra detail to display both the full sample and CPA mediation analysis (i.e. N=62) and the data used for PHP mediation analysis (i.e. n=60) in Table 2. We hope that the extra details added to this table will help the readers' interpretation of results.

      In light of this, we have also altered the PAF Table 3 to reflect both the pre-post values used for the CPA mediation and baseline correlations with CPA and PHP pain (i.e. N=62), and the pre-post values used for the PHP mediation (i.e. n=60).

      It is inherently difficult to visualise the findings of a mediation analysis with confounding variables that also used latent change scores (LCS) and random-effect intercepts for participants. LCS was specifically used because of issues of regression to the mean that occur if you calculate a straightforward ‘difference-score’; calculating the difference in order to demonstrate the results of the statistical model in a figure, for example, therefore does not provide a full description of the data assessed (Valente & McKinnon, 2017). Nevertheless, if we look at the data descriptively with this in mind, then calculating the change in PHP ratings does indicate that, for the nicotine group, the mean change in PHP ratings was -0.047 (SD = 1.05, range: -4.13, 1.45). Meanwhile, for the placebo group the mean change in PHP ratings was 0.33 (SD = 0.75, range: -1.37, 1.66). This suggests a slight decrease in pain ratings on average for the nicotine group compared to a slight increase on average for the placebo group. With control for pre-determined confounders, we found that the latent change score was -0.63 lower for the nicotine group compared to the control group (i.e. the direct effect of nicotine on change in pain).

      If the reviewer is only discussing the effect of nicotine on pain, we do not believe that this effect ‘should be universal’. There is clear evidence that effects of nicotine on other measures can vary greatly across individuals (Ettinger et al., 2009; Falco & Bevins, 2015; Pomerleau et al., 1995). Our intention would not be to propose a universal effect but to understand how these variables may influence nicotine's impact on pain for individuals. Here we focus on the effects of nicotine on PAF and pain sensitivity, but attempted to control for the potential influence of these other confounding factors. Therefore, our statistical approach goes beyond mean values, incorporating variables like sex at birth, age, and depression to control for and explore potential modulating factors. Control for confounding factors is an important aspect of mediation analysis (Lederer et al., 2019; VanderWeele, 2019).

      Regarding the seemingly small effect size, we understand your concern. Indeed ‘tiny effects can become statistically significant if they are systematically in one direction’, which may be what we see in this analysis. We do not agree that the effect is ‘not interpretable’, rather that it should be interpreted in light of its small effect size (effect size being the beta coefficient in our analysis, rather than the mean group difference). We agree on the importance of considering practical significance alongside statistical significance and hope to conduct additional experiments and analyses in future to elucidate the contribution of each variable to the subtle and therefore not entirely conclusive overall effect you mention.

      Your feedback on this is valuable, and we have ensured a more detailed discussion in the revised manuscript on how these factors should be interpreted alongside some additional post-hoc analyses of confounding factors that were significant in our mediation, with the note that investigation of these interactions is exploratory. We had already discussed the potential contribution of sex on the effect of nicotine on PAF, with exploratory post-hoc analysis on this included in supplementary materials. In addition, we have now added an exploratory post-hoc analysis on the potential contribution of stress on the effect of nicotine on pain. This then shows the stratified effects by the covariates that our model suggests are influencing change in PAF and pain.

      Results edits:

      “There was also a significant effect of perceived stress at baseline on change in PHP ratings when controlling for group allocation and other confounding variables (b = -0.096, p = .048, bootstrapped 95% CI: [-0.19, -0.000047]), where higher perceived stress resulted in larger decreases in PHP ratings (see Supplementary Material 3.3 for post-hoc analysis of stress).”

      Supplementary material addition:

      “3.3 Exploratory analysis of the influence of perceived stress on the effects of nicotine on change in PHP ratings”

      “Due to the significant estimated effects of perceived stress on change in PHP ratings in the 2WLCS mediation model, we also explored post-hoc effects of stress on change in PHP ratings. We found that there is strong evidence for a negative correlation between stress and change in PHP rating within the nicotine group (n = 28, r = −0.39, BF10 = 13.65; Figure 3) that is not present in the placebo group, with equivocal evidence (n = 32, r = −0.14, BF10 = 0.46). This suggests that those with higher baseline stress who had nicotine gum experienced greater decreases in PHP ratings. Note that there was less, but still sufficient evidence for this relationship within the nicotine group when the participant who was a potential outlier for change in PHP rating was removed (n = 27, r = −0.32, BF10 = 1.45).”

      Author response image 2.

      Spearman correlations of baseline perceived stress with the change in phasic heat pain (PHP) ratings suggest strong evidence for a negative relationship for the nicotine gum group in orange (n=28; BF<sub>10</sub>=13.65) but not for the placebo group in grey (n=32; BF<sub>10</sub>=0.46). Regression lines and 95% confidence intervals are shown.
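
For readers less familiar with the rank-based correlation named in the caption above, the following is a minimal sketch of how a Spearman coefficient can be computed: both variables are converted to ranks (ties share an average rank) and a Pearson correlation is taken on the ranks. The function names and the example values are hypothetical and are not the study data. The Bayes factors reported above are not reproduced here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values only: higher stress paired with larger pain decreases.
stress = [10, 14, 18, 22, 30]
change_in_rating = [0.4, 0.1, -0.2, -0.3, -0.9]
rho = spearman(stress, change_in_rating)
```

Because only the ordering of values matters, a single extreme rating has less leverage on this coefficient than it would on a Pearson correlation of the raw scores.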

      Discussion edits:

      “For example, in addition to the effect of nicotine on prolonged heat pain ratings, our results suggest an effect of stress on changes in heat pain ratings, with those self-reporting higher stress at baseline having greater reductions in pain. Our post-hoc analysis suggested that this relationship between higher stress and larger decrease in PHP ratings was only present for the nicotine group (Supplementary Material 3.3). As stress is linked to nicotine use [69,70] and pain [71–73], these interactions should be explored in future.”

      (7) Is the differential effect of nicotine vs. placebo based on the pre vs. post treatment effect of the placebo condition or on the pre vs. post effect of the nicotine treatment? Can the mediation model be adapted and run for each condition separately? The placebo condition seems to have a stronger effect and may have driven the result.

      Thank you for your comments. In our mediation analysis, the differential effect of nicotine vs. placebo is assessed as a comparison between the pre-post difference within each condition. A latent change score (i.e. pre-post) is calculated for each condition (nicotine and placebo), and then the effect of being in the nicotine group (dummy coded as 1) is compared to being in the placebo group (dummy coded as 0). The comparison between conditions is needed for this model (Valente & MacKinnon, 2017), as we are assessing the change in PAF and pain in the nicotine group compared to the change in the placebo group.
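
The logic of comparing change between dummy-coded groups can be sketched with observed (not latent) change scores. This is a deliberate simplification: a latent change score model additionally handles measurement error and covariates, which this sketch ignores. All ratings below are hypothetical and the function names are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical (pre, post) pain ratings, not taken from the study.
placebo = [(5.0, 5.5), (6.0, 6.2), (4.0, 4.3)]
nicotine = [(5.0, 5.1), (6.0, 5.9), (4.0, 4.0)]

# Dummy coding: placebo = 0 (reference), nicotine = 1.
group = [0] * len(placebo) + [1] * len(nicotine)
change = [post - pre for pre, post in placebo + nicotine]

# With a 0/1 predictor, the slope equals the difference in mean change.
group_effect = ols_slope(group, change)
difference_in_mean_change = (
    mean(change[len(placebo):]) - mean(change[:len(placebo)])
)
```

In this toy example the group coefficient is negative even though the nicotine ratings barely move, because the coefficient is defined relative to the change in the reference (placebo) group.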

      However, to address your comment, it is possible to simplify and assess the relationship between the change in peak alpha frequency (PAF) and change in pain within each gum group (nicotine and placebo) independently, without including the intervention as a factor. To do this, the mediation model can be simplified to regression analysis with latent change scores that focus purely on these relationships. The results of this can help to understand whether change in PAF influences change in pain within each group separately. As with the main analysis, we see no significant influence of change in PAF on change in pain while controlling for the same confounding variables within the nicotine group (Beta = -0.146 +/- 1.105, p = 0.895, 95% CI: -2.243, 2.429) or the placebo group (Beta = 0.730 +/- 2.061, p = 0.723, 95% CI: -4.177, 3.625).

      When suggesting that “the placebo condition seems to have a stronger effect and may have driven the result”, we believe you are referring to the increase in mean PHP ratings within the placebo group from pre (5.51 +/- 2.53) to post-placebo gum (5.84 +/- 2.67). Indeed there was a significant increase in pain ratings pre to post chewing placebo gum (t(31) = -2.53, p = 0.0165, 95% CI: -0.603, -0.0653), that was not seen after chewing nicotine gum (t(27) = 0.237, p = 0.81, 95% CI: -0.358, 0.452). In lieu of a control where no gum was chewed (i.e. simply a second pain assessment ~30 minutes after the first), we assume the gum without nicotine is a good reference that controls for the effect of time plus expectation of chewing nicotine gum. With this in mind, as we describe in our results, the change in PHP ratings is reduced in the nicotine group compared to the placebo group. Note that this phrasing keeps the effect of placebo on pain as our reference from which to view the effect of nicotine on pain. However, you are correct that we need to ensure we emphasise that the change in pain in the nicotine group is reduced in comparison to the change seen after placebo.
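
As a rough consistency check on the within-group statistics quoted above, a paired t statistic can be recovered from its 95% confidence interval: the interval midpoint is the mean pre-minus-post difference, and the half-width divided by the critical t value is the standard error. The sketch below assumes the intervals are symmetric two-sided 95% intervals for the mean difference. The critical values are standard t-table entries hard-coded for 31 and 27 degrees of freedom, and the function name is illustrative.

```python
def t_from_ci(ci_low, ci_high, t_critical):
    """Recover mean difference, standard error and t from a symmetric CI."""
    mean_diff = (ci_low + ci_high) / 2
    half_width = (ci_high - ci_low) / 2
    standard_error = half_width / t_critical
    return mean_diff, standard_error, mean_diff / standard_error


# Two-sided 95% critical values of the t distribution (table values).
T_CRIT_DF31 = 2.0395  # placebo group, t(31)
T_CRIT_DF27 = 2.0518  # nicotine group, t(27)

placebo_diff, placebo_se, placebo_t = t_from_ci(-0.603, -0.0653, T_CRIT_DF31)
nicotine_diff, nicotine_se, nicotine_t = t_from_ci(-0.358, 0.452, T_CRIT_DF27)
```

Under these assumptions the recovered values come out at roughly t = -2.53 for placebo and t = 0.24 for nicotine, in line with the reported statistics, and the placebo midpoint of about -0.33 matches the reported shift in mean rating from 5.51 to 5.84.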

      We have not included these extra statistics in our revised manuscript, but hope that they aid your understanding and interpretation of the included analyses and have highlighted these nuances in the discussion.

      “However, we note that the observed effect of nicotine on pain was small in magnitude, and most prominent in comparison to the effect of placebo, where pain ratings increased after chewing, which brings into question whether this reduction in pain is meaningful in practice.”

      (8) I would not dare to state that nicotine can function as an acute analgesic. Acute analgesics need to work for everyone. The average effect here is close to zero.

      In light of your feedback, we have refined our language to avoid a sweeping assertion of universal analgesic effects and emphasize individual variability. Nicotine's role as a coping strategy for pain is acknowledged in the literature (Robinson et al., 2022), with the meta-analysis by Ditre et al. (2016) discussing its potential as an acute analgesic in humans, along with some evidence from animal research (Zhang et al., 2020). Our revised discussion underscores the need for further exploration into factors influencing nicotine's potential impact on pain. We have also specified the short-term nature of nicotine use in this context to distinguish acute effects from potential opposing effects after long-term use (Zhang et al., 2020).

      “Short-term nicotine use is thought to have acute analgesic properties in experimental settings, with a review reporting that nicotine increased pain thresholds and pain tolerance [49]. In addition, research in a rat model suggests analgesic effects on mechanical thresholds after short-term nicotine use (Zhang et al., 2020). However, previous research has not assessed the acute effects of nicotine on prolonged experimental pain models. The present study found that 4 mg of nicotine reduced heat pain ratings during prolonged heat pain compared to placebo for our human participants, but that prolonged pressure pain decreased irrespective of which gum was chewed. Our findings are thus partly consistent with the idea that nicotine may have acute analgesic properties [49], although further research is required to explore factors that may influence nicotine’s potential impact on a variety of prolonged pain models. We further advance the literature by reporting this effect in a model of prolonged heat pain, which better approximates the experience of clinical pain than short lasting models used to assess thresholds and tolerance [50]. However, we note that the observed effect of nicotine on pain was small in magnitude, and most prominent in comparison to the effect of placebo, where pain ratings increased after chewing, which brings into question whether this reduction in pain is meaningful in practice. Future research should examine whether effects on pain increase in magnitude with different nicotine administration regimens (i.e. dose and frequency).”

      (9) Figures 2E and 2F are not particularly intuitive. Usually, the colour green in "jet" colour coding is being used for "zero" values. I would suggest to cut off the blue and use only the range between red green and red.

      We have chosen to retain the current colour scale for several reasons. In our analysis, green represents the middle of the frequency range (approx 10 Hz in this case), and if we were to use green as zero, it would effectively remove both blue and green from the plot, resulting in only red shades. Additionally, we have provided a clear colour scale for reference next to the plot, which allows readers to interpret the data accurately. Our intention is to maintain clarity and precision in representing the data, rather than conforming strictly to conventional practices in color coding.

      We believe that the current representation effectively conveys the results of our study while allowing readers to interpret the data within the context provided. Thank you again for your suggestion, and we hope you understand our reasoning in this matter.

      (10) Did the authors do their analysis on the parietal ROI or on the pre-registered ROI?

      The analysis was conducted on the pre-registered sensorimotor ROI and on the global values. We have now also conducted the analysis with the regions suggested with the cluster based permutation analysis as requested by reviewer 2, comment 3.

      (11) Point 3.2 in the discussion. I would be very cautious to discuss smoking and chronic pain in the context of the manuscript. The authors can not provide any additional knowledge with their design targeting non-smokers, acute nicotine and experimental pain. The information might be interesting in the introduction in order to provide the reader with some context but is probably misleading in the discussion.

      We appreciate your perspective and agree with your caution regarding the discussion of smoking and chronic pain. While our study specifically targets non-smokers and focuses on acute nicotine effects in experimental pain, we understand the importance of contextual clarity. We have removed these points from the discussion to not mislead the reader.

      Previously we wrote, and have removed: “For those with chronic pain, smoking and nicotine use is reported as a coping strategy for pain [52]; abstinence can increase pain sensitivity [48,50], and pain is thus seen as a barrier to smoking cessation due to fear of worsening pain [51,52]. Therefore, continued understanding of the acute effects of nicotine on models of prolonged pain could improve understanding of the role of nicotine and smoking use in chronic pain [49,51,52].”

      (12) I very much appreciate section 3.3 of the discussion. I would not give up on PAF as a target to modulate pain. A modulation might not be possible in such a short period of experimental intervention. PAF might need longer and different interventions to gradually shift in order to attenuate the intensity of pain. As discussed by the authors themselves, I would also consider other targets for alpha analysis (as mentioned above not other electrodes or ROIs but separated sources.)

      Thank you for your comments on section 3.3. We appreciate your recognition of the potential significance of PAF as a target for pain modulation. Your insights align with our considerations that the experimental intervention duration or type might be a limiting factor in observing substantial shifts in PAF to attenuate pain intensity. We had mentioned the use of the exploratory electrode clusters in future work, but have now also mentioned that the use of ICA to identify separate ICA sources may provide an alternative approach. See responses to your previous ICA comment regarding separate sources.

      REFERENCES for responses to reviewer 2

      Chiang, A. K. I., Rennie, C. J., Robinson, P. A., Van Albada, S. J., & Kerr, C. C. (2011). Age trends and sex differences of alpha rhythms including split alpha peaks. Clinical Neurophysiology, 122(8), 1505-1517.

      Debnath, R., Buzzell, G. A., Morales, S., Bowers, M. E., Leach, S. C., & Fox, N. A. (2020). The Maryland analysis of developmental EEG (MADE) pipeline. Psychophysiology, 57(6), e13580.

      Ettinger, U., Williams, S. C., Patel, D., Michel, T. M., Nwaigwe, A., Caceres, A., ... & Kumari, V. (2009). Effects of acute nicotine on brain function in healthy smokers and non-smokers: estimation of inter-individual response heterogeneity. Neuroimage, 45(2), 549-561.

      Falco, A. M., & Bevins, R. A. (2015). Individual differences in the behavioral effects of nicotine: a review of the preclinical animal literature. Pharmacology Biochemistry and Behavior, 138, 80-90.

      Kayhan, E., Matthes, D., Haresign, I. M., Bánki, A., Michel, C., Langeloh, M., ... & Hoehl, S. (2022). DEEP: A dual EEG pipeline for developmental hyperscanning studies. Developmental cognitive neuroscience, 54, 101104.

      Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., ... & Vincent, J. L. (2019). Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. Annals of the American Thoracic Society, 16(1), 22-28.

      Little TD. Longitudinal structural equation modeling. Guilford press; 2013.

      Pernet, C., Garrido, M., Gramfort, A., Maurits, N., Michel, C. M., Pang, E., ... & Puce, A. (2018). Best practices in data analysis and sharing in neuroimaging using MEEG.

      Pernet, C., Garrido, M. I., Gramfort, A., Maurits, N., Michel, C. M., Pang, E., ... & Puce, A. (2020). Issues and recommendations from the OHBM COBIDAS MEEG committee for reproducible EEG and MEG research. Nature neuroscience, 23(12), 1473-1483.

      Pomerleau, O. F. (1995). Individual differences in sensitivity to nicotine: implications for genetic research on nicotine dependence. Behavior genetics, 25(2), 161-177.

      Robinson, C. L., Kim, R. S., Li, M., Ruan, Q. Z., Surapaneni, S., Jones, M., ... & Southerland, W. (2022). The Impact of Smoking on the Development and Severity of Chronic Pain. Current Pain and Headache Reports, 26(8), 575-581.

      Xia, J., Mazaheri, A., Segaert, K., Salmon, D. P., Harvey, D., Shapiro, K., ... & Olichney, J. M. (2020). Event-related potential and EEG oscillatory predictors of verbal memory in mild cognitive impairment. Brain communications, 2(2), fcaa213.

      VanderWeele, T. J. (2019). Principles of confounder selection. European journal of epidemiology, 34, 211-219.

      Valente, M. J., & MacKinnon, D. P. (2017). Comparing models of change to estimate the mediated effect in the pretest–posttest control group design. Structural Equation Modeling: A Multidisciplinary Journal, 24(3), 428-450.

      Vimolratana, O., Aneksan, B., Siripornpanich, V., Hiengkaew, V., Prathum, T., Jeungprasopsuk, W., ... & Klomjai, W. (2024). Effects of anodal tDCS on resting state eeg power and motor function in acute stroke: a randomized controlled trial. Journal of NeuroEngineering and Rehabilitation, 21(1), 1-15.

      Zhang, Y., Yang, J., Sevilla, A., Weller, R., Wu, J., Su, C., ... & Candiotti, K. A. (2020). The mechanism of chronic nicotine exposure and nicotine withdrawal on pain perception in an animal model. Neuroscience letters, 715, 134627.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      (1) Rationale and link to chronic pain. I am not sure I agree with the statement "The ability to identify those at greater risk of developing chronic pain is limited". I believe there is an abundance of literature associating risk factors with the different instances of chronic pain (e.g., Mills et al., 2019). The fact that the authors cite studies involving potential neuroimaging biomarkers leads me to believe that they perhaps did not intend to make such a broad statement, or that they wanted to focus on individual prediction instead of population risk.

      We thank the reviewer for the thought put into this comment. We did indeed wish to refer to individual prediction, but also realise that the focus on predicting pain might not be the most appropriate opening for this manuscript. Therefore, we have adjusted the below sentence to refer to the need to identify modifiable factors rather than the need to predict pain.

      “Identifying modifiable factors that influence pain sensitivity could be a key step in reducing the presence and burden of chronic pain (van der Miesen et al., 2019; Davis et al., 2020; Tracey et al., 2021).”

      (2) The statement "Individual peak alpha frequency (PAF) is an electro-physiological brain measure that shows promise as a biomarker of pain sensitivity, and thus may prove useful for predicting chronic pain development" is a non sequitur. PAF may very well be a biomarker of pain sensitivity, but the best measures of pain sensitivity we have (self-reported pain intensity ratings) in general are not in themselves predictive of the development of chronic pain. Conversely, features that are not related to pain sensitivity could be useful for predicting chronic pain (e.g., Tanguay-Sabourin et al., 2023).

      We agree that it is essential to acknowledge that self-reported pain intensity ratings alone are not definitive predictors of chronic pain development. To align with this, we have revised the sentence, removing the second clause to avoid overstatement. The adjusted sentence now reads, "Individual peak alpha frequency (PAF) is an electrophysiological brain measure that shows promise as a biomarker of pain sensitivity."

      (3) Finally, some of the statements in the discussion comparing a tonic heat pain model with chronic neuropathic pain might be an overstatement. Whereas it is true that some of the descriptors are similar, the time courses and mechanisms are vastly different.

      We appreciate this comment, and agree that it is difficult to compare the heat pain model used to clinical neuropathic pain. This was an oversight and with further understanding we have removed this comment from the introduction and the discussion:

      “In parallel, we saw no indication of a relationship between PAF and pain ratings during CPA. The introduction of the CPA model, specifically calibrated to a moderate pain threshold, provides further support for the notion that the relationship between PAF and pain is specific to certain pain types [17,28]. Prolonged heat pain was pre-dominantly described as moderate/severe shooting, sharp, and hot pain, whereas prolonged pressure pain was predominantly described as mild/moderate throbbing, cramping, and aching in the present study. It is possible that the PAF–pain relationship is specific to particular pain models and protocols [12,17].”

      Methodology

      (4) or the benefit of good science. However, I am compelled to highlight that I could not access the preregistered files, even though I waited for almost two weeks after requesting permission to do so. This was a problem on two levels: the main one is that I could not check the hypothesized effect sizes of the sample size estimation, which are not only central to my review, and in general negate all the benefits that should go with preregistration (i.e., avoiding p-hacking, publication bias, data dredging, HARKing, etc.). The second one is that I had to provide an email address to request access. This allows the authors to potentially identify the reviewers. Whereas I have no issues with this and I support transparent peer review practices (https://elifesciences.org/inside-elife/e3e90410/increasing-transparency-in-elife-s-review-process), I also note that this might condition other reviewers.

      We apologise for this. We had not realised that the pre-registration was under embargo, but we have now made it available.

      Interpretation of results

      (5) To be perfectly clear, I trust the results of this study more than some of the cited studies regarding nicotine and pain because it was preregistered, the sample size is considerably larger, and it seems carefully controlled. I just do not agree with the interpretation of the results, stated in the first paragraph of the Discussion. Quoting J. Cohen, "The primary product of a research inquiry is one or more measures of effect size, not P values" (Cohen, 1990). As I am sure the authors are aware, even tiny differences between conditions, treatments or groups will eventually be statistically significant given arbitrarily large sample sizes. What really matters then is the magnitude of these differences. In general, the authors hypothesize on why there were no differences on the pressure pain model, and why decreases in heat pain were not mediated by PAF, but do not seem to consider the possibility that the intervention just did not cause the intended effect on the nociceptive system, which would be a much more straightforward explanation for all observations.

      While acknowledging and agreeing with the concern that 'even tiny differences between conditions, treatments, or groups will eventually be statistically significant given arbitrarily large sample sizes,' it's crucial to clarify that our sample size of N=62 does not fall into the category of arbitrarily large. We carefully considered the observed outcomes in the pressure pain model and the lack of PAF mediation in heat pain, as dictated by our statistical approach and the obtained results.

      The suggestion of a straightforward explanation aligning with the intervention not causing the intended effect on the nociceptive system is a valid consideration. We did contemplate the possibility of a false positive, emphasising this in the limitations of our findings and the need for replication to draw stronger conclusions to follow up this initial study.

      (6) In this regard, I do not believe that an average *increase* of 0.05 / 10 (Nicotine post - pre) can be considered a "reduction of pain ratings", regardless of the contrast with placebo (average increase of 0.24 / 10). This tiny effect size is more relevant in the context of the considerable inter-individual variation, in which subjects scored the same heat pain model anywhere from 1 to 10, and the same pressure pain model anywhere from 1 to 8.5. In this regard, the minimum clinically or experimentally important differences (MID) in pain ratings varies from study to study and across painful conditions but is rarely below 1 / 10 in a VAS or NRS scale, see f. ex. (Olsen et al., 2017). It is not my intention to question whether nicotine can function as an acute analgesic in general (as stated in the Discussion), but instead, if it worked as such under these very specific experimental conditions. I also acknowledge that the authors note this issue in two lines in the Discussion, but I believe that this is not weighed properly.

      We appreciate your perspective on the interpretation of the effect size, and we understand the importance of considering it in the context of individual variation.

      As also discussed in response to comment 6 from reviewer 2, we recognize the concern about interpreting the effect of nicotine on prolonged pain solely based on mean results, and in fact wish to discourage this approach. It's crucial to note that both PAF and pain are highly individual measures (i.e. high inter-individual variance), necessitating the use of random intercepts for participants in our analyses to acknowledge the inherent variability at baseline across participants. Including random intercepts rather than only considering the means helps address the heterogeneity in baseline levels among participants. We also recognise that displaying the mean PHP ratings for all participants in Table 2 could be misleading, firstly because these means do not have weight in an analysis that takes into account a random-effects intercept for participants, and secondly because two participants (one from each group) did not have post-gum PHP assessments and were not included in the mediation analysis due to list-wise deletion of missing data. Therefore, to reduce the potential for misinterpretation, we have added extra detail to display both the full sample and CPA mediation analysis (i.e. N=62) and the data used for PHP mediation analysis (i.e. n=60) in Table 2. We hope that the extra details added to this table will help the readers' interpretation of results.

      Moreover, we have made sure to refer to the comparison with the placebo group when discussing the reduction or decrease in pain seen in the nicotine group, for example:

      “2) nicotine reduced prolonged heat pain intensity but not prolonged pressure pain intensity compared to placebo gum;”

      “The nicotine group had a decrease in heat pain ratings compared to the placebo group and increased PAF speed across the scalp from pre to post-gum, driven by changes at central-parietal and right-frontal regions.”

      We have kept our original comment of whether this effect on pain is meaningful in practice to refer to the minimum clinically or experimentally important differences in pain ratings as highlighted by Olsen et al., 2017.

      “While acknowledging the modest effect size, it’s essential to consider the broader context of our study’s focus. Assessing the clinical relevance of pain reduction is pertinent in applications involving the use of any intervention for pain management [69]. However, from a mechanistic standpoint, particularly in understanding the implications of and relation to PAF, the specific magnitude of the pain effect becomes less pivotal. Nevertheless, future research should examine whether effects on pain increase in magnitude with different nicotine administration regimens (i.e. dose and frequency).”

      (7) In line with the topic of effect sizes, average effect sizes for PAF in the study cited in the manuscript range from around 1 Hz (Boord et al., 2008; Wydenkeller et al., 2009; Lim et al., 2016), to 2 Hz (Foulds et al., 1994), compared with changes of 0.06 Hz (Nicotine post - pre) or -0.01 Hz (Placebo post - pre). MIDs are not so clearly established for peak frequencies in EEG bands, but they should be certainly larger than some fractions of a Hertz (which is considerably below the reliability of the measurement).

      We appreciate your care of these nuances. We acknowledge the differences in effect sizes between our study and those referenced in the manuscript. Given the current state of the literature, it's noteworthy that ‘MIDs’ for peak frequencies in EEG bands, particularly PAF changes, are not clearly established, other than a recent publication suggesting that even small changes in PAF are reliable and meaningful (Furman et al., 2021). In light of this, we have addressed the uncertainty around the existence and determination of MIDs in our revision, highlighting the need for further research in this area.

      In addition, our study employed a greater frequency resolution (0.2 Hz) compared to some of the referenced studies, with approximately 0.5 Hz resolution (Boord et al., 2008; Wydenkeller et al., 2009; Foulds et al., 1994). This improved resolution allows for a more precise measurement of changes in PAF. Considering this, it is plausible that studies with lower resolution might have conflated increases in PAF, and our higher resolution contributes to a more accurate representation of the observed changes.

      We have also incorporated this insight into the manuscript, emphasising the methodological advancements in our study and their potential impact on the interpretation of PAF changes. Thank you for your thoughtful feedback.

      “The ability to detect changes in PAF can be considerably impacted by the frequency resolution used during Fourier Transformations, an element that is overlooked in recent methodological studies on PAF calculation [16,95]. Changes in PAF within individuals might be obscured or conflated by lower frequency resolutions, which should be considered further in future research.”
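
The frequency-resolution point can be illustrated with a small sketch. For an unpadded discrete Fourier transform, the spacing between frequency bins equals the sampling rate divided by the number of samples, which is the reciprocal of the window duration. The sampling rate, window lengths and peak frequencies below are hypothetical and chosen only to yield the 0.2 Hz and 0.5 Hz grids discussed above. The sketch uses simple nearest-bin peak location rather than any specific PAF estimator.

```python
def frequency_resolution(sampling_rate_hz, n_samples):
    """Bin spacing of an unpadded DFT: sampling rate / number of samples."""
    return sampling_rate_hz / n_samples


def bin_index(freq_hz, resolution_hz):
    """Index of the DFT bin closest to a given frequency."""
    return round(freq_hz / resolution_hz)


SAMPLING_RATE_HZ = 500  # hypothetical sampling rate

# A 5 s window (2500 samples) gives 0.2 Hz bins; a 2 s window gives 0.5 Hz.
fine = frequency_resolution(SAMPLING_RATE_HZ, 2500)
coarse = frequency_resolution(SAMPLING_RATE_HZ, 1000)

# Hypothetical alpha peak before and after an intervention.
peak_pre_hz, peak_post_hz = 10.0, 10.2

# On the fine grid the two peaks fall in different bins ...
fine_bins = (bin_index(peak_pre_hz, fine), bin_index(peak_post_hz, fine))
# ... while on the coarse grid they fall in the same bin.
coarse_bins = (bin_index(peak_pre_hz, coarse), bin_index(peak_post_hz, coarse))
```

With nearest-bin peak picking, the location error can be as large as half the bin spacing, so a shift smaller than that may be invisible on a coarse grid. Estimators that average across bins, such as a centre-of-gravity measure, can register shifts smaller than one bin, so this sketch shows the limiting case only.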

      (8) The authors also ran alternative statistical models to analyze the data and did not find consistent results in terms of PHP ratings (PAF modulation was still statistically significantly different). The authors attribute this to the necessity of controlling for covariates. Now, considering the effects sizes, aren't these statistically significant differences just artifacts stemming from the inclusion of too many covariates (Simmons et al., 2011)? How much influence should be attributable to depression and anxiety symptoms, stress, sleep quality and past pain, considering that these are healthy volunteers? Should these contrasting differences call the authors to question the robustness of the findings (i.e., whether the same data subjected to different analysis provides the same results), particularly when the results do not align with the preregistered hypothesis (PAF modulation should occur on sensorimotor ROIs)?

      Thank you for your comments on our alternative statistical models. By including these covariates, we aim to provide a more nuanced understanding of the complexities within our data by considering their potential impact on the effects of interest. The decision to include covariates was preregistered (apologies again that this was not available) and made with consideration of balancing model complexity and avoiding potential confounding. Moreover, we hope that the insights gained from these analyses will offer valuable information about the behaviour of our data and aid future research in terms of power calculations, expected variance, and study design.

      (9) Beyond that, I believe in some cases that the authors overreach in an attempt to provide explanations for their results. While I agree that sex might be a relevant covariate, I cannot say whether the authors are confirming a pre-registered hypothesis regarding the gender-specific correlation of PAF and pain, or if this is just a post hoc subgroup analysis. Given the large number of analyses performed (considering the main document and the supplementary files), caution should be exercised on the selective interpretation of those that align with the researchers' hypotheses.

      We chose to explore the influence of sex on the correlation between PAF and pain, because this has also been investigated in previous publications of the relationship (Furman et al., 2020). We state that the assessment by sex is exploratory in our results on p.17: “in an exploratory analysis of separate correlations in males and females (Figure 5, plot C)”. For clarity regarding whether this was a pre-registered exploration or not, we have adjusted this to be: “in an exploratory analysis (not pre-registered) of separate correlations in males and females (Figure 5, plot C), akin to those conducted in previous research on this topic (Furman et al., 2020),”

      We have made sure to state this in the discussion also. Therefore, when we previously said on p.22:

      “Regarding the relationship between PAF and pain at baseline, the negative correlation between PAF and pain seen in previous work [7–11,15] was only observed here for male participants during the PHP model for global PAF.” We have now changed this to: “Regarding the relationship between PAF and pain at baseline, the negative correlation between PAF and pain seen in previous work [7– 11,15] was only observed here for male participants during the PHP model for global PAF in an exploratory analysis.”

      Please also note that we altered the colour and shape of points on the correlation plot (Figure 5 in initial submission), the male brown was changed to a dark brown as we realised that the light brown colour was difficult to read. The shape was then changed for male points so that the two groups can be distinguished in grey-scale.

      Overall, your thoughtful feedback is instrumental in refining the interpretation of our findings, and we look forward to presenting a more comprehensive and nuanced discussion. Thank you for your comments.

      REFERENCES for responses to reviewer 3

      Arendt-Nielsen, L., & Yarnitsky, D. (2009). Experimental and clinical applications of quantitative sensory testing applied to skin, muscles and viscera. The Journal of Pain, 10(6), 556-572.

      Chowdhury, N. S., Skippen, P., Si, E., Chiang, A. K., Millard, S. K., Furman, A. J., ... & Seminowicz, D. A. (2023). The reliability of two prospective cortical biomarkers for pain: EEG peak alpha frequency and TMS corticomotor excitability. Journal of Neuroscience Methods, 385, 109766.

      Fishbain, D. A., Lewis, J. E., & Gao, J. (2013). Is There Significant Correlation between Self-Reported Low Back Pain Visual Analogue Scores and Low Back Pain Scores Determined by Pressure Pain Induction Matching?. Pain practice, 13(5), 358-363.

      Furman, A. J., Prokhorenko, M., Keaser, M. L., Zhang, J., Chen, S., Mazaheri, A., & Seminowicz, D. A. (2021). Prolonged pain reliably slows peak alpha frequency by reducing fast alpha power. bioRxiv, 2021-07.

      Heitmann, H., Ávila, C. G., Nickel, M. M., Dinh, S. T., May, E. S., Tiemann, L., ... & Ploner, M. (2022). Longitudinal resting-state electroencephalography in patients with chronic pain undergoing interdisciplinary multimodal pain therapy. Pain, 163(9), e997.

      McLain, N. J., Yani, M. S., & Kutch, J. J. (2022). Analytic consistency and neural correlates of peak alpha frequency in the study of pain. Journal of neuroscience methods, 368, 109460.

      Ngernyam, N., Jensen, M. P., Arayawichanon, P., Auvichayapat, N., Tiamkao, S., Janjarasjitt, S., ... & Auvichayapat, P. (2015). The effects of transcranial direct current stimulation in patients with neuropathic pain from spinal cord injury. Clinical Neurophysiology, 126(2), 382-390.

      Parker, T., Huang, Y., Raghu, A. L., FitzGerald, J., Aziz, T. Z., & Green, A. L. (2021). Supraspinal effects of dorsal root ganglion stimulation in chronic pain patients. Neuromodulation: Technology at the Neural Interface, 24(4), 646-654.

      Petersen-Felix, S., & Arendt-Nielsen, L. (2002). From pain research to pain treatment: the role of human experimental pain models. Best Practice & Research Clinical Anaesthesiology, 16(4), 667-680.

      Sarnthein, J., Stern, J., Aufenberg, C., Rousson, V., & Jeanmonod, D. (2006). Increased EEG power and slowed dominant frequency in patients with neurogenic pain. Brain, 129(1), 55-64.

      Sato, G., Osumi, M., & Morioka, S. (2017). Effects of wheelchair propulsion on neuropathic pain and resting electroencephalography after spinal cord injury. Journal of Rehabilitation Medicine, 49(2), 136-143.

      Sufianov, A. A., Shapkin, A. G., Sufianova, G. Z., Elishev, V. G., Barashin, D. A., Berdichevskii, V. B., & Churkin, S. V. (2014). Functional and metabolic changes in the brain in neuropathic pain syndrome against the background of chronic epidural electrostimulation of the spinal cord. Bulletin of experimental biology and medicine, 157(4), 462-465.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study probes the role of the NF-κB inhibitor IκBa in the regulation of pluripotency in mouse embryonic stem cells (mESCs). It follows from previous work that identified a chromatin-specific role for IκBa in the regulation of tissue stem cell differentiation. The work presented here shows that a fraction of IκBa specifically associates with chromatin in pluripotent stem cells. Using three Nfkbia-knockout lines, the authors show that IκBa ablation impairs the exit from pluripotency, with embryonic bodies (an in vitro model of mESC multi-lineage differentiation) still expressing high levels of pluripotency markers after sustained exposure to differentiation signals. The maintenance of aberrant pluripotency gene expression under differentiation conditions is accompanied by pluripotency-associated epigenetic profiles of DNA methylation and histone marks. Using elegant separation of function mutants identified in a separate study, the authors generate versions of IκBa that are either impaired in histone/chromatin binding or NF-κB binding. They show that the provision of the WT IκBa, or the NF-κB-binding mutant can rescue the changes in gene expression driven by loss of IκBa, but the chromatin-binding mutant cannot. Thus the study identifies a chromatin-specific, NF-κB-independent role of IκBa as a regulator of exit from pluripotency.

      Strengths:

      The strengths of the manuscript lie in: (a) the use of several orthogonal assays to support the conclusions on the effects of exit from pluripotency; (b) the use of three independent clonal Nfkbia-KO mESC lines (lacking IκBa), which increase confidence in the conclusions; and (c) the use of separation of function mutants to determine the relative contributions of the chromatin-associated and NF-κB-associated IκBa, which would otherwise be very difficult to unpick.

      Weaknesses:

      In this reviewer's view, the term "differentiation" is used inappropriately in this manuscript. The data showing aberrant expression of pluripotency markers during embryoid body formation are supported by several lines of evidence and are convincing. However, the authors call the phenotype of Nfkbia-KO cells a "differentiation impairment" while the data on differentiation markers are not shown (beyond the fact that H3K4me1, marking poised enhancers, is reduced in genes underlying GO processes associated with differentiation and organ development). Data on differentiation marker expression from the transcriptomic and embryoid body immunofluorescent experiments, for example, should be at hand without the need to conduct many more experiments and would help to support the conclusions of the study or make them more specific. The lack of probing the differentiation versus pluripotency genes may be a missed opportunity in gaining in-depth understanding of the phenotype associated with loss of the chromatin-associated function of IκBa.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of IκBα in regulating mouse embryonic stem cell (ESC) pluripotency and differentiation. The authors demonstrate that IκBα knockout impairs the exit from the naïve pluripotent state during embryoid body differentiation. Through mechanistic studies using various mutants, they show that IκBα regulates ESC differentiation through chromatin-related functions, independent of the canonical NFκB pathway.

      Strengths:

      The authors nicely investigate the role of IκBα in pluripotency exit, using embryoid body formation and complementing the phenotypic analysis with a number of genome-wide approaches, including transcriptomic, histone marks deposition, and DNA methylation analyses. Moreover, they generate a first-of-its-kind mutant set that allows them to uncouple IκBα's function in chromatin regulation versus its NF-κB-related functions. This work contributes to our understanding of cellular plasticity and development, potentially interesting a broad audience including developmental biologists, chromatin biology researchers, and cell signaling experts.

      Weaknesses:

      - The study's main limitation is the lack of crucial controls using bona fide naïve cells across key experiments, including DNA methylation analysis, gene expression profiling in embryoid bodies, and histone mark deposition. This omission makes it difficult to evaluate whether the observed changes in IκBα-KO cells truly reflect naïve pluripotency characteristics.

      - Several conclusions in the manuscript require a more measured interpretation. The authors should revise their statements regarding the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes.

      - From a methodological perspective, the manuscript would benefit from additional orthogonal approaches to strengthen the knockout findings, which may be influenced by clonal expansion of ES cells.

      Overall, this study makes an important contribution to the field. However, the concerns raised regarding controls, data interpretation, and methodology should be addressed to strengthen the manuscript and support the authors' conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have the following comments and suggestions for the authors to consider:

      (1) Fig. 1D: the number of replicates for this experiment is not mentioned. It would be good to see if the apparent accumulation of IκBa on chromatin of S/L cells is reproducible. If it is, does the accumulation of IκBa "prime" chromatin for differentiation?

      We apologize for missing this information in the figure legend. We have repeated the experiment two independent times, and confirmed the localization of IκBα in the chromatin fraction of mESCs cultured in Serum/LIF (S/L). We have included the information in the figure legend.

      Regarding the second question, we do believe that the presence of IκBα primes mESCs to exit from pluripotency. Previous data from the lab (Mulero et al., Cancer Cell 2012; Marruecos et al., EMBO Reports 2020) demonstrated that IκBα regulates important developmental genes (Hox genes and differentiation-related genes), which become dysregulated upon IκBα depletion. Based on those previous results, together with our results demonstrating that the lack of IκBα hyperactivates the pluripotency network, we conclude that IκBα is a crucial element in attenuating pluripotency programs, allowing a successful exit from naïve pluripotency and subsequent differentiation.

      (2) Fig. 1E: From what is shown, Rela doesn't agree (i.e. no enrichment in EpiSCs in the Atlasi data). Are the culture conditions in Atlasi 2020 the same as in this paper (base medium etc.)? Also, why not label all genes/proteins that are shown in 1C?

      Differences observed between our data and the in-silico data might be due to differences in the culture conditions used by Atlasi and colleagues. In particular, Atlasi et al. cultured the mESCs in 2i/LIF for 2 consecutive months, whereas we induced the ground state of naïve pluripotency (2i/LIF) for only 96h. In the case of EpiSC differentiation, similar protocols are used in both our work and in Atlasi et al. Nevertheless, despite these differences, in both studies IκBα is enriched in the ground state of naïve pluripotency.

      Some proteins that appear in Figure 1C are missing from Figure 1E because they were not detected in the mass spectrometry experiment.

      (3) Fig. 1F: The word "clustering" here is misleading. While Nfkbia shows similar dynamics as pluripotency genes, clustering should not be used unless clusters of genes are shown in the same heatmap (and the transcripts naturally cluster together). The figure would be even more informative if all the genes from the 4 different categories were presented on the same heatmap.

      As suggested by the reviewer, we have generated a heatmap where the genes from the four different categories (Figure 1F) are displayed and clustered together:

      Author response image 1.

      Heatmap including all the genes from Figure 1F of the manuscript, with clustering conducted simultaneously over the four categories.

      As shown in the heatmap above, most of the NF-κB genes (except for Nfkbia and Nfkbid) clustered together with differentiation markers.

      Nonetheless, to stay closer to the original Figure 1F and to keep the gene categories clear, we have updated the figure with a combined heatmap, sliced by gene category. In this updated version, we can observe that the IκBα gene (Nfkbia), though classified under the biological process to which it classically belongs (the NF-κB pathway), is more highly expressed at pluripotency and decreases upon induction of differentiation, similarly to most of the pluripotency genes.

      We have also changed the text accordingly and have added the following sentences in the main text (lines 121-125): “The expression pattern of Nfkbia was similar to the pluripotency genes whereas most of the NF-κB genes were upregulated upon differentiation, showing an analogous expression dynamics as developmental genes, as previously described”.

      (4) This reviewer felt that the statement "Notably, several polycomb elements were highly expressed in mESCs, consistent with the possibility that chromatin-bound IκBα modulates PRC2 activity in the pluripotent state" (p.5, lines 125-127) is premature here. While similar expression dynamics may be consistent with a linked function, they in no way suggest this. This can be more accurately stated to point out that Nfkbia shows similar expression dynamics in pluripotency and differentiation as Polycomb component genes.

      We agree that the statement is premature and we have changed it to: “Previous reports have demonstrated that chromatin-bound IκBα modulates PRC2 activity in different adult stem cell models [27]. Interestingly, we observed that most of the Polycomb target genes follow a similar expression pattern of Nfkbia and pluripotency, with higher expression in mESCs (Figure 1F).” (lines 125-128 in the manuscript).

      (5) Top of p. 6: the results are mis-attributed to Fig. 1, it should be Fig. 2.

      We thank the reviewer for this observation. We have corrected it in the main text.

      (6) Fig. 1B and Fig. 5I: the images of the AP stains are very difficult to see, better resolution images should be used.

      We have increased both the resolution and the size of the AP colonies.

      (7) Line 142 (p.6): Fig. S1B should be S1C. In general the manuscript would benefit from review of the order and labeling of the figure panels as there are a number of inconsistencies.

      We have reorganized the figures in the new version of the manuscript. In particular, we have rearranged Figure S1 into a more logical order. We have done the same for Figure 2 and Figure 5, and they are updated in the new version of the reviewed manuscript.

      (8) The authors call the phenotype of Nfkbia-KO cells a "differentiation impairment". Do the EBs shown in Fig. 2 also express differentiation markers? Do they fail to up-regulate those markers or just fail to down-regulate pluripotency markers? At the transcriptomic level the Nfkbia-KO cells still change significantly upon provision of differentiation signals (Fig. 2C), what types of gene processes underlie the differences between WT and KO cells and which processes are common? Also, based on this figure, the phenotype looks to be more of a delay than a failure in differentiation, as the cells still follow the same trajectory but lag behind the WT cells. It is difficult to discern whether this is the case based on Fig. 2E-G as we don't see the later time point (up to Day 9).

      In general, with the data presented in Fig. 2C and Fig. S1, the authors show that many of the hallmarks of exit from pluripotency are impaired in Nfkbia-KO cells, as well as the general "transcriptional status" of the cells, but they don't show differentiation markers (which would be necessary to conclude an impairment in differentiation). The data should be readily available in the datasets that are in the manuscript already and it will be informative to extract and present them. The data are not currently publicly accessible (unavailable until July 2025) so it was not possible to mine them.

      We appreciate the observation, and we have included more data to confirm that the IκBα-KO cells show a differentiation impairment. In the first version of the manuscript, differentiation markers are displayed in Figures 2E-G, where genes from the three germ layers (ectoderm, mesoderm and endoderm) are not activated in IκBα-KO EBs at 48h and 96h. Moreover, the volcano plot displayed in Figure S1F of the first version clearly shows a downregulation of important differentiation genes such as T, Eomes, Lhx1 and Foxa2. We agree that 96h EBs represent an early time point at which to assess a differentiation impairment. For that reason, we have also included the same pluripotency and differentiation genes in 216h EBs (Figures S1F-G of the newer version of the manuscript). IκBα-KO 216h EBs clearly maintain an upregulation of pluripotency programs, which is accompanied by a lower differentiation capability. Moreover, the impairment in differentiation, with a higher expression of pluripotency markers, is confirmed by the high SSEA-1 expression in IκBα-KO 216h EBs (Figure S1C of the manuscript) and by alkaline phosphatase (AP) staining (Figure 2C of the manuscript). Lastly, the fact that IκBα-KO teratomas contain a higher proportion of OCT3/4+ cells further confirms that IκBα-KO cells cannot differentiate because of their inability to exit from pluripotency.

      Finally, the generated data (deposited in the GEO repository under SuperSeries ID GSE239565) are already publicly available.

      (9) Fig. 5A: even if there are no global changes in NF-κB target genes, could a small subset of NF-κB target genes still mediate the IκBa effects?

      We have analyzed the whole NF-κB signature, and we have identified a small cluster of genes that are differentially expressed at 96h EBs between IκBα-KO and IκBα-WT (Author response image 2). Interestingly, what we observed is the opposite of what would be expected, since we see a downregulation of that subset in the IκBα-KO 96h EBs (Author response image 3). For that reason, the changes detected in NF-κB target gene expression after deletion of Nfkbia do not support an NF-κB inhibitory role for IκBα in pluripotent ESCs.

      Author response image 2.

      Heatmap of NF-κB gene expression at the different time points of differentiation (mESCs, 48h EBs, 96h EBs). The highlighted region marks the genes that are differentially expressed between the two genotypes at 96h EBs.

       

      Author response image 3.

      Violin plot of genes from the NF-κB pathway which are differentially expressed at 96h EBs.

      (10) Lines 233-238, the part of the text is repeated.

      We appreciate the observation and have deleted the repeated part.

      (11) The data in Fig. 5D-E make it difficult to be sure whether the conclusions on the relative subcellular localisations of the different mutants are accurate, as the chromatin-binding mutant seems to be less abundant than the other mutants (judging from the Input in Fig. 5C and also from the tubulin loading controls in Fig. 5D-E). Showing the IκBa levels in total extracts would make the interpretation of these data more robust. The authors do mention that the chromatin-binding mutant IκBa protein is consistently expressed at lower levels but they do not comment on how this may affect the data interpretation - could the lack of rescue be due to lower levels of the chromatin-binding mutant IκBa relative to the wild-type IκBa? This should be addressed in the Discussion, if not tested formally by normalising the expression levels of the different forms of IκBa in the rescue experiments.

      Although protein stability is different among the SOF mutants, IκBα<sup>ΔChromatin</sup> is exclusively detected in the cytoplasm, with lack of detection in the chromatin compartment (Figures 5D-E of the reviewed manuscript). For this reason, we believe that the quantitative differences in protein levels of the different mutants cannot explain the subcellular localization differences and the phenotype observed.

      Nonetheless, we cannot rule out that differences in protein levels between the SOF mutants affect the rescue phenotype, and we have stated this in the discussion section of the manuscript.

      (12) Lines 260-261: "Induction of i-IκBαWT and i-IκBαΔNF-κB reduced the expression levels of the naive pluripotent genes Zfp42, Klf2, Sox2 and Tbx3, which were increased by i-IκBαΔChromatin (Figure 5F)." This is not an accurate statement. The expression was not reduced by the ΔChrom mutant in the same way as it was by the WT and the ΔNF-κB mutant, but it was not increased.

      We have made the description of the results displayed in Figure 5F more precise (lines 258-261 of the main manuscript):

      “Induction of i-IκBα<sup>WT</sup> and i-IκBα<sup>ΔNF-κB</sup> reduced the expression levels of the naïve pluripotent genes Zfp42, Klf2, Sox2 and Tbx3. On the other hand, the same genes either do not change their expression (Zfp42, Sox2, Klf2) or increase their levels (Tbx3) upon i-IκBα<sup>ΔChromatin</sup>  induction (Figure 5F).”

      (13) In Fig. 5J the images will ideally be shown before and after Doxycycline treatment, to better support the conclusions.

      We have included a new panel in Figure S4 (Figure S4E in the reviewed manuscript) showing the no-doxycycline control 216h EBs for the different conditions (i-IκBα<sup>WT</sup>, i-IκBα<sup>ΔChrom</sup> and i-IκBα<sup>ΔNF-κB</sup>).

      Reviewer #2 (Recommendations for the authors):

      - The PCA analysis in Figure 2 appears to contradict the authors' conclusions about global transcriptome changes in KO cells. Furthermore, there is a discrepancy between immunofluorescence data showing near-complete methylation loss and the methylation array analysis results.

      Although there is a differentiation block in the IκBα-KO EBs, it is not complete, and the EBs show some differentiation trend after 96h (Fig 2C). Moreover, acquisition of differentiation genes from all three germ layers is strongly affected (Figure 2E of the reviewed manuscript), and these programs remain downregulated while pluripotency genes are still expressed in IκBα-KO EBs at later time points (216h) (Fig 2B). Altogether, this demonstrates that the lack of IκBα impairs differentiation and the silencing of the pluripotency network.

      Discrepancies between methylation array and immunofluorescence are expected since immunofluorescence is not quantitative and the methylation array is very precise.  

      - The authors should revise their statements regarding the strength of the pluripotency exit block, the extent of hypomethylation, and the global nature of chromatin changes. For example, the observed chromatin changes, including H3K27ac modifications, appear relatively modest and should be described as such.

      - The manuscript would benefit from additional orthogonal approaches to strengthen the knockout findings, which may be influenced by clonal expansion of ES cells. Additionally, the emphasis on overlapping H3K4me3 and H3K27me3 regions should be reduced, as these represent a minor fraction of the affected regions (only 41 regions).

      We have revised the text and have included the following in the discussion section (lines 327-331 in the reviewed manuscript):

      “Although IκBα KO  mESCs  exhibit a transcriptional phenotype and hypomethylation state  that resembles the ground state of naïve pluripotency, there are only modest changes on histone marks associated to enhancers (H3K27Ac) or gene regulation (H3K4me3 and H3K27me3). Altogether indicates that further experiments are required to fully elucidate the effect of chromatin IκBα.”

      We have also included Fig S3E-S3F to show that differences in H3K4me3 and H3K27me3 similar to those between WT and KO cells are observed between serum/LIF and 2i conditions, further supporting the conclusion that KO cells in serum/LIF resemble WT cells in 2i conditions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      In an important fMRI study with an elegant experimental design and rigorous cross-decoding analyses, this work shows a solid dissociation between two parietal regions in visually processing actions. Specifically, aIPL is found to be sensitive to the causal effects of observed actions, while SPL is sensitive to the patterns of body motion involved in those actions. Additional analysis and explanation would help to determine the strength of evidence and the mechanistic underpinnings would benefit from closer consideration. Nevertheless, the work will be of broad interest to cognitive neuroscientists, particularly vision and action researchers.

      We thank the editor and the reviewers for their assessment and their excellent comments and suggestions. We really believe they helped us to provide a stronger and more nuanced paper. In our revision, we addressed all points raised by the reviewers. Most importantly, we added a new section on a series of analyses to characterize in more detail the representations isolated by the action-animation and action-PLD cross-decoding. Together, these analyses strengthen the conclusion that aIPL and LOTC represent action effect structures at a categorical rather than specific level, that is, the type of change (e.g., of location or configuration) rather than the specific effect type (e.g. division, compression). SPL is sensitive to body-specific representations, specifically manuality (unimanual vs. bimanual) and movement kinematics. We also added several other analyses and addressed each point of the reviewers. Please find our responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report a study aimed at understanding the brain's representations of viewed actions, with a particular aim to distinguish regions that encode observed body movements, from those that encode the effects of actions on objects. They adopt a cross-decoding multivariate fMRI approach, scanning adult observers who viewed full-cue actions, pantomimes of those actions, minimal skeletal depictions of those actions, and abstract animations that captured analogous effects to those actions. Decoding across different pairs of these actions allowed the authors to pull out the contributions of different action features in a given region's representation. The main hypothesis, which was largely confirmed, was that the superior parietal lobe (SPL) more strongly encodes movements of the body, whereas the anterior inferior parietal lobe (aIPL) codes for action effects of outcomes. Specifically, region of interest analyses showed dissociations in the successful cross-decoding of action category across full-cue and skeletal or abstract depictions. Their analyses also highlight the importance of the lateral occipito-temporal cortex (LOTC) in coding action effects. They also find some preliminary evidence about the organisation of action kinds in the regions examined.

      Strengths:

      The paper is well-written, and it addresses a topic of emerging interest where social vision and intuitive physics intersect. The use of cross-decoding to examine actions and their effects across four different stimulus formats is a strength of the study. Likewise, the a priori identification of regions of interest (supplemented by additional full-brain analyses) is a strength.

      Weaknesses:

      I found that the main limitation of the article was in the underpinning theoretical reasoning. The authors appeal to the idea of "action effect structures (AES)", as an abstract representation of the consequences of an action that does not specify (as I understand it) the exact means by which that effect is caused, nor the specific objects involved. This concept has some face validity, but it is not developed very fully in the paper, rather simply asserted. The authors make the claim that "The identification of action effect structure representations in aIPL has implications for theories of action understanding" but it would have been nice to hear more about what those theoretical implications are. More generally, I was not very clear on the direction of the claim here. Is there independent evidence for AES (if so, what is it?) and this study tests the following prediction, that AES should be associated with a specific brain region that does not also code other action properties such as body movements? Or, is the idea that this finding -- that there is a brain region that is sensitive to outcomes more than movements -- is the key new evidence for AES?

      Thank you for raising this important issue. We reasoned that AES should exist to support the recognition of perceptually variable actions, including those that we have never experienced before. To the best of our knowledge, there is only indirect evidence for the existence of AES, namely that humans effortlessly and automatically recognize actions (and underlying intentions and feelings) in movements of abstract shapes, as in the famous Heider and Simmel (1944) animations. As these animations do not contain any body posture or movement information at all, the only available cues are the spatiotemporal relations between entities and entity parts in the perceived scene. We think that the effortless and automatic attribution of actions to these stimuli points toward an evolutionarily optimized mechanism to capture action effect structures from highly variable action instantiations (so general that it even works for abstract animations). Our study thus aimed to test for the existence of such a level of representation in the brain. We clarified this point in the introduction.

      In our revised manuscript, we also revised our discussion of the implications of the finding of AES representations in the brain:

      "The identification of action effect structure representations in aIPL and LOTC has implications for theories of action understanding: Current theories (see for review e.g. Zentgraf et al., 2011; Kemmerer, 2021; Lingnau and Downing, 2024) largely ignore the fact that the recognition of many goal-directed actions requires a physical analysis of the action-induced effect, that is, a state change of the action target. Moreover, premotor and inferior parietal cortex are usually associated with motor- or body-related processing during action observation. Our results, together with the finding that premotor and inferior parietal cortex are similarly sensitive to actions and inanimate object events (Karakose-Akbiyik et al., 2023), suggest that large parts of the 'action observation network' are less specific for body-related processing in action perception than usually thought. Rather, this network might provide a substrate for the physical analysis and predictive simulation of dynamic events in general (Schubotz, 2007; Fischer, 2024). In addition, our finding that the (body-independent) representation of action effects substantially draws on right LOTC contradicts strong formulations of a 'social perception' pathway in LOTC that is selectively tuned to the processing of moving faces and bodies (Pitcher and Ungerleider, 2021). The finding of action effect representation in right LOTC/pSTS might also offer a novel interpretation of a right pSTS subregion thought to be specialized for social interaction recognition: Right pSTS shows increased activation for the observation of contingent action-reaction pairs (e.g. agent A points toward object; agent B picks up object) as compared to two independent actions (i.e., the action of agent A has no effect on the action of agent B) (Isik et al., 2017). Perhaps the activation reflects the representation of a social action effect - the change of an agent's state induced by someone else's action. Thus, the representation of action effects might not be limited to physical object changes but might also comprise social effects not induced by a physical interaction between entities. Finally, not all actions induce an observable change in the world. It remains to be tested whether the recognition of, e.g., communication (e.g. speaking, gesturing) and perception actions (e.g. observing, smelling) similarly relies on structural action representations in aIPL and LOTC."

      On a more specific but still important point, I was not always clear that the significant, but numerically rather small, decoding effects are sufficient to support strong claims about what is encoded or represented in a region. This concern of course applies to many multivariate decoding neuroimaging studies. In this instance, I wondered specifically whether the decoding effects necessarily reflected a fully five-way distinction amongst the action kinds, or instead (for example) a significantly different pattern evoked by one action compared to all of the other four (which in turn might be similar). This concern is partly increased by the confusion matrices that are presented in the supplementary materials, which don't necessarily convey a strong classification amongst action kinds. The cluster analyses are interesting and appear to be somewhat regular over the different regions, which helps. However, it is hard to assess these findings statistically, and it may be that similar clusters would be found in early visual areas too.

      We agree that in our original manuscript, we did not statistically test what precisely drives the decoding, e.g., specific actions or rather broader categories. In our revised manuscript, we included a representational similarity analysis (RSA) that addressed this point. In short, we found that the action-animation decoding was driven by categorical distinctions between groups of actions (e.g. hit/place vs. the remaining actions) rather than a fully five-way distinction amongst all action kinds. The action-PLD decoding was mostly driven by manuality (unimanual vs. bimanual) and movement kinematics; in left and right LOTC we found additional evidence for action-specific representations.

      Please find below the new paragraph on the RSA:

      "To explore in more detail what types of information were isolated by the action-animation and action-PLD cross-decoding, we performed a representational similarity analysis.

      We first focus on the representations identified by the action-animation decoding. To inspect and compare the representational organization in the ROIs, we extracted the confusion matrices of the action-animation decoding from the ROIs (Fig. 5A) and compared them with different similarity models (Fig. 5B) using multiple regression. Specifically, we aimed at testing at which level of granularity action effect structures are represented in aIPL and LOTC: Do these regions encode the broad type of action effects (change of shape, change of location, ingestion) or do they encode specific action effects (compression, division, etc.)? In addition, we aimed at testing whether the effects observed in EVC can be explained by a motion energy model that captures the similarities between actions and animations that we observed in the stimulus-based action-animation decoding using motion energy features. We therefore included V1 in the ROI analysis. We found clear evidence that the representational content in right aIPL and bilateral LOTC can be explained by the effect type model but not by the action-specific model (all p < 0.005; two-sided paired t-tests between models; Fig. 5C). In left V1, we found that the motion energy model could indeed explain some representational variance; however, in both left and right V1 we also found effects for the effect type model. We assume that there were additional visual similarities between the broad types of actions and animations that were not captured by the motion energy model (or other visual models; see Supplementary Information). A searchlight RSA revealed converging results, and additionally found effects for the effect type model in the ventral part of left aIPL and for the action-specific model in the left anterior temporal lobe, left dorsal central gyrus, and right EVC (Fig. 5D). 
The latter findings were unexpected and should be interpreted with caution, as these regions (except right EVC) were not found in the action-animation cross-decoding and therefore should not be considered reliable (Ritchie et al., 2017). The motion energy model did not reveal effects that survived the correction for multiple comparisons, but a more lenient uncorrected threshold of p = 0.005 revealed clusters in left EVC and bilateral posterior SPL.

      To characterize the representations identified by the action-PLD cross-decoding, we used a manuality model that captures whether the actions were performed with both hands vs. one hand, an action-specific model as used in the action-animation RSA above, and a kinematics model that was based on the 3D kinematic marker positions of the PLDs (Fig. 6B). Since pSTS is a key region for biological motion perception, we included this region in the ROI analysis. The manuality model explained the representational variance in the parietal ROIs, pSTS, and LOTC, but not in V1 (all p < 0.002; two-sided paired t-tests between V1 and other ROIs; Fig. 6C). By contrast, the action-specific model revealed significant effects in V1 and LOTC, but not in pSTS and parietal ROIs (but note that effects in V1 and pSTS did not differ significantly from each other; all other two-sided paired t-tests between mentioned ROIs were significant at p < 0.0005). The kinematics model explained the representational variance in all ROIs. A searchlight RSA revealed converging results, and additionally found effects for the manuality model in bilateral dorsal/medial prefrontal cortex and in right ventral prefrontal cortex and insula (Fig. 6D)."
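
      The multiple regression RSA described in the quoted paragraphs can be illustrated with a minimal sketch. It is not the analysis code used in the study: the assignment of the five actions to effect types, the toy confusion matrix, and all function names are assumptions made for illustration only. The sketch regresses a vectorized confusion matrix on an effect-type model and an action-specific model and returns one standardized beta per model.

```python
import numpy as np

# Hypothetical grouping of the five actions into broad effect types
# (0 = change of shape, 1 = change of location, 2 = ingestion).
actions = ["break", "squash", "hit", "place", "drink"]
effect_type = np.array([0, 0, 1, 1, 2])

# Model similarity matrices: 1 where two actions share the modeled property.
effect_model = (effect_type[:, None] == effect_type[None, :]).astype(float)
specific_model = np.eye(len(actions))  # each action is similar only to itself


def zscore(v):
    """Standardize a vector to zero mean and unit variance."""
    return (v - v.mean()) / v.std()


def rsa_regression(neural, models):
    """Regress a vectorized neural matrix on several vectorized model matrices.

    Returns one standardized beta per model (the intercept is dropped).
    """
    y = zscore(neural.ravel())
    X = np.column_stack([zscore(m.ravel()) for m in models])
    X = np.column_stack([np.ones(len(y)), X])  # add intercept column
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas[1:]


# Toy confusion matrix that is driven by effect type only, plus a little noise.
rng = np.random.default_rng(1)
confusion = 0.2 * effect_model + 0.02 * rng.normal(size=(5, 5))

betas = rsa_regression(confusion, [effect_model, specific_model])
print(dict(zip(["effect_type", "action_specific"], betas.round(2))))
```

      In a group analysis, such betas would be estimated per participant and ROI and then compared between models with paired t-tests, as described in the text.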

      We also included an ROI covering early visual cortex (V1) in our analysis. While there was significant decoding for action-animation in V1, the representational organization did not substantially match the organization found in aIPL and LOTC: A cluster analysis revealed much higher similarity between LOTC and aIPL than between these regions and V1:

      (please note that in this analysis we included the action-PLD RDMs as reference, and to test whether aIPL shows a similar representational organization in action-anim and action-PLD; see below)

      Given these results, we think that V1 captured different aspects in the action-animation cross-decoding than aIPL and LOTC. We address this point in more detail in our response to the "Recommendations for The Authors".

      Reviewer #2 (Public Review):

      Summary:

      This study uses an elegant design, using cross-decoding of multivariate fMRI patterns across different types of stimuli, to convincingly show a functional dissociation between two sub-regions of the parietal cortex, the anterior inferior parietal lobe (aIPL) and superior parietal lobe (SPL) in visually processing actions. Specifically, aIPL is found to be sensitive to the causal effects of observed actions (e.g. whether an action causes an object to compress or to break into two parts), and SPL to the motion patterns of the body in executing those actions.

      To show this, the authors assess how well linear classifiers trained to distinguish fMRI patterns of response to actions in one stimulus type can generalize to another stimulus type. They choose stimulus types that abstract away specific dimensions of interest. To reveal sensitivity to the causal effects of actions, regardless of low-level details or motion patterns, they use abstract animations that depict a particular kind of object manipulation: e.g. breaking, hitting, or squashing an object. To reveal sensitivity to motion patterns, independently of causal effects on objects, they use point-light displays (PLDs) of figures performing the same actions. Finally, full videos of actors performing actions are used as the stimuli providing the most complete, and naturalistic information. Pantomime videos, with actors mimicking the execution of an action without visible objects, are used as an intermediate condition providing more cues than PLDs but less than real action videos (e.g. the hands are visible, unlike in PLDs, but the object is absent and has to be inferred). By training classifiers on animations, and testing their generalization to full-action videos, the classifiers' sensitivity to the causal effect of actions, independently of visual appearance, can be assessed. By training them on PLDs and testing them on videos, their sensitivity to motion patterns, independent of the causal effect of actions, can be assessed, as PLDs contain no information about an action's effect on objects.
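
      The cross-decoding logic summarized above can be sketched on simulated data. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for whichever linear classifier the study used, and the voxel counts, noise levels, and variable names are hypothetical. A classifier is trained on patterns from one stimulus type and tested on another; above-chance accuracy indicates an action code shared across the two types.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Return the class labels and the mean pattern of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(X, classes, centroids):
    """Assign each pattern to the class with the closest centroid (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]


def cross_decode(X_train, y_train, X_test, y_test):
    """Train on one stimulus type, test on another, and return accuracy."""
    classes, centroids = nearest_centroid_fit(X_train, y_train)
    pred = nearest_centroid_predict(X_test, classes, centroids)
    return float((pred == y_test).mean())


rng = np.random.default_rng(0)
n_actions, n_voxels, n_trials = 5, 80, 20
shared = rng.normal(size=(n_actions, n_voxels))  # action code common to both types
y = np.repeat(np.arange(n_actions), n_trials)


def simulate(offset_scale):
    """Simulated patterns: shared action code + stimulus-type offset + noise."""
    offset = rng.normal(scale=offset_scale, size=n_voxels)
    noise = rng.normal(scale=1.0, size=(y.size, n_voxels))
    return shared[y] + offset + noise


X_anim = simulate(0.5)   # e.g. animations (training set)
X_video = simulate(0.5)  # e.g. action videos (test set)

acc = cross_decode(X_anim, y, X_video, y)
print(f"cross-decoding accuracy: {acc:.2f} (chance = {1 / n_actions:.2f})")
```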

      These analyses reveal that aIPL can generalize between animations and videos, indicating that it is sensitive to action effects. Conversely, SPL is found to generalize between PLDs and videos, showing that it is more sensitive to motion patterns. A searchlight analysis confirms this pattern of results, particularly showing that action-animation decoding is specific to right aIPL, and revealing an additional cluster in LOTC, which is included in subsequent analyses. Action-PLD decoding is more widespread across the whole action observation network.

      This study provides a valuable contribution to the understanding of functional specialization in the action observation network. It uses an original and robust experimental design to provide convincing evidence that understanding the causal effects of actions is a meaningful component of visual action processing and that it is specifically localized in aIPL and LOTC.

      Strengths:

      The authors cleverly managed to isolate specific aspects of real-world actions (causal effects, motion patterns) in an elegant experimental design, and by testing generalization across different stimulus types rather than within-category decoding performance, they show results that are convincing and readily interpretable. Moreover, they clearly took great care to eliminate potential confounds in their experimental design (for example, by carefully ordering scanning sessions by increasing realism, such that the participants could not associate animation with the corresponding real-world action), and to increase stimulus diversity for different stimulus types. They also carefully examine their own analysis pipeline, and transparently expose it to the reader (for example, by showing asymmetries across decoding directions in Figure S3). Overall, this is an extremely careful and robust paper.

      Weaknesses:

      I list several ways in which the paper could be improved below. More than 'weaknesses', these are either ambiguities in the exact claims made, or points that could be strengthened by additional analyses. I don't believe any of the claims or analyses presented in the paper show any strong weaknesses, problematic confounds, or anything that requires revising the claims substantially.

      (1) Functional specialization claims: throughout the paper, it is not clear what the exact claims of functional specialization are. While, as can be seen in Figure 3A, action-animation cross-decoding is significantly higher in aIPL, decoding performance is also above chance in right SPL, although this is not a strong effect. More importantly, action-PLD cross-decoding is robustly above chance in both right and left aIPL, implying that this region is sensitive to motion patterns as well as causal effects. I am not questioning that the difference between the two ROIs exists - that is very convincingly shown. But sentences such as "distinct neural systems for the processing of observed body movements in SPL and the effect they induce in aIPL" (lines 111-112, Introduction) and "aIPL encodes abstract representations of action effect structures independently of motion and object identity" (lines 127-128, Introduction) do not seem fully justified when action-PLD cross-decoding is overall stronger than action-animation cross-decoding in aIPL. Is the claim, then, that in addition to being sensitive to motion patterns, aIPL contains a neural code for abstracted causal effects, e.g. involving a separate neural subpopulation or a different coding scheme? Moreover, if sensitivity to motion patterns is not specific to SPL, but can be found in a broad network of areas (including aIPL itself), can it really be claimed that this area plays a specific role, similar to the specific role of aIPL in encoding causal effects? There is indeed, as can be seen in Figure 3A, a difference between action-PLD decoding in SPL and aIPL, but based on the searchlight map shown in Figure 3B I would guess that a similar difference would be found by comparing aIPL to several other regions. The authors should clarify these ambiguities.

      We thank the reviewer for this careful assessment. The observation of action-PLD cross-decoding in aIPL is indeed not straightforward to interpret: It could mean that aIPL encodes both body movements and action effect structures by different neural subpopulations. Or it could mean that representations of action effect structures were also activated by the PLDs, which led to successful decoding in the action-PLD cross-decoding. Our revision allows a more nuanced view on this issue:

      First, we included the results of a behavioral test showing that PLDs at least weakly allow for recognition of the specific actions (see our response to the second comment), which in turn might activate action effect structure representations. Second, the finding that the cross-decoding between animations and PLDs also revealed effects in left and right aIPL (as pointed out by the reviewer in the second comment) supports the interpretation that PLDs have activated, to some extent, action effect structure representations.

      On the other hand, if aIPL encoded only action effect structures, which were also captured in the action-PLD cross-decoding, we would expect the RDMs in aIPL to be similar for the action-PLD and action-animation cross-decoding. However, the cluster analysis (see our response to Reviewer 1 above) does not show this; rather, all action-PLD RDMs are representationally more similar to each other than to action-animation RDMs, specifically with regard to aIPL. In addition, the RSA revealed sensitivity to manuality and kinematics in aIPL as well. This suggests that the action-PLD decoding in aIPL was at least partially driven by representations related to body movements.

      Taken together, these findings suggest that aIPL also encodes body movements. In fact, we did not intend to make the strong claim that aIPL selectively represents action effect structures. Rather, we think that our results show that aIPL and SPL are disproportionally sensitive to action effects and body movements, respectively. We added this to our revised discussion:

      "The action-PLD cross-decoding revealed widespread effects in LOTC and parietal cortex, including aIPL. What type of representation drove the decoding in aIPL? One possible interpretation is that aIPL encodes both body movements (isolated by the action-PLD cross-decoding) and action effect structures (isolated by the action-animation cross-decoding). Alternatively, aIPL selectively encodes action effect structures, which have been activated by the PLDs. A behavioral test showed that PLDs at least weakly allow for recognition of the specific actions (Tab. S2), which might have activated corresponding action effect structure representations. In addition, the finding that aIPL revealed effects for the cross-decoding between animations and PLDs further supports the interpretation that PLDs have activated, at least to some extent, action effect structure representations.  On the other hand, if aIPL encodes only action effect structures, we would expect that the representational similarity patterns in aIPL are similar for the action-PLD and action-animation cross-decoding. However, this was not the case; rather, the representational similarity pattern in aIPL was more similar to SPL for the action-PLD decoding, which argues against distinct representational content in aIPL vs. SPL isolated by the action-PLD decoding. In addition, the RSA revealed sensitivity to manuality and kinematics also in aIPL, which suggests that the action-PLD decoding in aIPL was at least partially driven by representations related to body movements. Taken together, these findings suggest that aIPL encodes not only action effect structures, but also representations related to body movements. Likewise, also SPL shows some sensitivity to action effect structures, as demonstrated by effects in SPL for the action-animation and pantomime-animation cross-decoding. 
Thus, our results suggest that aIPL and SPL are not selectively but disproportionally sensitive to action effects and body movements, respectively."

      A clarification to the sentence "aIPL encodes abstract representations of action effect structures independently of motion and object identity": Here we are referring to the action-animation cross decoding only; specifically, the fact that because the animations did not show body motion and concrete objects, the representations isolated in the action-animation cross decoding must be independent of body motion and concrete objects. This does not rule out that the same region encodes other kinds of representations in addition.

      And another side note to the RSA: It might be tempting to test the "effects" model (distinguishing change of shape, change of location and ingest) also in the action-PLD multiple regression RSA in order to test whether this model explains additional variance in aIPL, which would point towards action effect structure representations. However, the "effect type" model is relatively strongly correlated with the "manuality" model (VIF=4.2), indicating that multicollinearity might exist. We therefore decided not to include this model in the RSA. However, we nonetheless tested the inclusion of this model and did not find clear effects for the "effects" model in aIPL (but did find them in LOTC). The other models revealed largely similar effects as the RSA without the "effects" model, but the effects appeared overall noisier. In general, we would like to emphasize that an RSA with just 5 actions is not ideal because of the small number of pairwise comparisons, which increases the chance for coincidental similarities between model and neural RDMs. We therefore marked this analysis as "exploratory" in the article.
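
      For readers unfamiliar with the variance inflation factor (VIF) mentioned above, the following minimal sketch shows how it is commonly computed: each predictor is regressed on all remaining predictors, and VIF = 1 / (1 - R²) of that regression. The function name and the random example predictors are hypothetical and do not reproduce the model RDMs of the study.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(target)), others])  # intercept + others
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()  # R² of predictor j on the rest
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


# Two correlated example predictors (stand-ins for vectorized model RDMs).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2]))
r = np.corrcoef(x1, x2)[0, 1]
print(vifs, 1.0 / (1.0 - r**2))  # with two predictors, both VIFs equal 1 / (1 - r²)
```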

      (2) Causal effect information in PLDs: the reasoning behind the use of PLD stimuli is to have a condition that isolates motion patterns from the causal effects of actions. However, it is not clear whether PLDs really contain as little information about action effects as claimed. Cross-decoding between animations and PLDs is significant in both aIPL and LOTC, as shown in Figure 4. This indicates that PLDs do contain some information about action effects. This could also be tested behaviorally by asking participants to assign PLDs to the correct action category. In general, disentangling the roles of motion patterns and implied causal effects in driving action-PLD cross-decoding (which is the main dependent variable in the paper) would strengthen the paper's message. For example, it is possible that the strong action-PLD cross-decoding observed in aIPL relies on a substantially different encoding from, say, SPL, an encoding that perhaps reflects causal effects more than motion patterns. One way to exploratively assess this would be to integrate the clustering analysis shown in Figure S1 with a more complete picture, including animation-PLD and action-PLD decoding in aIPL.

      With regard to the suggestion to behaviorally test how well participants can grasp the underlying action effect structures: We indeed did a behavioral experiment to assess the recognizability of actions in the PLD stick figures (as well as in the pantomimes). In short, this experiment revealed that participants could not recognize the actions in the PLD stick figures well and often confused them with kinematically similar but conceptually different actions (e.g. breaking --> shaking, hitting --> swiping, squashing --> knitting). However, the results also show that we cannot completely rule out that PLDs contain some information about action effects.

      Because we considered this behavioral experiment a standard assessment of the quality of the stimuli, we did not report it in the original manuscript. We have now added a section to the methods that describes the behavioral experiments in detail:

      "To assess how much the animations, PLD stick figures, and pantomimes were associated with the specific action meanings of the naturalistic actions, we performed a behavioral experiment. 14 participants observed videos of the animations, PLDs (without stick figures), and pantomimes in three separate sessions (in that order) and were asked to describe what kind of actions the animations depict and give confidence ratings on a Likert scale from 1 (not confident at all) to 10 (very confident). Because the results for PLDs were unsatisfying (several participants did not recognize human motion in the PLDs), we added stick figures to the PLDs as described above and repeated the rating for PLD stick figures with 7 new participants, as reported below.

      A general observation was that almost no participant used verb-noun phrases (e.g. "breaking a stick") in their descriptions for all stimulus types. For the animations, the participants used more abstract verbs or nouns to describe the actions (e.g. dividing, splitting, division; Tab. S1). These abstract descriptions matched the intended action structures quite well, and participants were relatively confident about their responses (mean confidences between 6 and 7.8). These results suggest that the animations were not substantially associated with specific action meanings (e.g. "breaking a stick") but captured the coarse action structures. For the PLD stick figures (Tab. S2), responses were more variable and actions were often confused with kinematically similar but conceptually different actions (e.g. breaking --> shaking, hitting --> turning page, squashing --> knitting). Confidence ratings were relatively low (mean confidences between 3 and 5.1). These results suggest that PLD stick figures, too, were not substantially associated with specific action meanings and additionally did not clearly reveal the underlying action effect structures. Finally, pantomimes were recognized much better, which was also reflected in high confidence ratings (mean confidences between 8 and 9.2; Tab. S3). This suggests that, unlike PLD stick figures, pantomimes allowed much better access to the underlying action effect structures."

      We also agree with the second suggestion to investigate in more detail the representational profiles in aIPL and SPL. We think that the best way to do so is the RSA that we reported above. However, to provide a complete picture of the results, we also added the whole-brain maps and RDMs for the animation-pantomime, animation-PLD, pantomime-PLD, and action-pantomime cross-decoding to the supplementary information.

      (3) Nature of the motion representations: it is not clear what the nature of the putatively motion-driven representation driving action-PLD cross-decoding is. While, as you note in the Introduction, other regions such as the superior temporal sulcus have been extensively studied, with the understanding that they are part of a feedforward network of areas analyzing increasingly complex motion patterns (e.g. Giese & Poggio, Nature Reviews Neuroscience 2003), it doesn't seem like the way in which SPL represents these stimuli is similarly well understood. While the action-PLD cross-decoding shown here is a convincing additional piece of evidence for a motion-based representation in SPL, an interesting additional analysis would be to compare, for example, RDMs of different actions in this region with explicit computational models. These could be, for example, classic motion energy models inspired by the response characteristics of regions such as V5/MT, which have been shown to predict cortical responses and psychophysical performance both for natural videos (e.g. Nishimoto et al., Current Biology 2011) and PLDs (Casile & Giese, Journal of Vision 2005). A similar cross-decoding analysis between videos and PLDs as that conducted on the fMRI patterns could be done on these models' features, obtaining RDMs that could directly be compared with those from SPL. This would be a very informative analysis that could enrich our knowledge of a relatively unexplored region in action recognition. Please note, however, that action recognition is not my field of expertise, so it is possible that there are practical difficulties in conducting such an analysis that I am not aware of. In this case, I kindly ask the authors to explain what these difficulties could be.

      Thank you for this very interesting suggestion. We conducted a cross-decoding analysis that was based on the features of motion energy models as described in Nishimoto et al. (2011). Control analyses within each stimulus type revealed high decoding accuracies (animations: 100%, PLDs: 100%, pantomimes: 65%, actions: 55%), which suggests that the motion energy data generally contains information that can be detected by a classifier. However, the cross-decoding between actions and PLDs was at chance (20%), and the classification matrix did not resemble the neural RDMs. We also tested optical flow vectors as input to the decoding, which revealed similarly high decoding for the within-stimulus-type decoding (animations: 75%, PLDs: 100%, pantomimes: 65%, actions: 40%), but again at-chance decoding for action-PLD (20%), notably with a very different classification pattern:

      Author response image 1.

      Given these mixed results, we decided not to use these models for a statistical comparison with the neural action-PLD RDMs.

      It is notable that the cross-decoding worked generally less well for decoding schemes that involve PLDs, which is likely due to the very different feature complexity of actions and PLDs: Naturalistic actions have much richer visual details, texture, and more complex motion cues. Therefore, motion energy features extracted from these videos likely capture a mixture of both fine-grained and broad motion information across different spatial frequencies. By contrast, motion energy features of PLDs are sparse and might not match the features of naturalistic actions. In a way, this was intended, as we were interested in higher-level body kinematics rather than lower-level motion features. We therefore decided to use a different approach to investigate the representational structure found in the action-PLD cross-decoding: As the PLDs were based on kinematic recordings of actions that were carried out in exactly the same manner as the naturalistic actions, we computed the dissimilarity of the 5 actions based on the kinematic marker positions. Specifically, we averaged the kinematic data across the 2 exemplars per PLD, vectorized the 3D marker positions of all time points of the PLDs (3 dimensions x 13 markers x 200 time points), computed the pairwise correlations between the 5 vectors, and converted the correlations into dissimilarity values by computing 1 - r. This RDM was then compared with the neural RDMs extracted from the action-PLD cross-decoding. This was done using a multiple regression RSA (see also our response to Reviewer 1's public comment 2), which allowed us to statistically test the kinematic model against other dissimilarity models: a categorical model of manuality (uni- vs. bimanual) and an action-specific model that discriminates each specific action from each other with equal distance.
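
      The construction of the kinematic RDM described above can be sketched as follows. The array layout and variable names are assumptions, and random numbers stand in for the actual marker recordings; only the sequence of steps (average over exemplars, vectorize, correlate, 1 - r) follows the description in the text.

```python
import numpy as np

# Placeholder kinematic data (random numbers, NOT real recordings), laid out as
# actions x exemplars x spatial dimensions x markers x time points.
rng = np.random.default_rng(0)
n_actions, n_exemplars, n_dims, n_markers, n_time = 5, 2, 3, 13, 200
kin = rng.normal(size=(n_actions, n_exemplars, n_dims, n_markers, n_time))

# 1) Average across the 2 exemplars of each action.
mean_kin = kin.mean(axis=1)

# 2) Vectorize the marker positions of all time points: 3 x 13 x 200 = 7800 values.
vectors = mean_kin.reshape(n_actions, -1)

# 3) Pairwise Pearson correlations between the 5 action vectors.
r = np.corrcoef(vectors)

# 4) Convert similarity into dissimilarity: 1 - r.
rdm = 1.0 - r

print(vectors.shape, rdm.shape)
```

      The resulting 5 x 5 matrix is symmetric with zeros on the diagonal and can be entered as one predictor in a multiple regression RSA alongside categorical models.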

      This analysis revealed interesting results: the kinematic model explained the representational variance in bilateral SPL and (particularly right) pSTS as well as in right fusiform cortex and early visual cortex. The action-specific model revealed effects restricted to bilateral LOTC. The manuality model revealed widespread effects throughout the action observation network but not in EVC.

      (4) Clustering analysis: I found the clustering analysis shown in Figure S1 very clever and informative. However, there are two things that I think the authors should clarify. First, it's not clear whether the three categories of object change were inferred post-hoc from the data or determined beforehand. It is completely fine if these were just inferred post-hoc, I just believe this ambiguity should be clarified explicitly. Second, while action-anim decoding in aIPL and LOTC looks like it is consistently clustered, the clustering of action-PLD decoding in SPL and LOTC looks less reliable. The authors interpret this clustering as corresponding to the manual vs. bimanual distinction, but for example "drink" (a unimanual action) is grouped with "break" and "squash" (bimanual actions) in left SPL and grouped entirely separately from the unimanual and bimanual clusters in left LOTC. Statistically testing the robustness of these clusters would help clarify whether it is the case that action-PLD in SPL and LOTC has no semantically interpretable organizing principle, as might be the case for a representation based entirely on motion pattern, or rather that it is a different organizing principle from action-anim, such as the manual vs. bimanual distinction proposed by the authors. I don't have much experience with statistical testing of clustering analyses, but I think a permutation-based approach, wherein a measure of cluster robustness, such as the Silhouette score, is computed for the clusters found in the data and compared to a null distribution of such measures obtained by permuting the data labels, should be feasible. In a quick literature search, I have found several papers describing similar approaches: e.g. Hennig (2007), "Cluster-wise assessment of cluster stability"; Tibshirani et al. (2001) "Estimating the Number of Clusters in a Data Set Via the Gap Statistic". 
These are just pointers to potentially useful approaches; the authors are much better qualified to pick the most appropriate and convenient method. However, I do think such a statistical test would strengthen the clustering analysis shown here. With this statistical test, and the more exhaustive exposition of results I suggested in point 2 above (e.g. including animation-PLD and action-PLD decoding in aIPL), I believe the clustering analysis could even be moved to the main text and occupy a more prominent position in the paper.

      With regard to the first point, we clarified in the methods that we inferred the 3 broad action effect categories after the stimulus selection: "This categorization was not planned before designing the study but resulted from the stimulus selection."

      Thank you for your suggestion to test more specifically the representational organization in the action-PLD and action-animation RDMs. However, after a careful assessment, we decided to replace the cluster analysis with an RSA. We did this for two reasons:

      First, we think that RSA is a better (and more conventional) approach to statistically investigate the representational structure in the ROIs (and in the whole brain). The RSA allowed us, for example, to specifically test the mentioned distinction between unimanual and bimanual actions, and to test it against other models, i.e., a kinematic model and an action-specific model. This indeed revealed interesting distinct representational profiles of SPL and LOTC.

      Second, we learned that the small number of items (5) is generally not ideal for cluster analyses (absolute minimum for meaningful interpretability is 4, but to form at least 2-3 clusters a minimum of 10-15 items is usually recommended). A similar rule of thumb applies to methods to statistically assess the reliability of cluster solutions (e.g., Silhouette Scores, Cophenetic Correlation Coefficient, Jaccard Coefficient). Finally, the small number of items is not ideal to run a permutation test because the number of unique permutations (for shuffling the data labels: at most 5! = 120) is insufficient to generate a meaningful null distribution. We therefore think it is best to discard the cluster analysis altogether. We hope you agree with this decision.

      (5) ROI selection: this is a minor point, related to the method used for assigning voxels to a specific ROI. In the description in the Methods (page 16, lines 514-24), the authors mention using the MNI coordinates of the center locations of Brodmann areas. Does this mean that then they extracted a sphere around this location, or did they use a mask based on the entire Brodmann area? The latter approach is what I'm most familiar with, so if the authors chose to use a sphere instead, could they clarify why? Or, if they did use the entire Brodmann area as a mask, and not just its center coordinates, this should be made clearer in the text.

      We indeed used a sphere around the center coordinate of the Brodmann areas. This was done to keep the ROI sizes / number of voxels constant across ROIs. Since we aimed at comparing the decoding accuracies between aIPL and SPL, we thereby minimized the possibility that differences in decoding accuracy between ROIs are due to ROI size differences. The approach of using spherical ROIs is a quite well established practice that we are using in our lab by default (e.g. Wurm & Caramazza, NatComm, 2019; Wurm & Caramazza, NeuroImage, 2019; Karakose, Caramazza, & Wurm, NatComm, 2023). We clarified that we used spherical ROIs to keep the ROI sizes constant in the revised manuscript.
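
      The reasoning behind spherical ROIs can be illustrated with a short sketch: a sphere of fixed radius contains the same number of voxels wherever it is centered, so ROI size cannot differ between regions. The center coordinates and radius below are hypothetical placeholders in voxel units, not the values used in the study, and intersecting the sphere with a brain mask is omitted.

```python
import math
from itertools import product


def spherical_roi(center_vox, radius_vox):
    """Return all voxel indices within radius_vox (Euclidean) of center_vox."""
    cx, cy, cz = center_vox
    reach = math.floor(radius_vox)  # offsets beyond this cannot lie in the sphere
    offsets = range(-reach, reach + 1)
    return [
        (cx + dx, cy + dy, cz + dz)
        for dx, dy, dz in product(offsets, repeat=3)
        if dx * dx + dy * dy + dz * dz <= radius_vox ** 2
    ]


if __name__ == "__main__":
    # Hypothetical centers for two ROIs (voxel indices, not MNI coordinates).
    roi_a = spherical_roi((30, 40, 50), radius_vox=2)
    roi_b = spherical_roi((55, 35, 60), radius_vox=2)
    print(len(roi_a), len(roi_b))  # identical sizes by construction
```

      An atlas-based mask covering an entire Brodmann area would instead yield a different voxel count per region, which is the confound the spherical approach avoids.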

      Reviewer #3 (Public Review):

      This study tests for dissociable neural representations of an observed action's kinematics vs. its physical effect in the world. Overall, it is a thoughtfully conducted study that convincingly shows that representations of action effects are more prominent in the anterior inferior parietal lobe (aIPL) than the superior parietal lobe (SPL), and vice versa for the representation of the observed body movement itself. The findings make a fundamental contribution to our understanding of the neural mechanisms of goal-directed action recognition, but there are a couple of caveats to the interpretation of the results that are worth noting:

      (1) Both a strength of this study and ultimately a challenge for its interpretation is the fact that the animations are so different in their visual content than the other three categories of stimuli. On one hand, as highlighted in the paper, it allows for a test of action effects that is independent of specific motion patterns and object identities. On the other hand, the consequence is also that Action-PLD cross-decoding is generally better than Action-Anim cross-decoding across the board (Figure 3A) - not surprising because the spatiotemporal structure is quite different between the actions and the animations. This pattern of results makes it difficult to interpret a direct comparison of the two conditions within a given ROI. For example, it would have strengthened the argument of the paper to show that Action-Anim decoding was better than Action-PLD decoding in aIPL; this result was not obtained, but that could simply be because the Action and PLD conditions are more visually similar to each other in a number of ways that influence decoding. Still, looking WITHIN each of the Action-Anim and Action-PLD conditions yields clear evidence for the main conclusion of the study.

      The reviewer is absolutely right: Because the PLDs are more similar to the actions than the animations, a comparison of the effects of the two decoding schemes is not informative. As we also clarified in our response to Reviewer 2, we cannot rule out that the action-PLD decoding picked up information related to action effect structures. Thus, the only firm conclusion that we can draw from our study is that aIPL and SPL are disproportionally sensitive to action effects and body movements, respectively. We clarified this point in our revised discussion.

      (2) The second set of analyses in the paper, shown in Figure 4, follows from the notion that inferring action effects from body movements alone (i.e., when the object is unseen) is easier via pantomimes than with PLD stick figures. That makes sense, but it doesn't necessarily imply that the richness of the inferred action effect is the only or main difference between these conditions. There is more visual information overall in the pantomime case. So, although it's likely true that observers can more vividly infer action effects from pantomimes vs stick figures, it's not a given that contrasting these two conditions is an effective way to isolate inferred action effects. The results in Figure 4 are therefore intriguing but do not unequivocally establish that aIPL is representing inferred rather than observed action effects.

      We agree that higher decoding accuracies for Action-Pant vs. Action-PLD and Pant-PLD could also be due to visual details (in particular of hands and body) that are more similar in actions and pantomimes relative to PLDs. However, please note that for this reason we included also the comparison of Anim-Pant vs. Anim-PLD. For this comparison, visual details should not influence the decoding. We clarified this point in our revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It struck me that there are structural distinctions amongst the 5 action kinds that were not highlighted and may have been unintentional. Specifically, three of the actions are "unary" in a sense: break(object), squash(object), hit(object). One is "binary": place(object, surface), and the fifth (drink) is perhaps ternary - transfer(liquid, cup, mouth)? Might these distinctions be important for the organization of action effects (or actions generally)?

      This is an interesting aspect that we had not yet considered. We agree that for the organization of actions (and perhaps action effects) this distinction might be relevant. One issue we noticed, however, is that for the animations the suggested organization might be less clear, in particular for "drink" as ternary, and perhaps also for "place" as binary. Thus, in the action-animation cross-decoding, this distinction - if it exists in the brain - might be harder to capture. We nonetheless tested this distinction. Specifically, we constructed a dissimilarity model (using the proposed organization, hereafter the "valency" model) and tested it in a multiple regression RSA against an effect type model and two other models for specific actions (discriminating each action from each other with the same distance) and motion energy (as a visual control model). This analysis revealed no effects for the "valency" model in the ROI-based RSA. A searchlight analysis likewise revealed no effects for this model. Since we think that the valency model is not ideally suited to test representations of action effects (using data from the action-animation cross-decoding), and to avoid unnecessarily complicating the description of the RSA, we decided not to include this model in the final RSA reported in the manuscript.
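
      To make the construction of such a dissimilarity model concrete, here is a minimal sketch that builds a "valency" model from the unary/binary/ternary grouping proposed by the reviewer. The binary 0/1 coding (same vs. different arity) is an assumption for illustration; the response does not state how the authors coded the distances.

```python
ACTIONS = ["break", "squash", "hit", "place", "drink"]

# Arity per action as proposed by the reviewer: unary, binary, ternary.
ARITY = {"break": 1, "squash": 1, "hit": 1, "place": 2, "drink": 3}


def categorical_rdm(labels):
    """Model RDM with 0 for items sharing a label and 1 otherwise (assumed coding)."""
    return [[0 if a == b else 1 for b in labels] for a in labels]


def upper_triangle(rdm):
    """Vectorize the cells above the diagonal, row by row."""
    n = len(rdm)
    return [rdm[i][j] for i in range(n) for j in range(i + 1, n)]


def valency_rdm():
    """Dissimilarity model derived from the arity of each action."""
    return categorical_rdm([ARITY[action] for action in ACTIONS])


if __name__ == "__main__":
    for action, row in zip(ACTIONS, valency_rdm()):
        print(f"{action:>7}", row)
```

      The vectorized model can then enter a multiple regression alongside the other model RDMs as one predictor of the neural dissimilarities.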

      In general, I found it surprising that the authors treated their LOTC findings as surprising or unexpected. Given the long literature associating this region with several high-level visual functions related to body perception, action perception, and action execution, I thought there were plenty of a priori reasons to investigate the LOTC's behaviour in this study. Looking at the supplementary materials, indeed some of the strongest effects seem to be in that region.

      (Likewise, classically, the posterior superior temporal sulcus is strongly associated with the perception of others' body movements; why not also examine this region of interest?)

      One control analysis that would considerably add to the strength of the authors' conclusions would be to examine how actions could be cross-decoded (or not) in the early visual cortex. Especially in comparisons of, for example, pantomime to full-cue video, we might expect a high degree of decoding accuracy, which might influence the way we interpret similar decoding in other "higher level" regions.

      We agree that it makes sense to also look into LOTC and pSTS, and also EVC. We therefore added ROIs for these regions: For EVC and LOTC we used the same approach based on Brodmann areas as for aIPL and SPL, i.e., we used BA 17 for V1 and BA 19 for LOTC. For pSTS, we defined the ROI based on a meta-analysis contrast for human vs. non-human body movements (Grosbras et al., HBM 2012). Indeed, we find that the strongest effects (for both action effect structures and body movements) can be found in LOTC. We also found effects in EVC that, at least for the action-animation cross-decoding, are more difficult to interpret. To test for a coincidental visual confound between actions and animations, we included a control model for motion energy in the multiple regression RSA, which could indeed explain some of the representational content in V1. However, the effect type model also revealed effects in V1, suggesting that there were additional visual features that caused the action-animation cross-decoding in V1. Notably, as pointed out in our response to the Public comments, the representational organization in V1 was relatively distinct from the representational organization in aIPL and LOTC, which argues against the interpretation that effects in aIPL and LOTC were driven by the same (visual) features as in V1.

      Regarding the analyses reported in Figure 4: wouldn't it be important to also report similar tests for SPL?

      In the analysis of implied action effect structures, we focused on the brain regions that revealed robust effects for action-animation decoding in the ROI and the searchlight analysis, that is, aIPL and LOTC. However, we performed a whole brain conjunction analysis to search for other brain regions that show a profile for implied action effect representation. This analysis (that we forgot to mention in our original manuscript; now corrected) did not find evidence for implied action effect representations in SPL.

      However, for completeness, we also added a ROI analysis for SPL. This analysis revealed a surprisingly complex pattern of results: We observed stronger decoding for Anim-Pant vs. Anim-PLD, whereas there were no differences for the comparisons of Action-Pant with Action-PLD and Pant-PLD:

      This pattern of results is not straightforward to explain: First, the equally strong decoding for Action-Pant, Action-PLD, and Pant-PLD suggests that SPL is not substantially sensitive to body part details. Rather, the decoding relied on the coarse body part movements, independently of the specific stimulus type (action, pantomime, PLD). However, the stronger difference between Anim-Pant and Anim-PLD suggests that SPL is also sensitive to implied AES. This appears unlikely, because no effects (in left SPL) or only weak effects (in right SPL) were found for the more canonical Action-Anim cross-decoding. The Anim-Pant cross-decoding was even stronger than the Action-Anim cross-decoding, which is counterintuitive because naturalistic actions contain more information than pantomimes, specifically with regard to action effect structures. How can this pattern of results be interpreted? Perhaps, for pantomimes and animations, not only aIPL and LOTC but also SPL is involved in inferring (implied) action effect structures. However, for this conclusion, also differences for the comparison of Action-Pant with Action-PLD and for Action-Pant with Pant-PLD should be found. Another non-mutually exclusive interpretation is that both animations and pantomimes are more ambiguous in terms of the specific action, as opposed to naturalistic actions. For example, the squashing animation and pantomime are both ambiguous in terms of what is squashed/compressed, which might require additional load to infer both the action and the induced effect. The increased activation of action-related information might in turn increase the chance for a match between neural activation patterns of animations and pantomimes.

      In any case, these additional results in SPL do not question the effects reported in the main text, that is, disproportionate sensitivity for action effect structures in right aIPL and LOTC and for body movements in SPL and other AON regions. The evidence for implied action effect structures representation in SPL is mixed and should be interpreted with caution.

      We added this analysis and discussion as supplementary information.

      Statistical arguments that rely on "but not" are not very strong, e.g. "We found higher cross-decoding for animation-pantomime vs. animation-PLD in right aIPL and bilateral LOTC (all t(23) > 3.09, all p < 0.0025; one-tailed), but not in left aIPL (t(23) = 0.73, p = 0.23, one-tailed)." Without a direct statistical test between regions, it's not really possible to support a claim that they have different response profiles.

      Absolutely correct. Notably, we did not make claims about different profiles of the tested ROIs with regard to implied action effect representations. But of course it makes sense to test for differential profiles of left vs. right aIPL, so we have added a repeated measures ANOVA to test for an interaction between TEST (animation-pantomime, animation-PLD) and ROI (left aIPL, right aIPL), which, however, was not significant (F(1,23)=3.66, p = 0.068). We included this analysis in the revised manuscript.
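
      For a 2 x 2 within-subject design such as TEST x ROI, the interaction F of a repeated measures ANOVA equals the squared paired t statistic computed on each participant's difference of differences. The sketch below illustrates this equivalence. The accuracies are synthetic values invented for the example and the function name is hypothetical; it is not the authors' analysis code.

```python
import math
from statistics import mean, stdev


def interaction_f(roi_a_test1, roi_a_test2, roi_b_test1, roi_b_test2):
    """Interaction F and its degrees of freedom for a 2 x 2 within-subject design.

    Each argument holds one value per participant, in the same participant order.
    """
    # Per participant: (test1 - test2) in ROI A minus (test1 - test2) in ROI B.
    diff_of_diffs = [
        (a1 - a2) - (b1 - b2)
        for a1, a2, b1, b2 in zip(roi_a_test1, roi_a_test2, roi_b_test1, roi_b_test2)
    ]
    n = len(diff_of_diffs)
    t = mean(diff_of_diffs) / (stdev(diff_of_diffs) / math.sqrt(n))
    return t * t, (1, n - 1)


if __name__ == "__main__":
    # Synthetic accuracies for four hypothetical participants.
    f_value, df = interaction_f([3, 4, 5, 6], [1, 1, 2, 2], [2, 2, 3, 3], [1, 2, 2, 3])
    print(f"F({df[0]},{df[1]}) = {f_value:.2f}")
```

      With 24 participants this yields the F(1,23) form reported above. A non-significant interaction means the data do not support a claim that the two ROIs differ in their response profile.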

      Reviewer #2 (Recommendations for The Authors):

      (1) I haven't found any information about data and code availability in the paper: is the plan to release them upon publication? This should be made clear.

      Stimuli, MRI data, and code are deposited at the Open Science Framework (https://osf.io/am346/). We included this information in the revised manuscript.

      (2) Samples of videos of the stimuli (or even the full set) would be very informative for the reader to know exactly what participants were looking at.

      We have uploaded the full set of stimuli on OSF (https://osf.io/am346/).

      (3) Throughout the paper, decoding accuracies are averaged across decoding directions (A->B and B->A). To my knowledge, this approach was proposed in van den Hurk & Op de Beeck (2019), "Generalization asymmetry in multivariate cross-classification: When representation A generalizes better to representation B than B to A". I believe it would be fair to cite this paper.

      Absolutely, thank you very much for the hint. We included this reference in our revised manuscript.
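
      Averaging across decoding directions can be sketched as follows: train on stimulus type A and test on B, train on B and test on A, then take the mean of the two accuracies. The nearest-centroid classifier and the toy patterns are stand-ins chosen to keep the example dependency-free; they are assumptions, not the classifier or data used in the study.

```python
def centroids(patterns, labels):
    """Mean pattern per class label."""
    sums, counts = {}, {}
    for pattern, label in zip(patterns, labels):
        acc = sums.setdefault(label, [0.0] * len(pattern))
        for i, value in enumerate(pattern):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def accuracy(train, test):
    """Fit a nearest-centroid classifier on train; return proportion correct on test."""
    cents = centroids(*train)
    patterns, labels = test
    hits = 0
    for pattern, label in zip(patterns, labels):
        predicted = min(
            cents,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(pattern, cents[c])),
        )
        hits += predicted == label
    return hits / len(labels)


def cross_decoding(set_a, set_b):
    """Return (mean accuracy, A->B accuracy, B->A accuracy)."""
    a_to_b = accuracy(set_a, set_b)
    b_to_a = accuracy(set_b, set_a)
    return (a_to_b + b_to_a) / 2, a_to_b, b_to_a


if __name__ == "__main__":
    # Toy two-voxel patterns for two classes in two stimulus formats.
    set_a = ([[0, 0], [0, 1], [4, 4], [5, 4]], ["x", "x", "y", "y"])
    set_b = ([[1, 0], [4, 5], [2, 2]], ["x", "y", "y"])
    print(cross_decoding(set_a, set_b))
```

      In this toy case the two directions give different accuracies, which is the generalization asymmetry that averaging smooths over.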

      (4) Page 3, line 70: this is a very nitpicky point, but "This suggests that body movements and the effects they induce are at least partially processed independently from each other." is a bit of an inferential leap from "these are distinct aspects of real-world actions" to "then they should be processed independently in the brain". The fact that a distinction exists in the world is a prerequisite for this distinction existing in the brain in terms of functional specialization, but it's not in itself a reason to believe that functional specialization exists. It is a reason to hypothesize that the specialization might exist and to test that hypothesis. So I think this sentence should be rephrased as "This suggests that body movements and the effects they induce might be at least partially processed independently from each other.", or something to that effect.

      Your reasoning is absolutely correct. We revised the sentence following your suggestion.

      (5) Page 7, line 182: the text says "stronger decoding for action-animation vs. action-PLD" (main effect of TEST), which is the opposite of what can be seen in the figure. I assume this is a typo?

      Thanks for spotting this, it was indeed a typo. We corrected it: “…stronger decoding for action-PLD vs. action-animation cross-decoding..”

      (6) Page 7, Figure 3B: since the searchlight analysis is used to corroborate the distinction between aIPL and SPL, it would be useful to overlay the contours of these ROIs (and perhaps LOTC as well) on the brain maps.

      We found that overlaying the contours of the ROIs onto the decoding searchlight maps would make the figure too busy, and the contours would partially hide effects. However, we added a brain map with all ROIs in the supplementary information.

      (7) Page 9, Figure 4A: since the distinction between the significant difference between anim-pant and anim-PLD is quite relevant in the text, I believe highlighting the lack of difference between the two decoding schemes in left aIPL (for example, by writing "ns") in the figure would help guide the reader to see the relevant information. It is generally quite hard to notice the absence of something.

      We added “n.s.” to the left aIPL in Fig. 4A.

      (8) Page 11, line 300: "Left aIPL appears to be more sensitive to the type of interaction between entities, e.g. how a body part or an object exerts a force onto a target object": since the distinction between this and "the effect induced by that interaction" is quite nuanced, I believe a concrete example would clarify this for the reader: e.g. I guess the former would involve a representation of the contact between hand and object when an object is pushed, while the latter would represent only the object's displacement following the push?

      Thank you for the suggestion. We added a concrete example: “Left aIPL appears to be more sensitive to the type of interaction between entities, that is, how a body part or an object exerts a force onto a target object (e.g. how a hand makes contact with an object to push it), whereas right aIPL appears to be more sensitive to the effect induced by that interaction (the displacement of the object following the push).”

      (9) Page 12, line 376: "Informed consent, and consent to publish, was obtained from the participant in Figure 2." What does this refer to? Was the person shown in the figure both a participant in the study and an actor in the stimulus videos? Since this is in the section about participants in the experiment, it sounds like all participants also appeared in the videos, which I guess is not the case. This ambiguity should be clarified.

      Right, the statement sounds misleading in the “Participants” section. We rephrased it and moved it to the “Stimuli” section: “actions…were shown in 4 different formats: naturalistic actions, pantomimes, point light display (PLD) stick figures, and abstract animations (Fig. 2; informed consent, and consent to publish, was obtained from the actor shown in the figure).”

      (10) Page 15, line 492: Here, "within-session analyses" are mentioned. However, these analyses are not mentioned in the text (only shown in Figure S2) and their purpose is not clarified. I imagine they were a sanity check to ensure that the stimuli within each stimulus type could be reliably distinguished. This should be explained somewhere.

      We clarified the purpose of the within session decoding analyses in the methods section: "Within-session decoding analyses were performed as sanity checks to ensure that for all stimulus types, the 5 actions could be reliably decoded (Fig. S2)."

      (11) Page 20, Figure S1: I recommend using the same color ranges for the two decoding schemes (action-anim and action-PLD) in A and C, to make them more directly comparable.

      Ok, done.

      Reviewer #3 (Recommendations For The Authors):

      (1) When first looking at Figure 1B, I had a hard time discerning what action effect was being shown (I thought maybe it was "passing through"). Figure 2 later clarified it for me, but it would be helpful to note in the caption that it depicts breaking.

      Thank you for the suggestion. Done.

      (2) It would be helpful to show an image of the aIPL and SPL ROIs on a brain to help orient readers - both to help them examine the whole brain cross-decoding accuracy and to aid in comparisons with other studies.

      We added a brain map with all ROIs in the supplementary information.

      (3) Line 181: I'm wondering if there's an error, or if I'm reading it incorrectly. The line states "Moreover, we found ANOVA main effects of TEST (F(1,24)=33.08, p=7.4E-06), indicating stronger decoding for action-animation vs. action-PLD cross-decoding..." But generally, in Figure 3A, it looks like accuracy is lower for Action-Anim than Action-PLD in both hemispheres.

      You are absolutely right, thank you very much for spotting this error. We corrected the sentence: “…stronger decoding for action-PLD vs. action-animation cross-decoding..”

      (4) It might be useful to devote some more space in the Introduction to clarifying the idea of action-effect structures. E.g., as I read the manuscript I found myself wondering whether there is a difference between action effect structures and physical outcomes in general... would the same result be obtained if the physical outcomes occurred without a human actor involved? This question is raised in the discussion, but it may be helpful to set the stage up front.

      We clarified this point in the introduction:

      In our study, we define action effects as induced by intentional agents. However, the notion of action effect structures might be generalizable to physical outcomes or object changes as such (e.g. an object's change of location or configuration, independently of whether the change is induced by an agent or not).

      (5) Regarding my public comment #2, it would perhaps strengthen the argument to run the same analysis in the SPL ROIs. At least for the comparison of Anim-Pant with Anim-PLD, the prediction would be no difference, correct?

      The prediction would indeed be that there is no difference for the comparison of Anim-Pant with Anim-PLD, but also for the comparison of Action-Pant with Action-PLD and for Action-Pant with Pant-PLD, there should be no difference. As explained in our response to the public comment #2, we ran a whole brain conjunction (Fig. 4B) to test for the combination of these effects and did not find SPL in this analysis. However, we did find differences for Anim-Pant vs. Anim-PLD, which is not straightforward to interpret (see our response to your public comment #2 for a discussion of this finding).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      The weaknesses of the study include the following. 

      (1) It remains unclear whether the function described for CDK2 is regulatory, that is, it affects TBK1 levels during physiological responses such as viral infection or cell cycle progression, or if it is homeostatic, governing the basal abundance of TBK1 but not responding to signaling.

      The regulation of TBK1 by CDK2 described in this article occurs during viral infection. Simultaneously, we also investigated the effects of CDK2 overexpression and knockdown on TBK1 levels under non-infected state and observed a slight reduction, as shown in Figure 4K and 4L. Thus, we speculate that the regulation of TBK1 by CDK2 serves, on one hand, to maintain cellular homeostasis and, on the other hand, to respond to signaling triggered by viral infection.

      (2) The authors have not explored whether the catalytic activity of CDK2 is required for TBK1 ubiquitinylation and, if so, what its target specificity is.

      We found that the ubiquitination modification of TBK1 was not affected by treatment with a CDK2 kinase activity inhibitor (SNS-032), as demonstrated in the results below (Author response image 1).

      Author response image 1.

      (3) Given the multitude of CDK isoforms in fish, it remains unexplored whether the identified fish CDK2 homolog is a requisite cell cycle regulator or if its action in the cell cycle is redundant with other CDKs.

      A comparison of the protein sequences of fish CDK2 and human CDK2 revealed a 90% similarity (Author response image 2). It has also been reported that the kinase activity of goldfish CDK2 significantly increases during oocyte maturation (ref. 1). Furthermore, UHRF1 phosphorylation by cyclin A2/CDK2 is crucial for zebrafish embryogenesis (ref. 2). Additionally, Red grouper nervous necrosis virus (RGNNV) infection activates the p53 pathway, leading to the upregulation of p21 and downregulation of cyclin E and CDK2, which forces infected cells to remain in the G1/S replicative phase (ref. 3). All of this evidence suggests that fish CDK2 plays a vital role in cell cycle regulation, and there have been no reports of other CDKs demonstrating CDK2-like functions.

      References:

      (1) Hirai T, et al. (1992) Isolation and Characterization of Goldfish Cdk2, a Cognate Variant of the Cell-Cycle Regulator Cdc2. Developmental biology 152(1):113-120.

      (2) Chu J, et al. (2012) UHRF1 phosphorylation by cyclin A2/cyclin-dependent kinase 2 is required for zebrafish embryogenesis. Molecular biology of the cell 23(1):59-70. 

      (3) Mai WJ, Liu HX, Chen HQ, Zhou YJ, & Chen Y (2018) RGNNV-induced cell cycle arrest at G1/S phase enhanced viral replication via p53-dependent pathway in GS cells. Virus Res 256:142-152.

      Author response image 2.

      Reviewer #2 (Public Review):

      Weaknesses:

      (1) While the study focuses on fish, the broader implications for other lower vertebrates and higher vertebrates are not extensively discussed.

      Thanks to your comment, we have added a paragraph to the Discussion section of the manuscript regarding the implications of the negative regulation of IFN expression by fish CDK2 for other vertebrates (lines 398-403). The details are as follows: first, we selected representative species from each of the six major vertebrate groups and compared their CDK2 protein sequences, finding that they are over 90% similar to one another (Author response image 3). This suggests that the function of CDK2 may be conserved to some extent across vertebrates. Additionally, CDK2 inhibition has been shown to enhance anti-tumor immunity by increasing the IFN response to endogenous retroviruses (ref. 1). Our studies provide evidence that fish CDK2 inhibits the IFN response by promoting the ubiquitination and degradation of TBK1, strongly supporting the role of CDK2 in the regulation of the immune response.

      Reference:

      (1) Chen Y, et al. (2022) CDK2 Inhibition Enhances Antitumor Immunity by Increasing IFN Response to Endogenous Retroviruses. Cancer Immunol Res 10(4):525-539.

      Author response image 3.

      (2) The study heavily relies on specific fish models, which may limit the generalizability of the findings across different species.

      Thank you for your comment. First, we compared the amino acid sequences of CDK2 proteins from fish and other vertebrates, which show over 90% similarity. Moreover, the small size, low cost, and external development of zebrafish make it an excellent model for vertebrate developmental biology. It has been reported that due to the high genomic and molecular similarities between zebrafish and other vertebrates, including humans, many significant discoveries in zebrafish development are relevant to humans (ref. 2). Our study concentrated on CDK2 in zebrafish, and the findings should be valuable for other vertebrates.

      Reference:

      (2) Veldman MB & Lin S (2008) Zebrafish as a Developmental Model Organism for Pediatric Research. Pediatr Res 64(5):470-476.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following additional data/discussion could improve the manuscript.

      (1) Investigate whether the catalytic activity of CDK2 is required to regulate TBK1 abundance. It is common for E3 ligases to be directed towards phosphorylated substrates, so it would be of interest to know if CDK2 phosphorylates TBK1 to facilitate its recognition for ubiquitinylation.

      We examined the effect of CDK2 on the TBK1 protein after inhibiting its kinase activity with SNS-032 treatment and found that it could still affect TBK1 expression, as shown in the results below (Author response image 4). Our previous experiments investigating the effect of CDK2 on TBK1 did not show that CDK2 caused the migration of TBK1 bands (typically, proteins that undergo phosphorylation exhibit band migration). Furthermore, in this study, CDK2 did not function as an E3 ligase; instead, it recruited the E3 ligase Dtx4 to ubiquitinate TBK1.

      Author response image 4.

      (2) Investigate how CDK2 abundance is regulated by viral infection and whether viral infection impacts cell cycle progression in a CDK2-dependent manner.

      In fact, as illustrated in Figure 1, we investigated the changes in CDK2 at both the mRNA and protein levels following viral infection. Our findings revealed that SVCV infection resulted in an increase in CDK2 mRNA and protein expression. Additionally, our earlier reports have indicated that SVCV infection can induce alterations in the cell cycle, resulting in a notable increase in the S phase (Figure 1 of ref. 1). However, whether SVCV infection impacts cell cycle progression in a CDK2-dependent manner will be explored in our upcoming study.

      Reference:

      (1) Li S, et al. Spring viraemia of carp virus modulates p53 expression using two distinct mechanisms. PLoS Pathog 15, e1007695 (2019).

      (3) Provide data/discussion concerning the role of fish CDK2 in the regulation of cell cycle progression and whether this process is impacted by viral infection (part 1). Are TBK1 abundance and interferon production differentially regulated across the cell cycle due to the action of CDK2 (part 2).

      Thank you for your advice. This concern is addressed in two parts, as follows: 

      For part 1: To date, there has been limited research conducted on fish CDK2 in the regulation of cell cycle progression. The details are as follows: It has been reported that the kinase activity of goldfish CDK2 significantly increases during oocyte maturation (ref. 1). Furthermore, UHRF1 phosphorylation by cyclin A2/CDK2 is crucial for zebrafish embryogenesis (ref. 2). Additionally, a novel CDK2 homolog has been identified in Japanese lamprey, which plays a crucial role in apoptosis (ref. 3). Red grouper nervous necrosis virus (RGNNV) infection activates the p53 pathway, leading to the upregulation of p21 and downregulation of cyclin E and CDK2, which forces infected cells to remain in the G1/S replicative phase (ref. 4). All this evidence suggests that fish CDK2 plays a vital role in cell cycle regulation, and this process is also impacted by viral infection. Relevant content has been added to the Discussion section in the revised manuscript (lines 389-398).

      References:

      (1) Hirai T, et al. (1992) Isolation and Characterization of Goldfish Cdk2, a Cognate Variant of the Cell-Cycle Regulator Cdc2. Developmental biology 152(1):113-120.

      (2) Chu J, et al. (2012) UHRF1 phosphorylation by cyclin A2/cyclin-dependent kinase 2 is required for zebrafish embryogenesis. Molecular biology of the cell 23(1):59-70.

      (3) Xu Y, Tian Y, Zhao H, Zheng N, Ren KX, Li QW. A novel CDK-2 homolog identified in lamprey, with roles in apoptosis. Fish Physiol Biochem 47, 189-189 (2021). 

      (4) Mai WJ, Liu HX, Chen HQ, Zhou YJ, & Chen Y (2018) RGNNV-induced cell cycle arrest at G1/S phase enhanced viral replication via p53-dependent pathway in GS cells. Virus Res 256:142-152.

      For part 2: TBK1 plays a crucial role in regulating IFN production. Variations in CDK2 activity during different phases of the cell cycle may lead to changes in the expression and function of TBK1. Our findings suggest that heightened CDK2 activity may suppress TBK1 expression, thereby hindering the cell's capacity to produce IFN. Conversely, during the late phase of the cell cycle or in an inhibited state, TBK1 expression may rise, enhancing IFN synthesis and release. In summary, CDK2 is involved in intracellular signaling by modulating TBK1 levels and IFN production, affecting the cellular immune response and cycle regulation—two processes that are notably distinct at various stages of the cell cycle. Relevant content has been added to the Discussion section in the revised manuscript (lines 377-384).

      Minor suggestions:

      (1) The authors introduce their study with the consideration that knowledge of fish signaling pathways can inform mammalian biology because mammals evolved from fish. This is not strictly true, since mammals and fish both evolved from an ancient common ancestor and the diversification of signaling in each species likely occurred in response to distinct evolutionary selective pressures.

      Thank you for your suggestion. We have revised the statement in the manuscript to eliminate the notion that mammals evolved from fish (lines 98-99). The immune systems of higher vertebrates (e.g., humans) and lower vertebrates (e.g., fish) generally exhibit some consistency, although there are notable differences.

      (2) On line 210 and line 276, the authors appear to have misstated the data. CDK2 knockout increases not decreases TBK1 and Dtx4 knockdown abrogated rather than restored CDK2 suppression of TBK1.

      Thanks for your reminder, I jumped to the wrong conclusions in these two places (line 204 and line 267) and have changed them as you suggested.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript has some shortcomings that, if addressed, could improve the overall quality of the article.

      (1) Line 63-72, line 77-79, line 88-90- please add additional references for these sentences.

      Thanks to your comment, we have added references for these sentences (Line 63-72, line 77-79, line 88-90).

      (2) It is of the utmost importance to quantify the data presented in Figures 4J and 5D, as this will facilitate the visualization of the immunoblot.

      Thank you for your comment. We have quantified the data presented in Figures 4J and 5D to enhance the clarity of the immunoblot.

      (3) The scale in Figure 4E is difficult to discern.

      Thanks for your comment. To improve the visual clarity of the image, we have enlarged the scale label in Figure 4E.

      (4) In Figure 3B, shCDK2 is shown in italics, preferably in line with other standards such as Figures 3C and 3F.

      Thank you for your comment. We have revised the shCDK2 in Figure 3B.

      (5) The functions of CDK family members in immunity are hoped to be discussed.

      Thanks for your suggestion. We have discussed the functions of CDK family members in immunity (lines 363-387). The details are as follows: Recent studies have demonstrated that CDK activity is crucial for virus-induced innate immune responses. Reports indicate that CDKs are involved in the Toll-like receptor (TLR) signaling pathway, the nuclear factor-κB (NF-κB) signaling pathway, and the JAK-STAT signaling pathway. For instance, CDK8 and/or CDK19 enhanced the transcription of inflammatory genes, such as IL-8 and IL-10, in cells following TLR9 stimulation. CDKs and NF-κB establish a remarkable paradigm where CDKs can act directly on substrate proteins rather than depending solely on transcriptional control. It has been reported that CDK1 serves as a positive regulator of the IFN-I signaling pathway, facilitating STAT1 phosphorylation, which subsequently boosts the expression of ISGs. Furthermore, inhibiting CDK activity has been shown to obstruct STAT phosphorylation, proinflammatory gene activation, and ISG mRNA induction in response to SeV infection. It is important to note that no evidence suggests the involvement of CDKs in RLR signaling pathways. This study has shown that fish CDK2 functions as a negative regulator of the key kinase TBK1, which is involved in the RLR signaling pathway. A better understanding of the relationship between CDK2 and RLR signaling pathways will enhance our grasp of the regulatory mechanisms of CDKs in antiviral innate immunity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Amaral et al. present a study investigating the mesoscale modelling and dynamics of bolalipids.

      Strengths:

      The figures in this paper are exceptional. Both those to outline and introduce the lipid types, but also the quality and resolution of the plots. The data held within also appears to be outstanding and of significant (hopefully) general interest.

      We thank the reviewer for their kind words and the appreciation of our work.

      Weaknesses:

      In the introduction, I would like to have read more specifics on the biological role of bolalipids. Archaea are mentioned, but this kingdom is huge - there must be specific species that can be discussed where bolalipids are integral to archaeal life. The authors should go beyond ’extremophiles’. In short, they should unpack why the general audience should be interested in these lipids, within a subset of organisms that are often forgotten about.

      Following the reviewer’s advice, we have revised the introduction of the manuscript, in which we now discuss specific species (Sulfolobus acidocaldarius and Thermococcus kodakarensis) and how bolalipids are integral to archaeal life in these species. We explain that the ratio between bilayer lipids and bolalipids, and the number of cyclopentane rings contained within bolalipids, can change to adapt to the environment. The revised parts of the introduction read (p. 1):

      “Like for bacteria and eukaryotes, archaea must keep their lipid membranes in a fluid state (homeoviscous adaptation). This is important even under extreme environmental conditions, such as hot and cold temperatures, or high and low pH values [7]. Because of this, many archaea adapt to changes in their environment by tuning the lipid composition of their membranes: altering the ratio between bola- and bilayer lipids in their membranes [8, 9] and/or by changing the number of cyclopentane rings in their lipid tails, which are believed to make lipid molecules more rigid [5]. For example, Thermococcus kodakarensis increases its tetraether bolalipid ratio from around 50% to over 80% when the temperature of the environment increases from 60 to 85 °C [10]. Along the same lines, the cell membrane of Sulfolobus acidocaldarius can contain over 90% of bolalipids with up to 8 cyclopentane rings at 70 °C and pH 2.5 [5, 11]. It is worth mentioning that in exceptional cases bacteria also synthesise bolalipids in response to high temperatures [12], highlighting that the study of bolalipid membranes is relevant not only for archaeal biology but also from a general membrane biophysics perspective.”

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to understand the biophysical properties of archaeal membranes made of bolalipids. Bacterial and eukaryotic membranes are made of lipids that self-assemble into bilayers. Archaea, instead, use bolalipids, lipids that have two headgroups and can span the entire bilayer. The authors wanted to determine if the unique characteristics of archaea, which are often extremophiles, are in part due to the fact that their membranes contain bolalipids.

      The authors develop a minimal computational model to compare the biophysics of bilayers made of lipids, bolalipids, and mixtures of the two. Their model enables them to determine essential parameters such as bilayer phase diagrams, mechanical moduli, and the bilayer behaviour upon cargo inclusion and remodelling.

      The authors demonstrate that bolalipid bilayers behave as binary mixtures, containing bolalipids organized either in a straight conformation, spanning the entire bilayer, or in a U-shaped one, confined to a single leaflet. This dynamic mixture allows bolalipid bilayers to be very sturdy but also provides remodelling. However, remodelling is energetically more expensive than with standard lipids. The authors speculate that this might be why lipids were more abundant in the evolutionary process.

      Strengths:

      This is a wonderful paper, a very fine piece of scholarship. It is interesting from the point of view of biology, biophysics, and material science. The authors mastered the modelling and analysis of these complex systems. The evidence for their findings is really strong and complete. The paper is written superbly, the language is precise and the reading experience is very pleasant. The plots are very well-thought-out.

      Weaknesses:

      I would not talk about weaknesses, because this is really a nice paper. If I really had to find one, I would have liked to see some clear predictions of the model expressed in such a way that experimentalists could design validation experiments.

      We thank the reviewer for their very kind assessment. We incorporated their recommendations regarding experimental validation in the discussion section, as follows (p.14):

      “Our model makes a number of predictions that could be tested by experiment either in cells or in vitro. First, it predicts that a small increase in the fraction of archaeal bilayer lipids should be sufficient to soften a bolalipid-rich membrane. While this could be tested in the future, so far only very few studies have yet reported experimental analysis of archaeal membrane mixtures [18, 50]. Second, we observed that membranes with moderate bolalipid molecular rigidity k<sub>bola</sub> exhibit curvature-dependent bending rigidity. To experimentally verify this, one could extrude membrane tethers from cells while controlling for membrane tension. Finally, to get to the core mechanism underlying our findings, it will be important to develop experimental methods that will allow the fraction of U-shaped bolalipid conformers per leaflet to be imaged and measured.”

      Reviewer #3 (Public review):

      Summary:

      The authors have studied the mechanics of bolalipid and archaeal mixed-lipid membranes via comprehensive molecular dynamics simulations. The Cooke-Deserno 3-bead-per-lipid model is extended to bolalipids with 6 beads. Phase diagrams, bending rigidity, mechanical stability of curved membranes, and cargo uptake are studied. Effects such as the formation of U-shaped bolalipids, pore formation in highly curved regions, and changes in membrane rigidity are studied and discussed. The main aim has been to show how the mixture of bolalipids and regular bilayer lipids in archaeal membrane models enhances the fluidity and stability of these membranes.

      Strengths:

      The authors have presented a wide range of simulation results for different membrane conditions and conformations. For the most part, the analyses and their results are presented clearly and concisely. Figures, supplementary information, and movies very well present what has been studied. The manuscript is well-written and is easy to follow.

      We thank the reviewer for the detailed assessment of our work and their constructive feedback.

      Major issues

      R3.Q1: The Cooke-Deserno model, while very powerful for biophysical analysis of membranes at the mesoscale, is very much void of chemical information. It is parametrized such that it is good in producing fluid membranes and predicting values for bending rigidity, compressibility, and even thermal expansion coefficient falling in the accepted range of values for bilayer membranes. But it still represents a generic membrane. Now, the authors have suggested a similar model for the archaeal bolalipids, which have chemically different lipids (the presence of cyclopentane rings for one), and there is no good justification for using the same pairwise interactions between their representative beads in the coarse-grained model. This does not necessarily diminish the worth of all the authors’ analyses. What is at risk here is the confusion between “what we observe this model of bolalipid or mixed membranes do” and “how real bolalipid-containing archaeal membranes behave at these mechanical and thermal conditions”.

      As the reviewer correctly notes, Cooke and Deserno used a minimal model, devoid of chemical detail, to represent fluid lipid membranes composed of bilayer lipids. Indeed archaeal lipids are chemically different compared to non-archaeal lipids, but just like non-archaeal lipids, they can be very different from one another. Given the chemical diversity of bolalipids between each other, instead of representing their complexity in a complicated model with many experimentally unconstrained parameters, we here defined a minimal model for bolalipids. The power of this minimal model is to represent the key physical/geometrical characteristics of archaeal membranes, namely the fact that lipid heads on two sides of the membrane are often connected, that bolalipids can exhibit a conformational change, and that bolalipids mix with some percentage of bilayer molecules. We then ask a general question: how do these unique geometrical characteristics of archaeal membranes influence their mechanics and reshaping? The reviewer is however right in pointing out that a model, regardless of its level of details (atomistic, coarse-grained, minimal), is still a model.

      Our approach of extending an established coarse-grained model for bilayer lipids to bolalipids is further supported by experimental observations, which report that archaeal bilayer lipids can form membranes of comparable bending rigidity to those of non-archaeal bilayer membranes [53]. Hence, different lipid linkages (archaeal vs. non-archaeal) give rise to fluid, deformable membranes of not too dissimilar rigidities, suggesting that both archaeal and non-archaeal bilayer lipids can be represented by a similar minimal coarse-grained model for the purpose of mesoscopic biophysical investigations. Since archaeal bolalipids have the same core chemical structure as two archaeal bilayer lipids joined by their tail ends, similarly we model a bolalipid by joining two bilayer lipids. Such an approach also efficiently enables us to compare bolalipid with bilayer membranes, and connect to the large body of knowledge on the physics of bilayer membranes.

      To conclude, our coarse-grained model is indeed intended to capture the main physical properties of bolalipid membranes, and not their chemical diversity.

      R3.Q2: Another more specific, major issue has to do with using the Hamm-Kozlov model for fitting the power spectrum of thermal undulations. The 1/q<sup>2</sup> term can very well be attributed to membrane tension. While a barostat is indeed used, have the authors made absolutely sure that the deviation from 1/q<sup>4</sup> behaviour does not correspond to lateral tension?

      To the casual observer, any 1/q<sup>2</sup> trend might point at membrane tension. However, the precise functional form is relevant as it determines whether the 1/q<sup>2</sup> dominates the 1/q<sup>4</sup> trend for small or large values of the wave number q in the fitted power spectrum.

      The first model (including lipid tilt) exhibits the functional form 1/(𝜅q<sup>4</sup>) + 1/(𝜅<sub>𝜃</sub>q<sup>2</sup>). In contrast, the second model (including membrane tension) exhibits the functional form 1/(𝜅q<sup>4</sup> + Σq<sup>2</sup>). Importantly, the two models obey different functional forms. Here 𝜅 and 𝜅<sub>𝜃</sub> are the bending and tilt moduli, which are assumed positive, and Σ is the membrane tension, which can be either positive or negative. For the first model (with tilt), the amplitude is proportional to q<sup>-4</sup> for small q and to q<sup>-2</sup> for large q. In contrast, for the second model (with positive tension), the amplitude is proportional to q<sup>-2</sup> for small q and to q<sup>-4</sup> for large q. If membrane tension were to be negative in the second model, the slope would cross from negative infinity for small q to -4 for large q. The functional dependencies are summarized in Author response image 1A.
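
      The limiting behaviours of the two functional forms can be checked numerically. The following Python sketch evaluates the local log-log slope of each form at a small and at a large wave number. The function names and parameter values are illustrative assumptions and are not taken from the manuscript; the common prefactor is omitted because it does not affect the slope.

```python
import math

def tilt_spectrum(q, kappa, kappa_theta):
    """Tilt model: 1/(kappa q^4) + 1/(kappa_theta q^2), prefactor omitted."""
    return 1.0 / (kappa * q**4) + 1.0 / (kappa_theta * q**2)

def tension_spectrum(q, kappa, sigma):
    """Tension model: 1/(kappa q^4 + sigma q^2), prefactor omitted."""
    return 1.0 / (kappa * q**4 + sigma * q**2)

def loglog_slope(f, q, rel_step=1e-4):
    """Central-difference estimate of d ln f / d ln q at wave number q."""
    q_lo, q_hi = q * (1.0 - rel_step), q * (1.0 + rel_step)
    return (math.log(f(q_hi)) - math.log(f(q_lo))) / (math.log(q_hi) - math.log(q_lo))

# Illustrative values only (assumed, in arbitrary reduced units).
KAPPA, KAPPA_THETA, SIGMA = 10.0, 10.0, 10.0

def tilt(q):
    return tilt_spectrum(q, KAPPA, KAPPA_THETA)

def tension(q):
    return tension_spectrum(q, KAPPA, SIGMA)

if __name__ == "__main__":
    for q in (1e-2, 1e2):
        print(f"q = {q:g}: tilt slope = {loglog_slope(tilt, q):.3f}, "
              f"tension slope = {loglog_slope(tension, q):.3f}")
```

      With positive moduli and positive tension, the tilt form steepens towards small q while the tension form steepens towards large q, which is the distinction the fits rely on.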

      For rigid bolalipid membranes, it is clearly visible that the magnitude of the log-log slope of the power spectrum plotted against the wave number q decreases with increasing q (Author response image 1B). While the slope magnitude initially assumes a value close to 4, it gradually approaches 2 for larger values of q. We conclude that only the model including lipid tilt can fit the power spectrum of membrane fluctuations appropriately (solid-dashed line), whereas the model with tension fails to fit the data (dashed line). We note that the combined model containing both lipid tilt and membrane tension does not give a better fit (dotted line).
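
      As an illustration of how the tilt form can be fitted, note that multiplying the tilt-model spectrum by q<sup>4</sup> gives 1/𝜅 + q<sup>2</sup>/𝜅<sub>𝜃</sub>, which is a straight line in q<sup>2</sup>. The Python sketch below applies ordinary least squares to synthetic, noise-free data. It is a minimal sketch under that assumption and is not the fitting procedure used in the manuscript; all names and numerical values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_tilt_model(qs, spectrum):
    """Return (kappa, kappa_theta) from a spectrum assumed to follow
    S(q) = 1/(kappa q^4) + 1/(kappa_theta q^2), prefactor set to one."""
    xs = [q * q for q in qs]                         # x = q^2
    ys = [s * q**4 for q, s in zip(qs, spectrum)]    # y = S(q) q^4
    slope, intercept = fit_line(xs, ys)
    return 1.0 / intercept, 1.0 / slope

# Synthetic data generated from assumed moduli (illustrative values only).
TRUE_KAPPA, TRUE_KAPPA_THETA = 12.0, 3.0
qs = [0.1 * i for i in range(1, 21)]
spectrum = [1.0 / (TRUE_KAPPA * q**4) + 1.0 / (TRUE_KAPPA_THETA * q**2) for q in qs]

kappa_fit, kappa_theta_fit = fit_tilt_model(qs, spectrum)

if __name__ == "__main__":
    print(f"kappa = {kappa_fit:.4f}, kappa_theta = {kappa_theta_fit:.4f}")
```

      For real simulation data the points would carry noise and the fit would need error weighting; the linearised form is shown only because it makes the role of the two moduli explicit.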

      To demonstrate that the tension model cannot fit the data, we included the best fits for both models for rigid bolalipid membranes in the new SI section 16 (p. S22) and show that only the tilt model leads to acceptable fits. We also measured the projected membrane tension Σ = -(P<sub>x</sub> + P<sub>y</sub>)L<sub>z</sub>/2, where P<sub>x</sub>, P<sub>y</sub> are respectively the pressure in x and y direction and L<sub>z</sub> is the dimension of the simulation box along the z axis. We found the projected membrane tension to give a negligible value, similar to the one that we indirectly measured by fitting a combined model with both tension and tilt, further confirming our conjecture.

      Author response image 1.

      (A) Schematic showing the decay of the power spectrum as a function of the wave number q in the tilt model (top), in the tension model with positive membrane tension (middle), and in the tension model with negative membrane tension (bottom). (B) Fitted power spectrum as a function of q for rigid bolalipid membranes (k<sub>bola</sub>=5k<sub>B</sub>T). The fit shows that while the model with tension (dashed line) cannot fit the data, the model with tilt nicely fits the spectrum (solid-dashed line). The combined model including both tension and tilt does not fit the spectrum any better (dotted line).

      R3.Q3: I got more worried when I noticed in the SI that the simulations had been done with combined ”fix langevin” and ”fix nph” LAMMPS commands. This combination does not result in a proper isothermal-isobaric ensemble. The importance of tilt terms for bolalipids is indeed very interesting, but I believe more care is needed to establish that.

      In what follows, we show that there is no reason to worry. First of all, we want to clarify that the physical setup we simulate is that of a membrane contained in a heat bath under negligible tension with correct diffusional dynamics. To achieve this physical setup, we use a Langevin thermostat combined with pressure control via an overdamped barostat, which we implement in LAMMPS by combining “fix langevin” and “fix nph”.

      In more detail: we simulated particles in an implicit solvent, for which we use a Langevin thermostat to get the right diffusional dynamics. To apply the theory of fitting fluctuation spectrums the simulation box length needs to be (near) constant. However, simulating membranes at a fixed box size results in an average non-zero membrane tension, making it hard to measure bending rigidity. The reason is that the effect of membrane tension is most influential on the largest wavelength modes, which are also most decisive when determining mechanical membrane properties like membrane rigidity. To minimize the effect of tension, we perform our simulation with an overdamped barostat (𝜏<sub>baro</sub> = 10𝜏<sub>langevin</sub>), which keeps the membrane near tensionless, as also done before [32]. In the revised manuscript, we have clarified the statement on the physical ensemble used (p. S2):

      “For simulating flat membrane patches of bolalipids, we combined the previously used Langevin thermostat with relaxation time of 1𝜏 with a Nosé–Hoover barostat with relaxation time of 10𝜏. In LAMMPS this amounts to combining the commands ’fix langevin’ with ’fix nph’. We configured the barostat to set lateral pressure P<sub>xy</sub> to zero by re-scaling the simulation box in the x-y plane. We compare this setup to a fixed box length setup, and an NPT ensemble setup, in SI section 17.”

      To connect our results with statistical mechanics ensemble theory, we tested alternative setups. Similar setups, including the formal isothermal-isobaric ensemble that the reviewer refers to, where N, P, T are kept constant using Nosé–Hoover-style equations for thermostating and barostating with modern corrections [34], result in very similar fluctuation spectrums. Consequently, our measurements of bending and tilt modulus hold true regardless of the integration scheme. However, such a setup does not correctly capture implicit solvent and diffusional dynamics.

      In even more detail: we tested our setup (implemented via “fix langevin” + “fix nph”) versus an isothermal-isobaric ensemble (implemented via “fix npt”). We measured the volume mean and standard deviation, and found them matching for a reference LJ gas.

      To be completely sure, and to please the reviewer, we have performed additional verifications in the new SI section 17, which we summarize in the following. We simulated three representative membranes with different integration schemes: “fix npt”, “fix langevin” + “fix nph”, and “fix langevin” (Langevin dynamics with projected area fixed at the average value obtained from a “langevin+nph” run). We checked that the “fix nph” barostat is merely equilibrating the membrane to a tensionless configuration, after which the projected membrane area (A<sub>p</sub> = L<sub>x</sub>L<sub>y</sub>) is practically constant. Consequently, the different schemes resulted in minor changes in the longest wavelength modes that we tracked down to small changes in the negligible tension. The resulting measurements of bending modulus change by less than 10%, and our main text conclusions do not change. Author response image 2 compares the fluctuation spectrums for the different integration schemes.

      Author response image 2.

      Height fluctuation spectrum, for a bilayer membrane at T<sub>eff</sub> =1.1, simulated with Langevin dynamics (pink, ‘langevin‘), our setup (purple, ‘nph+langevin‘), and under an isothermal-isobaric ensemble (blue, ‘npt‘); fits are shown as dotted lines.

      R3.Q4: This issue is reinforced when considering Figure 3B. These results suggest that increasing the fraction of regular lipids increases the tilt modulus, with the maximum value achieved for a normal Cooke-Deserno bilayer void of bolalipids. But this is contradictory. For these bilayers, we don’t need the tilt modulus in the first place.

      We understand the concern why this might be counter-intuitive, and we thank the reviewer for pointing it out. We first want to stress that the tilt modulus can also be measured for bilayer membranes even if it is not needed to fit the fluctuation spectrum. If we measure the tilt modulus for a bilayer membrane, we obtain a value similar to the previously measured one [36]. Importantly, here we also report measurements for the tilt modulus for bolalipid membranes.

      To understand the seemingly contradictory behaviour of the tilt modulus, it is insightful to rewrite the expression for the fluctuation spectrum as done in Eq. (1):

      ⟨|h<sub>q</sub>|<sup>2</sup>⟩ ∝ 1/(𝜅q<sup>4</sup>) + 1/(𝜅<sub>𝜃</sub>q<sup>2</sup>) = (1 + l<sub>𝜃</sub><sup>2</sup>q<sup>2</sup>)/(𝜅q<sup>4</sup>),

      where l<sub>𝜃</sub> = √(𝜅/𝜅<sub>𝜃</sub>) is a characteristic length scale related to tilt, which we call the tilt persistence length. From the last equation it is easy to see that the tilt modulus 𝜅<sub>𝜃</sub> becomes relevant for the fluctuation spectrum if the tilt persistence length l<sub>𝜃</sub> is not negligible. In other words, we have to consider the tilt modulus 𝜅<sub>𝜃</sub> as relevant if it is sufficiently small compared to the bending rigidity 𝜅.

      However, this is not only counter-intuitive, but also difficult to communicate graphically. Following the reviewer’s excellent suggestion, and to make the interpretation more accessible, we converted the tilt modulus in the main text and its figures to the more directly interpretable tilt persistence length l<sub>𝜃</sub>, as this is small when tilt is irrelevant (for bilayer lipids and flexible bolalipids) and large otherwise (for rigid bolalipids). This includes changes to the main text on p. 6 and p. 8, and to the insets in Figs. 2C and 3B. We note that for completeness we also report the tilt modulus 𝜅<sub>𝜃</sub> in the SI.
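
      A short numerical illustration of why a large tilt modulus makes tilt irrelevant: assuming the tilt-model spectrum 1/(𝜅q<sup>4</sup>) + 1/(𝜅<sub>𝜃</sub>q<sup>2</sup>) and taking the tilt persistence length to be l<sub>𝜃</sub> = √(𝜅/𝜅<sub>𝜃</sub>), the share of the spectrum contributed by the tilt term at wave number q is l<sub>𝜃</sub><sup>2</sup>q<sup>2</sup>/(1 + l<sub>𝜃</sub><sup>2</sup>q<sup>2</sup>). The Python sketch below computes both quantities; the function names and the example moduli are hypothetical.

```python
import math

def tilt_persistence_length(kappa, kappa_theta):
    """Assumed definition: l_theta = sqrt(kappa / kappa_theta)."""
    return math.sqrt(kappa / kappa_theta)

def tilt_fraction(q, kappa, kappa_theta):
    """Share of the tilt-model spectrum at wave number q that comes
    from the 1/(kappa_theta q^2) term."""
    x = (tilt_persistence_length(kappa, kappa_theta) * q) ** 2
    return x / (1.0 + x)

if __name__ == "__main__":
    q = 0.5  # illustrative wave number, reduced units
    for kappa, kappa_theta in ((16.0, 4.0), (16.0, 400.0)):
        l_theta = tilt_persistence_length(kappa, kappa_theta)
        print(f"kappa={kappa}, kappa_theta={kappa_theta}: "
              f"l_theta={l_theta:.3f}, tilt share={tilt_fraction(q, kappa, kappa_theta):.3f}")
```

      At fixed bending rigidity, raising the tilt modulus shortens l<sub>𝜃</sub> and shrinks the tilt share at every accessible q, which is why a large tilt modulus can coexist with a spectrum that does not require the tilt term.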

      R3.Q5: Also, from the SI, I gathered that the authors have neglected the longest wavelength mode because it is not equilibrated. If this is indeed the case, it is a dangerous thing to do, because with a small membrane patch, this mode can very well change the general trend of the power spectrum. As a lot of other analyses in the manuscript rely on these measurements, I believe more elaboration is in order.

      We thank the reviewer for the careful examination of our supplementary material. For each fluctuation spectrum measurement, we ran multiple replicas. We observed that the largest wavelength modes were not fully equilibrated. In the simulations the first mode of the fluctuation spectrum is probed at different amplitudes and phases. We thus expected the potential systematic error would show up clearly when comparing spectrums of the different replicas. As we saw no correlation in these systematic offsets between replicas, we concluded that the simulations are sufficiently equilibrated and we could safely exclude the first mode of the fluctuation spectrum from our analysis.

      To show without doubt that this procedure does not randomly bias our results, we also ran simulations for three representative membranes until all modes were equilibrated. On the modes previously equilibrated, the resulting spectrums agree with our previous shorter simulations. On the largest wavelength modes that were previously not fully equilibrated, we noticed a small deviation from theory, specifically for flexible membranes (small bending modulus). These small deviations can be explained by including a negligible negative tension. Importantly, however, the resulting bending modulus 𝜅 stays nearly the same. We note that the small negative tension disappears when we halve the timestep (see Author response image 3). This verification is shown in SI section 17.

      R3.Q6: The authors have found that ”there is a strong dependency of the bending rigidity on the membrane mean curvature of stiffer bolalipids.” The effect is negative, with the membrane becoming less stiff at higher mean curvatures. Why is that? I would assume that with more flexible bolalipids, the possibility of reorganization into U-shaped chains should affect the bending rigidity more (as Figure 2E suggests). While for a stiff bolalipid, not much would change if you increase the mean curvature. This should be either a tilt effect, or have to do with asymmetry between the leaflets. But on the other hand, the tilt modulus is shown to decrease with increasing bolalipid rigidity. The authors get back to this issue only on page 10, when they consider U-shaped lipids in the inner and outer leaflets and write, ”this suggested that an additional membrane-curving mechanism must be involved.” But then again, in the Discussion, the authors write, ”It is striking that membranes made from stiffer bolalipids showed a curvature-dependent bending modulus, which is a clear signature that bolalipid membranes exhibit plastic behaviour during membrane reshaping,” adding to the confusion.

      Author response image 3.

      Height fluctuation spectrum, for a bilayer membrane at T<sub>eff</sub> = 1.1, as simulated in the main text (grey, for 60×10<sup>3</sup>τ), for longer duration (1.44×10<sup>6</sup>τ) (pink), and with the longer duration and halved timestep of 0.005τ (purple); fits are shown as dotted lines (tension and tilt) or dash-dot lines (tilt only).

      We thank the reviewer for asking this important question. Membrane bending rigidity in bolalipid membranes decreases dramatically once a small fraction of U-shapes is allowed to form, but then plateaus once this U-shape fraction reaches 20%. In a curved bolalipid membrane, U-shapes must accumulate in the outer leaflet to accommodate the area difference. Together, the non-linear dependence of the bending rigidity on U-shape fraction, and the promotion of U-shapes by curvature, explain why in a membrane made of moderately stiff bolalipids (k<sub>bola</sub> = 1k<sub>B</sub>T), which contains very few U-shapes in the flat state, the bending rigidity of the membrane decreases as curvature increases. In a membrane made of flexible bolalipid molecules (k<sub>bola</sub> = 0), by contrast, where many U-shapes are present in the flat membrane, the bending rigidity does not change with curvature.

      Bending rigidity 𝜅 in flat membranes composed of bolalipids decreases dramatically once a small fraction of U-shapes is allowed to form, but plateaus once more than 20% of U-shaped bolalipids are present. In detail, our data show that with increasing bolalipid molecular rigidity k<sub>bola</sub>, the number of U-shaped bolalipids decreases (Fig. 2B) and the membrane rigidity 𝜅 increases (Fig. 2C). Thus, the correlation suggests that U-shaped bolalipids soften the membrane in a non-linear way, where most of the change in membrane bending rigidity happens for a U-shaped bolalipid fraction < 20% (Figure S11).

      Separately, membrane curvature affects the area difference between curved membrane leaflets and thus drives U-shape accumulation. To be specific, a cylindrical membrane with area A, mean curvature H and thickness h has the outer leaflet with area A(1 + Hh) and the inner leaflet with smaller area A(1 − Hh). This difference can be large, in our simulations up to an area change of Hh = 25%. For pure bolalipid membranes, straight bolalipids occupy the same space in each leaflet. Area difference can then be achieved only by having a different amount of U-shaped bolalipids in each leaflet, which can result in a different U-shape fraction between leaflets and thus ’asymmetry between leaflets’. Figure S10 confirms U-shape head fraction asymmetry that increases with curvature, for both flexible (k<sub>bola</sub> = 0) and moderately stiff bolalipids (k<sub>bola</sub> = 1k<sub>B</sub>T).
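
      The leaflet-area arithmetic can be made concrete with a few lines of Python. This is only a worked example of the expressions A(1 + Hh) and A(1 − Hh) quoted in this response; the function name and the input values are illustrative assumptions chosen so that Hh = 25%.

```python
def leaflet_areas(area, mean_curvature, thickness):
    """Return (outer, inner) leaflet areas, A(1 + Hh) and A(1 - Hh)."""
    hh = mean_curvature * thickness
    return area * (1.0 + hh), area * (1.0 - hh)

if __name__ == "__main__":
    # Illustrative values in reduced units: H * h = 0.25.
    outer, inner = leaflet_areas(area=100.0, mean_curvature=0.25, thickness=1.0)
    print(f"outer = {outer}, inner = {inner}, difference = {outer - inner}")
```

      In a pure bolalipid membrane the straight conformers contribute equally to both leaflets, so an area difference of this size has to be supplied by an unequal number of U-shaped conformers in the two leaflets.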

      Together, these two effects result in membrane softening under curvature for the moderately stiff bolalipids, but constant rigidity for flexible bolalipids (Fig. 2F). In detail: for membranes composed of moderately stiff bolalipid molecules (k<sub>bola</sub> = 1k<sub>B</sub>T), the U-shape bolalipid head fraction only increases in the outer leaflet, going from 10 to 20% (Figure S10). This is in the high sensitivity region where the bending rigidity is expected to change the most (Figure S11). We hypothesize that the molecular rigidity of a U-shaped bolalipid creates compression on the outer leaflet that stabilizes the membrane curvature and thus causes membrane softening. We suspect that for membranes composed of rigid bolalipids (k<sub>bola</sub> > 1k<sub>B</sub>T), the effect is likely not present due to the absence of U-shape formation even under strong bending.

      By contrast, for membranes composed of flexible bolalipids (k<sub>bola</sub> = 0), the U-shaped bolalipid head fraction changes relatively little from its value for flat membranes (from 50% to 60% and 40% for the outer and inner leaflet, respectively; Figure S10). This is in the region where the membrane bending rigidity is expected to respond weakly to U-shape fraction (Figure S11). Additionally, the change is symmetric, so presumably the outer leaflet becomes softer as the inner leaflet becomes stiffer, thus creating opposing effects and only weakly affecting the membrane bending rigidity as a whole. We note that the distinction between the U-shape head fraction that we plot (Figure S10) and U-shape fraction (Figure S11) matters little for this analysis.

      We have added this deduction and its plots to SI section 8, and revised the corresponding statement in the main text accordingly (p. 7).

      “Changing membrane curvature alters the area differently in the two membrane leaflets. To adapt to the area difference, we thus expect the fraction of U-shaped bolalipids to change as the membrane curvature changes. Moreover, the results of Fig. 2B and Fig. 2C showed that the U-shaped bolalipid fraction and the membrane bending rigidity are correlated. As a result, we predict that the fraction of straight versus U-shaped bolalipids in a membrane will change in response to membrane bending, in a way that makes the bending rigidity of a bolalipid membrane curvature dependent.”

      R3.Q7: This issue is repeated when the authors study nanoparticle uptake. They write: ”to reconcile these seemingly conflicting observations we reason that the bending rigidity, similar to Figure 2F, is not constant but softens upon increasing membrane curvature, due to dynamic change in the ratio between bolalipids in straight and U-shaped conformation. Hence, bolalipid membranes show stroking plastic behaviour as they soften during reshaping.” But the softening effect that they refer to, as shown in Figure 4B, occurs for very stiff bolalipids, for which not much switching to U-shaped conformation should occur.

      We thank the reviewer for locating a particularly dense sentence. We changed the text to explicitly refer to the range k<sub>bola</sub> ∈ [0,2] k<sub>B</sub>T for which there is significant change in U-shape fraction (p.8):

      “To reconcile these seemingly conflicting observations we reason that the bending rigidity κ, similar to Fig. 2F, is not constant but softens in the range k<sub>bola</sub> ∈ [0,2] k<sub>B</sub>T, upon increasing membrane curvature. This is due to the dynamic change in the ratio between bolalipids in straight and U-shaped conformation.”

      As for Fig. 4B, for k<sub>bola</sub> > 2k<sub>B</sub>T, pores form, which explains the plateau in adsorption energy.

      R3.Q8: Another major issue is with what the authors refer to as the ”effective temperature”. While plotting phase diagrams for kT/eps value is absolutely valid, I’m not a fan of calling this effective temperature. It is a dimensionless quantity that scales linearly with temperature, but is not a temperature. It is usually called a ”reduced temperature”. Then the authors refer to their findings as studying the stability of archaeal membranes at high temperatures. I have to disagree because eps is not the only potential parameter in the simulations (there are at least space exclusion and angle-bending stiffnesses) so one cannot identify changing eps with changing the global simulation temperature. This only works when you have one potential parameter, like an LJ gas.

      We indeed thought about this before and found that it makes little difference in our set-up. To thoroughly show that the distinction matters very little, per reviewer’s question, we computed our phase diagrams by scaling temperature T explicitly (and not lipid tail interactions T<sub>eff</sub> = k<sub>B</sub>T /ϵ<sub>p</sub>). We added these results to the SI section 14 and found no significant difference when comparing scaling tail interactions (Figure S15A) with scaling temperature explicitly (Figure S15B).

      We also computed Fig. 2A-C for scaling interactions (Figure S17A) and scaling temperature explicitly (Figure S17B). We found a slightly increased U-shaped bolalipid fraction for low k<sub>bola</sub> when comparing scaling interactions (Figure S17A) with temperature scaling (Figure S17B). The reason is that the U-shaped fraction depends on temperature, as at higher temperature bolalipids can more easily transition into the U-shape. Most importantly, however, we found no qualitative changes in the liquid region or the mechanical membrane properties when we compared the different scaling variants.

      The reason why both scaling variants match so well can be understood easily. All pair potentials, including volume exclusion interactions between head beads and other membrane beads, were also scaled in the same manner as tail-to-tail interactions, as described in the SI. In contrast, the energy scales for maintaining the lipid bonds, the bilayer lipid angles and the bolalipid angles are relatively large compared to the energy scales involved in tail-to-tail interactions. This separation of energy scales guarantees that there will be little effect when increasing global temperature. Regarding nomenclature, we take the reviewer’s advice and have added ’reduced temperature’ as an alias for T<sub>eff</sub> in the main text.

      In the revised version of the manuscript, we mention these observations in the SI section 14 and point towards these results in the main text (p.4 ):

      “This interaction strength governs the membrane phase behaviour and can be interpreted as the effective temperature or reduced temperature T<sub>eff</sub> = k<sub>B</sub>T /ϵ<sub>p</sub>. As the distinction between scaling interactions (T<sub>eff</sub>) or temperature (T) is not important for our analysis (see Supplemental Information (SI) section 14), for simplicity we refer to T<sub>eff</sub> as temperature in the following.”

      Minor issues

      R3.Q9: As the authors have noted, the fact that the membrane curvature can change the ratio of U-shaped to straight bolalipids would render the curvature elasticity non-linear (though the term ”plastic” should not be used, as this is still structurally reversible when the stress is removed. Technically, it is hypoelastic behaviour, possibly with hysteresis.) With this in mind, when the authors use essentially linear elastic models for fluctuation analysis, they should make a comparison of maximum curvatures occurring in simulations with a range that causes significant changes in bolalipid conformational ratios.

      We thank the reviewer for their suggestion on calling the non-linear behaviour of the curvature elasticity hypoelastic. We have edited the main text accordingly (p.8 ):

      “In an elastic material, the strain modulus holds constant and deformation is reversible. For bolalipid membranes at k<sub>bola</sub> = 1k<sub>B</sub>T, however, the bending modulus decreases when deformation increases, rendering bolalipid membranes hypoelastic.”

      Moreover, regarding the maximum curvatures occurring in the fluctuation simulations: we first note that the ensemble average of the mean curvature H from the fluctuation measurements is indicated as a vertical line in Fig. 2F. As the average value is nearly zero, the membrane can be considered flat to a good approximation. To investigate the question in more detail, we extended the SI with a careful analysis of the maximum membrane curvature and of the validity of the Monge gauge approximation (SI section 15).

      In short, we found that the involved membrane curvatures are small and therefore are unlikely to trigger any significant changes of the bending modulus. Moreover, since we are dealing with two bolalipid conformations, we also tested the homogeneity of the membrane. In our simulations of flat membrane patches we did not observe clustering or phase separation between the two bolalipid conformations beyond the [2,3]σ range. Furthermore, we get good agreement between our fluctuation measurement and the cylinder simulations in Fig. 2F. We now mention this verification in the revised version of the manuscript (p.8 ):

      “Fortunately, this dependency on curvature does not invalidate our fluctuation results, where the curvature is small enough that its effect on the bending modulus is negligible (SI section 15).”

      Last but not least, simulating bending/unbending cycles of an arc-shaped membrane (frozen endpoints) shows agreement with cylinder membrane simulations, and no hysteresis at the rates of deformation employed (cf. M. Amaral’s thesis [54], soon to be out of the embargo period).

      R3.Q10: The Introduction section of the manuscript is written with a biochemical approach, with very minor attention to the simulation works on this system. Some molecular dynamics works are only cited as existing previous work, without mentioning what has already been studied in archaeal membranes. While some information, like the binding of ESCRT proteins to archaeal membranes, though interesting, helps little to place the study within the discipline. The Introduction should be revised to show what has already been studied with simulations (as the authors mention in the Discussion) and how the presented research complements it.

      The present research for the first time covers archaeal membranes with a single coarse-grained model capable of assuming both bolalipid in-membrane conformations and sweeps through temperature, membrane composition, and molecular rigidity. The work shows the first curvature dependent bending modulus for pure bolalipid membranes. It also investigates systematically bending modulus and Gaussian modulus, and tests the model in an all-encompassing budding simulation that incorporates topology changes. Existing atomistic or coarse-grained MD simulations (MARTINI or similar force fields) are limited to small patches of membrane, with no study of large-scale deformations or topology changes; plus, they rely on force fields that were parametrized for bilayer membranes.

      To give a comprehensive overview of the field, we revised the introduction section of the manuscript, in which we now discuss previous computational work investigating membrane diffusivity, U-shaped lipid fraction, and bending rigidity (p.3 ):

      “By contrast, only a few studies have investigated bolalipid membranes applying computational or theoretical tools [24, 25]. Specifically, the pore closure time in bolalipid membranes, and the role of cyclopentane rings for membrane properties has been investigated using all-atom simulations, showing decreased lateral mobility, reduced permeability to water, and increased lipid packing [26–28]. Moreover, using coarse-grained simulations, it was suggested that bolalipid membranes are thicker [29], exhibit a gel-to-liquid phase transition at higher temperature [30], and exhibit a reduced diffusivity [31]. However, little research has been devoted to investigating mechanics and reshaping of bolalipid membranes at the mesoscale despite the obvious importance of this question from evolutionary, biophysics, and biotechnological perspectives and although different membrane physics is expected to manifest.”

      Following the reviewer’s advice and to keep the introduction concise and focused on bolalipid membranes, we have removed the paragraph on ESCRT-III proteins in the revised manuscript.

      R3.Q11: The authors have been a bit loose with using the term ”stability”. I’d like to see the distinction in each case, as in ”chemical/thermal/mechanical/conformational stability”.

      We have clarified the type of stability, where applicable, throughout the manuscript. In all other instances, if not clear from context, we mean simply that the membrane persists as a membrane. At our coarse-grained level, this means the membrane does not disassemble into a gas phase.

      R3.Q12: In the original Cooke-Deserno model, a so-called ”poorman’s angle-bending term” is used, which is essentially a bond-stretching term between the first and third particle. However, I notice the authors using the full harmonic angle-bending potential. This should be mentioned.

      This is made clear in the SI (Eq. (S3)). Cooke and Deserno mention the harmonic angle potential as a valid alternative in their original publication. We now also added this detail to the main text (p.3 ):

      “The angle formed by the chain of three beads is kept near 180° via an angular potential with strength k<sub>0</sub>, instead of the approximation by a bond between end beads of the original model [32].”
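
      To make the distinction raised in R3.Q12 concrete, the following minimal Python sketch contrasts a harmonic angle potential with the end-to-end bond approximation. The functional forms, the names `harmonic_angle_energy` and `end_to_end_bond_energy`, and all numerical values are illustrative assumptions; the potential actually used is the one defined in Eq. (S3) of the SI, which is not reproduced here.

```python
import math


def angle_between(r1, r2, r3):
    """Return the angle (radians) at the middle bead r2 of a three-bead chain."""
    a = [p - q for p, q in zip(r1, r2)]
    b = [p - q for p, q in zip(r3, r2)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def harmonic_angle_energy(r1, r2, r3, k_angle, theta0=math.pi):
    """Assumed harmonic form: 0.5 * k_angle * (theta - theta0)**2."""
    theta = angle_between(r1, r2, r3)
    return 0.5 * k_angle * (theta - theta0) ** 2


def end_to_end_bond_energy(r1, r3, k_bond, r0):
    """Approximation by a harmonic bond between the two end beads."""
    d = math.dist(r1, r3)
    return 0.5 * k_bond * (d - r0) ** 2


# Hypothetical three-bead chains with unit bond length.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

for name, chain in (("straight", straight), ("bent", bent)):
    e_angle = harmonic_angle_energy(*chain, k_angle=5.0)
    e_bond = end_to_end_bond_energy(chain[0], chain[2], k_bond=5.0, r0=2.0)
    print(name, round(e_angle, 3), round(e_bond, 3))
```

      Both terms vanish for a straight chain and penalize bending, but they depend on the chain geometry in different ways, which is why the choice is worth stating explicitly.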

      R3.Q13: The analysis of energy of U-shaped lipids with the linear model E = c<sub>0</sub> + c<sub>1</sub>k<sub>bola</sub> is indeed very interesting. I am curious, can this also be corroborated with mean energy measurements? The minor issue is calling the source of the favorability of U-shaped lipids ”entropic”, while clearly an energetic contribution is found. The two conformations, for example, might differ in the interactions with the neighbouring lipids.

      We were also curious and thank the reviewer for the suggestion of mean energy measurements. We concluded that there must be either an entropic contribution to the free energy or an intermolecular interaction energy favouring U-shaped bolalipids. We have now included these measurements in SI section 6 (p.S5 ):

      “By splitting the average potential energy between an internal contribution (bonds, angles and pair interactions between particles in the same molecule) and an external contribution (pair interactions between a molecule and its neighbours), we determined the transition energy from straight to U-shaped bolalipids in detail. We found that this transition lowers the internal potential energy of the bolalipid while increasing its interaction energy. In total, we obtained an energy barrier for the transition of ΔE<sub>s→u</sub> = 0.79±0.01k<sub>B</sub>T. Since the fit indicates, however, that the U-shaped bolalipid conformation is preferred over the straight conformation, we conclude that there must be either an entropic contribution to the free energy or an intermolecular interaction energy favouring U-shaped bolalipids.”

      We refer to these measurements in the main text (p.6 ):

      “From the fit it appears that c<sub>0</sub> < 0, which implies that bolalipids in U-shape conformation are slightly favoured over straight bolalipids at k<sub>bola</sub> = 0 (explored in SI section 6).”
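
      One way to read the sign of c<sub>0</sub> is through a two-state Boltzmann picture, sketched below in Python. This is an assumption made for illustration and not necessarily the fitting procedure used in the manuscript: E = c<sub>0</sub> + c<sub>1</sub>k<sub>bola</sub> is taken as the effective energy of the U-shaped state relative to the straight state, and the values of `c0` and `c1` are placeholders, not the fitted coefficients.

```python
import math


def u_shape_fraction(k_bola, c0, c1, kBT=1.0):
    """Two-state Boltzmann estimate of the U-shaped fraction.

    Assumes the U-shaped state has effective energy E = c0 + c1 * k_bola
    relative to the straight state, so p_U / p_straight = exp(-E / kBT).
    """
    energy = c0 + c1 * k_bola
    return 1.0 / (1.0 + math.exp(energy / kBT))


# Placeholder coefficients (hypothetical, in units of kBT).
c0, c1 = -0.1, 1.0

for k_bola in (0.0, 1.0, 2.0, 5.0):
    print(k_bola, round(u_shape_fraction(k_bola, c0, c1), 3))
```

      In this picture a negative c<sub>0</sub> gives a U-shaped fraction slightly above one half at k<sub>bola</sub> = 0, and the fraction decreases as the molecules stiffen.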

      R3.Q14: The authors write in the Discussion, ”In any case, our results indicate that membrane remodelling, such as membrane fission during membrane traffic, is much more difficult in bolalipid membranes [34].” Firstly, I’m not sure if studying the dependence of budding behaviour on adhesion energy with nanoparticles is enough to make claims about membrane fission. Secondly, why is the 2015 paper by Markus Deserno cited here?

      We thank the reviewer for giving us the opportunity to clarify. We make an energetic argument on membrane fission based on the observed difference in the ratio of the Gaussian modulus to the bending modulus, |κ̄|/κ.

      Splitting a spherical membrane vesicle into two spherical vesicles (fission) increases the bending energy by 8πκ and decreases the energy related to the Gaussian bending modulus by 4π|κ̄|. The second part of the argument is given, for example, in the review by Markus Deserno (p.23, right column), which is why we cite the paper here. Together, this gives an energy barrier required for membrane fission in the considered geometry of ∆E<sub>fission</sub> = 8πκ − 4π|κ̄|. We found that |κ̄|/κ is around 0.5 for bolalipid membranes and around 1 for bilayer membranes. Since κ was typically larger in bolalipid membranes we thus expect the energy barrier for fission ∆E<sub>fission</sub> to be larger for bolalipid membranes. We therefore predict that membrane remodelling, such as membrane fission during membrane trafficking, is harder in bolalipid membranes. We explain our reasoning in the discussion of the revised manuscript (p.13):

      “Membrane remodelling, such as the fission of one spherical vesicle into two, increases the bending energy by 8πκ but decreases the energy related to the Gaussian modulus by 4π|κ̄| [39], giving rise to a fission energy barrier of ∆E<sub>fission</sub> = 8πκ − 4π|κ̄|. Our results indicated that while in bolalipid membranes κ is larger, |κ̄|/κ is smaller compared to bilayer membranes. Our results thus predict a larger energy barrier for membrane fission ∆E<sub>fission</sub> in bolalipid membranes compared to bilayer membranes.”
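
      The energetic argument can be checked numerically with a short Python sketch. It assumes the standard Helfrich result that a closed sphere costs 8πκ + 4πκ̄ with κ̄ < 0, so that splitting one sphere into two costs 8πκ − 4π|κ̄| = 4πκ(2 − |κ̄|/κ). The function name `fission_barrier` and the unit bending modulus are illustrative; only the ratios of roughly 1 (bilayer) and 0.5 (bolalipid) are taken from the text.

```python
import math


def fission_barrier(kappa, gauss_ratio):
    """Energy cost of splitting one spherical vesicle into two.

    kappa:       bending modulus (any energy unit, e.g. kBT)
    gauss_ratio: |kappa_bar| / kappa, magnitude of the Gaussian modulus
                 relative to the bending modulus (kappa_bar assumed negative)
    """
    return 4.0 * math.pi * kappa * (2.0 - gauss_ratio)


# Same illustrative bending modulus for both membranes, so that only the
# effect of the modulus ratio is visible.
kappa = 1.0
bilayer = fission_barrier(kappa, gauss_ratio=1.0)
bolalipid = fission_barrier(kappa, gauss_ratio=0.5)
print(bilayer / math.pi, bolalipid / math.pi)
```

      Even at equal κ, the smaller ratio raises the barrier from 4πκ to 6πκ; a larger κ in bolalipid membranes increases it further.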

      R3.Q15: In the SI, where the measurement of the diffusion coefficient is discussed, the expression for D is missing the power 2 of displacement.

      We thank the reviewer for spotting this oversight. We corrected it in the revised version of the SI (p.S5 ).
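
      For reference, the corrected expression involves the squared displacement. A minimal Python sketch follows, assuming the standard two-dimensional Einstein relation D = ⟨|r(t+τ) − r(t)|²⟩ / (4τ) and unwrapped lateral coordinates. The exact estimator used in the SI may differ, and the function names and the toy trajectory are illustrative.

```python
def mean_squared_displacement(trajectories, lag):
    """Average squared lateral displacement over all particles and time origins.

    trajectories: list of per-particle lists of (x, y) positions,
                  sampled at equal time intervals and not wrapped
                  by periodic boundaries.
    lag:          number of frames separating the two positions.
    """
    total, count = 0.0, 0
    for traj in trajectories:
        for i in range(len(traj) - lag):
            dx = traj[i + lag][0] - traj[i][0]
            dy = traj[i + lag][1] - traj[i][1]
            total += dx * dx + dy * dy  # note the power of 2
            count += 1
    return total / count


def diffusion_coefficient_2d(trajectories, lag, dt):
    """Einstein relation in two dimensions: D = MSD / (4 * lag * dt)."""
    return mean_squared_displacement(trajectories, lag) / (4.0 * lag * dt)


# Toy trajectory: unit steps on a square lattice, one frame every 0.5 time units.
toy = [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]]
print(diffusion_coefficient_2d(toy, lag=1, dt=0.5))
```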

      R3.Q16: Where cargo uptake is discussed, the term ”adsorption energy” is used. I think the more appropriate term would be ”adhesion energy”.

      For the sake of simplicity, we changed the term to adhesion energy (caption of Fig. 4, and p.10). We do not have a strong opinion on this, but we believe that adsorption energy would be equally correct as we describe the adsorption of many lipid head beads to a nanoparticle.

      R3.Q17: Typos:

      Page 1, paragraph 2: Adaption → Adaptation. Page 10, paragraph 1: Stroking → Striking.

      We thank the reviewer for spotting these typos which we have corrected in the revised version of the manuscript.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      A few thoughts (likely out of the scope of this paper but possibly to consider upon revision):

      R1.Q1: Do bolalipids always have the same headgroup? I don’t recall reading this in the introduction/discussion. R1 and R2 are in Figure 1, but I don’t know whether there are standard types. Could this be expanded upon? Is the model able to take these differences into account?

      We thank the reviewer for raising this important question. Similar to bacteria and eukaryotes, in archaea there is a huge variety in terms of the different head groups that lipids can contain and thus also lipid variety. Most archaeal lipids have head groups that contain either phosphate groups or sugar residues. Typically, archaeal bolalipids are asymmetric and contain a phosphatidyl and a sugar moiety at the two ends of the lipid molecule. Within the membrane the lipid is oriented such that the phosphatidyl moiety points towards the interior of the cell whereas the sugar moiety points towards the outside of the cell as it occupies more space [5].

      In our computational model, however, we consider symmetric bolalipids for the sake of simplicity and to decouple the role of ”connected geometry” from other effects. In principle, we could investigate the effect of lipid asymmetry by increasing the size of one of the lipid head beads. However, this investigation exceeds the scope of the present study and therefore requires future work.

      In the revised version of the manuscript, we now clarify that bolalipids can have different headgroups (p.1 and the caption of Fig. 1):

      “The hydrophilic heads can be composed of different functional groups with phosphatidyl and sugar being the most relevant moieties. For bolalipids the two head groups at either end of the molecule are typically distinct (Fig. 1A right) [5].”

      “The hydrophilic head of a bolalipid can be composed of different functional groups represented by R1 and R2 (right).”

      We also explicitly state that we neglect lipid head group asymmetry for the sake of simplicity (p.4 ):

      “To decouple the effect of the connected geometry of the bolalipids from that of lipid asymmetry, we assume both head beads of a bolalipid to share the same properties.”

      R1.Q2: Is it possible to compare the mesoscale models to either Coarse-grained or even all-atom lipid models? Have simulations previously been performed for bolalipids at those levels of description?

      A few studies have previously investigated bolalipid membranes in simulations. These studies used either all-atom or coarse-grained simulations. However, none of these studies investigated how bolalipids respond to membrane deformations. Therefore, it is currently not possible to directly compare our results to studies in the literature. However, recapitulating our predictions experimentally is certainly something that could and should be done in the future. As a reply to this reviewer and reviewer 3, we discuss the current state of modelling bolalipid membranes in simulations in the revised version of the manuscript (p.3):

      “By contrast, only a few studies have investigated bolalipid membranes applying computational or theoretical tools [24, 25]. Specifically, the pore closure time in bolalipid membranes, and the role of cyclopentane rings for membrane properties has been investigated using all-atom simulations, showing decreased lateral mobility, reduced permeability to water, and increased lipid packing [26–28]. Moreover, using coarse-grained simulations, it was suggested that bolalipid membranes are thicker [29], exhibit a gel-to-liquid phase transition at higher temperature [30], and exhibit a reduced diffusivity [31]. However, little research has been devoted to investigating mechanics and reshaping of bolalipid membranes at the mesoscale despite the obvious importance of this question from evolutionary, biophysics, and biotechnological perspectives and although different membrane physics is expected to manifest.”

      We want to mention, however, that we do compare membrane diffusivity, U-shaped lipid fraction, and bending rigidity to the behaviour and values that have been previously measured in simulations in the discussion section. In general, we find good agreement between our results and previously reported behaviour/values (p.13 ):

      “While flexible bolalipid membranes are liquid under the same conditions as bilayer membranes, we found that stiff bolalipids form membranes that operate in the liquid regime at higher temperatures. These results agree well with previous molecular dynamics simulations that suggested that bolalipid membranes are more ordered and have a reduced diffusivity compared to bilayer membranes [24, 29]. In our simulations, this is due to the fact that completely flexible bolalipid molecules adopt both the straight (transmembrane) and the U-shaped (loop) conformation with approximately the same frequency. In contrast, stiff bolalipids typically only take on the straight conformation when assembled in a membrane. These results agree with the previous coarse-grained molecular dynamics simulations using the MARTINI force field which showed that the ratio of straight to U-shaped bolalipids increased upon stiffening the linker between the lipid tails [29].

      [...]

      When we determined the bending rigidity of bolalipid membranes by measuring their response to thermal fluctuations, we found that membranes made from flexible bolalipids are only slightly more rigid than bilayer membranes. This result is consistent with previous atomistic simulations, which showed that the membrane rigidity was similar for membranes composed of bilayer lipids and flexible synthetic bolalipids [45].”

      R1.Q3: How would membrane proteins alter the behaviour of bolalipids? Either those integral to the membrane or those binding peripherally?

      The reviewer asks an important question. However, the question is difficult to answer due to its scope and the gaps in the current literature. Important examples of integral or peripheral membrane proteins that alter the behaviour of bolalipids and archaeal bolalipid membranes are involved in cell homeostasis, cell division, membrane trafficking, and lipid synthesis.

      The cells of many archaeal species are enclosed in a paracrystalline protein layer called the S-layer, which is attached to the lipid membrane [4, 55]. The main function of the S-layer is to keep the cell’s shape and to protect it against osmotic stress. Due to the embedding of the S-layer in the membrane at specific locations, it is to be expected that the membrane properties are influenced by the S-layer. Furthermore, archaea execute cell division by locally reshaping the membrane using FtsZ and ESCRT-III proteins [56]. While Asgard archaeal genomes encode proteins with homology to those regulating aspects of eukaryotic membrane remodelling and trafficking [57], they have yet to be observed undergoing a process like endocytosis [58]. In addition, it has been speculated that the proteins that drive the synthesis of two diether lipids into a tetraether lipid are either membrane associated or integral membrane proteins [59].

      However, to the best of our knowledge it is not known how membrane proteins specifically alter the behaviour of bolalipids. Future work will need to be executed to answer this question. Following the advice of reviewer 3 and to keep the introduction concise and focused on bolalipid membranes, we do not mention these observations in the revised manuscript.

      R1.Q4: Is there a mechanism in cells to convert or switch bolalipids from a straight to a u-shaped description? Does this happen spontaneously or are there enzymes responsible for this?

      We thank the reviewer for bringing up this important point. Despite the relevance of the question, little is currently known about the mechanisms that make bolalipids transition between a straight and a U-shaped configuration, mainly because there is to date no established experimental method.

      Besides our own results, most of what we know comes from coarse-grained molecular dynamics simulations, which showed that bolalipids can spontaneously transition between the straight and U-shaped configuration [29]. In addition, by using comparative genomic analysis, it has been predicted that many archaeal species contain flippases, i.e., membrane proteins that are able, upon the consumption of energy, to transfer (flip-flop) bilayer lipids between the two membrane leaflets [43]. Moreover, it has been shown that Halobacterium salinarum (an archaeon with a bilayer lipid membrane) [44] contains scramblases, which are membrane proteins that passively transfer bilayer lipids from one membrane leaflet to the other. It is therefore tempting to speculate that similar proteins might exist for bolalipids which could facilitate the straight to U-shaped transition.

      In addition, it has been reported that vesicles composed of bolalipid membranes can undergo fusion with enveloped influenza viruses [17]. In this context, it has been suggested that the influenza fusion protein hemagglutinin may locally induce U-shaped bolalipids to facilitate membrane fusion. However, these hints are far from proof of a mechanism that can drive the straight to U-shaped bolalipid transition, and further work needs to be done to investigate this question in detail.

      In the revised version of the manuscript, we now discuss what is known about potential mechanisms to facilitate the straight to U-shaped transition in the discussion section (p.13 ):

      “While previous coarse-grained simulations predicted that bolalipids spontaneously transition between the straight and U-shaped conformations [29], how this happens in archaeal membranes and whether membrane proteins are involved in this conformational transition needs to be clarified in the future. Experimental studies suggest that archaeal membranes contain flippases and scramblases for the transitioning of bilayer lipids between membrane leaflets [43, 44], raising the possibility that similar proteins could also facilitate conformational transitions in bolalipids. In addition, it has been suggested that the viral fusion protein hemagglutinin could cause a transition from straight to U-shaped bolalipid conformation during the fusion of bolalipid vesicles with influenza viruses [17]. However, future investigation is required.”

      R1.Q5: Ideally, coordinates and any parameter files required to run the molecular simulations should be included for reproducibility.

      We absolutely share the reviewer’s concern with reproducibility and as such have included in the original submission as part of our data availability section a link to a code repository (available at: https://doi.org/10.5281/zenodo.13934991 [51]) that allows initializing and simulating flat membrane patches, with user control of the parameters explored in this paper (𝜔,T<sub>eff</sub>,k<sub>bola</sub>,f<sup>bi</sup>).

      Reviewer #2 (Recommendations for the authors):

      This is a great paper and I congratulate the authors for writing such a fine piece of scholarship. The only nitty-gritty feedback that I have is summarized in the following three points:

      R2.Q1: In the introduction the authors talk about archaea adapting their membrane to retain membrane fluidity. However, homeoviscous adaptation is also fundamental in bacteria and eukaryotes.

      The reviewer is correct: like those of archaea, the membranes of bacteria and eukaryotes must balance flexibility and stability. Moreover, the cell membranes in all three domains of life need to maintain membrane fluidity and provide mobility to the embedded lipids and membrane proteins (homeoviscous adaptation). The general idea is that these organisms change the ratio of different lipids to change membrane properties and thereby optimally adapt to their environments [10]. Importantly, however, there are differences in how homeoviscous adaptation is maintained across the different domains of life. As a reply to this reviewer and reviewer 3, we now discuss the underlying mechanisms in the revised parts of the introduction (p.1):

      “Like for bacteria and eukaryotes, archaea must keep their lipid membranes in a fluid state (homeoviscous adaptation). This is important even under extreme environmental conditions, such as hot and cold temperatures, or high and low pH values [7]. Because of this, many archaea adapt to changes in their environment by tuning the lipid composition of their membranes: altering the ratio between bola- and bilayer lipids in their membranes [8, 9] and/or by changing the number of cyclopentane rings in their lipid tails, which are believed to make lipid molecules more rigid [5]. For example, Thermococcus kodakarensis increases its tetraether bolalipid ratio from around 50% to over 80% when the temperature of the environment increases from 60 to 85 °C [10]. Along the same lines, the cell membrane of Sulfolobus acidocaldarius can contain over 90% of bolalipids with up to 8 cyclopentane rings at 70 °C and pH 2.5 [5, 11]. It is worth mentioning that in exceptional cases bacteria also synthesise bolalipids in response to high temperatures [12], highlighting that the study of bolalipid membranes is relevant not only for archaeal biology but also from a general membrane biophysics perspective.”

      R2.Q2: Uncertainties in Gaussian rigidity modulus estimates are not properly reported.

      The large uncertainties in the Gaussian rigidity modulus were due to how they were calculated. In short, the Gaussian rigidity modulus is determined in cap folding simulations [41] (SI section 9), by using the measured values of the dimensionless parameter 𝜉, related to the folding probability, the bending modulus 𝜅, the membrane line tension, and the cap radius R. In our case, the main source of uncertainty for determining the Gaussian rigidity modulus comes from the uncertainty in the measurement of the bending rigidity 𝜅. Previously, to obtain 𝜅, we fitted fluctuation spectra for different seeds and only then averaged the obtained values. In the revised version of the manuscript, we now first pool the fluctuation spectra of the different simulation seeds before we fit all spectra at the same time. This new approach results in smaller uncertainties for the bending rigidity 𝜅 and also the Gaussian rigidity modulus.
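
      The difference between the two procedures can be sketched in Python. The sketch assumes a tensionless spectrum of the form ⟨|h<sub>q</sub>|²⟩ = k<sub>B</sub>T / (A κ q⁴) and a log-space least-squares fit with κ as the only free parameter. Both are simplifying assumptions, and the function names and synthetic data are hypothetical rather than taken from the analysis code of the manuscript.

```python
import math


def fit_kappa(points, kBT=1.0, area=1.0):
    """Least-squares fit of kappa in log space.

    points: iterable of (q, S) pairs, with S the measured spectrum value.
    Assumed model: S(q) = kBT / (area * kappa * q**4).
    With kappa as the only free parameter, the log-space least-squares
    solution is the mean of the per-point estimates of log(kappa).
    """
    logs = [math.log(kBT / area) - 4.0 * math.log(q) - math.log(s)
            for q, s in points]
    return math.exp(sum(logs) / len(logs))


def kappa_per_seed_then_average(spectra, **kwargs):
    """Earlier procedure: fit each seed separately, then average the fits."""
    values = [fit_kappa(spectrum, **kwargs) for spectrum in spectra]
    return sum(values) / len(values)


def kappa_pooled(spectra, **kwargs):
    """Revised procedure: pool all seeds, then fit once."""
    pooled = [point for spectrum in spectra for point in spectrum]
    return fit_kappa(pooled, **kwargs)


# Synthetic, noise-free spectra for three hypothetical seeds (kappa = 10).
true_kappa = 10.0
spectra = [[(q, 1.0 / (true_kappa * q ** 4)) for q in (0.1, 0.2, 0.3)]
           for _ in range(3)]
print(kappa_per_seed_then_average(spectra), kappa_pooled(spectra))
```

      On noise-free data both procedures return the same κ; they differ only in how noise in individual seeds propagates into the final estimate.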

      As a consistency check, in addition to the simulations that we previously performed at T<sub>eff</sub> = 1.3, we have repeated the cap folding and line tension simulations at T<sub>eff</sub> = 1.2, resulting in similar values for the Gaussian rigidity modulus. In the revised version of the manuscript, we report the newly calculated values and uncertainties for the Gaussian rigidity modulus at T<sub>eff</sub> = 1.2 in the main text (p.8):

      “At T<sub>eff</sub>  = 1.2, we obtained = 4.30±0.22kBT and thus a ratio of = 0.89±0.04 for bilayer membranes, similar to what has been reported previously [41]. For flexible bolalipid membranes, we got a slightly smaller value for = 5.04 ± 0.37kBT. Due to the larger bending modulus, however, flexible bolalipid membranes show a significantly smaller ratio = 0.64± 0.04 (k<sub></sub> = 0). At larger temperature (Teff = 1.3), the ratio can be even smaller = 0.45 ± 0.07 (see SI section 9).”

      In addition, we report the values at T<sub>eff</sub> = 1.3 and T<sub>eff</sub> = 1.2 in the SI (p.S15, Table S4).

      We have also adapted the discussion of the Gaussian bending modulus accordingly (p.13 ):

      “Another marked difference between bilayer and flexible bolalipid membranes is the ratio of the Gaussian rigidity to the bending modulus. Instead of being around 1 as for bilayer membranes [41], it is around 1/2 and therefore only half of that of bilayer lipids.”

      Reviewer #3 (Recommendations for the authors):

      While I think the bulk of the work presented is useful, some of the issues that I raised in my review are indeed major. Without properly addressing them, it is hard to accept the conclusions of the manuscript. I hope the authors can address them by revising their analysis.

      We thank the reviewer for their constructive feedback, which helped us to improve the manuscript. We have addressed all points raised by the reviewer in our detailed point-by-point response to the reviewer (see above). We hope the reviewer will now find it easier to accept our conclusions.

      (1) R. Phillips, J. Kondev, J. Theriot, and H. Garcia, Physical biology of the cell (Garland Science, New York, 2012).

      (2) H. T. McMahon and J. L. Gallop, Membrane curvature and mechanisms of dynamic cell membrane remodelling, Nature 438, 590 (2005).

      (3) S. B. Gould, Membranes and evolution, Curr. Biol. 28, R381 (2018).

      (4) S.-V. Albers and B. H. Meyer, The archaeal cell envelope, Nat. Rev. Microbiol. 9, 414 (2011).

      (5) P. M. Oger and A. Cario, Adaptation of the membrane in Archaea, Biophys. Chem. 183, 42 (2013).

      (6) K. Rastädter, D. J. Wurm, O. Spadiut, and J. Quehenberger, The Cell Membrane of Sulfolobus spp.—Homeoviscous Adaption and Biotechnological Applications, International Journal of Molecular Sciences 21, 3935 (2020).

      (7) P. L.-G. Chong, Archaebacterial bipolar tetraether lipids: Physico-chemical and membrane properties, Chem. Phys. Lipids 163, 253 (2010).

      (8) M. Tourte, P. Schaeffer, V. Grossi, and P. M. Oger, Functionalized Membrane Domains: An Ancestral Feature of Archaea?, Front. Microbiol. 11, 526 (2020).

      (9) Y. H. Kim, G. Leriche, K. Diraviyam, T. Koyanagi, K. Gao, D. Onofrei, J. Patterson, A. Guha, N. Gianneschi, G. P. Holland, M. K. Gilson, M. Mayer, D. Sept, and J. Yang, Entropic effects enable life at extreme temperatures, Sci. Adv. 5, eaaw4783 (2019).

      (10) M. F. Siliakus, J. van der Oost, and S. W. M. Kengen, Adaptations of archaeal and bacterial membranes to variations in temperature, pH and pressure, Extremophiles 21, 651 (2017).

      (11) D. W. Grogan, Phenotypic characterization of the archaebacterial genus sulfolobus: comparison of five wild-type strains, J. Bacteriol. 171, 6710 (1989).

      (12) D. X. Sahonero-Canavesi, M. F. Siliakus, A. Abdala Asbun, M. Koenen, F. von Meijenfeldt, S. Boeren, N. J. Bale, J. C. Engelman, K. Fiege, L. Strack van Schijndel, J. S. Sinninghe Damsté, and L. Villanueva, Disentangling the lipid divide: Identification of key enzymes for the biosynthesis of membrane-spanning and ether lipids in Bacteria, Sci. Adv. 8, eabq8652 (2022).

      (13) M. van Wolferen, A. A. Pulschen, B. Baum, S. Gribaldo, and S.-V. Albers, The cell biology of archaea, Nat. Microbiol. 10.1038/s41564-022-01215-8 (2022).

      (14) U. Bakowsky, U. Rothe, E. Antonopoulos, T. Martini, L. Henkel, and H.-J. Freisleben, Monomolecular organization of the main tetraether lipid from Thermoplasma acidophilum at the water–air interface, Chem. Phys. Lipids 105, 31 (2000).

      (15) C. Jeworrek, F. Evers, M. Erlkamp, S. Grobelny, M. Tolan, P. L.-G. Chong, and R. Winter, Structure and Phase Behavior of Archaeal Lipid Monolayers, Langmuir 27, 13113 (2011).

      (16) D. P. Brownholland, G. S. Longo, A. V. Struts, M. J. Justice, I. Szleifer, H. I. Petrache, M. F. Brown, and D. H. Thompson, Phase Separation in Binary Mixtures of Bipolar and Monopolar Lipid Dispersions Revealed by 2H NMR Spectroscopy, Small Angle X-Ray Scattering, and Molecular Theory, Biophysical Journal 97, 2700 (2009).

      (17) A. Bhattacharya, I. D. Falk, F. R. Moss, T. M. Weiss, K. N. Tran, N. Z. Burns, and S. G. Boxer, Structure–function relationships in pure archaeal bipolar tetraether lipids, Chem. Sci. 15, 14273 (2024).

      (18) V. Vitkova, D. Mitkova, V. Yordanova, P. Pohl, U. Bakowsky, G. Staneva, and O. Batishchev, Elasticity and phase behaviour of biomimetic membrane systems containing tetraether archaeal lipids, Colloids Surf. A Physicochem. Eng. Asp. 601, 124974 (2020).

      (19) E. Chang, Unusual thermal stability of liposomes made from bipolar tetraether lipids, Biochem. Biophys. Res. Commun. 202, 673 (1994).

      (20) O. V. Batishchev, A. S. Alekseeva, D. S. Tretiakova, T. R. Galimzyanov, A. Y. Chernyadyev, N. R. Onishchenko, P. E. Volynsky, and I. A. Boldyrev, Cyclopentane rings in hydrophobic chains of a phospholipid enhance the bilayer stability to electric breakdown, Soft Matter 16, 3216 (2020).

      (21) U. Seifert, Configurations of fluid membranes and vesicles, Adv. Phys. 46, 13 (1997).

      (22) H. Noguchi, Membrane Simulation Models from Nanometer to Micrometer Scale, J. Phys. Soc. Jpn. 78, 041007 (2009).

      (23) F. Frey and T. Idema, More than just a barrier: using physical models to couple membrane shape to cell function, Soft Matter 17, 3533 (2021).

      (24) C. Huguet, S. Fietz, A. Rosell-Melé, X. Daura, and L. Costenaro, Molecular dynamics simulation study of the effect of glycerol dialkyl glycerol tetraether hydroxylation on membrane thermostability, Biochimica et Biophysica Acta (BBA) - Biomembranes 1859, 966 (2017).

      (25) T. R. Galimzyanov, P. I. Kuzmin, P. Pohl, and S. A. Akimov, Elastic deformations of bolalipid membranes, Soft Matter 12, 2357 (2016).

      (26) T. R. Galimzyanov, P. E. Volynsky, and O. V. Batishchev, Continuum elasticity and molecular dynamics of a pore in archaeal bolalipid membranes, Soft Matter 21, 687 (2025).

      (27) A. O. Chugunov, P. E. Volynsky, N. A. Krylov, I. A. Boldyrev, and R. G. Efremov, Liquid but Durable: Molecular Dynamics Simulations Explain the Unique Properties of Archaeal-Like Membranes, Sci. Rep. 4, 7462 (2015).

      (28) L. F. Pineda De Castro, M. Dopson, and R. Friedman, Biological Membranes in Extreme Conditions: Simulations of Anionic Archaeal, PLoS One 11, e0155287 (2016).

      (29) M. Bulacu, X. Périole, and S. J. Marrink, In Silico Design of Robust Bolalipid Membranes, Biomacromolecules 13, 196 (2012).

      (30) C. H. Davis, H. Nie, and N. V. Dokholyan, Insights into thermophilic archaebacterial membrane stability from simplified models of lipid membranes, Phys. Rev. E 75, 051921 (2007).

      (31) S. Dey and J. Saha, Minimal Coarse-Grained Modeling toward Implicit Solvent Simulation of Generic Bolaamphiphiles, J. Phys. Chem. B 124, 2938 (2020).

      (32) I. R. Cooke and M. Deserno, Solvent-free model for self-assembling fluid bilayer membranes: Stabilization of the fluid phase based on broad attractive tail potentials, J. Chem. Phys. 123, 224710 (2005).

      (33) P. L.-G. Chong, U. Ayesa, V. Prakash Daswani, and E. C. Hur, On Physical Properties of Tetraether Lipid Membranes: Effects of Cyclopentane Rings, Archaea 2012, 1 (2012).

      (34) A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in ’t Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, and S. J. Plimpton, LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Comput. Phys. Commun. 271, 108171 (2022).

      (35) A. Stukowski, Visualization and analysis of atomistic simulation data with ovito–the open visualization tool, Modelling and Simulation in Materials Science and Engineering 18, 015012 (2009).

      (36) E. R. May, A. Narang, and D. I. Kopelevich, Role of molecular tilt in thermal fluctuations of lipid membranes, Physical Review E 76, 021913 (2007).

      (37) W. Helfrich, Elastic Properties of Lipid Bilayers: Theory and Possible Experiments, Z. Naturforsch. C 28, 693 (1973).

      (38) M. Hamm and M. Kozlov, Elastic energy of tilt and bending of fluid membranes, Eur. Phys. J. E 3, 323 (2000).

      (39) M. Deserno, Fluid lipid membranes: From differential geometry to curvature stresses, Chemistry and Physics of Lipids 185, 11 (2015).

      (40) V. A. Harmandaris and M. Deserno, A novel method for measuring the bending rigidity of model lipid membranes by simulating tethers, The Journal of Chemical Physics 125, 204905 (2006).

      (41) M. Hu, J. J. Briguglio, and M. Deserno, Determining the Gaussian Curvature Modulus of Lipid Membranes in Simulations, Biophys. J. 102, 1403 (2012).

      (42) M. Deserno, Elastic deformation of a fluid membrane upon colloid binding, Phys. Rev. E 69, 031903 (2004), arXiv: cond-mat/0303656.

      (43) K. S. Makarova, M. Y. Galperin, and E. V. Koonin, Comparative genomic analysis of evolutionarily conserved but functionally uncharacterized membrane proteins in archaea: Prediction of novel components of secretion, membrane remodeling and glycosylation systems, Biochimie 118, 302 (2015).

      (44) A. Verchère, W.-L. Ou, B. Ploier, T. Morizumi, M. A. Goren, P. Bütikofer, O. P. Ernst, G. Khelashvili, and A. K. Menon, Light-independent phospholipid scramblase activity of bacteriorhodopsin from Halobacterium salinarum, Sci. Rep. 7, 9522 (2017).

      (45) T. B. H. Schroeder, G. Leriche, T. Koyanagi, M. A. Johnson, K. N. Haengel, O. M. Eggenberger, C. L. Wang, Y. H. Kim, K. Diraviyam, D. Sept, J. Yang, and M. Mayer, Effects of lipid tethering in extremophile-inspired membranes on H(+)/OH(-) flux at room temperature, Biophys. J. 110, 2430 (2016).

      (46) R. Xu, A. Dehghan, A.-C. Shi, and J. Zhou, Elastic property of membranes self-assembled from diblock and triblock copolymers, Chem. Phys. Lipids 221, 83 (2019).

      (47) Z. Dogic and S. Fraden, Ordered phases of filamentous viruses, Curr. Opin. Colloid Interface Sci. 11, 47 (2006).

      (48) E. Barry and Z. Dogic, Entropy driven self-assembly of nonamphiphilic colloidal membranes, Proc. Natl. Acad. Sci. U.S.A. 107, 10348 (2010).

      (49) A. J. Balchunas, R. A. Cabanas, M. J. Zakhary, T. Gibaud, S. Fraden, P. Sharma, M. F. Hagan, and Z. Dogic, Equation of state of colloidal membranes, Soft Matter 15, 6791 (2019).

      (50) M. Saracco, P. Schaeffer, M. Tourte, S.-V. Albers, Y. Louis, J. Peters, B. Demé, S. Fontanay, and P. M. Oger, Bilayer-Forming Lipids Enhance Archaeal Monolayer Membrane Stability, Int. J. Mol. Sci. 26, 3045 (2025).

      (51) M. Amaral, archaeal_membranes: code and examples (2024), available at https://doi.org/10.5281/zenodo.13934991.

      (52) M. F. Ergüder and M. Deserno, Identifying systematic errors in a power spectral analysis of simulated lipid membranes, The Journal of Chemical Physics 154, 214103 (2021).

      (53) J. Genova, N. Ulrih, V. Kralj-Iglič, A. Iglič, and I. Bivas, Bending Elasticity Modulus of Giant Vesicles Composed of Aeropyrum Pernix K1 Archaeal Lipid, Life 5, 1101 (2015).

      (54) M. Amaral, Archaeal Membranes: In Silico Modelling and Design, Ph.D. thesis, Institute of Science and Technology Austria (2024).

      (55) M. Pohlschroder, F. Pfeiffer, S. Schulze, and M. F. A. Halim, Archaeal cell surface biogenesis, FEMS Microbiol. Rev. 42, 694 (2018).

      (56) K. S. Makarova, N. Yutin, S. D. Bell, and E. V. Koonin, Evolution of diverse cell division and vesicle formation systems in Archaea, Nat. Rev. Microbiol. 8, 731 (2010).

      (57) C. W. Stairs and T. J. Ettema, The Archaeal Roots of the Eukaryotic Dynamic Actin Cytoskeleton, Curr. Biol. 30, R521 (2020).

      (58) B. Baum and D. A. Baum, The merger that made us, BMC Biol. 18, 72 (2020).

      (59) Z. Zeng, H. Chen, H. Yang, Y. Chen, W. Yang, X. Feng, H. Pei, and P. V. Welander, Identification of a protein responsible for the synthesis of archaeal membrane-spanning GDGT lipids, Nat. Commun. 13, 1545 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Liu et al explores the role of the UPR and immune regulators in the evaluation of nutritional quality in C. elegans. They identify neuronal UPR activation and the MAPK PMK-1 as key responders to low food quality. In particular, the data suggest that these pathways are activated by low levels of vitamin C synthesis that result from the low sugar levels present in heat-killed E. coli.

      Strengths:

      The results are intriguing and expand our understanding both of physiological food evaluation systems, and of the known roles of stress response pathways in organismal physiology. The authors use a range of techniques, encompassing imaging, metabolomic analysis, gene expression analysis, and behavioural assays, to support their claims.

      Thank you for your thorough review and acknowledgment of the strengths of our study.

      Weaknesses:

      There is limited mechanistic analysis in the study. In particular, how does low vitamin C trigger UPR activation? This is an intriguing finding that, if followed up, could potentially reveal a novel mechanism of UPR activation. In addition, how is the activation of the PMK-1 pathway driven by/coordinated with UPR activation? The data in some figures is not as convincing as it could be: the magnitude of the effect size is small in the supplementation experiments, and the statistical tests used are not always appropriate to enable multiple comparisons.

      (1) There is limited mechanistic analysis in the study. In particular, how does low vitamin C trigger UPR activation? This is an intriguing finding that, if followed up, could potentially reveal a novel mechanism of UPR activation. 

      Thank you for highlighting the need for further mechanistic analysis in our study. We appreciate the opportunity to clarify the process by which low vitamin C triggers UPR activation.

      Our investigation revealed that the vitamin C content in heat-killed E. coli (HK-E. coli) is comparable to that of live E. coli or HK-yfbR mutant E. coli (Figure 4-figure supplement 1A), indicating that the induction of unfolded protein response (UPR) in C. elegans by HK-E. coli is not solely attributed to low vitamin C levels but rather involves other unidentified factors.

      Through metabolomic analysis, we observed significant decreases in sugar levels, including lactose, D-(+)-sucrose, and D-(+)-glucose, in HK-E. coli (Figure 3B, Table S1). Notably, supplementing D-(+)-glucose effectively inhibited UPRER, immune response, and avoidance behavior induced by HK-E. coli (Figure 3E-H). These findings suggest that the deficiency in sugars in HK-E. coli triggers a stress response and avoidance behavior in animals, which can be alleviated by D-(+)-glucose supplementation.

      Furthermore, when comparing heat-killed E. coli mutant yfbR (HK-yfbR) to HK-E. coli, we observed significantly higher sugar levels, including lactose and D-(+)-sucrose, in HK-yfbR (Figure 3B). This was accompanied by reduced UPRER in animals feeding on HK-yfbR (Figure 3-figure supplement 1B), indicating that higher sugar levels may inhibit the induction of UPRER by low-quality food.

      Considering that vitamin C (VC) is synthesized through the glucuronate pathway, with D-glucose as a precursor (1, 2) (Figure 4A), we investigated whether the VC biosynthesis pathway is involved in evaluating low-quality food using D-glucose. Contrary to our initial hypothesis, animals fed live E. coli did not exhibit higher glucose levels compared to those fed low-quality food (HK-E. coli). Our results indicate that animals maintain similar VC levels when fed ideal food (live E. coli) and low-quality food (HK-E. coli) (Figure 4B), suggesting that animals do not stimulate VC biosynthesis under favorable food conditions. However, D-glucuronate (D-GlcA) supplementation or the yfbR mutation in HK-E. coli significantly increased VC levels in animals fed low-quality food (HK-OP50) (Figure 4B, 4C). Moreover, VC or D-GlcA supplementation inhibited HK-E. coli-induced UPRER (Figure 4D), indicating that glucose boosts the animal's ability to adapt to unfavorable food environments by increasing VC levels, thereby inhibiting UPRER, but does not do so under favorable food conditions.

      These findings shed light on the complex interplay between vitamin C, sugar levels, and UPR activation, providing valuable insights into the mechanisms underlying food evaluation and stress response pathways in organisms.

      Overall, we are grateful for the reviewer's constructive feedback, which motivates us to continue our efforts to understand how the UPR contributes to the complexities of food evaluation and behavioral responses in organisms.

      (2) In addition, how is the activation of the PMK-1 pathway driven by/coordinated with UPR activation?

      Thank you for your insightful inquiry. In our discussion section, we have addressed this question by integrating new data and discussion to provide insights into the coordination between PMK-1 pathway activation and UPR activation.

      Previous studies have demonstrated that activating innate immunity, specifically the PMK-1 MAPK pathway, results in a reduction in translation (3) as well as a shutdown of food digestion in animals (4), likely aimed at reducing protein translation and cellular metabolism. To further investigate this relationship, we measured translation in animals fed heat-killed E. coli (HK-E. coli) and found a significant reduction in total translation in these animals (Figure 5-figure supplement 1D). This observation suggests that activating innate immunity through the PMK-1 MAPK pathway may serve as a mechanism to slow down translation, thereby alleviating the pressure on the unfolded protein response (UPR) and preventing excessive UPRER activation.

      By integrating these findings, we propose a model wherein activation of the PMK-1 pathway coordinates with UPR activation to regulate translation and cellular metabolism in response to low-quality food. This coordinated response likely serves to maintain cellular homeostasis and prevent detrimental effects associated with excessive UPRER activation.

      These insights contribute to our understanding of the intricate interplay between innate immunity, cellular stress responses, and metabolic regulation in organisms facing nutritional challenges.

      (3) The data in some figures is not as convincing as it could be: the magnitude of the effect size is small in the supplementation experiments, and the statistical tests used are not always appropriate to enable multiple comparisons.

      We appreciate the reviewer's concerns regarding the data presentation and statistical analyses in some of our figures. In response to this feedback, we have revised our statistical methods to improve their robustness and clarity.

      All statistical analyses were conducted using GraphPad Prism 8.0 software. Specifically, a two-tailed unpaired t-test was employed for the statistical analysis of two groups of samples, while one-way or two-way ANOVA was utilized for the statistical analysis of more than two groups of samples. These adjustments ensure appropriate statistical comparisons and enhance the reliability of our findings.
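
      The choice of test by number of groups can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the GraphPad Prism workflow described above. The function names and fluorescence values are hypothetical, equal variances are assumed for the t statistic, and only test statistics are computed (no p-values and no post hoc multiple-comparison correction).

```python
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical reporter intensities (arbitrary units) for three diets.
live = [1.0, 2.0, 3.0]
heat_killed = [2.0, 3.0, 4.0]
heat_killed_plus_glucose = [1.5, 2.5, 3.5]

t_two_groups = unpaired_t(live, heat_killed)
f_two_groups = one_way_anova_f(live, heat_killed)
f_three_groups = one_way_anova_f(live, heat_killed, heat_killed_plus_glucose)
print(t_two_groups, f_two_groups, f_three_groups)
```

      For exactly two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic, so the two tests agree. With more than two groups, a single ANOVA replaces repeated pairwise t-tests, and pairwise conclusions then require a post hoc multiple-comparison procedure.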

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors aim to better understand how C. elegans detects and responds to heat-killed (HK) E. coli, a low-quality food. They find that HK food activates two canonical stress pathways, ER-UPR, and innate immunity, in the nervous system to promote food aversion. Through the creative use of E. coli genetics and metabolomics, the authors provide evidence that the altered carbohydrate content of HK food is the trigger for the activation of these stress responses and that supplementation of HK food with sugars (or their biosynthetic product, vitamin C), reduces stress pathway induction and food avoidance. This work makes a valuable addition to the literature on metabolite detection as a mechanism for the evaluation of nutritional value; it also provides some new insight into the physiologically relevant roles of well-known stress pathways in modulating behavior.

      Strengths:

      -The work addresses an important question by focusing on understanding how the nervous system evaluates food quality and couples this with behavioral change.

      -The work takes full advantage of the tools available in this powerful system and builds on extensive previous studies on feeding behavior and stress responses in C. elegans.

      -Creative use of E. coli genetics and metabolite profiling enabled the identification of carbohydrate metabolism as a candidate source of food-quality signals.

      -For the most part, the studies are rigorous and logically designed, providing good support for the authors' model.

      We deeply appreciate the reviewer's insightful assessment of our study's strengths. 

      Weaknesses:

      -It is not clear how the mechanism identified here is connected to previously described, related processes. In particular, it is not clear whether this mechanism has a role in the detection of other low-quality foods. Further, the specificity of the ability of sugar/vitamin C to suppress stress pathway induction is unclear (i.e., does sugar/vitamin C have any effect on the activation of these pathways through other means?). Additionally, the relationship of this pathway to the vitamin B2-sensing mechanism previously described by the senior author is unclear. These issues do not weaken confidence in the authors' conclusions, but they do reduce the potential significance of the work.

      (1) In particular, it is not clear whether this mechanism has a role in the detection of other low-quality foods. 

      Thank you for your valuable feedback. In response to your inquiry, we investigated whether the UPRER (IRE-1/XBP-1) - Innate immunity (PMK-1/p38 MAPK) axis is specific to evaluating low-quality food (HK-E. coli) or if it plays a broader role in food detection.

      We conducted behavioral assays using N2, pmk-1, and xbp-1 mutant animals fed normal E. coli food, inedible food (Saprophytic staphylococci) (4), and pathogenic food (Pseudomonas aeruginosa-PA14) (5). We found that N2, pmk-1, and xbp-1 mutant worms did not exhibit avoidance behavior when presented with normal food (OP50). However, both N2 and xbp-1 mutant worms were able to escape from the inedible food, Saprophytic staphylococci (N2 worms were predominantly found at the border of the bacterial lawn, and xbp-1 mutant worms both at the border and inside the lawn), whereas pmk-1 mutant worms did not exhibit this avoidance behavior. Notably, N2 and xbp-1 mutant worms exhibited even more pronounced avoidance behavior when exposed to Pseudomonas aeruginosa, whereas pmk-1 mutant worms were more susceptible to infection by this pathogen (Figure 2-figure supplement 2C). These findings suggest that the UPR-Immunity pathway plays a crucial role in helping animals avoid low-quality food (HK-E. coli) by triggering an avoidance response. In contrast, the innate immunity pathway, mediated by PMK-1/p38 MAPK, appears to play a key role in evaluating unfavorable food sources, such as HK-E. coli, Saprophytic staphylococci, and Pseudomonas aeruginosa, and in helping animals avoid these environments.

      (2) Further, the specificity of the ability of sugar/vitamin C to suppress stress pathway induction is unclear (i.e., does sugar/vitamin C have any effect on the activation of these pathways through other means?). 

      Thank you for your inquiry regarding the specificity of the ability of sugar/vitamin C to suppress stress pathway induction. We aimed to address this question by investigating whether high levels of VC inhibit other stress-induced UPRER pathways.

      Previous studies have shown that both Tunicamycin (6) and pathogenic bacteria, such as Pseudomonas aeruginosa-PA14 (5), induce UPRER in C. elegans. In response to your query, we conducted experiments to examine whether VC supplementation inhibits UPRER induced by these stressors. Our findings indicate that VC supplementation does not inhibit UPRER induced by either Tunicamycin or PA14 (Author response image 1).

      These results suggest that while sugar/vitamin C may suppress stress pathway induction in the context of low-quality food, its effects may not extend to other stressors that induce UPRER through different mechanisms. This insight helps clarify the specificity of sugar/vitamin C's role in modulating stress pathway activation, contributing to a better understanding of the broader regulatory networks involved in stress response in C. elegans.

      Author response image 1.

      VC supplementation does not inhibit Tunicamycin or PA14-induced UPRER.

      (3) Additionally, the relationship of this pathway to the vitamin B2-sensing mechanism previously described by the senior author is unclear.

      In response to your comment, we would like to clarify the relationship of this pathway to the vitamin B2-sensing mechanism that we described previously. Previous studies have demonstrated that heat-killed E. coli (HK-E. coli) is a low-quality food source incapable of supporting the growth of C. elegans larvae, whereas supplementation with vitamin B2 (VB2) can restore animal growth (7).

      This study investigates the role of sugar deficiency in HK-E. coli, which induces the UPRER-immune response and avoidance behavior in C. elegans. Surprisingly, our findings indicate that supplementing HK-E. coli with carbohydrates such as D-Glc and D-GlcA does not promote animal development (Figure 3-figure supplement 2G), suggesting that carbohydrates are not essential for supporting animal growth on this food source. However, we did observe that carbohydrates play a critical role in inhibiting the UPRER-immune response induced by sugar deficiency in HK-E. coli.

      -The authors claim that the induction of the innate immune pathway reporter irg-5::GFP is "abolished" in pmk-1(RNAi) animals, but Figure S2K seems to show a clear GFP signal when these animals are fed HK-OP50. Similarly, the claim that feeding WT animals HK-OP50 enriches phospho-PMK-1 levels (Fig 2E) is unconvincing - only one western blot is shown, with no quantification, and there is a smear in the critical first lane.

      (1) The authors claim that the induction of the innate immune pathway reporter irg-5::GFP is "abolished" in pmk-1(RNAi) animals, but Figure S2K seems to show a clear GFP signal when these animals are fed HK-OP50. 

      We sincerely appreciate the reviewer's attention. To address this concern, we have replaced the images with higher resolution, larger ones in Figure 2-figure supplement 1-I. These updated images provide a clearer representation of the data, ensuring that all details are readily visible and enabling a more accurate interpretation of the results.

      (2) Similarly, the claim that feeding WT animals HK-OP50 enriches phospho-PMK-1 levels (Fig 2E) is unconvincing - only one western blot is shown, with no quantification, and there is a smear in the critical first lane.

      Thank you. Following the reviewer's suggestion, we repeated some of the western blots. We have now replaced Figure 2E and quantified the relative intensity of pPMK-1/tubulin. We also provide the uncropped western blot images as source data (“raw-data WB” file).

      -The rationales for some of the paper's hypotheses could be improved. For example, the rationale for screening the E. coli mutant library is that some mutants, when heat-killed, may be missing a metabolite that induces the ER-UPR. A more straightforward hypothesis might be that some mutant E. coli strains aberrantly induce the ER-UPR when *not* heat-killed, because they are missing a metabolite that prevents stress pathway induction. This is not in itself a major concern, but it would be useful for the authors to provide a rationale for their hypothesis.

      Thank you for the insightful suggestion. We acknowledge the importance of providing a clear rationale for our hypotheses in the paper. In response to this feedback, we have enhanced the discussion section to better elucidate the rationale behind our hypotheses.

      One limitation of our study is the lack of explanation for why HK-E. coli activates UPRER and immunity. We hypothesized that when heat-killed, HK-E. coli may lack or contain altered levels of certain metabolites that either activate or inhibit UPRER and immunity, respectively. Additionally, we speculated that E. coli mutants killed by heat may lack metabolites that activate UPRER and immunity, or conversely, have increased levels of metabolites that inhibit these pathways.

      Fortunately, our investigation led to the discovery of the E. coli mutant yfbR, which inhibits UPRER and immunity by increasing carbohydrates that aid in resisting these stress pathways. Moving forward, we intend to further explore the intricate relationship between HK-E. coli and UPRER-immunity. This will be a key focus of our future research efforts.

      -The authors do not provide any explanation for some unexpected results from the E. coli screen. Earlier in the paper, the authors found that innate immune signaling is downstream of ER-UPR activation. However, of the 20 E. coli mutants that, when heat-killed, "did not induce... the UPR-ER reporter," 9 of them still activate the innate immune response. This seems at odds with the authors' simple model since it suggests that low-quality food can induce innate immune signaling independently of the ER-UPR. Further, only one of the 9 has an effect on behavior, even though failure to activate the innate immune pathway might be expected to lead to a behavioral defect in all of these.

      Thank you for your understanding, and we apologize for any confusion caused by our earlier statement. To provide clarification, our study revealed that out of the 20 E. coli mutants examined, none activated the UPRER. Among these mutants, 9 did not induce immunity, and interestingly, one out of these 9 mutants demonstrated the ability to inhibit avoidance behavior.

      This diversity in phenotypic outcomes can be attributed to the varied metabolites present in different E. coli mutants. To thoroughly evaluate the effects of these mutants, we conducted a comprehensive three-step screening process, utilizing UPRER marker, immunity marker, and avoidance behavior assays.

      Through this rigorous approach, we identified the E. coli mutant, yfbR, which exhibited the desired inhibitory effects on UPRER, immunity, and avoidance behavior.

      Subsequently, we conducted a metabolomics analysis of various food qualities (HK-K12, HK-yfbR, and Live-K12). Our findings revealed higher sugar levels in HK-yfbR and Live-K12 compared to HK-K12 (Figure 3B, Figure 3-figure supplement 2A, and Table S1), indicating that sugar deficiency might trigger the UPRER, immunity responses, and subsequent avoidance behavior.

      -In a number of places, the writing style can make the authors' arguments difficult to follow.

      We thank the reviewer for their efforts. We have corrected these errors and polished the language of the paper.

      -Some of the effect sizes observed by the authors are exceedingly small (e.g, the suppression of hsp-4::gfp induction by sugar supplementation in Figs 3C-E), raising some concern about the biological significance of the effect.

      Thank you for your feedback. In response to your concern, we have included additional clarification in the manuscript.

      We have added the following statement: “While sugar effectively inhibits the HK-E. coli-induced UPRER and immune response, it does not fully suppress it to the extent observed with live-E. coli (Figure 3C-F). This implies that additional nutrients present in live-E. coli might also contribute to the inhibition of UPRER and immune response.”

      This addition helps to address the observation that some effect sizes appear small, providing context and suggesting potential factors that may influence the outcomes. 

      -In some cases, there is a discrepancy between the fluorescence images and their quantitation (e.g., Figure 3E, where the effect of glucose on GFP fluorescence seems much stronger in the image than in the graph).

      Thank you for your valuable suggestion. In response, we have revised our image selection process to ensure impartiality. We now randomly select images to ensure they accurately represent the quantified data without bias. More details regarding this update can be found in Author response image 2.

      Author response image 2.

      Additional original images corresponding to Figure 3E.

      Reviewer #3 (Public Review):

      Summary:

      Animals can evaluate food quality in many ways. In contrast to the rapid sensory evaluation with smell and taste, the mechanism of slow nutrient sensation and its impact on food choice is unexplored. The authors utilize C. elegans larvae and their bacterial food as an elegant model to tackle this question and reveal the detailed molecular mechanism to avoid nutrient-poor foods.

      Strengths:

      The strength of this study is that they identified the molecular identities of the critical players in bacterial food and C. elegans using unbiased approaches, namely metabolome analysis, E. coli mutant screening, and RNA sequencing. Furthermore, they strengthen their findings by thorough experiments combining multiple methods such as genetics, fluorescent reporter analysis, and Western blot.

      Thank you for highlighting the strengths of our study. 

      Weaknesses:

      The major caveat of this study is the reporter genes. The transcriptional reporters were used to monitor the UPRER and immune responses in the intestine of C. elegans. However, their tissue-specific rescue experiments suggest that the genes in the UPRER and immune response function in the neurons. Thus, we should carefully interpret the results of the reporter genes.

      Thank you for your insightful comment. We appreciate the opportunity to address your concerns regarding the interpretation of our reporter gene data.

      Upon reevaluation, we observed strong induction of the UPRER reporter (Phsp-4::GFP) (8) and immunity reporter (Pirg-5::GFP) (9) both in the intestine (Figure 1F-G) and in neurons (Figure 1-figure supplement 2A) in response to feeding unfavorable food (HK-E. coli). This suggests that both the UPRER and immune pathways may indeed respond to low-quality food (HK-E. coli) in multiple tissues of C. elegans. While we acknowledge that our tissue-specific rescue experiments suggest a role for these pathways in neurons, the intestinal fluorescence of Phsp-4::GFP or Pirg-5::GFP is easily observable and scorable. Therefore, we chose to focus our further analyses on the intestine for practical reasons.

      Overall, this work provides convincing data to support their model. In the C. elegans field, the behaviors of larvae are not well studied compared to adults. This work will pose an interesting question about the difference between larvae and adults in nutrition sensing in C. elegans and provide a framework and candidate molecules to be studied in other organisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major suggestions:

      (1) My major overall comment is that the paper would be substantially strengthened by more mechanistic analysis. In particular, how does low vitamin C trigger UPR activation? This is an intriguing finding and it would be important to see it more fully explored.  

      Our study revealed that the vitamin C content in HK-E. coli is comparable to that of live E. coli or HK-yfbR (Figure 4-figure supplement 1A), suggesting that the induction of the unfolded protein response (UPR) in C. elegans by HK-E. coli is not attributed to low vitamin C levels, but rather to unknown factors.

      Metabolomic analysis showed that the sugar levels, including lactose, D-(+)-sucrose, and D-(+)-glucose, were significantly decreased in HK-E. coli (Figure 3B, Table S1).

      Furthermore, we found that supplementing D-(+)-glucose effectively inhibited UPRER (Figure 3E), immune response (Figure 3F, 3G, and Figure 3-figure supplement 2D), and avoidance behavior (Figure 3H) induced by HK-E. coli. Our findings suggest that the deficiency in sugars in HK-E. coli triggers a stress response and avoidance behavior in animals, which can be alleviated by D-(+)-glucose supplementation.

      Notably, when E. coli was heat-killed, we observed that the sugar levels, including lactose and D-(+)-sucrose, were significantly higher in the heat-killed E. coli mutant yfbR (HK-yfbR) compared to HK-E. coli (Figure 3B). Moreover, we found that UPRER was reduced in animals feeding HK-yfbR (Figure 3-figure supplement 1B), indicating that higher sugar levels may inhibit the induction of UPRER by low-quality food.

      The synthesis of vitamin C (VC) occurs through the glucuronate pathway, utilizing D-glucose as a precursor (1, 2) (Figure 4A). This led us to investigate whether the vitamin C biosynthesis pathway is involved in evaluating low-quality food by using D-glucose. In this study, we found that animals feeding on live E. coli, which should produce more VC, exhibit higher glucose levels. However, our results show that animals maintain similar VC levels when fed ideal food (live E. coli) compared to low-quality food (HK-E. coli) (Figure 4B), suggesting that animals do not stimulate VC biosynthesis under favorable food conditions. In contrast, when animals are fed low-quality food (HK-OP50), we found that supplementing D-GlcA (Figure 4C) or introducing the yfbR mutation (Figure 4B) in HK-E. coli can increase VC levels. Moreover, we found that VC or D-glucuronate (D-GlcA) supplementation inhibited HK-E. coli-induced UPRER (Figure 4D). These data indicate that glucose boosts the animal's ability to adapt to unfavorable, but not favorable, food environments by increasing VC levels, thereby inhibiting UPRER.

      In addition, we asked whether a high level of VC inhibits UPRER induced by other stresses. Previous studies have shown that Tunicamycin (6) and the pathogenic bacterium Pseudomonas aeruginosa PA14 (5) induce UPRER in C. elegans. We found that VC supplementation does not inhibit Tunicamycin- or PA14-induced UPRER (Author response image 3).

      Author response image 3.

      VC supplementation does not inhibit Tunicamycin or PA14-induced UPRER.

      In addition, how is the activation of the PMK-1 pathway driven by/coordinated with UPR activation? 

      If the authors do not want to pursue these directions experimentally in this study, the discussion would be strengthened by considering these questions and identifying candidate regulatory mechanisms for further exploration.

      In this study, we found that heat-killed E. coli (HK-E. coli), a low-sugar food, triggers the cellular unfolded protein response (UPRER) and immune response. We also demonstrated that 1) the activation of UPRER by low-quality food depends on IRE-1/XBP-1, and 2) activation of the immune response (PMK-1) is downstream of XBP-1 in responding to low-quality food.

      How is the activation of the PMK-1 pathway driven by/coordinated with UPR activation?

      In the Discussion, we added new data and text to address the reviewer’s question.

      A previous study has shown that activating innate immunity (PMK-1 MAPK) leads to a reduction in translation (3). Our own previous research has also demonstrated that PMK-1 activation causes a shutdown of food digestion in animals (4), likely to reduce protein translation and cellular metabolism. To investigate this further, we measured the translation level of animals fed with HK-E. coli and found that total translation ability is significantly reduced in these animals (Figure 5-figure supplement 1D). This finding suggests that activating innate immunity (PMK-1 MAPK) may serve as a mechanism to slow down translation, thereby alleviating the pressure on the unfolded protein response (UPR) and preventing excessive UPRER activation.

      (2) Figure 2C: The data shows that xbp-1 mutants are significantly more likely to leave heat-killed E. coli. However, no other conditions are examined. Is this avoidance defect specific to heat-killed E. coli, or is it a more general effect of xbp-1 mutants - that is, are other conditions that evoke avoidance also affected by mutation of xbp-1? Is feeding behavior on regular E. coli altered in this background? The finding would be more relevant if the authors could clarify or provide more context for their claims here.

      We then asked whether the UPRER (IRE-1/XBP-1)-innate immunity (PMK-1/p38 MAPK) axis is specific to the evaluation of low-quality food (HK-E. coli). We examined the avoidance behavior of wild-type and mutant L1 animals by placing them on various food conditions, including normal E. coli food, inedible food (Saprophytic staphylococci), and pathogenic food (Pseudomonas aeruginosa PA14), for a 24-hour period. We found that N2, pmk-1, and xbp-1 mutant worms did not exhibit avoidance behavior when presented with normal food (OP50). However, both N2 and xbp-1 mutant worms were able to escape from the inedible food, Saprophytic staphylococci, whereas pmk-1 mutant worms did not show this avoidance. Notably, xbp-1 mutant worms exhibited even more pronounced avoidance behavior when exposed to Pseudomonas aeruginosa, whereas pmk-1 mutant worms were more susceptible to infection by this pathogen (Figure 2-figure supplement 2C). These findings suggest that the UPR-Immunity pathway plays a crucial role in helping animals avoid low-quality food by triggering an avoidance response. In contrast, the innate immunity pathway, which is mediated by PMK-1/p38 MAPK, appears to play a key role in evaluating unfavorable food sources, such as HK-E. coli, Saprophytic staphylococci, and Pseudomonas aeruginosa, and helping animals avoid these environments.

      (3) Figure 3C-F: The magnitude of the changes between conditions shown in these panels is small. To what extent does this supplementation represent a full rescue? The findings would be strengthened if figures/images for the control condition (non-HK E. coli) were shown for comparison to allow the reader to assess the extent to which UPR/PMK-1 activation is rescued.

      In response to a reviewer's suggestion, we included live-E. coli as a control in our study. Notably, our data revealed that the addition of lactose, D-(+)-sucrose, and D-(+)-glucose partially inhibited the HK-E. coli-induced unfolded protein response (UPRER) and immune response, suggesting that other nutrients present in live-E. coli may also play a role in inhibiting UPRER.

      We added the following to the manuscript: “While sugar effectively inhibits the HK-E. coli-induced UPRER and immune response, it does not fully suppress it to the extent observed with live-E. coli (Figure 3C-F). This implies that additional nutrients present in live-E. coli might also contribute to the inhibition of UPRER and immune response.”

      (4) Figure 5B-D: The magnitude of changes shown between conditions here again appear to be very small, even those labelled as statistically significant. It is important to ensure that the correct statistical tests have been used to assess the significance of these differences (see below).

      All statistical analyses were performed in GraphPad Prism 8.0. A two-tailed unpaired t-test was used to compare two groups of samples; one-way or two-way ANOVA was used to compare more than two groups of samples.

      (5) Methods: In the "Statistical analysis" section, the authors state that "All statistical analyses were performed using Student's t-test". However, this is not the appropriate test to use in experiments where multiple comparisons are made, which is true in several instances across the paper. In these cases, a more appropriate statistical test should be used.

      All statistical analyses were performed in GraphPad Prism 8.0. A two-tailed unpaired t-test was used to compare two groups of samples; one-way or two-way ANOVA was used to compare more than two groups of samples.
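
      To make the reviewer's concern about multiple comparisons concrete, the following minimal sketch estimates how the chance of at least one false positive grows when every pair of groups is compared with its own uncorrected t-test. It is an illustration only and is not part of the analysis described here, which was run in GraphPad Prism. It assumes that the tests are independent and that each is run at the same significance level, which is a simplification; the function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairs that can be formed among n_groups groups."""
    return comb(n_groups, 2)


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests tests.

    Assumes independent tests, each run at significance level alpha.
    Pairwise tests that share a group are not truly independent, so this
    is an approximation meant only to show the trend.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n_groups in (2, 3, 4, 5):
        n_tests = pairwise_comparisons(n_groups)
        fwer = familywise_error_rate(0.05, n_tests)
        print(f"{n_groups} groups: {n_tests} pairwise tests, FWER ~ {fwer:.3f}")
```

      Under these assumptions, three groups compared pairwise at a level of 0.05 already carry roughly a 14% chance of at least one false positive, which is why an ANOVA is preferred once more than two groups are compared.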

      Minor suggestions:

      (1) Figure S2: RNAi is usually delivered in a different E. coli strain, HT115. Is this the case with the RNAi knockdowns in Figure S2, and given that diet can influence UPR activation, is it possible that this different diet could change the phenotypes observed? This should be clarified by the authors.

      In this study, all RNAi experiments involved bleaching adult animals under RNAi strain culture conditions to obtain L1 animals. Subsequently, L1 animals were transferred to HK-E. coli OP50 for phenotype analysis. In response to a reviewer's suggestion, we observed that L1 animals obtained from mothers fed E. coli strains OP50, HT115, or K12 exhibited similar UPR induction under HK-E. coli OP50 feeding conditions (Author response image 4). These findings suggest that variations in diet did not alter the UPR phenotypes.

      Author response image 4.

      L1 animals obtained from mothers fed E. coli strains OP50, HT115, or K12 exhibited similar UPR induction under HK-E. coli OP50 feeding conditions 

      Reviewer #2 (Recommendations For The Authors):

      Line 182: "irg-5::GFP" should be "hsp-4::gfp".

      We thank the reviewer for these efforts. We have corrected this error.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      (1) The reporter genes of UPRER and immune response were analyzed in the intestine throughout the study. On the other hand, their rescue experiments suggest that these pathways function in the neurons. They should provide the fluorescence data in the neurons at least for Figures 1F and 1G to confirm that the intestinal response matches the neuronal response and mention that further analyses were done in the intestine for easy scoring.

      Consistent with the results of the RNA sequencing (RNA-seq) analysis, the UPRER reporter (Phsp-4::GFP) (8) and immunity reporter (Pirg-5::GFP) (9) were strongly induced in the intestine (Figure 1F-G) and in neurons (Figure 1-figure supplement 2A) by feeding unfavorable food (HK-E. coli), suggesting that UPRER and immune pathways may respond to low-quality food (HK-E. coli). As intestinal fluorescence (Phsp-4::GFP or Pirg-5::GFP) is easy to observe and score, further analyses were done in the intestine.

      (2) I have concerns about the interpretation of the p-PMK-1 data. Although the authors described that "p-PMK-1 is prominently increased" in the text (Line 150), it is unclear on the data (Figure 2E). Similarly, the authors' statement "p-PMK-1 is decreased in animals with D-GlcA (F).." was not fully supported by the data in Figure 4F. The experiment should be repeated and quantified. Moreover, pPMK-1 showed single bands in Figure 2E, but double bands in Figure 3G, 4F, and 4G. The authors should explain why that is the case and which band we should look at for Figures 3G, 4F, and 4G.

      Following the reviewer’s suggestion, we repeated some of the western blots. We found that after longer exposure, there are two bands for pPMK-1 (Figure 2E, new data; and “raw-data WB” file). The VHP-1 phosphatase is known to inhibit PMK-1 (3). In our previous study, we found that vhp-1(RNAi) treatment hyperactivates p-PMK-1 (lower band) (4). In contrast, both bands disappear in the pmk-1 mutant (Author response image 5). Thus, the lower band indicates pPMK-1. We have now replaced Figure 2E and quantified the relative intensity of pPMK-1/tubulin. We also provide the uncropped western blot images as source data (“raw-data WB” file).

      Author response image 5.

      In our previous study, we found that vhp-1(RNAi) treatment hyperactivates p-PMK-1 (lower band) (4). In contrast, both bands disappear in the pmk-1 mutant. These pictures are extracted from our previous study (4).

      (3) Heat-killed E. coli (HK-E. coli) is low-quality because the lack of sugar cannot support the growth of C. elegans larvae (Qi and Han, Cell, 2018). Thus, animals do not show the UPRER-immune response and avoidance when HK-E. coli is supplemented with sugars such as glucose (Line 225-227). If these sugars are the key, C. elegans larvae should be able to grow better with HK-E. coli supplemented with glucose. Authors should address this possibility.

      Previous studies have shown that heat-killed E. coli (HK-E. coli) is a low-quality food source that cannot support the growth of C. elegans larvae (7). Here, we found that sugar deficiency in HK-E. coli induces the UPRER-immune response and avoidance behavior in C. elegans. Given this, we investigated whether sugar supplementation could promote animal growth when fed HK-E. coli. To our surprise, supplementing HK-E. coli with carbohydrates (D-Glc, D-GlcA) did not support animal development (Figure 3-figure supplement 2G), suggesting that carbohydrates are not essential for supporting animal growth on this food source. However, we did find that carbohydrates are critical for inhibiting the UPRER-immune response induced by sugar deficiency in HK-E. coli.

      (4) Line 884: Instead of the Student's t-test, the ANOVA should be used for multiple comparisons.

      All statistical analyses were performed in GraphPad Prism 8.0. A two-tailed unpaired t-test was used to compare two groups of samples; one-way or two-way ANOVA was used to compare more than two groups of samples.

      (5) Although the results are interesting and convincing, the manuscript needs some careful editing and proofreading. As far as I could catch, there are more than 100 errors and typos, as I summarized in minor comments. I recommend the authors proofread thoroughly to make this work easier to read.

      We thank the reviewer for these efforts. We have corrected all of these errors and polished the language of the paper.

      Minor comments:

      (1) Line 30: nature -> natural

      (2) Line 86: elegnas -> elegans

      (3) Line 93: the17h -> the 17h

      (4) Line 97: response -> respond

      (5) Line 106: responded -> respond

      (6) Line 107-109: Add references for the three reporters

      (7) Line 114: immune -> immune pathway

      (8) Line 118: immune depended -> immune-dependent

      (9) Line 128, 594, 596: deferentially -> differentially

      (10) Line 131: Explain what IRE-1-mediated splicing of xbp-1 with references

      (11) Line 170: XPB-1 -> XBP-1

      (12) Line 179: URP -> UPR

      (13) Line 181: hsp-4::GFP -> Phsp-4::GFP

      (14) Line 183: Italicize E. coli; mutant -> mutants

      (15) Line 184: irg-5::GFP -> Pirg-5::GFP (2 places)

      (16) Line 197, 203, 206, 207: Lactose -> lactose

      (17) Line 206, 209, 217, 225, 228, 232, 237, 262, 442, 445, 604, 739: Glucose -> glucose

      (18) Line 218: Sugars deficiency -> sugar deficiency

      (19) Line 229: found contribute to -> found to contribute to

      (20) Line 235, 537, 539, 587, 599, 642, 855: Italicize E. coli

      (21) Line 236: same -> the same

      (22) Line 239: I recommend adding "in C. elegans". This study uses both E. coli and C. elegans genetics. Sometimes, it is confusing which organism was mentioned. It should be applied where it is necessary.

      (23) Line 240: additional -> addition

      (24) Line 339, 642: Italicize kgb-1

      (25) Line 390: Italicize Pseudomonas aeruginosa, Bacillus thuringiensis, Staphylococcus aureus, and Serratia marcescens

      (26) Line 394: wiht -> with

      (27) Line 400, 550: Change ER to superscript; Italicize ire-1, xbp-1, and pmk-1

      (28) Line 415: xpb-1 -> xbp-1

      (29) Line 460, 525, 531, 532, 617, 655: Italicize yfbR

      (30) Line 457, 468, 472, 475, 482, 497, 513, 624, 629, 633, 733, 758: Vitamin -> vitamin

      (31) Line 459: Make it clear what is the relationship between vitamin C and TAA

      (32) Line 527: Do not italicize mutant

      (33) Line 538: Phsp-6:GFP -> Phsp-6::GFP (to match other descriptions)

      (34) Line 540: Phsp-4:GFP -> Phsp-4::GFP (to match other descriptions)

      (35) Line 540: Italicize hsp-4

      (36) Line 543: Pirg-5:GFP -> Pirg-5::GFP (to match other descriptions) and italicize irg-5

      (37) Line 550, 881: Innate -> innate

      (38) Line 557, 560, 564, 838: Do not italicize HK

      (39) Line 561: Remove the extra space before "three"

      (40) Line 575, 577: Reporter -> reporter

      (41) Line 575, 607: Italicize Phsp-4::GFP

      (42) Line 577: immunity -> Immunity; Italicize Pirg-5::GFP

      (43) Line 585, 653: keio -> Keio

      (44) Line 586: hsp-4::GFP -> Phsp-4::GFP

      (45) Line 586, 589 (2 places): irg-5::GFP -> Pirg-5::GFP

      (46) Line 597: Remove "all"

      (47) Line 600: Trehalose -> trehalose

      (48) Line 609: Italicize Pirg-5::GFP

      (49) Line 615: critically -> critical

      (50) Line 636: Remove "+"

      (51) Line 656 (2 places), 682: Do not italicize OP50

      (52) Line 664: Lead -> lead

      (53) Line 681: Describe the composition of NGM or show the reference. Since this paper examines nutrition, the composition of the medium is crucial.

      (54) Line 686-706: Italicize all allele names. Be consistent with how to write the promoter to avoid confusion (e.g., ttx-3p -> Pttx-3). Be consistent with how to describe the transgene (e.g., Phsp-4::GFP(zcIs4) -> zcIs4[Phsp-4::GFP])

      (55) Line 710: Describe the composition of LB or show the reference. Since this paper examines nutrition, the composition of the medium is crucial.

      (56) Line 709, 856 (2 places), 858: Do not italicize K12 to make it consistent

      (57) Line 719: Podr-1p:RFP -> Podr-1::RFP

      (58) Line 722, 724: Italicize ges-1 and xbp-1

      (59) Line 723: Pges-1:xbp-1::GFP -> Pges-1::xbp-1::GFP

      (60) Line 735: Glucuronic -> glucuronic

      (61) Line 748: I believe it is 5 mm instead of 0.5 mm

      (62) Line 750: The equation should be (5 mm)²/(17.5 mm)²

      (63) Line 759: Remove the period after "pattern".

      (64) Line 766: Describe how they were synchronized

      (65) Line 774: Italicize Psysm-1p::GFP

      (66) Line 785: Insert a space before "until"

      (67) Line 787: the mutant -> mutant

      (68) Line 789, 792, 793, 795 (2 places): GPF -> GFP

      (69) Line 791: next -> Next; an -> a

      (70) Line 799: Remove a space before "MRC".

      (71) Line 804: I do not understand what "until adulthood" means in this context; Remove a space before "by". (I recommend searching double space and correcting it.)

      (72) Line 853: Metabolome -> metabolome

      (73) Line 893-1082: Species and gene names should be italicized in Reference

      (74) Figures 1F, 1G, S2F, S2G: The panels' order should match the bar graphs' order. The apparent difference in the representative data does not match the marginal difference in the bar graph in Fig. 1G. The authors should double-check the results.

      (75) Figure 1F, 2A, 2B, 3C, 3D, 3E, 4D, 4I, S1J, S2A, S2B, S2I, S3B, S3F, S3H: hsp-4::GFP -> Phsp-4::GFP

      (76) Figure 1G, 2D, 3F, 4E, 4J, S1K, S2H, S3C, S3I: irg-5::GFP -> Pirg-5::GFP

      (77) Figure 6: Liquids -> Lipids; Italicize ire-1, xbp-1, pmk-1

      (78) Figure S1I: hsp-6::GFP -> Phsp-6::GFP

      (79) In the legend for Figure S1 after Figure S1, (A), (B)... were duplicated. It is OK in the corresponding main text (Line 530)

      (80) Figure S2F, S3G, S4C, S4D: sysm-1::GFP -> Psysm-1::GFP

      (81) Figure S2G: irg-1::GFP -> Pirg-1::GFP

      (82) Figure S3H and S3I: Describe which ones are Glu + conditions

      References: 

      (1) Patananan AN, Budenholzer LM, Pedraza ME, Torres ER, Adler LN, Clarke SG. The invertebrate Caenorhabditis elegans biosynthesizes ascorbate. Arch Biochem Biophys 569, 32-44 (2015).

      (2) Yabuta Y, et al. L-Ascorbate Biosynthesis Involves Carbon Skeleton Rearrangement in the Nematode Caenorhabditis elegans. Metabolites 10, (2020).

      (3) Weaver BP, Weaver YM, Omi S, Yuan W, Ewbank JJ, Han M. Non-Canonical Caspase Activity Antagonizes p38 MAPK Stress-Priming Function to Support Development. Dev Cell 53, 358-369 e356 (2020).

      (4) Geng S, et al. Gut commensal E. coli outer membrane proteins activate the host food digestive system through neural-immune communication. Cell Host Microbe 30, 1401-1416 e1408 (2022).

      (5) Richardson CE, Kooistra T, Kim DH. An essential role for XBP-1 in host protection against immune activation in C. elegans. Nature 463, 1092-1095 (2010).

      (6) Harding HP, et al. An Integrated Stress Response Regulates Amino Acid Metabolism and Resistance to Oxidative Stress. Molecular Cell 11, 619-633 (2003).

      (7) Qi B, Kniazeva M, Han M. A vitamin-B2-sensing mechanism that regulates gut protease activity to impact animal’s food behavior and growth. eLife 6, e26243 (2017).

      (8) Calfon M, et al. IRE1 couples endoplasmic reticulum load to secretory capacity by processing the XBP-1 mRNA. Nature 415, 92-96 (2002).

      (9) Bolz DD, Tenor JL, Aballay A. A Conserved PMK-1/p38 MAPK Is Required in Caenorhabditis elegans Tissue-specific Immune Response to Yersinia pestis Infection. The Journal of Biological Chemistry 285, 10832-10840 (2010).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In Ryu et al., the authors use a cortical mouse astrocyte culture system to address the functional contribution of astrocytes to circadian rhythms in the brain. The authors' starting point is transcriptional output from serum-shocked culture, comparative informatics with existing tools and existing datasets. After fairly routine pathway analyses, they focus on the calcium homeostasis machinery and one gene, Herp, in particular. They argue that Herp is rhythmic at both mRNA and protein levels in astrocytes. They then use a calcium reporter targeted to the ER, mitochondria, or cytosol and show that Herp modulates calcium signaling as a function of circadian time. They argue that this occurs through the regulation of inositol receptors. They claim that the signaling pathway is clock-controlled by a limited examination of Bmal1 knockout astrocytes. Finally, they switch to calcium-mediated phosphorylation of the gap junction protein Connexin 43 but do not directly connect HERP-mediated circadian signaling to these observations. While these experiments address very important questions related to the critical role of astrocytes in regulating circadian signaling, the mechanistic arguments for HERP function, its role in circadian signaling through inositol receptors, the connection to gap junctions, and ultimately, the functional relevance of these findings is only partially substantiated by experimental evidence. 

      Strengths: 

      - The paper provides useful datasets of astrocyte gene expression in circadian time. 

      - Identifies HERP as a rhythmic output of the circadian clock. 

      - Demonstrates the circadian-specific sensitivity of ATP -> calcium signaling. 

      - Identifies possible rhythms in both Connexin 43 phosphorylation and rhythmic movement of calcium between cells. 

      Weaknesses: 

      - It is not immediately clear why the authors chose to focus on Ca2+ homeostasis or Herp from their initial screens as neither were the "most rhythmic" pathways in their primary analyses. 

      We appreciate the reviewer’s comment. We chose to focus on Ca2+ homeostasis processes because intracellular Ca2+ signaling plays a crucial role in numerous astrocyte functions and is notably associated with the sleep/wake status of animals, which is our primary interest (Bojarskaite et al., 2020; Ingiosi et al., 2020; Blum et al., 2021; Szabó et al., 2017). Among the genes involved in calcium ion homeostasis, Herp exhibited the most robust rhythmicity (supplementary table 1). The rationale for our focus on Ca2+ homeostasis and Herp is explained in the results section (lines 143-150). We hope this provides a clear justification for our focus.

      - It would have been interesting (and potentially important) to know whether various methods of cellular synchronization would also render HERP rhythmic (e.g., temperature, forskolin, etc). If Herp is indeed relatively astrocyte-specific and rhythmic, it should be easy to assess its rhythmicity in vivo. 

      We thank the reviewer for this insightful comment. In response, we examined HERP expression in cultured astrocytes synchronized using either Dexamethasone or Forskolin treatment. We found that Herp exhibited rhythmic expression at both the mRNA and protein levels under these conditions. These results have been added to Figure S3 and are explained in the manuscript (lines 173-175).

      Additionally, we measured HERP levels in the prefrontal cortex of mice at CT58 and CT70 and found no rhythmicity, as shown in Author response image 1. Given that Herp is expressed in various brain cell types, including microglia, endothelial cells, neurons, oligodendrocytes, and astrocytes, with the highest expression in microglia (Cahoy et al., 2008), we reason that the potential rhythmic expression of HERP in astrocytes might be masked by its continuous expression in other cell types. Nonetheless, to assess HERP rhythmicity specifically in astrocytes in vivo, we attempted immunostaining using several anti-HERP antibodies, but none were successful. Consequently, we were unable to determine whether HERP exhibits rhythmic expression in astrocytes in vivo.

      Author response image 1.

      HERP levels were constant at CT58 and CT70. (A, B) Mice were entrained under a 12h:12h LD cycle and maintained in constant darkness. Prefrontal cortices were harvested at the indicated times and processed for Western blot analysis. Representative image shows three independent samples. (B) Quantification of HERP levels normalized to VINCULIN. Values in graphs are mean ± SEM (*p < 0.05, **p < 0.005, ***p < 0.0005, and ****p < 0.00005; t-test).

      - The authors show that Herp suppression reduces ATP-mediated suppression of calcium whereas it initially increases Ca2+ in the cytosol and mitochondria and then suppresses it. The dynamics of the mitochondrial and cytosolic responses are not discussed in any detail and it is unclear what their direct relationship is to Herp-mediated ER signaling. What is the explanation for Herp (which is thought to be ER-specific) to calcium signaling in other organelles? 

      Our examination of cytosolic and mitochondrial Ca2+ responses was aimed at corroborating HERP’s effect on the ER Ca2+ response. Upon ATP stimulation, Ca2+ is released from the ER via IP3 receptors (IP3Rs) and subsequently transmitted to other organelles including mitochondria (Carreras-Sureda et al., 2018; Giorgi et al., 2018). Ca2+ is directly transferred to the cytosol by IP3Rs located on the ER membrane, and to the mitochondria through a complex formed by IP3R and the voltage-dependent anion channel (VDAC) on the mitochondria (Giorgi et al., 2018). Consistent with previous reports, we observed an increase in cytosolic and mitochondrial Ca2+ levels accompanied by a decrease in ER Ca2+ levels following ATP treatment (See Fig. 3B, E, H, control siRNA). The ATP-stimulated ER Ca2+ release was enhanced by Herp knockdown. We reasoned that if Ca2+ release was enhanced, then cytosolic and mitochondrial Ca2+ uptake would also be enhanced. The results were consistent with our hypothesis (See Fig. 3B, E, H, Herp siRNA). These observations are described in the Results section (lines 202-208) and in the Discussion (lines 333-348). We hope this explanation clarifies the relationship between the Herp-mediated ER Ca2+ response and the Ca2+ response in other organelles. Thank you for your consideration.

      - What is the functional significance of promoting ATP-mediated suppression of calcium in ER? 

      In astrocytes, intracellular Ca2+ plays a crucial role in regulating several processes. In this study, among the various downstream effects of intracellular Ca2+, we examined gap junction channel (GJC) conductance, which affects astrocytic communication. As discussed in the manuscript (lines 357-381), circadian variation in HERP results in rhythmic Cx43 (S368) phosphorylation linked with GJC conductance. We propose that during the subjective night phase, heightened ATP-induced ER Ca2+ release reduces GJC conductance, uncoupling astrocytes from the syncytium and making them better equipped for localized responses. On the other hand, during the subjective day phase, increased GJC conductance may allow astrocytes to control a larger area for synchronous neuronal activity, which is a key feature of sleep.

      - The authors then nicely show that the effect of ATP is dependent on intrinsic circadian timing but do not explain why these effects are antiphase in cytosol or mitochondria.

      Moreover, the ∆F/F for calcium in mitochondria and cytosol both rise, cross the abscissa, and then diminish - strongly suggesting a biphasic signaling event. Therefore, one wonders whether measuring the area under the curve is the most functionally relevant measurement of the change. 

      We appreciate the reviewer’s insightful comments. As explained in our previous response, Ca2+ released from the ER is transferred to the cytosol and mitochondria. This transfer explains why the fluorescent intensities of cytosolic and mitochondrial Ca2+ indicators show anti-phasic responses to those of the ER.

      We agree that cytosolic and mitochondrial Ca2+ responses may be biphasic. The decrease below the abscissa in mitochondria and cytosol likely reflects Ca2+ extrusion from these compartments. However, our primary focus was on the initial uptake of Ca2+ following ER Ca2+ release. Thus, when calculating the area under the curve (AUC), we measured the area between the ∆F/F trace and the line y = 0 (the x-axis), counting only the portion above the abscissa, for both mitochondria and cytosol. We reason that measuring the area under the curve above the abscissa fits our objective.

      While addressing your concerns, we noticed errors in the Y-axis labels of Fig. 3C, 4D, and 5C. For the ER Ca2+ dynamics, we measured the area above the curve. These mistakes have now been corrected.
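
      The area measurement described in this response can be illustrated with a minimal sketch. It assumes a ∆F/F trace sampled at discrete time points and joined by straight lines. The function names `area_above_zero` and `area_below_zero` and the example values are hypothetical and are not taken from the authors' analysis pipeline. Only the part of the trace above y = 0 contributes to the cytosolic and mitochondrial measurement, and the mirrored function gives the area above the curve for an ER trace that falls below baseline.

```python
def area_above_zero(times, values):
    """Area between a piecewise-linear trace and y = 0, counting only y > 0.

    Segments that cross zero are split at the crossing point, so the part
    of the trace below the abscissa never subtracts from the total.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    points = list(zip(times, values))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        if y0 >= 0 and y1 >= 0:
            # Whole segment at or above zero: ordinary trapezoid.
            area += 0.5 * (y0 + y1) * dt
        elif y0 > 0 > y1:
            # Falls through zero: keep the triangle before the crossing.
            area += 0.5 * y0 * (y0 / (y0 - y1)) * dt
        elif y0 < 0 < y1:
            # Rises through zero: keep the triangle after the crossing.
            area += 0.5 * y1 * (y1 / (y1 - y0)) * dt
        # Otherwise the segment lies at or below zero and adds nothing.
    return area


def area_below_zero(times, values):
    """Area between the trace and y = 0, counting only y < 0 (ER-style)."""
    return area_above_zero(times, [-v for v in values])


# Hypothetical biphasic trace: rises, then crosses below baseline.
example_times = [0, 1, 2, 3, 4]
example_trace = [0.0, 1.0, 1.0, -1.0, -1.0]
print(area_above_zero(example_times, example_trace))
print(area_below_zero(example_times, example_trace))
```

      Splitting segments at the zero crossing avoids the small bias that arises when negative samples are simply clipped to zero before integration.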

      - Why are mitochondrial and cytosolic calcium not also demonstrated for Bmal1 KO astrocytes? 

      In two sets of experiments (Fig. 3 and Fig. 4), we demonstrated that the increase in cytosolic and mitochondrial Ca2+ aligns with ER Ca2+ release. Since there were no circadian time differences in ER Ca2+ release in the Bmal1 KO cultures, we concluded that it was unnecessary to measure Ca2+ levels in the mitochondria and cytosol. Additionally, our primary focus is on the ER Ca2+ response rather than the Ca2+ dynamics in subcellular organelles. We hope this clarifies our rationale and maintains the focus of our study.

      - The authors claim that Herp acts by regulating the degradation of ITPRs but this hypothesis - rather central to the mechanisms proposed in this study - is not experimentally substantiated. 

      We appreciate the reviewer’s insightful comments regarding the role of HERP in the degradation of IP3Rs. In the original manuscript, we demonstrated that treating cells with Herp siRNA leads to an increase in the levels of ITPR1 and ITPR2, suggesting that HERP might be involved in the regulation of IP3R stability. This observation is consistent with previous studies, which showed that Herp siRNA treatment increases ITPR levels in HeLa and cardiac cells (Paredes et al., 2016; Torrealba et al., 2017). Torrealba et al. also showed that HERP regulates the polyubiquitination of IP3Rs. Based on our results and previous reports, we hypothesized that HERP similarly regulates ITPR degradation in cultured astrocytes.

      However, as the reviewer rightly pointed out, further evidence is needed to confirm that HERP specifically regulates ITPR degradation. To address this, we conducted new experiments examining the effect of Xestospongin C (XesC), an inhibitor of IP3Rs, on ER Ca2+ release. XesC treatment reduced ER Ca2+ release and abolished the enhancement of ER Ca2+ release by Herp KD. These results demonstrate that HERP influences the ER Ca2+ response through IP3Rs. These new findings have been added to Fig. 3N – 3P and explained in the Results section (lines 217-221).

      We believe these additional experiments and clarifications strengthen our hypothesis that HERP regulates IP3R degradation, thereby modulating ER Ca2+ responses.

      - There is no clear demonstration of the functional relevance of the circadian rhythms of ATP-mediated calcium signaling.

      As mentioned in the previous response, we examined Cx43 phosphorylation linked with GJC conductance in the context of ATP-mediated Ca2+ signaling. Our results demonstrated circadian variations in Cx43 Ser368 phosphorylation leading to variations in gap junction channel (GJC) conductance (Fig. 6C – F and Fig. 7D – I). We have discussed the significance of this circadian rhythm in ATP-driven ER Ca2+ signaling for astrocytic function during sleep/wake states in the manuscript (lines 357 – 382) as follows.

      “ATP-stimulated Cx43 (S368) phosphorylation is higher at 30hr (subjective night phase) than at 42hr (subjective day phase) (Fig. 6C and 6D), a finding further supported by in vivo experiments showing higher pCx43(S368) levels in the prefrontal cortex during the subjective night than during the day (Fig. 6E and 6F). What are the implications of this day/night variation in Cx43 (S368) phosphorylation? We reasoned that the circadian variation in Cx43 phosphorylation could significantly impact astrocyte functionality within the syncytium. Indeed, our cultured astrocytes exhibited circadian phase-dependent variation in gap junctional communication (Fig. 7D – 7F). Astrocytes influence synaptic activity through the release of gliotransmitters such as glutamate, GABA, D-serine, and ATP, triggered by increases in intracellular Ca2+ in response to the activity of adjacent neurons and astrocytes (Verkhratsky & Nedergaard, 2018). Importantly, this increase in Ca2+ spreads to adjacent astrocytes through GJCs (Fujii et al., 2017), influencing a large area of the neuronal network. Considering that Cx43 Ser368 phosphorylation occurs to uncouple specific pathways in the astrocytic syncytium to focus local responses (Enkvist & McCarthy, 1992), our findings suggest that astrocytes are better equipped for localized responses when presented with a stimulus during the active phase in mice. Conversely, during the rest period, characterized by more synchronous neuronal activity across broad brain areas (Vyazovskiy et al., 2009), higher GJC conductance might allow astrocytes to exert control over a larger area. In support of this idea, a recent study showed that synchronized astrocytic Ca2+ activity advances the slow wave activity (SWA) of the brain, a key feature of non-REM sleep (Szabó et al., 2017). Blocking GJCs was found to reduce SWA, further supporting this interpretation. However, conflicting findings have also been reported. For instance, Ingiosi et al. (2020) found that astrocytic synchrony was higher during wakefulness than sleep in the mouse frontal cortex. Whether these differing results in astrocyte synchrony during resting and active periods are attributable to differences in experimental context (e.g., brain regions, sleep-inducing condition) remains unclear. Indeed, astrocyte Ca2+ dynamics during wakefulness/sleep vary according to brain regions (Tsunematsu et al., 2021). While the extent of astrocyte synchrony might differ depending on brain region and/or stimulus, our results suggest that the baseline state of astrocyte synchrony, which is affected by GJC conductance, varies with the day/night cycle.”

      Reviewer #2 (Public Review): 

      Summary: 

      The article entitled "Circadian regulation of endoplasmic reticulum calcium response in mouse cultured astrocytes" submitted by Ryu and colleagues describes the circadian control of astrocytic intracellular calcium levels in vitro. 

      Strengths: 

      The authors used a variety of technical approaches that are appropriate 

      We appreciate the reviewer’s acknowledgement of the strengths of our manuscript.

      Weaknesses: 

      Statistical analysis is poor and could lead to a misinterpretation of the data 

      Thank you for the comment. We have carefully reviewed our statistical analyses and applied appropriate methods where necessary. Please see below for the specific revisions and improvements made.

      For Fig. 2D-E, we initially used a t-test. However, after adding more replicates and conducting a normality test, we found that the data did not follow a normal distribution. Therefore, we switched to the Mann-Whitney U test. In Fig. 5D-E, we originally used a repeated measures two-way ANOVA, but we have now changed it to a standard two-way ANOVA. For Fig. 7C and 7I, the normality test also indicated a non-normal distribution, and we consequently replaced the t-test with the Mann-Whitney U test. For other analyses not specifically mentioned, normality tests confirmed a normal distribution, allowing us to use t-tests or ANOVA as appropriate.

      Several conceptual issues have been identified. 

      We have addressed the reviewer’s concerns. Please see our detailed point-by-point responses below.

      Overinterpretation of the data should be avoided. This is a mechanistic paper done completely in vitro, all references to the in vivo situation are speculative and should be avoided. 

      We appreciate the reviewer’s insightful comment. Following the reviewer’s suggestion, we have removed the interpretations of GO pathways in the context of the in vivo situation.

      Reviewer #3 (Public Review): 

      Astrocyte biology is an active area of research and this study is timely and adds to a growing body of literature in the field. The RNA-seq, Herp expression, and Ca2+ release data across wild-type, Bmal1 knockout, and Herp knockdown cellular models are robust and lend considerable support to the study's conclusions, highlighting their importance. Despite these strengths, the manuscript presents a gap in elucidating the dynamics of HERP and the involvement of ITPR1/2 in modulating Ca2+ release patterns and their circadian variations, which remains insufficiently supported and characterized. While the Connexin data underscore the importance of rhythmic Ca2+ release triggered by ATP, the relationship here appears correlational and the role of HERP and ITPR in Cx function remains to be characterized. Moreover, enhancing the manuscript's clarity and readability could significantly benefit the presentation and comprehension of the findings. 

      We appreciate the reviewer’s acknowledgement of the strengths of our manuscript. Regarding the identified gaps, we have conducted several new experiments to clearly demonstrate the HERP-ITPR-Cx phosphorylation axis. Please see our detailed point-by-point responses below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      - While HERP appears to be a clock-controlled gene and its protein levels appear to demonstrate rhythmicity as well, the data quality of the western blotting in Bmal1 knockout raises some concern about the accuracy of HERP protein quantification. 

      We understand the reviewer’s concern regarding the proximity of the HERP band to a nonspecific band in the Western blotting for the Bmal1 knockout. However, we took great care to ensure the accuracy of our HERP band quantification. We meticulously selected only the specific HERP band, excluding the nonspecific band. Therefore, we are confident in the accuracy of our HERP protein measurements.

      - If HERP is rhythmic and ITPRs are not, if their model is correct, might we expect HERP suppression to result in 'unmasking' an ITPR rhythm? 

      Our model suggests that both HERP and ITPRs are rhythmic, with HERP regulating the degradation of ITPR proteins and driving their rhythms. Consistent with this, we observed day/night variations in ITPR2 levels (Fig. 4N and 4O). Therefore, we concluded that circadian variations in HERP are sufficient to drive ITPR2 rhythms. We have explained this in detail in the Results section (lines 236-241) and the Discussion section (lines 324-332).

      - The authors make a rather abrupt switch to examining gap junctions and connexin 43 phosphorylation. While the data demonstrating that the phosphorylation of S368 may indeed be rhythmic - the authors do not connect these data to the rest of the manuscript by showing a connection to HERP-mediated calcium signaling, limiting the coherence of the narrative. 

      We thank the reviewer for the insightful comments. To address the reviewer's concern regarding the connection between Herp and the phosphorylation of Cx43 at S368, we conducted new experiments to test whether Herp KD abolishes the rhythms of Cx43 phosphorylation at S368. We found that the phosphorylation of Cx43 at S368 is significantly enhanced at 30hr post sync compared with 42hr post sync in control siRNA-treated astrocytes, consistent with our previous results (Fig. 6C & 6D). In contrast, this circadian phase-dependent difference in phosphorylation was abolished in Herp siRNA-treated astrocytes. These results indicate that circadian variations in Cx43 phosphorylation are driven by HERP. These new results are now included in Fig. 6G and 6H and explained in the Results section (lines 276-281).

      - Comment on data presentation: the authors repeatedly present histograms with attached lines between data points - from my understanding of the experiments, this is inappropriate unless these were repeated measures from the same cells. Otherwise, the lines connecting one data point to another between different conditions (e.g., Ctrl or Herp knockdown) are arbitrary and possibly misleading (i.e., Figure 3K, 3M, 4L, 6D). 

      We thank the reviewer for the comment. We have updated the relevant figures by removing the lines connecting data points (Fig. 3K, 3M, Fig. 4N, and Fig. 6D).

      Reviewer #2 (Recommendations For The Authors): 

      Most of the suggestions of this reviewer are related to the conceptual interpretation and presentation of the data and to the statistical analysis 

      In Figure 1 the authors analyzed the rhythmic transcriptome of cortical astrocytes synchronized with a serum shock in two different ways. The authors need to discuss what is the difference between the two methods used to detect rhythmic transcripts and make sense of them. 

      Following the reviewer’s suggestion, we have provided a more detailed explanation of MetaCycle and BioCycle, as well as the rationale for using both packages in our analysis, as follows: “Various methods have been used to identify periodicity in time-series data, such as Lomb-Scargle (Glynn et al., 2006), JTK_CYCLE (Hughes et al., 2010) and ARSER (Yang & Su, 2010), each with distinct advantages and limitations. MetaCycle integrates these three methods, facilitating the evaluation of periodicity in time-series data without requiring the selection of an optimal algorithm (Wu et al., 2016). Additionally, BioCycle has been developed using a deep neural network trained with extensive synthetic and biological time-series datasets (Agostinelli et al., 2016). Because MetaCycle and BioCycle identify periodic signals based on different algorithms, we applied both packages to identify periodicity in our time-series transcriptome data. BioCycle and MetaCycle analyses detected 321 and 311 periodic transcripts, respectively (FDR-corrected, q-value < 0.05) (Fig. 1B). Among these, 220 (53.4%) were detected by both methods, but many transcripts did not overlap. MetaCycle is known for its inability to detect asymmetric waveforms (Mei et al., 2020). In our analysis, genes with increasing waveforms like Adora1 and Mybph were identified as rhythmic only by BioCycle, while Plat and Il34 were identified as rhythmic only by MetaCycle (Fig. S1C). Despite these discrepancies, the clear circadian rhythmic expression profiles of these genes led us to conclude that using the union of the two lists compensates for the limitations of each algorithm.”

      Please refer to lines 105-117 in the Results section.
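
      The overlap figures quoted above follow from inclusion-exclusion, and the short sketch below reproduces the arithmetic. It assumes that the reported percentage is taken relative to the union of the two lists, which is the reading that matches the quoted 53.4%. The function name `overlap_summary` is hypothetical and is not part of MetaCycle or BioCycle.

```python
def overlap_summary(n_a, n_b, n_both):
    """Summarize the overlap of two hit lists from their sizes alone.

    n_a and n_b are the list sizes and n_both is the number of shared hits.
    The union follows from inclusion-exclusion: |A or B| = |A| + |B| - |A and B|.
    """
    if n_both > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either list size")
    n_union = n_a + n_b - n_both
    return {
        "union": n_union,
        "only_a": n_a - n_both,
        "only_b": n_b - n_both,
        "shared_fraction": n_both / n_union,
    }


# Counts quoted in the response: 321 (BioCycle), 311 (MetaCycle), 220 shared.
summary = overlap_summary(321, 311, 220)
print(summary["union"], summary["only_a"], summary["only_b"])
print(f"{100 * summary['shared_fraction']:.1f}% of the union is shared")
```

      Under this reading, the union contains 412 transcripts, of which 101 are specific to BioCycle and 91 to MetaCycle.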

      The reasoning for comparing CT0 with the phase of the clock 8 hs after SS needs to be explained. Circadian time (CT) conceptually refers to the clock phase in the absence of entrainment cues in vivo, the direct transformation of "time after synchronization" in vitro to CT is misleading. 

      We thank the reviewer for the insightful comments. Initially, we believed that transforming TASS to CT, despite being in vitro data, might provide a more intuitive and physiologically relevant interpretation of our results. However, we agree that this approach might be misleading. Following the reviewer’s suggestion, we have revised our terminology by changing “CT” to “Time post sync (hr)”. Nonetheless, in the circular peak phase map in Fig. 1F, we set 8hr post sync to ZT0, based on the phase comparison result in Fig. 1D, to allow a physiologically relevant interpretation. We hope these revisions clarify our approach.

      Moreover, also by definition a CT cannot be defined in terms of "dark" or "light". Figure 6M needs to be changed. 

      Following the reviewer’s suggestion, we removed the labels CT22 and CT34. Instead, we have labeled the respective periods as “30hr post sync” and “42hr post sync”.

      In Figure 1D, the authors present a gene ontology analysis that is certainly interesting, however, it should not be overinterpreted when trying to explain processes that take place only in vivo (e.g. wound repair). 

      Thank you for the insightful comment. Following the reviewer’s feedback, we have removed the paragraph interpreting the cell migration process in relation to wound repair and have focused instead on Ca2+ ion homeostasis.

      In Figure 2A the relative expression of clock genes and Herp is again misleading by a white/grey shading indicating subjective night and subjective day when the system under study is a cell culture. 

      We understand the reviewer’s concern that a cell culture system is not equivalent to light/dark entrainment conditions. However, we apply time-synchronizing stimuli to recapitulate in vivo entrainment. In addition, by comparing our data with CircaDB, we defined 8hr post sync as corresponding to ZT0, thus aligning it with the beginning of the day. We have retained the shading to facilitate interpretation of our data in relation to the in vivo situation. However, in response to the reviewer’s concern, we have revised the shading from white/grey to light grey/dark grey. We hope this adjustment addresses the reviewer’s concern; if the reviewer still believes it is inappropriate, please let us know and we will gladly update it.

      In the Figure 2A legend, it is indicated that rhythmicity is assessed using MetaCycle with mean values obtained from n=2. The authors need to make clear whether this n=2 mean: 2 biological replicates or 2 technical replicates. This difference is relevant because it would make the analysis statistically valid or invalid, respectively. 

      Thank you for your feedback. n=2 refers to 2 biological replicates. Therefore, the analysis is statistically valid.

      In Figures 2C and D the authors applied a T-test, a parametric statistical test for one-to-one comparison that requires normality distribution of the data to be tested first. To test normality, the authors need at least 4 biological replicates. The suggestion of this reviewer is that these experiments have to be repeated and proper statistics applied. 

      Thank you for your feedback. In response to the reviewer's suggestion, we conducted additional experiments to increase the number of biological replicates to 4. After testing the normality of the data, we applied a t-test for Figure 2C and a Mann-Whitney test for Figure 2D and 2E. These tests confirmed statistically significant differences between groups.
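
      To make the small-sample behaviour of the Mann-Whitney test concrete, the sketch below computes the U statistic and an exact two-sided p-value by enumerating every way of splitting the pooled values into two groups. This is an illustration under stated assumptions, not the authors' analysis: the function names and the replicate values are hypothetical, and statistical packages may use a different exact or approximate method.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: pairs (xi, yj) with xi > yj, ties counted as 0.5."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating all relabelings of the pool.

    Each relabeling assigns len(x) of the pooled values to the first group.
    The p-value is the fraction of relabelings whose U lies at least as far
    from its null mean, len(x) * len(y) / 2, as the observed U.
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    null_mean = n_x * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - null_mean)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_x):
        chosen_set = set(chosen)
        group_x = [pooled[i] for i in chosen_set]
        group_y = [v for i, v in enumerate(pooled) if i not in chosen_set]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - null_mean) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical values: four replicates per group with complete separation.
control = [1.0, 1.2, 1.1, 1.3]
treated = [2.0, 2.2, 2.1, 2.3]
print(mann_whitney_u(control, treated))
print(exact_two_sided_p(control, treated))
```

      With four replicates per group there are 70 possible relabelings, so complete separation of the groups gives the smallest attainable two-sided p-value, 2/70 (about 0.029).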

      Further evidence of Bmal1-dependent control of HERP circadian expression authors could check the presence of E-Box elements in the Herp promoter. 

      We thank the reviewer for the insightful comment. In the Discussion section of the original manuscript, we mentioned the absence of a canonical E-Box upstream of the Herp gene. However, following the reviewer’s suggestion and considering the potential role of non-canonical E-Boxes, we conducted an additional analysis. This analysis identified several non-canonical E-Boxes within the 6 kb upstream region of the Herp gene (Table S2). Notably, we found one non-canonical E-Box, “CACGTT,” known to regulate circadian expression (Yoo et al., 2005), close to the transcription start site (chr8:94386194-94386543). Moreover, this element is evolutionarily conserved across various mammals, including humans, rats, mice, dogs, and opossums (See Author response image 2). Therefore, we reasoned that these non-canonical E-Boxes might drive the CLOCK/BMAL1-dependent expression of Herp. We have updated the Discussion to reflect these findings in lines 315-319.

      Author response image 2.
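
      A motif search of the kind described in this response can be sketched as follows. The example scans a DNA string on both strands for the non-canonical E-Box CACGTT named above and for the canonical E-Box CACGTG. The function names and the short test sequence are hypothetical; the sequence is not the Herp upstream region, and the authors' actual search tool is not specified in the response.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(sequence, motif):
    """Return sorted (start, strand) hits for a motif on either strand.

    Positions are 0-based on the forward strand. A zero-width lookahead is
    used so that overlapping occurrences are reported as well.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    patterns = {"+": motif}
    rc = reverse_complement(motif)
    if rc != motif:
        # Palindromic motifs (such as CACGTG) need only one scan.
        patterns["-"] = rc
    hits = []
    for strand, pattern in patterns.items():
        for match in re.finditer(f"(?={re.escape(pattern)})", sequence):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical sequence with one CACGTT on each strand and no CACGTG.
toy_sequence = "GGCACGTTAAAACGTGCC"
print(find_motif_both_strands(toy_sequence, "CACGTT"))  # [(2, '+'), (10, '-')]
print(find_motif_both_strands(toy_sequence, "CACGTG"))  # []
```

      Scanning the reverse complement matters for non-palindromic motifs such as CACGTT, which appears as AACGTG when the element lies on the opposite strand.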

      The calcium experiments shown in Figures 3A-I, could be more convincing if the authors showed that the different Ca2+ sensors are compartment-specific by showing co-localization with a subcellular marker. In the pictures shown it is not even possible to recognize the cell dimensions. 

      Following the reviewer’s suggestion, we performed co-staining experiments with organelle-specific Ca2+ indicators and organelle markers. First, astrocytes were co-transfected with G-CEPIA1er, an ER-specific Ca2+ indicator, and ER-targeted DsRed2 (with the Calreticulin signal sequence). Live imaging analysis showed that the fluorescent intensities of G-CEPIA1er and DsRed2-ER-5 significantly overlapped in co-transfected cells. Second, astrocytes were transfected with Mito-R-GECO1, and Mitotracker, a cell-permeable mitochondrial dye, was applied. The fluorescent intensities of Mito-R-GECO1 and Mitotracker also significantly overlapped. These new data are included in Figure S4 and explained in the Results section (lines 194-195).

      Data analysis in Figure 3 K and M is misleading. According to the explanations of the results, each of the experiments to assess ITRP1 or 2 is run independently. Then it is not clear why the relative levels obtained with control or Herp siRNA are plotted as pairs. Same comment as above for Figure 4L and Figure 6D. 

      We thank the reviewer for the insightful comments. Reviewer #1 raised similar issues. Following the reviewers’ suggestions, we have removed the lines connecting the data points in Fig. 3K, 3M, 4L, and 6D.

      In Figure 5E the authors need to explain why they consider that repeated measures 2-way ANOVA is the right statistical test to apply. According to the explained experimental design, cells transfected, synchronized, and then harvested independently at the indicated time after synchronization. 

      Thank you for the reviewer’s insightful comment. Upon reviewing the statistical methods as suggested, we have revised our approach. Instead of using repeated measures 2-way ANOVA, we have now applied a standard 2-way ANOVA, which is more appropriate given the experimental procedures were independent, as the reviewer pointed out.

      The English language needs to be revised throughout the text. 

      We have thoroughly revised the English language throughout the text.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Figure 3. Clarify the physiological importance of 100 µM ATP. Would the Herp rhythm warrant Ca2+ release rhythms under basal conditions? In 3J-K, the relatively weak effect of Herp knockdown on ITPR1/2 levels, albeit statistically significant, may not be physiologically significant. This calls into question the claimed Herp-ITPR axis that underlies the Ca2+ release phenotype. Further, the correlation certainly exists but further characterization of Herp KD cells would be required to address the mechanism. 

      As previously reported, a broad range of ATP concentrations can induce Ca2+ activity in astrocytes (Neary et al., 1988). Originally, we conducted an ATP dose-response analysis to observe ER Ca2+ release in our primary astrocyte culture. Our results show that ER Ca2+ release begins at 50 µM ATP and plateaus at 500 µM. Please refer to Author response image 3. We selected 100 µM ATP for our experiments because it induces a medium level of ER Ca2+ response. Importantly, although measuring ATP concentrations at the synapse in vivo is challenging (Tan et al., 2017), estimates suggest that synaptic ATP concentrations range from 5 to 500 µM (Pankratov et al., 2006). Thus, 100 µM ATP is a physiologically relevant concentration that can affect nearby cells, including astrocytes, in the nervous system.

      Author response image 3.

      Cultured astrocytes were transfected with the ER Ca2+ indicator G-CEPIA1er and, at 48hr post transfection, treated with various concentrations of ATP before Ca2+ imaging analysis was performed. (A) ΔF/F0 values over time following ATP application. (B) Area above the curve values. Values in graphs are mean ± SEM (*p < 0.05, **p < 0.005, ***p < 0.0005, and ****p < 0.00005; one-way ANOVA).

      Regarding the comment on Ca2+ release rhythms under basal conditions, we interpret this as referring to Ca2+ release in the absence of a stimulus. We typically observe Ca2+ release only upon stimulation, such as ATP treatment. However, we acknowledge that the modest effects of HERP knockdown on ITPR1/2 levels could call into question the role of the HERP-ITPR axis in ER Ca2+ release.

      To address this, we analyzed whether the increases in ER Ca2+ release induced by Herp KD were mediated through ITPRs by treating cells with Xestospongin C (XesC), an IP3R inhibitor. XesC treatment reduced ATP-induced ER Ca2+ release and eliminated the differences in ER Ca2+ release between control and Herp KD astrocytes (Fig. 3N – 3P). These results indicate that the HERP-ITPR axis plays a critical role in controlling ER Ca2+ release. These new experiments have been included in Fig. 3 and explained in the Results section (lines 217-221).

      Furthermore, following the reviewer’s suggestion, we examined whether HERP rhythms underlie the rhythms of the ER Ca2+ response by analyzing the ER Ca2+ response in Herp KD astrocytes at two different times following synchronization. In control astrocytes, ATP-induced ER Ca2+ responses varied depending on time, whereas these time-dependent variations were abolished in Herp KD astrocytes. These new experiments have been included in Fig. 4K – 4M and explained in the Results section (lines 232-235).

      Collectively, these results indicate that HERP rhythms lead to time-dependent differences in ER Ca2+ response through ITPRs.

      (2) Figure 4K-L. As data suggested the involvement of ITPR1 and ITPR2 (circadian effect), a reasonable next step is to determine their involvement, but the study did not pursue the hypothesis. 

      Thank you for your insightful comment. Our results indeed suggest that rhythms in ITPR2 levels may drive the time-dependent variations in ATP-induced ER Ca2+ release following synchronization. The newly conducted experiments demonstrated that treatment with the ITPR inhibitor XesC suppressed ATP-induced ER Ca2+ release under both control and Herp siRNA treatment conditions (Fig. 3). These findings further support the idea that rhythms of ITPR levels, specifically ITPR2, underlie the circadian variations in ER Ca2+ release. While examining the effect of ITPR2 siRNA would directly test the involvement of ITPR2, we have decided to pursue this experiment in future studies.

      (3) Figure 5A-C. Data from WT cells should be included side by side with Bmal1-/- cells for comparison which is expected to be consistent with the HERP levels as in 5D-E. Again, the role of ITPR2 is suggested but not demonstrated. 

      Following the reviewer's suggestion, we conducted additional experiments including both WT and Bmal1-/- cultured astrocytes side by side. The results were consistent with our previous findings: WT astrocytes showed rhythms of ER Ca2+ release while Bmal1-/- astrocytes did not. We have updated Figure 5A to 5C and the corresponding Results section (lines 242-245) accordingly.

      Regarding the second comment, as mentioned in our previous response, we plan to examine the role of ITPR2 in further studies.

      (4) Figure 6. The Connexin data seems an addon and is correlative with the Ca2+ release. The role of Herp and Itpr in Connexin function is not addressed. Figure 6E-F was not called out in the results section. Suggest providing additional data to support the role of the HERP-ITPR axis in regulating Ca2+ release and Connexin activity. 

      We agree that additional data are needed to support the role of HERP in regulating Cx43 phosphorylation. Therefore, we have conducted further experiments to determine whether rhythms of Cx43 phosphorylation are regulated by HERP. In the control astrocytes, ATP treatment induced time-dependent variations in Cx43 phosphorylation. However, these rhythms were abolished in Herp KD astrocytes. These results indicate that rhythms in HERP levels contribute to the time-dependent variations in Cx43 phosphorylation. These new experiments have been included in Fig. 6G and 6H and explained in the Results section (lines 276-281).

      Regarding the second comment, we have corrected our oversight by properly referencing Figures 6E-F in the Results section. Please refer to lines 357-359 for clarification.

      (5) Discussion. This section should focus on noteworthy points to discuss, not repeating the results. 

      Based on the reviewer's valuable suggestions, we have revised the Discussion section to minimize repetition of the results. Thank you for your guidance.

      (6) The manuscript exhibits numerous grammatical and textual inaccuracies that necessitate careful revision by the authors. My observations here are confined to the title and the abstract alone. I recommend altering the title from "mouse cultured astrocytes" to "cultured mouse astrocytes" for clarity and grammatical correctness. The abstract, meanwhile, needs enhancements both in terms of its content and language. It should incorporate the results of the partitioning among the ER, cytoplasm, and mitochondria, and provide clear definitions for some of the critical terms used. It's worth noting that the abstract's second sentence contains a grammatical error. 

      Thank you for the reviewer’s valuable feedback. We have carefully revised the title, abstract, and main text to address the grammatical and textual issues. The title has been changed to “cultured mouse astrocytes”. Additionally, the abstract now includes results related to cytoplasmic Ca2+ dynamics and has been revised in several places. We appreciate your insights and have worked to enhance the content and language accordingly.

      References

      Agostinelli, F., Ceglia, N., Shahbaba, B., Sassone-Corsi, P., & Baldi, P. (2016). What time is it? Deep learning approaches for circadian rhythms. Bioinformatics, 32(12), i8-i17. https://doi.org/10.1093/bioinformatics/btw243

      Cahoy, J. D., Emery, B., Kaushal, A., Foo, L. C., Zamanian, J. L., Christopherson, K. S., Xing, Y., Lubischer, J. L., Krieg, P. A., Krupenko, S. A., Thompson, W. J., & Barres, B. A. (2008). A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci, 28(1), 264-278. https://doi.org/10.1523/JNEUROSCI.4178-07.2008

      Carreras-Sureda, A., Pihán, P., & Hetz, C. (2018). Calcium signaling at the endoplasmic reticulum: fine-tuning stress responses. Cell Calcium, 70, 24-31. https://doi.org/10.1016/j.ceca.2017.08.004

      Enkvist, M. O., & McCarthy, K. D. (1992). Activation of protein kinase C blocks astroglial gap junction communication and inhibits the spread of calcium waves. J Neurochem, 59(2), 519-526. https://doi.org/10.1111/j.1471-4159.1992.tb09401.x

      Fujii, Y., Maekawa, S., & Morita, M. (2017). Astrocyte calcium waves propagate proximally by gap junction and distally by extracellular diffusion of ATP released from volume-regulated anion channels. Scientific Reports, 7(1), 13115. https://doi.org/10.1038/s41598-017-13243-0

      Giorgi, C., Marchi, S., & Pinton, P. (2018). The machineries, regulation and cellular functions of mitochondrial calcium. Nature Reviews Molecular Cell Biology, 19(11), 713-730. https://doi.org/10.1038/s41580-018-0052-8

      Glynn, E. F., Chen, J., & Mushegian, A. R. (2006). Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3), 310-316. https://doi.org/10.1093/bioinformatics/bti789

      Hughes, M. E., Hogenesch, J. B., & Kornacker, K. (2010). JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J Biol Rhythms, 25(5), 372-380. https://doi.org/10.1177/0748730410379711

      Ingiosi, A. M., Hayworth, C. R., Harvey, D. O., Singletary, K. G., Rempe, M. J., Wisor, J. P., & Frank, M. G. (2020). A Role for Astroglial Calcium in Mammalian Sleep and Sleep Regulation. Curr Biol, 30(22), 4373-4383.e4377. https://doi.org/10.1016/j.cub.2020.08.052

      Mei, W., Jiang, Z., Chen, Y., Chen, L., Sancar, A., & Jiang, Y. (2020). Genome-wide circadian rhythm detection methods: systematic evaluations and practical guidelines. Briefings in Bioinformatics, 22(3). https://doi.org/10.1093/bib/bbaa135

      Neary, J. T., van Breemen, C., Forster, E., Norenberg, L. O., & Norenberg, M. D. (1988). ATP stimulates calcium influx in primary astrocyte cultures. Biochem Biophys Res Commun, 157(3), 1410-1416. https://doi.org/10.1016/s0006-291x(88)81032-5

      Pankratov, Y., Lalo, U., Verkhratsky, A., & North, R. A. (2006). Vesicular release of ATP at central synapses. Pflugers Arch, 452(5), 589-597. https://doi.org/10.1007/s00424-006-0061-x

      Paredes, F., Parra, V., Torrealba, N., Navarro-Marquez, M., Gatica, D., Bravo-Sagua, R., Troncoso, R., Pennanen, C., Quiroga, C., Chiong, M., Caesar, C., Taylor, W. R., Molgó, J., San Martin, A., Jaimovich, E., & Lavandero, S. (2016). HERPUD1 protects against oxidative stress-induced apoptosis through downregulation of the inositol 1,4,5-trisphosphate receptor. Free Radic Biol Med, 90, 206-218. https://doi.org/10.1016/j.freeradbiomed.2015.11.024

      Szabó, Z., Héja, L., Szalay, G., Kékesi, O., Füredi, A., Szebényi, K., Dobolyi, Á., Orbán, T. I., Kolacsek, O., Tompa, T., Miskolczy, Z., Biczók, L., Rózsa, B., Sarkadi, B., & Kardos, J. (2017). Extensive astrocyte synchronization advances neuronal coupling in slow wave activity in vivo. Scientific Reports, 7(1), 6018. https://doi.org/10.1038/s41598-017-06073-7

      Tan, Z., Liu, Y., Xi, W., Lou, H. F., Zhu, L., Guo, Z., Mei, L., & Duan, S. (2017). Glia-derived ATP inversely regulates excitability of pyramidal and CCK-positive neurons. Nat Commun, 8, 13772. https://doi.org/10.1038/ncomms13772

      Torrealba, N., Navarro-Marquez, M., Garrido, V., Pedrozo, Z., Romero, D., Eura, Y., Villalobos, E., Roa, J. C., Chiong, M., Kokame, K., & Lavandero, S. (2017). Herpud1 negatively regulates pathological cardiac hypertrophy by inducing IP3 receptor degradation. Sci Rep, 7(1), 13402. https://doi.org/10.1038/s41598-017-13797-z

      Tsunematsu, T., Sakata, S., Sanagi, T., Tanaka, K. F., & Matsui, K. (2021). Region-specific and state-dependent astrocyte Ca2+ dynamics during the sleep-wake cycle in mice. The Journal of Neuroscience, JN-RM-2912-2920. https://doi.org/10.1523/jneurosci.2912-20.2021

      Verkhratsky, A., & Nedergaard, M. (2018). Physiology of Astroglia. Physiol Rev, 98(1), 239-389. https://doi.org/10.1152/physrev.00042.2016

      Vyazovskiy, V. V., Olcese, U., Lazimy, Y. M., Faraguna, U., Esser, S. K., Williams, J. C., Cirelli, C., & Tononi, G. (2009). Cortical firing and sleep homeostasis. Neuron, 63(6), 865-878. https://doi.org/10.1016/j.neuron.2009.08.024

      Wu, G., Anafi, R. C., Hughes, M. E., Kornacker, K., & Hogenesch, J. B. (2016). MetaCycle: an integrated R package to evaluate periodicity in large scale data. Bioinformatics, 32(21), 3351-3353. https://doi.org/10.1093/bioinformatics/btw405

      Yang, R., & Su, Z. (2010). Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics, 26(12), i168-174. https://doi.org/10.1093/bioinformatics/btq189

      Yoo, S. H., Ko, C. H., Lowrey, P. L., Buhr, E. D., Song, E. J., Chang, S., Yoo, O. J., Yamazaki, S., Lee, C., & Takahashi, J. S. (2005). A noncanonical E-box enhancer drives mouse Period2 circadian oscillations in vivo. Proc Natl Acad Sci U S A, 102(7), 2608-2613. https://doi.org/10.1073/pnas.0409763102

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Yang, Hu et al. examined the molecular mechanisms underlying astrocyte activation and its implications for multiple sclerosis. This study shows that the glycolytic enzyme PKM2 relocates to astrocyte nuclei upon activation in EAE mice. Inhibiting PKM2's nuclear import reduces astrocyte activation, as evidenced by decreased proliferation, glycolysis, and inflammatory cytokine release. Crucially, the study identifies TRIM21 as pivotal in regulating PKM2 nuclear import via ubiquitination. TRIM21 interacts with PKM2, promoting its nuclear translocation and enhancing its activity, affecting multiple signaling pathways. Confirmatory analyses using single-cell RNA sequencing and immunofluorescence demonstrate TRIM21 upregulation in EAE astrocytes. Modulating TRIM21 expression in primary astrocytes impacts PKM2-dependent glycolysis and proliferation. In vivo experiments targeting this mechanism effectively mitigate disease severity, CNS inflammation, and demyelination in EAE.

      The authors supported their claims with various experimental approaches; however, some results should be supported with higher-quality images clearly depicting the conclusions and additional quantitative analyses of Western blots.

      We thank the reviewer for these comments. We agree and have added higher-magnification images, for example in Fig. 2A to better visualize the localization of PKM2 in DASA-58-treated conditions, and in Fig. 3A and Fig. 3B to better visualize pSTAT3 and pp65. Moreover, we have added quantitative analyses of Western blots for key experiments: quantitative results for Fig. 2D are added in Fig. S3 to show the change in PKM2 and p-c-myc in DASA-58-treated conditions, and quantitative results for Fig. 3D are added in Fig. S4B and S4C to show the change in nuclear and cytoplasmic PKM2, STAT3 and NF-κB in different conditions.

      Strength:

      This study presents a comprehensive investigation into the function and molecular mechanism of metabolic reprogramming in the activation of astrocytes, a critical aspect of various neurological diseases, especially multiple sclerosis. The study uses the EAE mouse model, which closely resembles MS. This makes the results relevant and potentially translational. The research clarifies how TRIM21 regulates the nuclear import of PKM2 through ubiquitination by integrating advanced techniques. Targeting this axis may have therapeutic benefits since lentiviral vector-mediated knockdown of TRIM21 in vivo significantly reduces disease severity, CNS inflammation, and demyelination in EAE animals.

      We thank the reviewer for their positive and constructive comments on the manuscript.

      Weaknesses:

      The authors reported that PKM2 levels are elevated in the nucleus of astrocytes at different EAE phases compared to cytoplasmic localization. However, Figure 1 also shows elevated cytoplasmic expression of PKM2. The authors should clarify the nuclear localization of PKM2 by providing zoomed-in images. An explanation for the increased cytoplasmic PKM2 expression should be provided. Similarly, while PKM2 translocation is inhibited by DASA-58, in addition to its nuclear localization, a decrease in the cytoplasmic localization of PKM2 is also observed. This raises the possibility that a degradation mechanism is involved when nuclear translocation of PKM2 is inhibited.

      Immunofluorescence staining of PKM2 in the spinal cord of EAE mice and in cultured primary astrocytes showed, in addition to PKM2 nuclear translocation in EAE conditions, elevated expression of PKM2 in astrocytes in both the cytoplasm and the nucleus. Studies of other neurological diseases have reported consistent results; for example, following spinal cord injury (SCI), both upregulated expression and nuclear translocation of PKM2 were observed in astrocytes (Zhang et al., 2015). In EAE conditions, CNS inflammation is elevated, and several proinflammatory cytokines and chemokines might contribute to the upregulated expression of PKM2 in astrocytes. We tested TNFα and IL-1β, which are recognized to play important roles in EAE and MS (Lin and Edelson, 2017, Wheeler et al., 2020), and western blots showed increased expression of PKM2 upon stimulation with TNFα and IL-1β (Author response image 1). Moreover, following the reviewer’s suggestion, we have added zoomed-in images for Figure 2A.

      Additionally, as the reviewer noted, the cytoplasmic PKM2 level also decreased; a degradation-related mechanism or other mechanisms might be involved in this process.

      Author response image 1.

      Upregulated expression of PKM2 in astrocytes following stimulation with TNF-α and IL-1β. Primary astrocytes were stimulated with TNF-α and IL-1β (50 ng/mL) for 48 h, and western blotting analysis was performed.

      In Figure 3D, the authors claim that PKM2 expression causes nuclear retention of STAT3, p65, and p50, and inhibiting PKM2 localization with DASA-58 suppresses this retention. The western blot results for the MOG-stimulated group show high levels of STAT3, p50, and p65 in nuclear localization. However, in the MOG and DASA-58 treated group, one would expect high levels of p50, p65, and STAT3 proteins in the cytoplasm, while their levels decrease in the nucleus. These western blot results could be expanded. Additionally, intensity quantification for these results would be beneficial to see the statistical difference in their expressions, especially to observe the nuclear localization of PKM2.

      We agree with the reviewer’s comments and have added quantification of STAT3, p50 and p65 for Fig. 3D in Fig. S4B and Fig. S4C. However, because DASA-58 did not trigger a notable increase in the cytoplasmic level of PKM2, we did not detect an upregulation of STAT3, p50, or p65 in the cytoplasm of the MOG and DASA-58-treated groups. The quantification makes the changes in these proteins across conditions easier to see.

      The discrepancy between Figure 7A and its explaining text is confusing. The expectation from the knocking down of TRIM21 is the amelioration of activated astrocytes, leading to a decrease in inflammation and the disease state. The presented results support these expectations, while the images showing demyelination in EAE animals are not highly supportive. Clearly labeling demyelinated areas would enhance readers' understanding of the important impact of TRIM21 knockdown on reducing the disease severity.

      Thank you for pointing this out. We sincerely apologize for our carelessness. Based on your comments, we have made the corrections in the manuscript. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      Additionally, we have added an image of the whole spinal cord stained for MBP in Author response image 2. Moreover, we have labelled the demyelinated areas to facilitate readers’ understanding.

      Author response image 2.

      MBP staining of the whole spinal cord in EAE mice from shVec and shTRIM21 group. Scale bar: 100 μm. Demyelinated areas are marked with dashed lines.

      Reviewer #2 (Public Review):

      This study significantly advances our understanding of the metabolic reprogramming underlying astrocyte activation in neurological diseases such as multiple sclerosis. By employing an experimental autoimmune encephalomyelitis (EAE) mouse model, the authors discovered a notable nuclear translocation of PKM2, a key enzyme in glycolysis, within astrocytes.

      Preventing this nuclear import via DASA 58 substantially attenuated primary astrocyte activation, characterized by reduced proliferation, glycolysis, and inflammatory cytokine secretion. Moreover, the authors uncovered a novel regulatory mechanism involving the ubiquitin ligase TRIM21, which mediates PKM2 nuclear import. TRIM21 interaction with PKM2 facilitated its nuclear translocation, enhancing its activity in phosphorylating STAT3, NFκB, and c-myc. Single-cell RNA sequencing and immunofluorescence staining further supported the upregulation of TRIM21 expression in astrocytes during EAE.

      Manipulating this pathway, either through TRIM21 overexpression in primary astrocytes or knockdown of TRIM21 in vivo, had profound effects on disease severity, CNS inflammation, and demyelination in EAE mice. This comprehensive study provides invaluable insights into the pathological role of nuclear PKM2 and the ubiquitination-mediated regulatory mechanism driving astrocyte activation.

      The authors' use of diverse techniques, including single-cell RNA sequencing, immunofluorescence staining, and lentiviral vector knockdown, underscores the robustness of their findings and interpretations. Ultimately, targeting this PKM2-TRIM21 axis emerges as a promising therapeutic strategy for neurological diseases involving astrocyte dysfunction.

      While the strengths of this piece of work are undeniable, some concerns could be addressed to refine its impact and clarity further, as outlined in the recommendations for the authors.

      We thank the reviewer for the comments and positive evaluation of our work. We have answered each question in the recommendations section.

      Reviewer #3 (Public Review):

      Summary:

      Pyruvate kinase M2 (PKM2) is a rate-limiting enzyme in glycolysis, and its translocation to the nucleus in astrocytes in various nervous system pathologies has been associated with a metabolic switch to glycolysis, which is a sign of reactive astrogliosis. The authors investigated whether this occurs in experimental autoimmune encephalomyelitis (EAE), an animal model of multiple sclerosis (MS). They show that in EAE, PKM2 is ubiquitinated by TRIM21 and transferred to the nucleus in astrocytes. Inhibition of the TRIM21-PKM2 axis efficiently blocks reactive gliosis and partially alleviates symptoms of EAE. The authors conclude that this axis can be a potential new therapeutic target in the treatment of MS.

      Strengths:

      The study is well-designed, controls are appropriate and a comprehensive battery of experiments has been successfully performed. Results of in vitro assays, single-cell RNA sequencing, immunoprecipitation, RNA interference, molecular docking, and in vivo modeling etc. complement and support each other.

      Weaknesses:

      Though EAE is a valid model of MS, a proposed new therapeutic strategy based on this study needs to have support from human studies.

      We agree that, although we have clarified the therapeutic potential of targeting TRIM21 or PKM2 in EAE, a mouse model of MS, application in humans warrants further study. Before TRIM21 can be considered as a target for treating multiple sclerosis in clinical trials, several issues need to be addressed to ensure safety, efficacy and feasibility. One such issue is the development of drugs that specifically target TRIM21 in the brain, cross the blood-brain barrier and have minimal off-target effects. The translation of preclinical findings into clinical trials poses a significant challenge. To provide evidence for similarities between the EAE model and multiple sclerosis, we screened GEO datasets (Author response image 3). GSE214334 analyzed transcriptional profiles of normal-appearing white matter from non-MS controls and different MS subtypes (RRMS, SPMS and PPMS). Although no statistical difference was observed among groups, TRIM21 expression tended to increase in SPMS (secondary progressive MS) and PPMS (primary progressive MS) patients. In GSE83670, astrocytes from 3 control white matter samples and 4 multiple sclerosis normal-appearing white matter (NAWM) samples were analyzed. TRIM21 mRNA expression was higher in the MS group (78.73 ± 10.44) than in the control group (46.67 ± 24.15). Although neither GEO dataset yielded statistically significant differences, TRIM21 expression appears to be elevated in the white matter of MS patients compared to controls.

      To address this limitation, we have incorporated the following statement in the discussion section: “However, whether TRIM21-PKM2 could potentially serve as therapeutic targets in multiple sclerosis warrants further studies.”

      Author response image 3.

      TRIM21 expression in control and MS patients based on published GEO datasets. (A) The expression of TRIM21 in normal-appearing white matter in non-MS (Ctl) and different clinical subtypes of MS (RRMS, SPMS, PPMS) based on GSE214334 (one-way ANOVA). (B) The expression of TRIM21 in multiple sclerosis normal-appearing white matter (NAWM) and control WM based on GSE83670 (unpaired Student's t test). RRMS, relapsing-remitting MS; SPMS, secondary progressive MS; PPMS, primary progressive MS. Data are represented as the means ± SEM.
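
      As an illustration of the unpaired comparison named in panel (B), the following is a minimal sketch of a pooled-variance Student's t statistic computed from summary values. It assumes that the reported "±" values are SEM, as the legend states, and that the group sizes are 4 (MS NAWM) and 3 (control WM), as given in the response. The function name `pooled_t_from_sem` is hypothetical and this is not the authors' analysis code.

```python
import math


def pooled_t_from_sem(mean1, sem1, n1, mean2, sem2, n2):
    """Pooled-variance (Student's) t statistic from group means, SEMs and sizes.

    Returns (t, degrees_of_freedom). Assumes the inputs are SEM, not SD.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two samples")
    # Convert SEM back to the sample standard deviation: SD = SEM * sqrt(n).
    sd1 = sem1 * math.sqrt(n1)
    sd2 = sem2 * math.sqrt(n2)
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df


# Summary values quoted in the response for GSE83670 (assumed to be mean ± SEM).
t_stat, dof = pooled_t_from_sem(78.73, 10.44, 4, 46.67, 24.15, 3)
print(f"t = {t_stat:.3f}, df = {dof}")
```

      The resulting statistic can be compared with the two-tailed critical value for the returned degrees of freedom (about 2.571 for df = 5 at alpha = 0.05). If the quoted values were SD rather than SEM, the conversion step would have to be removed.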

      Reviewer #4 (Public Review):

      Summary:

      The authors report the role of the Pyruvate Kinase M2 (PKM2) enzyme nuclear translocation as fundamental in the activation of astrocytes in a model of autoimmune encephalitis (EAE). They show that astrocytes, activated through culturing in EAE splenocytes medium, increase their nuclear PKM2 with consequent activation of NFkB and STAT3 pathways. Prevention of PKM2 nuclear translocation counteracts this activation. The authors found that the E3 ubiquitin ligase TRIM21 interacts with PKM2 and promotes its nuclear translocation. In vivo, either silencing of TRIM21 or inhibition of PKM2 nuclear translocation ameliorates the severity of the disease in the EAE model.

      Strengths:

      This work contributes to the knowledge of the complex action of the PKM2 enzyme in the context of an autoimmune-neurological disease, highlighting its nuclear role and a novel partner, TRIM21, and thus adding a novel rationale for therapeutic targeting.

      Weaknesses:

      Despite the relevance of the work and its goals, some of the conclusions drawn would require more thorough proof:

      I believe that the major weakness is the fact that TRIM21 is known to have per se many roles in autoimmune and immune pathways and some of the effects observed might be due to a PKM2-independent action. Some of the experiments to link the two proteins, besides their interaction, do not completely clarify the issue. On top of that, the in vivo experiments address the role of TRIM21 and the nuclear localisation of PKM2 independently, thus leaving the matter unsolved.

      We agree that TRIM21 has multiple functions and that some of its effects may be due to PKM2-independent actions. TRIM21 functions as a ubiquitin ligase with a variety of substrates. Here we identify PKM2 as one of its interacting proteins, and our focus is the relationship between TRIM21 and the nuclear translocation of PKM2. We used diverse experiments to clarify this relationship, including immunoprecipitation, western blotting, immunofluorescence, and cytoplasmic-nuclear protein extraction. These experiments are key points of our study. The in vitro results suggested that either TRIM21 or PKM2 might be a potential target for EAE treatment. As expected, in vivo, targeting either TRIM21 or PKM2 nuclear transport ameliorated EAE. To test the relationship between TRIM21 and PKM2 nuclear transport in vivo, we stained PKM2 in shVec- and shTRIM21-treated mice. As expected, knocking down TRIM21 decreased the nuclear staining of PKM2 in spinal cord astrocytes in EAE models (Figure S7A). This observation suggests that the therapeutic effect of inhibiting TRIM21 in astrocytes in vivo might be partially due to reduced nuclear translocation of PKM2.

      Some experimental settings are not described to a level that is necessary to fully understand the data, especially for a non-expert audience: e.g. the EAE model and MOG treatment; action and reference of the different nuclear import inhibitors; use of splenocyte culture medium and the possible effect of non-EAE splenocytes.

      According to the reviewer’s suggestions, we have added more detailed descriptions to the Materials and methods section; for example, descriptions of the splenocyte culture medium, mass spectrometry, and HE and LFB staining have been added. More details are provided in the section “EAE induction and isolation and culture of primary astrocytes”. Moreover, references for the use of DASA-58 in vitro and TEPP-46 in vivo as inhibitors of PKM2 nuclear transport were added.

      The statement that PKM2 is a substrate of TRIM21 ubiquitin ligase activity is an overinterpretation. There is no evidence that this interaction results in ubiquitin modification of PKM2; the ubiquitination experiment is minimal and is not performed in conditions that would allow us to see ubiquitination of PKM2 (e.g. denaturing conditions, reciprocal pull-down, catalytically inactive TRIM21, etc.).

      To prevent the misunderstanding, we have revised certain statements in the manuscript. In the updated version, the description is as follows: Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General recommendations:

      - The whole manuscript needs language editing.

      We appreciate the comments of the reviewers. We have improved the writing of the manuscript. All modifications are underlined.

      - Details of many experiments are not given in the materials and methods.

      According to the reviewer’s suggestions, we have added more details for experiments in the materials and methods. For example, “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes”, “mass spectrometry”, “Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining” were added in the section of Materials and Methods. More detailed information is given for EAE induction and isolation and culture of primary astrocytes.

      - Line properties in graphics should be corrected, some lines in box plots and error bars are very weak and hardly visible. Statistical tests should be included in figure legends as well. Statistical differences should be mentioned for control vs DASA-58 (alone) in all related figures.

      We have revised the figures to enhance their visibility by thickening the lines and error bars. In accordance with the reviewer’s suggestions, we have incorporated the statistical tests in the figure legends. Moreover, statistical analysis was performed among all groups; where no asterisk is indicated in the figure legend or figure panel, there is no statistical difference between the control and DASA-58 groups. For most of the experiments in our study, including lactate production, glucose consumption, the EdU and CCK8 analyses, and the changes in the STAT3 and NF-κB pathways, no statistical difference was observed between the control and DASA-58 groups. A possible reason is that in unstimulated astrocytes PKM2 expression is low and little PKM2 translocates to the nucleus, which may explain why DASA-58 did not exert the anticipated effect. Thus, we used MOGsup to stimulate astrocytes, enabling us to observe the impact of DASA-58 on astrocyte proliferation and glycolysis in this condition.

      - Scale bars, arrows, and labeling in the images are not visible.

      We have improved the images according to the reviewer’s suggestions. The scale bars and arrows have been made thicker and the labels larger, so that they are clearly visible in the updated figures.

      - Quantitative analysis of all western blot results and their statistics could be provided in every image and for every protein.

      For western blotting results that were quantified, for example Fig. 2D, Fig. 5G, Fig. 6A and 6B, and Fig. S4, we have added their statistics in the raw data sections. Other western blot results, such as the IP analyses used to assess protein-protein binding, were not quantified.

      - Proteins that are used for normalizations in western blots should be stated in the text.

      We have added a description of the proteins used for normalization in western blots to the figure legends. Moreover, the proteins used for normalization are indicated in the figure panels. In general, whole-cell protein levels are normalized to β-actin. For nuclear and cytoplasmic fractions, nuclear protein is normalized to lamin and cytoplasmic protein is normalized to tubulin.
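
      The normalization scheme described above can be sketched as follows. This is a minimal illustration that assumes band intensities are available as densitometry values in arbitrary units. The function names, dictionary keys, and numbers are hypothetical and do not come from the study.

```python
# Loading controls named in the response; the dictionary keys are hypothetical labels.
LOADING_CONTROL = {
    "whole_cell": "beta-actin",
    "nuclear": "lamin",
    "cytoplasmic": "tubulin",
}


def normalize_band(target_intensity: float, control_intensity: float) -> float:
    """Return the target band intensity relative to its loading control."""
    if control_intensity <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target_intensity / control_intensity


def relative_to_reference(normalized: float, reference_normalized: float) -> float:
    """Express a normalized band as a fold change over a reference lane."""
    if reference_normalized <= 0:
        raise ValueError("reference value must be positive")
    return normalized / reference_normalized


# Hypothetical densitometry values (arbitrary units), not data from the study.
control_lane = normalize_band(target_intensity=500.0, control_intensity=1000.0)
treated_lane = normalize_band(target_intensity=900.0, control_intensity=1200.0)
fold = relative_to_reference(treated_lane, control_lane)
print(f"nuclear fraction normalized to {LOADING_CONTROL['nuclear']}: fold change = {fold:.2f}")
```

      Dividing by the fraction-specific control before comparing lanes keeps differences in loading or fractionation yield from being read as changes in the target protein.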

      - The manuscript investigates the role of TRIM21 in the nuclear localization of PKM2 in astrocytes in EAE mice, however almost no information is given about TRIM21 in the introduction. Extra information is given for PKM2, yet can be concisely explained.

      We have added a paragraph that describes the information of TRIM21 in the introduction section. The description is as follows: “TRIM21 belongs to the TRIM protein family which possess the E3 ubiquitin ligase activity. In addition to its well-recognized function in antiviral responses, emerging evidences have documented the multifaceted role of TRIM21 in cell cycle regulation, inflammation and metabolism (Chen et al., 2022). Nevertheless, the precise mechanisms underlying the involvement of TRIM21 in CNS diseases remain largely unexplored.”

      - "As such, deciphering glycolysis-dominant metabolic switch in astrocytes is the basis for understanding astrogliosis and the development of neurological diseases such as multiple sclerosis." The sentence could be supported by references.

      To support this sentence, we have added the following references:

      (1) Xiong XY, Tang Y, Yang QW. Metabolic changes favor the activity and heterogeneity of reactive astrocytes. Trends in endocrinology and metabolism: TEM 2022;33(6):390-400.

      (2) das Neves SP, Sousa JC, Magalhães R, Gao F, Coppola G, Mériaux S, et al. Astrocytes Undergo Metabolic Reprogramming in the Multiple Sclerosis Animal Model. Cells 2023;12(20):2484.

      Figure 1/Result 1:

      - Figure 1A-B: Quality of the images should be improved.

      According to the reviewer’s suggestion, we have improved the quality of the images; higher-resolution images were added in figure 1A and figure 1B.

      - Control images of Figure 1B are not satisfying. GFAP staining is very dim. Images from control cells should be renewed.

      As suggested by the reviewer, we have replaced the control images and added DAPI staining for all groups. Compared with MOGsup-stimulated astrocytes, the control cells are not in an activated state and GFAP expression is relatively low.

      - Labelings on the images are not sufficient, arrows and scale bars are not visible.

      We have improved the images including labels, arrows and scale bars in all figures.

      - How splenocytes were obtained from MOG induced mice were not given in the material and methods section. Thus, it should be clearly stated how splenocyte supernatant is generated (treatment details).

      We have added detailed information on splenocyte isolation and splenocyte supernatant in a subsection entitled “Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes” in the Materials and methods section: “Splenocytes were isolated from EAE mice 15 d (disease onset) after MOG35-55 immunization. Briefly, spleen cells were suspended in RPMI-1640 medium containing 10% FBS. Splenocytes were plated in 12-well plates at 1 × 10^6 cells/well containing 50 μg/mL MOG35-55 and cultured at 37°C in 5% CO2. After stimulation for 60 h, cell suspension was centrifuged at 3000 rpm for 5 min and supernatants were collected. For the culture of MOGsup-stimulated astrocytes, astrocytes were grown in medium containing 70% DMEM supplemented with 10% FBS and 30% supernatant from MOG35-55-stimulated-splenocytes.”

      - For general astrocyte morphology: authors showed the cells are GFAP+ astrocytes. It is surprising that these cells do not bear classical astrocyte morphology in cell culture. How long do you culture astrocytes before treatment? How do you explain their morphological difference?

      Astrocytes were cultured for 2 to 3 weeks, which corresponds to 2-3 passages, before treatment. There are several possible reasons why the GFAP+ astrocytes differ from the classical morphology. The first is cell density: in low-density cultures, as shown in Figure 1B, we observed that astrocytes adopt a more flattened morphology, whereas in high-density cultures they adopt a stellate shape. Moreover, variations in culture conditions, such as the use of different fetal bovine serum, can also influence the morphology of astrocytes. In addition, mechanical injury induced by the astrocyte isolation procedure might contribute to variations in morphology during in vitro cultivation. In summary, the morphological differences observed in GFAP+ astrocytes in cell culture likely result from a combination of culture conditions, cell density, and mechanical injury that occurred during astrocyte isolation.

      - Additional verification of reactive astrocytes could be performed by different reactive astrocyte markers, such as GLAST, Sox9, S100β. Thus, quantitative analysis of activated astrocytes can be done by counting DAPI vs GLAST, Sox9 or S100β positive cells.

      We agree with the reviewer that there are other markers of reactive astrocytes, such as GLAST, Sox9 and S100β. However, considerable evidence supports GFAP as the most commonly used marker of reactive astrocytes. In most cases, reactive astrocytes overexpress GFAP. GFAP is one of the most consistently induced genes in transcriptomic datasets of reactive astrocytes, confirming its usefulness as a reactive marker (Escartin et al., 2019). Thus, we have used GFAP as the marker of astrocyte activation in our study.

      - How you performed quantifications for Figures 1C and 1D should be clearly explained, details are not given.

      Details of the quantification for Figures 1C and 1D were added to the figure legend. In brief, the mean fluorescence intensity of PKM2 in the different groups in (B) was calculated with ImageJ. The number of cells with nuclear PKM2 was counted manually using Image-Pro Plus software (PKM2 was classified as nuclear or cytoplasmic based on DAPI blue staining). The proportion of nuclear PKM2 was determined by normalizing the count of nuclear PKM2 to the count of DAPI-stained nuclei, which represents the number of cell nuclei.
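      The normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the analysis pipeline used in the study: the function name and the example counts are hypothetical, and the counts themselves would come from the manual scoring in Image-Pro Plus.

```python
def nuclear_pkm2_proportion(nuclear_pkm2_count: int, dapi_nuclei_count: int) -> float:
    """Return the fraction of DAPI-stained nuclei that were scored as PKM2-positive.

    Both arguments are manual counts from one field of view (hypothetical inputs).
    """
    if dapi_nuclei_count <= 0:
        raise ValueError("at least one DAPI-stained nucleus is required")
    if not 0 <= nuclear_pkm2_count <= dapi_nuclei_count:
        raise ValueError("nuclear PKM2 count must lie between 0 and the DAPI count")
    # Normalize the nuclear PKM2 count to the total number of nuclei.
    return nuclear_pkm2_count / dapi_nuclei_count


# Illustrative counts only; these are not data from the manuscript.
example_proportion = nuclear_pkm2_proportion(18, 60)
```

      Averaging such per-field proportions within each group would then give the values that are compared between conditions.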

      - "Together, these data demonstrated the nuclear translocation of PKM2 in astrocytes from EAE mice." Here the usage of "suggests" instead of "demonstrated".

      Based on the reviewer's suggestion, we have revised the use of "demonstrated" to "suggest" in this sentence.

      Result 2 and 3:

      - In the literature, DASA-58 is shown to be the activator of PKM2 (https://www.nature.com/articles/nchembio.1060; https://doi.org/10.1016/j.cmet.2019.10.015).

      - Providing references for the inhibitory use of DASA-58 for PKM2 would be appreciated.

      DASA-58 is referred to as a “PKM2 activator” because it enforces the tetramerization of PKM2, enhancing the enzymatic ability of PKM2 to catalyze the conversion of PEP to pyruvate. However, enforced tetramerization reduces the dimeric form of PKM2, thereby inhibiting its nuclear translocation. For this reason, DASA-58 is also used as an inhibitor of PKM2 nuclear translocation. In primary BMDMs, LPS induced nuclear PKM2, whereas driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited LPS-induced PKM2 nuclear translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1-induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). Accordingly, we have added these references to the manuscript.

      - Western blot results and statistics for PKM2 should be quantitatively given for all groups.

      According to the reviewer’s suggestions, we have added the quantification of PKM2 for western blots in figure 2 and figure 3. Quantification of PKM2 in figure 2D is added in Fig S3. Quantification of PKM2 in figure 3D is added in Fig.S4B and Fig. S4C.

      - Figure 3A-B: staining method/details are not mentioned in materials and methods.

      The staining methods are described in the paragraph entitled “Immunofluorescence” in the materials and methods section. The description is as follows:

      For cell immunochemistry, cells cultured on glass coverslips were fixed with 4% PFA for 10 min at RT, followed by permeabilization with 0.3% Triton X-100. Non-specific binding was blocked with buffer containing 3% BSA for 30 min at RT. Briefly, samples were then incubated with primary antibodies and secondary antibodies. DAPI was used to stain the nuclei. Tissues and cells were observed and images were acquired using an EVOS FL Auto 2 Cell image system (Invitrogen). The fluorescence intensity was measured by ImageJ.

      - In Figure 3A, in only DASA-58 treated cells, it looks like GFAP staining is decreased. It would be better to include MFI analysis for GFAP in the supplementary information.

      We have added the MFI analysis for GFAP in Figure 3A as Fig. S4A. GFAP expression is decreased after DASA-58 treatment (in both control and MOGsup conditions). This might be because DASA-58 inhibits PKM2 nuclear transport, which subsequently suppresses the activation of astrocytes and leads to decreased expression of GFAP.

      Result 4

      - Detailed explanation of the mass spectrometry and IP experiments should be given in materials and methods. What are the conditions of the cells? Which groups were analyzed? Are they only MOG stimulated, MOG-DASA-58 treated, or only primary astrocytes without any treatment? The results should be interpreted according to the experimental group that has been analyzed.

      We have added detailed information on mass spectrometry and immunoprecipitation to the materials and methods. In brief, two groups of cells were subjected to mass spectrometry analysis: primary astrocytes without any treatment and MOGsup-stimulated primary astrocytes. Both groups were immunoprecipitated with anti-PKM2 antibody. Moreover, we have revised the sentence describing the mass spectrometry in the manuscript. The description is as follows: “To illustrate underlying mechanism accounting for nuclear translocation of PKM2 in astrocytes, we sought to identify PKM2-interacting proteins. Here, unstimulated and MOGsup-stimulated primary astrocytes were subjected to PKM2 immunoprecipitation, followed by mass spectrometry”. Furthermore, a description of these two groups of cells was added to the figure legend of Fig. 4.

      Result 5:

      - For the reader, it would be better to start this part by explaining the role of TRIM21 in cells by referring to the literature.

      We agree with the reviewer that it would be better to begin this part by explaining the role of TRIM21. Accordingly, we have added the following description at the beginning of this part: “TRIM21 is a multifunctional E3 ubiquitin ligase that plays a crucial role in orchestrating diverse biological processes, including cell proliferation, antiviral responses, cell metabolism and inflammatory processes (Chen X. et al., 2022).” The relevant literature has been included: Chen X, Cao M, Wang P, Chu S, Li M, Hou P, et al. The emerging roles of TRIM21 in coordinating cancer metabolism, immunity and cancer treatment. Front Immunol 2022;13:968755.

      - The source and the state of the cells (control vs MOG induced) should be stated (Figure 5A).

      In Figures 5A to 5D, single-cell RNA-seq was performed on CNS tissues from naive mice and from EAE mice at different phases (peak and chronic). We have added this information to the figure legend of Figure 5.

      - Figure 5D can be placed after 5A. Data in Figure 5A is probably from naive animals, if so, it should be stated in the legend where A is explained. The group details of the data shown in Figure 5 should be clearly stated.

      According to the reviewer’s suggestions, we have placed 5D after 5A. Single-cell RNA-seq analysis was performed on CNS tissues from naive mice and EAE mice. This information is stated in the legend of Figure 5A-D: “Single-cell RNA-seq profiles from naive and EAE mice (peak and chronic phase) CNS tissues. Naive (n=2); peak (dpi 14–24, n=3); chronic (dpi 21–26, n=2).”

      - Immunofluorescence images should be replaced with better quality images, in control images, stainings are not visible.

      We have replaced the images in Figure 5H with better quality images, and the staining in the control images is now visible.

      Result 6:

      - Experimental procedures should be given in detail in materials and methods.

      We have revised the section of materials and methods, and more details are added. Detailed information was added for astrocyte isolation, immunoprecipitation. Moreover, mass spectrometry, Hematoxylin-Eosin (HE) and Luxol Fast Blue (LFB) staining, Splenocyte isolation and supernatant of MOG35-55-stimulated-splenocytes were added in materials and methods.

      Result 7:

      - In Figure 7A, the mean clinical score seems significantly reduced in the shTRIM21-treated group, although it is explained in the result text that it is not significant. Explain to us the difference between Figure 7A and the explaining text?

      Thank you for pointing this out. We sincerely apologize for our carelessness. Based on your comments, we have corrected the manuscript. As there is indeed a statistical difference in the mean clinical scores between the shTRIM21-treated group and the shVec group, we have revised the sentence for Figure 7A to state, “At the end time point at day 22 p.i., shTRIM21-treated group showed reduced disease scores compared to control groups (Fig. 7A).”

      - The staining methods for luxury fast blue and HE are not given in materials and methods.

      According to the reviewer’s comments, we have added the staining methods for HE and LFB in materials and methods.

      - In Figure 7E, authors claim that MBP staining is low in an image, however the image covers approximately 500 um area. One would like to see the demyelinated areas in dashed lines, and also the whole area of the spinal cord sections.

      In Author response image 2, we have added the images for MBP staining of the whole area of spinal cord sections. Demyelinated areas are marked with dashed lines.

      - "TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization." should be supported by references.

      We have added two references for this sentence. Anastasiou D et al. showed that TEPP-46 acts as an activator by stabilizing subunit interactions and promoting tetramer formation of PKM2. Angiari S et al. showed that TEPP-46 prevented the nuclear transport of PKM2 by promoting its tetramerization in T cells.

      These two references are added:

      Angiari S, Runtsch MC, Sutton CE, Palsson-McDermott EM, Kelly B, Rana N, et al. Pharmacological Activation of Pyruvate Kinase M2 Inhibits CD4(+) T Cell Pathogenicity and Suppresses Autoimmunity. Cell metabolism 2020;31(2):391-405.e8.

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      - Could you explain what the prevention stage is?

      The term “prevention stage” was used to describe the administration of TEPP-46 before disease onset. To be more accurate, we have revised the phrase from “prevention stage” to “preventive treatment” as described in other references. For example, Ferrara et al. (Ferrara et al., 2020) used “preventive” and “preventive treatment” to mean administration before disease onset.

      The revised sentences are as follows: “To test the effect of TEPP-46 on the development of EAE, the “preventive treatment” (i.e, administration before disease onset) was administered. Intraperitoneal treatment with TEPP-46 at a dosage of 50 mg/kg every other day from day 0 to day 8 post-immunization with MOG35-55 resulted in decreased disease severity (Fig. S8A).”

      - In in vitro experiments, authors used DASA-58, and in vivo they used TEPP-46. What might be the reason that DASA-58 is not applied in vivo?

      The effects of DASA-58 and TEPP-46 in promoting PKM2 tetramerization have been tested and documented in vitro. Based on in vitro absorption, distribution, metabolism and excretion profiling studies, Anastasiou et al. predicted that TEPP-46 had better in vivo drug exposure than DASA-58. Moreover, TEPP-46, but not DASA-58, is pharmacokinetically validated in vivo (Anastasiou et al., 2012). Thus, we used TEPP-46 for in vivo studies.

      - Authors claim that TEPP-46 activates PKM2 and leads it its nuclear translocation, however, they did not verify PKM2 expression in the nucleus.

      To show that TEPP-46 inhibits PKM2 nuclear translocation both in vivo and in vitro, we have performed western blotting analysis and immunofluorescence staining. In vitro, TEPP-46 administration inhibited the MOGsup-induced PKM2 nuclear translocation, an effect similar to that of DASA-58 (Author response image 4). The in vivo effects of TEPP-46 were analyzed by co-immunostaining of PKM2 and GFAP. The results showed reduced nuclear staining of PKM2 in spinal cord astrocytes in TEPP-46-treated EAE mice compared with control EAE mice (Figure S7B).

      Author response image 4.

      TEPP-46 inhibited the nuclear transport of PKM2 in primary astrocytes. Nuclear-cytoplasmic protein extraction analysis showed the nuclear and cytoplasmic changes of PKM2 in TEPP-46 treated astrocytes and MOGsup-stimulated astrocytes. Primary astrocytes were pretreated with 50 μM TEPP-46 for 30 min and stimulated with MOGsup for 24 h.

      Supplementary Figure 3:

      - In Figure 3D, merge should be stated on top of the merged images, it is confusing to the reader.

      According to the reviewer’s comments, we have added merge on top of the merged images.

      Discussion:

      All results should be discussed in detail by interpreting them according to the literature.

      We have further discussed the results in the discussion section. First, we added a paragraph describing the role of nuclear translocation of PKM2 in diverse CNS diseases. Moreover, a paragraph discussing the nuclear function of PKM2 as a protein kinase or transcriptional co-activator was added. The discussion section is now more comprehensive and discusses nearly all the results in detail by interpreting them according to the literature.

      Reviewer #2 (Recommendations For The Authors):

      The authors could address the following points:

      (1) In Figure 1A, the authors present immunofluorescence staining of PKM2 in both control mice and MOG35-55-induced EAE mice across different stages of disease progression: onset, peak, and chronic stages. Observing the representative images suggests a notable increase in PKM2 levels, particularly within the nucleus of MOG35-55-induced EAE mice. However, to provide a more comprehensive analysis, it would be beneficial for the authors to include statistical data, such as average intensities ± standard deviation (SD), along with the nuclear PKM2 ratio, akin to the presentation for cultured primary astrocytes in vitro in panels B-D. Additionally, the authors should clearly specify the number of technical repeats and the total number of animals utilized for these data sets to ensure transparency and reproducibility of the findings.

      Thanks for the reviewer’s suggestion. Accordingly, for figure 1A, we have added the nuclear PKM2 ratio in astrocytes in control and different stages of EAE mice in Supplementary figure S1A. Moreover, the quantification of mean fluorescence intensity (MFI) for PKM2 was added in figure S1B. Moreover, we have added the number of animals used in each group in figure legend.

      (2) The blue hue observed in the merged images of Figure 1B (lower panel) presents a challenge for interpretation. The source of this coloration remains unclear from the provided information. Did the authors also include a co-stain for the nucleus in their imaging? To enhance clarity, especially for individuals with color vision deficiency, the authors might consider utilizing different color combinations, such as presenting PKM2 in green and GFAP in magenta, which would aid in distinguishing the two components. Furthermore, for in vitro cell analysis, incorporating a nuclear stain could provide valuable insights into estimating the cytosolic-to-nuclear ratio of PKM2.

      For the question relating to the merged images in figure 1B, PKM2 was presented in green, GFAP was presented in red and blue represents the nuclear staining by DAPI. “Merge” represents the merged images of these three colors. To enhance the clarity, we have added the images for the nuclear staining of DAPI.

      (3) To substantiate the conclusion of the authors regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes, employing supplementary methodologies such as high-resolution respirometry and metabolomics could offer valuable insights. These techniques would provide a more comprehensive understanding of metabolic alterations and further validate the observed changes in glycolytic activity.

      While we recognize the merits of techniques such as high-resolution respirometry and metabolomics, we believe that the conclusions regarding the enhancement of aerobic glycolysis due to PKM2 expression and nuclear translocation in MOGsup-stimulated astrocytes are sufficiently supported by the current experimental evidence. Our study relied on a robust set of experiments, including lactate production, glucose consumption, cyto-nuclear localization analysis and western blotting analysis of key enzymes in glycolysis. These results, in conjunction with the literature on the role of PKM2 in various cancer cells, keratinocytes and immune cells, provide a strong foundation for our conclusions. Although metabolomics could offer a global view of the changes in metabolic states in astrocytes, the end product of aerobic glycolysis is lactate, so our approach of analyzing the change in lactate levels under different experimental conditions might be more direct. However, we fully acknowledge that future studies employing these advanced methodologies could provide further insights into the precise mechanisms underlying PKM2's effects on aerobic glycolysis.

      (4) Minor: Why is the style of the columns different in Fig 2 panel D compared to those shown in panels B, C, and G of Figure 2?

      To maintain consistency in the column style across figure 2, we have updated the column in figure 2D. Now, we use same style of columns in Fig 2B, C, D and G.

      (5) The effect of stimulating astrocytes with MOGsup on cell proliferation, as shown in Figure 2E, is very moderate. Does DASA-58 reduce the proliferation of control cells in this assay?

      In response to the reviewer’s questions, we conducted a CCK8 analysis in astrocytes subjected to DASA-58 treatment. As depicted in Author response image 5, administration of DASA-58 did not reduce the proliferation of control cells. This result aligns with our other findings in the glycolysis assays and EdU analysis, where there is no statistical difference between control group and DASA-58-treated group. One plausible explanation for this is that in their steady state, astrocytes in the control group are not in a hyperproliferative state. Under such conditions, inhibiting the translocation of PKM2 via DASA-58 or other inhibitors did not significantly affect the proliferation of astrocytes.

      Author response image 5.

      CCK8 analysis of astrocyte proliferation. Primary astrocytes were pretreated with 50 μM DASA-58 for 30 min before stimulation with MOGsup. Data are represented as mean ± SEM. ***P<0.001. SEM, standard error of the mean.
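      For readers less familiar with the "mean ± SEM" convention used in this legend, a minimal Python sketch of the calculation follows. The function name and the example readings are hypothetical and are not data from the study; the sketch assumes the SEM is the sample standard deviation divided by the square root of the number of replicates.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of replicate readings."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Illustrative absorbance readings only; not values from the CCK8 assay.
example_mean, example_sem = mean_and_sem([1.0, 2.0, 3.0])
```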

      (6) The tables and lists in Figure 4, panels A-D, are notably small, hindering readability and comprehension. Consider relocating these components to the supplementary materials as larger versions.

      We have updated the tables and lists and made the lines thicker. As suggested by the reviewer, we have relocated these components to Supplementary Figure S5.

      Reviewer #3 (Recommendations For The Authors):

      Higher magnification images that more clearly show nuclear translocation of PKM2 and pp65 and pSTAT3 immunoreactivity should be added to the figure panels, for example as insets.

      Thank you for pointing out this issue in the manuscript. According to the reviewer’s comments, we have included higher magnification images as insets for Figure 3A, Figure 3B and Figure 2A. These enlarged images now provide a clearer visualization of the nuclear translocation state of PKM2, pp65, and pSTAT3.

      There are seldom wording errors like features => feathers at line 364.

      We are very sorry for our incorrect writing. We have corrected this spelling mistake in the manuscript.

      Reviewer #4 (Recommendations For The Authors):

      Here below are major and minor concerns on the data presented:

      (1) It is not clear from the Methods section what are the culture conditions defined as 'control' in Figure 1B-D. I believe the control should be culturing with the conditioned medium of normal (non-EAE) mice splenocytes to be sure the effect is not from cytokines naturally secreted by these cells.

      We thank the reviewer for the comments and fully understand the concern. The control refers to non-treated primary astrocytes cultured in traditional DMEM medium supplemented with 10% FBS. We have performed experiments to exclude the possibility that the observed effect of MOGsup on the activation of astrocytes comes from cytokines secreted by splenocytes. Splenocytes from normal (non-EAE) mice were isolated and cultured in RPMI-1640 medium containing 10% FBS for 60 hours, and the supernatant was collected. Immunofluorescence staining of PKM2 and GFAP was performed in non-treated primary astrocytes and in astrocytes stimulated with supernatant from control splenocytes. As shown in Figure S1C, no difference in PKM2 expression or localization was observed between the two groups; PKM2 was located mainly in the cytoplasm in these conditions. These results indicate that the observed effect on PKM2 in the MOGsup-stimulated condition is not due to cytokines secreted from splenocytes. Thus, we used non-treated primary astrocytes as controls in our study. To clarify the control group, we have revised the description in the figure legend. The revised expression is as follows: “Immunofluorescence staining of PKM2 (green) with GFAP (red) in non-treated primary astrocytes (control) or primary astrocytes cultured with splenocytes supernatants of MOG35–55-induced EAE mice (MOGsup) for different time points (6 h, 12 h and 24 h).”

      (2) Figure 3D: the presence of PMK2 in the nuclear fraction upon MOGSUP together with the DASA-58 (last lane of Figure 3D) is not supporting the hypothesis proposed and further may indicate that the reduction of pSTAT3, pp65, etc. observed is independent of PMK2 nuclear translocation/astrocyte activation being observed even in absence of MOGSUP.

      Thank you for pointing out this problem in the manuscript. The change in the nuclear level of PKM2 is not obvious in the representative image in Figure 3D, which understandably raised the reviewer's doubts. To strengthen our conclusion that the reduction of the STAT3 and p65 pathways is related to the decrease in nuclear PKM2 induced by DASA-58, the nuclear PKM2 level was quantified and added in Figure S4B. The quantification shows that DASA-58 administration decreased the nuclear level of PKM2 in MOGsup-stimulated astrocytes. To address this concern, we have updated the immunoblot image for PKM2 in Figure 3D and incorporated the quantification results in Supplementary Figure S4.

      (3) Molecular docking indication and deletion co-immunoprecipitation reported in Figure 4 data are not concordant on TRIM21: N-terminal Phe23 and Thr87 (Figure 4E) predicted by MD to bind PMK2 are not in the PRY-SPRY domain suggested by the co-IP experiment (Figure 4I).

      The discrepancy between the molecular docking prediction and the co-immunoprecipitation can be explained as follows:

      First, molecular docking is a computational method that predicts protein-protein interactions based on the 3D structures of the proteins. However, the accuracy of this prediction can be influenced by the choice of 3D structural models of TRIM21 and PKM2, as well as by factors such as post-translational modifications and the flexibility of the proteins. Proteins in vivo are subject to post-translational modifications that can affect their interactions, and these modifications are not fully captured in molecular docking analysis. For example, in our analysis, the predicted N-terminal Phe23 and Thr87 in TRIM21 have the potential to interact with PKM2 through hydrogen bonds. However, such binding can be influenced by the biological environment, such as the cell type and pathological condition. Molecular docking may suggest the specific residues and binding pocket within the protein complex; however, its accuracy should be verified by experimental techniques such as immunoprecipitation. To reflect that the molecular docking results are predictions, the description has been revised as follows: “TRIM21 is predicted to bound to PKM2 via hydrogen bonds between the amino acids of the two molecules.”

      Co-immunoprecipitation with truncated domains of TRIM21 and PKM2 is an experimental technique that relies on the specific interaction between an antibody and its target protein. This technique can provide insights into the precise binding domains between TRIM21 and PKM2. As demonstrated in our study, the PRY-SPRY domain of TRIM21 is involved in this binding. In summary, while molecular docking and co-IP are valuable tools for studying protein-protein interactions, their differing focus and limitations may result in discrepancies between the predicted interaction sites and the experimentally identified interaction domains.

      (4) The Authors state that PMK2 is a substrate of TRIM21 E3 ligase activity, however, this is not proved: i) interaction does not imply a ligase-substrate relationship; ii) the ubiquitination shown in Figure 6C is not performed in denaturing conditions thus the K63-Ub antibody can detect also interacting FLAG-IPed proteins (besides, only a single strong band is seen, not a chain; molecular weights in immunoblot should be indicated); iii) use of a catalytically inactive TRIM21 would be required as well.

      We appreciate the reviewer’s comments regarding the limitations of the immunoprecipitation and K63-antibody test, which could not lead to the conclusion that PKM2 is a substrate of TRIM21. To avoid any misunderstandings, we have revised the relevant sentence from “Hereby, we recognized PKM2 as a substrate of TRIM21” to “Hereby, we recognized PKM2 as an interacting protein of TRIM21, and further studies are required to determine if it is a substrate of E3 ligase TRIM21”. Moreover, we have revised the title of the relevant part in the results section, the previous title, “TRIM21 ubiquitylates and promotes the nuclear translocation of PKM2” has been replaced with “TRIM21 promotes ubiquitylation and the nuclear translocation of PKM2”. Moreover, molecular weights for all proteins in western blotting were indicated.

      (5) As above, molecular weights should always be indicated in immunoblot.

      Thanks for pointing out this problem in the figures. Accordingly, we have added the molecular weights for every protein tested in immunoblot.

      (6) The authors should describe the EAE mouse model in the text and in the material and methods as it may not be so well known to the entire reader audience, and the basic principle of MOG35-55 stimulation, in order to understand the experimental plan meaning.

      We appreciate the reviewer’s comments highlighting the importance of clarifying EAE model for a broader understanding of the reader audience. In response, we have described the EAE model both in the text and in the materials and methods section. In the text, the description of EAE model was added at the beginning of the first paragraph in the Results section. The description is as follows: “EAE is widely used as a mouse model of multiple sclerosis, which is typically induced by active immunization with different myelin-derived antigens along with adjuvants such as pertussis toxin (PTX). One widely used antigen is the myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide (Nitsch et al., 2021), which was adopted in our current studies.”

      We have also added the detailed experimental procedures for EAE induction in the materials and methods section.

      (7) The authors should better explain and give the rationale for the use of splenocytes and why directly activated astrocytes (isolated from the EAE model) cannot be employed to confirm/prove some of the presented data.

      Firstly, splenocytes offer a heterogenous cell population, encompassing T cells and antigen presenting cells (APC), which may better mimic the microenvironment and complex immune responses observed in vivo.

      Myelin oligodendrocyte glycoprotein (MOG) 35-55 peptide is a widely used antigen for EAE induction. MOG35-55 elicits strong T cell responses and is highly encephalitogenic. Moreover, MOG35-55 induces a T cell-mediated phenotype of multiple sclerosis in animal models. Thus, by isolating splenocytes, which contain APCs and effector T cells, from EAE mice at the onset stage and stimulating them with the antigen MOG35-55 in vitro for 60 hours, the T cell response in the acute stage of EAE can be mimicked in vitro. The supernatant from MOG35-55-stimulated splenocytes has high levels of IFN-γ and IL-17A, which in part mimics the pathological process and environment in EAE, and this technique has been documented in the literature (Chen et al., 2009, Kozela et al., 2015).

      Correspondingly, we have revised the sentence on the use of MOG35-55-stimulated splenocytes from EAE mice and added the relevant references: “Supernatant of MOG35-55-stimulated splenocytes isolated from EAE mice were previously shown to elicit a T-cell response in the acute stage of EAE and are frequently used as an in vitro autoimmune model to investigate MS and EAE pathophysiology (Chen et al., 2009, Du et al., 2019, Kozela et al., 2015).”

      Second, activated astrocytes (isolated from the EAE model) cannot be employed for in vitro culture for the following reasons:

      (1) Low cell viability. Compared to embryonic or neonatal mice, adult mice yield a limited number of viable cells. This is mainly because adult tissues have less proliferative capacity.

      (2) Disease changes. Astrocytes in EAE mice are exposed to a microenvironment that includes inflammatory cytokines, antigens and other pathological factors. Without this environment, the function and morphology of astrocytes change, which makes it difficult to interpret the results in vitro.

      For these reasons, the primary astrocytes cultured in vitro were isolated from neonatal mice.

      (8) The authors should indicate the phosphorylation sites they are referring to when analysing p-c-myc, pSTAT3, pp65, etc...

      According to the reviewer’s suggestions, we have added the phosphorylation sites for pSTAT3 (Y705), pp65 (S536), p-c-myc (S62) and pIKK (S176+S180) in the figure panels.

      (9) Reference of DASA-58 and TEPP-46 inhibitors and their specificity should be given.

      According to the reviewer’s comments, we have added the relevant references for the use of DASA-58 and TEPP-46 as inhibitors of PKM2 nuclear transport. In primary BMDMs, LPS induced nuclear PKM2. However, driving PKM2 into tetramers using DASA-58 and TEPP-46 inhibited LPS-induced PKM2 nuclear translocation (Palsson-McDermott et al., 2015). Consistently, FSTL1 induced PKM2 nuclear translocation was inhibited by DASA-58 in BMDMs (Rao et al., 2022). Accordingly, we have added these references in the manuscript.

      To address the selectivity of TEPP-46 and add the references, the relevant sentence has been revised from “TEPP-46 is an allosteric activator that blocks the nuclear translocation of PKM2 by promoting its tetramerization” to “TEPP-46 is a selective allosteric activator for PKM2, showing little or no effect on other pyruvate isoforms. It promotes the tetramerization of PKM2, thereby diminishing its nuclear translocation (Anastasiou et al., 2012, Angiari et al., 2020).”

      Reviewing Editor (Recommendations For The Authors):

      The reviewing editor would appreciate it if the original blots from the western blot analysis, which were used to generate the final figures, could be provided.

      Thanks for the reviewing editor’s comment, accordingly, we will add the original blots for the western blots analysis.

      References

      Anastasiou D, Yu Y, Israelsen WJ, Jiang JK, Boxer MB, Hong BS, et al. Pyruvate kinase M2 activators promote tetramer formation and suppress tumorigenesis. Nature chemical biology 2012;8(10):839-47.

      Escartin C, Guillemaud O, Carrillo-de Sauvage M-A. Questions and (some) answers on reactive astrocytes. Glia 2019;67(12):2221-47.

      Ferrara G, Benzi A, Sturla L, Marubbi D, Frumento D, Spinelli S, et al. Sirt6 inhibition delays the onset of experimental autoimmune encephalomyelitis by reducing dendritic cell migration. Journal of neuroinflammation 2020;17(1):228.

      Lin CC, Edelson BT. New Insights into the Role of IL-1β in Experimental Autoimmune Encephalomyelitis and Multiple Sclerosis. Journal of immunology (Baltimore, Md : 1950) 2017;198(12):4553-60.

      Palsson-McDermott Eva M, Curtis Anne M, Goel G, Lauterbach Mario AR, Sheedy Frederick J, Gleeson Laura E, et al. Pyruvate Kinase M2 Regulates Hif-1α Activity and IL-1β Induction and Is a Critical Determinant of the Warburg Effect in LPS-Activated Macrophages. Cell metabolism 2015;21(1):65-80.

      Rao J, Wang H, Ni M, Wang Z, Wang Z, Wei S, et al. FSTL1 promotes liver fibrosis by reprogramming macrophage function through modulating the intracellular function of PKM2. Gut 2022;71(12):2539-50.

      Wheeler MA, Clark IC, Tjon EC, Li Z, Zandee SEJ, Couturier CP, et al. MAFG-driven astrocytes promote CNS inflammation. Nature 2020;578(7796):593-9.

      Zhang J, Feng G, Bao G, Xu G, Sun Y, Li W, et al. Nuclear translocation of PKM2 modulates astrocyte proliferation via p27 and β-catenin pathway after spinal cord injury. Cell Cycle 2015;14(16):2609-18.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewers for their constructive feedback. We have revised our manuscript to address some important concerns. The main changes are summarized as follows:

      (1) A major concern, as reflected in the eLife assessment and reviewer comments, was that the “evidence supporting the conclusion that striatal neurons encode single-limb gait is incomplete.” We have now provided an expanded analysis of gait phase-locking to different limbs in Figure 2 – figure supplement 1. The analysis reveals three key new insights: 1) most striatal neurons are significantly entrained to only one or two limbs; 2) for neurons entrained to two limbs, most limb pairs are diagonal pairs, whose phases are closely aligned; 3) the strength of phase-locking, as measured by the mean vector length, is biased toward a single limb. From these results we conclude that striatal neurons are indeed better correlated with single-limb (as opposed to multiple-limb) gait. However, we speculate that because of the inherently correlated motion across limbs, some neurons also display significant phase-locking to multiple limbs, particularly to diagonal pairs.

      (2) Reviewer 2 noted the lack of a manipulation experiment that would help establish the striatum’s relationship to gait control. We have therefore included new experimental data in Figure 6 – figure supplement 2, in which we show that optogenetically activating D2 MSNs alters some measures of both whole-body motion and single-limb gait. We recognize that these experiments are not ideal; for example, the optical stimulation was not entrained to limb phase. Nevertheless, they hopefully allay any concern that the striatum is incapable of influencing gait performance.

      (3) We have further characterized the relationship between vector length and firing rate, and firing rate between D1 and D2 MSNs. We now show that: 1) vector length is negatively correlated with session-wide firing rate (Figure 2 – figure supplement 1E); 2) session-wide firing rates are similar between D1 and D2 MSNs in both healthy and dopamine lesioned animals (Figure 4D and Figure 6H). Thus, the imbalance in the vector length between D1 and D2 MSNs following dopamine lesions is unlikely to be explained by changes in the overall firing rates of these cells.

      (4) We have added new data similar to Figure 1 with distributions of stride frequency, duration, and length to illustrate the difference between sham and 6OHDA mice (Figure 5 – figure supplement 1B,C).

      (5) We have expanded the Discussion section to discuss a number of important points raised by the reviewers. These include: 1) speculating on the origins of gait coding in the striatum; 2) discussion of some literature which reported similar levels of D1/D2 MSN start coding in contrast to our results in healthy mice; 3) discussion of the finding that almost all phase-locked cells also have a firing rate related to speed or start/stop signals; 4) discussion of one of the limitations of the unilateral 6OHDA model, namely, the strong turning bias, and its potential implications for our results.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Yang et al combine high-speed video tracking of the limbs of freely moving mice with in vivo electrophysiology to demonstrate how striatal neurons encode single-limb gait. They also examine encoding other well-known aspects of locomotion, such as movement velocity and the initiation/termination of movement. The authors show that striatal neurons exhibit rhythmic firing phase-locked with mouse gait, while mice engage in spontaneous locomotion in an open field arena. Moreover, they describe gait deficits induced by severe unilateral dopamine neuron degeneration and associate these deficits with a relative strengthening of gait-modulation in the firing of D2-expressing MSNs. Although the source and function of this gait-modulation remain unclear, this manuscript uncovers an important physiological correlate of striatal activity with gait, which may have implications for gait deficits in Parkinson's Disease.

      Strengths:

      While some previous work has looked at the encoding of gait variables in the striatum and other basal ganglia nuclei, this paper uses more careful quantification of gait with video tracking. In addition, few if any papers do this in combination with optically-labeled recordings as were performed here.

      Weaknesses:

      The data collected has a great richness at the physiological and behavioral levels, and this is not fully described or explored in the manuscript. Additional analysis and display of data would greatly expand the interest and interpretability of the findings.

      There are also some caveats to the interpretation of the analyses presented here, including how to compare encoding of gait variables when animals have markedly different behaviors (eg comparing sham and unilaterally 6-OHDA treated mice), or how to interpret the loss of gait modulation when single unit activity is overall very low.

      (1) The authors use circular analysis to quantify the degree to which striatal neurons are phase-locked to individual limbs during gait. The result of this analysis is shown as the proportion of units phase-locked to each limb, vector length, and vector angle (Fig 2H-K; Fig 4E-F; Fig 6E-F). Given that gait is a cyclic oscillation of the trajectories of all four limbs, one could expect that if one unit is phase-locked to one limb, it will also be phase-locked to the other three limbs but at a different phase. Therefore, it is not clear in the manuscript how the authors determine to which limb each unit is locked, and how some units are locked to more than one limb (Fig 2H). More methodological/analytical detail would be especially helpful.

      We thank the reviewer for raising this important issue, which was not sufficiently explored in our original manuscript. This relates to a major concern that “evidence supporting the conclusion that striatal neurons encode single-limb gait is incomplete.” We have now prepared a new figure supplement to address whether neurons are preferentially entrained to only one or multiple limbs (Figure 2 – figure supplement 1, panels A-C).

      Author response image 1.

      Panels A-C. Phase-locking to different limbs.

      Panel A shows the percentage of striatal neurons (all neurons including untagged cells) with significant phase-locking to only 1, 2, 3, or all 4 limbs. The results indicate that most phase-locked cells are entrained to either only 1, or only 2 limbs, as opposed to 3 or all 4 limbs. We next looked more closely at the cells which were entrained to only 2 limbs: Panel B shows that a significant majority of those cells were coupled to diagonal limb pairs. This finding is insightful because diagonal limb pairs move at nearly the same phase during walking, thus some overlap in phase-locking to these limbs is to be expected. Finally, Panel C shows the mean vector length per neuron ranked from the highest to lowest value. The results reveal that the vector length is significantly biased toward the highest ranked limb. This bias would be absent if neurons were entrained to all 4 limbs with similar strength. Together, these results support the conclusion that striatal neuron spiking is preferentially coupled to single limbs as opposed to multiple limbs. However, we speculate that because of the inherently correlated motion across limbs, some neurons also display significant phase-locking to multiple limbs, particularly to diagonal pairs.
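
      As a rough illustration of the quantities discussed above, the following Python sketch computes a mean vector length per limb, counts the limbs with significant phase-locking, and ranks the vector lengths from highest to lowest. It assumes that spike phases (in radians) have already been extracted for each limb. The Rayleigh test, the 0.05 threshold, and all function names are assumptions made for illustration and are not necessarily the procedure used in the manuscript.

```python
import numpy as np

def mean_vector_length(phases):
    """Return (length, angle) of the mean resultant vector of spike phases.

    Length is 0 for uniformly spread phases and 1 for perfect phase-locking.
    """
    phases = np.asarray(phases, dtype=float)
    resultant = np.mean(np.exp(1j * phases))
    return float(np.abs(resultant)), float(np.angle(resultant))

def rayleigh_p(phases):
    """Approximate Rayleigh test p-value for non-uniformity of phases.

    Uses the closed-form approximation given by Zar (Biostatistical Analysis).
    """
    n = len(phases)
    r, _ = mean_vector_length(phases)
    big_r = n * r
    p = np.exp(np.sqrt(1 + 4 * n + 4 * (n**2 - big_r**2)) - (1 + 2 * n))
    return float(min(p, 1.0))

def limb_locking_summary(phases_by_limb, alpha=0.05):
    """List significantly locked limbs and rank vector lengths (highest first).

    phases_by_limb maps a limb label (hypothetical, e.g. "LF") to spike phases.
    """
    lengths = {limb: mean_vector_length(ph)[0] for limb, ph in phases_by_limb.items()}
    locked = [limb for limb, ph in phases_by_limb.items() if rayleigh_p(ph) < alpha]
    ranked = sorted(lengths.values(), reverse=True)
    return locked, ranked
```

      Under these assumptions, a neuron entrained to a single limb would return one locked limb and a ranked list dominated by its first entry, which is the pattern summarized in Panels A and C.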

      (2) In Figures 2 and 3, the authors describe the modulation of striatal neurons by gait, velocity, and movement transitions (start/end), with most of their examples showing firing rates compatible with rates typical of striatal interneurons, not MSNs. In order to have a complete picture of the relationship between striatal activity and gait, a cell type-specific analysis should be performed. This could be achieved by classifying units into putative MSN, FS interneurons, and TANs using a spike waveform-based unit classification, as has been done in other papers using striatal single-unit electrophysiology. An example of each cell type's modulation with gait, as well as summary data on the % modulation, would be especially helpful.

      We appreciate the reviewer’s suggestion to analyze our data after classifying units into different putative cell types (MSN, FSI, TAN). Indeed, we have frequently adopted this practice in our other publications (e.g., Bakhurin & Masmanidis 2016, 2017; Lee & Masmanidis 2019). However, this study already relies on a more rigorous method – optogenetic tagging – to identify D1 and D2 MSNs. We felt that adding a second, more subjective and therefore less rigorous identification method based on spike waveforms would add unnecessary confusion in how the results are presented and interpreted. For example, we were unsure how to address the situation where an opto-tagged D1 or D2 MSN may be classified as a putative FSI or TAN according to spike waveform criteria. For this reason, we decided not to perform an analysis by putative MSN, FSI, and TAN. Finally, we have made all our electrophysiological data available should someone want to perform this analysis themselves.

      (3) By normalizing limb trajectories to the nose-tail axis, the analysis ignores whether the mouse is walking straight, or making left/right turns. Is the gait-modulation of striatal activity shaped by ipsi- and contralateral turning? This would be especially important to understand changes in the unilateral disease model, given the imbalance in turning of 6-OHDA mice.

      This is an important question, which our data are unfortunately underpowered to address. Lesioned mice turn sharply for nearly the entire duration of walking, while healthy mice walk in a nearly straight line, with occasional brief turning bouts. Thus, we do not have sufficient stride numbers during healthy turning to enable a rigorous analysis of gait phase locking during left/right turns. This raises some questions about the interpretation of the higher D2 MSN vector length in dopamine lesioned mice – does the higher vector length relate to the impaired gait, or the higher incidence of turning in this PD model? We have acknowledged this issue in the Discussion section as a limitation of the unilateral 6OHDA model. In future work, we hope to investigate turning effects in more detail using behavioral arenas which force animals to turn left or right at specific locations.

      (4) It looks like the data presented in Figure 4 D-F comes from all opto-identified D1- and D2-MSNs. How many of these are gait-modulated? This information is missing (line 110). Pooling all units may dilute differences specific to gait-modulated units, therefore a similar analysis only on gait-modulated units should be performed.

      The reviewer is correct that the data presented in Figure 4 comes from all optogenetically tagged cells. We have now included a new panel, Figure 4H, which shows the proportion of D1 and D2 MSNs which encode limb phase, body speed, or start/stop. The reviewer suggested that a similar analysis restricted to gait-modulated units should be performed. We prefer to stick to our current approach (of using all cells, regardless of whether they show significant gait modulation) because it is less biased. For example, even cells which do not pass our threshold for statistical significance may display weak but visible gait modulation.

      (5) Since 6-OHDA lesions are on the right hemisphere, we would expect left limbs to be more affected than right limbs (although right limbs may also compensate). It is therefore surprising that RF and RR strides seem slightly shorter than LF and LR (Fig 5G), and no differences in other stride parameters (Fig 5H-J). Could the authors comment on that? It may be that this is due to rotational behavior. One interesting analysis would be to compare activity during similar movements in healthy and 6-OHDA mice, eg epochs in which mice are turning right (which should be present in both groups) or walking a few steps straight ahead (which are probably also present in both groups).

      Unilateral 6OHDA lesions are associated with ipsiversive turning (in this case, toward the right). The reviewer noted that the stride length is shorter for the two right compared to the two left limbs (Figure 5G), which is consistent with a right turning bias. In line with this observation, the stride speed for the right limbs also seemed slower than for the left limbs (Figure 5I), though we agree this is a bit difficult to see in the plot due to the choice of y-axis range. We appreciate the reviewer’s suggestion to analyze activity during similar movements in healthy and lesioned mice. As discussed in reply to their third comment above, our data did not contain sufficient bouts of straight walking in lesioned mice, or turning in healthy mice, to make such analysis possible. We have acknowledged this issue in the Discussion section as a limitation of the unilateral 6OHDA model. In future work, we hope to investigate turning effects in more detail using behavioral arenas which force animals to turn left or right at specific locations.

      (6) Multiple publications have shown that firing rates of D1-MSN and D2-MSN are dramatically changed after dopamine neuron loss. Is it possible that changes observed in gait-modulation might be biased by changes in firing rates? For example, dMSNs have exceptionally low overall activity levels after dopamine depletion (eg Parker...Schnitzer, 2018; Ryan...Nelson, 2018; Maltese...Tritsch, 2021); this might reduce the ability to detect modulation in the firing of dMSNs as compared to iMSNs, which have similar or increased levels of activity in dopamine depleted mice. Does vector length correlate with firing rate? In addition, the normalization method used (dividing firing rate by minimum) may amplify very small changes in absolute rates, given that the firing rates for MSN are very low. The authors could show absolute values or Z-score firing rates (Figure 6 A, D).

      The reviewer asked a number of important questions here. First, is it possible that changes in gait modulation are biased by changes in firing rates? We have included a new analysis comparing the average session-wide firing rate of D1 and D2 MSNs (Figure 6D & 6H). This showed that firing rates were statistically similar between D1 and D2 MSNs for both sham and dopamine lesioned mice. Thus, it seems unlikely that the imbalance in vector length is purely due to changes in firing rate. The reviewer referenced some literature (e.g. Parker & Schnitzer; Ryan & Nelson; Maltese & Tritsch) which does appear to show significant changes in the relative firing levels of D1/D2 MSNs after dopamine lesions. While we can only speculate about the reason for the discrepancy (e.g., differences in measurement method, behavioral task, or analysis method), we note that not all prior literature has reported such changes (e.g., Ketzef & Silberberg 2017).

      Author response image 2.

      Panels D & H. No difference in firing between D1 and D2 MSNs.

      Second, does vector length correlate with firing rate? Interestingly, we found that indeed it does. We now show that vector length is negatively correlated with firing rate (Figure 2 – figure supplement 1E), implying that cells with higher overall firing rates tend to have weaker phase-locking to the gait cycle. Though not shown in the manuscript, we found a similar negative correlation for D1 and D2 MSNs in both healthy and dopamine lesioned mice.

      Author response image 3.

      Panel E. Vector length is negatively correlated with firing rate.
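
      For readers who wish to run a comparable check on the shared data, the sketch below computes a rank correlation between per-neuron firing rate and vector length. The choice of a Spearman-style rank correlation, the simplified tie handling, and the function names are assumptions for illustration; the statistic used in the manuscript is not specified in this response.

```python
import numpy as np

def rank(values):
    """Ranks starting at 0; ties are broken by order of appearance (a simplification)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])
```

      A negative value of `spearman_rho(firing_rates, vector_lengths)` would correspond to the relationship described above, in which cells with higher firing rates show weaker phase-locking.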

      Third, the reviewer asked about our normalization method in Figure 6A etc, in which we divide by the minimum rate. We would like to clarify that this normalization method was only used for visualizing our data, but not for calculating the vector length. Therefore, we chose to leave the plots as they are.
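
      The distinction between a display normalization and the phase-locking measure can be illustrated with a short sketch. It assumes that firing rate has been binned by gait phase and, purely for illustration, computes a rate-weighted vector length from those bins; the manuscript may instead compute vector length directly from spike phases. All function names are hypothetical.

```python
import numpy as np

def min_normalize(rates):
    """Divide a phase-binned firing rate by its minimum (display normalization)."""
    rates = np.asarray(rates, dtype=float)
    lowest = rates.min()
    if lowest <= 0:
        raise ValueError("minimum rate must be positive to divide by it")
    return rates / lowest

def zscore(rates):
    """Alternative display normalization: zero mean, unit standard deviation."""
    rates = np.asarray(rates, dtype=float)
    return (rates - rates.mean()) / rates.std()

def binned_vector_length(rates, bin_centers):
    """Rate-weighted mean vector length over phase bins (centers in radians)."""
    rates = np.asarray(rates, dtype=float)
    weights = rates / rates.sum()
    return float(np.abs(np.sum(weights * np.exp(1j * np.asarray(bin_centers)))))
```

      Because dividing by the minimum only rescales the rates, the weighted vector length in this sketch is the same before and after that normalization, even though the plotted fold-change can look large for low-rate cells.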

      (7) The analysis shown in Fig 3C should also be done for opto-identified D1- and D2-MSNs (and for waveform-based classified units as noted above).

      We have now performed the same analysis for optogenetically tagged D1 and D2 MSNs from healthy mice (Figure 4H). As with our original analysis, both populations showed a similar proportion of neurons which encoded limb phase, start of movement, body speed, and the combination of these. We did not perform this analysis for waveform-based classified units as per our reason outlined in reply to the reviewer’s second comment above.

      Author response image 4.

      Panel H. Venn diagrams showing the percentage of D1 and D2 MSNs with significant responses to limb phase of at least one limb, body speed, and start and/or stop of motion.

      (8) Discussion: the origin of the gait-modulation as well as the possible mechanisms driving the alterations observed in 6-OHDA mice should be discussed in more detail.

      Our Discussion section includes the following paragraph speculating on the origin of gait modulation: “Movement-related neural activity is widespread in many brain areas, and it is plausible that the striatum receives both motor and sensory signals involved in gait generation. For example, the primary motor cortex, which projects to dorsal striatum, has been shown to exhibit rhythmic spiking activity consistent with gait phase coding (Armstrong & Drew 1984), suggesting a shared mechanism underlying the production of this code.” We appreciate the request to also discuss the possible mechanisms driving the alterations in 6OHDA mice. But this is a very complex topic which our study is not aimed at addressing. The range of possible mechanisms uncovered in the literature is vast – from synaptic changes in striatal microcircuits, to altered intrinsic excitability of D1/D2 MSNs, and network-level alterations. Therefore, we preferred to keep the discussion focused on gait and movement coding.

      Reviewer #2 (Public Review):

      Summary:

      Yang et al. recorded the activity of D1- and D2-MSNs in the dorsal striatum and analyzed their firing activity in relation to single-limb gait in normal and 6-OHDA lesioned mice. Although some of the observations of striatal encoding are interesting, the novelty and implications of this firing activity in relation to gait behavior remain unclear. More specifically, the authors made two major claims. First, the striatal D1- and D2-MSNs were phase-locked to the walking gait cycles of individual limbs. Second, dopamine lesions led to enhanced phase-locking between D2-MSN activity and walking gait cycles. The second claim was supported by the increase of vector length in D2-MSNs after unilateral 6-OHDA administration to the medial forebrain bundle. However, for the first claim, the authors failed to convincingly demonstrate that striatal MSNs were more phase-locked to gait with single-limb and step resolution than to the global gait cycles.

      We thank the reviewer for their feedback and for their comment that “the authors failed to convincingly demonstrate that striatal MSNs were more phase-locked to gait with single-limb and step resolution than to the global gait cycles.” We now present new analysis demonstrating that neurons are more phase-locked to single-limb gait rather than multiple limbs (Figure 2 – figure supplement 1, panels A-C). These results are discussed in detail in response to Reviewer #1’s first comment. For conciseness we will not repeat the same response here but instead refer the reviewer to Reviewer #1, comment #1.

      Strengths:

      It is a technically advanced study.

      Weaknesses:

      (1) The authors focused on striatal encoding of gait information in current studies. However, it remains unclear whether the part of the striatum for which the authors performed neuronal recording is really responsible for or contributing to gait control. A lesion or manipulation experiment disrupting the part of the striatum recorded seems a necessary step to test or establish its relationship to gait control.

      We agree that our study – like many others which employ recordings – is largely correlative, and that a direct causal relationship was lacking. We have therefore decided to present some data which, despite some caveats, shows that the striatum is in principle capable of altering gait performance (Figure 6 – figure supplement 2).

      Author response image 5.

      Optogenetic activation of D2 MSNs alters whole-body movement and single-limb gait.

      These new results are from healthy mice (n=4) receiving optogenetic stimulation of D2 MSNs over a 5 minute period. Panels A-E show changes in a variety of whole-body measures of motion, mostly replicating the results of Kravitz & Kreitzer 2010. Panels F-I show changes (statistically significant or trending) in a variety of gait parameters, with the greatest effects found on the single-limb stride duration and stride speed. Interestingly, Kravitz & Kreitzer 2010 actually examined effects of this stimulation on gait; quoting from their paper: “we examined gait parameters in D1-ChR2 and D2-ChR2 mice in response to illumination, using a treadmill equipped with a high-speed camera. We quantified multiple gait parameters with the laser on and off, and found no significant differences in the average or variance of stride length, stance width, stride frequency, stance duration, swing duration, paw angle and paw area on belt for either line….This indicates that activation of direct and indirect pathways in the dorsomedial striatum regulates the pattern of motor activity, without changing the coordination of ambulation itself.” We wonder therefore if the reviewer’s comment about causality may have stemmed from the negative result in Kravitz & Kreitzer 2010. In any event, we now present results which firmly show a link between striatal D2 MSNs and gait. To be clear, we are not claiming that Kravitz & Kreitzer’s study was fundamentally flawed, but that perhaps their ability to resolve gait changes using a commercial treadmill system, or their choice of dorsomedial as opposed to more lateral regions of the striatum may have contributed to the negative result.

      It is also important to acknowledge a limitation of our optogenetic stimulation experiment. Our optical stimulation was not phase-locked to the gait cycle; thus, technically, we did not address whether the phase code per se is involved in producing gait. We mention this caveat in the manuscript. Despite this, we believe the new data address the reviewer’s concern about lack of causality.

      (2) The authors attributed one of the major novelties to phase-locking of striatal neural activities with single-limb gait cycles. The claim was not clearly supported, as the authors did not demonstrate that phase-locking to single-limb gaits was more significant than phase-locking to global walking gait cycles. In rhythmic walking, the LR and RF limbs were roughly anti-phase with the LF and RR limbs (Fig. 1D, E). In line with this relationship, striatal neurons were mainly in-phase with LR and RF limbs and anti-phase with LF and RR limbs (Fig. 2J, K). One could instead interpret this as the striatal neurons spanned all the phases of the global walking gait cycles (Fig. 3D). To demonstrate phase-locking with individual limb movements, the authors need to show that neural activities were better correlated with a specific limb than to the global gait cycles.

      We sincerely appreciate the reviewer’s comment. As described above we now present new analysis demonstrating that neurons are more phase-locked to single-limb gait rather than multiple limbs (Figure 2 – figure supplement 1, panels A-C). These results are discussed in detail in response to Reviewer #1’s first comment. For conciseness we will not repeat the same response here but instead refer the reviewer to Reviewer #1, comment #1.

      (3) The observation of the enhancement of coupling between D2 MSN firing and the gait cycles was interesting, but the physiological interpretation was not clear (as the authors also noted in the Discussion), which hampers the significance of the observation.

      In the Discussion we comment on the potential behavioral significance of our findings, keeping in mind the reviewer’s earlier concern about the correlative nature of recordings. For example, we speculate that the increase in D2 MSN limb phase-locking strength contributes to bradykinetic symptoms, specifically the production and maintenance of a normal gait cycle and rhythm. We respectfully disagree with the reviewer about the limited significance of the observations, as this is the first study to describe striatal gait phase coding in detail, noting that gait impairments are a major motor symptom in PD. We believe that progress in better understanding and eventually treating PD will be made through a combination of correlative observations (i.e., neural recordings) and causal manipulations. There are both advantages and disadvantages to correlative as well as causal experiments.

      (4) Due to the lack of causality experiments as mentioned in the first comment above, the observations of coupling between striatal neuronal activity and gait control might well result from a third brain region/factor serving as the common source to both, whether in normal or dopamine lesioned brain. If this is the case, the significance and implications of current findings will be greatly limited.

      As mentioned above, we have included new data to address this concern (Figure 6 – figure supplement 2). Please refer to our response to Reviewer #2, comment #1 for a detailed discussion of these results and their caveats.

      Reviewer #3 (Public Review):

      In this study, Yang et al. address a fundamental question of the role of dorsal striatum in neural coding of gait. The authors study the respective roles of D1 and D2 MSNs by linking their balanced activity to detailed gait parameters. In addition, they put in parallel the striatal activity related to whole-body measures such as initiation/cessation of movement or body speed. They are using an elegant combination of high-resolution single-limb motion tracking, identification of bouts of movements, and electrophysiological recordings of striatal neurons to correlate those different parameters. Subpopulations of striatal output neurons (D1 and D2 expressing neurons) are identified in neural recordings with optogenetic tagging. Those complementary approaches show that a subset of striatal neurons have phase-locked activity to individual limbs. In addition, more than a third of MSNs appear to encode all three aspects of motor behavior addressed here, initiation/cessation of movement, body speed, and gait. This activity is balanced between D1 and D2 neurons, with a higher activity of D1 neurons only for movement initiation. Finally, alterations of gait, and the associated striatal activity, are studied in a mouse model of Parkinson's Disease, using 6-OHDA lesions in the medial forebrain bundle (MFB). In the 6OHDA mice, there is an imbalance toward D2 activity.

      Strengths:

      There is a long-standing debate on the respective role of D1 and D2 MSNs on the control of movement. This study goes beyond prior work by providing detailed quantification of individual limb kinematics, in parallel with whole-body motion, and showing a high proportion of MSNs to be phase-locked to precise gait cycle and also encoding whole-body motion. The temporal resolution used here highlights the preferential activity of D1 MSN at the movement starts, whereas previous studies described a more balanced involvement. Finally, they reveal neural mechanisms of dopamine depletion-induced gait alterations, with a preponderant phase-locked activity of D2 neurons. The results are convincing, and the methodology supports the conclusions presented here.

      Weaknesses:

      Some more detailed explanations would improve the clarity of the results in the corresponding section. Analysis of the 6OHDA experiments could be expanded to extract more relevant information.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Panels I and J from Figure 6 are referred to in the text (line 158) but they don't exist.

      Thank you, we have corrected this in the text.

      (2) For the classification of striatal units into putative MSN, FS interneurons, and TANs, see Gage et al. DOI: 10.1016/j.neuron.2010.06.034 or Thorn et al. DOI: 10.1523/JNEUROSCI.1782-13.2014.

      As explained in the Public Reviews, Reviewer #1 comment #2 we opted not to perform an analysis by putative MSN, FSI, and TAN. We have performed analysis of different putative cell types in several of our other publications (e.g., Bakhurin & Masmanidis 2016, 2017; Lee & Masmanidis 2019). However, this study already relies on a more rigorous method – optogenetic tagging – to identify D1 and D2 MSNs. We felt that adding a second, more subjective and therefore less rigorous identification method based on spike waveforms would add unnecessary confusion in how the results are presented and interpreted. For example, we were unsure how to address the situation where an opto-tagged D1 or D2 MSN may be classified as a putative FSI or TAN according to spike waveform criteria. For this reason, we decided not to perform an analysis by putative MSN, FSI, and TAN. Finally, we have made all our electrophysiological data available should someone want to perform this analysis themselves.

      (3) The discussion section could be improved by elaborating on the origin and function of these gait signals in the striatum, as well as the mechanisms underlying changes in the 6-OHDA model. In addition, it would be important to discuss the limitations of this model, since unilateral 6-OHDA lesions may not accurately recapitulate parkinsonian gait deficits, as it results in a very asymmetric gait.

      Our Discussion section includes a paragraph speculating on the origin of gait modulation in the striatum, and another paragraph addressing the limitation that unilateral 6OHDA lesions induce gait asymmetry. We appreciate the request to also discuss the possible mechanisms driving the alterations in 6OHDA mice. But this is a very complex topic which our study is not aimed at addressing. The range of possible mechanisms uncovered in the literature is vast – from synaptic changes in striatal microcircuits, to altered intrinsic excitability of D1/D2 MSNs, and network-level alterations. Therefore, we preferred to keep the discussion focused on gait and movement coding.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors denoted the limb movement sequences as LR-LF-RR-RF, with limbs on the same left/right side moving first. However, considering multiple gait cycles, the sequence could also be described as RF-LR-LF-RR, with movements of the diagonal limbs temporally closer to each other, which was more intuitive from the visual inspection of Fig. 1D. The LR-LF-RR-RF denotation would make more sense if the authors could demonstrate that a walking bout almost always started from LR, as seen in the two examples in Fig. 1D.

      We designated the sequence as LR-LF-RR-RF to illustrate the lateral sequence pattern. But the reviewer is correct that a shifted version of this sequence, such as RF-LR-LF-RR, is also valid. We are not making any claim that the LR limb is always the first to move in a walking bout, but rather, that limbs on the same side of the body move one after the other, followed by the limbs on the opposite side. We have edited the text to clarify this point: “Mice walked with a lateral sequence gait pattern (e.g., LR-LF-RR-RF), with the limbs on the same side of the body moving one after the other, followed by movement of limbs on the opposite side (Figure 1E).”

      (2) The study identified a biased D1-MSN activation at movement initiation, which was not reported in previous studies that relied on measuring calcium dynamics. The authors attributed the difference to the temporal resolution of electrophysiological versus optic methods. The authors would probably notice that in some previous studies that relied also on optic-tagging and electrophysiological recordings, start/stop activity was not found to be different between direct and indirect pathway MSNs. The authors should discuss these studies and offer some possible explanations.

      This was an oversight on our part, and we thank the reviewer for noting this. We are aware of one such study (Jin & Costa 2014); we apologize if other studies were missed. The Discussion has been updated as follows to discuss this paper: “We also note that another study employing optogenetic tagging did not find significant D1/D2 MSN differences in start/stop activity (Jin & Costa 2014). However, the movement being measured was an instrumental action (reward-guided lever pressing), as opposed to self-initiated motion examined in our work. This suggests either that imbalances between D1 and D2 MSN start activity may be more pronounced under specific behavioral conditions, or that results vary depending on how movement initiation and cessation events are identified.”

      (3) The authors could add some denotations to the peak firing rates in Fig. 3D to aid visualization, so that readers could get a sense of the distribution of neurons preferring each phase of the movements.

      We appreciate this suggestion. We tried adding various colored lines to denote the peak firing rates, but ultimately, we felt the lines were not helpful and potentially deleterious for some readers. We thus decided not to add any lines to the plot.

      (4) Although the relative strength of D1/D2-MSN coding of body speed and movement cessation was found after dopamine lesion, it seemed that D1-MSNs cessation coding, as well as D1- and D2-MSN speed coding, were all altered after dopamine lesion (Fig. S3). The authors could mention these to avoid misunderstandings.

      We thank the reviewer for their observation. In the Results, we now mention that “while speed coding remained balanced between D1 and D2 MSNs, there was a substantial reduction in the speed coding score of both cell types after dopamine lesions.” The stop modulation index did not change appreciably.

      Reviewer #3 (Recommendations For The Authors):

      (1) A suggestion would be to put more emphasis in the title on the first parts of the study, i.e. detailed correlation between striatal activity and quantified motion, and not only focus on the dopamine depletion model.

      We considered other titles, but felt that our current choice is appropriate given that the study’s climax is with the dopamine lesion results in Figures 5 & 6.

      (2) The calculation and the significance of the vector length should be more detailed in the results as it is used all along as a measure of "the strength of neural entrainment to the gait cycle".

      We have added the following statement in the Results section to clarify the significance of vector length: “The vector length is a unitless parameter which can theoretically vary from 0 to 1, with 0 representing a neuron whose spikes occur at random limb phases, and 1 representing a neuron which always spikes at the same phase. Thus, higher vector length indicates a stronger entrainment of spiking activity to a specific limb phase.” For details on how vector length is calculated we refer readers to our Methods, specifically the section entitled “Gait phase coding analysis.”
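
      For readers who want a concrete picture of the quantity described in the quoted statement, the following is a minimal sketch, not the analysis code used in the study. It assumes that vector length is the mean resultant length of spike phases, a standard circular statistic that is consistent with the 0-to-1 range described above. The function name and the example phases are hypothetical.

```python
import cmath
import math


def vector_length(phases):
    """Mean resultant vector length of spike phases given in radians.

    Assumption: this is the standard circular-statistics definition; the
    study's exact procedure is described in its Methods, not reproduced here.
    """
    if not phases:
        raise ValueError("need at least one spike phase")
    # Represent each spike as a unit vector on the circle and average them.
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)


# Hypothetical neuron that always spikes at the same limb phase.
locked = [math.pi / 2] * 50
# Hypothetical neuron whose spikes are spread evenly over the gait cycle.
spread = [2 * math.pi * k / 50 for k in range(50)]

print(round(vector_length(locked), 3))
print(round(vector_length(spread), 3))
```

      Under this assumed definition, the first neuron yields a value at the upper bound and the second a value at the lower bound, matching the two extremes described in the statement.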

      (3) There is no difference in the ipsi- or contralateral limbs while recordings are made only in the right hemisphere. Given that MSNs receive inputs from IT and PT neurons from the motor cortex, would it not be expected to have differences in the phase-locked activity to right versus left limbs? This is a question also with the dopamine depletion model which is performed with unilateral 6OHDA injections.

      We also wondered about this and were somewhat surprised by the lack of a contralateral bias in the phase-locking vector length, as shown in Figure 2 – figure supplement 1D. We have two hypotheses as to why there is no ipsi/contra-lateral bias. First, it is possible that striatal neurons receive similar levels of synaptic input signaling ipsi/contra-lateral limb movements. Second, the strongly correlated motion of diagonally opposed limbs may give the appearance that neurons that are phase-locked to one limb (e.g., LF) are also locked to the diagonally opposite limb (i.e., RR). We see evidence of this diagonal limb coupling in Figure 2 – figure supplement 1B.

      (4) Among the 45% of striatal neurons that display significant phase-locking to at least one limb, it would be interesting to describe the % of neurons being phase-locked to several limbs and whether they are specific subtypes. Are there animals with more phase-locked cells in several limbs?

      This is indeed a very interesting and important point which relates to the major concern that “evidence supporting the conclusion that striatal neurons encode single-limb gait is incomplete.” As described above, we now present new analysis demonstrating that neurons are more phase-locked to single-limb gait rather than multiple limbs (Figure 2 – figure supplement 1, panels A-C). These results are discussed in detail in response to Reviewer #1’s first comment. For conciseness we will not repeat the same response here but instead refer the reviewer to Reviewer #1, comment #1. With regard to whether there are specific subtypes, we performed the same analysis on optogenetically identified D1/D2 MSNs and found similar trends, but did not show these results in the manuscript to avoid redundancy.

      (5) The Venn diagram in Fig. 3C shows ~40% of striatal cells encoding body speed, single-limb and start/stop information. Nevertheless, this percentage is limited by the number of single-limb phase-locked cells as almost all have a firing rate related to body speed and start/stop signals. This could be discussed.

      This is a very interesting observation. Basically, the reviewer is noting that almost all the phase-locked cells also encode start/stop and/or speed. We have now updated the Discussion to specifically discuss this observation: “We found a different percentage of striatal neurons which encoded limb phase, movement initiation or cessation, and speed (Figure 3). Among these three categories, limb phase coding cells represented the smallest population with ~45% of neurons, as opposed to ~90% for start/stop or speed. In addition, nearly all phase coding cells were also significantly responsive to start/stop or speed, whereas a sizable proportion of start/stop or speed coding cells were not entrained to limb phase. It is unclear, however, whether these population size differences reflect a proportionally smaller role for the striatum in regulating single-limb gait as opposed to whole-body movement initiation, cessation or speed.”

      (6) D1/D2 analysis:

      For optogenetic identification of D1 and D2 neurons, 39 D1 neurons and 40 D2 neurons were extracted from the total of 274 recorded neurons while 222 neurons were optogenetically tagged according to the mat and meth. Were there any technical difficulties that made it difficult to identify more neurons?

      The low yield of optogenetic tagging is quite common in the literature due to the rigorous criteria which must be satisfied in order to qualify as a tagged neuron (e.g., Kvitsiani & Kepecs 2013). The number 222 neurons quoted in the methods reflects the entirety of optogenetically tagged neurons in this study. Our study contained 33 mice, thus the average number of tagged units per animal was 222/33 ~ 6.7 units/animal. This is actually comparable to or slightly better than the yield reported in some other striatal literature (see for example, Figure 1 of Ryan & Nelson 2018).

      It is mentioned that "a subset" of these were phase-locked to a single limb. It would be interesting to specify the exact percentage of those neurons for D1 and D2 populations.

      Phase-locking of D2 neurons seems less sharp than D1 neurons, with a lower firing rate (Fig. 4D), please comment. Also difference in vector length for LR while none for other limbs, why? There is a balanced activity of D1 and D2 MSNs during walking (speed) and single-limb movements, but more D1 MSNs active at movement initiation. Is it also true for stop signals? Are they separated based on the speed threshold of 20 mm/s?

      As mentioned above, our new analysis specifically examines the percentage of all neurons which are phase-locked to a single limb (Figure 2 – figure supplement 1, panels A-C). We have performed the same analysis on optogenetically tagged D1/D2 MSNs and found similar trends, but did not show these results in the manuscript to avoid redundancy. With regard to whether phase-locking of D2 is less sharp than D1 MSNs, the “sharpness” of phase-locking is characterized by the mean vector length. We show that, on average, the vector length is statistically the same for D1 and D2 MSNs in healthy mice (Figure 4F). The reviewer noted that the D2 vector length in Figure 4F appears visibly higher for LR but not for other limbs; however, this difference is not statistically significant. With regard to whether more D1 MSNs are active during movement cessation, we show that both sham and dopamine lesioned mice have similar levels of D1/D2 MSN activity during stop (Figure 6 – figure supplement 1, panels A & B). Details of how start, stop, and speed are calculated are provided in the Methods.

      The relationship between firing and body speed (Fig. 4H) displays differences between D1 and D2. If a speed inferior to 20 mm/s, corresponds to "start or stop signal" as mentioned in the mat and meth, then early difference would correspond to start, but still there is a difference between 20 and 100 mm/s and after 150 mm/s. These results should be commented on.

      The reviewer is correct that in the plot of firing rate vs body speed (Figure 4J), there visibly appears to be a difference between D1 and D2 MSNs at low speeds. However, according to our pre-determined measure of speed coding which relies on the correlation coefficient between firing rate and speed, D1 and D2 MSNs have similar speed coding indices. Since there is a precedent for using the correlation coefficient to quantify speed coding (Fobbs & Kravitz 2020; Kropff & Moser 2015), we prefer to stick with this measure despite some caveats. Furthermore, the apparent difference between D1 and D2 MSNs in Figure 4J is not seen in either sham or dopamine lesioned mice (Figure 6 – figure supplement 1, panels D & E). Taken together, we do not believe the apparent speed coding difference in Figure 4J rises to the level of a consistent result.
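
      To illustrate what a correlation-based speed coding measure looks like in practice, here is a minimal sketch. It assumes Pearson's correlation coefficient computed between binned body speed and the firing rate in each bin; the response above only states that a correlation coefficient is used, so the choice of Pearson's r, the function name, and the example values are all assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical speed bins (mm/s) and firing rates (Hz) for two neurons.
speed = [20, 60, 100, 140, 180]
rate_increasing = [2.0, 4.0, 6.0, 8.0, 10.0]  # rate rises with speed
rate_decreasing = [10.0, 8.0, 6.0, 4.0, 2.0]  # rate falls with speed

print(round(pearson_r(speed, rate_increasing), 3))
print(round(pearson_r(speed, rate_decreasing), 3))
```

      A measure of this kind summarizes the overall linear relationship across the full speed range, which is one reason a visible separation at low speeds alone need not change the index.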

      (7) The timing of normalized firing rate in relation to start/stop signals might be also quite interesting to comment on. D1 neurons have stronger activation for start signals and it seems that it is also earlier, with D2 activated after the onset of the movement (Fig. 4G).

      We appreciate the observation that D1 neurons appear to fire a little earlier than D2 neurons in Figure 4I. However, this did not rise to the level of a statistically significant result by our attempted quantitative analysis (not shown). Furthermore, the earlier timing of D1 is not apparent in sham lesioned animals in Figure 6I, thus overall we cannot make any confident statements about earlier timing of D1 start signals.

      In dopamine lesion experiments, in sham mice, it seems that both D1 and D2 have higher activity after the onset of the movement and that the peak of D2 activity is earlier (Fig. 6G). In 6OHDA mice, both peaks are after the onset of the movement although they are much less clearly defined.

      Both peaks become less sharp after 6OHDA lesions, but in terms of amplitude the main effect is a reduction in the D1 start signal. This is reflected in the reduced D1 start modulation index whereas the D2 index remains relatively constant.

      (8) 6OHDA model displays much fewer walking bouts with lower speed and initiation rate. It would be important to include in the figure a similar representation to Fig.1 with distributions of stride frequency, duration, and length to illustrate the difference between control and 6OHDA mice. On average, how many walking bouts were analyzed in control and 6OHDA animals?

      We have added new data similar to Figure 1 with distributions of stride frequency, duration, and length to illustrate the difference between sham and 6OHDA mice (Figure 5 – figure supplement 1, panels B & C). We also added the following information on the number of walking bouts: “The mean number of walking bouts per session was reduced from 124 ± 42 in sham to 47 ± 19 in dopamine lesioned mice (mean ± SD).”

      The initiation rate is particularly low in 6OHDA animals, 3-4 per minute, did the authors make longer behavioral recordings to extract enough initiation/stop signals for neural correlation analysis?

      All of our recordings were of the same duration (30 minutes). This duration was pre-determined at the beginning of the study to ensure consistency.

      The stride length seems smaller on the right limbs in 6OHDA mice and vector length in D2 neurons as well, while there is no change in D1 neurons. Is it a significant effect? If yes, it would be important to comment on this.

      The ANOVA test in those figures was not designed to perform post-hoc multiple comparisons between different limbs. However, if one changes the ANOVA design then the effect for stride length is significant. This is probably related to the ipsiversive turning bias in the unilateral 6OHDA lesion model. Though we have not changed the ANOVA design, in the Discussion we do comment on the shorter stride length on the right limbs in 6OHDA mice in Figure 5G. There is no significant difference in D2 vector length between different limbs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Debeuf et al. introduce a new, fast method for the selection of suitable T cell clones to generate TCR transgenic mice, a method claimed to outperform traditional hybridoma-based approaches. Clone selection is based on the assessment of the expansion and phenotype of cells specific for a known epitope following immune stimulation. The analysis is facilitated by a new software tool for TCR repertoire and function analysis termed DALI. This work also introduces a potentially invaluable TCR transgenic mouse line specific for SARS-CoV-2.

      Strengths:

      The newly introduced method proved successful in the quick generation of a TCR transgenic mouse line. Clone selection is based on more comprehensive phenotypical information than traditional methods, providing the opportunity for a more rational T cell clone selection.

      The study provides a software tool for TCR repertoire analysis and its linkage with function.

      The findings entail general practical implications in the preclinical study of a potentially very broad range of infectious diseases or vaccination.

      A novel SARS-CoV-2 spike-specific TCR transgenic mouse line was generated.

      Weaknesses:

      The authors attempt to compare their novel method with a more conventional approach to developing TCR transgenic mice. In this reviewer's opinion, this comparison appears imperfect in several ways:

      (1) Work presenting the "traditional" method was inadequate to justify the selection of a suitable clone. It is therefore not surprising that it yielded negative results. More evidence would have been necessary to select clone 47 for further development of the TCR transgenic line, especially considering the significant time and investment required to create such a line.

      Based on Supplementary Figure 1A only, we understand the concern of the reviewer. However, the data presented in Supplementary Figure 1A were collected during the first rough screening of clones, where only the production of IL-2 and IFN-γ was measured as a readout for activation. Thereafter, a large selection of responsive clones was further grown and co-cultured with a dose-titration of the antigenic peptide pool. In this second co-culture, flow cytometry readouts such as CD69 expression were also included (as shown in Supplementary Figure 1B). Finally, a narrower selection of responder clones was co-cultured with the different individual peptides to unravel the specificity of the TCR of the clone. In conclusion, the clone was tested at least three times in three distinct set-ups with multiple different readouts.

      However, a good evaluation of a clone in an in vitro setting does not necessarily translate into optimal functioning of the cells in a biological context. For instance, some clones survive better in an in vitro setting than others or already have a more activated profile before stimulation.

      (2) The comparison is somewhat unfair, because the methods start at different points: while the traditional method was attempted using a pool of peptides whose immunogenicity does not appear to have been established, the new method starts by utilising tetramers to select T cells specific for a well-established epitope.

      Given the costs and time involved, only a single clone could be tested for either method, intrinsically making a proper comparison unfeasible. Even for their new method, the authors' ability to demonstrate that the selected clone is ideal is limited unless they made different clones with varying profiles to show that a particular profile was superior to others.

      In my view, there was no absolute need to compare this method with existing ones, as the proposed method holds intrinsic value.

      We acknowledge the importance of the well-established hybridoma technology and in no way intended to compare these methods head-to-head, nor do we want to question the validity of the classical methods. The reason why we also wanted to show the failed CORSET8 mouse was to highlight the parts of the TCR generating process which could be rationalized. We again want to emphasize that we do not want to compare methods in any way and recognise that we started from two different bases in terms of clone selection (peptide pool stimulation versus tetramer staining). While the tetramer staining that was employed in the generation of CORSET8 mice allowed us to enrich the samples for specific responder clones, this enrichment step is not an absolute requirement for the implementation of the presented method or for the successful generation of a TCR Tg mouse model. An alternative approach could be to use the described method to select for activated and expanded clones upon immunisation and test their reactivity in subsequent steps using peptide stimulation before selecting a receptor. In conclusion, we merely wish to present a novel roadmap for others to use for the generation of their TCR Tg mouse to aid in the selection of the most preferable clone for their purposes.

      (3) While having more data to decide on clone selection is certainly beneficial, given the additional cost, it remains unclear whether knowing the expression profiles of different proteins in Figure 2 aids in selecting a candidate. Is a cell expressing more CD69 preferable to a cell expressing less of this marker? Would either have been effective? Are there any transcriptional differences between clonotype 1 and 2 (red colour in Figure 2G) that justify selecting clone 1, or was the decision to select the latter merely based on their different frequency? If all major clones (i.e. by clonotype count) present similar expression profiles, would it have been necessary to know much more about their expression profiles? Would TCR sequencing and an enumeration of clones have sufficed, and been a more cost-effective approach?

      The method we present in the paper serves as a proof-of-concept, to be adapted to the researcher’s own needs. We agree with the reviewer that for our intentions with the CORSET8 mice, TCRseq in combination with an enumeration of the clones could also have sufficed and would lower the cost of sequencing. However, we wish to present a roadmap for others to use for the generation of their TCR Tg mouse. An important feature of this roadmap is that the cellular phenotype and activation state can be taken into consideration, which might be essential for some projects.
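
      As an illustration of what "an enumeration of the clones" amounts to, the following is a minimal sketch rather than the DALI implementation or the pipeline used for the CORSET8 mice. It assumes each sequenced cell is reduced to a pair of alpha and beta chain identifiers and ranks clonotypes by cell count; the function name and the placeholder identifiers are hypothetical.

```python
from collections import Counter


def rank_clonotypes(cells):
    """Rank clonotypes by size.

    `cells` is a list of (alpha_chain_id, beta_chain_id) tuples, one per cell.
    Returns a list of (clonotype, cell_count, frequency), largest clone first.
    """
    counts = Counter(cells)
    total = sum(counts.values())
    return [(clone, n, n / total) for clone, n in counts.most_common()]


# Placeholder identifiers standing in for paired CDR3 sequences.
cells = (
    [("TRA_1", "TRB_1")] * 3
    + [("TRA_2", "TRB_2")] * 2
    + [("TRA_3", "TRB_3")]
)

for clone, n, freq in rank_clonotypes(cells):
    print(clone, n, round(freq, 2))
```

      A count-based ranking of this kind uses only clone size; the paired transcriptomic data are what allow phenotype and activation state to be weighed alongside it.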

      Nonetheless, we do see clear interclonal differences regarding the expression of “activation” genes, where clone 1 is clearly one of the well-activated and interferon-producing clones (as shown in Author response image 1). As such, researchers could expand these types of analyses to probe for specific phenotypes or characteristics.

      Author response image 1.

      (4) Lastly, it appears that several of the experiments presented were conducted only once. This information should have been explicitly stated in the figure legends.

      To control for interexperimental variation, every experiment represented in the manuscript has been performed at least two times. We have added the additional information regarding the experimental repetitions and groups in the figure legends.

      Reviewer #2 (Public Review):

      Summary:

      The authors seek to use single-cell sequencing approaches to identify TCRs specific for the SARS CoV2 spike protein, select a candidate TCR for cloning, and use it to construct a TCR transgenic mouse. The argument is that this process is less cumbersome than the classical approach, which involves the identification of antigen-reactive T cells in vitro and the construction of T cell hybridomas prior to TCR cloning. TCRs identified by single-cell sequencing that are already paired to transcriptomic data would more rapidly identify TCRs that are likely to contribute to a functional response. The authors successfully identify TCRs that have expanded in response to SARS CoV2 spike protein immunization, bind to MHC tetramers, and express genes associated with functional response. They then select a TCR for cloning and construction of a transgenic mouse in order to test the response of resulting T cells in vivo following immunization with spike protein or coronavirus infection.

      Strengths:

      (1) The study provides proof of principle for the identification and characterization of TCRs based on single-cell sequencing data.

      (2) The authors employ a recently developed software tool (DALI) that assists in linking transcriptomic data to individual clones.

      (3) The authors successfully generate a TCR transgenic animal derived from the most promising T cell clone (CORSET8) using the TCR sequencing approach.

      (4) The authors provide initial evidence that CORSET8 T cells undergo activation and proliferation in vivo in response to immunization or infection.

      (5) Procedures are well-described and readily reproducible.

      Weaknesses:

      (1) The purpose of presenting a failed attempt to generate TCR transgenic mice using a traditional TCR hybridoma method is unclear. The reasons for the failure are uncertain, and the inclusion of this data does not really provide information on the likely success rate of the hybridoma vs single cell approach for TCR identification, as only a single example is provided for either.

      We refer to comments 2 and 3 of reviewer 1 for an answer to this point.

      (2) There is little information provided regarding the functional differentiation of the CORSET8 T cells following challenge in vivo, including expression of molecules associated with effector function, cytokine production, killing activity, and formation of memory. The study would be strengthened by some evidence that CORSET8 T cells are successfully recapitulating the functional features of the endogenous immune response (beyond simply proliferating and expressing CD44). This information is important to evaluate whether the presented sequencing-based identification and selection of TCRs is likely to result in T-cell responses that replicate the criteria for selecting the TCR in the first place.

      We agree with the reviewer that the data in the initial manuscript included only a limited in vivo functional validation of the CORSET8 T cells. Therefore, we extended these in vivo readouts and measured IFN-γ production, CD69 and T-bet expression (as measures of activation), and Ki-67 expression (as an alternative readout to CTV for proliferation). In the single cell data, we saw that these markers were more pronounced in the selected clone compared to other clones. We could confirm these findings in vivo, and found a stronger induction of IFN-γ, CD69, T-bet and Ki-67 in CORSET8 T cells compared to endogenous CD45.2 cells and even Spike-Tetramer+ CD45.2 endogenous cells. We added these data in Figure 4.

      (3) While I find the argument reasonable that the approach presented here has a lot of likely advantages over traditional approaches for generating TCR transgenic animals, the use of TCR sequencing data to identify TCRs for study in a variety of areas, including cancer immunotherapy and autoimmunity, is in broad use. While much of this work opts for alternative methods of TCR expression in primary T cells (i.e. CRISPR or retroviral approaches), the process of generating a TCR transgenic mouse from a cloned TCR is not in itself novel. It would be helpful if the authors could provide a more extensive discussion explaining the novelty of their approach for TCR identification in comparison to other more modern approaches, rather than only hybridoma generation.

      By integrating the recent technological advances in single cell sequencing into the generation of TCR Tg mice, possibilities arise to rationalize clone selection regarding clonal size, lineage/phenotype and functional characteristics. Often, the selection process based on hybridoma selection yields multiple epitope-specific clones that upregulate CD69 or IL-2, and only minimal functional and phenotypic parameters are checked before prioritizing one clone to proceed with. In our experience, transgenic T cells selected in this way are sometimes unable to compete with endogenous polyclonal T cells in vivo. Taking all these caveats into account, the novelty we present here is that the researcher is fully able to select clones based on several layers of information without the need for extensive or repeated screening. Moreover, the selection of the TCR Tg clone can be done via the interactive and easily interpretable DALI tool. Owing to the browser-based interactive GUI, immunologists with limited coding experience can effectively analyse their complex datasets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Regarding Supplementary Figure 1A was the experiment conducted more than once? Clone 47 seems minimally superior to the other clones. Incorporating a positive control, such as the response of the OT-I hybridoma to SIINFEKL, could have provided a benchmark to gauge the strength of the observed responses.

      Also, what was the concentration of the peptide used to restimulate the T cells in vitro? High peptide concentrations can lead to non-specific responses. Ideally, a titration should have been performed, perhaps in a subsequent experiment that only tested those clones that responded well initially. Given the resources required to create and maintain a transgenic mouse line, proceeding with the chosen clone based on the data presented seems to carry considerable risk.

      The experiment has been performed three times. The data presented in Supplementary Figure 1A were collected during the first rough screening of clones, where only the production of IL-2 and IFN-γ was measured as a readout for activation. Thereafter, a large selection of responsive clones was further grown and co-cultured with a dose-titration of the antigenic peptide pool. In this second co-culture, flow cytometry readouts such as CD69 expression were also included (as shown in Supplementary Figure 1B). Finally, a narrower selection of responder clones was co-cultured with the different individual peptides to unravel the specificity of the TCR of the clone. In conclusion, the clone was tested at least three times in three distinct set-ups with multiple different readouts.

      In Supplementary Figure 1C, no response to stimulation was detected. Ideally, this figure should have included a positive control, such as PMA/Ionomycin or aCD3/CD28 stimulation.

      We agree with the reviewer that this experiment should have included a positive control to validate the non-specific responsiveness of the clone and the technical feasibility of the experiment. Unfortunately, the initial CORSET8 line is frozen and is thus not easily available to repeat the experiment.

      Can the authors clarify their gating strategy in the legend of Supplementary Figure 1D?

      Plotted cells are non-debris > single cells > viable cells > CD45+. We have added the information to the legend of Supplementary Figure 1D.

      In Figure 2, the figure legend should provide more detail on which cells were sorted for the single-cell RNA sequencing analysis. The materials and methods section explains that cells were stained for CD44. Were activated cells then sorted (either tetramer-positive or -negative), plus naïve CD8 T cells from a naïve mouse?

      Supplementary Figure 2 contains the detailed gating strategy during the sort for the single cell experiment. We have added additional red gates to the plots to clarify which samples were sent for sequencing. This has been adapted in the figure legends of both Figure 2 and Supplementary Figure 2. 

      In Figure 3, Rag1 sufficient transgenic mice display similar numbers of CD4 and CD8 T cells as WT mice in the spleen. Typically, transgenic mice present skewed frequencies of T cells towards the type generated (CD8 in this case), which the authors only found in the thymus of CORSET8 mice. Could this be discussed?

      The comment of the reviewer is valid as there is indeed a skewing towards CD8 T cells in the thymi of the CORSET8 mice. We looked back into the data of the experiments and noticed that poor resolution of some markers might have resulted in improper results. We have repeated this experiment and added another T cell marker (TCRbeta) next to the already included CD3e marker. By including both markers, we were able to show that the skewing towards the CD8 T cell phenotype is also present in the spleen.

      How many repetitions were performed for the experiments in Figures 3D and 3E? How many mice were analyzed for Figure 3E? Please provide this information in the figure legend. Also, include a proper quantification and statistical analysis of the data shown.

      New quantification graphs with statistical analysis have been added to Figure 3E. The accompanying figure legend has been adapted. The co-culture displayed in Figure 3D is a representative experiment of two repetitions.

      Figure 4C includes 3-4 mice per group. This experiment should have been replicated, and this information should be indicated in the figure legend.

      We apologise for omitting this information from the figure legend. The experiment presented in Figure 4A-C has been repeated twice, yielding results following the same trend. We were unable to pool the data as two different proliferation dyes were used in the separate experiments (CFSE and CTV). Furthermore, in the in vivo BSL3 experiments represented in Figure 4E-H, we always included the Spike/CpG group as a positive control. We have added the additional information regarding the experimental repetitions and groups to the figure legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Aging is associated with a number of physiologic changes including perturbed circadian rhythms. However, mechanisms by which rhythms are altered remain unknown. Here authors tested the hypothesis that age-dependent factors in the sera affect the core clock or outputs of the core clock in cultured fibroblasts. They find that both sera from young and old donors are equally potent at driving robust ~24h oscillations in gene expression, and report the surprising finding that the cyclic transcriptome after stimulation by young or old sera differs markedly. In particular, genes involved in the cell cycle and transcription/translation remain rhythmic in both conditions, while genes associated with oxidative phosphorylation and Alzheimer's Disease lose rhythmicity in the aged condition. Also, the expression of cycling genes associated with cholesterol biosynthesis increases in the cells entrained with old serum. Together, the findings suggest that age-dependent blood-borne factors, yet to be identified, affect circadian rhythms in the periphery. The most interesting aspect of the paper is that the data suggest that the same system (BJ-5TA), may significantly change its rhythmic transcriptome depending on how the cells are synchronized. While there is a succinct discussion point on this, it should be expanded and described whether there are parallels with previous works, as well as what would be possible mechanisms for such an effect.

      We’ve expanded our discussion in the manuscript to discuss possible mechanisms and also how the genes/pathways implicated in our study relate to other aging literature.  

      Major points: 

      Fig 1 and Table S1. Serum composition and levels of relevant blood-borne factors probably change as a function of time of day. At what time of the day were the serum samples from the old and young groups collected? This important information should be provided in the text and added to Table S1. 

      We made sure to highlight the collection time in the abstract of the manuscript: “We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals at 14:00 (1) and used the serum to synchronize cultured fibroblasts.” The time of blood draw is also stated in other sections of the paper (Introduction and Methods). Since Table S1 contains demographic information, we did not think that the blood draw time fit best there, but hopefully it is now clear in the text.

      Fig 2A. Luminescence traces: the manuscript would greatly benefit from inclusion of raw luminescence traces.

      Raw luminescence traces have been added to Figure S3 (S3A).

      Fig 2. Of the many genes that change their rhythms after stimulation with young and old sera, what are the typical fold changes? For example, it would be useful to show histograms for the two groups. Does one group tend to have transcript rhythms of higher or lower fold changes? 

      We’ve presented these data in Figure S5. There are a few significant differences, but largely the groups are similar in terms of fold change.

      Fig. 2 Gene expression. Also here, the presentation would benefit from showing a few key examples for different types of responses. 

      Sample traces of genes that gain rhythmicity, lose rhythmicity, phase shift, and change MESOR are now illustrated in Figure S6.

      What was the rationale to use these cells over the more common U2OS cells? Are there similarities between the rhythmic transcriptomes of the BJ-5TA cells and that of U2OS cells or other human cells? This could easily be assessed using published datasets. 

      The original rationale to use BJ-5TA fibroblast cells was that we were aiming to build upon an observation from a previous study (2), which showed that circadian period changes with age in human fibroblasts. While our findings did not match theirs, we think an added benefit of using the BJ-5TA line is that, unlike U2OS cells, it is not a carcinoma-derived cell line. We’ve added this point in lines 98-101.

      Our study finds many more rhythmic transcripts compared to the previous studies examining U2OS cells. This can be attributed to several factors including differences in methods, including the use of human serum in our study, cell type differences, or decoupling of rhythms in some cancer cells. While a comparison of BJ-5TA cells and U2OS cells could be interesting, a proper comparison requires investigation of many data sets, since any pair of BJ-5TA and U2OS data sets will most likely differ in some detail of experimental design or data processing pipeline, which could contribute to observed differences in rhythmic transcripts.

      That being said, we compared clock reference genes (see Author response image 1) between BJ-5TA and U2OS cells, comparing circadian profiles obtained from our data with those available on CircaDB. These circadian profiles exhibit many similarities and a few differences. The peak-to-trough ratios (amplitudes) are quite similar for ARNTL, NR1D1, NR1D2, PER2 and PER3, and are about 25% lower for CRY1 and somewhat higher for TEF (about 15%) in our data. We find that the MESORs are generally similar, with the exception of NR1D1, which is much lower, and NR1D2, which is much higher in our data.

      Author response image 1.

      BJ-5TA and U2OS Cells Exhibit Similar Profiles of Circadian Gene Transcription. We compared the transcriptomic profiles of the BJ-5TA cells in young and old serum (left) to the U2OS transcriptomic data (right) available on CircaDB, a database containing profiles of several circadian reference genes in U2OS cells. This figure suggests that circadian profiles of these genes exhibit many similarities. We find that the peak-to-trough ratios (amplitudes) are similar for ARNTL, NR1D1, NR1D2, PER2 and PER3, and that the MESORs are similar (with the exception of NR1D1, which is much lower, and NR1D2, which is much higher in the BJ-5TA cells). We find that the amplitude of CRY1 is ~25% lower and that of TEF is ~15% higher for the BJ-5TA cells. The axes of the plots on the left show counts divided by 3.5 to make the MESORs of ARNTL similar and ease comparison.
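
      For readers who want to reproduce this kind of comparison, the MESOR, amplitude and peak-to-trough ratio discussed above can be estimated with a single-component cosinor fit. The following is a minimal sketch, not the pipeline used in the study: it assumes a fixed 24 h period, and the function name `cosinor_fit` and the synthetic values in the note below are hypothetical.

```python
import numpy as np


def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y ~ M + A*cos(w*t - phi), with w = 2*pi/period.

    Returns the MESOR (M), amplitude (A), acrophase in hours (time of peak)
    and the peak-to-trough ratio (M + A) / (M - A).
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi / period

    # Linearised model: y = M + beta*cos(w*t) + gamma*sin(w*t),
    # where beta = A*cos(phi) and gamma = A*sin(phi).
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (mesor, beta, gamma), *_ = np.linalg.lstsq(X, y, rcond=None)

    amplitude = float(np.hypot(beta, gamma))
    acrophase_h = float((np.arctan2(gamma, beta) / w) % period)

    # The ratio is only meaningful when the fitted trough stays positive.
    trough = mesor - amplitude
    ratio = float((mesor + amplitude) / trough) if trough > 0 else float("inf")

    return {
        "mesor": float(mesor),
        "amplitude": amplitude,
        "acrophase_h": acrophase_h,
        "peak_to_trough": ratio,
    }
```

      Applied to a synthetic series y = 100 + 20 cos(2π(t - 6)/24) sampled every 2 h for 48 h, this fit recovers a MESOR of 100, an amplitude of 20, a peak at 6 h and a peak-to-trough ratio of 1.5. Comparing data sets that were normalised differently (for example, counts divided by a constant) changes the MESOR and amplitude but leaves the peak-to-trough ratio unchanged.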

      For the rhythmic cell cycle genes, could this be the consequence of the serum which synchronizes also the cell cycle, or is it rather an effect of the circadian oscillator driving rhythms of cell cycle genes? 

      This is an interesting point. Given our previous data showing that the cell cycle gene cyclin D1 is regulated by clock transcription factors (3), we believe the circadian oscillator drives, or at least contributes to, rhythms of cell cycle genes. However, the serum clearly makes a difference as we find that MESORs of cell cycle genes decrease with aged serum. This is consistent with the decreased proliferation previously observed in aged human tissue (4).

      While the reduction of rhythmicity in the old serum for oxidative phosphorylation transcripts is very interesting and fits with the general theme that metabolic function decreases with age, it is puzzling that the recipient cells are the same, but it is only the synchronization by the old and young serum that changes. Are the authors thus suggesting that decrease of metabolic rhythms is primarily a non cell-autonomous and systemic phenomenon? What would be a potential mechanism? 

      We are indeed suggesting this, although it is also possible that it is not cycling per se, but rather an overall inefficiency of oxidative phosphorylation that is conveyed by the serum. Relating other work in the field to our findings, we’ve added the following to our discussion: “Previous work in the field demonstrates that synchronization of the circadian clock in culture results in cycling of mitochondrial respiratory activity (5,6), further underscoring the different effects of old serum, which does not support oscillations of oxidative phosphorylation associated transcripts. Age-dependent decrease in oxidative phosphorylation and increase in mitochondrial dysfunction (7) has been seen in aged fibroblasts (8) and contributes to age-related diseases (9). We suggest that the age-related inefficiency of oxidative phosphorylation is conferred by serum signals to the cells such that oxidative phosphorylation cycles are mitigated. On the other hand, loss of cycling could contribute to impairments in mitochondrial function with age.”

      The delayed shifts after aged serum for clock transcripts (but not for Bmal1) are interesting and indicate that there may be a decoupling of Bmal1 transcript levels from the other clock gene phases. How do the authors interpret this? Could it be related to altered chronotypes in the elderly? 

      One possible explanation is that the delay of NPAS2, BMAL1’s binding partner, results in a delay in the transcription of clock-controlled genes/negative arm genes. Since the RORs do not seem to be affected, Bmal1 is transcribed/translated as usual, but there isn’t enough NPAS2 to bind with BMAL1. In this case, downstream genes are slower to be transcribed, causing the phase delay.

      Reviewer #2 (Public Review): 

      Schwarz et al. have presented a study aiming to investigate whether circulating factors in the sera of subjects are able to synchronize, depending on age, the circadian rhythms of fibroblasts. The authors used human serum taken from either old (age 70-76) or young (age 25-30) individuals to synchronise cultured fibroblasts containing a clock gene promoter-driven luciferase reporter, followed by RNA sequencing to investigate whole gene expression. 

      This study has the potential to be very interesting, as evidence of circulating factors in sera that mediate peripheral rhythms has long been sought after. Moreover, there is the possibility that those factors are affected by age, which could contribute to the weakened circadian rhythmicity observed with aging. 

      Here, the authors concluded that both old and young sera are equally competent at driving robust 24 hour oscillations, in particular for clock genes, although the cycling behaviour and nature of different genes is altered between the two groups, which is attributed to the age of the individuals. This conclusion could however be influenced by individual variabilities within and between the two age groups. The groups are relatively small, with only four individuals (two females and two males) per group. In addition, factors such as food intake and exercise prior to the blood draw, and/or chronotype, known to affect systemic signals, are not taken into consideration. As seen in Figure 4, traces from different individuals vary heavily in terms of their patterns, which is not addressed in the text. Only analysing the summary average curve of the entire group may be masking the true data. More focus should be attributed to investigating the effects of serum from each individual and observing common patterns. Additionally, there are many potential causes of variability, instead of or in addition to age, that may be contributing to the variation both between the groups and between individuals within groups. All of this should be addressed by the authors and commented on appropriately in the text. 

      We are not aware of any specific feature distinguishing the subjects (other than age) that could account for the differences between old and young. The fact that we see significant differences between the two groups, even with the relatively small size of the groups, suggests strongly that these differences are largely due to age. Nevertheless, we acknowledge that individual variability can be a contributing factor. For instance, the change in phase of clock genes appears to be driven largely by two subjects. We have commented on this and individual differences, in general, in the discussion.  

      The authors also note in the introduction that rhythms in different peripheral tissues vary in different ways with age; however, the entire study is performed only on fibroblasts, classified as peripheral tissue by the authors. It would be very interesting to investigate whether the observed changes in fibroblasts extend to other cell lines of diverse organ origin. This could provide information about whether circulating circadian synchronising factors could exert their function systemically or on specific tissues. At the very least, this hypothesis should be addressed within the discussion. 

      It is likely that factors circulating in serum act on several tissues, and so their effects are relatively broad. However, this would require extensive investigation of other tissues. We now discuss this in the manuscript.

      In addition to the limitations indicated above, I consider the data of the study to be insufficiently analysed beyond the rhythmicity analysis. Results from the STRING and IPA analysis were merely descriptive and a more comprehensive bioinformatic analysis would provide additional information about potential molecular mechanisms explaining the differential gene expression. For example, enrichment of transcription factor binding sites in those genes with different patterns could pinpoint chromatin regulatory pathways.

      We performed LinC similarity analysis (LISA) to study enrichment of transcription factor binding. Results are displayed in Fig 3B and in lines 157-168. 

      Recommendations for the authors:

      The two reviewers and reviewing editor have agreed on the following recommendations for the authors: 

      Major: 

      (1) The bioinformatic analysis would benefit from a more thorough focus on variability between individuals. Specifically, the main conclusion of the manuscript could be significantly influenced by individual variabilities within and between the two age groups. This is of particular concern, as the groups are relatively small (four individuals, two females and two males, per group). In addition, the consideration of factors such as food intake and exercise prior to the blood draw, and/or chronotype, known to affect systemic signals, should be more adequately explained. The lab is an experienced chronobiology lab, and thus we are confident that these factors had been thought of, but this needs to be made clearer.

      As seen in Figure 4, traces from different individuals vary heavily in terms of their patterns, which is not addressed in the text. Only analysing the summary average curve of the entire group may be masking the relevant data. Furthermore, there are many potential causes of variability, instead of or in addition to age, that may be contributing to the variation both between the groups and between individuals within groups. All of this should be addressed by the authors and commented on appropriately in the text. 

      We are not aware of any specific feature distinguishing the subjects (other than age) that could account for the differences between old and young. The fact that we see significant differences between the two groups, even with the relatively small size of the groups, suggests strongly that these differences are largely due to age. Nevertheless, we acknowledge that individual variability can be a contributing factor. For instance, the change in phase of clock genes appears to be driven largely by two subjects. We have commented on this and individual differences, in general, in the discussion. 

      (2) The study would benefit from a more thorough analysis of the data beyond the rhythmicity analysis. Results from the STRING and IPA analysis were merely descriptive and a more comprehensive bioinformatic analysis would provide additional information about potential molecular mechanisms explaining the differential gene expression. For example, enrichment of transcription factor binding sites in those genes with different patterns could pinpoint chromatin regulatory pathways. This would provide additional value to the study, especially given the otherwise apparent lack of any mechanistic explanation. 

      We performed LinC similarity analysis (LISA) to study enrichment of transcription factor binding. Results are displayed in Fig 3B and in lines 157-168.

      (3) There were some questions about the amplitude of the core circadian clock gene rhythms raised, which in other human cell types would be much higher. A comment on this matter and the provision of the raw luminescence traces for Fig 2A would be greatly beneficial.

      Addressing the same topic: what are the typical fold changes of the many genes that change their rhythms after stimulation with young and old sera? For example, it would be useful to show histograms for the two groups. Does one group tend to have transcript rhythms of higher or lower fold changes? The presentation of the manuscript would further benefit from showing a few key examples for different types of responses. 

      The average luminescence trace for each individual serum sample from Fig 2A has been added to Fig S3A.

      We’ve presented the fold change data in Figure S5. There are a few significant differences, but largely the groups are similar in terms of fold change.

      (4) There are several points that we recommend to consider to add to the discussion: 

      What was the rationale to use these cells over the more common U2OS cells? Are there similarities between the rhythmic transcriptomes of the BJ-5TA cells and that of U2OS cells or other human cells? It should be relatively easy to address this point by assessing published datasets. 

      The original rationale to use BJ-5TA fibroblast cells was that we were aiming to build upon an observation from a previous study (2), which showed that circadian period changes with age in human fibroblasts. While our findings did not match theirs, we think an added benefit of using the BJ-5TA line is that, unlike U2OS cells, it is not a carcinoma-derived cell line. We’ve added this point in lines 98-101. 

      Our study finds many more rhythmic transcripts compared to the previous studies examining U2OS cells. This can be attributed to several factors including differences in methods, including the use of human serum in our study, cell type differences, or decoupling of rhythms in some cancer cells. While a comparison of BJ-5TA cells and U2OS cells could be interesting, a proper comparison requires investigation of many data sets, since any pair of BJ-5TA and U2OS data sets will most likely differ in some detail of experimental design or data processing pipeline, which could contribute to observed differences in rhythmic transcripts.

      That being said, we compared clock reference genes (see Author response image 1) between BJ-5TA and U2OS cells, comparing circadian profiles obtained from our data with those available on CircaDB. These circadian profiles exhibit many similarities and a few differences. The peak-to-trough ratios (amplitudes) are quite similar for ARNTL, NR1D1, NR1D2, PER2 and PER3, and are about 25% lower for CRY1 and somewhat higher for TEF (about 15%) in our data. We find that the MESORs are generally similar, with the exception of NR1D1, which is much lower, and NR1D2, which is much higher in our data.

      For the rhythmic cell cycle genes, could this be the consequence of the serum which synchronizes also the cell cycle, or is it rather an effect of the circadian oscillator driving rhythms of cell cycle genes? 

      This is an interesting point. Given our previous data showing that the cell cycle gene cyclin D1 is regulated by clock transcription factors (3), we believe the circadian oscillator drives, or at least contributes to, rhythms of cell cycle genes. However, the serum clearly makes a difference as we find that MESORs of cell cycle genes decrease with aged serum. This is consistent with the decreased proliferation previously observed in aged human tissue.

      While the reduction of rhythmicity in the old serum for oxidative phosphorylation transcripts is very interesting and fits with the general theme that metabolic function decreases with age, it is puzzling that the recipient cells are the same, but it is only the synchronization by the old and young serum that changes. Are the authors thus suggesting that decrease of metabolic rhythms is primarily a non cell-autonomous and systemic phenomenon? What would be a potential mechanism? 

      It may not be the cycling per se, but rather an overall inefficiency of oxidative phosphorylation that is conveyed by the serum. Relating other work in the field to our findings, we’ve added the following to our discussion: “Previous work in the field demonstrates that synchronization of the circadian clock in culture results in cycling of mitochondrial respiratory activity (5,6), further underscoring the different effects of old serum, which does not support oscillations of oxidative phosphorylation associated transcripts. Age-dependent decrease in oxidative phosphorylation and increase in mitochondrial dysfunction (7) is seen also in aged fibroblasts (8) and contributes to age-related diseases (9). We suggest that the age-related inefficiency of oxidative phosphorylation is conferred by serum signals to the cells such that oxidative phosphorylation cycles are mitigated. On the other hand, loss of cycling could contribute to impairments in mitochondrial function with age.”

      The delayed shifts after aged serum for clock transcripts (but not for Bmal1) are interesting and indicate that there may be a decoupling of Bmal1 transcript levels from the other clock gene phases. How do the authors interpret this? Could it be related to altered chronotypes in the elderly? 

      One possible explanation is that the delay of NPAS2, BMAL1’s binding partner, results in a delay in the transcription of clock-controlled genes/negative arm genes. Since the RORs do not seem to be affected, Bmal1 is transcribed/translated as usual, but there isn’t enough NPAS2 to bind with BMAL1. In this case, downstream genes are slower to be transcribed, causing the phase delay.

      The discussion would also benefit from mentioning parallels and dissimilarities with previous works, as well as what would be possible mechanisms for such an effect. 

      We’ve expanded our discussion in the manuscript to discuss possible mechanisms and also how the genes/pathways implicated in our study relate to other aging literature.  

      Minor: 

      While time of serum collection is provided in the methods, it would be very useful to provide this information, along with the accompanying argumentation also at a more prominent position and to also add it to Table S1. 

      We made sure to highlight the collection time in the abstract of the manuscript: “We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals at 14:00 (1) and used the serum to synchronize cultured fibroblasts.” The time of blood draw is also stated in other sections of the paper (Introduction and Methods). Since Table S1 contains demographic information, we did not think that the blood draw time fit best there, but hopefully it is now clear in the text.

      L73 EKG: define the abbreviation 

      We rewrote this paragraph, but defined the term where it is used in the paper.  

      L77: transfected BJ-5TA fibroblasts. Mention in the text that these are stably transfected cells. 

      We added this to the text.

      L88: Day 2 also revealed different phases of cyclic expression between young and old "groups" for a larger number of genes. Here it is only two donors, right? 

      Yes, we swapped out the word “groups” for “subjects”.

      L115. MESORs of steroid biosynthesis genes, particularly those relating to cholesterol biosynthesis, were also increased in the old sera condition. This is quite interesting, can the authors speculate on the significance of this finding? 

      We’ve added discussion about this finding in the context of the literature in our discussion.

      Fig 3. - FDRs are only listed for certain KEGG pathways, and gene counts for each pathway are also missing, which excludes some valuable context for drawing conclusions. Full tables of KEGG pathway enrichment outputs should be provided in supplementary materials. Input gene lists should also be uploaded as supplementary data files.

      Both output and input files are included in this submission as additional files.  

      Line 322 - How many replicates were excluded in the end for each group? Providing this information would strengthen the claim that the ability of both old and young serum to drive 24h oscillations in fibroblasts is robust and not only individual. 

      Each serum was tested in triplicate in two individual runs of the experiment. Of the 15 serum samples, two (one old, one young) each had one technical replicate of the triplicate excluded in one of the runs. Given that only one technical replicate in one run of the experiment had to be excluded for one old and one young individual out of all the samples assayed, this supports the idea that both young and old serum drive robust oscillations.

      Line 373 - Should list which active interaction sources were used for analysis. 

      In this manuscript we used STRING (search tool for retrieval of interacting genes) analysis to broadly identify relevant pathways defined by different algorithms. From these data, we focused in particular on KEGG pathways.

      Reviewer #1 (Recommendations For The Authors): 

      These comments are in addition to those provided above: 

      Minor: 

      L73 EKG: define the abbreviation 

      We rewrote this paragraph, but defined the term where it is used in the paper.  

      L77: transfected BJ-5TA fibroblasts. Mention in the text that these are stably transfected cells. 

      We added this to the text.

      L88: Day 2 also revealed different phases of cyclic expression between young and old "groups" for a larger number of genes. Here it is only two donors, right? 

      Yes, we swapped out the word “groups” for “subjects”.

      L115. MESORs of steroid biosynthesis genes, particularly those relating to cholesterol biosynthesis, were also increased in the old sera condition. This is quite interesting, can the authors speculate on the significance of this finding? 

      We’ve added discussion about this finding in the context of the literature.

      Fig.4 The fold change amplitude of the clock gene seems quite a bit lower than what is usually expected (for Nr1d1 it is usually 10 fold). The authors should provide an explanation and discuss this. 

      There are a variety of factors that contribute to the fold change amplitude of clock genes. First, the change in amplitude of clock genes is lower in vitro compared to in vivo samples. For example, in U2OS cell cultures the fold change in the cycling of Nr1d1 is only 2-fold and is not significantly different from the fold change we observe (as shown in the U2OS data from CircaDB plotted in Figure 1R). Second, the method of synchronization contributes to the strength of the rhythms. Serum synchronization is generally less effective at driving strong clock cycling than forskolin or dexamethasone although, as noted in the manuscript, it may promote the cycling of more genes. Lastly, rhythm amplitude is also dependent on the cell type in question, so cell-to-cell variability also contributes to differences. However, overall, we do not find major differences in comparing the U2OS data and ours. Please note that the y-axis has a logarithmic scale.

      What is the authors' strategy to identify which serum components that are responsible for the reported changes? This should be discussed. 

      In the future, we intend to analyze the serum factors using a combination of fractionation and either proteomics or metabolomics to identify relevant factors. We have added this to the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      Overall, the article is well-written but lacks some more rigorous data analysis as mentioned in the public review above. In addition to a more thorough analysis approach focusing much more heavily on individual variability, several other changes can be made to strengthen this study:

      Fig 3. - FDRs are only listed for certain KEGG pathways, and gene counts for each pathway are also missing, which excludes some valuable context for drawing conclusions. Full tables of KEGG pathway enrichment outputs should be provided in supplementary materials. Input gene lists should also be uploaded as supplementary data files. 

      Both output and input files are included in this submission as additional files.

      Fig 1A. - Only n=5 participants were used for this analysis, explanation of the exclusion criteria for the other participants would be useful. 

      As Figure 1A is a schematic, we assume the reviewer is referring to Figure 1B. We’ve provided a flow chart of subject inclusion/exclusion in Figure S2.

      Fig 2. - For circadian transcriptome analysis only n=4 participants were used - what criteria were used to exclude individuals, and why were only these individuals used in the end? 

      As patient recruitment was interrupted by COVID, we selected samples where we had sufficient serum to effectively carry out the RNA seq experiment and control for age and sex.

      Line 322 - How many replicates were excluded in the end for each group? Providing this information would strengthen the claim that the ability of both old and young serum to drive 24h oscillations in fibroblasts is robust and not only individual. 

      Each serum was tested in triplicate in two individual runs of the experiment. Of the 15 serum samples, two (one old, one young) each had one technical replicate of the triplicate excluded in one of the runs. Given that only one technical replicate in one run of the experiment had to be excluded for one old and one young individual out of all the samples assayed, this supports the idea that both young and old serum drive robust oscillations.

      Line 373 - Should list which active interaction sources were used for analysis. 

      In this manuscript we used STRING (search tool for retrieval of interacting genes) analysis to identify relevant pathways. We do not present any STRING networks in the paper.

      Line 68 - "These novel findings suggest that it may be possible to treat impaired circadian physiology and the associated disease risks by targeting blood borne factors." This is a complete overstatement that cannot be sustained by the limited findings provided by the authors. 

      We’ve modified this statement to avoid overstating results.

      (1) Pagani, L. et al. Serum factors in older individuals change cellular clock properties. Proceedings of the National Academy of Sciences 108, 7218–7223 (2011).

      (2) Pagani, L. et al. Serum factors in older individuals change cellular clock properties. Proc Natl Acad Sci U S A 108, 7218–7223 (2011).

      (3) Lee, Y. et al. G1/S cell cycle regulators mediate effects of circadian dysregulation on tumor growth and provide targets for timed anticancer treatment. PLOS Biology 17, e3000228 (2019).

      (4) Tomasetti, C. et al. Cell division rates decrease with age, providing a potential explanation for the age-dependent deceleration in cancer incidence. Proceedings of the National Academy of Sciences 116, 20482–20488 (2019).

      (5) Cela, O. et al. Clock genes-dependent acetylation of complex I sets rhythmic activity of mitochondrial OxPhos. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1863, 596–606 (2016).

      (6) Scrima, R. et al. Mitochondrial calcium drives clock gene-dependent activation of pyruvate dehydrogenase and of oxidative phosphorylation. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1867, 118815 (2020).

      (7) Lesnefsky, E. J. & Hoppel, C. L. Oxidative phosphorylation and aging. Ageing Research Reviews 5, 402–433 (2006).

      (8) Greco, M. et al. Marked aging-related decline in efficiency of oxidative phosphorylation in human skin fibroblasts. The FASEB Journal 17, 1706–1708 (2003).

      (9) Federico, A. et al. Mitochondria, oxidative stress and neurodegeneration. Journal of the Neurological Sciences 322, 254–262 (2012).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank Reviewer 1 for their helpful comments and hope that the changes made to the revised manuscript have addressed their points.

      This study presents a novel application of the inverted encoding (i.e., decoding) approach to detect the correlates of crossmodal integration in the human EEG (electrophysiological) signal. The method is successfully applied to data from a group of 41 participants, performing a spatial localization task on auditory, visual, and audiovisual events. The analyses clearly show a behavioural superiority for audio-visual localization. Like previous studies, the results when using traditional univariate ERP analyses were inconclusive, showing once more the need for alternative, more sophisticated approaches. Instead, the principal approach of this study, harnessing the multivariate nature of the signal, captured clear signs of super-additive responses, considered by many as the hallmark of multisensory integration. Unfortunately, the manuscript lacks many important details in the descriptions of the methodology and analytical pipeline. Although some of these details can eventually be retrieved from the scripts that accompany this paper, the main text should be self-contained and sufficient to gain a clear understanding of what was done. (A list of some of these is included in the comments to the authors). Nevertheless, I believe the main weakness of this work is that the positive results obtained and reported in the results section are conditioned upon eye movements. When artifacts due to eye movements are removed, then the outcomes are no longer significant. 

      Therefore, whether the authors finally achieved the aims and showed that this method of analysis is truly a reliable way to assess crossmodal integration, does not stand on firm ground. The worst-case scenario is that the results are entirely accounted for by patterns of eye movements in the different conditions. In the best-case scenario, the method might truly work, but further experiments (and/or analyses) would be required to confirm the claims in a conclusive fashion.

      One first step toward this goal would be, perhaps, to facilitate the understanding of results in context by reporting both the uncorrected and corrected analyses in the main results section. Second, one could try to support the argument given in the discussion, pointing out the origin of the super-additive effects in posterior electrode sites, by also modelling frontal electrode clusters and showing they aren't informative as to the effect of interest.

      We performed several additional analyses to address concerns that our main result was caused by different eye movement patterns between conditions. We re-ran our key analyses using activity exclusively from frontal electrodes, which revealed poorer decoding performance than that from posterior electrodes. If eye movements were driving the non-linear enhancement in the audiovisual condition, we would expect stronger decoding using sensors closer to the source, i.e., the extraocular muscles. We also computed the correlations between average eye position and stimulus position for each condition to evaluate whether participants made larger eye movements in the audiovisual condition, which might have contributed to better decoding results. Though we did find evidence for eye movements toward stimuli, the degree of movement did not significantly differ between conditions.

      Furthermore, we note that the analysis using a stricter eye movement criterion, acknowledged in the Discussion section of the original manuscript, resulted in very similar results to the original analysis. There was significantly better decoding in the AV condition (as measured by d') than the MLE prediction, but this difference did not survive cluster correction. The most likely explanation for this is that the strict eye movement criterion combined with our conservative measure of (mass-based) cluster correction led to reduced power to detect true differences between conditions. Taken together with the additional analyses described in the revised manuscript and supplementary materials, the results show that eye movements are unlikely to account for differences between the multisensory and unisensory conditions. Instead, our decoding results likely reflect nonlinear neural integration between audio and visual sensory information.

      “Any experimental design that varies stimulus location needs to consider the potential contribution of eye movements. We computed correlations between participants’ average eye position and each stimulus position between the three sensory conditions (auditory, visual and audiovisual; Figure S1) and found evidence that participants made eye movements toward stimuli. A re-analysis of the data with a very strict eye-movement criterion (i.e., removing trials with eye movements >1.875º) revealed that the super-additive enhancement in decoding accuracy no longer survived cluster correction, suggesting that our results may be impacted by the consistent motor activity of saccades towards presented stimuli. Further investigation, however, suggests this is unlikely. Though the correlations were significantly different from 0, they were not significantly different from each other. If consistent saccades to audiovisual stimuli were responsible for the nonlinear multisensory benefit we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Interestingly, eye movements corresponded more to stimulus location in the auditory and audiovisual conditions than in the visual condition, indicating that it was the presence of a sound, rather than a visual stimulus, that drove small eye movements. This could indicate that participants inadvertently moved their eyes when localising the origin of sounds. We also re-ran our analyses using the activity measured from the frontal electrodes alone (Figure S2). If the source of the nonlinear decoding accuracy in the audiovisual condition was due to muscular activity produced by eye movements, there should be better decoding accuracy from sensors closer to the source. 
      Instead, we found that decoding accuracy of stimulus location from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 3).”

      The univariate ERP analyses use an outdated contrast, AV <> A + V, to capture multisensory integration. A number of authors have pointed out the potential problem of double baseline subtraction when using this contrast, and have recommended a number of solutions, experimental and analytical. See for example: [1] and [2].

      (1) Teder-Salejarvi, W. A., McDonald, J. J., Di Russo, F., & Hillyard, S. A. (2002). Cognitive Brain Research, 14, 106-114. 

      (2) Talsma, D., & Woldorff, M. G. (2005). Journal of cognitive neuroscience, 17(7), 1098-1114.

      We thank the reviewer for raising this point. Comparing ERPs across different sensory conditions requires careful analytic choices to discern genuine sensory interactions within the signal. The AV <> (A + V) contrast has often been used to detect multisensory integration, though any non-signal-related activity (i.e., anticipatory waves; Talsma & Woldorff, 2005) or pre-processing manipulation (e.g., baseline subtraction; Teder-Sälejärvi et al., 2002) will be doubled in (A + V) but not in AV. Critically, we did not apply a baseline correction during preprocessing and thus our results are not at risk of double-baseline subtraction in (A + V). Additionally, we temporally jittered the presentation of our stimuli to mitigate the potential influence of consistent overlapping ERP waves (Talsma & Woldorff, 2005).
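
      The doubling argument can be made concrete with a schematic decomposition. The notation is introduced here purely for illustration and is not taken from the manuscript: each recorded ERP is written as a stimulus-specific component $S$ plus a common component $C$ (e.g., anticipatory activity or a baseline offset), which is assumed to be identical across conditions.

```latex
\begin{align}
\mathrm{ERP}_{AV} &= S_{AV} + C, \qquad
\mathrm{ERP}_{A} = S_{A} + C, \qquad
\mathrm{ERP}_{V} = S_{V} + C \\
\mathrm{ERP}_{AV} - \left(\mathrm{ERP}_{A} + \mathrm{ERP}_{V}\right)
  &= S_{AV} - \left(S_{A} + S_{V}\right) - C
\end{align}
```

      Under this assumption the contrast carries a residual $-C$ even when $S_{AV} = S_{A} + S_{V}$, which is why omitting baseline correction and jittering stimulus onsets are relevant safeguards.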

      The results section should provide the neurometric curve/s used to extract the slopes of the sensitivity plot (Figure 2B). 

      We thank the reviewer for raising this point of clarification. The sensitivity plots for Figures 2B and 2C were extracted from the behavioural performance of the behavioural and EEG tasks, respectively. The sensitivity plot for Figure 2B was extracted from individual psychometric curves, whereas the d’ values for Figure 2C were calculated from the behavioural data for the EEG task. This information has been clarified in the manuscript.

      “Figure 1. Behavioural performance is improved for audiovisual stimuli. A) Average accuracy of responses across participants in the behavioural session at each stimulus location for each stimulus condition, fitted to a psychometric curve. Steeper curves indicate greater sensitivity in identifying stimulus location. B) Average sensitivity across participants in the behavioural task, estimated from psychometric curves, for each stimulus condition. The red cross indicates estimated performance assuming optimal (MLE) integration of unisensory cues. C) Average behavioural sensitivity across participants in the EEG session for each stimulus condition. Error bars indicate ±1 SEM.”

      The encoding model was fitted for each electrode individually; I wonder if important information contained as combinations of (individually non-significant) electrodes was then lost in this process and if the authors consider that this is relevant. 

      Although the encoding model was fitted for each electrode individually for the topographic maps (Figure 4B), in all other analyses the encoding model was fitted across a selection of electrodes (see final inset of Figure 3). As this electrode set was used for all other neural analyses, our model would allow for the detection of important information contained in the neural patterns across electrodes. This information has been clarified in the manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2). As the model was fitted for multiple electrodes, subtle patterns of neural information contained within combinations of sensors could be detected.”

      Neurobehavioral correlations could benefit from outlier rejection and the use of robust correlation statistics. 

      We thank the reviewer for raising this issue. Note, however, that the correlations we report are resistant to the influence of outliers because we used Spearman’s rho, as opposed to Pearson’s r (1). This information has been communicated in the manuscript.

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069

      “Neurobehavioural correlations. As behavioural and neural data violated assumptions of normality, we calculated rank-order correlations (Spearman’s rho) between the average decoding sensitivity for each participant from 150-250 ms poststimulus onset and behavioural performance on the EEG task. As Spearman’s rho is resistant to outliers (Wilcox, 2016), we did not perform outlier rejection.”

      “Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069”
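
      The robustness of a rank-order correlation to a single extreme value can be illustrated with a minimal sketch. The data below are invented toy values and the helper functions (`ranks`, `pearson`, `spearman`) are hypothetical, written only for this illustration; they are not the analysis code used in the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (not from the study): a monotonic relationship with one extreme value.
neural = [1, 2, 3, 4, 5, 6, 7, 8]
behaviour = [1, 2, 3, 4, 5, 6, 7, 100]

r_pearson = pearson(neural, behaviour)
rho = spearman(neural, behaviour)
print(f"Pearson r = {r_pearson:.2f}, Spearman rho = {rho:.2f}")
```

      In this toy case the extreme value pulls Pearson’s r well below 1, whereas Spearman’s rho is unaffected because the rank ordering is unchanged.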

      Many details that are important for the reader to evaluate the evidence and to understand the methods and analyses aren't given; this is a non-exhaustive list:  

      We thank the reviewer for highlighting these missing details. We have updated the manuscript where necessary to ensure the methods and analyses are fully detailed and replicable.

      - specific parameters of the stimuli and performance levels. Just saying "similarly difficult" or "marginally higher volume" is not enough to understand exactly what was done.  

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.”

      - were stimulus parameters adjusted individually or as a group? Which method was followed?

      To clarify, stimulus parameters (frequency, size, luminance, volume, location, etc.) were manipulated throughout pilot testing only. Parameters were adjusted to achieve similar pilot behavioural results between the auditory and visual conditions. For the experiment proper, parameters remained constant for both tasks and were the same for all participants.

      “During pilot testing, stimulus features (size, luminance, volume, frequency etc.) were manipulated to make visual and auditory stimuli similarly difficult to spatially localize. These values were held constant in the main experiment.”

      - specify which response buttons were used.

      “Participants were presented with two consecutive stimuli and tasked with indicating, via button press, whether the first (‘1’ number-pad key) or second (‘2’ number-pad key) interval contained the more leftward stimulus.”

      “At the end of each sequence, participants were tasked with indicating, via button press, whether more presentations appeared on the right (‘right’ arrow key) or the left (‘left’ arrow key) of the display.”

      - no information is given as to how many trials per condition remained on average, for analysis.  

      The average number of remaining trials per condition after eye-movement analysis is now included in the Methods section of the revised manuscript.

      “We removed trials with substantial eye movements (>3.75º away from fixation) from the analyses. After the removal of eye movements, on average 2365 (SD = 56.94), 2346 (SD = 152.87) and 2350 (SD = 132.47) trials remained for auditory, visual and audiovisual conditions, respectively, from the original 2400 per condition.”
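
      A minimal sketch of this kind of trial exclusion is given below. It assumes that the 3.75 threshold is in degrees of visual angle and that each trial carries a list of horizontal gaze offsets from fixation; the data structure, the field names and the `keep_trial` helper are hypothetical, since the source does not describe how eye position was summarised per trial.

```python
def keep_trial(gaze_offsets_deg, threshold_deg=3.75):
    """Return True if gaze never strayed more than threshold_deg from fixation."""
    return max(abs(offset) for offset in gaze_offsets_deg) <= threshold_deg


# Invented example trials (not data from the study).
trials = [
    {"condition": "auditory", "gaze_deg": [0.2, -0.5, 1.1]},
    {"condition": "visual", "gaze_deg": [0.1, 4.2, 0.3]},  # exceeds the threshold
    {"condition": "audiovisual", "gaze_deg": [-3.7, 0.0, 2.0]},
]

retained = [trial for trial in trials if keep_trial(trial["gaze_deg"])]
print(f"{len(retained)} of {len(trials)} trials retained")
```

      The stricter criterion mentioned in the response would correspond to calling `keep_trial` with `threshold_deg=1.875`.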

      - no information is given on the specifics of participant exclusion criteria. (even if the attrition rate was surprisingly high, for such an easy task).  

      The behavioural session also served as a screening task. Although the task instructions were straightforward, perceptual discrimination was not easy due to the ambiguity of the stimuli. Auditory localization is not very precise, and the visual stimuli were brief, dim, and diffuse. The behavioural results reflect the difficulty of the task. The attrition rate was high because participants who scored below 60% correct in any condition were deemed unable to accurately perform the task, were not invited to complete the subsequent EEG session, and were omitted from the analyses. We have included the specific criteria in the manuscript.

      “Participants were first required to complete a behavioural session with above 60% accuracy in all conditions to qualify for the EEG session (see Behavioural session for details).”

      - EEG pre-processing: what filter was used? How was artifact rejection done? (no parameters are reported); How were bad channels interpolated?  

      We used a 0.25 Hz high-pass filter to remove baseline drifts, but no low-pass filter. In line with recent studies on the undesirable influence of EEG preprocessing on ERPs (1), we opted to avoid channel interpolation and artifact rejection. This was erroneously reported in the manuscript and has now been clarified. For the sake of clarity, here we demonstrate that a reanalysis of data using channel interpolation and artifact rejection returned the same pattern of results.

      (1) Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13, 2372. https://doi.org/10.1038/s41598-023-27528-0

      - specific electrode locations must be given or shown in a plot (just "primarily represented in posterior electrodes" is not sufficiently informative).  

      A diagram of the electrodes used in all analyses is included within Figure 3, and we have drawn readers’ attention to this in the revised manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2).” 

      - ERP analysis: which channels were used? What is the specific cluster correction method?

      We used a conservative mass-based cluster correction from Pernet et al. (2015) - this information has been clarified in the manuscript.

      “A conservative mass-based cluster correction was applied to account for spurious differences across time (Pernet et al., 2015).” 

      “Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85-93. https://doi.org/10.1016/j.jneumeth.2014.08.003”

      - results: descriptive stats on performance must be given (instead of saying "participants performed well").  

      The mean and standard deviation of participants’ performance for each condition in the behavioural and EEG experiments are now explicitly mentioned in the manuscript.

      “A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly higher sensitivity for the audiovisual stimuli (M = .04, SD = .02) than for the auditory stimuli alone (M = .03, SD = .01; Z = -3.09, p = .002), and than for the visual stimuli alone (M = .02, SD = .01; Z = -5.28, p = 1.288e-7; Figure 1B). Sensitivity for auditory stimuli was also significantly higher than sensitivity for visual stimuli (Z = 2.02, p = .044).” 

      “We found a similar pattern of results to those in the behavioural session; sensitivity for audiovisual stimuli (M = .85, SD = .33) was significantly higher than for auditory (M = .69, SD = .41; Z = -2.27, p = .023) and visual stimuli alone (M = .61, SD = .29; Z = -3.52, p = 4.345e-4), but not significantly different from the MLE prediction (Z = -1.07, p = .285).” 

      - sensitivity in the behavioural and EEG sessions is said to be different, but no comparison is given. It is not even the same stimulus set across the two tasks...  

      This relationship was noted as a potential explanation for the higher sensitivities obtained in the EEG task, and was not intended to stand up to statistical scrutiny. We agree it makes little sense to compare statistically between the EEG and behavioural results as they were obtained from different tasks. We would like to clarify, however, that the stimuli used in the two tasks were the same, with the exception that in the EEG task the stimuli were presented from 5 locations versus 8 in the behavioural task. To avoid potential confusion, we have removed the offending sentence from the manuscript:

      Reviewer 2:

      Their measure of neural responses is derived from the decoder responses, and this takes account of the reliability of the sensory representations - the d' statistics - which is an excellent thing. It also means if I understand their analysis correctly (it could bear clarifying - see below), that they can generate from it a prediction of the performance expected if an optimal decision is made combining the neural signals from the individual modalities. I believe this is the familiar root sum of squares d' calculation (or very similar). Their decoding of the audiovisual responses comfortably exceeds this prediction and forms part of the evidence for their claims. 

      Yet, superadditivity, including that in evidence in the principle of inverse effectiveness, more typically quantifies the excess over the sum of proportions correct in each modality. Their MLE d' statistic can already predict this form of superadditivity. Therefore, the superadditivity they report here is not the same form of superadditivity that is usually referred to in behavioural studies. It is in fact a stiffer definition. What their analysis tests is that decoding performance exceeds what would be expected from an optimally weighted linear integration of the unisensory information. As this is not the common definition, it is difficult to relate to the behavioural superadditivity (of percentage correct) reported in much of the literature. This distinction is not at all clear from the manuscript.

      But the real puzzle is here: the behavioural data for this task do not exceed the optimal statistical decision predicted by signal detection theory (the MLE d'). Yet, the EEG data would suggest that the neural processing is exceeding it. So why, if the neural processing is there to yield better performance, is it not reflected in the behaviour? I cannot explain this, but it strikes me that the behaviour and neural signals are for some reason not reflecting the same processing.

      Be explicit and discuss this mismatch they observe between behaviour and neural responses. 

      Thank you, we agree that it is worth expanding on the observed disconnect between MSI in behaviour and neural signals. We have included an additional paragraph in the Discussion of the revised manuscript. Despite the mismatch, we believe the behavioural and neural responses still reflect the same underlying processing, but at different levels of sensitivity. The behavioural result likely reflects a coarse down-sampling of the precision in location representation, and is thus less likely to reflect subtle MSI enhancements.

      “An interesting aspect of our results is the apparent mismatch between the behavioural and neural responses. While the behavioural results meet the optimal statistical threshold predicted by MLE, the decoding analyses suggest that the neural response exceeds it. Though non-linear neural responses and statistically optimal behavioural responses are reliable phenomena in multisensory integration (Alais & Burr, 2004; Ernst & Banks, 2002; Stanford & Stein, 2007), the question remains – if neural super-additivity exists to improve behavioural performance, why is it not reflected in behavioural responses? A possible explanation for this neurobehavioural discrepancy is the large difference in timing between sensory processing and behavioural responses. A motor response would typically occur some time after the neural response to a sensory stimulus (e.g., 70-200 ms), with subsequent neural processes between perception and action that introduce noise (Heekeren et al., 2008) and may obscure super-additive perceptual sensitivity. In the current experiment, participants reported either the distribution of 20 serially presented stimuli (EEG session) or compared the positions of two stimuli (behavioural session), whereas the decoder attempts to recover the location of every presented stimulus. While stimulus location could be represented with higher fidelity in multisensory relative to unisensory conditions, this would not necessarily result in better performance on a binary behavioural task in which multiple temporally separated stimuli are compared. One must also consider the inherent differences in how super-additivity is measured at the neural and behavioural levels. Neural super-additivity should manifest in responses to each individual stimulus. In contrast, behavioural super-additivity is often reported as proportion correct, which can only emerge between conditions after being averaged across multiple trials. 
      The former is a biological phenomenon, while the latter is an analytical construct. In our experiment, we recorded neural responses for every presentation of a stimulus, but behavioural responses were only obtained after multiple stimulus presentations. Thus, the failure to find super-additivity in behavioural responses might be due to their operationalisation, with between-condition comparisons lacking sufficient sensitivity to detect super-additive sensory improvements. Future work should focus on experimental designs that can reveal super-additive responses in behaviour.”

      Re-work the introduction to explain more clearly the relationship between the behavioural superadditivities they review, the MLE model, and the superadditivity it actually tests. 

      We agree it is worth discussing how super-additivity is operationalised across neural and behavioural measures. However, we do not believe the behavioural studies we reviewed claimed super-additive behavioural enhancements. While MLE is often used as a behavioural marker of successful integration, it is not necessarily used as evidence for super-additivity within the behavioural response, as it relies on linear operations. 

      “It is important to consider the differences in how super-additivity is classified between neural and behavioural measures. At the level of single neurons, superadditivity is defined as a non-linear response enhancement, with the multisensory response exceeding the sum of the unisensory responses. In behaviour, meanwhile, it has been observed that the performance improvement from combining two senses is close to what is expected from optimal integration of information across the senses (Alais & Burr, 2004; Stanford & Stein, 2007). Critically, behavioural enhancement of this kind does not require non-linearity in the neural response, but can arise from a reliability-weighted average of sensory information. In short, behavioural performance that conforms to MLE is not necessarily indicative of neural super-additivity, and the MLE model can be considered a linear baseline for multisensory integration.”
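
      The sense in which MLE acts as a linear baseline can be written out using the standard reliability-weighted cue-combination equations. This is a sketch under the usual assumptions of independent, Gaussian-distributed unisensory estimates; the symbols $\hat{s}$, $w$ and $\sigma$ are introduced here for illustration and do not appear in the manuscript excerpt.

```latex
\begin{align}
\hat{s}_{AV} &= w_A \hat{s}_A + w_V \hat{s}_V, \qquad
w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad
w_V = 1 - w_A \\
\sigma_{AV}^2 &= \frac{\sigma_A^2 \, \sigma_V^2}{\sigma_A^2 + \sigma_V^2} \\
d'_{AV} &= \sqrt{(d'_A)^2 + (d'_V)^2} \quad \text{when } d' \propto 1/\sigma
\end{align}
```

      The combined estimate is a weighted sum of the unisensory estimates, so it involves only linear operations. Sensitivity that exceeds the predicted $d'_{AV}$ therefore points to something beyond reliability-weighted averaging.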

      Regarding the auditory stimulus, this reviewer notes that interaural time differences are unlikely to survive free field presentation.

      Despite the free field presentation, in both the pilot test and the study proper participants were able to localize auditory stimuli significantly above chance. 

      "However, other studies have found super-additive enhancements to the amplitude of sensory event-related potentials (ERPs) for audiovisual stimuli (Molholm et al., 2002; Talsma et al., 2007), especially when considering the influence of stimulus intensity (Senkowski et al., 2011)." - this makes it obvious that there are some studies which show superadditivity. It would have been good to provide a little more depth here - as to what distinguished those studies that reported positive effects from those that did not.

      We have provided further detail on how super-additivity appears to manifest in neural measures.

      “In EEG, meanwhile, the evoked response to an audiovisual stimulus typically conforms to a sub-additive principle (Cappe et al., 2010; Fort et al., 2002; Giard & Peronnet, 1999; Murray et al., 2016; Puce et al., 2007; Stekelenburg & Vroomen, 2007; Teder-Sälejärvi et al., 2002; Vroomen & Stekelenburg, 2010). However, when the principle of inverse effectiveness is considered and relatively weak stimuli are presented together, there has been some evidence for super-additive responses (Senkowski et al., 2011).”

      “While behavioural outcomes for multisensory stimuli can be predicted by MLE, and single neuron responses follow the principles of inverse effectiveness and super-additivity, among others (Rideaux et al., 2021), how audiovisual super-additivity manifests within populations of neurons is comparatively unclear given the mixed findings from relevant fMRI and EEG studies. This uncertainty may be due to biophysical limitations of human neuroimaging techniques, but it may also be related to the analytic approaches used to study these recordings. For instance, superadditive responses to audiovisual stimuli in EEG studies are often reported from very small electrode clusters (Molholm et al., 2002; Senkowski et al., 2011; Talsma et al., 2007), suggesting that neural super-additivity in humans may be highly specific. However, information encoded by the brain can be represented as increased activity in some areas, accompanied by decreased activity in others, so simplifying complex neural responses to the average rise and fall of activity in specific sensors may obscure relevant multivariate patterns of activity evoked by a stimulus.”

      P9. "(25-75 W, 6 Ω)." This is not important, but it is a strange way to cite the power handling of a loudspeaker. 

      “The loudspeakers had a power handling capacity of 25-75 W and a nominal impedance of 6 Ω.” 

      I am struggling to understand the auditory stimulus: 

      "Auditory stimuli were 100 ms clicks". Is this a 100-ms long train of clicks? A single pulse which is 100ms long would not sound like a click, but two clicks once filtered by the loudspeaker. Perhaps they mean 100us. 

      "..with a flat 850 Hz tone embedded within a decay envelope". Does this mean the tone is gated - i.e. turns on and off slowly? Or is it constant?

      We thank the reviewer for catching this. ‘Click’ may not be the most apt way of defining the auditory stimulus. It was a 100 ms square wave tone with decay, i.e., with an onset at maximal volume before fading gradually. Given that the length of the stimulus was 100 ms, the decay occurs quickly and provides a more ‘click-like’ percept than a pure tone. We have provided a representation of the sound below for further clarification. This represents the amplitude from the L and R speakers for maximally-left and maximally-right stimuli. We have added this clarification in the revised manuscript. 

      Author response image 1.

      “Auditory stimuli were 100 ms, 850 Hz tones with a decay function (sample rate = 44,100 Hz; volume = 60 dBA SPL, as measured at the ears).”
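The stimulus described above can be sketched in a few lines. This is only an illustration: the response does not give the shape or rate of the decay, so the exponential envelope and the `decay_rate` parameter below are assumptions, and the function name is hypothetical.

```python
import math

SAMPLE_RATE = 44_100   # Hz, as quoted in the revised Methods text
DURATION_S = 0.100     # 100 ms stimulus
FREQ_HZ = 850.0        # tone frequency

def decaying_square_tone(decay_rate=30.0):
    """Square-wave tone that starts at full amplitude and then fades.

    decay_rate (1/s) is a hypothetical value; the source does not
    specify the decay function.
    """
    n_samples = round(SAMPLE_RATE * DURATION_S)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        # Square-wave carrier: the sign of a sine at the tone frequency.
        carrier = 1.0 if math.sin(2 * math.pi * FREQ_HZ * t) >= 0 else -1.0
        # Assumed exponential decay envelope (maximal at onset).
        envelope = math.exp(-decay_rate * t)
        samples.append(carrier * envelope)
    return samples

tone = decaying_square_tone()
```

The onset sample is at maximal amplitude and every later sample is attenuated, which matches the "onset at maximal volume before fading gradually" description.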

      P10. "Stimulus modality was either auditory, visual, or audiovisual. Trials were blocked with short (~2 min) breaks between conditions".

      Presumably the blocks were randomised across participants.

      Condition order was not randomised across participants, but counterbalanced. This has been clarified in the manuscript.

      “Stimulus modality was auditory, visual or audiovisual, presented in separate blocks with short breaks (~2 min) between conditions (see Figure 6A for an example trial). The order of conditions was counterbalanced across participants.” 

      P15. Feels like there is a step not described here: "The d' of the auditory and visual conditions can be used to estimate the predicted 'optimal' sensitivity of audiovisual signals as calculated through MLE." Do they mean sqrt[ (d'A)^2 + (d'V)^2] ? If it is so simple then it may as well be made explicit here. A quick calculation from eyeballing Figures 2B and 2C suggests this is the case.

      We thank the reviewer for raising this point of clarification. Yes, the ‘optimal’ audiovisual sensitivity was calculated as the hypotenuse of the auditory and visual sensitivities. This calculation has been made explicit in the revised manuscript.

      The d’ from the auditory and visual conditions can be used to estimate the predicted ‘optimal’ sensitivity to audiovisual signals as calculated through the following formula:

      $$d'_{AV} = \sqrt{(d'_{A})^{2} + (d'_{V})^{2}}$$
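As a small worked example of the calculation confirmed in the response (the "hypotenuse" of the auditory and visual sensitivities), the prediction can be written as follows. The function name and the example d' values are hypothetical.

```python
import math

def predicted_av_dprime(d_auditory, d_visual):
    """MLE-predicted audiovisual sensitivity: sqrt(d'_A^2 + d'_V^2)."""
    return math.hypot(d_auditory, d_visual)

# Hypothetical unisensory sensitivities, chosen only for illustration.
example = predicted_av_dprime(3.0, 4.0)
```

An observed audiovisual d' above this prediction would indicate performance beyond the optimal-integration benchmark.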

      "The perceived source location of auditory stimuli was manipulated via changes to interaural intensity and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992)." The stimuli were delivered by a pair of loudspeakers, and the incident sound at each ear would be a product of both speakers. And - if there were a time delay between the two speakers, then both ears could potentially receive separate pulses one after the other at different delays. Did they record this audio stimulus with manikin? If not, it would be very difficult to know what it was at the ears. I don't doubt that if they altered the relative volume of the loudspeakers then some directionality would be perceived but I cannot see how the interaural level and timing differences could be matched - as if the sound were from a single source. I doubt that this invalidates their results, but to present this as if it provided matched spatial and timing cues is wrong, and I cannot work out how they can attribute an azimuthal location to the sound. For replication purposes, it would be useful to know how far apart the loudspeakers were and what the timing and level differences actually were.

      The behavioural tasks each had evenly distributed ‘source locations’ on the horizontal azimuth of the computer display (8 for the behavioural session, 5 for the EEG session). We manipulated the perceived location of auditory stimuli through interaural time delays and interaural level differences. By first measuring the forward (z) and horizontal (x) distance of each source location to each ear, the method worked by calculating what the time-course of a sound wave should be at the location of the ear given the sound wave at the source. Then, for each source location, we can calculate the time delay between speakers given the vectors of x and z, the speed of sound and the width of the head.  As the intensity of sound drops inversely with the square of the distance, we can divide the sound wave by the distance for each source location to provide the interaural level difference. Though we did not record the auditory stimulus with a manikin, our behavioural analyses show that participants were able to detect the directions of auditory stimuli from our manipulations, even to a degree that significantly exceeded the localisation accuracy for visual stimuli (for the behavioural session task). This information has been clarified in the manuscript.

      “Auditory stimuli were played through two loudspeakers placed either side of the display (80 cm apart for the behavioural session, 58 cm apart for the EEG session).” 

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.
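The formula itself is not reproduced in this excerpt, so the sketch below is only one reading that is consistent with the variable definitions given: a straight-line path from the source to each ear, with the head radius added to x for the left speaker and subtracted for the right, divided by the speed of sound. The function names and the value used for the speed of sound are assumptions.

```python
import math

HEAD_RADIUS_M = 0.08      # constant head radius stated in the text
SPEED_OF_SOUND = 343.0    # m/s; assumed value, not given in the source

def path_delays(x, z, r=HEAD_RADIUS_M, s=SPEED_OF_SOUND):
    """Assumed travel times (s) for the left and right channels.

    x: horizontal distance (m) between the ears and the source
    z: forward distance (m) between the ears and the source
    """
    left = math.hypot(x + r, z) / s    # r added to x for the left speaker
    right = math.hypot(x - r, z) / s   # r subtracted for the right speaker
    return left, right

def distance_gains(x, z, r=HEAD_RADIUS_M):
    """Level scaling obtained by dividing the sound by each path distance."""
    return 1.0 / math.hypot(x + r, z), 1.0 / math.hypot(x - r, z)
```

Under this reading, a source on the midline (x = 0) gives identical delays and gains in both channels, and any offset in x produces both a timing and a level difference.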

      I am confused about this statement: "A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly greater sensitivity for the audiovisual stimuli than for the auditory stimuli alone (Z = -3.09, p = .002)," It is not clear from the methods how they attributed sound source angle to the sounds. Conceivably they know the angle of the loudspeakers, and this would provide an outer bound on the perceived location of the sound for extreme interaural level differences (although free field interaural timing cues can create a wider sound field). 

      Our analysis of behavioural sensitivity was dependent on the set ‘source locations’ that were used to calculate the position of auditory and audiovisual stimuli.  In the behavioural task, participants judged the position of the target stimulus relative to a central stimulus. Thus, for each source location, we recorded how often participants correctly discriminated between presentations. The quoted analysis acknowledges that participants were more sensitive to audiovisual stimuli than auditory stimuli in the context of this task. A full explanation of how source location was implemented for auditory stimuli has been clarified in the manuscript. 

      It would be very nice to see some of the "channel" activity - to get a feel for the representation used by the decoder. 

      We have included responses for the five channels as a Supplemental Figure.

      Figure 6 appears to show that there is some agreement between behaviour and neural responses - for the audiovisual case alone. The positive correlation of behavioural and decoding sensitivity appears to be driven by one outlier - who could not perform the audiovisual task (and indeed presumably any of them). Furthermore, if we were simply to Bonferroni-correct for the three comparisons, this would become non-significant. It is also puzzling why the unisensory behaviour and EEG do not correlate - which seems to again suggest a poor correspondence between them. Opposite to the claim made.

      We understand the reviewer’s concern here. We would like to note, however, that each correlation used unique data sets – that is, the behavioural and neural data for each separate condition. In this case, we believe a Bonferroni correction for multiple comparisons is too conservative, as no data set was compared more than once. Neither the behavioural nor the neural data were normally distributed, and both contained outliers. Rather than reduce power through outlier rejection, we opted to test correlations using Spearman’s rho, which is resistant to outliers (1). It is also worth noting that, without outlier rejection, the audiovisual correlation (p = .003) would survive a Bonferroni correction for 3 comparisons. The nonsignificant correlation in the auditory and visual conditions might be due to the weaker responses elicited by unisensory stimuli, with the reduced signal-to-noise ratio obscuring potential correlations. Audiovisual stimuli elicited more precise responses both behaviourally and neurally, increasing the power to detect a correlation. 

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069
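The two statistical points in the response above (a rank-based correlation and a Bonferroni threshold for three comparisons) can be illustrated with a minimal sketch. The helper names are hypothetical and the toy data are invented for illustration; only the p-value of .003 and the three comparisons come from the response.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def survives_bonferroni(p_value, n_comparisons, alpha=0.05):
    return p_value < alpha / n_comparisons

# Toy data with one extreme value: the ranks are unaffected by its size.
rho = spearman_rho([1, 2, 3, 4, 100], [2, 4, 6, 8, 9])
audiovisual_ok = survives_bonferroni(0.003, 3)
```

Because only the ordering of the values enters the calculation, a single extreme observation cannot dominate the coefficient the way it can for a Pearson correlation on raw values.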

      “We also found a significant positive correlation between participants’ behavioural judgements in the EEG session and decoding sensitivity for audiovisual stimuli. This result suggests that participants who were better at identifying stimulus location also had more reliably distinct patterns of neural activity. The lack of neurobehavioural correlation in the unisensory conditions might suggest a poor correspondence between the different tasks, perhaps indicative of the differences between behavioural and neural measures explained previously. However, multisensory stimuli have consistently been found to elicit stronger neural responses than unisensory stimuli (Meredith & Stein, 1983; Puce et al., 2007; Senkowski et al., 2011; Vroomen & Stekelenburg, 2010), which has been associated with behavioural performance (Frens & Van Opstal, 1998; Wang et al., 2008). Thus, the weaker signal-to-noise ratio in unisensory conditions may prevent correlations from being detected.”

      Further changes:

      (1)   To improve clarity, we shifted the Methods section to after the Discussion. This change included updating the figure numbers to match the new order (Figure 1 becomes Figure 6, Figure 2 becomes Figure 1, and so on).

      (2)   We also resolved an error on Figure 2 (previously Figure 3). The final graph (Difference between AV and A + V) displayed incorrect values on the Y axis.

      This has now been remedied.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study of extrachromosomal DNA (ecDNA) aims to identify genes that distinguish ecDNA+ and ecDNA- tumors. This timely study is important in addressing the genes responding to the amplification of the ecDNA. The data presented are for the most part solid, but there were concerns regarding the clarity of the description of the analysis methods and whether the evidence for specific genes required to maintain the ecDNA+ state was entirely conclusive.

      Public Reviews:

      Reviewer #1 (Public Review):

      Recently discovered extrachromosomal DNA (ecDNA) provides an alternative non-chromosomal means for oncogene amplification and a potent substrate for selective evolution of tumors. The current work aims to identify key genes whose expression distinguishes ecDNA+ and ecDNA- tumors and the associated processes to shed light on the biological mechanisms underlying ecDNA genesis and their oncogenic effects. While this is clearly an important question, the analysis and the evidence supporting the claims are weak. The specific machine learning approach seems unnecessarily convoluted, insufficiently justified and explained, and the language used by the authors conflates correlation with causality. This work points to specific GO processes associated (up and down) with ecDNA+ tumors, many of which are expected but some seem intriguing, such as association with DSB pathways. My specific comments are listed below.

      Response. As some of the specific questions below address similar concerns, we have answered them briefly here. As a high level point, the reviewer is correct in that other statistical or ML approaches could potentially have been used, and that some are simpler. However, the test used here directly addresses the question: Find a collection of genes whose expression value is predictive of ecDNA status in the sample. Because the underlying method in the Boruta analysis uses random forests, it can test predictive power without relying on a linearity assumption implicit in other methods. In this revision, we also compare against a Generalized Linear Model and show that it is less suited to the specific task above. We also address the reviewer concerns about specific parameter choices by showing robustness to the specific parameter.

      (A) The claim of identifying genes required to 'maintain' ecDNA+ status is not justified - predictive features are not necessarily causal.

      Response. We agree with the reviewer that predictive features are correlative and not causal. In the manuscript, we identify genes whose expression (when used as a feature) is predictive of ecDNA presence or absence. Such predictive genes are consistently over-expressed or consistently under-expressed in ecDNA(+) samples relative to ecDNA(-) samples even though they are not required to be on ecDNA. To our knowledge, we did not claim that these genes are causal for ecDNA formation or maintenance, only that such genes and the underlying biological processes are worth investigating. In the beginning of the manuscript, we had written the following paragraph, but we have removed the last line (struck out here):

      “In lieu of identifying genes that are highly differentially expressed between ecDNA(+) and ecDNA(-) samples but driven by a small subset of cases (e.g. gene A in Fig. S1a), we sought to identify genes (e.g. gene B) whose expression level was predictive of ecDNA presence. We assumed that genes that were persistently over-expressed or under-expressed in ecDNA(+) samples relative to ecDNA(-) samples were more likely to be involved in ecDNA biogenesis or maintenance, or in mediating the cellular response to the presence of ecDNA.”

      We revised the manuscript to make sure that there are no claims that refer to causality. We revisited all phrases where words like “maintain” were used and added appropriate disclaimers, or replaced them with the phrase “ecDNA presence.” The remaining statements say, for example, “These results are consistent with a pan-cancer role of CorEx genes in ecDNA biogenesis and maintenance,” and do not claim causality.

      (B) The methods and procedures to identify the key genes is hyper-parameterized and convoluted and casts doubt on the robustness of the findings given the size and heterogeneity of the data.

      (a) In the first two paragraphs of Boruta Analysis Methods section, authors describe an iterative procedure where in each iteration, a binomial p-value is computed for each gene based on number of iterations thus far in which the gene was selected (higher GINI index than max of shadow features). But then in the third paragraph they simply perform Random Forest in 200 random 80% of samples and pick a gene if it is selected in at least 10/200. It is ultimately not clear what was done. Why 10/200? Also "the probability that a gene is a "hit" or "non-hit" in each iteration is 0.5" is unclear. That probability is of a gene achieving GINI index higher than the max of shadow features. How can it be 0.5?

      Response. We believe that there is some misunderstanding about the algorithm, and we agree that the description should have been clearer. We have greatly simplified the description in the manuscript. However, we want to provide some higher-level explanation here. Boruta is a standard feature extraction algorithm (Kursa, Journal of Statistical Software September 2010, Volume 36, Issue 11), and we used a Python implementation of the method. Given a gene expression data-set with class labels on samples, Boruta extracts features (genes) that best predict the class labels using a Random Forest Classifier, as long as the features are more predictive than permuted features added in each iteration. As we are using an implementation of a published method, we have removed non-essential details, referring directly to the publication. Nevertheless, to address the reviewer’s specific critique, the number of false features added changes in each iteration (it equals the number of accepted+uncommitted features). Therefore, the choice of 0.5 by Boruta (it is fixed in the published method and not a user-specified parameter) is a conservative approach. If a gene were no better than a randomly chosen feature, the probability that its predictive performance exceeds that of the most predictive randomly chosen feature would be at most 0.5 (and could be lower, making the choice of 0.5 conservative).
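To make the role of the 0.5 concrete, the sketch below computes a one-sided binomial tail probability for the number of iterations in which a feature beat the best permuted feature. It is an illustration of the idea only, not the code of the Python implementation the authors used (which is not named here); the function name is hypothetical.

```python
from math import comb

def binomial_tail(hits, n_iterations, p_hit=0.5):
    """P(X >= hits) for X ~ Binomial(n_iterations, p_hit).

    Under the null that a feature is no better than a permuted one,
    p_hit = 0.5 is treated as an upper bound on the per-iteration
    chance of a "hit", so small tail probabilities are conservative.
    """
    return sum(
        comb(n_iterations, k) * p_hit**k * (1 - p_hit) ** (n_iterations - k)
        for k in range(hits, n_iterations + 1)
    )

# A feature that is a hit in every one of 10 iterations.
p_all_hits = binomial_tail(10, 10)
```

If the true null hit probability is lower than 0.5, the real tail probability is smaller than the one computed here, which is the sense in which the fixed value is conservative.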

      While Boruta iteratively picks genes that are significantly better than random features, the list of genes predicted might be specific to the data-set, and might change with different data-sets. Therefore, we employed a bootstrapping strategy: we performed 200 trials each time picking 80% of the ecDNA(+) samples and 80% of the ecDNA(-) samples at random, thus generating many data-sets while maintaining class imbalance. For each of the 200 trials, we performed a Boruta analysis. Finally, we picked a gene if it was selected as a Boruta feature in at least 10 of 200 trials.
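The resampling scheme described above can be sketched as follows. The feature selector is passed in as a callable because the Boruta step itself is not reproduced here; `select_features` and the other names are hypothetical placeholders, and the 80%, 200 and 10 values are taken from the response.

```python
import random
from collections import Counter

def stratified_subsample(labels, fraction, rng):
    """Indices of a random subsample drawn separately within each class."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    keep = rng.sample(positives, round(fraction * len(positives)))
    keep += rng.sample(negatives, round(fraction * len(negatives)))
    return sorted(keep)

def stable_features(labels, select_features, n_trials=200, min_hits=10, seed=0):
    """Features chosen by select_features in at least min_hits of n_trials."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        train_idx = stratified_subsample(labels, 0.8, rng)
        # select_features stands in for one Boruta run on the subsample.
        counts.update(set(select_features(train_idx)))
    return {feature for feature, c in counts.items() if c >= min_hits}
```

Sampling within each class keeps the ratio of ecDNA(+) to ecDNA(-) samples the same in every trial, which is how the class imbalance is maintained.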

      The reviewer has a reasonable critique about why 10 (of 200) specifically, and why not fewer or more. Most genes are weak predictors by themselves. For example, RAE1, which is the top ranked gene, picked in all 200 Boruta trials, can only predict ecDNA status with poor recall for any meaningful precision.

      Author response image 1.

      Given the weakness of an individual gene as a classifier, its repeated selection in multiple Boruta trials is already a significant event. By requiring a gene to be picked in 5% of the trials (10/200), we were selecting a small, but more robust list of genes. However, to further explore the reviewer’s concerns, we also applied 8 other selection criteria ranging from 5 (of 200 Boruta trials) to 200 of 200 Boruta trials. See Figure below. The number of CorEx genes expectedly decreases. However, of the 187 GO terms that were enriched by 262 UP-genes using 10 of 200 Boruta trials as the selection criteria, 93 terms (49.7%) were enriched for each cut-off (see Author response image 2), and 155 terms (82.9%) were enriched in at least 5 of the 8 cut-off criteria. Given that the remaining analysis works on the hierarchy of GO terms and finds 4 GO-categories (Mitotic Cell Cycle, G1/S, G2/M; cell-division; DSB DNA Damage response; and the HOX Gene cluster) enriched by UP-regulated genes, those conclusions would hold regardless of the specific cut-off.

      Author response image 2.

      The number of GO terms that were enriched by DOWN-regulated genes is smaller, only 73, and falls rapidly for higher cut-offs, with 25 at a cut-off of 15. Therefore we see fewer terms enriched for more stringent cut-offs. However, they all support immune processes. These results do suggest that there are fewer genes that are consistently down-regulated in ecDNA(+) cancers, and expression change in a small number of genes may be sufficient to promote conditions for ecDNA.

      Finally, we note that in the final section we discuss the 65 most highly ranked genes with a harmonic mean rank <= 3. These 65 CorEx genes (or a member of their cluster) appear in each of 200 Boruta trials. Thus, their choice is also not dependent on the cut-off of 10 in 200. In summary, the conclusions of the paper do not depend upon the specific cut-off of 10 in 200 trials.

      We have added the figure as a supplemental figure and have added the following text to the manuscript on pages 17 and 18.

      “Any CorEx gene is either a Core gene that was selected as a feature in at least 5% of 200 Boruta trials, or is highly co-expressed with a Core gene. Because the selection criterion of 5% is arbitrary, we also tested robustness with 8 other cut-offs ranging from 5-of-200 to 200-of-200 Boruta trials. The number of CorEx genes expectedly decreases with more stringent cut-offs. However, of the 187 GO terms that were enriched by 262 CorEx UP-genes using 10 of 200 Boruta trials as the selection criteria, 93 terms (49.7%) were enriched for each cut-off (Fig. S9), and 155 terms (82.9%) were enriched in at least 5 of the 8 cut-offs. Given that our subsequent analyses utilized the hierarchy of GO terms and identified 4 GO-categories enriched by UP-regulated genes, the conclusions would hold regardless of the specific cut-off.”

      (b) The approach of combining genes with clusters is arbitrary. Why not start with clusters and evaluate each cluster (using some gene set summary score) for their ability to discriminate? Ultimately, one needs additional information to disambiguate correlated genes (i.e. in a coexpression cluster) in terms of causality.

      Response. In general, the approach proposed by the reviewer is reasonable. However, we did consider that possibility and found that our approach was easier to implement. For example, if we clustered first, we would have the challenge of choosing the correct set of clusters. Also, the Boruta analysis would become very difficult while dealing with clusters (e.g., how to define false features?). We tested other methods of picking genes that were suggested by other reviewers such as generalized linear models. They turned out not to be as predictive of ecDNA status, as described later in the response. Finally, we performed many experiments to ensure the validity of the clustering. Specifically, we had the following text in the paper:

      “Notably, among the 354 clusters, only 2 clusters (with 14 total genes) did not contain any Core genes. As most genes do not have completely identical expression patterns, we would expect one gene to be consistently picked as a Boruta gene over another co-expressed gene. Consistent with this hypothesis, most (344/354) clusters contained only 1 or 2 Core genes (Fig. 1c). When selecting clusters that contained at least 1 Core and 1 co-expressed gene, 53 of 71 clusters contained 1 to 3 Core genes (Fig. S1b), confirming that a few genes per co-expressed cluster provide sufficient predictive value, but other co-expressed genes might still play an important functional role in maintaining ecDNA(+) status.”

      These experiments suggest that the genes found by extending the Core genes through clustering do not radically change the Core genes, but only enhance the set.

      (c) The cross-validation procedure is not clear at all. There is a mention of 80-20 split but exactly how/if the evaluation is done on the 20% is muddled. The way precision-recall procedure is also a bit convoluted - why not simply use the area under the PR curve?

      Response. We apologize if the method was unclear. We have rewritten the methods part to make things clearer. As a high level point, there are two places where we use the same 80-20 split, and that resulted in some confusion. We start by randomly picking 80% of the ecDNA(+) and 80% of ecDNA(-) samples to create an 80-20 split of all samples. This procedure is repeated to generate 200 80-20 split data-sets. These data-sets are hereafter called 200 training and test samples.

      In the first usage, we use only the ‘training’ part of the 200 samples. We apply Boruta to each training set, and this helps us select the Core genes, which are then expanded to form the CorEx set. At this point, the CorEx genes are frozen for analysis in the rest of the paper. One question that we subsequently answer is what is the predictive power of the CorEx genes in determining if the sample is ecDNA(+) or ecDNA(-)? We also compare the predictive performance of CorEx genes relative to (a) Core genes, (b) LFC genes, and (c) random genes. In the revised manuscript, we have added another list of 3,012 genes selected using a single gene generalized linear model (GLM) for feature prediction. To make these comparisons, we utilized the same 200 training and test data-sets as before. In each test, we trained a random forest classifier on the training set and predicted on the ‘test’ set, for each of the 5 gene lists. This provided a uniform and fair method for testing which of the 5 gene lists was the better predictor of ecDNA status.

      The precision recall values are plotted in Fig. 2b (also included below). We note that none of the gene lists was a great predictor of ecDNA status of a sample. However, the CorEx and Core genes were significantly more predictive than GLM, LFC, and random genes. The predictive power of GLM genes was very similar to LFC, and better than random.

      For each of these 200 tests, we obtained a separate area under the precision-recall curve number for each of the gene-sets. To address the reviewer’s comments regarding a single number, we reported the average of the AUPRC for each of the gene-sets in the revision. The mean AUPRC values were added to the manuscript and are described here as well:

      Core_408_genes: 0.495

      CorEx_643_genes: 0.48

      Random_643_genes: 0.36

      top_lfc_643_genes: 0.429

      GLM_R_3012_genes: 0.426
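For readers unfamiliar with the metric, the sketch below shows one common way to summarise a precision-recall curve as a single number (average precision) and then average it over repeated splits. The response does not say which estimator of the AUPRC was used, so this choice and the function names are assumptions.

```python
def average_precision(y_true, scores):
    """Average precision: mean of the precision at each true positive,
    with samples ranked by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_positive = sum(y_true)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank
    return total / n_positive

def mean_auprc(per_split_results):
    """Average the metric over (y_true, scores) pairs, one per test split."""
    values = [average_precision(y, s) for y, s in per_split_results]
    return sum(values) / len(values)
```

A classifier that ranks every positive sample above every negative one scores 1.0, while uninformative scores yield a value near the fraction of positive samples, which is why the random gene list still has a non-zero AUPRC.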

      We also changed Figure 2b to show box-plots showing distribution of recall values for specific precision windows instead of maximum recall. For ease of checking, the figure is reproduced below.

      Author response image 3.

      (d) The claim is that Boruta genes are different from differentially expressed genes but the differential expression seems to be estimated without regards to cancer type, which would certainly be highly biased and misleading. Why not do a simple regression of gene expression by ecDNA status, cancer type and select the genes that show significant coefficient for ecDNA status?

      Response. As requested by the reviewer, and in the more detailed questions below, we added an alternative model with a generalized linear model (GLM) analysis that controlled for tumor subtype. The method itself is described in the Methods section and pasted below. The GLM genes were tested along with the LFC, CorEx, Core genes as described in response to the previous question, and those results are now presented in Figure 2b and on pages 6 and 7 of the revised manuscript.

      “We tested each of 16,309 genes independently in a separate logistic regression model using the glm() function in the R stats package (v4.2.0), and retained genes that were significant (p-value < 0.01). Specifically, the model was defined as glm(y ~ g_j + t, data = M, family = binomial(link = 'logit')), where y is the response vector with y_i = 1 if sample i ∈ {1, ..., 870} is ecDNA(+) and y_i = 0 otherwise, g_j is the vector of expression values for gene j ∈ {1, ..., 16309} in samples i ∈ {1, ..., 870}, t is the covariate vector representing the tumor subtypes of samples i ∈ {1, ..., 870}, and M is the data matrix containing values of gene expression, tumor subtype, and ecDNA status for all samples. The equation for the binomial logistic regression described above is formulated as

      $$\log\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{k=1}^{2} \beta_k X_k$$

      where p is the probability that the dependent variable y is 1, X are the independent variables, and β are the coefficients of the model. In this case, k=1 represents independent variable gene j and k=2 represents the tumor subtype covariate t. Of the 16,309 genes tested independently, 3,012 genes were significant at p-value < 0.01.”

      (C) After identifying key features (which the authors inappropriately imply to be causal) they perform a series of enrichment/correlative analyses.

      Response. We have reviewed the document to ensure that we did not use the word ‘causal.’ If the reviewer can point to specific text, we are happy to change the phrasing.

      (a) It is known that ecDNA status associates with poor survival, as do cell cycle-related signals. Then the association between Boruta genes and those processes is entirely expected. Is it not? The same goes for downregulation of immune processes.

      Response. We agree with the reviewer that cell cycle-related signals and immune-related signals are associated with low survival, as is ecDNA. However, many cellular processes could be associated with low survival (including, for example, metabolic processes, protein and DNA biosynthesis, etc.). The unexpected part is that there appear to be only 4 major processes that are upregulated in ecDNA(+) cancers relative to ecDNA(-) cancers, and only one (immune response) that is downregulated.

      (b) The association with DSB specifically is interesting. Further analysis or discussion of why this should be would strengthen the work.

      Response. We thank the reviewer for their comment, and agree with their perspective. Note that we devoted a fair amount of text to analysis of DSB pathways. Specifically, we parsed the 4 main pathways in Figure 3b, and found our data to suggest that many genes in the classical nonhomologous end joining repair pathway are down-regulated in ecDNA(+) samples relative to ecDNA(-) samples. In contrast, Alternative end-joining and homology directed repair pathways are upregulated. This is a surprising result because c-NHEJ is considered to be an important mechanism of DSB repair. We have some lines in the discussion that address this:

      “The DNA damage genes are broadly up-regulated in ecDNA(+) samples, especially in double-strand break repair. Within this broad category of mechanisms, our analysis suggests that alternative DSB repair pathways such as Alt-EJ are preferred relative to classical NHEJ. This is consistent with previous observations of small microhomologies at breakpoint junctions, and has important implications in therapeutic selection that will need to be validated in future experimental studies. We note, however, the microhomology analyses typically study breakpoint junctions, and might ignore double-strand breaks in non-junctional sequences which could be observed, for example at replication-transcription junctions.”

      We note that additional experimental work to corroborate these findings is significant effort and will be part of ongoing research in our collaborators’ laboratories.

      (c) On page 15, second paragraph, when providing the up versus down CorEx genes, please also provide up versus down for non-CorEx genes as well to get a sense of magnitude.

      Response. We thank the reviewer for the comment. We note that Supplementary Table S15 has the complete contingency tables as well as the Fisher Exact Test statistic for all categories. For the specific categories mentioned in the paper, the contingency tables are reproduced below. As we are citing Table S15 (containing all numbers and the statistic p-value) in the main text, we thought it was better to leave the text as it was.

      Category: Inflammation (p-value: 0.005)

      CorEx: 18 (UP), 76 (DOWN)

      Non-CorEx: 325 (UP), 657 (DOWN)

      Category: Leukocyte migration and chemotaxis (p-value: 0.03)

      CorEx: 13 (UP), 49 (DOWN)

      Non-CorEx: 213 (UP), 410 (DOWN)

      Category: Lymphocyte activation (p-value: 0.0075)

      CorEx: 23 (UP), 75 (DOWN)

      Non-CorEx: 334 (UP), 560 (DOWN)

      Category: Cytokine production (p-value: 0.117)

      CorEx: 6 (UP), 28 (DOWN)

      Non-CorEx: 93 (UP), 208 (DOWN)
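The contingency tables above can be examined with a two-sided Fisher exact test. The sketch below is a plain implementation written for illustration; it is not the authors' code, it may differ from their software in details such as the two-sided convention, and no attempt is made here to reproduce the reported p-values exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(k):
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    observed = table_probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    probabilities = [table_probability(k) for k in range(low, high + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probabilities if p <= observed * (1 + 1e-9)))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Inflammation category: CorEx 18 UP / 76 DOWN, Non-CorEx 325 UP / 657 DOWN.
inflammation_p = fisher_exact_two_sided(18, 76, 325, 657)
inflammation_or = odds_ratio(18, 76, 325, 657)
```

An odds ratio below 1 for this table reflects that CorEx genes in the category are proportionally less often UP-regulated than the non-CorEx genes.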

      (d) The finding that Boruta genes are associated with high mutation burden is intriguing because in general mutation burden is associated with better survival and immunotherapy response. This counter-intuitive result should be scrutinized more to strengthen the work.

      Response. We agree with the reviewer that it is an intriguing observation. However, we are cautious in our interpretation. This is for the following reasons (all mentioned in the text):

      (1) The total mutation burden was significantly higher in ecDNA(+) samples relative to ecDNA(-) samples (Fig. 5a). However, when controlling for cancer type, only glioblastoma, low-grade gliomas, and uterine corpus endometrial carcinoma continued to show differential total mutational burden (Fig. S7b).

      (2) We tested whether specific genes were differentially mutated between the two classes (Fig. 5b). For deleterious/high-impact mutations, TP53 was the only gene mutated significantly more often in ecDNA(+) than in ecDNA(-) samples (OR 2.67, Bonferroni adjusted p-value 4.22e-07). BRAF mutations, however, were more common in ecDNA(-) samples and were significant at an adjusted p-value < 0.1 (OR 0.27).

      (3) In response to another reviewer’s comment, we also tested correlation with variant allele frequencies, and did not find any significant correlation except for TP53. We decided not to include that result in the paper.

      These tissue-specific cases might be confounding the main observation, but we have placed all of them together so that the reader can gain a better understanding. It is worth noting that the correlation between high TMB and immunotherapy response is also now controversial, and perhaps does not hold for all cancer types. See, for example, (https://www.annalsofoncology.org/article/S0923-7534(21)00123-X/fulltext), which suggests that this relationship does not hold for glioma, and that in glioma (which is ecDNA-enriched), higher TMB is associated with worse immunotherapy response. Our results are consistent with that finding. We have modified the discussion paragraph to better reflect this.

      “Mutation data alone does not provide as clear a picture of the genes involved in ecDNA maintenance. We did observe that the total mutation burden (TMB) was higher in ecDNA(+) samples. However, that relationship is much less clear after controlling for cancer type. High TMB has been positively correlated with sensitivity to immunotherapy (52), and better patient outcomes; however, the gene expression patterns suggest that immunomodulatory genes are downregulated in ecDNA(+) samples, and patients with ecDNA(+) tumors have worse outcomes (2). Notably, other results have suggested that the correlation between TMB and response to immunotherapy is not uniform, and it can vary across different tumor subtypes (53). Specifically, our data is consistent with previous results which showed that gliomas with high TMB have worse response to immunotherapy relative to gliomas with low TMB (53). In general, no collection of gene mutations was predictive of ecDNA status, although mutations in TP53 were more likely in ecDNA(+) samples, and perhaps are an important driver for ecDNA formation (5).”

      (e) On page 17 "12 of the 47 genes not specifically enriching any known GO biological Process" is confusing. How can individual gene enrich for a GO process?

      Response. We agree that the statement was incorrectly phrased. We have changed it to state that “Only 12 of the 47 genes were not included in the gene sets of any enriched GO term.”

      Reviewer #2 (Public Review):

      In their manuscript entitled "Transcriptional immune suppression and upregulation of double stranded DNA damage and repair repertoires in ecDNA-containing tumors" Lin et al. describe an important study on the transcriptional programs associated with the presence of extrachromosomal DNA in a cohort of 870 cancers of different origin. The authors find that compared to cancers lacking such amplifications, ecDNA+ cancers express higher levels of DNA damage repair-associated genes, but lower levels of immune-related gene programs.

      This work is very timely and its findings have the potential to be very impactful, as the transcriptional context differences between ecDNA+ and ecDNA- cancers are currently largely unknown. The observation that immune programs are downregulated in ecDNA+ cancers may initiate new preclinical and translational studies that impact the way ecDNA+ cancers are treated in the future. Thus, this study has important theoretical implications that have the potential to substantially advance our understanding of ecDNA+ cancers.

      Strengths

      The authors provide compelling evidence for their conclusions based on large patient datasets. The methods they used and analyses are rigorous.

      Weaknesses

      The biological interpretation of the data remains observational. The direct implication of these genes in ecDNA(+) tumors is not tested experimentally.

      Response. We agree with the reviewer that experimental tests would be ideal, but they face some challenges. The immune system genes cannot be tested in cell line models because they need a tumor microenvironment. Tests of DSB repair mechanisms and cell cycle control can be performed in cell lines, but not with the TCGA samples, which are not available. Some of our collaborators are actively working on these topics, but that extensive experimental work is beyond the scope of this paper.

      Reviewer #3 (Public Review):

      Summary:

      Using a combination of approaches, including automated feature selection and hierarchical clustering, the authors identified a set of genes persistently associated with extrachromosomal DNA (ecDNA) presence across cancer types. The authors further validated the gene set identified using gene ontology enrichment analysis and identified that upregulated genes in extrachromosomal DNA-containing tumors are enriched in biological processes like DNA damage and cell proliferation, whereas downregulated genes are enriched in immune response processes.

      Major comments:

      (1) The authors presented a solid comparative analysis of ecDNA-containing and ecDNA-free tumors. An established automated feature selection approach, Boruta, was used to select differentially expressed genes (DEGs) in ecDNA(+) and ecDNA(-) TCGA tumor samples, and the iterative selection process and two-tier multiple hypothesis testing ensured the selection of reliable DEGs. The authors showed that the DEGs selected using Boruta have stronger predictive power than genes with top log-fold changes.

      (2) The authors performed a thorough interpretation of the findings with GO enrichment analysis of biological processes enriched in the identified DEG set, and presented interesting findings, including the enrichment in DNA damage process among the genes upregulated in ecDNA(+) tumors.

      (3) Overall, the authors achieved their aims with solid data mining and analysis approaches applied to public tumor data sets.

      (4) While it may not be the scope of this study, it will be interesting to at least have some justification for choosing Boruta over other feature selection methods, such as Recursive Feature Elimination (RFE) and backward stepwise selection.

      Response. We agree with the reviewer that some other feature selection methods could work just as well, and note that Boruta is not our creation, but a published feature selection method (Kursa, Journal of Statistical Software, September 2010, Volume 36, Issue 11). We use Boruta to identify relevant genes, but the bulk of the paper aims to understand the biological processes driven by that gene selection. Even if we had chosen another method that performed slightly better, it likely would not change the main conclusions. However, to address the reviewer’s concern about over-reliance on one method, we added a different gene list created by a generalized linear model analysis, with the goal of checking whether the expression of a gene could predict the ecDNA status of the sample after controlling for tumor subtype. Thus, we tested 5 different gene lists in terms of their power in predicting ecDNA. While none of the lists is a great predictor of ecDNA status, the Core and CorEx gene lists are significantly better than the other lists. The Figure below replaces the previous Figure panels 2b and 2c.

      Author response image 4.

      (1) The authors showed that DESEQ-selected DEGs with top log-fold changes have less strong predictive power and speculated that this may be due to the fact that genes with top log-fold changes (LFC) are confined only to a small subset of samples. It will be interesting to select DEGs with top log-fold changes after first partitioning the tumor samples. For example, randomly partition the tumor samples, identify the DEGs with top LFC, combine the DEGs identified from each partition, then evaluate the predictive power of these DEGs against the Boruta-selected DEGs.

      Response. This is a great comment. We added a generalized linear model test for selecting genes whose expression is predictive of ecDNA status. The GLM list described above uses a standard methodology (Analysis of Variance) and controls for tumor type as a covariate. Its predictive performance is only slightly better than that of the Top-|LFC| genes, while improving over a random gene set.

      (2) While the authors showed that the presence of mutations was not able to classify ecDNA(+) and (-) tumor samples, it will be interesting to see if variant allele frequencies of the genes containing these mutations have predictive power.

      Response. This is a great suggestion. To address the reviewer’s question, we used allelic counts (REFs and ALTs) information from the MC3 variant callset, and calculated allele frequencies of all variants from samples where ecDNA status was available. Next, we conducted a Wilcoxon rank-sum test between VAFs of the ecDNA(+) group and VAFs of the ecDNA(-) group for every mutated gene. We found 1,073 genes with p<0.05, but among them, only TP53 passed the multiple testing correction (padj<0.05, Benjamini-Hochberg). As the results are identical to the tests based solely on presence of mutations, we decided not to include this data.
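      The two bookkeeping steps in this response, computing a variant allele frequency from REF/ALT counts and applying the Benjamini-Hochberg correction, can be sketched as follows. This is an illustration only: the function names and the p-values in the usage lines are hypothetical and are not taken from the MC3 callset, and the rank-sum test itself is omitted.

```python
def variant_allele_frequency(ref_count, alt_count):
    """VAF = ALT reads / (REF reads + ALT reads)."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this variant")
    return alt_count / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values: four are nominally below 0.05,
# but only the first remains below 0.05 after adjustment.
nominal = [1e-6, 0.04, 0.045, 0.049, 0.3, 0.9]
adjusted = benjamini_hochberg(nominal)
print(sum(p < 0.05 for p in nominal), sum(p < 0.05 for p in adjusted))
```

      The toy numbers mirror the pattern described above, where many genes pass a nominal threshold but very few survive multiple-testing correction.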

      Reviewer #1 (Recommendations For The Authors):

      (A) The presentation should be substantially streamlined.

      (B) Preferably use a more intuitive simpler ML approach with fewer parameters to make it more credible. Because there are relatively few samples across numerous cancer types with greater variability in representation, a simpler procedure with transparent controls will be more convincing.

      Response. We accept the reviewer’s criticism in that other statistical or ML approaches could potentially have been used, and that some are simpler. However, the test used here directly addresses the question: Find a collection of genes whose expression value is predictive of ecDNA status in the sample. Because the underlying method in the Boruta analysis uses random forests, it can test predictive power without relying on a linearity assumption implicit in other methods. In this revision, we also compare against a Generalized Linear Model (regression analysis) and show that it is less suited to the specific task above. We address the reviewer concerns about specific parameter choices by showing robustness to the specific parameter. All details are provided in the initial questions, and in the revised manuscript.

      (C) Avoid using any term implying causality unless you can bring in direct experimental evidence (e.g. mutagenesis experiment followed by ecDNA measurement). Some places you use the word 'maintain ecDNA' and other places 'ecDNA impact'. But these are all associations. How can you distinguish causal genes from downstream effects without additional data?

      Response. We note that the word causal does not appear anywhere in the manuscript, and was not intended. Additionally, we have revised the manuscript and are open to specific changes requested by the reviewer or the editors.

      (D) Along these lines, if Boruta genes are indeed causal, one would expect Boruta-Up genes to be amplified more than expected in the ecDNA+; converse for Boruta-down genes.

      Response. We did not understand the reviewer’s question. By “amplified,” if the reviewer means “amplification of transcript level,” then that is exactly what the Boruta analysis is showing. Specifically, for each gene, we have the ability to pick a transcript level cut-off ‘t’ so that samples in which the expression is higher than t are more likely to be ecDNA(+). However, we are not claiming that there is causality, just that the transcript level is (weakly) predictive of the ecDNA status of the sample.

      (E) A strawman control should be a simple regression-based gene identification that controls for ecDNA status and cancer type.

      Response. We agree that this was a very good suggestion. In the revision, we have applied a GLM, which controls for tumor type. Thus, we have 5 gene-lists (including the Core and CorEx genes). As described in the revised manuscript but also in response to the main comments above, none of the lists are a great predictor. However, the CorEx and Core genes are significantly better at predicting ecDNA status of a sample.

      Reviewer #2 (Recommendations For The Authors):

      Comments

      (1) The analysis hinges on a classification of tumors into ecDNA(+) and ecDNA(-) using AmpliconClassifier. It would be good to know how robust the outcomes are with respect to the performance of AmpliconClassifier - how many false positives and negatives will AmpliconClassifier generate on this dataset and how would this influence the CorEx genes?

      Response. This is a very reasonable request. AA (AmpliconArchitect) has been extensively tested on established cell lines for its ability to predict ecDNA status. This information is published in multiple venues, including Kim, Nature Genetics 2020, and shows a precision of 85% at a recall of 83%. For completeness, we have reproduced the relevant plot and text from that paper here, but are not including them in the manuscript.

      “To evaluate the accuracy of the AmpliconArchitect predictions, we analyzed whole-genome sequencing data from a panel of 44 cancer cell lines, and examined tumor cells in metaphase. We used 35 unique fluorescence in-situ hybridization (FISH) probes in combination with matched centromeric probes (81 distinct “cell-line, probe” combinations) to determine the intranuclear location of amplicons (Supplementary Table 2). Following automated analysis >1,600 images, we observed that 85% of amplicons characterized as ‘Circular’ by whole genome sequencing profile demonstrated an extrachromosomal fluorescent signal, representing the positive predictive value. Of the amplicons corresponding to extrachromosomally located FISH probes, 83% were classified as Circular, representing the sensitivity (Extended Data Fig. 1A).”

      Author response image 5.

      (2) It is unclear why genes are labeled Boruta genes when they are present in 10 out of 200 runs, this seems like an unexpectedly low number. How did the authors arrive at this number? Do the authors have any ground truth to estimate how well Boruta works in this setting and implementation?

      Response. This is a great question and was asked by another reviewer as well. Given the weakness of an individual gene as a classifier, its repeated selection in multiple Boruta trials is already a significant event. By requiring a gene to be picked in 5% of the trials (10/200), we were selecting a small, but more robust list of genes. However, to further explore the reviewer’s concerns, we also applied 8 other selection criteria ranging from 5 (of 200 Boruta trials) to 200 of 200 Boruta trials. See Figure below. As expected, the number of CorEx genes decreases with increasing stringency. However, of the 187 GO terms that were enriched by UP-genes, 93 terms (50%) were enriched regardless of the cut-off (see Figure below), and 153 terms (82%) were enriched in at least 5 of the 8 cut-offs. Given that the remaining analysis works on the hierarchy of GO terms and finds 4 GO-categories (Mitotic Cell Cycle, G1/S, G2/M; cell-division; DSB DNA Damage response; and the HOX Gene cluster) enriched by UP-regulated genes, those conclusions would hold regardless of the specific cut-off.

      Author response image 6.

      The number of GO terms that were enriched by DOWN-regulated genes is smaller, only 73, and falls rapidly for higher cut-offs, with 25 at a cut-off of 15. Therefore we see fewer terms enriched for more stringent cut-offs. However, they all support immune processes. These results do suggest that there are fewer genes that are consistently down-regulated in ecDNA(+) cancers, and expression change in a small number of genes may be sufficient to promote conditions for ecDNA.

      We have added the figure as a supplemental figure and have added the following text to the manuscript on pages 17 and 18.

      “Any CorEx gene is either a Core gene that was selected as a feature in at least 5% of 200 Boruta trials, or is highly co-expressed with a Core gene. Because the selection criterion of 5% is arbitrary, we also tested robustness with 8 other cut-offs ranging from 5-of-200 to 200-of-200 Boruta trials. As expected, the number of CorEx genes decreases with more stringent cut-offs.

      However, of the 187 GO terms that were enriched by 262 CorEx UP-genes using 10 of 200 Boruta trials as the selection criteria, 93 terms (49.7%) were enriched for each cut-off (Fig. S9), and 155 terms (82.9%) were enriched in at least 5 of the 8 cut-offs. Given that our subsequent analyses utilized the hierarchy of GO terms and identified 4 GO-categories enriched by UP-regulated genes, the conclusions would hold regardless of the specific cut-off.”
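      The selection-frequency criterion discussed in this response can be illustrated with a small sketch. It assumes only that each feature-selection trial returns a set of selected gene names; the function name, the gene names, and the toy trial data are hypothetical, and the code does not run Boruta itself.

```python
from collections import Counter


def frequently_selected(trial_selections, min_count):
    """Return the genes selected in at least `min_count` trials.

    `trial_selections` holds one collection of selected gene names
    per feature-selection trial.
    """
    counts = Counter(gene
                     for selected in trial_selections
                     for gene in set(selected))
    return {gene for gene, k in counts.items() if k >= min_count}


# Hypothetical toy data: 200 trials; GENE_C is always selected,
# GENE_A in 10 trials, and GENE_B in 9 trials.
trials = [{"GENE_C"} for _ in range(200)]
for t in trials[:10]:
    t.add("GENE_A")
for t in trials[:9]:
    t.add("GENE_B")

print(sorted(frequently_selected(trials, min_count=10)))   # 10-of-200 cut-off
print(sorted(frequently_selected(trials, min_count=200)))  # 200-of-200 cut-off
```

      Raising `min_count` can only shrink the returned set, which is why the gene list gets smaller as the cut-off becomes more stringent.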

      (3) Authors extend the core gene set with co-expressed genes, arguing that "gene C" would not add predictive power in addition to "gene B" and is therefore not identified as a Boruta gene. However, from its description in the manuscript (summarized: "Boruta [...] selects the highest feature importance score, s, of shadow features as a cut off, and returns features with a higher score than s."), it isn't immediately obvious to me why Boruta would not return both genes B and C. Maybe the authors could explain this better.

      Response. We consider the following.

      (1) Consider 100 ecDNA(+) and 100 ecDNA(-) samples. Let the expression levels of genes B and C in the data-sets be as described in the figure below; y-axis is the gene expression, and x-axis is just a listing of all samples, with green color denoting ecDNA(+) samples and orange color denoting ecDNA(-) samples.

      Author response image 7.

      (2) Then, if we choose gene B and a transcript level of 1.25, we have a perfect prediction of ecDNA status because all samples where gene B has a transcript level higher than 1.25 are ecDNA(+) and otherwise they are ecDNA(-). Similarly, using Gene C, we can get perfect predictions. Thus, when Boruta has to select a gene, it will pick either Gene B or Gene C, because picking both will not improve prediction. We can therefore use Boruta to pick one gene, and then co-expression clustering to pick the other gene.
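      The argument in (1) and (2) can be reproduced with a toy calculation. The sketch below is not Boruta; it only applies the threshold rule described above to hypothetical expression values (all numbers and names are illustrative) and shows that a second, co-expressed gene adds nothing once the first gene already separates the classes.

```python
def threshold_accuracy(values, labels, cutoff):
    """Accuracy of the rule 'predict ecDNA(+) when expression > cutoff'."""
    predictions = [v > cutoff for v in values]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: 100 ecDNA(+) samples followed by 100 ecDNA(-) samples.
labels = [True] * 100 + [False] * 100
gene_b = [2.0] * 100 + [0.5] * 100  # illustrative expression values
gene_c = [1.8] * 100 + [0.7] * 100  # co-expressed with gene B

acc_b = threshold_accuracy(gene_b, labels, cutoff=1.25)
acc_c = threshold_accuracy(gene_c, labels, cutoff=1.25)

# Rule that requires BOTH genes to exceed the cut-off.
both = [(b > 1.25) and (c > 1.25) for b, c in zip(gene_b, gene_c)]
acc_both = sum(p == y for p, y in zip(both, labels)) / len(labels)

print(acc_b, acc_c, acc_both)
```

      Because either gene alone already classifies every toy sample correctly, the combined rule cannot improve on it, which is the intuition for recovering gene C through co-expression clustering rather than through feature selection.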

      As an example, cluster #3 consists of 21 genes that were up-regulated in ecDNA(+) samples and enriched in cell-cycle related biological processes (Table S3). While these genes were expressed similarly in ecDNA(+) samples, and separately, in ecDNA(-) samples, out of the 21 genes, only 9 genes were selected in at least 10 out of 200 Boruta trials (i.e., Core genes). Of the 12 remaining genes (i.e., CorEx genes), 8 genes were not selected by the Boruta method at all, 3 genes were selected in less than 5 out of 200 Boruta trials, and 1 gene was selected in 9 out of 200 Boruta trials.

      Author response image 8.

      (4) In Fig 2a, I would like to see the variability of the precision and recall in the main text, not only the maximum values. Authors could plot mean + standard deviation for precision and recall separately, or use S2a/b.

      Response. We have replaced Figures 2b and 2c with a combined figure (Fig. 2b) that gives a box-plot describing the distribution of recall values for 5 gene lists: four from the original manuscript, and another gene list created using a Generalized Linear Model (GLM).

      Author response image 9.

      (5) Since the authors analyze bulk RNA, the gene expression signatures they notice could, in principle, originate from non-tumor cells as well. I do not believe this is the case; however, the paper would be strengthened by an analysis that shows that the difference in expression patterns of the CorEx genes between ecDNA(+) and ecDNA(-) samples does come from tumor cells. One way of showing this would be by using single-cell mRNA-sequencing data, and another way would be to show that CorEx gene expression correlates with tumor purity in bulk samples.

      Response. The reviewer is correct. Unfortunately, our analysis requires data with whole-genome sequencing (WGS) for ecDNA prediction, as well as RNA-seq for transcriptome profiling. The TCGA data-set is the only available data-set with a significant number of samples that includes both WGS and RNA-seq. They have not made tissue samples available for scRNA analysis, to our knowledge. The reviewer raises an important question regarding purity, but testing if CorEx gene expression correlates with tumor purity would require a large range of purity values, something that scientists would avoid when collecting samples.

      However, the presence of non-cancer tissue (impurity) could reduce sensitivity of ecDNA detection, and therefore, change the results. To better investigate this, we started with a publication that investigated multiple tumor purity metrics and devised a composite score (CPE; Aran et al., 2015). Using their composite tumor purity, we find that ecDNA(-) samples have slightly lower purity than ecDNA(+) samples (p-value 0.0036; Fig. S2a).

      This result is not surprising because one would expect lower detection of ecDNA in less pure samples. The presence of undetected ecDNA in ecDNA(-) samples would confound the results by reducing the discriminating power of genes, but would not give false results. To test this, we measured the expression directionality in CorEx genes in all samples versus samples which had a high tumor purity (CPE 0.8). The results suggest that the p-values of directionality in the pure samples were highly correlated with the expression data from all samples (Fig. S2b).

      Author response image 10.

      (6) The biological interpretation of the data remains a bit too observational. Can the authors offer an interpretation of the enriched GO terms? And are any of these genes already implicated in ecDNA(+) tumors?

      Response. To answer the second question first, prior to our study, the focus was on genes that were amplified on ecDNA. Indeed, many oncogenes known to be amplified in cancer are in fact amplified on ecDNA (Turner, Nature 2017; Kim, Nature Genetics 2020). This study is unique in that it identifies genes whose expression values are predictive of ecDNA(+) status. The Figure below lists the 24 genes most frequently amplified on ecDNA from Kim, Nature Genetics 2020. With the exception of EGFR and CDK4, none of these 24 genes was included in the list of the 65 genes reported by us as the most frequently selected genes in the Boruta trials (lowest harmonic rank). Thus, most persistent CorEx genes do not lie on ecDNA. However, they all play important roles in biological processes relevant to cancer pathology, including immune response, mitotic cell cycle, cell division, and DSB repair. We agree with the reviewer that the results are observational (although statistically significant in populations), and some of our collaborators are actively working to experimentally validate some of these genes. The experimental work, however, is beyond the scope of this paper.

      We have added the following statement to the manuscript. “Notably, of the 24 genes most frequently expressed on ecDNA (2), only EGFR and CDK4 were included in the list of 65 genes, suggesting that the most persistent CorEx genes do not themselves appear frequently on ecDNA.”

      Author response image 11.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      (1) The authors performed a gene ontology enrichment test but referred to it as gene set enrichment analysis. Usually gene set enrichment analysis does not refer to Fisher's exact test-based analysis but rather the one described in Subramanian et al 2005. The term should be corrected to avoid confusion.

      Response. We have rephrased text in the manuscript to prevent confusion between enrichment analysis on gene sets using a one-sided Fisher’s exact test and the Gene Set Enrichment Analysis (GSEA) method that exists as a software. We have also revised the header in the methods section from “Gene set enrichment analysis” to “Gene Ontology (GO) enrichment analysis”.

      (2) A couple of figures could use more detailed labels and captions. In Figure 2c, it is unclear what the numbers 100 and 54 right next to the Cliff's Delta heatmap indicate. In Figures 3a and 4a, it is not immediately clear what the barplot on top of the heatmap indicates and there is no label for the y-axis.

      Response. These are good suggestions, and we have added descriptions to the figure captions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is very well written, the data are clearly presented and the methodology is robust. I only have suggestions to improve the manuscript, to make the study more appealing or to discuss in more detail some questions raised by the work.

      1. In the study as it stands, PFG seems to come out of the blue. The authors apparently selected this protein based on sequence conservation between species but this is unlikely to be sufficient to identify novel TFs. Explaining in more detail the reasoning that led to PFG would make the story more appealing. Perhaps PFG was identified through a large reverse genetics screening?

      Response: Thank you for your suggestion. We identified this gene solely by the strategy described in the manuscript. We decided on this strategy based on the findings of our previous study on AP2-family TFs, whose DNA-binding domains are highly conserved among Plasmodium orthologues. Using this screening strategy, we identified a novel AP2-family TF, AP2-Z. The results of the present study demonstrate that this strategy is applicable to TFs other than those belonging to the AP2 family. We are aware that this strategy is not all-encompassing. In fact, we failed to identify HDP1 as a candidate TF even though it was also in the target list of AP2-G. However, at present, this is our primary strategy for identifying novel TFs in the targetome.

      2. The authors propose that PFG and AP2-FG form a complex, but this is actually not shown. Did they try to document a physical interaction between the two proteins, for example using co-IP?

      Response: Even when two molecules are found at the same position by ChIP-seq, it cannot be concluded that they form a physical complex, because they may competitively occupy the region. However, in this study, we performed ChIP-seq in the absence of PFG and demonstrated that the cAP2-FG peaks disappeared while those of sAP2-FG remained. This result can only be explained by the two proteins forming a complex at this region, which excludes the possibility that AP2-FG binds the region independently.

      3. It is unclear how PFG can bind to DNA in the absence of a DNA-binding domain. Did the authors search for unconventional domains in the protein? This should be at least discussed in the manuscript.

      Response: We speculate that the two highly conserved regions, region 1 and region 2, function as DNA-binding domains in PFG. However, this domain is not similar to any DNA binding domains reported thus far. A straightforward way to demonstrate this would be to perform in vitro binding assays using a recombinant protein. However, thus far, we have not succeeded in obtaining soluble recombinant proteins for these regions. We have added the following sentences to the results section.

      “At present, we speculate that PFG directly interacts with genomic DNA through two highly conserved regions; region 1 and region 2. However, these regions are not similar to any DNA binding domains reported thus far. In other apicomplexan orthologues, these two domains are located adjacent to one another in the protein (Fig. 1A). Therefore, these two regions may be separated by a long interval region but constitute a DNA binding domain of PFG as a result of protein folding.”

      4. How do the authors explain that PFG is still expressed in the absence of AP2-FG? Is AP2-G alone sufficient to express adequate levels of the protein? Is PFG down-regulated in the absence of AP2-FG?

      Response: Our previous ChIP-seq data indicate that PFG is a target of AP2-G. According to the study by Kent et al. (2018), this gene is up-regulated in the early period following conditional AP2-G induction. The results of the present study showed that PFG is capable of autoactivation through a transcriptional positive feedback loop. These results suggest that PFG can maintain its expression at a certain level once activated by AP2-G, even in the absence of AP2-FG. In our previous microarray analysis, significant decreases in PFG expression were not observed in AP2-FG-disrupted parasites.

      5. How do AP2-FG-regulated genes (based on RNA-seq) compare with the predicted cAP2-FG/sAP2-FG target genes (based on ChIP-seq)? Are the two subsets included in the genes that are actually down-regulated in AP2-FG(-)?

      Response: Disruption of the AP2-FG gene impairs gametocyte development. We considered that the direct effect of this disruption would be difficult to analyze in gametocyte-enriched blood, in which gametocytes are pooled during sulfadiazine treatment to deplete asexual stages. Therefore, in our previous paper, we performed microarray analysis between WT and KO parasites to detect the direct effect of AP2-FG disruption on target gene expression, using mice which were synchronously infected with parasites. According to our results, 206 genes were down-regulated in AP2-FG-disrupted parasites. Of these genes, 40 and 117 were targets of sAP2-FG and cAP2-FG, respectively. However, it is still possible that a significant proportion of genes were indirectly down-regulated by AP2-FG disruption, which may impair gametocyte development. Moreover, based on the results of the present study, expression of a significant proportion of AP2-FG target genes could be complemented by PFG transcription. We believe that it would be difficult to compare the direct effects of these TFs on gene expression via transcriptome analysis (therefore, targetome analysis is important). In this study, we compared the expression of target genes of sAP2-FG and cAP2-FG between PFG(-) and WT parasites. We expected that down-regulation of PFG (cAP2-FG) targets would be complemented with transcription by sAP2-FG.

      6. Minor points

      -Page 5 Line 10, remove "as"

      Response: We have corrected this.

      -Page 7 Lines 4-13: is it possible to perform the assay in PFG(-) parasites?

      Response: Thank you for your question. Even if marker gene expression were decreased in PFG(-) parasites, we could not conclude that the decrease was a direct effect of the mutation. To determine the function of the motif, it is necessary to perform the assay using wild-type parasites.

      -Page 7 Line 45: Fig6C instead of 5C

      Response: Thank you for pointing this out. We have corrected this.

      -Page 8 Line 27: "decreases"

      Response: Thank you for pointing this out. We have corrected this.

      -Page 8 Line 36: PFG instead of PGP

      Response: We have corrected this.

      -Page 8 Line 39: remove "the fact"

      Response: We have removed this word.

      -Page 8 Line 42: Fig6G instead of 5G

      Response: We have corrected this.

      -Page 8 Line 43: PFG instead of PGP

      Response: We have corrected this.

      -Page 9 Line 23: "electroporation"

      Response: We have corrected this.

      -Page 9 Line 32: "BamHI"

      Response: We have corrected this.

      -Fig 2E: in the crosses did the authors check oocyst formation in the mosquito?

      Response: We did not check oocyst formation because abnormalities in males may not affect oocyst formation.

      -Page 17, legend Fig3, Line 14, there is probably an inversion between left and right for PFG versus AP2-FG (either in the legend or in the figure)

      Response: Thank you for pointing this out. PFG peaks are located in the center in both heat maps. The description “AP2-FG peaks” over the arrowhead in the left map was incorrect. We have corrected this to “PFG peaks”. The peaks in the left heat map must be located in the center; thus, this figure might be redundant.

      Reviewer #2 (Recommendations for the Authors):

      • Could the authors please state in the results section that PFG stands for partner of AP2-FG.

      Response: Thank you for the comment. We have added the following to the results section:

      “Through this screening, a gene encoding a 2709 amino acid protein with two regions highly conserved among Plasmodium was identified (PBANKA0902300, designated as a partner of AP2-FG (PFG; Fig. 1A).”

      • Given that the transcriptional program is so dynamic, the timing of the ChIP-seq experiments is crucial. Could the authors clarify the timings of the different ChIP-seq experiments (AP2-FG, PFG, PFG in AP2-FG-, AP2-FG in PFG-, ...)

      Response: Thank you for the comment. To deplete any parasites in the asexual stages, all ChIP-seq experiments in this study were performed using blood from mice treated with sulfadiazine, namely, gametocyte-enriched blood. As the reviewer points out, timing is important, and samples from the period when TFs are maximally expressed are optimal for ChIP-seq. However, when parasites in the asexual stages are present, the background becomes higher. Thus, we usually use gametocyte-enriched blood for ChIP-seq when expression of the TF is observed in mature gametocytes. The exception was our ChIP-seq analysis of AP2-G, because it is not present in mature gametocytes.

      • Fig 4c is an example of great overlap of peaks, but it would be helpful if the authors could quantify the overlaps between experiments (and describe the overlap parameters used).

      Response: In response to the comment, we have created a Venn diagram of overlapping peaks (attached below). However, the peaks used for this Venn diagram were selected after peak calling via fold-enrichment values. Thus, even if the counterpart of a peak is absent in these selected peaks (non-overlapping peaks in the Venn diagram), it does not indicate that it is absent in the original read map. We believe the overlap of peaks would be estimated more correctly in the heat maps.

      Author response image 1.

      Legend: The Venn diagram shows the number of common peaks between these ChIP-seq experiments (distance of peak summits < 150
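The summit-distance criterion in the legend above can be illustrated with a minimal sketch. It assumes that each peak is represented by a (chromosome, summit position) pair and that the cutoff of 150 is measured in base pairs; the unit, the function name `count_common_peaks` and the input layout are assumptions made for illustration and are not taken from the authors' pipeline.

```python
from bisect import bisect_left
from collections import defaultdict


def count_common_peaks(summits_a, summits_b, max_dist=150):
    """Count summits in A that have a summit in B closer than max_dist.

    summits_a, summits_b: iterables of (chromosome, position) tuples.
    max_dist: cutoff on summit distance (assumed here to be in bp).
    """
    # Index experiment B by chromosome and sort positions for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in summits_b:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    common = 0
    for chrom, pos in summits_a:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, pos)
        # The nearest B summit is either just left or just right of pos.
        neighbours = positions[max(i - 1, 0):i + 1]
        if any(abs(pos - q) < max_dist for q in neighbours):
            common += 1
    return common
```

Because the count depends on which peaks survive the fold-enrichment filter in each experiment, a peak reported as non-overlapping by such a comparison may still have a weaker counterpart in the other read map, which is the caveat raised in the response.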

      • Additionally, how were the promoter coordinates used for each gene when they associate ChIP peaks to a gene target. Did the authors choose 1-2kb? Or use a TSS/5utr dataset such as Adjalley 2016 or Chappell 2020?

      Response: We selected a 1.2-kbp region for target prediction based on our previous studies. As the reviewer pointed out, target prediction using TSS information may be more accurate. However, reliable TSS information is not available for P. berghei to the best of our knowledge.

      The two papers are studies on P. falciparum.
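As an illustration of target prediction with a fixed upstream window, the following sketch marks a gene as a target when a peak summit lies within 1,200 bp upstream of its start codon. The gene tuple layout, the strand handling and the function name `predict_targets` are hypothetical and chosen only for this example; the authors' actual procedure may differ in its details.

```python
def predict_targets(summits, genes, upstream=1200):
    """Return the IDs of genes with a peak summit in their upstream window.

    summits: iterable of (chromosome, position) tuples.
    genes: iterable of (gene_id, chromosome, strand, start_codon_position).
    upstream: window length in bp measured from the start codon.
    """
    summits = list(summits)
    targets = set()
    for gene_id, chrom, strand, atg in genes:
        for s_chrom, s_pos in summits:
            if s_chrom != chrom:
                continue
            # Upstream means lower coordinates on "+" and higher on "-".
            offset = atg - s_pos if strand == "+" else s_pos - atg
            if 0 < offset <= upstream:
                targets.add(gene_id)
                break
    return targets
```

With a reliable TSS annotation, the start codon coordinate in this sketch could be replaced by the TSS, which is the refinement the reviewer suggested.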

      • In the absence of evidence of physical interaction, it remains unclear if AP2-FG and PFG actually interact directly or as part of the same complex. A more detailed characterisation with IPs/co-IPs followed by mass spectrometry of the GFP-tagged version of PFG in the presence and absence of AP2-FG would be highly informative.

      Response: Thank you for the comment. Even when these two TFs occupy the same genomic region, it cannot be conclusively said that they exist at the same time in the region: they might competitively occupy the region. However, we showed that the cAP2-FG peaks disappear from the region when PFG was disrupted, while sAP2-FG peaks remain. We believe that this is evidence that the two TFs physically interact with each other.

      • It was not clear if the assessment of motif binding using cytometry was performed using all the required controls and compensation. This section should be clarified.

      Response: Thank you for the comment. Compensation was performed using parasites expressing a single fluorescent protein. The results are attached below. The histogram of mCherry using control parasites expressing GFP under the control of the HSP70 promoter is also attached.

      Author response image 2.

      However, we found that the descriptions of the filters for detecting red signals were not correct. This assay was performed using parasites which expressed GFP constitutively and mCherry under the control of the p28 promoter. These two fluorescent proteins were excited by independent lasers (488 and 561 nm, respectively), and the emission spectra were detected using independent detectors (through 530/30 and 610/20 filters, respectively). We have revised the description regarding our FACS protocols as follows:

      “Flow cytometric analysis was performed using an LSR-II flow cytometer (BD Biosciences). In experiments using 820 parasites, the tail blood from infected mice was selected via gating with forward scatter and staining with Hoechst 33342 (excitation = 355 nm, emission = 450/50). The gated population was then analyzed for GFP fluorescence (excitation = 488 nm, emission = 530/30) and RFP fluorescence (excitation = 561 nm, emission = 610/20). In the promoter assay (using parasites transfected with a centromere plasmid), the tail blood from infected mice was selected via gating with forward scatter and staining with Hoechst 33342 (excitation = 355 nm, emission = 450/50), followed by GFP fluorescence (excitation = 488 nm, emission = 530/30). The gated population was analyzed for mCherry fluorescence (excitation = 561 nm, emission = 610/20). Analysis was performed using the DIVER program (BD Biosciences).”

      Minor points:

      • Page 4, line 37: The authors should specify the timing of expression of AP2-FG on the text.

      Response: We have added the following description to the text.

      “The timing of the expression was approximately four hours later than that of AP2-FG, which started at 16 hpi (9).”

      • Ref 9 and 17 are repeated

      Response: Thank you for pointing this out. We have corrected this.

      • Fig 1D and 1F do not have scale bars

      Response: We have added scale bars to Fig. 1D.

      We have not changed Fig. 1F, because we believe that the scales can be estimated from the size of the erythrocyte.

      • Page 5, line 29-30. Could the authors specify how many and which of the de-regulated genes have a PFG in their promoter.

      Response: Thank you for the comment. As described in a later section (page 7; Impact of PFG disruption on the expression of AP2-FG target genes), among the 279 genes significantly downregulated in PFG(-) parasites, 165 genes were targets for PFG (unique for PFG or common for sAP2-FG and PFG). In contrast, only four genes were targets unique to sAP2-FG. Therefore, 165 genes harbor the upstream peaks of PFG. These genes are shown in Table S1.

      • Fig 5F. in the methods associated with this figure there seems to be a mixup with the description of the lasers. In addition, given the spillover of the red and green signal between detectors this experiment needs compensation parameters. The authors should provide the gating strategy before and after compensation as this is critical for the correct calculation of the number of red parasites. Indeed, the lowest red cloud on the gate shown could be green signal spill over.

      Response: Thank you for the comment. As described above, there were some incorrect descriptions about the conditions of our FACS protocols in the methods section. We have revised them.

      -Page 7, line 19. Could the authors explicitly say in the text that the 810 genes are those with 1 (or more?) PFG peaks in their promoter (out of a total of 1029) to best guide the reader. Additionally, it is important to define the maximum distance allowed between a peak and CDS for it to be associated with said CDS.

      Response: We have revised Table S2 by adding the nearest genes. The revised table shows the relationship between a PFG peak and its nearest genes, together with their distances.

      • Page 7, line 45: fig 6c, not 5c

      Response: Thank you for the comment. We have corrected this.

      • Page 7 last paragraph: This section is very hard to follow. For instance, on line 50 do the authors mean that the sAP2-FG unique targets are LESS de-regulated? On line 51: do the authors mean unique targets of cAP2-FG or unique targets of PFG? Line 53: do the authors mean that genes expressed in the "common" category are LESS de-regulated than the PFG unique targets?

      Response: We are sorry for the lack of clarity; after reviewing the manuscript, it appears to be unclear what the fold change means in this section. Here, fold change means the ratio of PFG(-)/wild type. Thus “High log2(fold change) value” means that the genes were less downregulated. We have revised the description as follows:

      “The log2 distribution (fold change = PFG(-)/wild type) in the three groups of target genes showed that the average value was significantly higher (i.e., less down-regulated) in targets unique to sAP2-FG than in the other two groups (targets unique to cAP2-FG or common targets for both), with p-values of 1.3 × 10⁻¹⁰ and 1.4 × 10⁻⁵, respectively, by two-tailed Student’s t-test (Fig. 6F). In addition, the average log2 (fold change) value of the common target genes was relatively higher (i.e., less down-regulated) than that of targets unique to PFG, suggesting that transcriptional activation by sAP2-FG partly complements the impact of PFG disruption on these common targets.”
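The sign convention used in this revised passage can be made concrete with a short sketch. It assumes expression values are positive numbers and uses the definition fold change = PFG(-)/wild type stated above; the function names and the example values are hypothetical and are not data from the study.

```python
import math
from statistics import mean


def log2_fold_change(ko_value, wt_value):
    """log2 of the PFG(-)/wild-type expression ratio for one gene."""
    return math.log2(ko_value / wt_value)


def mean_log2_fold_change(pairs):
    """Average log2 fold change over (ko_value, wt_value) pairs."""
    return mean(log2_fold_change(ko, wt) for ko, wt in pairs)


# Hypothetical example: a gene whose expression is halved gives -1,
# and a gene whose expression drops to a quarter gives -2.
halved = log2_fold_change(50, 100)
quartered = log2_fold_change(25, 100)
```

Since -1 is higher than -2, the halved gene has the higher log2 (fold change) value and is the less down-regulated of the two, which is the sense in which "higher" is used in the revised text.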

      • Page 8, line 42: Fig 6G, not 5G

      Response: Thank you for pointing this out. We have corrected this.

      Reviewer #3 (Recommendations For The Authors):

      1. The gene at the center of this study (PBANKA_0902300) was identified in an earlier genetic screen by Russell et al. as being a female specific gene with essential role in transmission and named Fd2 (for female-defective 2). Since this name entered the literature first and is equally descriptive, the Fd2 name should be used instead of PFG to maintain clarity and avoid unnecessary confusion. Surprisingly, this study is neither cited nor acknowledged despite a preprint having been available since August of 2021. This should be remedied.

      Response: Thank you for the comment. We have added the paper by Russell et al. accordingly and mentioned the name FD2 in the revised manuscript. However, we have retained the use of PFG throughout the paper. We believe that this usage of PFG shouldn’t be confusing, as FD2 has only been used in one previous paper. We have added the following:

      “Through this screening, a gene encoding a 2709 amino acid protein with two regions highly conserved among Plasmodium was identified (PBANKA0902300, designated as a partner of AP2-FG (PFG; Fig. 1A). This gene is one of the P. berghei genes that were previously identified as genes involved in female gametocyte development (named FD2), based on mass screening combined with single cell RNA-seq (ref).”

      2. While it isn't really important how the authors came to arrive at studying the function of Fd2, the rationale/approach given in the first paragraph of the result section seems far too broad to lead to Fd2, given that it lacks identifiable domains and many other ortholog sets exist across these species.

      Response: We selected this gene from the list of AP2-G targets as a candidate for a sequence-specific TF based on the hypothesis that the amino acid sequences of DNA-binding domains are highly conserved. We successfully identified two TFs (including PFG) using this method. However, there may be TFs that do not fit this hypothesis which are also targets of AP2-G. In fact, we were unable to identify HDP1 as a TF candidate, despite its being an AP2-G target.

      3. Fig. 1A-C: Gene IDs for the orthologs should be provided, as well as the methodology for generating the alignments.

      Response: We have added the gene IDs and method for alignment in the legend as follows:

      (A) Schematic diagram of PFG from P. berghei and its homologs in apicomplexan parasites. Regions homologous to Regions 1 and 2, which are highly conserved among Plasmodium species, are shown as yellow and blue rectangles, respectively. Nuclear localization signals were predicted using the cNLS mapper (http://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi). The gene IDs of P. berghei PFG, P. falciparum PFG, and their homologs in Toxoplasma gondii, Eimeria tenella and Vitrella brassicaformis are PBANKA_0902300, PF3D7_1146800, TGGT1_239670, ETH2_1252400, and Vbra_10234, respectively.

      (C) The amino acid sequences of Regions 1 and 2 from P. berghei PFG and its homologs from other apicomplexan parasites in (A) were aligned using the ClustalW program in MEGA X. The positions at which all these sequences have identical amino acids are indicated by two asterisks, and positions with amino acid residues possessing the same properties are indicated by one asterisk.

      4. Figure 2: The Phenotype of Fd2 knockout should be characterized more comprehensively.

      It remains unclear whether ∆Fd2 parasites generate the same number of females but these are defective upon fertilization or whether there is also a decrease in the number of female gametocytes. Is the defect just post-fertilization and zygotes lyse or are there fewer fertilization events? If so, is activation of female GCs affected?

      The number of male and female gametocytes should be quantified using sex-specific markers not affected by Fd2 knockout rather than providing a single image of each. The ability of ∆Fd2 GCs should also be evaluated.

      This is also important for the interpretation of Fig 2G. Is the down-regulation of the genes due to fewer female GCs or are the down-regulated genes only a subset of female-specific genes?

      Response: In PFG(-) parasites, the rate at which female gametocytes converted into zygotes decreased, and the zygotes lost the capacity to develop into ookinetes. This indicates that gametocyte development (i.e., the ability to egress the erythrocyte and to fertilize) and zygote development were both impaired. This phenotype is consistent with the observation that genes expressed in female gametocytes are broadly downregulated. PFG is a TF, and its disruption led to decreased expression of hundreds of female genes. Thus, the observed phenotype may be derived from the combined decrease in expression of these genes. We believe further detailed phenotypic analyses will not generate much novel information on this TF. Instead, RNA-seq data in PFG(-) parasites and the targetome have promise in helping to characterize the functions of this TF.

      5. Figure 3: what fraction of down-regulated genes have the Fd2 10mer motif?

      Response: Thank you for the question. We investigated the upstream binding motifs of these genes. Of the 279 significantly down-regulated genes (containing 165 targets), 161 genes harbor the motif (including nine-base motifs that lack one lateral base which is likely not essential for binding) in their upstream regions (within 1,200 bp from the first methionine codon). However, this result has not been described in the revised manuscript because it is more important whether these regions harbor PFG peaks (upstream motifs can exist without being involved in the binding of PFG).

      6. sAP2-FG (single) vs cAP2-FG (complex) nomenclature is confusing and possibly misleading since few TFs function in isolation and sAP2-FG likely functions in a complex that doesn't contain Fd2, possibly with another DNA binding protein that binds the TGCACA hexamer. The name for the distinct peaks should refer to the presence or absence of Fd2 in the complex, or maybe simply refer to them as complex A & B.

      Response: As shown in the DIP-seq analysis results, AP2-FG can bind the motif by itself. In contrast, AP2-FG must form a complex with PFG to bind to the ten-base motif. The complex and single forms are named according to this difference (the presence or absence of PFG) and used solely in its relation with PFG. We wrote “In the following, we refer to the form with PFG as cAP2-FG or the complex form, and the form without PFG as sAP2-FG or the single form.” We believe that the nomenclature has sufficient clarity. However, we have partially (underlined) revised certain sentences in the discussion section as follows.

      “As the expression of PFG increases via this mechanism, AP2-FG recruited by PFG (cAP2-FG) increases and eventually becomes predominant in the transcriptional regulation of female gametocytes.”

      “This suggests that the promoter of the CCP2 gene, which is a target of PFG only, is still active in AP2-FG(-)820 parasites.”

      We recently reported that the TGCACA motif is a cis-activation motif in early gametocytes and important for both male and female gametocyte development. Thus, we speculate that sAP2-FG is not involved in cis-activation by the TGCACA motif. The p-value of the six-base motif is indeed comparable to that of the five-base motif. However, the p-values (calculated by Fisher’s exact test) for six-base motifs tend to be lower than those calculated for five-base motifs, because the population is much larger. We speculate that there is a sequence-specific TF that may be expressed in early gametocytes and bind this motif, independently of AP2-FG.
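For readers unfamiliar with how an enrichment p-value of this kind is obtained, the following is a minimal sketch of a one-sided Fisher's exact test on a 2×2 table (motif present or absent, in peak regions or background regions). The function name, the argument layout and the choice of a one-sided test are assumptions made for illustration; the exact settings used in the study are not stated here.

```python
from math import comb


def fisher_enrichment_p(hits_fg, n_fg, hits_bg, n_bg):
    """One-sided Fisher's exact test for motif enrichment.

    hits_fg / n_fg: motif-containing and total foreground (peak) regions.
    hits_bg / n_bg: motif-containing and total background regions.
    Returns the probability of observing at least hits_fg motif-containing
    foreground regions under the hypergeometric null distribution.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denom = comb(total, n_fg)
    upper = min(n_fg, total_hits)
    # Sum the hypergeometric tail from the observed count upwards.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return tail / denom
```

Because the p-value depends on the table totals as well as on the proportions, the same degree of enrichment yields a smaller p-value when more regions are counted, so p-values are only directly comparable between motifs evaluated on populations of similar size.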

      7. I compared the overlap of peaks in the 4 ChIP-seq data sets:

      90% of the Fd2 peaks are shared with AP2-FG (binding at 24% of shared peaks is lost in ∆AP2-FG)

      10% are bound by Fd2 alone (binding at 35% of Fd2 is lost in ∆AP2-FG)

      75% of Fd2 peaks are bound independently of AP2-FG

      47% of AP2-FG peaks shared with Fd2 (binding at 71% of shared peaks is lost in ∆Fd2)

      53% of AP2-FG peaks are bound only by AP2-FG (but binding at 82% of AP2-FG only peaks is still lost in the ∆Fd2)

      Binding at 78% of all AP2-FG peaks is lost in ∆Fd2

      This indicates that much of AP2-FG binding, even in regions devoid of Fd2, still depends on Fd2. What are possible explanations for this?

      https://elife-rp.msubmit.net/eliferp_files/2023/04/03/00117573/00/117573_0_attach_10_17936_convrt.pdf

      Response: In the ChIP-seq of AP2-FG in the absence of PFG, 441 peaks are still called. This means that at least 441 binding sites for AP2-FG independent of PFG exist. This is a straightforward conclusion from our ChIP-seq data. On the other hand, simple subtraction of peaks between two ChIP-seq experiments (AP2-FG peaks minus PFG peaks) is not a precise method for determining sAP2-FG. Peak calling is performed independently in each ChIP-seq experiment. Thus, peaks remaining after the subtraction can still include peaks that are actually common but were picked up differently during peak calling. Even with data from the same ChIP-seq experiment, markedly different numbers of peaks are called depending on the peak-calling conditions (in contrast, common peaks between two independent experiments increase the reliability of the data). To identify sAP2-FG peaks by comparing AP2-FG peaks with PFG peaks, one would have to increase the number of PFG peaks by lowering the peak-calling threshold until the number of overlapping peaks between AP2-FG and PFG is saturated, and then subtract the overlapping peaks from the AP2-FG peaks. However, as described above, for the purpose of estimating the number of sAP2-FG peaks, it would be better to perform ChIP-seq of AP2-FG in the absence of PFG.

      8. Possible explanations of why recombinant Fd2 doesn't bind the TGCACA hexamer. It would also be good to note that the GCTCA AP2-FG motif found in Fig4G is now a perfect match for the motif identified by protein binding microarray in Campbell et al.

      Response: It is not known what sequence recombinant PFG binds. The TGCACA motif is not enriched in PFG peaks. If the reviewer is referring to AP2-FG, our finding that the recombinant AP2 domain binds the five-base motif strongly suggests that other TFs recognize the TGCACA motif. As described in our response to comment 9, we recently reported that TGCACA is a cis-activating sequence important for the normal development of both male and female gametocytes. Therefore, we currently speculate that this motif is a binding motif of other TFs and is independent of AP2-FG.

      We have mentioned the protein binding microarray data in the Results section as follows.

      “The most enriched motif matched well with the binding sequence of the AP2 domain of P. falciparum AP2-FG, which was reported by Campbell et al.”

      9. What might explain the strong enrichment for TGCACA in ChIP-seq but not when DNA is pulled down by the AP2-FG DBD: another binding partner? Does binding require more of AP2-FG than just the DBD?

      Response: As described above in our response to comment 6, we have recently submitted a preprint studying the roles of the remodeler subunit PbARID in gametocyte development. We reported that the remodeler subunit is recruited to the six-base motif and that the motif is a novel cis-activation element for early gametocyte development. We speculate that a proportion of AP2-FG targets are also targets of a TF that recognizes this motif and recruits the remodeler subunit. These two TFs may be involved in the regulation of early gametocyte genes but function independently.

      10. Calling DNA pulldown with recombinant AP2-FG DNA-binding domain DNA Immunoprecipitation sequencing (DIP-seq) is confusing since there are no antibodies involved. Describing it directly as a pulldown of fragmented DNA will be clearer to the reader.

      Response: Thank you for the comment. We have also recognized this discrepancy. However, we called the method DIP-seq because the original paper reporting this method used this name, even though it did not use antibodies to capture the MBP-fusion recombinant protein. Our experiment was performed using essentially the same methods, and thus we retained the name.

      11. The legends and methods are very sparse and should include substantially more detail.

      Response: Thank you for the comment. We have revised the description of the FACS experimental method for clarity.

      12. BigWig files for all ChIPseq enrichment used for analysis in this study need to be provided.

      (two replicates each of: Fd2 in WT, Fd2 in ∆AP2-FG, AP2-FG in WT, AP2-FG in ∆Fd2)

      Response: We have deposited the BigWig files to GEO (GSE226028 and GSE114096).

      13. Tables of ChIP data need to have both summits and peaks and need to list nearest gene. Also the ChIPseq peaks for Fd2 are surprisingly broad (ChIP peaks are very large, e.g. 68% of Fd2 peaks (dataset2) are greater than 1000 bp) given its specificity for a long motif. Why is this?

      Response: We have revised Table S2 to include the nearest genes. We are unsure why peaks longer than 1000 bp make up such a high proportion. However, this proportion was also high in our previous ChIP-seq data. Therefore, we speculate that this is a tendency of peak calling by MACS2. We did not use these peak widths in this paper. For example, targets were predicted using peak summits, and binding motifs were calculated using the 100-base regions around peak summits.

      14. Figure 5E: The positions of the 10mer and 5mer motifs in the promoter should be indicated as well as the length of the promoter. Moreover, mutation of just the 5bp motifs would be valuable to understand if 10mer is sufficient for expression of the reporter.

      Response: Thank you for the comment. We have revised the figure accordingly. The majority of female-specific promoters only harbor ten-base motifs. Thus the ten-base motif is sufficient for evaluating reporter activity (i.e., it would function without five-base motifs).

      15. How is AP2-FG expression affected in ∆Fd2 and vice versa?

      Response: According to our previous microarray data, PFG expression was not significantly downregulated by disruption of AP2-FG. This may be because PFG transcriptionally activates itself through a positive feedback loop after being induced by AP2-G. Similarly, according to our present study, AP2-FG expression was not downregulated by PFG disruption. This may be because AP2-FG is transcriptionally activated by AP2-G.

      16. The single cell data in Russell et al. could easily be used to indicate the order of expression.

      Response: Determining the expression order of gametocyte TFs via the single cell RNA-seq data from Russell et al. is difficult, because only a small number of parasite cells were considered to be in the early gametocyte stage in that study. This is because the parasites were cultured for 24 h before the analysis. The analysis suggested by the reviewer may be possible via single cell RNA-seq, but the experiments must be performed with more focus on the early gametocyte stage.

      17. A discussion of the implications for P. falciparum transmission would be appreciated.

      Response: Thank you for the comment. We have added the following to the Discussion section:

      “P. falciparum gametocytes require 9-12 days to mature, which is much longer than that of P. berghei. Meanwhile, it has been reported that the ten-base motif is highly enriched in the upstream regions of female-specific genes also in P. falciparum. Thus, despite the difference in maturation periods, PFG is likely to play an important role in the transcriptional regulation of female P. falciparum gametocyte development.”

      18. The lack of identifiable DNA binding domains in Fd2 is intriguing given the strong sequence-specificity. Do the authors think they have identified a new DNA-binding fold?

      Alphafold of the orthologs with contiguous regions 1&2 might offer insight.

      Response: We speculate that these regions function as DNA-binding domains. We performed an analysis using AlphaFold2 as suggested in the comment. However, the predicted structure of the region was not similar to any canonical DNA-binding domain. Thus, it may be a novel DNA-binding fold, as the reviewer mentioned. Further studies such as binding assays using recombinant proteins would be necessary to confirm this, but thus far we have not succeeded in obtaining soluble proteins for these regions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Author response:

      Reviewer #1:

      The main objective of this study is to achieve the development of a synthetic autotroph using adaptive laboratory evolution. To accomplish this, the authors conducted chemostat cultivation of engineered E. coli strains under xylose-limiting conditions and identified autotrophic growth and the causative mutations. Additionally, the mutational mechanisms underlying these causative mutations were also explored with drill down assays. Overall, the authors demonstrated that only a small number of genetic changes were sufficient (i.e., 3) to construct an autotrophic E. coli when additional heterologous genes were added. While natural autotrophic microorganisms typically exhibit low genetic tractability, numerous studies have focused on constructing synthetic autotrophs using platform microorganisms such as E. coli. Consequently, this research will be of interest to synthetic biologists and systems biologists working on the development of synthetic autotrophic microorganisms. The conclusions of this paper are mostly well supported by appropriate experimental methods and logical reasoning. However, further experimental validation of the mutational mechanisms involving rpoB and crp would enhance readers' understanding and provide clearer insights, despite acknowledgement that these genes impact a broad set of additional genes. Additionally, a similar study, 10.1371/journal.pgen.1001186, where pgi was deleted from the E. coli genome and evolved to reveal an rpoB mutation is relevant to this work and should be placed in the context of the presented findings.

      We thank the reviewer for pointing this study out. It is very interesting that a mutation in a similar region in RpoB was observed in a related context of Pgi loss of activity. We have added a reference to this study in our text (Page 11, line 21).

      The authors addressed rpoB and crp as one unit and performed validation. They cultivated the mutant strain and wild type in a minimal xylose medium with or without formate, comparing their growth and NADH levels. The authors argued that the increased NADH level in the mutant strain might facilitate autotrophic growth. Although these phenotypes appear to be closely related, their relationship cannot be definitively concluded based on the findings presented in this paper alone. Therefore, one recommendation is to explore investigating transcriptomic changes induced by the rpoB and crp mutations. Otherwise, conducting experimental verification to determine whether the NADH level directly causes autotrophic growth would provide further support for the authors' claim.

      We appreciate the valuable comment and agree that the work was lacking such an analysis. Due to various reasons we have opted to use a proteomic approach which we feel fulfills the same purpose as the transcriptomics suggestion. We found interesting evidence in up-regulation of the fdoGH operon (comprising the native formate dehydrogenase O enzyme complex) which could indicate why there is an increase in NADH/NAD+ levels. We also hypothesize that this upregulation might be important more generally by drawing comparisons to natural chemo-autotrophs.

      Further experimental work (which we were not able to include in the current study) could help validate this link by deleting fdoGH and observing a loss of phenotype and, on the flip side, directly overexpressing the fdoGH operon and observing an increase in the NADH/NAD+ ratio. Indeed, if this overexpression were to prove sufficient for achieving an autotrophic phenotype without the mutations in the global transcription regulators, it would be a much more transparent design.

      We have added a section titled "Proteomic analysis reveals up-regulation of rPP cycle and formate-associated genes alongside down-regulation of catabolic genes" to the Results based on this analysis.

      • It would be beneficial to provide a more detailed explanation of the genetic background before the evolution stage, specifically regarding the ∆pfk and ∆zwf mutations. Furthermore, it is suggested to include a figure that provides a comprehensive depiction of the reductive pentose phosphate pathway and the bypass pathway. These will help readers grasp the concept of the "metabolic scaffold" as proposed by the authors.

      We agree with the reviewer that this could be helpful and we added a reference to the original paper Gleizer et al. 2019 that reported this design and also includes the relevant figure. We feel that the figure should not be added to the current manuscript as we continue to show that this design is not relevant in the context of the three reported mutations and such a figure could distract the attention of the reader from the main takeaways of the current study.

      • Despite the essentiality of the rpoB mutation (A1245V) to the autotrophic phenotype in the final strain, the inclusion of this mutation in step C1 does not appear to be justified. According to line 37 on page 3, the authors chose to retain the unintended mutation in rpoB based on its essentiality to the phenotype observed in other evolved strains. However, it should be noted that the mutations found in the evolved strain I, II, and III (P552T or D866E) were entirely different from the unintended mutation (A1245V) during genetic engineering. This aspect should be revised to avoid confusion among readers.

      Thank you for pointing this issue out. We added a clarification in the text (page 4, line 7) to avoid such confusion. We believe this point is much clearer now.

      The rpoB mutation which was shown to be essential in the study is indeed known to be common in ALE experiments in E. coli. Thus, I searched the different rpoB mutations in ALEdb in E. coli and I was able to find a similar mutation in a study where pgi was knocked out and then evolved. https://doi.org/10.1371/journal.pgen.1001186 This study seems very relevant given that pgi was a key mutation in the compact set of this work and the section "Modulation of a metabolic branch-point activity increased the concentration of rPP metabolites" informs that loss of function mutations in pgi were also found. The findings of this study should thus be put in the context of the previous related ALE study. I would recommend a similar analysis of crp mutations from studies in ALEdb to see if there are similar mutations in this gene as well or if this a unique mutation.

      We thank the reviewer for bringing this publication to our attention. We have addressed this observation in the main text (page 11, line 21). We agree that it could have some connection to the pgi mutation, yet we would not want to overspeculate about this role, as we also found the exact same mutation (A1245V) as an adaptation to higher temperature in another E. coli study (Tenaillon et al. 2012). We would like to bring forward the fact that the two reported rpoB mutations are always accompanied by another mutation with pleiotropic effects, either in the transcription factor Crp or in another RNA polymerase subunit (e.g., RpoC). As such, many epistatic effects could occur, one of which we also report here on page 13, line 18. In conclusion, although there could be a connection between the rpoB and pgi mutations, it could be a mere coincidence and the two mutations could exhibit two distinct roles in two distinct phenotypes.

      We would also like to thank the reviewer for suggesting a similar analysis for crp. We found another mutation at a nearby residue with strong adaptive effects and mention it in our main text.

      Can the typical number of mutations found in a given ALE experiment be directly compared to those found in this study? It seems like a retrospective analysis of other ALE studies to show how many mutations typically occur in an ALE study and sets which were found to be causal to reproduce the phenotype of interest (through similar reverse engineering in the starting strain) should be presented. Again, the authors cite ALEdb which should provide direct numbers of mutations found in similar ALE studies with E. coli and one could then examine them to find sets of clearly causal mutations which recreate phenotypes of interest. Such an analysis would go a long way in supporting the main finding of "small number" of mutations.

      Discussion, page 12, line 42. "This could serve as a promising strategy for achieving minimally perturbed genotypes in future metabolic engineering attempts". There is an entire body of work around growth-coupled production which can be predicted and evolved with a genome-scale metabolic model and ALE. Thus, if this statement is going to be made, relevant studies should be cited and placed in context.

      The reviewer raises an important point which could indeed yield an interesting perspective. However, it would be difficult to perform this comparison in practice since many of the studies published on ALEdb have not isolated essential mutations from other mutation incidents nor have they determined the role of each mutation in the reported phenotypes. For example, many ALE trajectories include a hypermutator that greatly increases the number of irrelevant mutations and it is nearly impossible to sieve through them to find an essential set.

      Moreover, it is hard to compare the “level of difficulty” of achieving one phenotype over another. We therefore feel that, even though such an analysis would be insightful, it requires an amount of work that is outside the scope of this study.

      Finally, we would like to highlight our iterative approach: isolating the relevant consensus mutations and repeating this process until no evolution process is required. We are not aware of prior studies that used this approach.

      We now clarified what we mean by "promising strategy" in the discussion in order to avoid any false claims about novelty (page 16 line 32): "Using metabolic growth-coupling as a temporary 'metabolic scaffold' that can be removed, could serve as a promising strategy for achieving minimally perturbed genotypes in future metabolic engineering attempts."

      Reviewer #2:

      Synthetic autotrophy of biotechnologically relevant microorganisms offers exciting chances for CO2 neutral or even CO2 negative production of goods. The authors' lab has recently published an engineered and evolved Escherichia coli strain that can grow on CO2 as its only carbon source. Lab evolution was necessary to achieve growth. Evolved strains displayed tens of mutations, of which likely not all are necessary for the desired phenotype.

      In the present paper the authors identify the mutations that are necessary and sufficient to enable autotrophic growth of engineered E. coli. Three mutations were identified, and their phenotypic role in enhancing growth via the introduced Calvin-Benson-Bassham cycle were characterized. It was demonstrated that these mutations allow autotrophic growth of E. coli with the introduced CBB cycle without any further metabolic intervention. Autotrophic growth is demonstrated by 13C labelling with 13C CO2, measured in proteinogenic amino acids. In Figures 2B and S1, the labeling data are shown, with an interval of the "predicted range under 13CO2".

      Here, the authors should describe how this interval was derived.

      The methodology is clearly described and appropriate.

      The present results will allow other labs to engineer E. coli and other microorganisms further to assimilate CO2 efficiently into biomass and metabolic products. The importance is evident in the opportunity to employ such strain in CO2 based biotech processes for the production of food and feed protein or chemicals, to reduce atmospheric CO2 levels and the consumption of fossil resources.

      Please describe in the methodology how the interval of the predicted range of 13C labeling was derived for Figures 2B and S1. Was it calculated by the dilution factor during 4 generations, or did you predict the label incorporation individually with a metabolic model?

      The text needs careful editing, some sentences are incomplete and there are frequent inconsistencies in writing metabolites and enzymes.

      P2L6: unclear sentence (incomplete?)

      P2L19: pastoris with lower case "p"

      P2L40: incomplete sentence

      P2L42: here, and at many other places, the writing of RuBisCO needs to be aligned. It is an abbreviation and should begin with a capital letter. Most commonly it is written as RuBisCO which I would suggest - please unify throughout the text.

      P3L3: formate dehydrogenase ... metabolites and enzymes with lower case letter. And, no hyphen here.

      P5L4: delete the : after unintentionally

      P6L16: carboxylation of RuBP (it is not CO2 that is carboxylated - if any, CO2 is carboxylating)

      P7L25: phosphoglucoisomerase (lower case)

      P8L5: in line

      P8L9: part of glycolysis/ ...

      P10L4: pentose phosphates (lower case, no hyphen).

      P10L4: all metabolites lower case

      P12L28: incomplete sentence

      P18L4: Escherichia coli in italics

      P18L15: Pseudomonas sp. in italics

      P18L16: ... promoter and with a strong ...

      P20, chapter Metabolomics: put the numbers of 12C and 13C in superscript

      P23L9: pentose phosphates; all metabolites in lower case (as above)

      P23: all 12C and 13C with superscript numbers.

      Response to reviewer #2:

      We thank the reviewer for their comments and for pointing out the need to clarify how we derived the predicted range of 13C labeling. We edited the text accordingly and added the relevant calculation to the Methods section (under “13C Isotopic labeling experiment”). We would also like to thank the reviewer for the requested text improvements, which were implemented.
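
      One simple way to reason about such a range, sketched here purely as an illustration and not as the calculation given in the Methods, is the dilution argument the reviewer mentions: each doubling halves the share of carbon carried over from the inoculum. The function name, its parameters, and the default values below are assumptions made for this sketch.

```python
def labeled_fraction(generations, initial_labeled=0.0, source_enrichment=1.0):
    """Expected 13C fraction of biomass carbon after growth on a labeled source.

    Assumption (illustrative only): all newly made biomass takes its carbon
    from the labeled source, and each doubling halves the share of carbon
    inherited from the inoculum.
    """
    # Share of biomass carbon that still originates from the inoculum.
    old_share = 0.5 ** generations
    # Weighted mix of inherited carbon and newly fixed carbon.
    return old_share * initial_labeled + (1.0 - old_share) * source_enrichment


# Four doublings starting from fully unlabeled biomass on a fully labeled source.
fraction_after_four = labeled_fraction(4)
```

      Under these assumptions, four generations leave 1/16 of the biomass carbon inherited from the inoculum, so at most 15/16 of the carbon can be labeled. Passing a nonzero `initial_labeled` (for example, natural 13C abundance of roughly 1%) or a `source_enrichment` below 1 shifts the bound accordingly.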

      Reviewer #3:

      The authors previously showed that expressing formate dehydrogenase, rubisco, carbonic anhydrase, and phosphoribulokinase in Escherichia coli, followed by experimental evolution, led to the generation of strains that can metabolise CO2. Using two rounds of experimental evolution, the authors identify mutations in three genes - pgi, rpoB, and crp - that allow cells to metabolise CO2 in their engineered strain background. The authors make a strong case that mutations in pgi are loss-of-function mutations that prevent metabolic efflux from the reductive pentose phosphate autocatalytic cycle. The authors also argue that mutations in crp and rpoB lead to an increase in the NADH/NAD+ ratio, which would increase the concentration of the electron donor for carbon fixation. While this may explain the role of the crp and rpoB mutations, there is good reason to think that the two mutations have independent effects, and that the change in NADH/NAD+ ratio may not be the major reason for their importance in the CO2-metabolising strain.

      We thank the reviewer for their comments and constructive feedback.

      We agree that there is probably a broader effect caused by the rpoB and crp mutations, besides the change in the NADH/NAD+ ratio. Hence, we performed a proteomics analysis, comparing the rpoB and crp mutations on a WT background to an autotrophic E. coli, searching for a mutual change in both strains compared to their "ancestors". We found up-regulation of rPP cycle and formate-associated genes, and down-regulation of catabolic genes. We added a section dedicated to this matter under the title "Proteomic analysis reveals up-regulation of rPP cycle and formate-associated genes alongside down-regulation of catabolic genes".

      Specific comments:

      1. Deleting pgi rather than using a point mutation would allow the authors to more rigorously test whether loss-of-function mutants are being selected for in their experimental evolution pipeline. The same argument applies to crp.

      We appreciate this recommendation and indeed tried to delete pgi, but the genetic manipulation caused a knockout of other genes along with pgi (pepE, rluF, yjbD, lysC) so in the time available to us we cannot confidently determine whether the deletion alone is sufficient and can replace the mutation.

      Regarding crp, we do not think there is a reason to believe the mutation is a loss-of-function. In any case, the proteomics-based characterization of the crp mutation is now included in the SI.

      2. Page 10, lines 10-11, the authors state "Since Crp and RpoB are known to physically interact in the cell (26-28), we address them as one unit, as it is hard to decouple the effect of one from the other". CRP and RpoB are connected, but the authors' description of them is misleading. CRP activates transcription by interacting with RNA polymerase holoenzyme, of which the Beta subunit (encoded by rpoB) is a part. The specific interaction of CRP is with a different RNA polymerase subunit. The functions of CRP and RpoB, while both related to transcription, are otherwise very different. The mutations in crp and rpoB are unlikely to be directly functionally connected. Hence, they should be considered separately.

      Indeed, the fact that the proteins interact in the cell does not necessarily mean that the mutations are functionally connected. We therefore added the following as further justification in the new section:

      "As far as we know, the mutations in the Crp and RpoB genes affect the binding of the RNA polymerase complex to DNA and/or its transcription rates. Depending on the transcribed gene target, the effect of the two mutations might be additive, antagonistic, or synergistic. Since each one of these mutations individually (in combination with the pgi mutation) is not sufficient to achieve autotrophic growth, it is reasonable to assume that only the target genes whose levels of expression change significantly in the double-mutant are the ones relevant for the autotrophic phenotype".

      In our proteomics analysis we considered each mutation separately. We found that in some cases the two mutations together have an additive effect, but in other cases the two mutations together affect the proteome differently from each mutation alone. Since both mutations are essential to the phenotype, we decided to treat the two mutations as one unit for the physiological and metabolic experiments.

      3. A Beta-galactosidase assay would provide a very simple test of CRP H22N activity. There are also simple in vivo and in vitro assays for transcription activation (two different modes of activation) and DNA-binding. H22 is not near the DNA-binding domain, but may impact overall protein structure.

      The mutation is located in “Activating Region 2”, which interacts with RNA polymerase. We tried an in vivo assay to determine the CRP H22N activity and got inconclusive results. We believe the proteomics analysis serves as a good method for understanding the global effect of the mutation.

      4. There are many high-resolution structures of both CRP and RpoB (in the context of RNA polymerase). The authors should compare the position of the sites of mutation of these proteins to known functional regions, assuming H22N is not a loss-of-function mutation in crp.

      We added a supplementary figure regarding the structural location of the two mutations, where it is demonstrated that crp H22N is located in a region interacting with the RNA polymerase and rpoB A1245V is located in proximity to regions interacting with the DNA.

      5. RNA-seq would provide a simple assay for the effects of the crp and rpoB mutations. While the precise effect of the rpoB mutation on RNA polymerase function may be hard to discern, the overall impact on gene expression would likely be informative.

      We agree that an omics approach to infer the global effect of these mutations is beneficial. We opted to use a proteomics approach and think it serves the purpose of clarifying the final, downstream effect on the cell.

      6. Page 2, lines 40-45, the authors should more clearly explain that the deletion of pfkA, pfkB and zwf was part of the experimental evolution strategy in their earlier work (Gleizer et al., 2019), and not a new strategy in the current study.

      We thank you for pointing this out and have edited the text accordingly.

      7. Page 3, line 27. Why did the authors compare the newly acquired mutants to only two mutants from the earlier work, not all 6?

      The 6 clones that were isolated in Gleizer et al., had 2 distinct mutation profiles. During the isolation process the lineage split into two groups. Three out of the 6 clones (clones 1,2,6) came from the same ancestor, and the other three (clones 3,4,5) came from another ancestor. Hence, these two groups shared almost all of their mutations (see Venn diagram). We decided to use for our comparison the representative with the highest number of mutations from each group (clones 5 and 6).

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chen and colleagues first compared the cartilage tissues collected from OA and HA patients using histology and immunostaining. Then, a genome-wide DNA methylation analysis was performed, which informed the changes of a novel gene, TNXB. IHC confirmed that TNXB has a lower expression level in HA cartilage than OA. Next, the authors demonstrated that TNXB levels were reduced in the HA animal model, and intraarticular injection of AAV carrying TNXB siRNA induced cartilage degradation and promoted chondrocyte apoptosis. Based on KEGG enrichment, histopathological analysis, and western blot, the authors also showed the relationship between TNXB and AKT phosphorylation. Lastly, AKT agonist, specifically SC79 in this study, was shown to partially rescue the changes of in vitro-cultured chondrocytes induced by Tnxb knock-down. Overall, this is an interesting study and provided sufficient data to support their conclusion.

      Strengths:

      (1) Both human and mouse samples were examined.

      (2) The HA model was used.

      (3) Genome-wide DNA methylation analysis was performed.

      Weaknesses:

      (1) In some experiments, the selection of the control groups was not ideal.

      Thank you for your comments. The reviewer raised a concern about using human OA cartilage, rather than healthy cartilage, as the control. This is an important detail that we did not describe in the previous version. We have added our explanation to the revised Methods.

      (2) More details on analyzing methods and information on replicates need to be included.

      We greatly appreciate your careful review and helpful suggestions. We have added detailed information to our revised draft.

      (3) Discussion can be improved by comparing findings to other relevant studies.

      We thank the reviewer very much for the opportunity to improve our manuscript. We have improved the Discussion as the reviewer suggested in Recommendation 13.

      (4) The use of transgenic mice with conditional Tnxb depletion can further define the physiological roles of Tnxb.

      Thank you for this valuable comment. We understand that conditional Tnxb-KO mice would be very helpful for studying the biological roles of Tnxb, and we will construct and use them in our future studies.

      Recommendations For the Authors:

      (1) Please add more information about HA such as incidence to highlight the importance of the study.

      We greatly appreciate your careful review and helpful suggestions. We have provided more information about the importance of HA study in revised Introduction. Please see lines 90-93 and 103-112.

      (2) Please justify the use of OA cartilage, instead of normal tissues, as the control.

      Thank you for your suggestion. We would certainly have preferred to use healthy cartilage as the control, but it was extremely difficult to obtain enough control samples from healthy individuals. Despite the mechanistic and phenotypic differences between HA and OA, OA is often used as a “disease” control to reveal the characteristics of HA 1,2. Thus, we measured the differences in cartilage degeneration and DNA methylation between HA and OA patients. We have provided this statement and the supporting evidence in the revised manuscript. Please see lines 144-145.

      (3) Please provide details of how to calculate the Cartilage wear area ratio in Figure 1D, and measure the positive staining area in Figure 1F.

      We apologize for the issue you pointed out. Here, we provide detailed information on how the positively stained areas were calculated. Specifically, in Figure 1D, we obtained the cartilage area ratio by calculating the ratio of the blue cartilage staining area to the whole tissue area using ImageJ software. In Figure 1F, the area of positive staining was determined after secondary antibody treatment and color development with DAB chromogen (brown stain). We then obtained the positive staining area ratio by calculating the ratio of the positive staining area to the whole cartilage area using ImageJ software.
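
      The arithmetic behind such an area ratio can be sketched as follows. This is a minimal illustration and not the ImageJ workflow itself: it assumes the stained region and the reference region (whole tissue or whole cartilage) have already been segmented into binary masks of equal size, and the function name and mask layout are hypothetical.

```python
def area_ratio(stain_mask, region_mask):
    """Return the fraction of reference-region pixels that are also stained.

    Both arguments are equally sized 2D lists of 0/1 values (hypothetical
    binary masks; segmentation is assumed to have been done beforehand).
    """
    region_pixels = 0
    stained_pixels = 0
    for stain_row, region_row in zip(stain_mask, region_mask):
        for stained, in_region in zip(stain_row, region_row):
            if in_region:
                region_pixels += 1
                # Count staining only when it falls inside the reference region.
                if stained:
                    stained_pixels += 1
    if region_pixels == 0:
        raise ValueError("reference region mask is empty")
    return stained_pixels / region_pixels


# Toy masks: 6 region pixels, 3 of them stained; one stained pixel lies outside.
region = [[1, 1, 1, 0],
          [1, 1, 1, 0]]
stain = [[1, 1, 0, 0],
         [1, 0, 0, 1]]
ratio = area_ratio(stain, region)
```

      Restricting the count to pixels inside the reference mask keeps background staining outside the tissue from inflating the ratio.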

      (4) Please label the location of hemorrhagic ferruginous deposits in Figure 1.

      Thank you for your valuable suggestion. We have used black arrows to indicate hemorrhagic ferruginous deposits in the revised Figure 1A.

      (5) Please define the meaning of "n" in all figure legends, such as technical or biological replicates.

      Thank you for your suggestion. We have defined the meaning of "n" in all figure legends in the revised manuscript.

      (6) In Figure 3, please increase the font size of B, D, F, H, and J. The same applies to other figures.

      Thank you for your valuable suggestion. We have increased the font size of figures in our revised manuscript.

      (7) Line 327, "(Figure 1, F and G)" should be Figure 2F, G.

      Thank you for the reminder. We have corrected it in the revision. Please see line 347.

      (8) Reduced TNXB levels in human HA cartilage are one of the major findings in this study. Currently, only semi-quantitative IHC was used to draw the conclusion. A second method, such as real-time PCR or western blot, is required.

      Thank you for your suggestion. We regret that we did not have enough human HA cartilage samples for qPCR and WB experiments, owing to severe erosion of the HA cartilage. We have noted this limitation in the revised draft. Please see lines 445-448.

      (9) Figure 3 shows that reduced Tnxb was accompanied by the increased Dnmt1. In addition, this study is about methylation. Have the authors tested the change of Dnmt1 levels when Tnxb was knocked down?

      Thank you for your suggestion. We have tested the expression of Dnmt1 in Tnxb-KD chondrocytes and observed no significant alteration. Please see the following figure.

      Author response image 1.

      Figure Legend: Representative IHC staining of Dnmt1 in articular cartilage from Tnxb-KD HA mice, with corresponding quantification of the proportion of Dnmt1-positive regions. Red arrows indicate positive cells. Scale bar: 100 μm. Data are presented as means ± SD; n = 5 in each group. ns = not significant by unpaired Student’s t test.
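
      For readers less familiar with the test named in the legend, the statistic of an unpaired (equal-variance) Student's t test can be sketched as below. The function name and the input values are hypothetical and are not our measurements; the sketch returns only the t statistic and the degrees of freedom, since converting them to a p value requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an unpaired Student's t test.

    Assumes equal variances in the two groups (the classic pooled-variance form).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical values for two groups with n = 5 each (not real data).
t_stat, dof = unpaired_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

      With n = 5 per group, the test has 8 degrees of freedom.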

      (10) Also, is there a causal relationship between Tnxb levels and the distribution of methylation levels? Any related study was performed?

      Following the valuable suggestion of the reviewer, we used two well-known DNA methyltransferase inhibitors (RG108 or 5-Aza-dc) 3 to examine whether DNA methylation regulates transcriptional expression of TNXB. We found that both inhibitors significantly up-regulated Tnxb mRNA level. We have added this result to the revised Supplementary Figure 4 and draft (lines 292-296 and 369-374).

      (11) In Figure 6, what was the control of "AKT agnost" group?

      Thank you for your suggestion. We apologize for the oversight; we have added the vehicle group as a control for the AKT agonist in Figure 6 of the revised manuscript.

      (12) Previous studies have reported the involvement of TNXB in TGF-β signaling. Have the authors examined the effect of TNXB on TGF-β signaling in chondrocytes?

      Thank you for your suggestion. We examined the expression of TGF-β signaling components in Tnxb-KD chondrocytes and observed no significant changes. We have discussed this result in the revised draft (lines 475-479) and added it to the revised Supplementary Figure 7.

      (13) Discussion can be improved. For example, have previous studies reported the association between TNXB and methylation in other cells/tissues? In addition to apoptosis, are there other potential mechanisms underlying the protective role of TNXB in chondrocytes?

      Thank you for your valuable comments. Previous studies have shown differential DNA methylation of TNXB in whole blood from rheumatoid arthritis patients and in retinal pigment epithelium from patients with age-related macular degeneration 4,5. Here, we are the first to report the association between DNA methylation of TNXB and HA cartilage degeneration. There are few published studies on the physiological function of TNXB, and most of them report its effect on extracellular matrix organization 6,7. In our work, we found that TNXB regulated the phosphorylation of AKT. Since previous reports showed that AKT controls the expression of Mmp13 8, we reasoned that TNXB might regulate chondrocyte extracellular matrix organization in addition to its role in apoptosis. We have discussed these points in the revised manuscript (lines 462-464 and 495-501).

      (14) The manuscript writing needs to be improved. Typos and grammar issues were noted.

      Thank you. We have revised and polished the language, and we hope the revised version is acceptable.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript mainly studied the biological effect of tenascin XB (TNXB) on hemophilic arthropathy (HA) progression. Using bioinformatic and histopathological approaches, the authors identified the novel candidate gene TNXB for HA. Next, the authors showed that TNXB knockdown leads to chondrocyte apoptosis, matrix degeneration, and subchondral bone loss in vivo/vitro. Furthermore, AKT agonists promoted extracellular matrix synthesis and prevented apoptosis in TNXB knockdown chondrocytes.

      Strengths:

      In general, this study significantly advances our understanding of HA pathogenesis. The authors utilize comprehensive experimental strategies to demonstrate the role of TNXB in cartilage degeneration associated with HA. The results are clearly presented, and the conclusions appear appropriate.

      Weaknesses:

      Additional clarification is required regarding the gender of the F8-/- mouse in the study. Is the mouse male or female?

      We apologize for not providing enough information about the gender of the F8-/- mice in the previous draft. We used male F8-/- mice as the study subjects for our experiments. Hemophilia A is predominantly seen in males because it is X-linked 9.

      Recommendations For The Authors:

      Some issues need to be addressed in the manuscript:

      (1) During the progression of HA, in addition to cartilage degeneration, synovial hypertrophy and inflammation are also significant symptoms. How is the expression of TNXB in HA synovium?

      Thank you for your valuable comments. Following the reviewer's suggestion, we tested the expression of TNXB in the synovium and found no statistically significant difference in its expression level (Supplementary Figure 2). Please see lines 347-349.

      (2) Lines 183-188. The methods of virus infection should be more detailed. What was the concentration of the AAVs injected? And how many doses were administrated?

      Thank you for your suggestion. We have added an explanation of the virus infection and the injected doses to the revised Methods section (lines 205-206).

      (3) Line 197-198. Could the author double-check the decalcification time for human cartilage samples? Is it for 3 months? Or for 3 weeks?

      Thank you for your suggestion. We have reconfirmed that the human cartilage samples were decalcified for 3 months.

      (4) Line 343-344 "Above results suggest that TNXB might be protective against HA and its cartilage suppression is closely related to HA development." The conclusion is inappropriate, please revise it.

      Thank you for your suggestion. We have revised this conclusion to “Above results suggest that the suppression of TNXB in cartilage promotes the HA development”. Please see lines 365-366.

      (5) Line 326-327, the IHC staining for human samples is shown in Figure 2, not Figure 1. Please double check and revise it.

      Thank you for the reminder. We apologize for the oversight and have corrected it in the revision.

      (6) For Figure 1B, it shows the MRI images of knee joints. However, the method section lacks details regarding the MRI imaging scan and analysis. Could the author include this information in the method section?

      Thank you for your valuable comments. We have added the MRI scanning and analysis method to the revised Methods. Please see lines 154-163.

      (7) In Figure 5, The statistical result of Bcl-2 is inconsistent with its Western blot band. Please check.

      Thank you for the reminder. We have corrected it in the revision.

      (8) Please read through the text carefully to check for language problems. For example, in Line 68 "Our" not "our".

      Thank you for the reminder. We have corrected it in the revision. Please see line 68.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Dr. Chen et al. investigates the genes that are differentially methylated and associated with cartilage degeneration in hemophilia patients. The study demonstrates the functional mechanisms of the TNXB gene in chondrocytes and F8-/- mice. The authors first showed significant DNA methylation differences between hemophilic arthritis (HA) and osteoarthritis through genome-wide DNA methylation analysis. Subsequently, they showed a decreased expression of the differentially methylated TNXB gene in cartilage from HA patients and mice. By knocking down TNXB in vivo and in vitro, the results indicated that TNXB regulates extracellular matrix homeostasis and apoptosis by modulating p-AKT. The findings are novel and interesting, and the study presents valuable information in blood-induced arthritis research.

      Strengths:

      The authors adopted a comprehensive approach by combining genome-wide DNA methylation analysis, in vivo and in vitro experiments using human and mouse samples to illustrate the molecular mechanisms involved in HA progression, which is crucial for developing targeted therapeutic strategies. The study identifies Tenascin XB (TNXB) as a central mediator in cartilage matrix degradation. It provides mechanistic insights into how TNXB influences cartilage matrix degradation by regulating the activation of AKT. It opens avenues for future research and potential therapeutic interventions using AKT agonists for cartilage protection in hemophilic arthropathy. The conclusions drawn from the study are clear and directly tied to the findings.

      Weaknesses:

      (1) The study utilizes a small sample size (N=5 for both osteoarthritis and hemophilic arthropathy). A larger sample size would enhance the generalizability and statistical power of the findings.

      Thank you for pointing out this limitation. Indeed, our sample size is relatively small, although it was sufficient for the statistical analyses. We have added this limitation to the Discussion in the revised manuscript. Please see lines 445-448. Given the small sample size, we subsequently performed a functional validation study of TNXB, one of the most significant genes, and demonstrated that TNXB has a critical impact on chondrocyte apoptosis in HA pathogenesis in vivo and in vitro.

      (2) The use of an animal model (F8-/- mouse) to investigate the role of TNXB may not fully capture the complexity of human hemophilic arthropathy. Differences in the biology between species may affect the translatability of the findings to human patients.

      Thank you for your valuable comments. We recognize that biological differences between species can affect the clinical translation of research findings. In our work, we sequenced human cartilage samples to identify the differentially methylated gene TNXB. We also demonstrated that TNXB protein expression was significantly down-regulated in both human HA cartilage and F8-/- transgenic mouse cartilage. The F8-/- transgenic mouse is a well-accepted model for the study of hemophilia: it is phenotypically similar to human patients with the disease and bleeds spontaneously into the joints and soft tissues. In addition, this mouse model has been widely used in studies of hemophilia and hemophilic arthritis 9-11.

      (3) The study primarily focuses on TNXB as a central mediator, but it might overlook other potentially relevant factors contributing to cartilage degradation in hemophilic arthropathy. A more holistic exploration of genetic and molecular factors could provide a broader understanding of the condition.

      Thanks for your suggestion. Since our human sample size is relatively small, we should interpret the differentially methylated genes cautiously. Therefore, we focused mainly on the most significant gene, TNXB, for the functional study. In future studies, we will expand the sample size to explore the molecular mechanisms of HA more comprehensively.

      Recommendations For The Authors:

      The following are my suggestions:

      (1) Why do the authors choose to concentrate on the knee joint in the introduction when hemophilia, characterized by a deficiency in clotting factor F8, is recognized as a systemic disease?

      Thank you for your valuable comments. Although hemophilia is a systemic disease, approximately 80%-90% of bleeding episodes in patients with hemophilia occur within the musculoskeletal system, especially in the knee joint 12.

      (2) While Figure 1 illustrates distinct expressions of Dnmt1 and Dnmt3a, only Dnmt1 results are presented in HA mice models in Figure 3. To address this, it is suggested that the expression of Dnmt3a be explored in animal models.

      Thank you for your suggestion. According to the reviewer's suggestion, we examined the expression of Dnmt3a in mouse articular cartilage, and the expression level of Dnmt3a was significantly up-regulated in both the 4W and 8W model groups compared with the control group (Figure 3). Please see line 364.

      (3) In Figure 3, the sample size for Dnmt1 is smaller than the other indicators; therefore, supplementing the sample count is recommended.

      Thank you for the reminder. We have corrected this in the revision.

      (4) Regarding Figure 4G, a few apoptotic cells were observed in the AAV NC group. It is advised that this figure be reviewed for accuracy.

      Thanks for your suggestion. In Figure 5D, the AAV-NC group received a needle injection of AAV. Therefore, it is normal for some apoptotic cells to appear in the cartilage layer.

      (5) The authors concluded that TNXB plays a role in apoptosis and AKT signaling. Providing expression data for Caspase9 would be valuable to strengthen this assertion, as PI3K/AKT signaling directly influences its activation during apoptosis.

      Thank you for your comments. We have examined the expression of Cleaved-Caspase9 protein and found that knockdown of TNXB resulted in upregulation of Cleaved-Caspase9 protein expression, which was reversed by the addition of SC79. This result has been added to the revised Figure 6 and manuscript. Please see line 414.

      (6) Quantitative analysis of the differences between the two groups in Supplemental Figures is necessary.

      Thank you for your suggestion. We have added the quantitative analysis of the differences between the two groups in Supplemental Figures.

      (7) With three major isoforms (homologs) of AKT in mammals-AKT1, 2, and 3 - why did the authors specifically focus on AKT1?

      Thank you for your comments. Based on the results of the KEGG enrichment analysis of differentially methylated genes, we investigated the role of the PI3K/AKT pathway in apoptosis of HA chondrocytes. AKT is universally acknowledged as a core factor in the PI3K/AKT pathway that plays critical roles in various cellular activities such as cell proliferation, cell differentiation, cell apoptosis and metabolism 13,14. More notably, several studies demonstrated that, within the AKT family, AKT1 is primarily involved in the regulation of chondrocyte survival and proteoglycan synthesis 15. Therefore, we measured phosphorylation of AKT1 in HA cartilage and TNXB-KD chondrocytes, and found that TNXB regulates chondrocyte ECM and apoptosis via AKT1.

      References:

      (1) Cooke, E.J., Zhou, J.Y., Wyseure, T., Joshi, S., Bhat, V., Durden, D.L., Mosnier, L.O., and von Drygalski, A. (2018). Vascular Permeability and Remodelling Coincide with Inflammatory and Reparative Processes after Joint Bleeding in Factor VIII-Deficient Mice. Thromb Haemost 118, 1036-1047. 10.1055/s-0038-1641755.

      (2) Kleiboer, B., Layer, M.A., Cafuir, L.A., Cuker, A., Escobar, M., Eyster, M.E., Kraut, E., Leavitt, A.D., Lentz, S.R., Quon, D., et al. (2022). Postoperative bleeding complications in patients with hemophilia undergoing major orthopedic surgery: A prospective multicenter observational study. J Thromb Haemost 20, 857-865. 10.1111/jth.15654.

      (3) Weiland, T., Weiller, M., Kunstle, G., and Wendel, A. (2009). Sensitization by 5-azacytidine toward death receptor-induced hepatic apoptosis. J Pharmacol Exp Ther 328, 107-115. 10.1124/jpet.108.143560.

      (4) Anaparti, V., Agarwal, P., Smolik, I., Mookherjee, N., and El-Gabalawy, H. (2020). Whole Blood Targeted Bisulfite Sequencing and Differential Methylation in the C6ORF10 Gene of Patients with Rheumatoid Arthritis. J Rheumatol 47, 1614-1623. 10.3899/jrheum.190376.

      (5) Porter, L.F., Saptarshi, N., Fang, Y., Rathi, S., den Hollander, A.I., de Jong, E.K., Clark, S.J., Bishop, P.N., Olsen, T.W., Liloglou, T., et al. (2019). Whole-genome methylation profiling of the retinal pigment epithelium of individuals with age-related macular degeneration reveals differential methylation of the SKI, GTF2H4, and TNXB genes. Clin Epigenetics 11, 6. 10.1186/s13148-019-0608-2.

      (6) Mao, J.R., Taylor, G., Dean, W.B., Wagner, D.R., Afzal, V., Lotz, J.C., Rubin, E.M., and Bristow, J. (2002). Tenascin-X deficiency mimics Ehlers-Danlos syndrome in mice through alteration of collagen deposition. Nat Genet 30, 421-425. 10.1038/ng850.

      (7) Zhang, K., Wang, X., Zeng, L.T., Yang, X., Cheng, X.F., Tian, H.J., Chen, C., Sun, X.J., Zhao, C.Q., Ma, H., and Zhao, J. (2023). Circular RNA PDK1 targets miR-4731-5p to enhance TNXB expression in ligamentum flavum hypertrophy. FASEB J 37, e22877. 10.1096/fj.202200022RR.

      (8) Guo, H., Yin, W., Zou, Z., Zhang, C., Sun, M., Min, L., Yang, L., and Kong, L. (2021). Quercitrin alleviates cartilage extracellular matrix degradation and delays ACLT rat osteoarthritis development: An in vivo and in vitro study. J Adv Res 28, 255-267. 10.1016/j.jare.2020.06.020.

      (9) Weitzmann, M.N., Roser-Page, S., Vikulina, T., Weiss, D., Hao, L., Baldwin, W.H., Yu, K., Del Mazo Arbona, N., McGee-Lawrence, M.E., Meeks, S.L., and Kempton, C.L. (2019). Reduced bone formation in males and increased bone resorption in females drive bone loss in hemophilia A mice. Blood Adv 3, 288-300. 10.1182/bloodadvances.2018027557.

      (10) Haxaire, C., Hakobyan, N., Pannellini, T., Carballo, C., McIlwain, D., Mak, T.W., Rodeo, S., Acharya, S., Li, D., Szymonifka, J., et al. (2018). Blood-induced bone loss in murine hemophilic arthropathy is prevented by blocking the iRhom2/ADAM17/TNF-alpha pathway. Blood 132, 1064-1074. 10.1182/blood-2017-12-820571.

      (11) Vols, K.K., Kjelgaard-Hansen, M., Ley, C.D., Hansen, A.K., and Petersen, M. (2019). Bleed volume of experimental knee haemarthrosis correlates with the subsequent degree of haemophilic arthropathy. Haemophilia 25, 324-333. 10.1111/hae.13672.

      (12) Lobet, S., Peerlinck, K., Hermans, C., Van Damme, A., Staes, F., and Deschamps, K. (2020). Acquired multi-segment foot kinematics in haemophilic children, adolescents and young adults with or without haemophilic ankle arthropathy. Haemophilia 26, 701-710. 10.1111/hae.14076.

      (13) Garcia, D., and Shaw, R.J. (2017). AMPK: Mechanisms of Cellular Energy Sensing and Restoration of Metabolic Balance. Mol Cell 66, 789-800. 10.1016/j.molcel.2017.05.032.

      (14) Johnson, J., Chow, Z., Lee, E., Weiss, H.L., Evers, B.M., and Rychahou, P. (2021). Role of AMPK and Akt in triple negative breast cancer lung colonization. Neoplasia 23, 429-438. 10.1016/j.neo.2021.03.005.

      (15) Rao, Z., Wang, S., and Wang, J. (2017). Peroxiredoxin 4 inhibits IL-1beta-induced chondrocyte apoptosis via PI3K/AKT signaling. Biomed Pharmacother 90, 414-420. 10.1016/j.biopha.2017.03.075.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rühling et al analyzes the mode of entry of S. aureus into mammalian cells in culture. The authors propose a novel mechanism of rapid entry that involves the release of calcium from lysosomes via NAADP-stimulated activation of TPC1, which in turn causes lysosomal exocytosis; exocytic release of lysosomal acid sphingomyelinase (ASM) is then envisaged to convert exofacial sphingomyelin to ceramide. These events not only induce the rapid entry of the bacteria into the host cells but are also described to alter the fate of the intracellular S. aureus, facilitating escape from the endocytic vacuole to the cytosol.

      Strengths:

      The proposed mechanism is novel and could have important biological consequences.

      Weaknesses:

      Unfortunately, the evidence provided is unconvincing and insufficient to document the multiple, complex steps suggested. In fact, there appear to be numerous internal inconsistencies that detract from the validity of the conclusions, which were reached mostly based on the use of pharmacological agents of imperfect specificity.

      We thank the reviewer for the detailed evaluation of our manuscript. We will address the criticism below.

      We agree with the reviewer that many of the experiments presented in our study rely on the use of inhibitors. However, we want to emphasize that the main conclusion (the invasion pathway affects the intracellular fate/phagosomal escape) was demonstrated without the use of inhibitors or genetic ablation in two key experiments (Figure 5D/E). These experiments were in line with the results we obtained with inhibitors (amitriptyline [Figure 4D], ARC39 and PCK310 [Figure 4C], and Vacuolin-1 [Figure 4E]). Importantly, the hypothesis was also supported by another key experiment, in which we showed that the intracellular fate of bacteria is affected by removal of SM from the plasma membrane before invasion, but not by removal of SM from phagosomal membranes after bacterial internalization (Figure 5A-C). Taken together, we thus believe that the main hypothesis is strongly supported by our data.

      Moreover, we either used different inhibitors for the same molecule (ASM was inhibited by ARC39, amitriptyline and PCK310 with similar outcome) or supported our hypothesis with gene-ablated cell pools (TPC1, Syt7, SARM1), as we will point out in more detail below.

      Firstly, the release of calcium from lysosomes is not demonstrated. Localized changes in the immediate vicinity of lysosomes need to be measured to ascertain that these organelles are the source of cytosolic calcium changes. In fact, 9-phenantrol, which the authors find to be the most potent inhibitor of invasion and hence of the putative calcium changes, is not a blocker of lysosomal calcium release but instead blocks plasmalemmal TRPM4 channels. On the other hand, invasion is seemingly independent of external calcium. These findings are inconsistent with each other and point to non-specific effects of 9-phenantrol. The fact that ionomycin decreases invasion efficiency is taken as additional evidence of the importance of lysosomal calcium release. It is not clear how these observations support involvement of lysosomal calcium release and exocytosis; in fact treatment with the ionophore should itself have induced lysosomal exocytosis and stimulated, rather than inhibited invasion. Yet, manipulations that increase and others that decrease cytosolic calcium both inhibited invasion.

      With respect to lysosomal Ca²⁺ release, we agree with the reviewer that direct visual demonstration of lysosomal Ca²⁺ release upon infection will improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca²⁺ release by a previously published method [1]. The approach is based on two dextran-coupled fluorophores that are incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca²⁺-sensitive and can be used to estimate the lysosomal Ca²⁺ content. The second dye, AF647, is Ca²⁺-insensitive and is used to visualize the lysosomes. A decrease in the Rhod-2/AF647 ratio within the lysosomes indicates lysosomal Ca²⁺ release. We monitored lysosomal Ca²⁺ content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided not to include these data in the main manuscript. However, one could speculate that lysosomal Ca²⁺ content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells, as indicated by a decrease in the Rhod-2/AF647 ratio.

      Author response image 1.

      Lysosomal Ca²⁺ imaging during S. aureus infection. The lysosomes of HuLEC were stained with two dextran-coupled fluorescent dyes: the Ca²⁺-sensitive dye Rhod-2 and the Ca²⁺-insensitive dye AF647. Cells were infected with fluorescent S. aureus JE2 and monitored by live cell imaging (see Author response video 1). The intensities of Rhod-2 and AF647 were measured close to a S. aureus-host contact site, and the ratio of Rhod-2 to AF647 fluorescence intensity was calculated.
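      The ratiometric readout described above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names, the per-frame ROI intensities, and the normalization to the first frame are assumptions made only for illustration.

```python
from typing import List, Sequence


def ratio_trace(rhod2: Sequence[float], af647: Sequence[float]) -> List[float]:
    """Per-frame ratio of Ca2+-sensitive (Rhod-2) to Ca2+-insensitive (AF647) signal.

    Both inputs are hypothetical mean ROI intensities, one value per frame.
    """
    if len(rhod2) != len(af647):
        raise ValueError("both channels need the same number of frames")
    ratios = []
    for r, a in zip(rhod2, af647):
        if a <= 0:
            raise ValueError("reference (AF647) intensity must be positive")
        ratios.append(r / a)
    return ratios


def normalize_to_first_frame(ratios: Sequence[float]) -> List[float]:
    """Express each ratio relative to the first frame (assumed baseline)."""
    baseline = ratios[0]
    return [x / baseline for x in ratios]


# Hypothetical intensities: the reference channel is stable, Rhod-2 falls.
trace = normalize_to_first_frame(ratio_trace([100, 80, 50], [50, 50, 50]))
```

      In this sketch, a normalized ratio falling below 1 would correspond to a drop in lysosomal Ca²⁺ content within the ROI, provided the reference signal stays in focus.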

      As to the TRPM4 involvement in S. aureus host cell internalization, it has been reported that TRPM4 is activated by cytosolic Ca²⁺. However, the channel conducts monovalent cations such as K⁺ or Na⁺ but is impermeable to Ca²⁺ [2, 3]. The following observations support this:

      i) S. aureus invasion is dependent on intracellular Ca²⁺, but is independent of extracellular Ca²⁺ (Figure 1A).

      ii) 9-phenantrol treatment reduces S. aureus internalization by host cells, illustrating the dependence of this process on TRPM4 (data removed from the manuscript). We therefore hypothesize that TRPM4 is activated by Ca²⁺ released from lysosomes (see above).

      TRPM4 is localized to focal adhesions and is connected to the actin cytoskeleton [4, 5], which is a requisite for host cell entry of S. aureus [6, 7]. This speaks for an important function of TRPM4 in uptake of S. aureus in general, but TRPM4 does not necessarily have to be involved exclusively in the rapid uptake pathway.

      TRPM4 itself is not permeable to Ca²⁺ but is activated by the cation. Thus, it is unlikely to cause lysosomal exocytosis. The stronger reduction of bacterial uptake by treatment with 9-phenantrol, when compared to Ned19, may thus be caused by the involvement of TRPM4 in additional pathways of S. aureus host cell entry involving the association of TRPM4 with focal adhesions or, as pointed out by the reviewer, by unspecific side effects of 9-phenantrol that we currently cannot exclude. However, we think that experiments with 9-phenantrol distract from the main story (lysosomal Ca²⁺ and exocytosis) and might be confusing for the reader. We thus removed all data and discussion concerning 9-phenantrol in the revised manuscript.

      Regarding the reduced S. aureus invasion after ionomycin treatment, we agree with the reviewer that ionomycin is known to lead to lysosomal exocytosis, as was previously shown by others [8] as well as by our laboratory [9].

      We hypothesized that pretreatment with ionomycin would trigger lysosomal exocytosis and thus would reduce the pool of lysosomes that can undergo exocytosis before host cells are contacted by S. aureus. As a result, we should observe a marked reduction of S. aureus internalization in such “lysosome-depleted cells”, if the lysosomal exocytosis is coupled to bacterial uptake. Our observation of reduced bacterial internalization after ionomycin treatment supports this hypothesis.

      However, ionomycin treatment and S. aureus infection of host cells are distinct processes.  

      While ionomycin results in strong global and non-directional lysosomal exocytosis of all “releasable” lysosomes (~5-10 % of all lysosomes according to previous observations) [8], we hypothesize that lysosomal exocytosis upon contact with S. aureus only involves a small proportion of lysosomes at host-bacteria contact sites. This is supported by experiments that demonstrate that ~30% of the lysosomes that are released by ionomycin treatment are exocytosed during S. aureus infection (see below and Figure 2, A-C). We added these new data as well as a corresponding section to the discussion (line 563 ff). Moreover, we moved the data obtained with ionomycin to Figure 2E and described our idea behind this experiment more precisely (line 166 ff).

      The proposed role of NAADP is based on the effects of "knocking out" TPC1 and on the pharmacological effects of Ned-19. It is noteworthy that TPC2, rather than TPC1, is generally believed to be the primary TPC isoform of lysosomes. Moreover, the gene ablation accomplished in the TPC1 "knockouts" is only partial and rather unsatisfactory. Definitive conclusions about the role of TPC1 can only be reached with proper, full knockouts. Even the pharmacological approach is unconvincing because the high doses of Ned-19 used should have blocked both TPC isoforms and presumably precluded invasion. Instead, invasion is reduced by only ≈50%. A much greater inhibition was reported using 9-phenantrol, the blocker of plasmalemmal calcium channels. How is the selective involvement of lysosomal TPC1 channels justified?

      As to partial gene ablation of TPC1: To avoid clonal variances, we usually perform pool sorting to obtain a cell population that predominantly contains cells deficient in the target (here, TPC1), but also a small proportion of wildtype cells, as seen by the residual TPC1 protein on the Western blot. We observe a significant reduction in bacterial uptake in this cell pool, suggesting that the uptake reduction in a pure K.O. population may be even more pronounced.

      As to the inhibition by Ned19: 

      The scale of invasion reduction upon Ned19 treatment (50%, Figure 1B) is comparable with the reduction caused by other compounds that influence the ASM-dependent pathway (such as amitriptyline, ARC39 [Figure 2G], BAPTA-AM [Figure 1A], Vacuolin-1 [Figure 2D], β-toxin [Figure 2L] and ionomycin [Figure 2E]). Further, the partial reduction of invasion is most likely due to the concurrent activity of multiple internalization pathways which are not all targeted by the used compounds and which we briefly discuss in the manuscript.

      We agree with the reviewer that Ned19 inhibits TPC1 and TPC2. Since ablation of TPC1 reduced invasion of S. aureus, we concluded that TPC1 is important for S. aureus host cell invasion. We thus agree with the reviewer that a role for TPC2 cannot be excluded. We clarified this in the revised manuscript (line 552). It needs to be noted, however, that deficiency in either TPC1 or TPC2 alone was sufficient to prevent Ebola virus infection [10], which is in line with our observations.

      In order to address the role of TPC2 for this review process, we were kindly gifted TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these cell lines, supporting a role of lysosomal Ca²⁺ release in S. aureus host cell entry and a role for both TPC channels (Author response image 2, see end of the document). Since we did not have a single TPCN2 knock-out available, we decided to exclude these data from the main manuscript.

      Author response image 2.

      Invasion efficiency is reduced in TPC1/TPC2 double K.O. HeLa cells. Invasion efficiency of S. aureus JE2 was determined in TPC1/TPC2 double K.O. cells after 10 and 30 min. Results were normalized to the parental HeLa WT cell line (set to 100 %).  
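      The normalization mentioned in this caption (parental WT set to 100 %) amounts to simple arithmetic, sketched below. The function name and the example counts are hypothetical, and pairing each K.O. measurement with the WT measurement from the same experiment is an assumption, not a description of the authors' procedure.

```python
from typing import List, Sequence


def percent_of_wildtype(ko: Sequence[float], wt: Sequence[float]) -> List[float]:
    """Invasion of a K.O. line as a percentage of the parental WT line.

    ko[i] and wt[i] are hypothetical recovered-bacteria counts from the same
    experiment i, so that WT is set to 100 % within each experiment.
    """
    if len(ko) != len(wt):
        raise ValueError("need one WT value per K.O. value")
    result = []
    for k, w in zip(ko, wt):
        if w <= 0:
            raise ValueError("WT reference must be positive")
        result.append(100.0 * k / w)
    return result


# Hypothetical counts from two independent experiments.
relative_invasion = percent_of_wildtype([50, 60], [100, 120])
```

      Normalizing within each experiment, rather than to a pooled WT mean, removes day-to-day variation in absolute invasion, which is the usual reason for reporting data relative to a control set to 100 %.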

      Invoking an elevation of NAADP as the mediator of calcium release requires measurements of the changes in NAADP concentration in response to the bacteria. This was not performed. Instead, the authors analyzed the possible contribution of putative NAADP-generating systems and reported that the most active of these, CD38, was without effect, while the elimination of SARM1, another potential source of NAADP, had a very modest (≈20%) inhibitory effect that may have been due to clonal variation, which was not ruled out. In view of these data, the conclusion that NAADP is involved in the invasion process seems unwarranted.

      Our results from two independent experimental set-ups (Ned19 [Figure 1B] and TPC1 K.O. [Figure 1C & Figure 2N]) indicate the involvement of NAADP in the process. Together with the metabolomics unit at the Biocenter Würzburg, we attempted to measure cellular NAADP levels; however, this proved to be non-trivial and requires further optimization. We can, however, rule out clonal variation in the SARM1 mutant, since experiments were conducted with a cell pool, as described above, precisely to avoid the clonal variation of single clones.

      The mechanism behind biosynthesis of NAADP is still debated. CD38 was the first enzyme discovered to possess the ability to produce NAADP. However, it requires acidic pH to produce NAADP [11], which does not match the characteristics of a cytosolic NAADP producer. HeLa cells do not express CD38 and hence, it is not surprising that inhibition of CD38 had no effect on S. aureus invasion in HeLa cells. However, NAADP production by HeLa cells was observed in the absence of CD38 [12]. Thus, CD38-independent NAADP generation is likely. SARM1 can produce NAADP at neutral pH [13] and is expressed in HeLa cells, thus providing a more promising candidate.

      We agree with the reviewer that the reduction of S. aureus internalization after ablation of SARM1 is less pronounced than in our other experiments. This may be explained by NAADP originating from other enzymes, such as the recently discovered DUOX1, DUOX2, NOX1 and NOX2 [14], which, with the exception of DUOX2, show low expression even in HeLa cells. We added this to the discussion in the revised manuscript (line 579).

      We can, however, rule out clonal variation for the inhibitory effect. As stated above we generated K.O. cell pools specifically to avoid inherent problems of clonality. Thus, we also detect some residual wildtype cells within our cell pools.  

      The involvement of lysosomal secretion is, again, predicated largely on the basis of pharmacological evidence. No direct evidence is provided for the insertion of lysosomal components into the plasma membrane, or for the release of lysosomal contents to the medium. Instead, inhibition of lysosomal exocytosis by vacuolin-1 is the sole source of evidence. However, vacuolin-1 is by no means a specific inhibitor of lysosomal secretion: it is now known to act primarily as a PIKfyve inhibitor and to cause massive distortion of the endocytic compartment, including gross swelling of endolysosomes. The modest (20-25%) inhibition observed when using synaptotagmin 7 knockout cells is similarly not convincing proof of the requirement for lysosomal secretion.

      We agree with the reviewer that the manuscript will benefit from a functional analysis of lysosomal exocytosis and therefore conducted assays to investigate exocytosis in the revised manuscript. We previously i) showed, by addition of specific antisera, that LAMP1 is transiently exposed on the plasma membrane during ionomycin and pore-forming toxin challenge and ii) demonstrated the release of ASM activity into the culture medium under these conditions [9]. However, neither measurement is compatible with S. aureus infection, since LAMP1 antibodies are also non-specifically bound by protein A and other IgG-binding proteins on the S. aureus surface, which would bias the results. Since protein A may also serve as an adhesin in the investigated pathway, we cannot simply delete the ORF without changing other aspects of staphylococcal virulence. Further, FBS contains an ASM background activity that impedes activity measurements of cell culture medium. We previously removed this background activity by a specific heat-inactivation protocol [9]. However, S. aureus invasion is strongly reduced in culture medium containing this heat-inactivated FBS.

      We therefore developed a luminescence assay based on split NanoLuc luciferase that enables detection of LAMP1 exposed on the plasma membrane without usage of antibodies (Figure 2, A-C). We added a section on the assay in the revised manuscript. Briefly, we generated reporter cells by fusing a short peptide fragment of NanoLuc called HiBiT between the signal peptide and the mature luminal domain of LAMP1 and stably expressed the resulting protein in HeLa cells by lentiviral transduction. The LgBiT protein domain of NanoLuc luciferase (Promega) as well as the substrate Furimazine are added to the culture medium. HiBiT can reconstitute a functional NanoLuc with LgBiT and process Furimazine when lysosomes are exocytosed thereby generating luminescence measurable in a suitable plate reader. 

      With this assay we detected that about 30% of the lysosomes that were “releasable” by treatment with ionomycin are exocytosed during S. aureus infection. Lysosomal exocytosis was strongly reduced (even below the levels of untreated controls) if we treated cells with Vacuolin-1 or Ned19.
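      One way to express a luminescence reading as a percentage of the ionomycin-releasable pool is sketched below. The scaling (untreated control as 0 %, ionomycin as 100 %) is an assumption made for illustration, and the function name and luminescence values are hypothetical; the manuscript does not state the exact calculation here.

```python
def percent_of_releasable(signal: float, untreated: float, ionomycin: float) -> float:
    """Scale a luminescence reading between untreated (0 %) and ionomycin (100 %).

    All arguments are hypothetical background-corrected luminescence values
    from the surface-exposed HiBiT-LAMP1 reporter.
    """
    window = ionomycin - untreated
    if window <= 0:
        raise ValueError("ionomycin signal must exceed the untreated control")
    return 100.0 * (signal - untreated) / window


# Hypothetical readings: infection releases part of the ionomycin-releasable pool.
infected = percent_of_releasable(signal=1300.0, untreated=1000.0, ionomycin=2000.0)
# A reading below the untreated control yields a negative percentage.
inhibited = percent_of_releasable(signal=900.0, untreated=1000.0, ionomycin=2000.0)
```

      With this scaling, values below zero correspond to exocytosis below the level of untreated controls, as described above for Vacuolin-1 and Ned19.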

      We agree with the reviewer that Vacuolin-1 to some extent has unspecific side effects as has been shown by others and which we addressed in the revised version of the manuscript (line 541 ff). However, our new results with the HiBiT reporter cell line clearly demonstrate a reduction of lysosomal exocytosis after Vacuolin-1 treatment. Supported by this and our other results we hypothesize that Vacuolin-1 decreases S. aureus internalization due to the inhibition of lysosomal exocytosis.

      As to the involvement of synaptotagmin 7: The effect of Syt7 K.O. on invasion was moderate in initial experiments, likely due to a high culture passage and presumably overgrowth of WT cells. However, reduction of invasion in Syt7 K.O.s was more pronounced in experiments with β-toxin complementation (Figure 2, N) and hence, we combined the two data sets (Figure 2, F). This demonstrates the reduction of bacterial invasion by ~40% in Syt7 K.O. cell pools. Moreover, Syt7 is not the only protein possibly involved in Ca²⁺-dependent exocytosis. For instance, Syt1 has been shown to possess an overlapping function [15]. This may explain the differences between our Vacuolin-1 and Syt7 ablation experiments. We added this information to the discussion.

      ASM is proposed to play a central role in the rapid invasion process. As above, most of the evidence offered in this regard is pharmacological and often inconsistent between inhibitors or among cell types. Some drugs affect some of the cells, but not others. It is difficult to reach general conclusions regarding the role of ASM. The argument is made even more complex by the authors' use of exogenous sphingomyelinase (beta-toxin). Pretreatment with the toxin decreased invasion efficiency, a seemingly paradoxical result. Incidentally, the effectiveness of the added toxin is never quantified/validated by directly measuring the generation of ceramide or the disappearance of SM.

      Although pharmacological inhibitors can have unspecific side effects, we want to emphasize that the inhibitors used in our study act on the enzyme ASM by completely different mechanisms. Amitriptyline is a so-called functional inhibitor of ASM (FIASMA), which induces the detachment of ASM from lysosomal membranes, resulting in degradation of the enzyme [16]. By contrast, ARC39 is a competitive inhibitor [17, 18].

      There are no inconsistencies in our data obtained with ASM inhibitors. Amitriptyline and ARC39 both reduce the invasion of S. aureus in HuLEC, HuVEC and HeLa cells (Figure 2G). ARC39 needs a longer pre-incubation, since its uptake by host cells is slower (to be published elsewhere). We observe a different outcome in 16HBE14o- and Ea.Hy 926 cells, with 16HBE14o- even demonstrating a slightly increased invasion of S. aureus upon ARC39 treatment. Amitriptyline had no effect (Figure 2G). 

      Thus, the ASM-dependent S. aureus internalization is cell type/line specific, which we state in the manuscript. The molecular origin of these differences is unclear and will require further investigation, e.g. in testing cell lines for potential differences in surface receptors. In a separate study we have already developed a biotinylation-based approach to identify potential novel host cell surface interaction partners during S. aureus infection.[19]

      Moreover, both inhibitors affected the invasion dynamics (Figure 3D), phagosomal escape (Figure 4C and Figure 4D) and Rab7 recruitment (Figure 4A and Supp. Figure 4A-C) in a similar fashion. Proper inhibition of ASM by both compounds in all cell lines used was validated by enzyme assays (Supp. Figure 2H), which again suggests that the ASM-dependent pathway exists only in specific cell lines and also supports the conclusion that we are not observing unspecific side effects of the compounds. We clarified this in the revised manuscript.

      ASM is a key player in SM degradation and recycling. In a clinical context, deficiency in ASM results in the so-called Niemann Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered [20], which will result in severe side effects. Short-term inhibition by small molecules therefore poses a clear benefit when compared to the use of ASM K.O. cells. In order to satisfy the query of the reviewer, we generated two ASM K.O. cell pools (generated with two different sgRNAs) and tested these for S. aureus invasion efficiency (Figure 2, I). We did not observe differences in bacterial invasion between WT and K.O. cells. However, when we additionally treated the cells with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. cells was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in inhibitor-treated WT cells is predominantly due to the absence of ASM activity, while the small reduction observed in ARC39-treated ASM K.O.s is likely due to unspecific side effects.

      We performed lipidomics on these cells and demonstrated a strongly altered sphingolipid profile in ASM K.O. cells compared to untreated and inhibitor-treated WT cells (Figure 2, K). We speculate that other ASM-independent bacterial invasion pathways are upregulated in ASM K.O.s, thereby obscuring the effect contributed by the absence of ASM. We discussed this in the revised manuscript (line 518 ff).

      Moreover, we introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I. The latter strain is non-cytotoxic and serves as a negative control, since it is known to possess a very low escape rate due to its inability to produce toxin. Again, we compared early invaders (infection for 10 min) with early+late invaders (infection for 30 min). As observed for JE2, “early invaders” possess lower escape rates than “early+late invaders”.

      We did not observe differences between WT and ASM K.O. cells if we infected for only 10 min. By contrast, we observed a lower escape rate in ASM K.O. cells compared to WT cells when we infected for 30 min (Author response image 3, see end of the document).

      However, we usually observe an increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). Reduced phagosomal escape of intracellular S. aureus in ASM K.O. cells may be caused by the altered sphingolipid profile (e.g., by interference with binding of bacterial toxins to phagosomal membranes or altered vesicular acidification). We hence think that these data are difficult to interpret, and clarification would require intense additional experimentation. Thus, we did not include these data in the manuscript.

      Author response image 3.

      Phagosomal escape rates were established in either HeLa wild-type or ASM K.O. cells expressing the phagosomal escape reporter RFP-CWT. Host cells were infected with the cytotoxic S. aureus strain JE2 or the non-cytotoxic strain Cowan I for 10 or 30 minutes, and escape rates were determined by microscopy 3 h p.i.

      As to the treatment with a bacterial sphingomyelinase:

      Treatment with the bacterial SMase (bSMase, here: β-toxin) was performed in two different ways:

      i) Pretreatment of host cells with β-toxin to remove SM from the host cell surface before infection. This removes the substrate of ASM from the cell surface prior to addition of the bacteria (Figure 2L, Figure 4A-C). Since SM is not present on the extracellular plasma membrane leaflet after treatment, a release of ASM cannot cause localized ceramide formation at the sites of lysosomal exocytosis. Similar observations were made by others.[21] 

      ii) Addition of bSMase to host cells together with the bacteria to complement for the absence of ASM (Figure 2N).  

      Removal of the ASM substrate before infection (i) prevents localized ASM-mediated conversion of SM to Cer during infection and resulted in decreased invasion, while addition of the SMase during infection (ii) resulted in increased invasion in TPC1- and Syt7-ablated cells. Thus, both experiments are consistent with each other and in line with our other observations.

      Removal of SM from the plasma membrane by β-toxin was indirectly demonstrated by the absence of Lysenin recruitment to phagosomes/escaped bacteria when host cells were pretreated with the toxin before infection (Figure 5C). We also added another data set that demonstrates degradation of a fluorescent SM derivative upon β-toxin treatment of host cells (Supp. Figure 2, M). In another publication, we recently quantified the effectiveness of β-toxin treatment, albeit with slightly longer treatment times (75 min vs. 3 h).[22]

      To clarify our experimental approaches for the readership, we added an explanatory section to the revised manuscript (line 287 ff) and a scheme in Figure 2M describing the experimental settings.

      As to the general conclusions regarding the role of ASM: ASM and lysosomal exocytosis have been shown to be involved in the uptake of a variety of pathogens[21, 23-27], supporting their role in the process.

      The use of fluorescent analogs of sphingomyelin and ceramide is not well justified and it is unclear what conclusions can be derived from these observations. Despite the low resolution of the images provided, it appears as if the labeled lipids are largely in endomembrane compartments, where they would presumably be inaccessible to the secreted ASM. Moreover, considering the location of the BODIPY probe, the authors would be unable to distinguish intact sphingomyelin from its breakdown product, ceramide. What can be concluded from these experiments? Incidentally, the authors report only 10% of BODIPY-positive events after 10 min. What are the implications of this finding? That 90% of the invasion events are unrelated to sphingomyelin, ASM, and ceramide?

      During the experiments with fluorescent SM analogues (Figure 3a,b), S. aureus was added to the samples immediately before the start of video recording. Hence, bacteria slowly trickle onto the host cells, and we can image the initial contact between host cells and bacteria. For instance, the bacteria depicted in Figure 3A contact the host cell about 9 min before becoming BODIPY-FL-positive (see Supp. Video 1, 55 min). In these cases, we therefore see the formation of phagosomes around bacteria rather than bacteria in endomembrane compartments. Since generation of phagosomes happens at the plasma membrane, SM is accessible to secreted ASM.

      The “trickling” approach for infection differs from our invasion measurements, in which we synchronized the infection by centrifugation. Synchronization ensures that all bacteria are in contact with host cells and are not just floating in the culture medium. However, live cell imaging of the initial bacterial-host contact and synchronization of infection are technically hard to combine.

      In our invasion measurements (with synchronization), we typically see internalization of ~20% of all added bacteria after 30 min. Hence, most bacteria that are visible in our videos are likely still extracellular, and only a small proportion was internalized. This explains why only 10% of total bacteria are positive for BODIPY-FL-SM after 10 min. The proportion of internalized bacteria that are positive for BODIPY-FL-SM should be considerably higher but cannot be determined with this method.

      We agree with the reviewer that we cannot observe conversion of BODIPY-FL-SM by ASM. To address this, we attempted to visualize the conversion of a visible-range SM FRET probe (Supp. Figure 3), but the structure of the probe is not compatible with measurement of conversion on the plasma membrane, since the FITC fluorophore is released into the culture medium by ASM activity and is thereby lost for imaging. In general, the visualization of SM conversion with subcellular resolution is challenging, and even with novel tools developed in our lab[28], visualization of SM on the plasma membrane is difficult.

      The conclusions we draw from these experiments are that i) S. aureus invasion is associated with SM and ii) SM-associated invasion can be very fast, since bacteria are rapidly engulfed by BODIPY-FL-SM-containing membranes.

      It is also unclear how the authors can distinguish lysenin entry into ruptured vacuoles from the entry of RFP-CWT, used as a criterion of bacterial escape. Surely the molecular weights of the probes are not sufficiently different to prevent the latter one from traversing the permeabilized membrane until such time that the bacteria escape from the vacuole.

      We here want to clarify that both Lysenin as well as the CWT reporter have access to ruptured vacuoles (Figure 4B). We used the Lysenin reporter in these experiments for estimation of SM content of phagosomal membranes. If a vacuole is ruptured, both the bacteria and the luminal leaflet of the phagosomal membrane remnants get in contact with the cytosol and hence with the cytosolically expressed reporters YFP-Lysenin as well as RFP-CWT resulting in “Lysenin-positive escape” when phagosomes contained SM (see Figure 5C). By contrast, either β-toxin expression by S. aureus or pretreatment with the bSMase resulted in absence of Lysenin recruitment suggesting that the phagosomal SM levels were decreased/undetectable (Figure 5C, Supp Figure 6F, G, I, J).

      Although this approach does not enable a quantitative measurement of phagosomal SM, this method is sufficient to show that β-toxin expression and pretreatment result in markedly decreased phagosomal SM levels in the host cells.

      The approach we used here to analyze “Lysenin-positive escape” can clearly be distinguished from Lysenin-based methods that were used by others.[29] There, Lysenin was used to show trans-bilayer movement of SM before rupture of bacteria-containing phagosomes.

      To clarify the function of Lysenin in our approach we added additional figures (Figure 4F, Supp. Figure 5) and a movie (Supp. Video 4) to the revised manuscript.

      Both SMase inhibitors (Figure 4C) and SMase pretreatment increased bacterial escape from the vacuole. The former should prevent SM hydrolysis and formation of ceramide, while the latter treatment should have the exact opposite effects, yet the end result is the same. What can one conclude regarding the need and role of the SMase products in the escape process?

      As pointed out above, pretreatment of host cells with SMase removes SM from the plasma membrane, so that ASM does not have access to its substrate. Hence, treatment with ASM inhibitors and pretreatment with bacterial SMase both prevent ASM from being active on the plasma membrane and thereby block the ASM-dependent uptake (Figure 2 G, L). Although overall fewer bacteria were internalized by host cells under these conditions, the bacteria that invaded host cells did so in an ASM-independent manner.

      Since blockage of the ASM-dependent internalization pathway (with ASM inhibitor [Figure 4C, D], SMase pretreatment [Figure 5B] and Vacuolin-1 [Figure 4E]) always resulted in enhanced phagosomal escape, we conclude that bacteria that were internalized in an ASM-independent fashion show enhanced escape. Vice versa, bacteria that enter host cells in an ASM-dependent manner demonstrate lower escape rates.

      This is supported by comparing the escape rates of “early” and “late” invaders [Figure 5D, E], which in our opinion is a key experiment that supports this hypothesis. The “early” invaders are predominantly ASM-dependent (see e.g. Figure 3E) and thus, bacteria that entered host cells in the first 10 min of infection should have been internalized predominantly in an ASM-dependent fashion, while slower entry pathways are active later during infection. The early ASM-dependent invaders possessed lower escape rates, which is in line with the data obtained with inhibitors (e.g. Figure 4C, D).

      We hypothesize that the activity of ASM on the plasma membrane during invasion mediates the recruitment of a specific subset of receptors, which then influences downstream phagosomal maturation and escape. This hypothesis is supported by the fact that the subset of receptors interacting with S. aureus is altered upon inhibition of the ASM-dependent uptake pathway. We describe this in another study that is currently under evaluation elsewhere.  

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Ruhling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca<sup>2+</sup> and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry.

      The evidence provided is solid, methods used are appropriate and results largely support their conclusions, but can be substantiated further as detailed below. The weakness is a reliance on chemical inhibitors that can be non-specific to delineate critical steps.

      Specific comments:

      A large number of experiments rely on treatment with chemical inhibitors. While this approach is reasonable, many of the inhibitors employed such as amitriptyline and vacuolin1 have other or nondefined cellular targets and pleiotropic effects cannot be ruled out. Given the centrality of ASM for the manuscript, it will be important to replicate some key results with ASM KO cells.

      We thank the reviewer for the critical evaluation of our manuscript and plenty of constructive comments. 

      We agree with the reviewer that ASM inhibitors such as the functional inhibitors of ASM (FIASMAs), like amitriptyline used in our study, have unspecific side effects given their mode of action. FIASMAs induce the detachment of ASM from lysosomal membranes, resulting in degradation of the enzyme.[16] However, we want to emphasize that we also used the competitive inhibitor ARC39 in our study[17, 18], which acts on the enzyme by a completely different mechanism. All phenotypes (reduced invasion [Figure 2G], effect on invasion dynamics [Figure 3D], enhanced escape [Figure 4C, D] and differential recruitment of Rab7 [Supp. Figure 4A-C]) were observed with both inhibitors, thereby supporting the role of ASM in the process.

      We further agree that experiments with genetic evidence usually support and improve scientific findings. However, ASM is a key cellular player in SM degradation and recycling. In a clinical context, deficiency in ASM results in Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered[20], which in itself will result in severe side effects. Thus, the use of inhibitors provides a clear benefit when compared to ASM K.O. cells, since ASM activity can be targeted in a short-term fashion, thereby preventing larger alterations in cellular lipid composition.

      We nevertheless generated two ASM K.O. cell pools (with two different sgRNAs) and tested for invasion efficiency (Figure 2, I). Here, we did not observe differences between WT and mutants. However, if we additionally treated the cells with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. cells was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in WT cells upon inhibitor treatment is predominantly due to inhibition of ASM, whereas the small reduction observed in ARC39-treated ASM K.O. cells is likely due to unspecific side effects. We also demonstrated a strongly altered sphingolipid profile in ASM K.O. cells when compared to untreated and inhibitor-treated WT cells (new Figure 2, K). We speculate that other, ASM-independent invasion pathways are upregulated in ASM K.O. cells, thereby compensating for the absence of ASM. We discuss this in the revised manuscript (line 518 ff).

      We introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I (Author response image 3). The latter serves as a negative control, since it is known to possess a very low escape rate due to its inability to produce toxin. Again, we compared early invaders (infection for 10 min) with early+late invaders (infection for 30 min). As seen before for JE2, early invaders possess lower escape rates than early+late invaders. We did not observe differences between WT and K.O. cells if we infected for 10 min. By contrast, we observed a lower escape rate in ASM K.O. cells compared to WT cells when we infected for 30 min. However, we usually observe increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). We think that the reduced phagosomal escape in ASM K.O. cells is caused by the altered sphingolipid profile, which could have diverse effects (e.g., interference with binding of bacterial toxins to phagosomal membranes or changes in acidification). We hence think that these data are difficult to interpret, and clarification would require extensive additional experimentation. Thus, we did not include these data in the manuscript.

      Most experiments are done in HeLa cells. Given the pathway is projected as generic, it will be important to further characterize cell type specificity for the process. Some evidence for a similar mechanism in other cell types S. aureus infects, perhaps phagocytic cell type, might be good. 

      Whenever possible we performed the experiments not only in HeLa but also in HuLECs. For example, we refer to experiments concerning the role of Ca<sup>2+</sup> (Figure 1A/Supp. Figure 1A), lysosomal Ca<sup>2+</sup>/Ned19 (Figure 1B/Supp. Figure 1C), lysosomal exocytosis/Vacuolin-1 (Figure 2D/Supp. Figure 2D), ASM/ARC39 and amitriptyline (Figure 2G), surface SM/β-toxin (Figure 2L/Supp. Figure 2L), analysis of invasion dynamics (complete Figure 3) and measurement of cell death during infection (Figure 6C+E, Supp. Figure 8A+B).

      HuLECs, however, are not readily amenable to genetic manipulation. Hence, we were not able to generate gene deletions in these cells, and upon introduction of the fluorescent escape reporter the cells do not grow readily.

      As to ASM involvement in phagocytic cells: a role for ASM during the uptake of S. aureus by macrophages was previously reported by others.[25] However, in professional phagocytes S. aureus does not escape from the phagosome and replicates within the phagosome.[30]

      I'm a little confused about the role of ASM on the surface. Presumably, it converts SM to ceramide, as the final model suggests. Overexpression of b-toxin results in the near complete absence of SM on phagosomes (having representative images will help appreciate this), but why is phagosomal SM detected at high levels in untreated conditions? If bacteria are engulfed by SM-containing membrane compartments, what role does ASM play on the surface? If surface SM is necessary for phagosomal escape within the cell, do the authors imply that ASM is tuning the surface SM levels to a certain optimal range? Alternatively, can there be additional roles for ASM on the cell surface? Can surface SM levels be visualized (for example, in Figure 4 E, F)?

      We initially hypothesized that we would detect higher phagosomal SM levels upon inhibition of ASM, since our model suggests SM cleavage by ASM on the host cell surface during bacterial cell entry. However, we did not detect any changes in our experiments (Supp. Figure 4F). We currently favor the following explanation: SM is the most abundant sphingolipid in human cells.[31] If peripheral lysosomes are exocytosed and thereby release ASM, only a localized and relatively small proportion of SM may get converted to Cer, which is most likely below our detection limit. In addition, the detection of cytosolically exposed phagosomal SM by YFP-Lysenin is not quantitative and provides a “Yes or No” measurement. Hence, we think that the rather limited SM to Cer conversion in combination with the high abundance of SM in cellular membranes does not visibly affect the recruitment of the Lysenin reporter.

      In our experiments that employ BODIPY-FL-SM (Figure 3a+b), we cannot distinguish between native SM and downstream metabolites such as Cer. Hence, again we cannot make any assumptions on the extent to which SM is converted on the surface during bacterial internalization. Although our laboratory recently used trifunctional sphingolipid analogs to analyze the SM to Cer conversion[22], the visualization of this process on the plasma membrane is currently still challenging.

      Overall, we hypothesize that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms. Subsequently, a certain subset of receptors may be recruited to these platforms and influence the uptake process. These platforms are presumably very small, which would also explain why we did not detect changes in Lysenin recruitment.

      Related to that, why is ASM activity on the cell surface important? Its role in non-infectious or other contexts can be discussed.

      ASM release by lysosomal exocytosis is implicated in plasma membrane repair upon injury. We added a short description of the role of extracellular ASM in the introduction (line 35).

      If SM removal is so crucial for uptake, can exocytosis of lysosomes alone provide sufficient ASM for SM removal? How much or to what extent is lysosomal exocytosis enhanced by initial signaling events? Do the authors envisage the early events in their model happening in localized confines of the PM, this can be discussed.

      Ionomycin treatment led to a release of ~10% of all lysosomes and also increased extracellular ASM activity.[8, 9] In the revised manuscript, we developed an assay to determine lysosomal exocytosis during S. aureus infection (Figure 2, A-C). During infection, we detected lysosomal exocytosis amounting to ~30% of that induced by ionomycin treatment. Since this is only a fraction of the “releasable lysosomes”, we assume that the effects (lysosomal Ca<sup>2+</sup> liberation, lysosomal exocytosis and ASM activity) are very localized and take place only at host-pathogen contact sites (see also above). We discuss this in the revised manuscript (line 563 ff). To our knowledge it is currently unclear to what extent the released ASM affects surface SM levels. We attempted to visualize the local ASM activity on the cell surface by using a visible-range FRET probe (Supp. Fig. 3). Cleavage of the probe by ASM on the surface leads to release of FITC into the cell culture medium, which does not contribute a measurable signal at the surface.
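      The two percentages above can be combined into a rough estimate of the overall fraction of lysosomes released during infection. The following is only an illustrative sketch: it assumes that the infection value (~30%) is expressed relative to the ionomycin response and that the ionomycin response corresponds to ~10% of all lysosomes. The function name is hypothetical and not part of the reported analysis.

```python
def fraction_of_all_lysosomes(relative_to_ionomycin: float,
                              ionomycin_fraction: float) -> float:
    """Convert exocytosis measured relative to ionomycin into a
    fraction of all lysosomes (assumes simple proportional scaling)."""
    if relative_to_ionomycin < 0.0:
        raise ValueError("relative_to_ionomycin must be non-negative")
    if not 0.0 <= ionomycin_fraction <= 1.0:
        raise ValueError("ionomycin_fraction must lie between 0 and 1")
    return relative_to_ionomycin * ionomycin_fraction


# Values taken from the text: ~30% of the ionomycin response,
# where ionomycin releases ~10% of all lysosomes.
estimate = fraction_of_all_lysosomes(0.30, 0.10)
print(f"Estimated fraction of all lysosomes released: {estimate:.1%}")
```

      Under these assumptions, only a few percent of all lysosomes would be exocytosed during infection, which is compatible with a strongly localized process.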

      How are inhibitor doses determined? How efficient is the removal of extracellular bacteria at 10 min? It will be good to substantiate the cfu experiments for infectivity with imaging-based methods. Are the roles of TPC1 and TPC2 redundant? If so, why does silencing TPC1 alone result in a decrease in infectivity? For these and other assays, it would be better to show raw values for infectivity. Please show alterations in lysosomal Ca<sup>2+</sup> at the doses of inhibitors indicated. Is lysosomal Ca<sup>2+</sup> released upon S. aureus binding to the cell surface? Will be good to directly visualize this.

      Concerning the inhibitor concentrations, we either used values established in published studies or recommendations of the suppliers (e.g. 2-APB, Ned19, Vacuolin-1). For ASM inhibitors, we determined proper inhibition of ASM by activity assays. Concentrations of ionomycin resulting in Ca<sup>2+</sup> influx and lysosomal exocytosis were determined in earlier studies of our lab.[9, 32]

      As to the removal of bacteria at 10 min p.i.: Lysostaphin is very efficient for removal of extracellular S. aureus and sterilizes the tissue culture supernatant. It significantly lyses bacteria within a few minutes, as determined by turbidity assays.[33]

      As to imaging-based infectivity assays: We performed imaging-based invasion assays to show reduced invasion efficiency with two ASM inhibitors in the revised manuscript with similar results as obtained by CFU counts (Supp. Figure 2, J).

      Regarding the roles of TPC1 and TPC2: from our data we cannot conclude whether the roles of TPC1 and TPC2 are redundant. Since blockage of TPC1 alone is sufficient to reduce internalization of bacteria, one could speculate that both channels have distinct roles. On the other hand, there might be a Ca<sup>2+</sup> threshold for initiating lysosomal exocytosis that can only be attained if TPC1 and TPC2 are activated in parallel. Our observations are in line with another study that shows reduced Ebola virus infection in the absence of either TPC1 or TPC2.[34] In order to address the role of TPC2 for this review process, we were kindly gifted TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these double K.O. cell lines, further supporting a role of lysosomal Ca<sup>2+</sup> release in S. aureus host cell entry (Author response image 2, see end of the document). Since we did not have a single TPCN2 knockout available, we decided to exclude these data from the main manuscript.

      As to raw CFU counts: whereas the observed effects upon blocking the invasion of S. aureus are stable, the number of internalized bacteria varies between individual biological replicates, for instance, by differences in host cell fitness or growth differences in bacterial cultures, which are prepared freshly for each experiment.

      With respect to visualization of lysosomal Ca<sup>2+</sup> release: we agree with the reviewer that direct visual demonstration of lysosomal Ca<sup>2+</sup> release upon infection would improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca<sup>2+</sup> release by a previously published method.[1] The approach is based on two dextran-coupled fluorophores that were incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca<sup>2+</sup>-sensitive and can be used to estimate the lysosomal Ca<sup>2+</sup> content. The second dye, AF647, is Ca<sup>2+</sup>-insensitive and is used to visualize the lysosomes. If the ratio Rhod-2/AF647 within the lysosomes is decreasing, lysosomal Ca<sup>2+</sup> release is indicated. We monitored lysosomal Ca<sup>2+</sup> content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided to not include these data in the final manuscript. However, one could speculate that lysosomal Ca<sup>2+</sup> content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells as indicated by a decrease in Rhod-2/AF647 ratio.
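      The ratiometric readout described above can be sketched in a few lines. This is a minimal illustration rather than the analysis pipeline used for the response images: the function name, the baseline normalization and the example intensities are all hypothetical, and real data would additionally require background subtraction and ROI tracking.

```python
from statistics import mean


def normalized_ratio(rhod2, af647, baseline_frames=3):
    """Return the Rhod-2/AF647 ratio per frame, normalized to the mean
    ratio of the first `baseline_frames` frames. Values below 1 indicate
    a drop in the Ca2+-sensitive signal relative to the reference dye."""
    if len(rhod2) != len(af647):
        raise ValueError("both traces must have the same number of frames")
    if not 1 <= baseline_frames <= len(rhod2):
        raise ValueError("baseline_frames must be between 1 and the trace length")
    if any(a <= 0 for a in af647):
        raise ValueError("AF647 intensities must be positive")
    ratios = [r / a for r, a in zip(rhod2, af647)]
    baseline = mean(ratios[:baseline_frames])
    if baseline == 0:
        raise ValueError("baseline ratio is zero; cannot normalize")
    return [x / baseline for x in ratios]


# Hypothetical mean ROI intensities (arbitrary units) over five frames.
rhod2_trace = [200, 200, 200, 150, 100]   # Ca2+-sensitive dye
af647_trace = [100, 100, 100, 100, 100]   # Ca2+-insensitive reference dye
print(normalized_ratio(rhod2_trace, af647_trace))
```

      In this made-up trace the normalized ratio falls in the last two frames, which is the pattern that would be read as lysosomal Ca<sup>2+</sup> release. Dividing by the Ca<sup>2+</sup>-insensitive dye corrects for changes in dye amount or focus, but not for lysosomes moving out of the ROI.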

      The precise identification of cytosolic vs phagosomal bacteria is not very easy to appreciate. The methods section indicates how this distinction is made, but how do the authors deal with partial overlaps and ambiguities generally associated with such analyses? Please show respective images.

      The number of events (individual bacteria) for the live cell imaging data should be clearly mentioned.

      We apologize for not having sufficiently explained the technology to detect escaped S. aureus. The cytosolic location of S. aureus is indicated by recruitment of RFP-CWT.[35] CWT is the cell wall targeting domain of lysostaphin, which efficiently binds to the pentaglycine cross bridge in the peptidoglycan of S. aureus. This reporter is exclusively and homogeneously expressed in the host cytosol. Only upon rupture of phagoendosomal membranes can the reporter be recruited to the cell wall of the now cytosolically located bacteria. S. aureus mutants, for instance in the agr quorum sensing system, cannot break down the phagosomal membrane in non-professional phagocytes and thus stay unlabeled by the CWT reporter.[35] We include several images (Figure 4, F, Supp. Figure 5) and movies (Supp. Video 4) of escape events in the revised manuscript. The bacteria numbers for live cell experiments are now shown in Supp. Figure 7.

      In the phagosome maturation experiments, what is the proportion of bacteria in Rab5 or Rab7 compartments at each time point? Will the decreased Rab7 association be accompanied by increased Rab5? Showing raw values and images will help appreciate such differences. Given the expertise and tools available in live cell imaging, can the authors trace Rab5 and Rab7 positive compartment times for the same bacteria?

      We included the proportion of Rab7-associated bacteria in the revised manuscript (Supp. Figure 4A and C) and also shortly mention these proportions in the text (line 353). Usually, we observe that Rab5 is only transiently (for a few minutes) present on phagosomes and only afterwards the phagosomes become positive for Rab7. We do not think that a decrease in Rab7-positive phagosomes would increase the proportion of Rab5-positive phagosomes. However, we cannot exclude this hypothesis with our data.

      We can trace the recruitment of Rab5/Rab7 to individual bacteria only manually, which impedes a quantitative evaluation. However, we included a video (Supp. Video 3) that illustrates the consecutive recruitment of the GTPases.

      The results with longer-term infection are interesting. Live cell imaging suggests that ASM-inhibited cells show accelerated phagosomal escape that reduces by 6 hpi. Where are the bacteria at this time point ? Presumably, they should have reached lysosomes. The relationship between cytosolic escape, replication, and host cell death is interesting, but the evidence, as presented is correlative for the populations. Given the use of live cell imaging, can the authors show these events in the same cell?

      We think that most bacteria-containing phagoendosomes should have fused with lysosomes 6 h p.i. as we have previously shown by acidification to pH of 5 and LAMP1 decoration.[36]

      The correlation between phagosomal escape and replication in the cytosol of non-professional phagocytes has been observed by us and others. In the revised manuscript we also provide images (Supp. Figure 5)/videos (Supp. Video 4) to show this correlation in our experiments.

      Given the inherent heterogeneity in uptake processes and the use of inhibitors in most experiments, the distinction between ASM-dependent and independent pathways might not be as clear-cut as the authors suggest. Some caution here will be good. Can the authors estimate what fraction of intracellular bacteria are taken up ASM-dependent?

      We agree with the reviewer that an overlap between internalization pathways is likely. A clear distinction is therefore certainly non-trivial. Alternative to ASM-dependent and ASM-independent pathways, the ASM activity may also accelerate one or several internalization pathways. We address this limitation in the discussion of the revised manuscript (line 596 ff).

      Early in infection (~10 min after contact with the cells), the proportion of bacteria that enter host cells ASM-dependently is relatively high amounting to roughly 75-80% in HuLEC. After 30 min, this proportion is decreasing to about 50%. We included a paragraph in the discussion of the revised manuscript (line 593 ff).
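      One simple way to arrive at such a proportion is to compare invasion with and without blockade of the ASM-dependent pathway. The sketch below assumes that the ASM-dependent fraction equals the relative loss of invasion upon inhibition, which is an assumption made here for illustration. The function name and the example numbers are hypothetical and were chosen only to reproduce the approximate percentages stated above.

```python
def asm_dependent_fraction(invasion_untreated: float,
                           invasion_inhibited: float) -> float:
    """Estimate the fraction of invasion that depends on ASM as the
    relative reduction in invasion when ASM-dependent uptake is blocked."""
    if invasion_untreated <= 0:
        raise ValueError("invasion_untreated must be positive")
    if invasion_inhibited < 0:
        raise ValueError("invasion_inhibited must be non-negative")
    return 1.0 - invasion_inhibited / invasion_untreated


# Hypothetical relative invasion values (untreated set to 100).
early = asm_dependent_fraction(100, 25)   # 10 min after contact
late = asm_dependent_fraction(100, 50)    # 30 min after contact
print(f"10 min: {early:.0%} ASM-dependent, 30 min: {late:.0%} ASM-dependent")
```

      The estimate presupposes that the inhibitor blocks the ASM-dependent route completely and leaves other routes unaffected. If pathways overlap or compensate for each other, the true fraction may differ.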

      Reviewer #2 (Recommendations for the authors):

      (1) The experiment in Figure 4H is interesting. Details on what proportion of the cell is double positive, and if only this fraction was used for analysis will be good.

      We used all bacteria found in the images, irrespective of whether host cells were infected with only one or both strains. Unfortunately, we cannot properly determine the proportion of cells that are double infected, since i) we recorded the samples with CLSM and hence cannot exclude that there are intracellular bacteria in higher or lower optical sections, and ii) we visualized cells by staining nuclei and did not stain the cell borders, so we cannot precisely tell to which host cell the bacteria localize.

      (2) Data is sparse for steps 5 and 6 of the model (line 330).

      We apologize for the inconvenience. There is a related study published elsewhere[19], in which we identified NRCAM and PTK7 as putative receptors involved in this invasion pathway. We included a section in the discussion with the corresponding citation (line 569).

      (3) Data for the reduced number of intracellular bacteria upon blocking ASM-dependent uptake (line 235) is not clear. Do they mean decreased invasion efficiency? These two need not be the same.

      We changed “reduced number of intracellular bacteria” to “invasion efficiency”.

      (4) b-toxin added to the surface can get endocytosed. Can its surface effect be delineated from endo/phagosomal effect?

      We attempted to delineate effects contributed by the toxin activity on the surface vs. within phagosomes (Figure 5 A-C). We see increased phagosomal escape when we pretreated host cells with β-toxin (removal of SM from the surface) and infected either in the presence (toxin will be taken up together with the bacteria into the phagosome) or in the absence (toxin was washed away shortly before infection) of β-toxin. By contrast, overexpression of β-toxin by S. aureus did not affect phagosomal escape rates. The proper activity of β-toxin was confirmed by the absence of Lysenin recruitment during phagosomal escape in all three conditions. We concluded that the activity on the surface, and not the activity in the phagosome, is important.

      (5) The potential role(s) of bacterial factors in the uptake and subsequent intracellular stages can be discussed.

      Multiple bacterial adhesins are known in S. aureus. These include proteins covalently attached to the bacterial cell wall, such as the sortase-dependently anchored fibronectin-binding proteins A and B, but also secreted and “cell wall binding” proteins as well as non-proteinaceous factors such as wall teichoic acids. A discussion of these factors would thus be beyond the scope of this manuscript, and we refer the reader to specialized reviews on that topic.

      (6) The manuscript is not very easy to read. The abstract could be rephrased for better clarity and succinctness, with a clearly stated problem statement. The introduction is somewhat haphazard, I feel it can be better structured.

      We apologize for the inconvenience. We stated the problem/research question in the abstract and tried to improve the introduction without adding too much unnecessary detail. In general, we tried to improve the readability of the manuscript and hope that our results and conclusions can be more easily understood by the reader in the revised version.

      (7) Typo in Figure 5F. Step 6 should read "accessory receptors"

      The typo was corrected.

      References

      (1) Lloyd-Evans, E. et al. Niemann-Pick disease type C1 is a sphingosine storage disease that causes deregulation of lysosomal calcium. Nature Medicine 14, 1247-1255 (2008).

      (2) Launay, P. et al. TRPM4 Is a Ca²⁺-Activated Nonselective Cation Channel Mediating Cell Membrane Depolarization. Cell 109, 397-407 (2002).

      (3) Nilius, B. et al. The Ca²⁺‐activated cation channel TRPM4 is regulated by phosphatidylinositol 4,5‐biphosphate. The EMBO Journal 25, 467-478 (2006).

      (4) Cáceres, M. et al. TRPM4 Is a Novel Component of the Adhesome Required for Focal Adhesion Disassembly, Migration and Contractility. PLoS One 10, e0130540 (2015).

      (5) Silva, I., Brunett, M., Cáceres, M. & Cerda, O. TRPM4 modulates focal adhesion-associated calcium signals and dynamics. Biophysical Journal 123, 390a (2024).

      (6) Schlesier, T., Siegmund, A., Rescher, U. & Heilmann, C. Characterization of the Atl-mediated staphylococcal internalization mechanism. International Journal of Medical Microbiology 310, 151463 (2020).

      (7) Jevon, M. et al. Mechanisms of Internalization of Staphylococcus aureus by Cultured Human Osteoblasts. Infection and Immunity 67, 2677-2681 (1999).

      (8) Rodriguez, A., Webster, P., Ortego, J. & Andrews, N.W. Lysosomes behave as Ca²⁺-regulated exocytic vesicles in fibroblasts and epithelial cells. J Cell Biol 137, 93-104 (1997).

      (9) Krones & Rühling et al. Staphylococcus aureus alpha-Toxin Induces Acid Sphingomyelinase Release From a Human Endothelial Cell Line. Front Microbiol 12, 694489 (2021).

      (10) Sakurai, Y. et al. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (11) Aarhus, R., Graeff, R.M., Dickey, D.M., Walseth, T.F. & Lee, H.C. ADP-ribosyl cyclase and CD38 catalyze the synthesis of a calcium-mobilizing metabolite from NADP. J Biol Chem 270, 30327-30333 (1995).

      (12) Schmid, F., Fliegert, R., Westphal, T., Bauche, A. & Guse, A.H. Nicotinic acid adenine dinucleotide phosphate (NAADP) degradation by alkaline phosphatase. J Biol Chem 287, 32525-32534 (2012).

      (13) Angeletti, C. et al. SARM1 is a multi-functional NAD(P)ase with prominent base exchange activity, all regulated by multiple physiologically relevant NAD metabolites. iScience 25, 103812 (2022).

      (14) Gu, F. et al. Dual NADPH oxidases DUOX1 and DUOX2 synthesize NAADP and are necessary for Ca²⁺ signaling during T cell activation. Sci Signal 14, eabe3800 (2021).

      (15) Schonn, J.-S., Maximov, A., Lao, Y., Südhof, T.C. & Sørensen, J.B. Synaptotagmin-1 and -7 are functionally overlapping Ca²⁺ sensors for exocytosis in adrenal chromaffin cells. Proceedings of the National Academy of Sciences 105, 3998-4003 (2008).

      (16) Kornhuber, J. et al. Functional Inhibitors of Acid Sphingomyelinase (FIASMAs): a novel pharmacological group of drugs with broad clinical applications. Cell Physiol Biochem 26, 9-20 (2010).

      (17) Naser, E. et al. Characterization of the small molecule ARC39, a direct and specific inhibitor of acid sphingomyelinase in vitro. J Lipid Res 61, 896-910 (2020).

      (18) Roth, A.G. et al. Potent and selective inhibition of acid sphingomyelinase by bisphosphonates. Angew Chem Int Ed Engl 48, 7560-7563 (2009).

      (19) Rühling, M., Schmelz, F., Kempf, A., Paprotka, K. & Fraunholz, M.J. Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio 0, e03654-03624 (2025).

      (20) Schuchman, E.H. & Desnick, R.J. Types A and B Niemann-Pick disease. Mol Genet Metab 120, 27-33 (2017).

      (21) Miller, M.E., Adhikary, S., Kolokoltsov, A.A. & Davey, R.A. Ebolavirus Requires Acid Sphingomyelinase Activity and Plasma Membrane Sphingomyelin for Infection. Journal of Virology 86, 7473-7483 (2012).

      (22) M. Rühling, L.K., F. Wagner, F. Schumacher, D. Wigger, D. A. Helmerich, T. Pfeuffer, R. Elflein, C. Kappe, M. Sauer, C. Arenz, B. Kleuser, T. Rudel, M. Fraunholz, J. Seibel Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nat Commun accepted in principle (2024).

      (23) Peters, S. et al. Neisseria meningitidis Type IV Pili Trigger Ca²⁺-Dependent Lysosomal Trafficking of the Acid Sphingomyelinase To Enhance Surface Ceramide Levels. Infect Immun 87 (2019).

      (24) Grassmé, H. et al. Acidic sphingomyelinase mediates entry of N. gonorrhoeae into nonphagocytic cells. Cell 91, 605-615 (1997).

      (25) Li, C. et al. Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (26) Fernandes, M.C. et al. Trypanosoma cruzi subverts the sphingomyelinase-mediated plasma membrane repair pathway for cell invasion. J Exp Med 208, 909-921 (2011).

      (27) Luisoni, S. et al. Co-option of Membrane Wounding Enables Virus Penetration into Cells. Cell Host & Microbe 18, 75-85 (2015).

      (28) Rühling, M. et al. Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nature Communications 15, 7456 (2024).

      (29) Ellison, C.J., Kukulski, W., Boyle, K.B., Munro, S. & Randow, F. Transbilayer Movement of Sphingomyelin Precedes Catastrophic Breakage of Enterobacteria-Containing Vacuoles. Curr Biol 30, 2974-2983 e2976 (2020).

      (30) Moldovan, A. & Fraunholz, M.J. In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (31) Slotte, J.P. Biological functions of sphingomyelins. Progress in Lipid Research 52, 424-437 (2013).

      (32) Stelzner, K. et al. Intracellular Staphylococcus aureus Perturbs the Host Cell Ca²⁺ Homeostasis To Promote Cell Death. mBio 11 (2020).

      (33) Kunz, T.C. et al. The Expandables: Cracking the Staphylococcal Cell Wall for Expansion Microscopy. Front Cell Infect Microbiol 11, 644750 (2021).

      (34) Sakurai, Y. et al. Ebola virus. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (35) Grosz, M. et al. Cytoplasmic replication of Staphylococcus aureus upon phagosomal escape triggered by phenol-soluble modulin alpha. Cell Microbiol 16, 451-465 (2014).

      (36) Giese, B. et al. Staphylococcal alpha-toxin is not sufficient to mediate escape from phagolysosomes in upper-airway epithelial cells. Infect Immun 77, 3611-3625 (2009).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the editor and all the reviewers for their time and thoughtful consideration of our manuscript. We appreciate the valuable comments. Our provisional response to the “public review” has been published, and we have now corrected factual errors and improved the clarity of the writing based on the “recommendations for the authors.” We believe these corrections will improve the quality and accuracy of our manuscript.

      Specific responses to the reviewers' recommendations for the authors are as follows:

      Reviewer #1 (Recommendations For The Authors):

      1) Is the Slack current amplitude dependent on the Nav subtype? Differences in Slack current amplitude might explain the sensitization of Slack to quinidine.

      We thank the reviewer for raising this point. We examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. We found no significant differences in Slack current amplitudes between the NaV channel subtypes (Author response image 1), suggesting that whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.
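
      For illustration only, the comparison named in the caption (one-way ANOVA followed by a Bonferroni-corrected post hoc test) can be sketched as follows. This is a minimal sketch, not the authors' analysis script: the function name `anova_bonferroni`, the group labels, and the numbers are hypothetical placeholders, and the post hoc step is simplified to pairwise t-tests with a Bonferroni correction rather than the pooled-variance procedure used by some statistics packages.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def anova_bonferroni(groups):
    """One-way ANOVA followed by Bonferroni-corrected pairwise t-tests.

    `groups` maps a group label to a 1-D array of measurements
    (for example, current amplitudes per recorded cell).
    """
    names = list(groups)
    f_stat, p_anova = stats.f_oneway(*(groups[n] for n in names))

    pairs = list(combinations(names, 2))
    adjusted = {}
    for a, b in pairs:
        _, p = stats.ttest_ind(groups[a], groups[b])
        # Bonferroni: multiply by the number of comparisons, cap at 1.
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return f_stat, p_anova, adjusted


# Synthetic placeholder data (NOT measurements from the study).
rng = np.random.default_rng(0)
example = {
    "group_a": rng.normal(5.0, 1.0, 8),
    "group_b": rng.normal(5.0, 1.0, 8),
    "group_c": rng.normal(5.0, 1.0, 8),
}
f_stat, p_anova, adjusted = anova_bonferroni(example)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_anova:.3f}")
for pair, p_adj in adjusted.items():
    print(pair, f"adjusted p = {p_adj:.3f}")
```

      A pair would be reported as "ns" when its adjusted p value exceeds 0.05.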

      2) Is the open probability changed by the presence of Nav1.6 and/or by the other Nav subtypes? Changes in open probability might explain the Nav1.6 induced sensitization of Slack to quinidine block.

      We thank the reviewer for raising this point. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform single-channel recordings in future studies.

      3) Could the authors elaborate more on the coupling between INaT mediated sensitization of Slack to block by quinidine and the Nav1.6 N-and C-tail induced sensitization?

      We thank the reviewer for raising this point. We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade. To address this question, we plan to employ structural biology methods, such as cryo-electron microscopy (cryo-EM).

      4) Line 85: The authors use an outdated nomenclature of AMPAR subtypes. I would suggest changing to GluA1, GluA2, GluA3 and GluA4.

      We appreciate the reviewer’s suggestion. We have changed the term “GluR” to “GluA” in the revised manuscript.

      The authors do not explain the rationale by using the different homomeric AMPAR subtypes. Most often the AMPARs express as heteromeric receptors decorated by auxiliary subunits. Also, is the GluA2 the edited version?

      We thank the reviewer for raising this point. While AMPARs are often expressed as heteromeric receptors with auxiliary subunits, we focused on the homomeric AMPAR subtypes for initial screening. Through our investigation, we found no significant effects on sensitizing Slack to quinidine blockade. Additionally, the GluA2 used in our study is unedited.

      5) Line 144: I expect a reduction in current amplitude caused by blocking INaT and INaP is tested at +100mV?

      We thank the reviewer for raising this point. The reduction in current amplitude was indeed tested at +100 mV and we have included this information in the revised manuscript.

      6) Line 157 and line 162: Reference to Supplementary table S3 should be Table S2.

      We thank the reviewer for pointing this out. The reference to "Table S3" has been corrected to "Table S2" in the revised manuscript.

      7) How many times did the authors repeat the co-immunoprecipitation? Some of the bands are very weak, and repeats are necessary for all blots.

      We thank the reviewer for raising this concern. We performed the co-immunoprecipitation experiments three times independently.

      8) Line 288: The authors are showing the chimeric construct in Figures 7A and B but are referring to the full length Nav1.6 in the main text line 288.

      We apologize for the confusion. We have clarified in the revised manuscript that we used NaV1.5/6NC in our study.

      9) Figure 1 line 23: 1 uM quinidine must be 30 uM quinidine?

      We thank the reviewer for catching this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      10) Figure 2 line 53: I expect IC50 is measured at +100mV? Same question for line 60 in same figure text.

      We thank the reviewer for pointing this out. We have now included this information in the revised manuscript.

      11) Figure 4B color coding is confusing.

      We apologize for the confusion. We would like to clarify that Fig. 4B illustrates the domain architecture of the human NaV channel pore-forming α subunit, and we have changed the color from dark blue to black in the revised figure.

      12) Figure S6: Text for figure S6E and S6F has been swapped (line 96 to 106).

      We thank the reviewer for raising this point. We have rectified the swapped captions for Fig. S6E and Fig. S6F in the revised manuscript.

      13) Methods section line 652: Kainite acid should be changed to kainic acid

      We thank the reviewer for catching this typo. The term “kainite acid” has been corrected to “kainic acid” in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Discuss limitations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We thank the reviewer for raising this point. We have discussed the limitations of using non-neuronal cells or cultured primary neurons rather than a more intact system (line 344 to line 348).

      2) Riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We have discussed the limitations of riluzole in the revised manuscript (line 360 to line 364).

      3) Remove the term in vivo.

      We thank the reviewer for raising this point. Although we did not conduct experiments directly in living organisms, our results demonstrated the coimmunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result supports the possibility that the interaction between Slack and NaV1.6 occurs in vivo.

      4) Figure 1

      ①C Why does Nav1.2 have a small inward current before the large inward current in the inset? The slope of the rising phase of the larger sodium current seems greater than Nav1.6 or Nav1.5. Was this examined?

      We apologize for the confusion. We would like to clarify that the small inward current can be attributed to the membrane capacitive current (slow capacitance, C-slow). The larger inward current is mediated by NaV1.2. Additionally, we did not compare the slope of the rising phase of the sodium currents between NaV subtypes but focused primarily on the current amplitudes.

      ②D-E

      For Nav1.5 the sodium current is very large compared to Nav1.6. Is it possible the greater effect of quinidine for Nav1.6 is due to the lesser sodium current of Nav1.6?

      We thank the reviewer for raising this point. We would like to clarify that our results indicate that transient sodium currents contribute to the sensitization of Slack to quinidine blockade (Fig. 2C,E). Therefore, it is unlikely that the greater effect observed for NaV1.6 in sensitizing Slack is due to its lower sodium currents.

      ③The differences between WT and KO in G -H are hard to appreciate. Could quantification be shown? The text uses words like "block" but this is not clear from the figure. It seems that the replacement of Na+ with Li+ did not block the outward current or effect of quinidine.

      We apologize for the confusion. We would like to clarify the methods used in this experiment. The lithium ion (Li+) is a much weaker activator of the sodium-activated potassium channel Slack than the sodium ion (Na+) [1,2].

      1. Zhang Z, Rosenhouse-Dantsker A, Tang QY, Noskov S, Logothetis DE. The RCK2 domain uses a coordination site present in Kir channels to confer sodium sensitivity to Slo2.2 channels. J Neurosci. Jun 2 2010;30(22):7554-62. doi:10.1523/JNEUROSCI.0525-10.2010

      2. Kaczmarek LK. Slack, Slick and Sodium-Activated Potassium Channels. ISRN Neurosci. Apr 18 2013;2013(2013)doi:10.1155/2013/354262

      Therefore, we replaced Na+ with Li+ in the bath solution to measure the current amplitudes of sodium-activated potassium currents (IKNa) [3].

      3. Budelli G, Hage TA, Wei A, et al. Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci. Jun 2009;12(6):745-50. doi:10.1038/nn.2313

      The following equation was used for quantification:

      Furthermore, the remaining IKNa after application of 3 μM quinidine in the bath solution was measured as follows:

      The quantification results were presented in Fig. 1K. The term "block" used in the text referred to the inhibitory effect of quinidine on IKNa.

      ④In K, for the WT, why is the effect of quinidine only striking for the largest currents?

      We thank the reviewer for raising this point. After conducting an analysis, we found no correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (p = 0.6294) (Author response image 2). Therefore, the effect of quinidine is not solely limited to targeting the larger currents.

      Author response image 2.

      The correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (data from manuscript Fig. 1K). r = 0.1555, p=0.6294, Pearson correlation analysis.
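
      As an illustration of the analysis named in the caption, a Pearson correlation between baseline current amplitude and fractional inhibition can be computed as sketched below. This is a minimal sketch with hypothetical variable names and synthetic placeholder values; it does not reproduce the data behind Fig. 1K or the reported r and p values.

```python
import numpy as np
from scipy import stats


def correlation_with_baseline(baseline, inhibition):
    """Pearson r and two-sided p value between two paired 1-D samples."""
    r, p = stats.pearsonr(baseline, inhibition)
    return r, p


# Synthetic placeholder data (NOT measurements from the study):
# baseline amplitudes and an inhibition fraction drawn independently of them.
rng = np.random.default_rng(1)
baseline = rng.uniform(0.5, 5.0, 12)
inhibition = rng.uniform(0.2, 0.8, 12)

r, p = correlation_with_baseline(baseline, inhibition)
print(f"r = {r:.4f}, p = {p:.4f}")
```

      A p value above the chosen threshold (0.05 here) means that the data provide no evidence of a linear relationship between the two quantities; it does not by itself exclude a nonlinear one.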

      5) Figure 2

      ①A. The argument could be better made if the same concentration of quinidine were used for Slack and Slack + Nav1.6. It is recognized a greater sensitivity to quinidine is to be shown but as presented the figure is a bit confusing.

      We apologize for the confusion. We would like to clarify that the presented concentrations of quinidine were chosen to be near the IC50 values for Slack and Slack+NaV1.6.

      ②C. Can the authors add the effect of quinidine to the condition where the prepulse potential was - 90?

      We apologize for the confusion. We would like to clarify that the condition with a prepulse potential of -90 mV is the same as the condition in Fig. 1. We changed only one experimental condition: the prepulse potential was changed from -90 mV to -40 mV.

      6) Figure 3.

      ①line 80 should be coronal not coronary

      We thank the reviewer for catching this error. We have corrected the term “coronary” to “coronal” in the caption of Figure 3.

      ②A. Clarify these 6 panels.

      We thank the reviewer for raising this point. We have clarified the captions of Fig. 3A in the revised manuscript.

      ③Please enlarge fonts in D.

      We thank the reviewer for the suggestion. We have enlarged the fonts in Fig. 3D in the revised manuscript.

      ④F. The variances should be checked with a test to determine if they are significantly different because they look different - if so, data can be transformed and if transformed data have variances that are equivalent a t-test can be used on the transformed data. Otherwise, Mann-Whitney should be used.

      We thank the reviewer for pointing this out. We have reanalyzed the data in Fig. 3F using the Mann-Whitney test after finding that the variances of the two groups differ.
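
      The decision procedure the reviewer describes can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' analysis: the function name `compare_two_groups` is hypothetical, Levene's test is assumed as the variance check (the response does not name the test used), and the intermediate data-transformation step suggested by the reviewer is omitted for brevity.

```python
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Choose between a t-test and the Mann-Whitney U test.

    Levene's test checks whether the two variances differ. If they do,
    the rank-based Mann-Whitney U test is used; otherwise Student's t-test.
    Returns the name of the test that was applied and its p value.
    """
    _, p_var = stats.levene(a, b)
    if p_var < alpha:
        _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "mann-whitney", p
    _, p = stats.ttest_ind(a, b)
    return "t-test", p


# Synthetic placeholder data (NOT measurements from the study).
tight = [10.0, 10.1, 9.9, 10.0, 10.05, 9.95, 10.02, 9.98]
spread = [0.0, 20.0, 5.0, 15.0, -5.0, 25.0, 10.0, 30.0]
method, p_value = compare_two_groups(tight, spread)
print(method, f"p = {p_value:.3f}")
```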

      7) Figure 7. The images need more clarity. They are very hard to see. Text is also hard to see.

      We apologize for the lack of clarity in the images and text. We would like to provide a concise summary of the key findings shown in this figure.

      Figure 7 illustrates an innovative intervention for treating SlackG269S-induced seizures in mice by disrupting the Slack-NaV1.6 interaction. Our results showed that blocking NaV1.6-mediated sodium influx significantly reduced Slack current amplitudes (Fig. 2D,G), suggesting that the Slack-NaV1.6 interaction contributes to the current amplitudes of epilepsy-related Slack mutant variants, aggravating the gain-of-function phenotype. Additionally, Slack’s C-terminus is involved in the Slack-NaV1.6 interaction (Fig. 5D). We hypothesized that overexpressing Slack’s C-terminus can disrupt the Slack-NaV1.6 interaction (by competing with Slack) and thereby counteract the increased current amplitudes of epilepsy-related Slack mutant variants.

      In HEK293 cells, overexpression of Slack’s C-terminus indeed significantly reduced the current amplitudes of epilepsy-related SlackG288S and SlackR398Q upon co-expression with NaV1.5/6NC (Fig. 7A,B). Subsequently, we evaluated this intervention in an in vivo epilepsy model by introducing the Slack G269S variant into C57BL/6N mice using AAV injection, mimicking the human Slack mutation G288S that we previously identified (Fig. 7C-G).

      ②It is not clear how data were obtained because injection of kainic acid does not lead to a convulsive seizure every 10 min for several hours, which is what appears to be shown. Individual seizures are just at the beginning and then they merge at the start of status epilepticus. After the onset of status epilepticus the animals twitch, have varied movements, sometime rear and fall, but there is not a return to normal behavior. Therefore one can not call them individual seizures. In some strains of mice, however, individual convulsive seizures do occur (even if the EEG shows status epilepticus is occurring) but there are rarely more than 5 over several hours and the graph has many more. Please explain.

      We apologize for the confusion. Regarding data acquisition after kainic acid injection, we started timing immediately after the intraperitoneal injection of kainic acid and recorded the seizure score of each mouse at ten-minute intervals, following the methodology described in previous studies [4].

      4. Huang Z, Walker MC, Shah MM. Loss of dendritic HCN1 subunits enhances cortical excitability and epileptogenesis. J Neurosci. Sep 2 2009;29(35):10979-88. doi:10.1523/JNEUROSCI.1531-09.2009

      The seizure scores were determined using a modified Racine, Pinel, and Rovner scale [5,6]: (1) facial movements; (2) head nodding; (3) forelimb clonus; (4) dorsal extension (rearing); (5) loss of balance and falling; (6) repeated rearing and falling; (7) violent jumping and running; (8) stage 7 with periods of tonus; (9) dead.

      5. Pinel JP, Rovner LI. Electrode placement and kindling-induced experimental epilepsy. Exp Neurol. Jan 15 1978;58(2):335-46. doi:10.1016/0014-4886(78)90145-0

      6. Racine RJ. Modification of seizure activity by electrical stimulation. II. Motor seizure. Electroencephalogr Clin Neurophysiol. Mar 1972;32(3):281-94. doi:10.1016/0013-4694(72)90177-0
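
      For readers who want to tabulate such scores, the nine-stage scale listed above can be encoded as sketched below. This is a minimal sketch: the names `SeizureStage` and `max_stage_per_interval` are hypothetical, and recording the highest stage seen within each ten-minute bin is an assumption made for illustration, since the response does not state how a single score per interval was chosen.

```python
from enum import IntEnum


class SeizureStage(IntEnum):
    """Stages of the modified scale as listed in the response."""
    FACIAL_MOVEMENTS = 1
    HEAD_NODDING = 2
    FORELIMB_CLONUS = 3
    REARING = 4
    LOSS_OF_BALANCE_AND_FALLING = 5
    REPEATED_REARING_AND_FALLING = 6
    VIOLENT_JUMPING_AND_RUNNING = 7
    STAGE_7_WITH_TONUS = 8
    DEAD = 9


def max_stage_per_interval(observations, interval_min=10):
    """Map each interval start (minutes after injection) to the highest stage.

    `observations` is an iterable of (minute, stage) pairs. Intervals with
    no observation are simply absent from the result.
    """
    scores = {}
    for minute, stage in observations:
        start = int(minute // interval_min) * interval_min
        stage = SeizureStage(stage)
        scores[start] = max(scores.get(start, stage), stage)
    return scores


# Hypothetical observation log for one animal (NOT data from the study).
log = [(2, 1), (7, 3), (12, 2), (25, 5)]
for start, stage in sorted(max_stage_per_interval(log).items()):
    print(f"{start}-{start + 10} min: stage {int(stage)} ({stage.name})")
```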

      8) The graphical abstract is quite complicated and somewhat hard to follow. Please simplify and clarify. One aspect of the abstract to clarify is the direction of what is first and second and third (etc.) because arrows point to many directions.

      We thank the reviewer for raising this point. In the revised manuscript, we have numbered the three components within the graphical abstract:

      1. Pathological phenotype: Increased Slack currents.

      2. Two types of interventions:

      2a. Disruption of the Slack-NaV1.6 interaction.

      2b. NaV1.6-mediated sensitization of Slack to quinidine blockade.

      3. Therapeutic effects: Reduced Slack currents.

      Reviewer #3 (Recommendations For The Authors):

      1) A reference to homozygous knockout is made in the abstract; however, only heterozygous mice are mentioned in the methods section. The genotype of the mice needs to be made clear in the manuscript. Furthermore, at what age were these mice used in the study. Since homozygous knockout of NaV1.6 is lethal at a very young age (<4 wks), it would be important to clarify that point as well.

      We thank the reviewer for pointing this out. In the revised manuscript, we have included information about the source of the primary cortical neurons used in our study. These neurons were obtained from postnatal homozygous NaV1.6 knockout C3HeB/FeJ mice and their wild-type littermate controls.

      2) Coimmunoprecipitation studies in Fig. 3C are not convincing. There appears to be a signal in the control lane. Furthermore, it appears that brightness levels were adjusted of that image, thereby removing completely the background.

      We thank the reviewer for pointing this out. We have replaced Fig. 3C with an unadjusted version in the revised manuscript.

      3) In Fig. 1B, the authors indicate that 30 microM of quinidine was used, while the corresponding figure legend suggest that 1 microM. Please clarify.

      We apologize for this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      4) How long were the cells exposed to quinidine before the functional measurement were performed?

      We thank the reviewer for pointing this out. The cells were exposed to the bath solution with quinidine for about one minute before applying step pulses.

      5) In Fig. 6B-D, it is not clear to what extent co-expression of Slack mutants and NaV1.6 increases sodium-activated potassium current.

      We thank the reviewer for pointing this out. We noticed that the current amplitudes of Slack mutants vary considerably, ranging from less than 1 nA to over 20 nA (n = 5-8). To accurately measure the effect of NaV1.6 on the current amplitudes of Slack mutants, we plan to apply tetrodotoxin in the bath solution to block NaV1.6 sodium currents upon coexpression of Slack mutants with NaV1.6.

      6) In Fig.7A and B, it appears that some recordings had no sodium-activated potassium currents. Why were these included in analysis? How was transfection efficacy assessed?

      We apologize for the confusion. We would like to clarify that all recordings included in analysis indeed exhibited outward sodium-activated potassium currents. The current density data in Fig. 7A-B are listed in Author response table 1 (in pA/pF):

      Author response table 1.

      Regarding the assessment of transfection efficacy, we estimated it approximately by using fluorescent proteins as reporters, which were co-expressed with the relevant proteins via the self-cleaving 2A peptide.

      7) Greater detail needs to be provided for the generation of NaV1.5 and NaV1.6 chimeras. Specifically, what AA residues were changed between sodium channel isoforms?

      We thank the reviewer for pointing this out. In the revised manuscript, we have included the specific amino acid residues that were changed between NaV1.5 and NaV1.6 to generate the chimeric constructs.

      8) In line 481, the authors refer to Fig. S2d instead of Fig. S6D. This should be corrected. Furthermore, the unusual shift in sodium current kinetics that the authors observe might be due in part to junction potential. Did the authors take that into consideration?

      We apologize for this error. The reference to "Fig. S2d" has been corrected to "Fig. S6D" in the revised manuscript.

      Regarding the unusual shift observed in the sodium current kinetics, we agree with the reviewer's suggestion that the junction potential may contribute to this phenomenon. During patch-clamp recordings, we ensured that the junction potential was properly compensated by the amplifier. Additionally, the replacement of CsF in the pipette solution may have contributed to the observed shift, as CsF in the pipette solution has been reported to shift the voltage dependence of activation and fast/slow inactivation of NaV channels towards more negative potentials [7].

      7. Korngreen A. Advanced patch-clamp analysis for neuroscientists. Neuromethods. Humana Press; 2016:xii, 350 pages.

      9) Legends for Fig.S6E and S6F are flipped. Please correct.

      We apologize for this error. We have rectified the flipped captions for figure S6E and S6F in the revised manuscript.

      10) Variance should be provided for the IC50 values and kinetic parameters of the sodium channels in the supplemental tables.

      We thank the reviewer for raising this point. We have included the 95% confidence interval (95%CI) for the IC50 values and kinetic parameters in the revised supplementary tables.

      Additionally, we have corrected some equations in the methods section:

      1. Line 500 and line 503: We have corrected equation (1) by adding the Hill coefficient as a parameter.

      2. Line 514: We have revised equation (4) from to
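
      To illustrate how an IC50 value, a Hill coefficient, and their 95% confidence intervals can be estimated together, a minimal sketch follows. It assumes a standard Hill-type inhibition curve, I/I0 = 1 / (1 + ([Q]/IC50)^nH), which may differ in form from equation (1) of the manuscript (not shown in this response). The function names, the concentrations, and the data are hypothetical, and the intervals are symmetric Wald-type intervals derived from the covariance matrix of the fit.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t as student_t


def hill(conc, ic50, n_h):
    """Fraction of current remaining at blocker concentration `conc`."""
    return 1.0 / (1.0 + (conc / ic50) ** n_h)


def fit_ic50(conc, remaining, alpha=0.05):
    """Fit IC50 and Hill coefficient; return estimates and (1 - alpha) CIs.

    The result is (popt, ci), where popt = [ic50, n_h] and ci has one row
    per parameter with columns (lower, upper).
    """
    popt, pcov = curve_fit(
        hill, conc, remaining,
        p0=[np.median(conc), 1.0],
        bounds=([1e-9, 0.1], [np.inf, 10.0]),
    )
    se = np.sqrt(np.diag(pcov))            # standard errors of the estimates
    dof = len(conc) - len(popt)            # residual degrees of freedom
    half = student_t.ppf(1.0 - alpha / 2.0, dof) * se
    return popt, np.column_stack([popt - half, popt + half])


# Synthetic placeholder data (NOT measurements from the study).
rng = np.random.default_rng(0)
conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0])  # e.g. in μM
remaining = hill(conc, 10.0, 1.2) + rng.normal(0.0, 0.02, conc.size)

(ic50_hat, nh_hat), ci = fit_ic50(conc, remaining)
print(f"IC50 = {ic50_hat:.2f} (95% CI {ci[0, 0]:.2f} to {ci[0, 1]:.2f})")
print(f"nH   = {nh_hat:.2f} (95% CI {ci[1, 0]:.2f} to {ci[1, 1]:.2f})")
```

      Fitting log10(IC50) instead of IC50 is a common alternative that yields asymmetric intervals on the concentration scale; the choice here was made only to keep the sketch short.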

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, authors have investigated the effects of JNK inhibition on sucrose-induced metabolic dysfunction in rats. They used multi-tissue network analysis to study the effects of the JNK inhibitor JNK-IN-5A on metabolic dysfunction associated with excessive sucrose consumption. Their results show that JNK inhibition reduces triglyceride accumulation and inflammation in the liver and adipose tissues while promoting metabolic adaptations in skeletal muscle. The study provides new insights into how JNK inhibition can potentially treat metabolic dysfunction-associated fatty liver disease (MAFLD) by modulating inter-tissue communication and metabolic processes.

      Strengths:

      The study has several notable strengths:

      Comprehensive Multi-Tissue Analysis: The research provides a thorough multi-tissue evaluation, examining the effects of JNK inhibition across key metabolically active tissues, including the liver, visceral white adipose tissue, skeletal muscle, and brain. This comprehensive approach offers valuable insights into the systemic effects of JNK inhibition and its potential in treating MAFLD.

      Robust Use of Systems Biology: The study employs advanced systems biology techniques, including transcriptomic analysis and genome-scale metabolic modeling, to uncover the molecular mechanisms underlying JNK inhibition. This integrative approach strengthens the evidence supporting the role of JNK inhibitors in modulating metabolic pathways linked to MAFLD.

      Potential Therapeutic Insights: By demonstrating the effects of JNK inhibition on both hepatic and extrahepatic tissues, the study offers promising therapeutic insights into how JNK inhibitors could be used to mitigate metabolic dysfunction associated with excessive sucrose consumption.

      Behavioral and Metabolic Correlation: The inclusion of behavioral tests alongside metabolic assessments provides a more holistic view of the treatment's effects, allowing for a better understanding of the broader physiological implications of JNK inhibition.

      Weaknesses:

      While the study provides a comprehensive evaluation of JNK inhibitors in mitigating MAFLD conditions, addressing the following points will enhance the manuscript's quality:

      The authors should explicitly mention and provide a detailed list of metabolites affected by sucrose and JNK inhibition treatment that have been previously associated with MAFLD conditions. This will better contextualize the findings within the broader field of metabolic disease research.

      We fully agree with this constructive suggestion for improving our understanding of the metabolic effects of JNK inhibition under sucrose overconsumption. While technical limitations made it challenging to directly analyze metabolites in the current study, we employed genome-scale metabolic modeling, a robust approach for studying metabolism, to predict the metabolic pathways potentially impacted by the interventions (Fig. 7 and Data S8). Additionally, as part of this revision, we conducted an extensive literature review to identify metabolites previously reported to be affected by sucrose consumption in MAFLD rodent models and MASLD patients. A detailed summary of these metabolites is now presented in Author response table 1, and several of these metabolites have been incorporated into the revised results section (Lines 308-314) to support some of the predicted metabolic activities.

      “Some of the predicted metabolic changes align with previous findings in rodents subjected to sucrose overconsumption. For example, Öztürk et al. reported altered tryptophan metabolism, including decreased serum levels of kynurenic acid and kynurenine, in rats consuming 10% sucrose in drinking water. Similarly, increased triglyceride-bound oleate, palmitate, and stearate were observed in the livers of rats fed a 10% sucrose solution, indicating JNK-IN-5A treatment may regulate lipid metabolism by modulating these metabolic activities.”

      It is important to note, however, that data on metabolites specifically affected by JNK inhibition in MASLD contexts remains lacking in the literature. The predicted metabolites and associated metabolic pathways in the current study could provide a starting point for such exploration in future studies. We have emphasized this in the revised manuscript and highlighted the need for further studies to explore these mechanisms in greater detail.

      Author response table 1.

      Metabolites associated with sucrose overconsumption in MASLD.

      The limitations of the study should be clearly stated, particularly the lack of evidence on the effects of chronic JNK inhibitor treatment and potential off-target effects. Addressing these concerns will offer a more balanced perspective on the therapeutic potential of JNK inhibition.

      Thank you for this constructive comment. We have acknowledged limitations of the current study in Discussion section (Lines 397-406) of the revised manuscript:

      “Nevertheless, several limitations warrant consideration. First, while we observed transcriptional adaptations in skeletal muscle tissue following treatment, the exact molecular mechanisms underlying these changes and their roles in skeletal muscle function and systemic metabolic homeostasis remain unclear. Further investigation is warranted to elucidate the muscle-specific effects of JNK inhibition. Second, our study did not investigate the dose-dependent or potential off-target effects of JNK-IN-5A, particularly its activity on other members of the kinase family and associated signaling pathways. Lastly, the long-term effects of JNK-IN-5A administration remain unexplored. Understanding its prolonged impact across different stages of MAFLD, including advanced MASH, is crucial for assessing the full therapeutic potential of JNK inhibition in the treatment of MAFLD.”

      The potential risks of using JNK inhibitors in non-MAFLD conditions should be highlighted, with a clear distinction made between the preventive and curative effects of these therapies in mitigating MAFLD conditions. This will ensure the therapeutic implications are properly framed.

      Thank you for this insightful suggestion. The potential risks of using JNK inhibitors in non-MAFLD conditions have been considered and are now highlighted in Lines 369-390 of the revised discussion:

      “Although overactivated JNK activity presents an attractive opportunity to combat MAFLD, inhibition of JNK presents substantial challenges and potential risks due to its broad and multifaceted roles in many cellular processes. One key challenge is the dual role of JNK signaling (Lamb et al., 2003). For instance, long-term JNK inhibition may disrupt liver regeneration, as JNK plays a critical role in liver repair by regulating hepatocyte proliferation and survival following injury or stress (Papa and Bubici, 2018). In HCC, it has been reported that JNK acts as both a tumor promoter, driving inflammation, fibrosis, and metabolic dysregulation, and a tumor suppressor, facilitating apoptosis and cell cycle arrest in damaged hepatocytes. Its inhibition, therefore, carries the risk of inadvertently promoting tumor progression under certain conditions (Seki et al., 2012). Furthermore, the differential roles of JNK isoforms (JNK1, JNK2, JNK3) and a lack of specificity of JNK inhibitors present another layer of complexity. Given these challenges, while our study demonstrated the potential of JNK-IN-5A in mitigating early metabolic dysfunction in the liver and adipose tissues, JNK targeting strategies should be carefully tailored to the disease stage under investigation. For curative approaches targeting advanced MAFLD, such as MASH, future studies are warranted to address considerations related to dosing, tissue specificity, and the long-term effects.”

      The statistical analysis section could be strengthened by providing a justification for the chosen statistical tests and discussing the study's power. Additionally, a more detailed breakdown of the behavioral test results and their implications would be beneficial for the overall conclusions of the study.

      We would like to thank you for this constructive suggestion. In this study, differences among more than two groups were tested using ANOVA or the Kruskal-Wallis test, depending on the outcome of normality testing (Shapiro–Wilk test) on the data (continuous variables from different measurements). Pairwise comparisons were performed using Tukey’s post hoc test following ANOVA or Dunn’s multiple comparisons post hoc test following the Kruskal-Wallis test, as appropriate.

      The study used 11 animals per group, a group size widely used in preclinical animal research [13]. To evaluate the power of this study design to detect group differences, we conducted a power analysis using G*Power 3.1 software [14], with ANOVA used as an example. The power analysis revealed the following:

      - For a small effect size (partial eta.sq = 0.01), the power was 7.5% at 𝑝<0.05.

      - For a medium effect size (partial eta.sq = 0.06), the power was 23.7% at 𝑝<0.05.

      - For a large effect size (partial eta.sq = 0.14), the power was 55.4% at 𝑝<0.05.

      Bonapersona et al. reported that the median statistical power in animal studies is often between 15–22% [15]; the achieved power of the current study design is therefore within the range observed in most exploratory animal research. However, we acknowledge that the power for detecting smaller effects is limited, which is also a common challenge in animal research because ethical considerations constrain sample sizes.
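      The effect sizes above are quoted as partial eta squared, whereas an ANOVA power calculation is usually parameterized by Cohen's f. The following is a minimal sketch of that conversion, using the standard relations f = sqrt(η² / (1 − η²)) and noncentrality λ = f² × N. The function names are hypothetical, and the group count of 4 in the example is an assumption for illustration only: the response states 11 animals per group but not the number of groups entered into the power analysis.

```python
import math


def cohens_f(partial_eta_sq: float) -> float:
    """Convert partial eta squared to Cohen's f."""
    if not 0.0 <= partial_eta_sq < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(partial_eta_sq / (1.0 - partial_eta_sq))


def noncentrality(partial_eta_sq: float, n_per_group: int, n_groups: int) -> float:
    """Noncentrality parameter of the F distribution for a one-way ANOVA."""
    total_n = n_per_group * n_groups
    return cohens_f(partial_eta_sq) ** 2 * total_n


if __name__ == "__main__":
    # n_groups=4 is an assumed value, not taken from the response.
    for eta_sq in (0.01, 0.06, 0.14):
        f = cohens_f(eta_sq)
        lam = noncentrality(eta_sq, n_per_group=11, n_groups=4)
        print(f"eta_sq={eta_sq:.2f}  f={f:.3f}  lambda={lam:.2f}")
```

      The three values of partial eta squared correspond to Cohen's f of roughly 0.10, 0.25, and 0.40, the conventional small, medium, and large benchmarks. Power itself would then be read from the noncentral F distribution, which this sketch does not compute.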

      As suggested, we’ve revised the ‘Statistical Analysis’ and ‘Result’ sections to improve clarity:

      “Statistical Analysis:

      Data were shown as mean ± standard deviation (SD), unless stated otherwise. The assumption of normality for continuous variables from behavior tests, biometric measurements, and plasma biochemistry was assessed using the Shapiro–Wilk test. Differences among multiple groups were tested by ANOVA or, for data that were not normally distributed, the non-parametric Kruskal-Wallis test. Pairwise comparisons were performed using Tukey’s post hoc test following the ANOVA or Dunn’s multiple comparisons post hoc test following the Kruskal-Wallis test, as appropriate. The Jaccard index was used to evaluate the similarity and diversity of two gene sets, and a hypergeometric test was used to test the significance of their overlap. All results were considered statistically significant at p < 0.05, unless stated otherwise.”
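      As an illustration of the last two measures named in the statistical analysis text, the sketch below computes a Jaccard index and a one-sided hypergeometric overlap p-value for two gene sets. It is not the authors' code: the function names, the toy gene identifiers, and the size of the background gene universe are all assumptions made for the example.

```python
from math import comb


def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def hypergeom_overlap_pvalue(universe_size: int, size_a: int,
                             size_b: int, overlap: int) -> float:
    """P(X >= overlap) when size_b genes are drawn without replacement
    from a universe containing size_a 'marked' genes."""
    max_overlap = min(size_a, size_b)
    total = comb(universe_size, size_b)
    tail = sum(
        comb(size_a, i) * comb(universe_size - size_a, size_b - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy gene sets; identifiers are placeholders, not data from the study.
    set_a = {"g1", "g2", "g3", "g4"}
    set_b = {"g3", "g4", "g5"}
    universe = 10  # assumed number of background genes
    print(jaccard_index(set_a, set_b))
    print(hypergeom_overlap_pvalue(universe, len(set_a), len(set_b),
                                   len(set_a & set_b)))
```

      The choice of background universe strongly affects the hypergeometric p-value, so in practice it should be the set of genes that could have been detected in both analyses.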

      Behavior tests (Lines 150-157):

      “We found no significant differences among groups in retention latencies, a measure of learning and memory abilities in the passive avoidance test (Data S3). Additionally, the locomotor activity test was used to analyze behaviors such as locomotion, anxiety, and depression in rats. No significant differences were observed among groups in stereotypical movements, ambulatory activity, rearing, resting percentage, and distance travelled (Data S4). Similarly, the elevated plus maze test (Walf and Frye, 2007), an assay for assessing anxiety-like behavior in rodents, showed that rats in all groups had comparable open-arm entries and durations (Data S5). Collectively, the behavior tests indicate that the JNK-IN-5A-treated rats exhibit no evidence of anxiety or behavior disorders.”

      Reviewer #2 (Public review):

      Summary:

      Excessive sucrose is a possible initial factor for the development of metabolic dysfunction-associated fatty liver disease (MAFLD). To investigate the possibility that intervention with JNK inhibitor could lead to the treatment of metabolic dysfunction caused by excessive sucrose intake, the authors performed multi-organ transcriptomics analysis (liver, visceral fat (vWAT), skeletal muscle, and brain) in a rat model of MAFLD induced by sucrose overtake (+ a selective JNK2 and JNK3 inhibitor (JNK-IN-5A) treatment). Their data suggested that changes in gene expression in the vWAT as well as in the liver contribute to the pathogenesis of their MAFLD model and revealed that the JNK inhibitor has a cross-organ therapeutic effect on it.

      Strengths:

      (1) It has been previously reported that inhibition of JNK signaling can contribute to the prevention of hepatic steatosis (HS) and related metabolic syndrome in other models, but the role of JNK signaling in the metabolic disruption caused by excessive intake of sucrose, a possible initial factor for the development of MAFLD, has not been well understood, and the authors have addressed this point.

      (2) This study is also important because pharmacological therapy for MAFLD has not yet been established.

      (3) By obtaining transcriptomic data in multiple organs and comprehensively analyzing the data using gene co-expression network (GCN) analysis and genome-scale metabolic models (GEM), the authors showed the multi-organ interaction not only in the pathology of MAFLD caused by excessive sucrose intake but also in the treatment effects by JNK-IN-5A.

      (4) Since JNK signaling has diverse physiological functions in many organs, the authors effectively assessed possible side effects with a view to the clinical application of JNK-IN-5A.

      Weaknesses:

      (1) The metabolic process activities were evaluated using RNA-seq results in Figure 7, but direct data such as metabolite measurements are lacking.

      Thank you for these valuable insights. We fully agree that direct metabolite measurements would provide a deeper understanding of the metabolic impact of sucrose overconsumption and JNK-IN-5A administration. Unfortunately, due to technical limitations, we were unable to directly measure metabolites in this study. To address this, we supported our genome-scale metabolic modeling predictions with an extensive literature review, which is summarized in attached Table 1. This table highlights key metabolites and associated metabolic pathways that have been previously associated with sucrose overconsumption in MAFLD contexts. We incorporated some of these metabolites into the revised results section (Lines 308–314) to demonstrate the consistency between our predicted metabolic changes and experimental findings from the literature. For instance, studies have reported altered tryptophan metabolism, including decreased serum kynurenic acid and kynurenine levels, as well as increased triglyceride-bound oleate, palmitate, and stearate in sucrose-fed rodents. These findings align with our predictions of altered metabolic activities in fatty acid oxidation, fatty acid synthesis, and tryptophan metabolism.

      (2) There is a lack of consistency in the data between JNK-IN-5A_D1 and _D2, and there is no sufficient data-based explanation for why the effects observed in D1 were inconsistent in the D2 samples.

      Thank you for raising this important point regarding the differences between the two dosages. As this was not the primary focus of the current study, we do not have sufficient data to fully explain these observations. We speculate that the differences may arise from pharmacokinetic effects associated with the dosing of this small molecule inhibitor, including potential saturation of transport mechanisms, altered tissue distribution, or off-target effects.

      (3) Although it is valuable that the authors were able to suggest the possibility of JNK inhibitor as a therapeutic strategy for MAFLD, the evaluation of the therapeutic effect was limited to the evaluation of plasma TG, LDH, and gene expression changes. As there was no evaluation of liver tissue images, it is unclear what changes were brought about in the liver by the excessive sucrose intake and the treatment with JNK-IN-5A.

      We acknowledge that the lack of histological evaluations limits our ability to form a complete picture of the interventions' effects. However, as you noted, our transcriptional and systems-wide investigation across multiple tissues provides novel and significant insights into the molecular and systemic impacts of JNK-IN-5A treatment.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It would be useful to explain why the authors conducted their research using female rats but not male rats.

      Thank you for raising this insightful point. Our choice of female rats for the current study was based on several considerations. 1) Previous research has demonstrated that female rats exhibit metabolic dysfunction (e.g., hypertriglyceridemia, liver steatosis, insulin resistance) in response to dietary factors, such as high-sucrose feeding [16-19]. These metabolic characteristics made them an appropriate model for assessing the in vivo effects of JNK inhibition under high-sucrose conditions. 2) Female rats have also been reported to show resilience to high-sucrose-induced metabolic dysfunction due to the protective effects of estrogen [8]; we therefore aimed to determine whether JNK inhibition could provide therapeutic benefits in this context. This allows us to evaluate the effect of JNK inhibition even in a metabolically advantaged group. 3) Our results from the tolerance test (Fig. 2a) indicated that female rats displayed more variable responses to JNK-IN-5A administration. This variation allowed us to evaluate how JNK inhibition influences metabolic outcomes in a sex that is more responsive to the intervention. Nonetheless, we emphasize the importance of future studies involving male rats to better understand sex-specific responses to JNK inhibition and to provide more comprehensive guidance for the development of JNK-targeting therapies in MAFLD treatment.

      (2) Figure 2C shows that JNK-IN-5A administration reduces the mRNA levels of Mapk8 and Mapk9 in the liver and the SkM. It would be useful to provide the authors' insight into the data. 

      In the liver, the data in Fig. 2c of the original submission and the attached Fig. 1 show that sucrose feeding induces opposite alterations in the mRNA expression of Mapk8 (Jnk1, increased, log2FC<sub>Sucrose vs Control</sub> = 0.02) and Mapk9 (Jnk2, decreased, log2FC<sub>Sucrose vs Control</sub> = -0.43), though these changes do not reach statistical significance. JNK-IN-5A administration reverses these effects, significantly decreasing Mapk8 expression (log2FC<sub>Sucrose+JNK_D1 vs Sucrose</sub> = -0.37) while increasing Mapk9 expression (log2FC<sub>Sucrose+JNK_D1 vs Sucrose</sub> = 0.42). This suggests potentially differential yet compensatory roles of these two isoforms in regulating JNK activity during these interventions in the liver, in line with the findings from Jnk1- and/or Jnk2-specific knockout studies [20, 21]. Additionally, emerging evidence indicates that Jnk1 plays a major role in diet-induced liver fibrosis and metabolic dysfunction [22-25]. Therefore, the reduced Mapk8 expression following JNK-IN-5A administration may contribute to the observed improvements in liver metabolism.
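      To make the magnitude of these log2 fold changes easier to read, the short sketch below converts each reported value to a linear fold change and a percent change using fold change = 2^(log2FC). The dictionary keys and function names are illustrative labels introduced here; only the four numeric values come from the response.

```python
def fold_change(log2_fc: float) -> float:
    """Linear fold change corresponding to a log2 fold change."""
    return 2.0 ** log2_fc


def percent_change(log2_fc: float) -> float:
    """Percent change relative to the reference group."""
    return (fold_change(log2_fc) - 1.0) * 100.0


# log2FC values quoted in the response; the keys are descriptive labels only.
reported_log2fc = {
    "Mapk8, Sucrose vs Control": 0.02,
    "Mapk9, Sucrose vs Control": -0.43,
    "Mapk8, Sucrose+JNK_D1 vs Sucrose": -0.37,
    "Mapk9, Sucrose+JNK_D1 vs Sucrose": 0.42,
}

if __name__ == "__main__":
    for label, value in reported_log2fc.items():
        print(f"{label}: fold change {fold_change(value):.2f}, "
              f"{percent_change(value):+.1f}%")
```

      For example, a log2FC of -0.43 corresponds to a fold change of roughly 0.74, and a log2FC of 0.42 to roughly 1.34, so the shifts in Mapk9 expression under the two comparisons are of similar size but opposite direction.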

      Author response image 1.

      The Spearman correlation between expression levels of Mapk8

      In skeletal muscle, the primary site for insulin-stimulated glucose uptake, insulin signaling is crucial for maintaining metabolic homeostasis [26]. Numerous studies have demonstrated that JNK activation promotes insulin resistance and that targeting JNK might be a promising therapeutic strategy for the treatment of metabolic diseases associated with insulin resistance, such as MAFLD [24]. In our study, while sucrose overconsumption did not significantly alter the mRNA levels of JNK isoforms in this tissue, administration of JNK-IN-5A at 30 mg/kg/day significantly reduced the expression of both Jnk1 and Jnk2 as well as genes involved in insulin signaling (Fig. 5). This suggests a potential interplay between JNK inhibition and insulin signaling pathways in the skeletal muscle, where inhibition of JNK activity may improve insulin sensitivity by modulating these pathways. However, it is also crucial to investigate the long-term effects of JNK-IN-5A administration and its broader impact on many other physiological processes regulated by the JNK pathway. These aspects will be a focus of our future studies.

      (3) The notations a and b in Figure S5 are missing.  

      Thank you for this constructive comment. We have corrected this in the revised figure S5.

      (4) Data S13 described in the figure legend for Figure 7 (lines 630 and 632) seems a mistake and should be Data S8.

      (5) The notations a, b, and c in Figure 7 are incorrect. The figure legend for Figure 7a doesn't seem to match the figure contents.

      We appreciate your attention to details regarding Fig. 7. We have corrected the reference and the figure legend in revised Fig. 7.

      References

      (1) Fujii, A., et al., Sucrose Solution Ingestion Exacerbates Dinitrofluorobenzene-Induced Allergic Contact Dermatitis in Rats. Nutrients, 2024. 16(12).

      (2) Sun, S., et al., High sucrose diet-induced dysbiosis of gut microbiota promotes fatty liver and hyperlipidemia in rats. J Nutr Biochem, 2021. 93: p. 108621.

      (3) Qi, S., et al., Inositol and taurine ameliorate abnormal liver lipid metabolism induced by high sucrose intake. Food Bioscience, 2024. 60: p. 104368.

      (4) Ramos-Romero, S., et al., The Buckwheat Iminosugar d-Fagomine Attenuates Sucrose-Induced Steatosis and Hypertension in Rats. Mol Nutr Food Res, 2020. 64(1): p. e1900564.

      (5) Ortiz, S.R. and M.S. Field, Sucrose Intake Elevates Erythritol in Plasma and Urine in Male Mice. J Nutr, 2023. 153(7): p. 1889-1902.

      (6) Beckmann, M., et al., Changes in the human plasma and urinary metabolome associated with acute dietary exposure to sucrose and the identification of potential biomarkers of sucrose intake. Mol Nutr Food Res, 2016. 60(2): p. 444-57.

      (7) He, X., et al., High Fat Diet and High Sucrose Intake Divergently Induce Dysregulation of Glucose Homeostasis through Distinct Gut Microbiota-Derived Bile Acid Metabolism in Mice. J Agric Food Chem, 2024. 72(1): p. 230-244.

      (8) Stephenson, E.J., et al., Chronic intake of high dietary sucrose induces sexually dimorphic metabolic adaptations in mouse liver and adipose tissue. Nat Commun, 2022. 13(1): p. 6062.

      (9) Mock, K., et al., High-fructose corn syrup-55 consumption alters hepatic lipid metabolism and promotes triglyceride accumulation. J Nutr Biochem, 2017. 39: p. 32-39.

      (10) Eryavuz Onmaz, D. and B. Ozturk, Altered Kynurenine Pathway Metabolism in Rats Fed Added Sugars. Genel Tıp Dergisi, 2022. 32(5): p. 525-529.

      (11) Gariani, K., et al., Eliciting the mitochondrial unfolded protein response by nicotinamide adenine dinucleotide repletion reverses fatty liver disease in mice. Hepatology, 2016. 63(4): p. 1190-204.

      (12) Togo, J., et al., Impact of dietary sucrose on adiposity and glucose homeostasis in C57BL/6J mice depends on mode of ingestion: liquid or solid. Mol Metab, 2019. 27: p. 22-32.

      (13) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (14) Faul, F., et al., G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods, 2007. 39(2): p. 175-91.

      (15) Bonapersona, V., et al., Increasing the statistical power of animal experiments with historical control data. Nat Neurosci, 2021. 24(4): p. 470-477.

      (16) Kendig, M.D., et al., Metabolic Effects of Access to Sucrose Drink in Female Rats and Transmission of Some Effects to Their Offspring. PLoS One, 2015. 10(7): p. e0131107.

      (17) Harris, R.B.S., Source of dietary sucrose influences development of leptin resistance in male and female rats. Am J Physiol Regul Integr Comp Physiol, 2018. 314(4): p. R598-R610.

      (18) Velasco, M., et al., Sexual dimorphism in insulin resistance in a metabolic syndrome rat model. Endocr Connect, 2020. 9(9): p. 890-902.

      (19) Maniam, J., C.P. Antoniadis, and M.J. Morris, The effect of early-life stress and chronic high-sucrose diet on metabolic outcomes in female rats. Stress, 2015. 18(5): p. 524-37.

      (20) Singh, R., et al., Differential effects of JNK1 and JNK2 inhibition on murine steatohepatitis and insulin resistance. Hepatology, 2009. 49(1): p. 87-96.

      (21) Sabapathy, K., et al., Distinct roles for JNK1 and JNK2 in regulating JNK activity and c-Jun-dependent cell proliferation. Mol Cell, 2004. 15(5): p. 713-25.

      (22) Zhao, G., et al., Jnk1 in murine hepatic stellate cells is a crucial mediator of liver fibrogenesis. Gut, 2014. 63(7): p. 1159-72.

      (23) Czaja, M.J., JNK regulation of hepatic manifestations of the metabolic syndrome. Trends Endocrinol Metab, 2010. 21(12): p. 707-13.

      (24) Solinas, G. and B. Becattini, JNK at the crossroad of obesity, insulin resistance, and cell stress response. Mol Metab, 2017. 6(2): p. 174-184.

      (25) Schattenberg, J.M., et al., JNK1 but not JNK2 promotes the development of steatohepatitis in mice. Hepatology, 2006. 43(1): p. 163-72.

      (26) Sylow, L., et al., The many actions of insulin in skeletal muscle, the paramount tissue determining glycemia. Cell Metab, 2021. 33(4): p. 758-780.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The study starts with the notion that in an AD-like disease model, ILC2s in the Rag1 knockout were expanded and contained relatively more IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s. This was confirmed in the Rag2 knock-out mouse model.

      By using a chimeric mouse model in which wild-type knock-out splenocytes were injected into irradiated Rag1 knock-out mice, it was shown that even though the adaptive lymphocyte compartment was restored, there were increased AD-like symptoms and increased ILC2 expansion and activity. Moreover, in the reverse chimeric model, i.e. injecting a mix of wild-type and Rag1 knock-out splenocytes into irradiated wild-type animals, it was shown that the Rag1 knock-out ILC2s expanded more and were more active. Therefore, the authors could conclude that the RAG1 mediated effects were ILC2 cell-intrinsic.

      Subsequent fate-mapping experiments using the Rag1Cre;reporter mouse model showed that there were indeed RAGnaïve and RAGexp ILC2 populations within naïve mice. Lastly, the authors performed multi-omic profiling, using single-cell RNA sequencing and ATAC-sequencing, in which a specific gene expression profile was associated with ILC2. These included well-known genes but the authors notably also found expression of Ccl1 and Ccr8 within the ILC2. The authors confirmed their earlier observations that in the RAGexp ILC2 population, the Th2 regulome was more suppressed, i.e. more closed, compared to the RAGnaïve population, indicative of the suppressive function of RAG on ILC2 activity. I do agree with the authors' notion that the main weakness was that this study lacks the mechanism by which RAG regulates these changes in ILC2s.

      The manuscript is very well written and easy to follow, and the compelling conclusions are well supported by the data. The experiments are meticulously designed and presented. I wish to commend the authors for the study's quality.

      Even though the study is compelling and well supported by the presented data, some additional context could increase the significance:

      (1) The presence of the RAGnaïve and RAGexp ILC2 populations raises some questions on the (different?) origin of these populations. It is known that there are different waves of ILC2 origin (most notably shown in the Schneider et al Immunity 2019 publication, PMID 31128962). I believe it would be very interesting to further discuss or possibly show if there are different origins for these two ILC populations.

      Several publications describe the presence and origin of ILC2s in/from the thymus (PMIDs 33432227 24155745). Could the authors discuss whether there might be a common origin for the RAGexp ILC2 and Th2 cells from a thymic lineage? If true that the two populations would be derived from different populations, e.g. being the embryonic (possibly RAGnaïve) vs. adult bone marrow/thymus (possibly RAGexp), this would show a unique functional difference between the embryonic derived ILC2 vs. adult ILC2.

      We agree with the Reviewer that our findings raise important questions about ILC ontogeny. These are areas of ongoing investigation for us, and it is our hope this study may inform further investigation by others as well.

      Regarding the Schneider et al study, we have considered the possibility that RAG expression may mark a particular wave of ILC2 origin. In that study, the authors used a tamoxifen-based inducible Cre strategy in their experiments to precisely time the lineage tracing of a reporter from the Rosa26 locus. Those lineage tracing mice would overlap genetically with the RAG lineage tracing mice we used in our current study, thus performing combined timed migration fate mapping and RAG fate mapping experiments would require creating novel mouse strains.

      Similarly, the possible influence of the thymic or bone marrow environment on RAG expression in ILCs is an exciting possibility. Perhaps there are signals common to those environments that can influence all developing lymphocytes, including not only T and B cells but also ILCs, with one consequence being induction of RAG expression. While assessing levels of RAG-experienced ILCs in these tissues using our lineage tracing mouse may hint at these possibilities, conclusive evidence would require more precise control over the timing of RAG lineage tracing than our current reagents allow (e.g. to control for induction in those environments vs migration of previously fate-mapped cells to those environments).

      To answer these questions directly, we are developing orthogonal lineage tracing mouse strains, which can report on both timing of ILC development and RAG expression, but these mice are not available yet. Given the limitations of our currently available reagents, we were careful to focus our manuscript on the skin phenotype and the more descriptive aspects of the RAG-induced phenotype. We have elaborated on these important questions and referenced all the studies noted by the Reviewer in the Discussion section as areas of future inquiry on lines 421-433.  

      (2) On line 104 & Figures 1C/G etc. the authors describe that in the RAG knock-out ILC2 are relatively more abundant in the lineage negative fraction. On line 108 they further briefly mentioned that this observation is an indication of enhanced ILC2 expansion. Since the study includes an extensive multi-omics analysis, could the authors discuss whether they have seen a correlation of RAG expression in ILC2 with regulation of genes associated with proliferation, which could explain this phenomenon?

      We thank the Reviewer for pointing out this opportunity to further correlate our functional and multiomic findings. To address this, we first looked deeper into our prior analyses and found that among the pathways enriched in GSEA analysis of differentially expressed genes (DEGs) between RAG<sup>+</sup> and RAG<sup>-</sup> ILC2s, one of the pathways suppressed in RAG<sup>+</sup> ILC2s was “GOBP_EPITHELIAL_CELL_PROLIFERATION” (Author response image 1). There are a few other gene sets present in other databases such as MSigDB with terms including “proliferation,” but these are often highly specific to a particular cell type and experimental or disease condition (e.g. tissue-specific cancers). We did not find any of these enriched in our GSEA analysis.

      Author response image 1.

      GSEA plot of GOBP epithelial proliferation pathway in RAG-experienced vs RAG-naïve ILC2s.

      The ability to predict cellular proliferation states from transcriptomic data is an area of active research, and there does not appear to be any universally accepted method to do this reliably. We found two recent studies (PMIDs 34762642; 36201535) that identified novel “proliferation signatures.” Since these gene sets are not present in any curated database, we repeated our GSEA analysis using a customized database with the addition of these gene sets. However, we did not find enrichment of these sets in our RAG+/- ILC2 DEG list. We also applied our GPL strategy integrating analysis of our epigenomic data to the proliferation signature genes, but we did not see any clear trend. Conversely, our GSEA analysis did not identify any enrichment for apoptotic signatures as a potential mechanism by which RAG may suppress ILC2s.

      Notwithstanding the limitations of inferring ILC2 proliferation states from transcriptomic and epigenomic data, our experimental data suggest RAG exerts a suppressive effect on ILC2 proliferation. To formally test the hypothesis that RAG suppresses proliferation in the most rigorous way, we feel new mouse strains are needed that allow simultaneous RAG fate mapping and temporally restricted fate mapping. We elaborate on this in new additions to the discussion on lines 421-433.

      Reviewer #2 (Public Review):

      Summary:

      The study by Ver Heul et al., investigates the consequences of RAG expression for type 2 innate lymphoid cell (ILC2) function. RAG expression is essential for the generation of the receptors expressed by B and T cells and their subsequent development. Innate lymphocytes, which arise from the same initial progenitor populations, are in part defined by their ability to develop in the absence of RAG expression. However, it has been described in multiple studies that a significant proportion of innate lymphocytes show a history of Rag expression. In compelling studies several years ago, members of this research team revealed that early Rag expression during the development of Natural Killer cells (Karo et al., Cell 2014), the first described innate lymphocyte, had functional consequences.

      Here, the authors revisit this topic, a worthwhile endeavour given the broad history of Rag expression within all ILCs and the common use of RAG-deficient mice to specifically assess ILC function. Focusing on ILC2s and utilising state-of-the-art approaches, the authors sought to understand whether early expression of Rag during ILC2 development had consequences for activity, fitness, or function. Having identified cell-intrinsic effects in vivo, the authors investigated the causes of this, identifying epigenetic changes associated with the accessibility of genes associated with core ILC2 functions.

      The manuscript is well written and does an excellent job of supporting the reader through reasonably complex transcriptional and epigenetic analyses, with considerate use of explanatory diagrams. Overall I think that the conclusions are fair, the topic is thought-provoking, and the research is likely of broad immunological interest. I think that the extent of functional data and mechanistic insight is appropriate.

      Strengths:

      - The logical and stepwise use of mouse models to first demonstrate the impact on ILC2 function in vivo and a cell-intrinsic role. Initial analyses show enhanced cytokine production by ILC2 from RAG-deficient mice. Then through two different chimeric mice (including BM chimeras), the authors convincingly show this is cell intrinsic and not simply as a result of lymphopenia. This is important given other studies implicating enhanced ILC function in RAG-/- mice reflect altered competition for resources (e.g. cytokines).

      - Use of Rag expression fate mapping to support analyses of how cells were impacted - this enables a robust platform supporting subsequent analyses of the consequences of Rag expression for ILC2.

      - Use of snRNA-seq supports gene expression and chromatin accessibility studies - these reveal clear differences in the data sets consistent with altered ILC2 function.

      - Convincing evidence of epigenetic changes associated with loci strongly linked to ILC2 function. This forms a detailed analysis that potentially helps explain some of the altered ILC2 functions observed in ex vivo stimulation assays.

      - Provision of a wealth of expression data and bioinformatics analyses that can serve as valuable resources to the field.

      We appreciate the strengths noted by the Reviewer for our study. We would like to especially highlight the last point about our single cell dataset and provision of supplemental data tables. Although our study is focused on AD-like skin disease and skin draining lymph nodes, we hope that our findings can serve as a valuable resource for future investigation into mechanisms of RAG modulation of ILC2s in other tissues and disease states.  

      Weaknesses:

      - Lack of insight into precisely how early RAG expression mediates its effects, although I think this is beyond the scale of this current manuscript. Really this is the fundamental next question from the data provided here.

      We thank the Reviewer for their recognition of the context of our current work and its future implications. We aimed to present compelling new observations within the scope of what our current data can substantiate. We believe answering the next fundamental question of the mechanisms by which RAG mediates its effects in ILC2s will require development of novel reagents. We are actively pursuing this, and we look forward to others building on our findings as well.

      - The epigenetic analyses provide evidence of differences in the state of chromatin, but there is no data on what may be interacting or binding at these sites, impeding understanding of what this means mechanistically.

      We thank the Reviewer for pointing out this aspect of the epigenomic data analysis and the opportunity to expand the scope of our manuscript. We performed additional analyses of our data to identify DNA binding motifs and infer potential transcription factors that may be driving the effects of a history of RAG expression that we observed. We hope that these additional data, analyses, and interpretation add meaningful insight for our readers.

      We first performed the analysis for the entire dataset and validated that the analysis yielded results consistent with prior studies (e.g. finding EOMES binding motifs as a marker in NK cells). Then, we examined the differences in RAG fate-mapped ILC2s. These analyses are in new Figure S10 and discussed on lines 277-316.  

      We also performed an analysis specifically on the Th2 locus, given the effects of RAG on type 2 cytokine expression. These analyses are in new Figure S12 and discussed on lines 366-378.

      - Focus on ILC2 from skin-draining lymph nodes rather than the principal site of ILC2 activity itself (the skin). This may well reflect the ease with which cells can be isolated from different tissues.

      We appreciate the Reviewer’s insight into the limitations of our study. Difficulties in isolating ILC2s from the skin were indeed a constraint in our study. In particular, we were unable to isolate enough ILC2s from the skin for stimulation and cytokine staining. Given that one of our main hypotheses was that RAG affects ILC2 function, we focused our studies on skin draining lymph nodes, which allowed measurement of the two main ILC2 functional cytokines, IL-5 and IL-13, as readouts in the key steady state and AD-like disease experiments.

      - Comparison with ILC2 from other sites would have helped to substantiate findings and compensate for the reliance on data on ILC2 from skin-draining lymph nodes, which are not usually assessed amongst ILC2 populations.

      We agree with the Reviewer that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J).

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly this additional data should be included in the manuscript, we are happy to add it. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused to where we have strong molecular, cellular, and phenotypic outcomes.

      Author response image 2.

      Comparison of immune reconstitution and ILC2 donor proportions in different tissues from BM chimeras. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2, CD90.2) and WT (CD45.2, CD90.1) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. The proportion of live cells that are donor-derived (CD45.2), host-derived (CD45.1), or parenchymal (CD45-) [above] and proportion of ILC2s that are from Rag1<sup>-/-</sup> (CD90.2) or WT (CD90.1) donors [below] for A,B) skin C,D) sdLN E,F) lung G,H) spleen and I,J) mLN.

      - The studies of how ILC2 are impacted are a little limited, focused exclusively on IL-13 and IL-5 cytokine expression.

      We agree with the reviewer that our functional readout on IL-5 and IL-13 is relatively narrow. However, this focused experimental design was based on several considerations. First, IL-5 and IL-13 are widely recognized as major ILC2 effector molecules (Vivier et al, 2018, PMID 30142344). Second, in the MC903 model of AD-like disease, we have previously shown a clear correlation between ILC2s, levels of IL-5 and IL-13, and disease severity as measured by ear thickness (Kim et al, 2013, PMID 23363980). Depletion of ILC2s led to decreased levels of IL-13 and IL-5 and correspondingly reduced ear inflammation. However, while ILC2s are also recognized to produce other effector molecules such as IL-9 and Amphiregulin, which are likely involved in human atopic dermatitis (Namkung et al, 2011, PMID 21371865; Rojahn et al, 2020, PMID 32344053), there is currently no evidence linking these effectors to disease severity in the MC903 model. Third, IL-13 is emerging as a key cytokine driving atopic dermatitis in humans (Tsoi et al, 2019, PMID 30641038). Drugs targeting the IL-4/IL-13 receptor (dupilumab), or IL-13 itself (tralokinumab, lebrikizumab), have shown clear efficacy in treating atopic dermatitis. Interestingly, drugs targeting more upstream molecules, like TSLP (tezepelumab) or IL-33 (etokimab), have failed in atopic dermatitis. Taken together, these findings from both mouse and human studies suggest IL-13 is a critical therapeutic target, and thus functional readout, in determining the clinical implications of type 2 immune activation in atopic dermatitis.

      Aside from effector molecules, other readouts such as surface receptors may be of interest in understanding the mechanism of how RAG influences ILC2 function. For example, IL-18 has been shown to be an important co-stimulatory molecule along with TSLP in driving production of IL-13 by cutaneous ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). Our multiomic analysis showed decreased IL-18 receptor regulome activity in RAG-experienced ILC2s, which may be a mechanism by which RAG suppresses IL-13 production. Ultimately, in that study the role of IL-18 in enhancing MC903-induced inflammation through ILC2s was via increased production of IL-13, which was one of our major functional readouts. To clearly define mechanisms like these will require generation of new mice to interrogate RAG status in the context of tissue-specific knockout of other genes, such as the IL-18 receptor. We plan to perform these types of experiments in follow up studies. Notwithstanding this, we have now included additional discussion on lines 476-508 to highlight why understanding how RAG impacts other regulatory and effector pathways would be an interesting area of future inquiry.

      Reviewer #3 (Public Review):

      In this study, Ver Heul et al. investigate the role of RAG expression in ILC2 functions. While RAG genes are not required for the development of ILCs, previous studies have reported a history of expression in these cells. The authors aim to determine the potential consequences of this expression in mature cells. They demonstrate that ILC2s from RAG1 or RAG2 deficient mice exhibit increased expression of IL-5 and IL-13 and suggest that these cells are expanded in the absence of RAG expression. However, it is unclear whether this effect is due to a direct impact of RAG genes or a consequence of the lack of T and B cells in this condition. This ambiguity represents a key issue with this study: distinguishing the direct effects of RAG genes from the indirect consequences of a lymphopenic environment.

      The authors focus their study on ILC2s found in the skin-draining lymph nodes, omitting analysis of tissues where ILC2s are more enriched, such as the gut, lungs, and fat tissue. This approach is surprising given the goal of evaluating the role of RAG genes in ILC2s across different tissues. The study shows that ILC2s derived from RAG-/- mice are more activated than those from WT mice, and RAG-deficient mice show increased inflammation in an atopic dermatitis (AD)-like disease model. The authors use an elegant model to distinguish ILC2s with a history of RAG expression from those that never expressed RAG genes. However, this model is currently limited to transcriptional and epigenomic analyses, which suggest that RAG genes suppress the type 2 regulome at the Th2 locus in ILC2s.

      We agree with the Reviewer that understanding the role of RAG in ILC2s across different tissues is an important goal. One of the primary inspirations for our paper was the clinical paradox that patients with Omenn syndrome, despite having profound adaptive T cell deficiency, develop AD with much greater penetrance than in the general population. Thus, there was always an appreciation for the likelihood that skin ILC2s have a unique proclivity towards the development of AD-like disease. Notwithstanding this, given the profound differences that can be found in ILC2s based on their tissue residence and disease state (as the Reviewer also points out below), we focused our investigations on characterizing the skin draining lymph nodes to better define factors underlying our initial observations of enhanced AD-like disease in Rag1<sup>-/-</sup> mice. While our findings in skin provoke the hypothesis that similar effects may be observed in other tissues and influence corresponding disease states, we were cautious not to suggest this may be the case by reporting surveys of other tissues without development of additional disease models to formally test these hypotheses. We present this manuscript now as a short, skin-focused study, rather than delaying publication to expand its scope. Truthfully, this project started in 2015 and has undergone many delays with the hopes of newer technologies and reagents coming to add greater clarity. We hope our study will enable others to pursue the goal of understanding the broader effects of RAG in ILC2s, and potentially other innate lymphoid lineages as well.

      We did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J). Given the lack of correlation to disease readouts in other organ systems, we chose not to include these data in our manuscript. However, if the Reviewer feels these data should be included, we would be happy to include them as a supplemental figure.

      The authors report a higher frequency of ILC2s in RAG-/- mice in skin-draining lymph nodes, which is expected as these mice lack T and B cells, leading to ILC expansion. Previous studies have reported hyper-activation of ILCs in RAG-deficient mice, suggesting that this is not necessarily an intrinsic phenomenon. For example, RAG-/- mice exhibit hyperphosphorylation of STAT3 in the gut, leading to hyperactivation of ILC3s. This study does not currently provide conclusive evidence of an intrinsic role of RAG genes in the hyperactivation of ILC2s. The splenocyte chimera model is artificial and does not reflect a normal environment in tissues other than the spleen. Similarly, the mixed BM model does not demonstrate an intrinsic role of RAG genes, as RAG1-/- BM cells cannot contribute to the B and T cell pool, leading to an expected expansion of ILC2s. As the data are currently presented, it is expected that a proportion of IL-5-producing cells will come from the RAG1-/- BM.

      The Reviewer raises an important point about the potential cell-intrinsic roles of RAG vs the many cell-extrinsic explanations that could affect ILC2 populations, with the most striking being the lack of T and B cells in RAG knockout mice. It is well-established that splenocyte transfer into T and B cell-deficient mice reconstitutes T cell-mediated effects (such as the T cell transfer colitis model pioneered by Powrie and others), and we were careful in our interpretation of the splenocyte chimera experiment to conclude only that lack of Tregs was unlikely to explain the enhanced AD-like disease in T (and B) cell-deficient mice.

      We agree with the Reviewer that the Rag1<sup>-/-</sup> BM will not contribute to the B and T cell pool. However, BM from the WT mice would be expected to contribute to development of the adaptive lymphocyte pool. Indeed, we found that most of the CD45<sup>+</sup> immune cells in the spleens of BM chimera mice were donor-derived (Author response image 3A), and total levels of B cells and T cells showed reconstitution in a pattern similar to control spleens from donor WT mice, while spleens from donor Rag1<sup>-/-</sup> mice expectedly had essentially no detectable adaptive lymphocytes (Author response image 3B-D). From this, we concluded the BM chimera experiment was successful in establishing an immune environment with the presence of adaptive lymphocytes, and the differences in ILC2 proportions we observed were in the context of developing alongside a normal number of B and T lymphocytes. Notwithstanding the potential role of the adaptive lymphocyte compartment in shaping ILC2 development, since we transplanted equal amounts of WT and Rag1<sup>-/-</sup> BM into the same recipient environment, we are not able to explain how cell-extrinsic effects alone would account for the unequal numbers of WT vs Rag1<sup>-/-</sup> ILC2s we observed after immune reconstitution.

      Author response image 3.

      Comparison of immune reconstitution in BM chimeras to controls. Equal quantities of bone marrow cells from Rag1<sup>-/-</sup> (CD45.2) and WT (CD45.2) C57Bl/6J donor mice were used to reconstitute the immune systems of irradiated recipient WT (CD45.1) C57Bl/6J mice. A) Number of WT recipient CD45.1+ immune cells in the spleens of recipient mice compared to number of donor CD45.2+ cells (WT and Rag1<sup>-/-</sup>) normalized to 100,000 live cells. Comparison of numbers of B cells, CD4+ T cells, and CD8+ T cells in spleens of B) BM chimera mice, C) control WT mice and D) control Rag1<sup>-/-</sup> mice.

      We also subsequently found transcriptional and epigenomic differences in RAG-experienced ILC2s compared to RAG-naïve ILC2s. Critically, these differences were present in ILC2s from the same mice that had developed normally within an intact immune system, rather than in the setting of a BM transplant or a defective immune background such as in Rag1<sup>-/-</sup> mice.

      We recognize that there are almost certainly cell-extrinsic factors affecting ILC2s in Rag1<sup>-/-</sup> mice due to lack of B and T cells, and that BM chimeras are not perfect substitutes for simulating normal hematopoietic development. However, the presence of cell-extrinsic effects does not negate the potential contribution of cell-intrinsic factors as well, and we respectfully stand by our conclusion that our data support a role, however significant, for cell-intrinsic effects of RAG in ILC2s.

      Finally, the Reviewer mentions the interesting observation that gut ILC3s exhibit hyperphosphorylation of STAT3 in Rag1<sup>-/-</sup> mice compared to WT as an example of cell-extrinsic effects of RAG deficiency (we assume this is in reference to Mao et al, 2018, PMID 29364878 and subsequent work). We now reference this paper and have included additional discussion on how our observations of ILC2s may be generalizable to not only other organ systems, but also other ILC subsets, limitations on these generalizations, and future directions on lines 477-520.

      Overall, the level of analysis could be improved. Total cell numbers are not presented, the response of other immune cells to IL-5 and IL-13 (except the eosinophils in the splenocyte chimera mice) is not analyzed, and the analysis is limited to skin-draining lymph nodes.

      We thank the Reviewer for the suggestions to add rigor to our analysis. ILC2 populations are relatively rare, and we designed our experiments to assess frequencies, rather than absolute numbers. We did not utilize counting beads, so our counts may not be comparable between samples. We have added additional data for absolute cell counts normalized to 100,000 live cells for each experiment (see below for a summary of new panels in each figure). Our new data on total cell numbers are consistent with the initial observations regarding frequency of ILC2s we reported from our experiments. For the BM chimera experiments, we presented the proportions of ILC2s, and IL-5 and IL-13 positive ILC2s, by donor source, as this is the critical question of the experiment. Notwithstanding our analysis by proportion, we found that the frequency of Rag1<sup>-/-</sup> ILC2s, IL-5<sup>+</sup> cells, or IL-13<sup>+</sup> cells within the Lin- population was also significantly increased. While our initial submission included only the proportions for clarity and simplicity, we now include frequency and absolute numbers in new panels for more critical appraisal of our data by readers.

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #2). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.  
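
      The normalization described above (counts expressed per 100,000 live cells, alongside frequencies within a parent gate) amounts to simple scaling. The following is a minimal Python sketch under stated assumptions: the function names and the event counts are hypothetical illustrations and are not taken from the study.

```python
def per_100k_live(population_events: int, live_events: int) -> float:
    """Scale a gated population count to cells per 100,000 live cells."""
    if live_events <= 0:
        raise ValueError("live_events must be positive")
    # Multiply before dividing to limit floating-point rounding.
    return population_events * 100_000 / live_events


def percent_of_parent(population_events: int, parent_events: int) -> float:
    """Frequency of a population as a percentage of its parent gate."""
    if parent_events <= 0:
        raise ValueError("parent_events must be positive")
    return population_events * 100 / parent_events


# Hypothetical sample: 250,000 live events, 5,000 Lin- events, 50 ILC2 events.
ilc2_per_100k = per_100k_live(50, 250_000)           # 20.0
ilc2_pct_of_lin_neg = percent_of_parent(50, 5_000)   # 1.0
```

      Scaling to a fixed number of live cells puts samples with different acquisition sizes on a common footing, although without counting beads it does not yield absolute cell numbers per tissue.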

      In terms of the limited analysis of other tissues, our initial observation of enhanced AD-like disease in Rag1<sup>-/-</sup> compared to WT mice built on our prior work elucidating the role of ILC2s in the MC903 model of AD-like disease in mice and AD in humans (Kim et al, 2013, PMID 23363980). Consequently, we focused on the skin to further develop our understanding of the role of RAG1 in this model. As in our prior studies, technical limitations in obtaining sufficient numbers of ILC2s from the skin itself for ex vivo stimulation to assess effector cytokine levels required performing these experiments in the skin draining lymph nodes.

      We agree that IL-5 and IL-13 are major mediators of type 2 pathology and studying their effects on immune cells is an important area of inquiry, particularly since there are multiple drugs available or in development targeting these pathways. However, our goal was not to study what was happening downstream of increased cytokine production from ILC2s, but instead to understand what was different about RAG-deficient or RAG-naïve ILC2s themselves that drive their expansion and production of effector cytokines compared to RAG-sufficient or RAG-experienced ILC2s. By utilizing the same MC903 model in which we previously showed a critical role for ILC2s in driving IL-5 and IL-13 production and subsequent inflammation in the skin, we were able to instead focus on defining the cell-intrinsic aspects of RAG function in ILC2s.

      The authors have a promising model in which they can track ILC2s that have expressed RAG or not. They need to perform a comprehensive characterization of ILC2s in these mice, which develop in a normal environment with T and B cells. Approximately 50% of the ILC2s have a history of RAG expression. It would be valuable to know whether these cells differ from ILC2s that never expressed RAG, in terms of proliferation and expression of IL-5 and IL-13. These analyses should be conducted in different tissues, as ILC2s adapt their phenotype and transcriptional landscape to their environment. Additionally, the authors should perform their AD-like disease model in these mice.

      We agree with the Reviewer (and a similar comment from Reviewer #2) that a broader survey of the RAG-mediated phenotype in other tissues and by extension other disease models would strengthen the generalizability of our observations. Indeed, we did a more expansive survey of tissues in our BM chimera experiments. We found a similar trend to our reported findings in the sdLN in tissues known to be affected by ILC2s (Author response image 2) including the skin and lung and in other lymphoid tissues including spleen and mesenteric lymph nodes (mLN). We found that donor reconstitution in each tissue was robust except for the skin, where there was no significant difference between host and donor CD45<sup>+</sup> immune cells and where CD45<sup>-</sup> parenchymal cells predominated (Author response image 2A,C,E,G,I). This may explain why Rag1<sup>-/-</sup> donor ILC2s were significantly higher in proportion in all tissues except the skin, where we observed a similar trend that was not statistically significant (Author response image 2B,D,F,H,J). We omitted these analyses to maintain the focus on the skin, but we will be happy to add these data to the manuscript if the Reviewer feels this figure would be helpful.

      Notwithstanding these results, given that we unexpectedly observed enhanced AD-like inflammation in the MC903 model in Rag1 KO mice, we concentrated our later experiments and analyses on defining the differences in skin draining ILC2s modulated by RAG. Our subsequent findings in the skin provoke many new hypotheses about the role of RAG in ILC2s in other tissues, and our tissue survey in the BM chimera provides additional rationale to pursue similar studies in disease models in other tissues. While this is an emerging area of investigation in our lab, we opted to focus this manuscript on our findings related to the AD-like disease model. We have ongoing studies to investigate other tissues, and we are still in the early stages of developing disease models to expand on these findings. However, if the reviewer feels strongly this additional data should be included in the manuscript, we are happy to add it. Considering the complexity of the data and concepts in the manuscript, we hoped to keep it focused to where we have strong molecular, cellular, and phenotypic outcomes. We elaborate on the implications of our work for future studies, including limitations of our study and currently available reagents and need for new mouse strains to rigorously answer these questions on lines 476-508.

      The authors provide a valuable dataset of single-nuclei RNA sequencing (snRNA-seq) and ATAC sequencing (snATAC-seq) from RAGexp (RAG fate map-positive) and RAGnaïve (RAG fate map-negative) ILC2s. This elegant approach demonstrates that ILC2s with a history of RAG expression are epigenomically suppressed. However, key genes such as IL-5 and IL-13 do not appear to be differentially regulated between RAGexp and RAGnaïve ILC2s according to Table S5. Although the authors show that the regulome activity of IL-5 and IL-13 is decreased in RAGexp ILC2s, how do the authors explain that these genes are not differentially expressed between the RAGexp and RAGnaïve ILC2? I think that it is important to validate this in vivo.

      We thank the Reviewer for highlighting the value and possible elegance of our data. The Reviewer brings up an important issue that we grappled with in this study and that highlights a major technical limitation of single cell sequencing studies. Genes for secreted factors such as cytokines are often transcribed at low levels and are poorly detected in transcriptomic studies. This is particularly true in single cell studies with lower sequencing depth. Various efforts have been made to overcome these issues such as computational approaches to estimate missing data (e.g. van Dijk et al, 2018, PMID 29961576; Huang et al, 2018, PMID 29941873), or recent use of cytokine reporter mice and dial-out PCR to enhance key cytokine signals in sequenced ILCs (Bielecki et al, 2021, PMID 33536623). We did not utilize computational methods to avoid the risk of introducing artifacts into the data, and we did not perform our study in cytokine reporter mice. Thus, cytokines were poorly detected in our transcriptomic data, as evidenced by lack of identification of cytokines as markers for specific clusters (e.g. IL-5 for ILC2s) or significant differential expression between RAG-naïve and RAG-experienced ILC2s.

      However, the multiomic features of our data allowed a synergistic analysis to identify effects on cytokines. For example, transcripts for IL-4 and IL-5 were not detected at a high enough level to qualify as marker genes of the ILC2 cluster in the gene expression (GEX) assay but were identified as markers for the ILC2 cluster in the ATAC-seq data in the differentially accessible chromatin (DA) assay. Using the combined RNA-seq and ATAC-seq gene to peak links (GPL) analyses, many GPLs were identified in the Th2 locus for ILC2s, including for IL-13, which was not identified as a marker for ILC2s by any of the assays alone. Thus, our combined analysis took advantage of the potential of multiomic datasets to overcome a general weakness inherent to most scRNAseq datasets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Line 168; Reference 23 also showed expression in the NK cells, please add this reference to reference 24.

      We thank the reviewer for catching this oversight, and we have corrected it in the revised manuscript.

      - Please add the full names for GPL and sdLN in the text of the manuscript when first using these abbreviations. They are now only explained in the legends.

      We reviewed the manuscript text and found that we defined sdLNs for the first time on line 104. We defined GPLs for the first time on line 248. We believe these definitions are placed appropriately near the first references to the corresponding figures/analysis, but if the Reviewer believes we should move these definitions earlier, we are happy to do so.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest that the following reanalyses would improve the clarity of the data:

      - Can ILC2 numbers, rather than frequency, be used (e.g. in Figure 1C, S2B, and so on). This would substantiate the data that currently relies on percentages.

      This was a weakness also noted by Reviewer #3. We have added data on ILC2 numbers for each experiment as outlined below:

      In New Figure 1, we added new panels for ILC2 cell number in both the AD-like disease experiment (C) and in steady state (H).

      In New Figure S2, we added a panel for ILC2 cell number in steady state (B).

      In Figure 2 and associated supplemental data in Figure S4, we added several more panels. For the splenocyte chimera, we added a panel for ILC2 cell number in New Figure 2C.

      We incorporated multiple new panels in New Figure S4 to address the need for more data to be shown for the BM chimera (also requested by Reviewer #3). These included total cell counts and frequency for ILC2 (New Figure S4F,G), and IL-5<sup>+</sup> (New Figure S4I,K) and IL-13<sup>+</sup> (New Figure S4J,L) ILCs in addition to the proportions originally presented in Figure 2.

      - Can the authors provide data on IL-33R expression on sdLN ILC2s? Expression of ST-2 (IL-33R) does vary between ILC2 populations and is impacted by the digestion of tissue. All of the data provided here requires ILC2 to be IL-33R<sup>+</sup>. In the control samples, the ILC2 compartment is very scarce - in LNs, ILC2s are rare. The gating strategy with limited resolution of positive and negative cells in the lineage gate doesn't help this analysis.

      The Reviewer raises a valid point regarding the IL-33R marker and ILC2s. We designed our initial experiments to be consistent with our earlier observations of skin ILC2s, which were defined as CD45<sup>+</sup>Lin-CD90+CD25+IL-33R+, and the scarcity of skin draining lymph node ILC2s at steady state was consistent with our prior findings (Kim et al, 2013, PMID 23363980). We can include MFI data on IL-33R expression in these cells if the reviewer feels strongly that this would add to the manuscript, but we did not include other ILC2-specific markers in these experiments that would give us an alternative total ILC2 count to calculate frequency of IL-33R<sup>+</sup> ILC2s, which would also make the context of the IL-33R MFI difficult to interpret.

      Other studies defining tissue-specific expression patterns in ILC2s have called into question whether IL-33R is a reliable marker to define skin ILC2s (Ricardo-Gonzalez et al, 2018, PMID 30201992). However, there is evidence for region-specific expression of IL-33R (Kobayashi et al, 2019, PMID 30712873), with ILC2s in the subcutis expressing high levels of IL-33R and both IL-5 and IL-13, while ILC2s in the epidermis and dermis have low levels of IL-33R and IL-5 expression. In contrast to the Kobayashi et al study, Ricardo-Gonzalez et al sequenced ILC2s from whole skin, thus the region-specific expression patterns were not preserved, and the lower expression of IL-33R in the epidermis and dermis may have diluted the signal from the ILC2s in the subcutis. These may also be the ILC2s most likely to drain into the lymph nodes, which is the tissue on which we focused our analyses (consistent with our prior work in Kim et al, 2013).

      - In Figure 2 (related to 2H, 2I) can flow plots of the IL-5 versus IL-13 gated on either CD90.1+CD45.2+ or CD90.2+CD45.2+ ILC2 be shown? I.e. gate on the ILC2s and show cytokine expression, rather than the proportion of donor IL5/13. The proportion of donor ILC2 is shown to be significantly higher in 2G. Therefore gating on the cells of interest and showing on a cellular basis their ability to produce the cytokines would better make the point I think.

      We agree that this is important additional data to include. We have added flow plots of sdLN ILC2s from the BM chimera divided by donor genotype showing IL-5 and IL-13 expression in New Figure S4H.

      I assume the authors have looked and there is no obvious data, but does analysis of transcription factor consensus binding sequences in the open chromatin provide any new insight?

      The Reviewer also commented on this in the public review. As copied from our response above:

      We found that the most enriched sites in the ILC2 gene loci contained the consensus sequence GGGCGG (or its reverse complement), a motif recognized by a variety of zinc finger transcription factors (TFs). Our analyses predicted the KLF family of zinc finger TFs as most likely to be enriched at the identified open chromatin regions. To infer which KLFs might be occupying these sites in the RAG-experienced or RAG-naïve cells, we also assessed the expression levels of these identified TFs. Interestingly, KLF2 and KLF6 are more highly expressed in RAG-experienced ILC2s. KLF6 is a tumor suppressor (PMID: 11752579), and both KLF6 and KLF2 were recently shown to be markers of “quiescent-like” ILCs (PMID: 33536623). Further, upon analysis of the Th2 locus, the (A/T)GATA(A/G) consensus site (or its reverse complement) was enriched in identified open chromatin at that locus. The algorithm predicted multiple TFs from the GATA family as possible binding partners, but expression analysis showed only GATA3 was highly expressed in ILC2s, consistent with what would be predicted from prior studies (PMID: 9160750).

      We have added this data in new Figure S10 and new Figure S12, with corresponding text in the Results section on lines 277-316 and lines 366-378.
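      As an illustration of what searching for a consensus site "or its reverse complement" involves, the following is a minimal sketch, not the motif-enrichment algorithm used in the manuscript. It scans a DNA string on both strands for the two consensus sites quoted above. The function names, the regular-expression encoding of (A/T)GATA(A/G), and the example sequence are assumptions made for illustration only.

```python
import re

# Regular-expression encodings of the two consensus sites quoted in the
# response. The dictionary keys are illustrative labels, not manuscript terms.
MOTIFS = {
    "GC_box": "GGGCGG",        # GGGCGG
    "GATA": "[AT]GATA[AG]",    # (A/T)GATA(A/G)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, pattern: str):
    """Return sorted (start, strand) pairs for matches on either strand.

    `start` is the 0-based position of the leftmost matched base on the
    forward strand. A lookahead is used so overlapping sites are reported.
    """
    seq = seq.upper()
    n = len(seq)
    regex = re.compile(f"(?=({pattern}))")
    hits = set()
    for m in regex.finditer(seq):
        hits.add((m.start(), "+"))
    for m in regex.finditer(reverse_complement(seq)):
        # Convert the minus-strand coordinate back to forward coordinates.
        hits.add((n - m.start() - len(m.group(1)), "-"))
    return sorted(hits)


# Hypothetical sequence, not taken from any dataset in the manuscript.
example = "TTGGGCGGAACCGCCCTT"
gc_hits = find_sites(example, MOTIFS["GC_box"])
```

      A real enrichment analysis would additionally compare site counts in open chromatin against a background model; this sketch only shows the strand-aware matching step.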

      In terms of phrasing and presentation:

      - It would help to provide some explanation of why all analyses focus on the draining LNs rather than the actual site of inflammation (the ear skin). I do not think it appropriate to ask for data on this as this would require extensive further experimentation, but there should be some discussion on this topic. This feels relevant given that the skin is the site of inflammatory insult and ILC2 is present here. How the ILC2 compartment in the skin-draining lymph nodes relates to those in the skin is not completely clear, particularly given the prevailing dogma that ILC2 are tissue-resident.

      Given limitations of assessing cytokine production of the relatively rare population of skin-resident ILC2s, we focused on the skin-draining lymph nodes (sdLN). Our findings in the current manuscript are consistent with our prior work in Kim et al, 2013 (PMID 23363980), and more recently in Tamari et al, 2024 (PMID 38134932), which demonstrated correlation of increased ILC2s in sdLN with increased skin inflammation in the MC903 model. Similarly, Dutton et al (PMID 31152090) have demonstrated expansion of the sdLN ILC2 pool in response to MC903-induced AD-like inflammation in mice. We elaborate on the implications of our work for future studies, the limitations of our study (including the focus on the sdLN), and the currently available reagents and need for new mouse strains to rigorously answer these questions on lines 476-508.

      - I think the authors should explicitly state that cytokine production is assessed after ex vivo restimulation (e.g. Lines 112-113).

      We have added this statement to the revised text.

      - I also think that it would help to be consistent with axis scales where analyses are comparable (e.g. Figure 1D vs Figure 1H).

      We agree with the Reviewer and we have adjusted the axes for consistency. The data remains unchanged, but axes are slightly adjusted in New Figure 1 (D&I, E&J, F&K) and New Figure S2 (C-E match New Figure 1 D-F). This same axis scaling scheme is carried forward to New Figure 2 (D-E) and New Figure S4 (G,K,L). New data on cell counts is also included per request by Reviewers 2 and 3 (see above). However, we found results for total cells, including ILC2s (New Figure 1C,H, New Figure S2B, New Figure 2C, New Figure S4F), were consistent within experiments, but not between experiments, likely representing issues with normalizing counts (we did not include counting beads for more accurate total counts). Thus, the y-axes in those panels are not consistent between experiments/figures.

      We feel reporting the proportion of WT vs Rag1<sup>-/-</sup> donor cells for the BM chimera is most illustrative of the effect of RAG and have kept it in the main New Figure 2, but for the BM chimera experiment panels we also include the total counts of IL-5<sup>+</sup> and IL-13<sup>+</sup> ILC2s (New Figure S4I,J).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given that KRAS inhibition approaches are a relatively new innovation and that resistance is now being observed to such therapies in patients with NSCLC, investigation of combination therapies is valuable. The manuscript furthers our understanding of combination therapy for KRAS mutant non-small cell lung cancer by providing evidence that combined inhibition of ULK1/2 (and therefore autophagy) and KRAS can inhibit KRAS-mutant lung cancer growth. The manuscript will be of interest to the lung cancer community but also to researchers in other cancer types where KRAS inhibition is relevant.

      Strengths:

      The manuscript combines cell line, cell line-derived xenograft, and genetically-engineered mouse model data to provide solid evidence for the proposed combination therapy.  The manuscript is well written, and experiments are broadly well performed and presented.

      We thank Reviewer #1 (R1) for the generally favorable review of our manuscript, and also for the more detailed critique that identifies potential weaknesses in the research, which we address on a point-by-point basis below. 

      Weaknesses:

      With 3-4 mice per group in many experiments, experimental power is a concern and some comparisons (e.g. mono vs combination therapy) seem to be underpowered to detect a difference. Both male and female mice are used in experiments which may increase variability.

      We thank R1 for pointing out concerns regarding statistical power in our various mouse models of NSCLC experiments, and agree that more mice per group would certainly increase statistical power.  However, there are certain logistical considerations that impact the generation of cohorts of experimental KrasLSL-G12C mice.  Because mice homozygous for the KrasLSL-G12C allele display embryonic lethality, we are required to generate experimental mice by crossing heterozygous male and female KrasLSL-G12C mice.  Although 66% of the progeny of such crosses are predicted to be KrasLSL-G12C/+, experience tells us that we only obtain ~40-50% heterozygous KrasLSL-G12C/+ mice with litter sizes around 6-8 mice from such crosses.  Therefore, there are usually only about 4 heterozygous KrasLSL-G12C mice per litter, which presents a substantial challenge in generating larger cohorts of age-matched mice suitable for experiments, especially under conditions where we wish to euthanize mice at multiple time points for analysis.  For the GEM model experiments, Figure 3B is the only experiment that has n=3.  All other experiments contain 4-6 mice per experimental condition.  We rationalized using both male and female mice because both human males and females have high lung cancer rates.
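      The breeding arithmetic above can be written out explicitly. The sketch below is only a worked example under textbook Mendelian assumptions (a heterozygous x heterozygous cross with the homozygous class lost to embryonic lethality). The function names are hypothetical, and the litter sizes and observed heterozygote fractions are the ranges quoted in the response.

```python
from fractions import Fraction


def viable_het_fraction() -> Fraction:
    """Expected heterozygote fraction among live-born pups of a het x het cross
    when the homozygous mutant class is embryonic lethal."""
    genotypes = {
        "+/+": Fraction(1, 4),
        "LSL/+": Fraction(1, 2),
        "LSL/LSL": Fraction(1, 4),  # embryonic lethal, never recovered
    }
    viable = {g: p for g, p in genotypes.items() if g != "LSL/LSL"}
    return viable["LSL/+"] / sum(viable.values())


def expected_hets(litter_size: int, het_fraction: float) -> float:
    """Expected number of heterozygous pups in one litter."""
    return litter_size * het_fraction


predicted = viable_het_fraction()   # 2/3, the "66%" quoted in the response
low = expected_hets(6, 0.40)        # smallest litter, lowest observed fraction
high = expected_hets(8, 0.50)       # largest litter, highest observed fraction
```

      With the quoted ranges this gives roughly 2.4 to 4 heterozygotes per litter, so the figure of about 4 sits at the upper end of what a single litter is expected to yield.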

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Ghazi et al. reported that inhibition of KRASG12C signaling increases autophagy in KRASG12C-expressing lung cancer cells. Moreover, the combination of DCC 3116, a selective ULK1/2 inhibitor, plus sotorasib displays cooperative/synergistic suppression of human KRASG12C-driven lung cancer cell proliferation in vitro and tumor growth in vivo. Additionally, in genetically engineered mouse models of KRASG12C-driven NSCLC, inhibition of either KRASG12C or ULK1/2 decreases tumor burden and increases mouse survival. Additionally, this study found that LKB1 deficiency diminishes the sensitivity of KRASG12C/LKB1Null-driven lung cancer to the combination treatment, perhaps through the emergence of mixed adeno/squamous cell carcinomas and mucinous adenocarcinomas.

      Strengths:

      Both human cancer cells and mouse models were employed in this study to illustrate that inhibiting ULK1/2 could enhance the responsiveness of KRASG12C lung cancer to sotorasib. This research holds translational importance.

      We thank Reviewer #2 (R2) for the generally favorable review of our manuscript, and also for the more detailed critique that identifies potential weaknesses in the research, which we address on a point-by-point basis below. 

      Weaknesses:

      Additional validation of certain data is necessary.

      (1) mCherry-EGFP-LC3 reporter was used to assess autophagy flux in Figure 1A. Please explain how autophagy status (high, medium, and low) was defined. It's also suggested to show WB of LC3 processing in different treatments as in Figure 1A at 48 hours.

      We thank the reviewer for this comment and agree that a more thorough description of how autophagy status is assessed using the Fluorescent Autophagy Reporter (FAR) would benefit the readers of our manuscript.  Cells engineered to express the FAR are analyzed by flow cytometry, in which we define autophagy status by gating viable (based on Sytox Blue staining), DMSO-treated control cells into three bins based on the ratio of EGFP:mCherry fluorescence.  We gate all live cells into the 33% highest EGFP-positive cells (autophagy low) and the 33% highest mCherry-positive cells (autophagy high), and therefore, the proportion in the middle is also approximately 33% and considered the medium autophagy status.  Again, these gates are based entirely on the DMSO-treated control cells, and all other treatments within the experiment are compared using these gate settings.  In response to a specific manipulation (sotorasib, trametinib, DCC-3116, etc.) we assess how the treatment changes the percentages of cells in each of the pre-specified gates to detect increased autophagy (decreased EGFP:mCherry ratio) or decreased autophagy (increased EGFP:mCherry ratio).
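      The gating logic described above can be sketched in a few lines. This is a simplified illustration, not the authors' flow cytometry workflow: it assumes each viable cell has already been reduced to a single EGFP:mCherry ratio, places the two gate boundaries at the tertiles of the DMSO control, and then applies those fixed boundaries to a treated sample. The function names and all numeric values are hypothetical.

```python
def tertile_gates(control_ratios):
    """Derive two EGFP:mCherry cut-offs from the DMSO control so that roughly
    one third of control cells fall into each autophagy bin."""
    ordered = sorted(control_ratios)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three control cells to define gates")
    low_cut = ordered[n // 3]          # ratios below this: autophagy high
    high_cut = ordered[(2 * n) // 3]   # ratios at or above this: autophagy low
    return low_cut, high_cut


def classify(ratios, gates):
    """Return the percentage of cells in each bin using pre-specified gates."""
    if not ratios:
        raise ValueError("no cells to classify")
    low_cut, high_cut = gates
    counts = {"high": 0, "medium": 0, "low": 0}
    for r in ratios:
        if r < low_cut:
            counts["high"] += 1      # low EGFP:mCherry ratio = high autophagy
        elif r >= high_cut:
            counts["low"] += 1       # high EGFP:mCherry ratio = low autophagy
        else:
            counts["medium"] += 1
    return {name: 100.0 * c / len(ratios) for name, c in counts.items()}


# Hypothetical ratios for illustration only; not measured data.
dmso = [1, 2, 3, 4, 5, 6, 7, 8, 9]
treated = [1, 2, 3, 3.5, 2, 5, 8, 1, 1]

gates = tertile_gates(dmso)            # gates are set on the control only
dmso_bins = classify(dmso, gates)      # about one third in each bin
treated_bins = classify(treated, gates)
```

      Because the gates are fixed on the control, a treatment that lowers the EGFP:mCherry ratio shifts cells into the "high" bin, which is how an increase in autophagy would be read out in this simplified picture.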

      Although LC3 processing and/or the expression of p62SQSTM1 are used by others as markers of autophagy, there is much debate in the literature as to how reliable immunoblotting analysis of LC3 processing or p62SQSTM1 expression are as measures of autophagy.  Certainly, in our hands, we find that the Fluorescent Autophagy Reporter is a much more sensitive measure of changes in autophagy in various different cancer cell lines as we have described in previous papers (Kinsey et al., PMID: 30833748, Truong et al., PMID: 32933997 and Silvis & Silva et al., PMID: 36719686).  Furthermore, in the omnibus publication that describes techniques for measuring autophagy (Klionsky et al., PMID: 33634751) the use of the FAR (or similarly configured reporters) is regarded as the gold standard for measuring autophagy status in cells.  We have amended the Materials & Methods section of our manuscript to better describe the use of the FAR in measuring autophagy. 

      (2) For Figures 1J, K, and L, please provide immunohistochemistry (IHC) images demonstrating RAS downstream signaling blockade by sotorasib and autophagy blockade by DCC 3116 in tumors.

      We thank the reviewer for the comment and have probed the tumors from the xenograft experiments in Figures 1J, K, and L for pERK1/2 and p62SQSTM1 to determine the biochemical activity of sotorasib or DCC-3116, respectively and have provided representative images below. We observed the expected decrease in pERK and p62 signal after sotorasib treatment in all three xenografted cell lines. We did observe the expected accumulation of p62 in the DCC-3116 treated tumors from the NCI-H2122 and NCI-H358 cell lines. There appears to be no difference between the vehicle and DCC-3116 treated tumors in the NCI-H358 cell line-derived tumors as detected by IHC.

      Author response image 1.

      (3) Given that both DCC 3116 and ULK1K46N exhibit the ability to inhibit autophagy and synergize with sotorasib in inhibiting cell proliferation, in addition to demonstrating decreased levels of pATG13 via ELISA assay, please include Western blot analyses of LC3 or p62 to confirm the blockade of autophagy by DCC 3116 and ULK1K46N in Figure 1 & Figure 2.

      We appreciate the reviewer's comment and have performed an immunoblot analysis of cells treated with DCC-3116 or expressing ULK1K46N and probed for p62SQSTM1 and LC3 expression.  We did observe the expected accumulation of p62SQSTM1 in NCI-H2122 (ULK1K46N) cells treated with 1 µg/ml doxycycline to induce expression of ULK1K46N compared to DMSO treatment.  Additionally, we treated the human cell lines from Figure 1 with sotorasib and/or DCC-3116 and tested for p62SQSTM1 expression after 48 hours of treatment. In the human cell lines NCI-H2122 and NCI-H358, there was a decrease in the p62 signal with increasing doses of sotorasib, as expected. There was no detectable change in p62 levels in the Calu-1 cells by immunoblot. For LC3-I/LC3-II, there was only one detectable band in the NCI-H2122 cells, which makes it difficult to interpret the results and further emphasizes why we use the fluorescent autophagy reporter, which is more sensitive than immunoblotting. There is no detectable change in LC3-I/LC3-II in the Calu-1 cells treated with increasing doses of sotorasib, but the expected decrease in LC3-I is observed with sotorasib treatment in the NCI-H358 cells.

      Author response image 2.

      (4) Since adenocarcinomas, adenosquamous carcinomas (ASC), and mucinous adenocarcinomas were detected in KL lung tumors, please conduct immunohistochemistry (IHC) to detect these tumors, including markers such as p63, SOX2, Keratin 5.

      We have included IHC analysis of the adenosquamous carcinomas for the markers p63, SOX2, and Keratin 5 from the KL mouse in Figure 3 and the ASC tumors in Supplemental Figure 4, and thank the reviewer for this excellent suggestion. The staining for these markers is below. Of note, we tried two different SOX2 antibodies (Cell Signaling Technology #14962 and Cell Signaling Technology #3728) and could not detect any staining in any section.

      Author response image 3.

      (5) Please provide the sample size (n) for each treatment group in the survival study (Figure 4E). It appears that all mice were sacrificed for tumor burden analysis in Figure 4F. However, there doesn't seem to be a significant difference among the treatment groups in Figure 4F, which contrasts with the survival analysis in Figure 4E. It is suggested to increase the sample size in each treatment group to reduce variation.

      We have updated Figure 4E to indicate sample size for each treatment group and thank the reviewer for this suggestion.  Any mice that remained on study through the entire 8-week treatment regimen were sacrificed after the last day of treatment (Day 56).  Figure 4F indicates analysis of total tumor burden in all mice that remained on treatment for the full 8 weeks and mice that reached euthanasia criteria before the end of the 8-week treatment.  Therefore, it is important to note that the mice in Figure 4F were not all euthanized on the same day.  There is no statistically significant difference between the 3 treatment groups (sotorasib, DCC-3116, combination).  This may be due to a lower sample size as well as ending the treatment at 8 weeks as opposed to continuing the treatment for a longer period of time.  Although we agree that increasing the sample size would benefit the study, due to how long the GEMM model experiments take (12-16 weeks of breeding, 6 weeks for the mice to reach adulthood, 10 weeks of tumor formation post-initiation, 8 weeks of treatment= ~40 weeks) we would respectfully submit that the analysis of additional mice is outside the scope of the current revised manuscript.

      (6) In KP mice (Figure 5), it seems that a single treatment alone is sufficient to inhibit established KP lung tumor growth. Combination treatment does not further enhance anti-tumor efficacy. Therefore, this result doesn't support the conclusion generated from human cancer cell lines. Please discuss.

      We thank the reviewer for this observation.  Indeed, KP lung tumors were sensitive to single agent DCC-3116 treatment, which is reflected in the tumor burden analysis.  This was somewhat surprising to us as we have not previously detected much anti-tumor activity using 4-aminoquinolines (chloroquine or hydroxychloroquine) or other autophagy inhibitors.  It should be noted however that the KRASG12C/TP53R175H NSCLC model has a very low tumor burden overall (~4% in vehicle-treated mice).  Additionally, our microCT imager cannot detect AAH and small tumors at the settings/resolution used.  Therefore, we were limited in our ability to detect small tumors or hyperplasia by microCT imaging.  Although there was a decrease in overall tumor burden with single agent DCC-3116 treatment, we could not demonstrate using microCT imaging that KRASG12C/TP53R175H lung tumors were actually regressing with single agent DCC-3116 treatment.  The larger tumors that were detected appeared to show a cytostatic effect (i.e. no or slow growth) with DCC-3116 monotherapy.  This may reflect our inability to detect regression of AAH or small tumors with the microCT.  In all human cell lines tested, the only cell line that responded to single agent DCC-3116 treatment was NCI-H358 cells, which do have a complete heterozygous loss of the TRP53 gene and lack TP53 protein.  However, other cells that also have a loss of TP53 expression (Calu-1) are insensitive to single-agent DCC-3116 treatment. Due to the low mutational burden of the KP mouse model compared to human NSCLC cell lines driven by mutationally-activated KRASG12C and the loss of TP53 function, it is difficult to directly compare GEM models to the human cell line models.  Most of the human cell lines have alterations in other genes that are not altered in the KP mouse model, which could affect the sensitivity to treatment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure legends are currently not adequate - information about the number and nature of replicates, stats, and definitions of the labelling used for stats should be added throughout. In Figure 5B, only two lines of four are labelled with * or ns.

      We thank the reviewer for this comment and have included more details in the figure legends that describe replicates, statistical analysis and definitions of labeling.  We also note that the methods section has a detailed description of the statistical analysis used.

      (2) What statistical test is performed on Figure 5E to get a p < 0.05 between the vehicle and DCC group?

      We performed a one-way ANOVA for all statistical analyses with more than 2 experimental groups. We thank the reviewer for pointing out this typo. These data points (vehicle vs. DCC-3116) are not statistically significant, which has been revised in the figure.

      (3) The manuscript figures would be improved by the use of a colourblind-friendly palette.

      We have previously published multiple manuscripts using this color scheme for the fluorescent autophagy reporter experiments and chose to use red and green as the reporter uses EGFP and mCherry.  We wanted to keep this color scheme consistent across our publications and would prefer not to change the colors.  However, we agree with the reviewer that the data should be accessible to all people and, therefore, have updated these graphs to include slashes over the red color to make it easier to distinguish the red and green colors.  Thank you to the reviewer for this excellent suggestion.

      (4) The manuscript should be fully checked for mouse (sentence case) and human (caps) gene (italics) and protein (non-italics).

      In this manuscript we are using the nomenclatures approved by the HUGO Gene Nomenclature Committee (https://en.wikipedia.org/wiki/HUGO_Gene_Nomenclature_Committee) in which:

      Human genes are written as KRAS, TP53 etc i.e. ITALICIZED CAPS

      Mouse genes are written as Kras, Trp53 etc:  i.e. Italicized and sentence case

      Human and mouse proteins are written as KRAS, TP53 etc:  i.e. NON-ITALICIZED CAPS

      In response to the reviewer’s suggestion, we have gone through the manuscript to check for this and make any appropriate changes.  Of note, we intentionally refer to the mouse protein changes as KRASG12C/LKB1null or KRASG12C/TP53R172H (capitalized), as this references the protein change and not the nucleotide change that occurs in the gene.

      (5) Adenosquamous is the correct term for the disease.  In parts, it's referred to as adeno/squamous or adeno-squamous.  The abbreviation ADC is also defined many times.

      Thank you to the reviewer for this comment.  We have corrected the manuscript text to only use adenosquamous and only define ADC in the first instance.

      (6) Line 434 - "as previously described" but no reference.

      Typos:

      (1) Line 117 – either

      (2) Line 314 – synergistic

      (3) Line 317 – therefore

      (4) Line 502 – medium

      We thank the reviewer for pointing out these typos and have modified the text appropriately.

      Reviewer #2 (Recommendations For The Authors):

      (1) The statement on Page 4, Lines 119-120, lacks clarity: 'Furthermore, LKB1 silencing diminishes the sensitivity of KRASG12C/LKB1Null-driven lung cancer perhaps through the emergence of mixed adeno/squamous cell carcinomas and mucinous adenocarcinomas.'  It is unclear whether this refers to the sensitivity to the combination treatment or to the KRASc inhibitor alone.

      We thank the reviewer for this comment and agree that the statement lacks clarity.  The intent of this statement was to refer to both single agent sotorasib treatment as well as the combination with DCC-3116.  

      (2) Page 5 Line 147 "KRASG12X ". Please correct this typo.

      We thank the reviewer for this comment, but this is not a typo. We intended for this line to state KRASG12X to refer to cell lines with any KRASG12 alteration, e.g., KRASG12D, KRASG12C, KRASG12S, KRASG12R, etc.

      (3) The color of the dots in Figure 5B labeling does not match the dots in the graph.

      For all bar graphs in the manuscript, the dots representing individual mice are black, and the bar itself is color-coded based on treatment type. The dots in Figure 5B follow this pattern and are intended to be this way.

      (4) Figure 5C depicts lung weight rather than tumor growth, contrary to the text description "regression of pre-existing lung tumors was detected by microCT scanning (Figure 5C, Figure S5)".

      Figure 5C does not depict lung weight but the percent body weight change in treated mice, described in the figure legend.  We thank the reviewer for pointing this out because we referenced the wrong panel in the text.  The figures referenced should be Figure 5B, Figure S5.  We have corrected this in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In summary, the changes made in the revision process include:

      An addition of a paragraph in the result section that discusses the absolute values of measured Young’s moduli in the light of probing frequencies, accompanied by a new supplementary figure and a supplementary table that support that discussion

      - Fig. S10. Absolute Young’s modulus values across the frequencies characteristic for the three measurement methods.

      - Table S9. Operation parameters of the three methods used for characterizing the mechanical properties of cells.

      Three new supplementary figures that display the expression matrices for the genes from the identified modules in carcinoma datasets used for validation:

      - Fig. S4. Expression of identified target genes in the CCLE microarray dataset used for validation.

      - Fig. S5. Expression of identified target genes in the CCLE RNA-Seq dataset used for validation.

      - Fig. S6. Expression of identified target genes in the Genentech dataset used for validation.

      An addition of a paragraph in the discussion section that discusses the intracellular origins of resistance to deformation and the dominance of actin cortex at low deformations.

      Refinement of the manuscript text and figures based on the specific feedback from the Reviewers.

      Please see below for detailed responses to the Reviewers’ comments.

      Reviewer #1 (Public Review)

      In this work, Urbanska and colleagues use a machine-learning based crossing of mechanical characterisations of various cells in different states and their transcriptional profiles. Using this approach, they identify a core set of five genes that systematically vary together with the mechanical state of the cells, although not always in the same direction depending on the conditions. They show that the combined transcriptional changes in this gene set are strongly predictive of a change in the cell mechanical properties, in systems that were not used to identify the genes (a validation set). Finally, they experimentally alter the expression level of one of these genes, CAV1, that codes for the caveolin 1 protein, and show that, in a variety of cellular systems and contexts, perturbations in the expression level of CAV1 also induce changes in cell mechanics, cells with lower CAV1 expression being generally softer.

      Overall the approach seems accessible, sound and is well described. My personal expertise is not suited to judge its validity, novelty or relevance, so I do not make comments on that. The results it provides seem to have been thoroughly tested by the authors (using different types of mechanical characterisations of the cells) and to be robust in their predictive value. The authors also show convincingly that one of the genes they identified, CAV1, is not only correlated with the mechanical properties of cells, but also that changing its expression level affects cell mechanics. At this stage, the study appears mostly focused on the description and validation of the methodological approach, and it is hard to really understand what the results obtained really mean, the importance of the biological finding - what is this set of 5 genes doing in the context of cell mechanics? Is it really central, or is it just one of the set of knobs on which the cell plays - and it is identified by this method because it is systematically modulated but maybe, for any given context, it is not the dominant player - all these fundamental questions remain unanswered at this stage. On one hand, it means that the study might have identified an important novel module of genes in cell mechanics, but on the other hand, it also reveals that it is not yet easy to interpret the results provided by this type of novel approach.

      We thank Reviewer #1 for the thoughtful evaluation of our manuscript. The primary goal of the manuscript was to present a demonstration of an unbiased approach for the identification of genes involved in the regulation of cell mechanics. The manuscript further provides a comprehensive computational validation of all genes from the identified network, and experimental validation of a selected gene, CAV1.

      We agree that at the current stage, far-reaching conclusions about the biological meaning of the identified network cannot be made. We are, however, convinced that the identification of an apparently central player such as CAV1 across various cellular systems is per se meaningful, in particular since CAV1 modulation shows clear effects on the cell mechanical state in several cell types. 

      We anticipate that our findings will encourage more mechanistic studies in the future, investigating how these identified genes regulate mechanical properties and interact with each other. Notwithstanding, the identified genes (after testing in specific system of interest) can be readily used as genetic targets for modulating mechanical properties of cells. Access to such modifications is of huge relevance not only for performing further research on the functional consequence of cell mechanics changes (in particular in in-vivo systems where using chemical perturbations is not always possible), but also for the potential future implementation in modulating mechanical properties of the cells to prevent disease (for example to inhibit cancer metastasis or increase efficacy of cancer cell killing by cytotoxic T cells).

      We have now added the following sentence in the first paragraph of the discussion to acknowledge the open ends of our study:

      “(...). Here we leveraged this opportunity by performing discriminative network analysis on transcriptomes associated with mechanical phenotype changes to elucidate a conserved module of five genes potentially involved in cell mechanical phenotype regulation. We provided evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems. We further demonstrated on the example of a selected marker gene, CAV1, that its experimental up- and downregulation impacts the stiffness of the measured cells. This demonstrates that the level of CAV1 not only correlates with, but also is causative of mechanical phenotype change. The mechanistic insights into how precisely the identified genes are involved in regulating mechanical properties, how they interact with each other, and whether they are universal and dominant in various contexts all remain to be established in future studies.”

      Reviewer #2 (Public Review)

      A key strength is the quantitative approaches all add rigor to what is being attempted. The approach with very different cell culture lines will in principle help identify constitutive genes that vary in a particular and predictable way. To my knowledge, one other study that should be cited posed a similar pan-tissue question using mass spectrometry proteomics instead of gene expression, and also identified a caveolae component (cavin-1, PTRF) that exhibited a trend with stiffness across all sampled tissues. The study focused instead on a nuclear lamina protein that was also perturbed in vitro and shown to follow the expected mechanical trend (Swift et al 2013). 

      We thank Reviewer #2 for the positive evaluation of the breadth of the results and for pointing us to the relevant reference on proteomic analysis related to tissue stiffness (Swift et al., 2013). This study, which focused primarily on tissue-level mechanical properties, identified PTRF, a caveolar component, which links to our observation of another caveolar component, CAV1, at the single-cell level.

      We have now included the citation in the following paragraph of the discussion:

      “To our knowledge, there are no prior studies that aim at identifying gene signatures associated with single-cell mechanical phenotype changes, in particular across different cell types. There are, however, several studies that investigated changes in expression upon exposure of specific cell types to mechanical stimuli such as compression (87, 88) or mechanical stretch (22, 80, 89), and one study that investigated difference in expression profiles between stiffer and softer cells sorted from the same population (90). Even though the studies concerned with response to mechanical stimuli answer a fundamentally different question (how gene expression changes upon exposure to external forces vs which genes are expressed in cells of different mechanical phenotype), we did observe some similarities in the identified genes. For example, in the differentially expressed genes identified in the lung epithelia exposed to compression (87), three genes from our module overlapped with the immediate response (CAV1, FHL2, TAGLN) and four with the long-term one (CAV1, FHL2, TAGLN, THBS1). We speculate that this substantial overlap is caused by the cells undergoing change in their stiffness during the response to compression (and concomitant unjamming transition). Another previous study explored the association between the stiffness of various tissues and their proteomes. Despite the focus on the tissue-scale rather than single-cell elasticity, the authors identified polymerase I and transcript release factor (PTRF, also known as cavin 1 and encoding for a structural component of the caveolae) as one of the proteins that scaled with tissue stiffness across samples (91).”

      Reviewer #3 (Public Review)

      In this work, Urbanska et al. link the mechanical phenotypes of human glioblastoma cell lines and murine iPSCs to their transcriptome, and using machine learning-based network analysis identify genes with putative roles in cell mechanics regulation. The authors identify 5 target genes whose transcription creates a combinatorial marker which can predict cell stiffness in human carcinoma and breast epithelium cell lines as well as in developing mouse neurons. For one of the target genes, caveolin1 (CAV1), the authors perform knockout, knockdown, overexpression and rescue experiments in human carcinoma and breast epithelium cell lines. They determine the cell stiffness via RT-DC, AFM indentation and AFM rheology and confirm that high CAV1 expression levels correlate with increased stiffness in those model systems. This work brings forward an interesting approach to identify novel genes in an unbiased manner, but surprisingly the authors validate caveolin 1, a target gene with known roles in cell mechanics regulation. 

      I have two main concerns with the current version of this work: 

      (1) The authors identify a network of 5 genes that can predict mechanics. What is the relationship between the 5 genes? If the authors aim to highlight the power of their approach by knockdown, knockout or over-expression of a single gene why choose CAV1 (which has an individual p-value of 0.16 in Fig S4)? To justify their choice, the authors claim that there is limited data supporting the direct impact of CAV1 on mechanical properties of cells but several studies have previously shown its role in for example zebrafish heart stiffness, where a knockout leads to higher stiffness (Grivas et al., Scientific Reports 2020), in cancer cells, where a knockdown leads to cell softening (Lin et al., Oncotarget 2015), or in endothelial cell, where a knockout leads to cell softening (Le Master et al., Scientific Reports 2022). 

      We thank the reviewer for their comments. First, we acknowledge that the relationship between the five identified genes is an intriguing question, and studying it would be a natural extension of the work presented here. It is, however, beyond the scope of the present manuscript, in which our primary goal was to introduce a general pipeline for de novo identification of genes related to cell mechanics. We added the following statement to the discussion (yellow highlight) to acknowledge the open ends of our study:

      “The mechanical phenotype of cells is recognized as a hallmark of many physiological and pathological processes. Understanding how to control it is a necessary next step that will facilitate exploring the impact of cell mechanics perturbations on cell and tissue function (76).

      The increasing availability of transcriptional profiles accompanying cell state changes has recently been complemented by the ease of screening for mechanical phenotypes of cells thanks to the advent of high-throughput microfluidic methods (77). This provides an opportunity for data-driven identification of genes associated with the mechanical cell phenotype change in a hypothesis-free manner. Here we leveraged this opportunity by performing discriminative network analysis on transcriptomes associated with mechanical phenotype changes to elucidate a conserved module of five genes potentially involved in cell mechanical phenotype regulation. We provided evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems. We further demonstrated on the example of a selected marker gene, CAV1, that its experimental up- and downregulation impacts the stiffness of the measured cells. This demonstrates that the level of CAV1 not only correlates with, but also is causative of mechanical phenotype change. The mechanistic insights into how precisely the identified genes are involved in regulating mechanical properties, how they interact with each other, and whether they are universal and dominant in various contexts all remain to be established in future studies.”

      Regarding the selection of CAV1 as the gene used for the validation experiments: as mentioned in the introductory paragraph of the Results section “Perturbing expression levels of CAV1 changes cells stiffness” (copied below), we were encouraged by previous data already linking CAV1 with cell mechanics when selecting it as our first target. The relationship between CAV1 and cell mechanics regulation, however, is not very well established (of note, two of the latest manuscripts came out after the initial findings of our study).

      Regarding the citations suggested by the reviewer: two are already included in the original manuscript (Lin et al., Oncotarget 2015 – Ref (63); Le Master et al., 2022 – Ref (67)), along with an additional one (Hsu et al., 2018 (66)), and the third one (Grivas et al., 2020 (68)) has now also been added to the manuscript. We would like to highlight, however, that even though Grivas et al. state that the CAV1 KO cells are stiffer, their AFM indentation measurements were performed on cardiac tissue with a spherical tip of 30 μm radius and likely reflect primarily supracellular, tissue-scale properties, as opposed to the cell-scale measurements performed in our study (we used cultured cells, which mostly lack extracellular tissue structures; deformability cytometry was performed on dissociated cells and picks up on cell properties exclusively; and for the AFM measurements a spherical tip with 5 μm radius was used).

      “We decided to focus our attention on CAV1 as a potential target for modulating mechanical properties of cells, as it has previously been linked to processes intertwined with cell mechanics. In the context of mechanosensing, CAV1 is known to facilitate buffering of the membrane tension (45), play a role in β1-integrin-dependent mechanotransduction (58) and modulate the mechanotransduction in response to substrate stiffness (59). CAV1 is also intimately linked with actin cytoskeleton; it was shown to be involved in cross-talk with Rho-signaling and actin cytoskeleton regulation (46, 60–62), filamin A-mediated interactions with actin filaments (63), and co-localization with peripheral actin (64). The evidence directly relating CAV1 levels with the mechanical properties of cells (47, 62, 65, 66) and tissues (66, 67) is only beginning to emerge.”

      Regarding the cited p-value of 0.16, we would like to clarify that it is the p-value associated with the coefficient of the crude linear regression model fitted to the data for illustrative purposes in Fig S4. This value only says that the linear fit does not allow us to conclude much about the correlation between the level of CAV1 and the change in Young’s modulus. Much more relevant parameters are the AUC-ROC values and associated p-values reported in Table 4 in the main text (see below), which show good performance of CAV1 in separating soft and stiff cell states.

      The positive hypothesis I assumes that markers are discriminative of samples with stiff/soft mechanical phenotype regardless of the studied biological system; here CAV1 shows a clear trend, with a minimum AUC-ROC across 3 datasets of 0.78, even though the p-value does not reach the significance level. The positive hypothesis II assumes that markers are discriminative of samples with stiff/soft mechanical phenotype in carcinoma regardless of data source; here CAV1 is clearly significant, with a minimum AUC-ROC across 3 datasets of 0.89 and a p-value of 0.02.
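
      To make the AUC-ROC criterion concrete, the sketch below shows one way such a score could be computed for a single marker gene and then summarised by its minimum over several datasets. It is only an illustration under stated assumptions: the function names `auc_roc` and `min_auc_across_datasets` and all expression values are hypothetical, and the code does not reproduce the exact procedure or the p-value estimation behind Table 4.

```python
import numpy as np


def auc_roc(expr_stiff, expr_soft):
    """AUC-ROC of one marker for separating stiff from soft samples.

    Computed as the fraction of (stiff, soft) sample pairs in which the
    stiff sample has the higher expression; ties count as half a pair.
    """
    stiff = np.asarray(expr_stiff, dtype=float)
    soft = np.asarray(expr_soft, dtype=float)
    diff = stiff[:, None] - soft[None, :]  # all pairwise differences
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (stiff.size * soft.size))


def min_auc_across_datasets(datasets):
    """Worst-case (minimum) AUC-ROC over several (stiff, soft) datasets."""
    return min(auc_roc(stiff, soft) for stiff, soft in datasets)


# Hypothetical expression values of one gene in three datasets.
example = [
    ([5.1, 6.0, 5.8], [3.9, 4.2, 4.0]),  # perfectly separated -> 1.0
    ([3.0, 4.0, 5.0], [1.0, 2.0, 3.5]),  # one misordered pair -> 8/9
    ([2.0, 2.5], [1.0, 2.2]),            # one misordered pair -> 3/4
]
print(min_auc_across_datasets(example))
```

      Reporting the minimum rather than the mean is a conservative choice: a marker only scores well if it separates the two states in every dataset considered.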

      (2) The authors do not show how much does PC-Corr outperforms classical co-expression network analysis or an alternative gold standard. It is worth noting that PC-Corr was previously published by the same authors to infer phenotype-associated functional network modules from omics datasets (Ciucci et al., Scientific Reports 2017). 

      As pointed out by the Reviewer, PC-corr has been introduced and characterized in detail in a previous publication (Ciucci et al, 2017, Sci. Rep.), where it was compared against standard co-expression analysis (below reported as: p-value network) on molecules selected using univariate statistical analysis. 

      See the following fragment of Discussion in Ciucci et al, 2017:

      “The PC-corr networks were always compared to P-value networks. The first strategical difference lies in the way features are selected: while the PC-corr adopts a multivariate approach, i.e. it uses a combination of features that are responsible for the sample discrimination, in the P-value network the discriminating features are singly selected (one by one) with each Mann-Whitney test (followed by Benjamini-Hochberg procedure). The second strategical difference lies in the generation of the correlation weights in the network. PC-corr combines in parallel and at the same time in a unique formula the discrimination power of the PC-loadings and the association power of the Pearson correlation, directly providing in output discriminative omic associations. These are generated using a robust (because we use as merging factor the minimum operator, which is a very penalizing operator) mathematical trade-off between two important factors: multivariate discriminative significance and correlation association. In addition, as mentioned above, the minimum operator works as an AND logical gate in a digital circuit, therefore in order to have a high link weight in the PC-corr network, both the discrimination (the PC-loadings) and the association (the Pearson correlations) of the nodes adjacent to the link should be simultaneously high. Instead, the P-value procedure begins with the pre-selection of the significant omic features and, only in a second separated step, computes the associations between these features. Therefore, in P-value networks, the interaction weights are the result neither of multivariate discriminative significance, nor of a discrimination/association interplay.”
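
      The AND-gate behaviour of the minimum operator described in this passage can be illustrated with a short sketch. It assumes that the link weight between genes i and j has the form sgn(c_ij) · min(|V_i|, |V_j|, |c_ij|), where V are PC-loadings already rescaled to a range comparable with the Pearson correlations c. The function name `pc_corr_weights` and the toy numbers are hypothetical, and the preprocessing of the loadings is not shown.

```python
import numpy as np


def pc_corr_weights(loadings, corr):
    """Signed link weights in the spirit of PC-corr (illustrative only).

    loadings : (n_genes,) discriminative PC-loadings, assumed pre-scaled
    corr     : (n_genes, n_genes) Pearson correlation matrix
    A link is strong only if both loadings AND the correlation are large.
    """
    v = np.abs(np.asarray(loadings, dtype=float))
    c = np.asarray(corr, dtype=float)
    min_loading = np.minimum.outer(v, v)       # min(|V_i|, |V_j|) per pair
    weights = np.sign(c) * np.minimum(min_loading, np.abs(c))
    np.fill_diagonal(weights, 0.0)             # ignore self-links
    return weights


# Toy example: gene 2 correlates strongly with gene 0 but barely
# discriminates the sample groups, so its links stay weak.
loadings = [0.9, 0.8, 0.1]
corr = np.array([
    [1.00, -0.70, 0.95],
    [-0.70, 1.00, 0.20],
    [0.95, 0.20, 1.00],
])
weights = pc_corr_weights(loadings, corr)
```

      In this toy case the link between genes 0 and 1 keeps a weight of -0.7, whereas the highly correlated pair 0 and 2 is capped at 0.1 by the weak loading of gene 2.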

      Here we implement PC-corr for a particular application and do not see it as central to the message of the present manuscript to compare it with other available methods. We considered it much more relevant to focus on an in silico validation on datasets not used during the PC-corr analysis (see Tables 3 and 4 for details).

      Altogether, the authors provide an interesting approach to identify novel genes associated with cell mechanics changes, but the current version does not fulfill such potential by focusing on a single gene with known roles in cell mechanics. 

      Our manuscript presents a demonstration of an overall approach for the identification of genes involved in the regulation of cell mechanics, and the perturbations performed on CAV1 have a demonstrative role (please also refer to the explanations of why we decided to perform the verification focused on CAV1 above). The fact that we identify CAV1, which has been implicated in regulating cell mechanics in a handful of studies, de novo and in an unbiased way speaks to the power of our approach. We do agree that investigation into the effect of manipulating the expression of the remaining genes from the identified network module, as well as into the mutual relationships between those genes and their covariance in perturbation experiments, constitutes a desirable follow-up on the presented results. It is, however, beyond the scope of the current manuscript. Regardless, the other genes identified can be readily tested in systems of interest and used as potential knobs for tuning mechanical properties on demand.

      Reviewer #1 (Recommendations For Authors)

      I am not a specialist of the bio-informatics methods used in this study, so I will not make any specific technical comments on them. 

      In terms of mechanical characterisation of cells, the authors use well established methods and the fact that they systematically validate their findings with at least two independent methods (RT-DC and AFM for example) makes them very robust. So I have no concerns with this part.  The experiments of perturbations of CAV 1 are also performed to the best standards and the results are clear, no concern on that. 

      My main concerns are rather questions I was asking myself and could not answer when reading the article. Maybe the authors could find ways to clarify them - the discussion of their article is already very long and maybe it should not be lengthened too much. In my opinion, some of the points discussed are not really essential and rather redundant with other parts of the paper. This could be improved to give some space to clarify some of the points below:

      We thank Reviewer #1 for the overall positive evaluation of the manuscript, as well as for the points of criticism, which we address point by point below.

      (1) This might be a misunderstanding of the method on my side, but I was wondering whether it is possible to proceed through the same steps but choose other pairs of training datasets amongst the 5 systems available (there are 10 such pairs if I am not mistaken) and ask whether they always give the same set of 5 genes. And if not, are the other sets also then predictive, robust, etc. Or is it that there are 'better' pairs than others in this respect. Or the set of 5 genes is the only one that could be found amongst these 5 datasets - and then could it imply that it is the only group 'universal' group of predictive genes for cell mechanics (when applied to any other dataset comprising similar mechanical measures and expression profiles, for other cells, other conditions)? 

      I apologize in case this question is just the result of a basic misunderstanding of the method on my side. But I could not answer the question myself based on what is in the article and it seems to be important to understand the significance of the finding and the robustness of the method. 

      We thank the Reviewer for this question. To clarify: while in general it is possible to proceed through the same analysis steps with a different pair of datasets (see below for examples), we purposefully chose these two datasets because they encompassed the highest number of samples per condition in the RNAseq data (see Fig 4 and Author response table 1 below), originated from two different species, and concerned the least related tissues (the other option for mouse would be neural progenitors, which in combination with the glioblastoma would likely result in focusing on genes expressed in neural tissues). This is briefly explained in the following fragment of the manuscript on Page 10:

      “For the network construction, we chose two datasets that originate from different species, concern unrelated biological processes, and have a high number of samples included in the transcriptional analysis: human glioblastoma and murine iPSCs (Table 1).”

      To further address the reviewer's comment: there is indeed a total of 10 possible two-set combinations of datasets. Of those pairs, 6 are human-mouse combinations (highlighted in orange in Author response table 1), 3 are human-human combinations (highlighted in blue), and 1 is a mouse-mouse combination (marked in green).

      Author response table 1.

      Possible two-set combinations of datasets. For each combination, the number of common genes is indicated. The numbers on the diagonal represent the total number of transcripts in the individual datasets; n corresponds to the number of samples in the respective datasets. * includes non-coding genes.
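
      The pair counts quoted above can be checked by simple enumeration. The sketch below assumes three human sets (A to C) and two mouse sets (D and E), consistent with set A being the human glioblastoma and sets D and E the murine iPSCs and developing neurons. The dictionary and helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Species of each dataset (assumed labelling: A-C human, D-E mouse).
species = {"A": "human", "B": "human", "C": "human", "D": "mouse", "E": "mouse"}


def pair_type(pair):
    """Return e.g. 'human-mouse' for a pair of dataset labels."""
    return "-".join(sorted(species[label] for label in pair))


pairs = list(combinations(sorted(species), 2))   # all unordered pairs
counts = Counter(pair_type(pair) for pair in pairs)

print(len(pairs))
for kind, number in sorted(counts.items()):
    print(kind, number)
```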

      To reiterate, we chose the combination of set A (glioblastoma) and set D (iPSCs) because these datasets come from different species and have the highest sample numbers.

      As for the other combinations of human-mouse datasets:

      • sets A & E lead to the derivation of a conserved module; however, as expected, this module includes genes specific for neuronal tissues (such as the brain- and testis-specific immunoglobulin IGSF11, or genes involved in neuronal development such as RFX4 and SOX8)

      Author response image 1.

      • the remaining combinations (sets B & D, B & E, C & D and C & E) do not lead to the derivation of a highly interconnected module

      Author response image 2.

      Author response image 3.

      Author response image 4.

      Author response image 5.

      Finally, it would also have been possible to perform the combined PC-corr procedure on all 5 datasets. However, this would have prevented us from validating the results on datasets not used for discovery.

      Hence, we decided to proceed with the 2 discovery and 4 validation datasets.

      For the sake of completeness, we present below some of the networks obtained from the analysis performed on all 5 datasets (which intersect at 8059 genes).

      Author response image 6.

      The above network was created by calculating mean/minimum PC-corr among all five datasets and applying the threshold. The thresholding can be additionally restricted in that we:

      a. constrain the directionality of the correlation between the genes (𝑠𝑔𝑛(𝑐)) to be the same among all or at least n datasets

      b. constrain the directionality of the correlation between the cell stiffness and gene expression level (𝑠𝑔𝑛(𝑉)) for individual genes.
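
      The thresholding with restrictions (a) and (b) can be sketched as follows. This is an illustrative sketch rather than the analysis code used for the networks shown here: `consensus_edges`, its parameters and the toy numbers are hypothetical, the minimum (rather than the mean) is used to combine datasets, and restriction (b) is interpreted as requiring the sign of V for each gene to agree across all datasets.

```python
import numpy as np


def consensus_edges(weights, loadings, cutoff, min_same_sign):
    """Boolean adjacency matrix of links that survive all restrictions.

    weights  : (n_datasets, n_genes, n_genes) signed PC-corr-like weights,
               whose sign is taken to be the sign of the correlation c
    loadings : (n_datasets, n_genes) signed loadings V per gene
    """
    w = np.asarray(weights, dtype=float)
    v = np.asarray(loadings, dtype=float)

    # Threshold on the minimum absolute weight across datasets.
    keep = np.abs(w).min(axis=0) >= cutoff

    # (a) sign of the gene-gene correlation agrees in >= min_same_sign sets.
    n_same = np.maximum((w > 0).sum(axis=0), (w < 0).sum(axis=0))
    keep &= n_same >= min_same_sign

    # (b) sign of the loading of each gene agrees in all datasets.
    gene_ok = np.all(v > 0, axis=0) | np.all(v < 0, axis=0)
    keep &= gene_ok[:, None] & gene_ok[None, :]

    np.fill_diagonal(keep, False)
    return keep


# Toy example: two datasets, three genes.
toy_weights = np.array([
    [[0.0, 0.80, 0.70], [0.80, 0.0, -0.62], [0.70, -0.62, 0.0]],
    [[0.0, 0.90, -0.70], [0.90, 0.0, -0.65], [-0.70, -0.65, 0.0]],
])
toy_loadings = np.array([[0.90, 0.80, -0.70], [0.85, 0.90, -0.60]])
edges = consensus_edges(toy_weights, toy_loadings, cutoff=0.6, min_same_sign=2)
```

      In this toy case the link between genes 0 and 2 passes the magnitude threshold but is removed because its correlation changes sign between the two datasets.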

      Some of the resulting networks for such restrictions are presented below.

      Author response image 7.

      Author response image 8.

      Of note, some of the nodes from the original network presented in the paper (CAV1, FHL2, and IGFBP7) are preserved in the 5-set network (where they are highlighted with blue rims).

      (2) The authors already use several types of mechanical characterisation of the cells, but there are even more of them, in particular, some that might not directly correspond to global cell stiffness but to other aspects, like traction forces, or cell cortex rheology, or cell volume or passage time trough constrictions (active or passive) - they might all be in a way or another related, but they are a priori independent measures. Would the authors anticipate finding very different 'universal modules' for these other mechanical properties, or again the same one? Is there a way to get at least a hint based on some published characterisations for the cells used in the study? Basically, the question is whether the gene set identified is specific for a precise type of mechanical property of the cell, or is more generally related to cell mechanics modulation - maybe, as suggested by the authors because it is a set of molecular knobs acting upstream of general mechanics effectors like YAP/TAZ or acto-myosin? 

      We thank the Reviewer for this comment. We would first like to note that in our study we focused on the single-cell mechanical phenotype, understood as the response of cells to deformation at a global (RT-DC) or semi-local (AFM indentation with a 5-μm bead) level and at comparatively low deformations (1-3 μm, see Table S9). There is of course a variety of other methods for measuring cell mechanics and mechanics-related features, such as the traction force microscopy mentioned by the reviewer. Traction force microscopy, however, probes how cells apply forces and interact with their environment rather than the inherent mechanical properties of the cells themselves, which were the main interest of our study.

      Nevertheless, as mentioned in the discussion, we found some overlap with the genes identified in other mechanical contexts, for example in the context of mechanical stretching of cells:

      “Furthermore, CAV1 is known to modulate the activation of transcriptional cofactor yes-associated protein, YAP, in response to changes in stiffness of cell substrate (60) and in the mechanical stretch-induced mesothelial to mesenchymal transition (74).”

      This suggests that the genes identified here may be more broadly related to the mechanical aspects of cells.

      Of note, we do have some insights connected to the changes of cell volume — one of the biophysical properties mentioned by the reviewer — from our experiments.  For all measurements performed with RT-DC, we can also calculate cell volumes from 2D cell contours (see Author response images 9, 10, and 11). For most of the cases (all apart from MEF CAV1KO), the stiffer phenotype of the cells, associated with higher levels of CAV1, shows a higher volume.

      Author response image 9.

      Cell volumes for the divergent cell states in the five characterized biological systems. (A) Glioblastoma. (B) Carcinoma, (C) MCF10A, (D) iPSCs, (E) Developing neurons. Data corresponds to Figure 2. Cell volumes were estimated using Shape-Out 1.0.10 by rotation of the cell contours.

      Author response image 10.

      Cell volumes for CAV1 perturbation experiments. (A) CAV1 knock down performed in TGBC cells. (B) CAV1 overexpression in ECC4 and TGBC cells. Data corresponds to Figure 5. Cell volumes were estimated using Shape-Out 1.0.10 by rotation of the cell contours.  

      Author response image 11.

      Cell volumes for WT and CAV1KO MEFs. Data corresponds to Figure S9. Cell volumes were estimated using Shape-Out 1.0.10 by rotation of the cell contours.  

      (3) The authors have already tested a large number of conditions in which perturbations of the level of expression of CAV1 correlates with changes in cell mechanics, but I was wondering whether it also has some direct explanatory value for the initial datasets used - for example for the glioblastoma cells from Figure 2, in the different media, would a knock-down of CAV1 prevent the increase in stiffness observed upon addition of serum, or for the carcinoma cells from different tissues treated with different compounds - if I understand well, the authors have tested a subset of these (ECC4 versus TGBC in figure 5) - how did they choose these and how general is it that the mechanical phenotype changes reported in Figure 2 are all mostly dependant on CAV1 expression level? I must say that the way the text is written and the results shown, it is hard to tell whether CAV1 is really having a dominant effect on cell mechanics in most of these contexts or only a partial effect. I hope I am being clear in my question - I am not questioning the conclusions of Figures 5 and 6, but asking whether the level of expression of CAV1, in the datasets reported in Figure 2, is the dominant explanatory feature for the differences in cell mechanics. 

      We thank the reviewer for this comment and appreciate the value of the question about the generality and dominance of CAV1 in influencing cell mechanics.

      On the computational side, we have addressed these issues by looking at the performance of CAV1 (among other identified genes) in classifying soft and stiff phenotypes across biological systems (positive hypothesis I), as well as across data of different type (sequencing vs microarray data) and origin (different research institutions) (positive hypothesis II). CAV1 showed strong classification performance (Table 4), suggesting it is a general marker of stiffness changes.  

      On the experimental side, we conducted the perturbation experiments in two systems of choice: two intestinal carcinoma cell lines (ECC4 and TGBC) and the MCF10A breast epithelial cell line. These choices were driven by ease of handling, accessibility, and (for MCF10A) the connection with a former study (Tavares et al., 2017). While we observed correlations between CAV1 expression and cell mechanics in a wide range of datasets, the precise role of CAV1 in each system may vary, and further perturbation experiments in specific systems could be performed to solidify the direct or dominant role of CAV1 in cell mechanics. We hypothesize that the suggested knockdown of CAV1 upon serum addition in glioblastoma cells could reduce or prevent the observed increase in stiffness, though this experiment has not been performed.

      In conclusion, while the computational analysis gives us confidence that CAV1 is a good indicator of cell stiffness, we predict that it acts in concert with other genes and that, in specific contexts, its role could be taken over by other changes. We suggest that the suitability of CAV1 for manipulating mechanical properties should be tested in each system of interest before use.

      To highlight the fact that the relevance of CAV1 for modulating cell mechanics in specific systems of interest should be tested and the mechanistic insights into how CAV1 regulates cell mechanics are still missing, we have added the following sentence in the discussion:

      “The mechanical phenotype of cells is recognized as a hallmark of many physiological and pathological processes. Understanding how to control it is a necessary next step that will facilitate exploring the impact of cell mechanics perturbations on cell and tissue function (76). The increasing availability of transcriptional profiles accompanying cell state changes has recently been complemented by the ease of screening for mechanical phenotypes of cells thanks to the advent of high-throughput microfluidic methods (77). This provides an opportunity for data-driven identification of genes associated with the mechanical cell phenotype change in a hypothesis-free manner. Here we leveraged this opportunity by performing discriminative network analysis on transcriptomes associated with mechanical phenotype changes to elucidate a conserved module of five genes potentially involved in cell mechanical phenotype regulation. We provided evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems. We further demonstrated on the example of a selected marker gene, CAV1, that its experimental up- and downregulation impacts the stiffness of the measured cells. This demonstrates that the level of CAV1 not only correlates with, but also is causative of mechanical phenotype change. The mechanistic insights into how precisely the identified genes are involved in regulating mechanical properties, how they interact with each other, and whether they are universal and dominant in various contexts all remain to be established in future studies.”

      (4) It would be nice that the authors try to more directly address, in their discussion, what is the biological meaning of the set of 5 genes that they found - is it really mostly a product of the methodology used, useful but with little specific relevance to any biology, or does it have a deeper meaning? Either at a system level, or at an evolutionary level. 

      We would like to highlight that our manuscript is focused on the method that we introduce to identify sets of genes involved in the regulation of cell mechanics. The first implementation included here is only the beginning of this line of work, which in the future will include looking in detail at the biological meaning and the interconnectivity of the identified genes. Most likely, the identified module has a deeper meaning that could be revealed with a lot of dedicated future work. As this is mere speculation at this point, we would like to refrain from going into more detail about it in the current manuscript. Below we provide a few words of extended explanation and additional analysis that shed light on the current, limited knowledge of the connections between the genes and of their evolutionary preservation.

      While it is difficult to prove at present, we do believe that the identified module of genes may have an actual biological meaning and is not a mere product of the methodology used. The PC-corr score used for applying the threshold and obtaining the gene network is high only if the Pearson’s correlation between the two genes is high, meaning that the highly connected module of genes identified shows correlated expression and is likely co-regulated. Additionally, we performed GO term analysis using DAVID to assess the connections between the genes (Figure S3). We have now performed an additional analysis using two orthogonal tools: the functional protein association tool STRING and KEGG Mapper.

      With STRING, we found moderate connectivity among the five network nodes identified in our study, and many of the obtained connections were based on text mining and co-expression rather than direct experimental evidence (Author response image 12A). A more connected network can be obtained by allowing STRING to introduce further nodes (Author response image 12B). Interestingly, some of the nodes included by STRING in the extended network are nodes identified with milder PC-corr thresholds in our study (such as CNN2 or IGFBP3, see Table S3).

      With KEGG Mapper, we did not find an obvious pathway-based clustering of the genes from the module either. A maximum of two genes were assigned to one pathway and those included: 

      • focal adhesions (pathway hsa04510): CAV1 and THBS1

      • cytoskeleton in muscle cells (pathway hsa04820): FHL2 and THBS1

      • proteoglycans in cancer (pathway hsa05205): CAV1 and THBS1.

      As for the BRITE hierarchy, the following classification was found:

      • membrane trafficking (hsa04131): CAV1, IGFBP7, TAGLN, THBS1, with the following subcategories:

      - endocytosis / lipid raft mediated endocytosis / caveolin-mediated endocytosis: CAV1

      - endocytosis / phagocytosis / opsonins: THBS1

      - endocytosis / others / insulin-like growth factor-binding proteins: IGFBP7

      - others / actin-binding proteins / others: TAGLN.

      Taken together, all these analyses (DAVID, STRING, KEGG) show that at present no direct relationship or single pathway can be found that integrates all the genes from the identified module. Future experiments, including investigations of how other module nodes are affected when one of the genes is manipulated, will help to establish actual physical or regulatory interactions between the genes from our module.

      To touch upon the evolutionary perspective, we provide an overview of occurrence of the genes from the identified module across the evolutionary tree. This overview shows that the five identified genes are preserved in phylum Chordata with quite high sequence similarity, and even more so within mammals (Author response image 13).

      Author response image 12.

      Visualisation of interactions between the nodes in the identified module using functional protein association networks tool STRING. (A) Connections obtained using multiple proteins search and entering the five network nodes. (B) Extended network that includes further genes to increase indirect connectivity. The genes are added automatically by STRING. Online version of STRING v12.0 was used with Homo sapiens as species of interest.   

      Author response image 13.

      Co-occurrence of genes from the network module across the evolutionary tree. Mammals are indicated with the green frame; Glires (which include mouse) and primates (which include human) are indicated with yellow frames. The view was generated using the online version of STRING 12.0.

      Reviewer #2 (Recommendations For Authors) 

      (1) The authors need to discuss the level of sensitivity of their mechanical measurements with RT-DC for changes to the membrane compared to changes in microtubules, nucleus, etc. The limited AFM measurements also seem membrane/cortex focused. For these and further reasons below, "universal" doesn't seem appropriate in the title or abstract, and should be deleted. 

      We thank the reviewer for this comment. Indeed, RT-DC is a technique that deforms the entire cell to a relatively low degree (inducing ca 17% mean strain, i.e. a deformation of approximately 2.5 µm on a cell with a 15 µm diameter, see Table S9 and Urbanska et al., Nat Methods 2020). Similarly, the AFM indentation experiments performed in this study (using a 5-µm diameter colloidal probe and 1 µm indentation) induce low strains, at which, according to current knowledge, the actin cortex dominates the measured deformations. However, other cellular components, including the membrane, microtubules, intermediate filaments, nucleus, other organelles, and cytoplasmic packing, can also contribute. We have reviewed these contributions in detail in a recent publication (Urbanska and Guck, 2024, Ann Rev Biophys., PMID 38382116). For a particular system, it is hard to speculate without further investigation which parts of the cell have a dominant effect on the measured deformability. We have now added the following paragraph to the discussion to include this information:

      “The mechanical phenotype of single cells is a global readout of a cell’s resistance to deformation that integrates contributions from all cellular components. The two techniques implemented for measuring cell mechanics in this study (RT-DC and AFM indentation using a spherical indenter with 5 µm radius) exert comparatively low strain on cells (< 3 µm, see Table S9), at which the actin cortex is believed to dominate the measured response. However, other cellular components, including the membrane, microtubules, intermediate filaments, nucleus, other organelles, and cytoplasmic packing, also contribute to the measured deformations (reviewed in detail in (79)) and, for a particular system, it is hard to speculate without further investigation which parts of the cell have a dominant effect on the measured deformability.”
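
      The strain figure quoted above can be checked with a short calculation. The following is a minimal sketch that assumes strain is simply the deformation divided by the undeformed cell diameter; the function name is hypothetical and the two input values are the ones quoted in this response.

```python
def mean_strain(deformation_um: float, cell_diameter_um: float) -> float:
    """Return strain as deformation divided by the undeformed cell diameter.

    Assumption: this simple ratio stands in for the strain definition used
    in the cited work, which may differ in detail.
    """
    if cell_diameter_um <= 0:
        raise ValueError("cell diameter must be positive")
    return deformation_um / cell_diameter_um


# Values quoted in the response: ~2.5 µm deformation on a 15 µm cell.
rtdc_strain = mean_strain(2.5, 15.0)
print(f"RT-DC mean strain: {rtdc_strain:.1%}")  # prints "RT-DC mean strain: 16.7%"
```

      Under this assumption, the quoted deformation and diameter give a strain of about 17%, consistent with the value stated for RT-DC.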

      The key strength of measuring the global mechanics is that such measurements are agnostic of the specific origin of the resistance to shape change. As such, the term “universal” could be seen as rather appropriate, as we are not testing specific contributions to cell mechanics, and we see the two methods used (RT-DC and AFM indentation) as representative when it comes to measuring global cell mechanics. We also highlight throughout the text that we are measuring the global single-cell mechanical phenotype.

      Most importantly, however, we have used the term “universal” to capture that the genes are preserved across different systems and species, not in relation to the type of mechanical measurements performed and as such we would like to retain the term in the title.

      (2) Fig.2 cartoons of tissues is a good idea to quickly illustrate the range of cell culture lines studied. However, it obligates the authors to examine the relevant primary cell types in single-cell RNAseq of human and/or mouse tissues (e.g. Tabula Muris). They need to show CAV1 is expressed in glioblastoma, iPSCs, etc and not a cell culture artifact. CAV1 and the other genes also need to be plotted with literature values of tissue stiffness.

      We thank the reviewer for this comment; however, we do believe that the cartoons in Figure 2 should help the reader to readily understand whether cultured cells derived from the respective tissues were used (see cartoons representing dishes), or whether cells directly isolated from the tissue were measured (this is the case for the developing neurons dataset).

      We did, however, follow the suggestion of the reviewer to use available resources and checked the expression of genes from the identified network module across various tissues in mouse and human. We first used the Mouse Genome Informatics (MGI; https://www.informatics.jax.org/) to visualize the expression of the genes across organs and organ systems (Author response image 14) as well as across more specific tissue structures (Author response image 15). These two figures show that the five identified genes are expressed quite broadly in mouse. We next looked at the expression of the five genes in the scRNASeq dataset from Tabula Muris (Author response image 16). Here, the expression of respective genes seemed more restricted to specific cell clusters. Finally, we also collected the cross-tissue expression of the genes from our module in human tissues from Human Protein Atlas v23 at both mRNA (Author response image 17) and protein (Author response image 18) levels. CAV1, IGFBP7, and THBS1 showed low tissue specificity at mRNA level, FHL2 was enriched in heart muscle and ovary (the heart enrichment is also visible in Author response image 15 for mouse) and TAGLN in endometrium and intestine. Interestingly, the expression at the protein level (Author response image 18) did not seem to follow faithfully the mRNA levels (Author response image 17). Overall, we conclude that the identified genes are expressed quite broadly across mouse and human tissues. 

      Author response image 14.

      Expression of genes from the identified module across various organs and organ systems in mouse. The expression matrices for organs (A) and organ systems (B) were generated using Tissue x Gene Matrix tool of Gene eXpression Database (https://www.informatics.jax.org/gxd/, accessed on 22nd September 2024). No pre-selection of stage (age) and assay type (includes RNA and protein-based assays) was applied. The colors in the grid (blues for expression detected and reds for expression not detected) get progressively darker when there are more supporting annotations. The darker colors do not denote higher or lower levels of expression, just more evidence.

      Author response image 15.

      Expression of genes from the identified module across various mouse tissue structures. The expression matrices for age-selected mice marked as adult (A) or young individuals (collected ages labelled P42-84 / P w6-w12 / P m1.5-3.0) (B) are presented and were generated using RNASeq Heatmap tool of Gene eXpression Database (https://www.informatics.jax.org/gxd/, accessed on 2nd October 2024).

      Author response image 16.

      Expression of genes from the identified module across various cell types and organs in t-SNE embedding of Tabula Muris dataset. (A) t-SNE clustering color-coded by organ. (B-F) t-SNE clustering color-coded for expression of CAV1 (B), IGFBP7 (C), FHL2 (D), TAGLN (E), and THBS1 (F). The plots were generated using FACS-collected cells data through the visualisation tool available at https://tabulamuris.sf.czbiohub.org/ (accessed on 22nd September 2024).

      Author response image 17.

      Expression of genes from the identified module at the mRNA level across various human tissues. (A-E) Expression levels of CAV1 (A), IGFBP7 (B), FHL2 (C), TAGLN (D), and THBS1 (E). The plots were generated using consensus dataset from Human Protein Atlas v23 https://www.proteinatlas.org/ (accessed on 22nd September 2024).

      Author response image 18.

      Protein levels of genes from the identified module across various human tissues. (A-E) Protein levels of CAV1 (A), IGFBP7 (B), FHL2 (C), TAGLN (D), and THBS1 (E). The plots were generated using Human Protein Atlas v23 https://www.proteinatlas.org/ (accessed on 22nd September 2024).

      Regarding literature values and tissue stiffness, we would like to argue that cell stiffness is not equivalent to tissue stiffness, and we are interested in the former. Tissue stiffness is governed by a combination of cell mechanical properties, cell adhesions, packing and the extracellular matrix. There can be, in fact, mechanically distinct cell types (for example characterized by different metabolic state, malignancy level etc) within one tissue of given stiffness. Hence, we consider that testing for the correlation between tissue stiffness and expression of identified genes is not immediately relevant.

      (3) Fig.5D,H show important time-dependent mechanics that need to be used to provide explanations of the differences in RT-DC (5B,F) and in standard AFM indentation expts (5C,G). In particular, it looks to me that RT-DC is a high-f/short-time measurement compared to the AFM indentation, and an additional Main or Supp Fig needs to somehow combine all of this data to clarify this issue. 

      We thank the reviewer for this comment. It is indeed the case that cells typically display higher stiffness when probed at higher rates. We have now expanded on this aspect of the results and added a supplementary figure (Fig. S10) that illustrates the frequencies used in different methods and summarizes the apparent Young’s moduli values into one plot in a frequency-ordered manner. Of note, we typically acquire RT-DC measurements at up to three flow rates, and the increase in measurement frequency accompanying an increase in flow rate also results in higher extracted apparent Young’s moduli (see Fig. S10 B,D). We have further added Table S9 that summarizes operating parameters of all three methods used for probing cell mechanics in this manuscript:

      “The three techniques for characterizing mechanical properties of cells (RT-DC, AFM indentation and AFM microrheology) differ in several aspects (summarized in Table S9), most notably in the frequency at which the force is applied to cells during the measurements, with RT-DC operating at the highest frequency (~600 Hz), AFM microrheology at a range of frequencies in-between (3–200 Hz), and AFM indentation operating at the lowest frequency (5 Hz) (see Table S9 and Figure S10A). Even though the apparent Young’s moduli obtained for TGBC cells were consistently higher than those for ECC4 cells across all three methods, the absolute values measured for a given cell line varied depending on the method: RT-DC measurements yielded higher apparent Young’s moduli compared to AFM indentation, while the apparent Young’s moduli derived from AFM microrheology measurements were frequency-dependent and fell between the other two methods (Fig. 5B–D, Fig. S10B). The observed increase in apparent Young’s modulus with probing frequency aligns with previous findings on cell stiffening with increased probing rates observed for both AFM indentation (68, 69) and microrheology assays (70–72).”

      (4) The plots in Fig.S4 are important as main Figs, particularly given the cartoons of different tissues in Fig.1,2. However, positive correlations for a few genes (CAV1, IGFBP7, TAGLN) are most clear for the multiple lineages that are the same (stomach) or similar (gli, neural & pluri). The authors need to add green lines and pink lines in all plots to indicate the 'lineage-specific' correlations, and provide measures where possible. Some genes clearly don't show the same trends and should be discussed.

      We thank the reviewer for this comment. It is indeed an interesting observation (and worth highlighting by adding the fits to lineage-restricted data) that the relationship between relative change in Young’s modulus and the selected gene expression becomes steeper for samples from similar tissue contexts.

      For the sake of keeping the main manuscript compact, we decided to keep Fig. S7 (formerly Fig. S4) in the supplement, however, we did add the linear fit to the glioblastoma dataset (pink line) and a fit to the related neural/embryonic datasets (gli, neural & pluri – purple line) as advised — see below.

      We did not pool the stomach data since it is represented by a single point in the figure, aligning with how the data is presented in the main text—stomach adenocarcinoma cell lines (MKN1 and MKN45) are pooled in Fig. 1B (see below).

      We have also amended the respective results section to emphasize that, in certain instances, the correlation between changes in mechanical phenotype and alterations in the expression of analysed genes may be less pronounced:

      “The relation between normalized apparent Young’s modulus change and fold-change in the expression of the target genes is presented in Fig. S7. The direction of changes in the expression levels between the soft and stiff cell states in the validation datasets was not always following the same direction (Fig. 4, C to F, Fig. S7). This suggests that the genes associated with cell mechanics may not have a monotonic relationship with cell stiffness, but rather are characterized by different expression regimes in which the expression change in opposite directions can have the same effect on cell stiffness. Additionally, in specific cases a relatively high change in Young’s modulus did not correspond to marked expression changes of a given gene — see for example low CAV1 changes observed in MCF10A PIK3CA mutant (Fig. S7A), or low IGFBP7 changes in intestine and lung carcinoma samples (Fig. S7C). This indicates that the importance of specific targets for the mechanical phenotype change may vary depending on the origin of the sample.”

      (5) Table-1 neuro: Perhaps I missed the use of the AFM measurements, but these need to be included more clearly in the Results somewhere. 

      To clarify: there were no AFM measurements performed for the developing neurons (neuro) dataset, and it is not marked as such in Table 1. There are previously published AFM measurements for the iPSCs dataset (maybe that caused the confusion?), and we referred to them as such in the table by citing the source (Urbanska et al (30)) as opposed to the statement “this paper” (see the last column of Table 1). We did not consider it necessary to include these previously published data. We have added horizontal lines to the table that will hopefully improve its readability.

      Reviewer #3 (For Authors) 

      Major 

      -  I strongly encourage the authors to validate their approach with a gene for which mechanical data does not exist yet, or explore how the combination of the 5 identified genes is the novel regulator of cell mechanics. 

      We appreciate the reviewer’s insightful comment and agree that it would be highly interesting to validate further targets and perform combinatorial perturbations. However, it is not feasible at this point to expand the experimental data beyond the one already provided. We hope that in the future, the collective effort of the cell mechanics community will establish more genes that can be used for tuning of mechanical properties of cells.

      - If this paper aims at highlighting the power of PC-Corr as a novel inference approach, the authors should compare its predictive power to that of classical co-expression network analysis or an alternative gold standard. 

      We thank the reviewer for the suggestion to compare the predictive power of PC-Corr with classical co-expression network analysis or an alternative gold standard. PC-corr has been introduced and characterized in detail in a previous publication (Ciucci et al, 2017, Sci. Rep.), where it was compared against standard co-expression analysis methods. Here we implement PC-corr for a particular application. Thus, we do not see it as central to the message of the present manuscript to compare it with other available methods again.

      - The authors call their 5 identified genes "universal, trustworthy and specific". While they provide a great amount of data all is derived from human and mouse cell lines. I suggest toning this down. 

      We thank the reviewer for this comment. To clarify, the terms universal, trustworthy and specific are based on the specific hypotheses tested in the validation part of the manuscript, but we understand that they may cause confusion. We have now toned down the statement by adding “universal, trustworthy and specific across the studied mouse and human systems” in the following text fragments:

      (1) Abstract

      “(…) We validate in silico that the identified gene markers are universal, trustworthy and specific to the mechanical phenotype across the studied mouse and human systems, and demonstrate experimentally that a selected target, CAV1, changes the mechanical phenotype of cells accordingly when silenced or overexpressed. (...)”

      (2) Last paragraph of the introduction

      “(…) We then test the ability of each gene to classify cell states according to cell stiffness in silico on six further transcriptomic datasets and show that the individual genes, as well as their compression into a combinatorial marker, are universally, specifically and trustworthily associated with the mechanical phenotype across the studied mouse and human systems. (…)”

      (3) First paragraph of the discussion

      “We provided strong evidence that the inferred conserved functional network module contains an ensemble of five genes that, in particular when combined in a unique combinatorial marker, are universal, specific and trustworthy markers of mechanical phenotype across the studied mouse and human systems.”

      Minor suggestions 

      -  The authors point out how genes that regulate mechanics often display non-monotonic relations with their mechanical outcome. Indeed, in Fig.4 developing neurons have lower CAV1 in the stiff group. Perturbing CAV1 expression in that model could show the non-monotonic relation and strengthen their claim. 

      We thank the reviewer for highlighting this important point. It would indeed be interesting to explore the changes in cell stiffness upon perturbation of CAV1 in a system that has the potential to show an opposing behavior. Unfortunately, we are unable to expand the experimental part of the manuscript at this time. We do hope that this point can be addressed in future research, either by our team or other researchers in the field.

      -  In their gene ontology enrichment assay, the authors claim that their results point towards reduced transcriptional activity and reduced growth/proliferation in stiff compared to soft cells. Proving this with a simple proliferation assay would be a nice addition to the paper. 

      This is a valuable suggestion that should be followed up on in detail in the future. To give a preliminary insight into this line of investigation, we have had a look at the cell count data for the CAV1 knock down experiments in TGBC cells. Since CAV1 is associated with the GO Term “negative regulation of proliferation/transcription” (high CAV1 – low proliferation), we would expect that lowering the levels of CAV1 results in increased proliferation and higher cell counts at the end of experiment (3 days post transfection). As illustrated in Author response image 19 below, the cell counts were higher for the samples treated with CAV1 siRNAs, though, not in a statistically significant way. Interestingly, the magnitude of the effect partially mirrored the trends observed for the cell stiffness (Figure 5F).

      Author response image 19.

      The impact of CAV1 knock down on cell counts in TGBC cells. (A) Absolute cell counts per condition in a 6-well format. Cell counts were performed when harvesting for RT-DC measurements using an automated cell counter (Countess II, Thermo Fisher Scientific). (B) The event rates observed during the RT-DC measurements. The harvested cells are resuspended in a specific volume of measuring buffer standardized per experiment (50-100 μl); thus, the event rates reflect the absolute cell numbers in the respective samples. Horizontal lines delineate medians with mean absolute deviation (MAD) as error, datapoints represent individual measurement replicates, with symbols corresponding to matching measurement days. Statistical analysis was performed using two sample two-sided Wilcoxon rank sum test.
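
      The summary statistics and test named in this caption can be sketched as follows. This is an illustrative example only: the cell counts are hypothetical placeholders rather than measured values, the deviation is assumed to be taken around the median, and the exact enumeration below is one way to obtain a two-sided rank sum p-value for small samples, not necessarily the implementation used for the figure.

```python
from itertools import combinations
from statistics import median


def median_and_mad(values):
    """Median and mean absolute deviation around the median (an assumption)."""
    med = median(values)
    mad = sum(abs(v - med) for v in values) / len(values)
    return med, mad


def midranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def exact_rank_sum_test(sample_a, sample_b):
    """Two-sided exact rank sum p-value by enumerating all group assignments."""
    ranks = midranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(ranks) + 1) / 2
    sums = [sum(c) for c in combinations(ranks, n_a)]
    extreme = sum(
        1 for s in sums if abs(s - expected) >= abs(observed - expected) - 1e-12
    )
    return observed, extreme / len(sums)


# Hypothetical cell counts (arbitrary units), NOT data from the manuscript.
control = [410, 395, 430]
knockdown = [455, 470, 440]

print(median_and_mad(control))
rank_sum, p_value = exact_rank_sum_test(control, knockdown)
print(rank_sum, p_value)
```

      In this sketch, with three values per group there are only 20 possible group assignments, so the smallest two-sided exact p-value that can be reached is 2/20 = 0.1, even when the two groups do not overlap at all.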

      Methods

      - The AFM indentation experiments are performed with a very soft cantilever at very high speeds. Why? Also, please mention whether the complete AFM curve was fitted with the Hertz/Sneddon model or only a certain area around the contact point. 

      We thank the reviewer for this comment. However, we believe that the spring constants and indentation speeds used in our study are typical for measurements of cells and not a cause of concern. 

      For the indentation experiments, we used Arrow-TL1 cantilevers (nominal spring constant k = 0.035-0.045 N m⁻¹, Nanoworld, Switzerland) which are used routinely for cell indentation (with over 200 search results on Google Scholar using the term: "Arrow-TL1"+"cell", and several former publications from our group, including Munder et al 2016, Tavares et al 2017, Urbanska et al 2017, Taubenberger et al 2019, Abuhattum et al 2022, among others). Additionally, cantilevers with the spring constants as low as 0.01 N m⁻¹ can be used for cell measurements (Radmacher 2002, Thomas et al, 2013). 

      The indentation speed of 5 µm s⁻¹ is not unusually high and does not result in significant hydrodynamic drag. 

      For the microrheology experiments, we used slightly stiffer and shorter (100/200 µm compared to 500 µm for Arrow-TL1) cantilevers: PNP-TR-TL (nominal spring constant k = 0.08 N m⁻¹, Nanoworld, Switzerland). The measurement frequencies of 3-200 Hz correspond to movements slightly faster than 5 µm s⁻¹, but cells were indented only to 100 nm, and the data were corrected for the hydrodynamic drag (see equation (8) in Methods section).

      Author response image 20.

      Exemplary indentation curve obtained using an Arrow-TL1 cantilever decorated with a 5-µm sphere on an ECC4 cell. The shown plot is exported directly from JPK Data Processing software. The area shaded in grey is the area used for fitting the Sneddon model.
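
      Restricting a fit to a limited indentation range, as in the shaded region of the curve above, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the Hertz approximation for a spherical indenter, F = (4/3)·E/(1−ν²)·√R·δ^(3/2), as a stand-in for the Sneddon model fitted in the JPK software; the Poisson ratio of 0.5, the 2.5 µm indenter radius, the cut-off default and all function names are assumptions; and the force values are synthetic, generated from the model itself.

```python
import math


def hertz_force(indentation_m, youngs_modulus_pa, radius_m, poisson_ratio=0.5):
    """Hertz force (N) for a spherical indenter; a simplification of Sneddon's model."""
    return (
        (4.0 / 3.0)
        * youngs_modulus_pa
        / (1.0 - poisson_ratio**2)
        * math.sqrt(radius_m)
        * indentation_m**1.5
    )


def fit_youngs_modulus(
    indentations_m, forces_n, radius_m, poisson_ratio=0.5, max_indentation_m=1.5e-6
):
    """Least-squares estimate of E using only points up to max_indentation_m.

    The model is linear in E, so the estimate has a closed form:
    E = sum(F_i * g_i) / sum(g_i ** 2), where g_i is the force for E = 1 Pa.
    """
    numerator = 0.0
    denominator = 0.0
    for depth, force in zip(indentations_m, forces_n):
        if 0.0 < depth <= max_indentation_m:
            unit_force = hertz_force(depth, 1.0, radius_m, poisson_ratio)
            numerator += force * unit_force
            denominator += unit_force**2
    if denominator == 0.0:
        raise ValueError("no data points inside the fitting range")
    return numerator / denominator


# Synthetic curve, NOT measured data: E = 500 Pa, sphere radius 2.5 µm (assumed).
radius = 2.5e-6
depths = [i * 0.1e-6 for i in range(1, 21)]  # 0.1 to 2.0 µm
forces = [hertz_force(d, 500.0, radius) for d in depths]

estimated_modulus = fit_youngs_modulus(depths, forces, radius)
print(f"Estimated apparent Young's modulus: {estimated_modulus:.1f} Pa")
```

      Because the synthetic forces follow the model exactly, the fit recovers the input modulus; with real curves, the choice of contact point and fitting range would influence the result.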

      In the indentation experiments, the curves were fitted to a maximal indentation of 1.5 μm (rarely exceeded, see Author response image 20). We have now added this information to the methods section:

      - Could the authors include the dataset wt #1 in Fig 4D? Does it display the same trend? 

      We thank the reviewer for this comment. To clarify: in the MCF10A dataset (GEO: GSE69822) there are exactly three replicates of each wt (wild type) and ki (knock-in, referring to the H1047R mutation in the PIK3CA) samples. The numbering wt#2, wt#3, wt#4 originated from the short names that were used in the working files containing non-averaged RPKM (possibly referring to three different measurement replicates that may not have been exactly paired with the ki samples). We have now renamed the samples as wt#1, wt#2 and wt#3 to avoid confusion. This naming also better reflects the sample description as deposited in the GSE69822 dataset (see Author response table 2).

      Author response table 2.

      - Reference (3) is an opinion article with the last author as the sole author. It is used twice as a self-standing reference, which is confusing, as it suggests there is previous experimental evidence. 

      We thank the reviewer for pointing this out and agree that it may not be appropriate to cite the article (Guck 2019 Biophysical Reviews, formerly Reference (3), currently Reference (76)) in all instances. The references to this opinion article have now been removed from the introduction:

      “The extent to which cells can be deformed by external loads is determined by their mechanical properties, such as cell stiffness. Since the mechanical phenotype of cells has been shown to reflect functional cell changes, it is now well established as a sensitive label-free biophysical marker of cell state in health and disease (1-2).”

      “Alternatively, the problem can be reverse-engineered, in that omics datasets for systems with known mechanical phenotype changes are used for prediction of genes involved in the regulation of mechanical phenotype in a mechanomics approach.”

      The reference has, however, been kept in the discussion:

      “The mechanical phenotype of cells is recognized as a hallmark of many physiological and pathological processes. Understanding how to control it is a necessary next step that will facilitate exploring the impact of cell mechanics perturbations on cell and tissue function (76).”

      This reference seems appropriate to us as it expands on the point that our ability to control cell mechanics will enable the exploration of its impact on cell and tissue function, which is central to the discussion of the current manuscript. 

      - The authors should mention what PC-corr means. Principal component correlation? Pearson's coefficient correlation? 

      PC-corr is a combination of loadings from the principal component (PC) analysis and Pearson’s correlation for each gene pair. We have aimed at conveying this in the “Discriminative network analysis on prediction datasets” result section. We have now added an extra sentence at the first appearance of PC-corr to clarify this for the readers from the start:

      “After characterizing the mechanical phenotype of the cell states, we set out to use the accompanying transcriptomic data to elucidate genes associated with the mechanical phenotype changes across the different model systems. To this end, we utilized a method for inferring phenotype-associated functional network modules from omics datasets termed PC-Corr (28), that relies on combining loadings obtained from the principal component (PC) analysis and Pearson’s correlation (Corr) for every pair of genes. PC-Corr was performed individually on two prediction datasets, and the obtained results were overlaid to derive a conserved network module. Owing to the combination of the Pearson’s correlation coefficient and the discriminative information included in the PC loadings, the PC-corr analysis does not only consider gene co-expression, as is the case for classical co-expression network analysis, but also incorporates the relative relevance of each feature for discriminating between two or more conditions; in our case, the conditions representing soft and stiff phenotypes. The overlaying of the results from two different datasets allows for a multi-view analysis (utilizing multiple sets of features) and effectively merges the information from two different biological systems.”
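
      How loadings and correlation can be combined into a single edge score is sketched below. This is a minimal illustration, assuming the score takes the form sign(c)·min(|V_i|, |V_j|, |c|) as we understand the definition in Ciucci et al. (2017), with V_i and V_j being loadings already normalized to the range [−1, 1] and c the Pearson’s correlation of the gene pair. The expression values, loading values and function names are hypothetical and do not reproduce the pipeline used in the manuscript.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def pc_corr(loading_i, loading_j, correlation_ij):
    """Edge score: sign of the correlation times the smallest absolute value.

    Assumption: loadings are already normalized to [-1, 1].
    """
    magnitude = min(abs(loading_i), abs(loading_j), abs(correlation_ij))
    return math.copysign(magnitude, correlation_ij)


# Hypothetical expression profiles and normalized loadings for two genes.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.5]
loading_a, loading_b = 0.9, 0.7

score = pc_corr(loading_a, loading_b, pearson(gene_a, gene_b))
print(score)
```

      Because the score is bounded by the smallest of the three absolute values, an edge passes a given threshold only if both genes have discriminative loadings and the pair is strongly correlated.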

      - The formatting of Table 1 is confusing. Horizontal lines should be added to make it clear to the reader which datasets are human and which mouse as well as which accession numbers belong to the carcinomas. 

      Horizontal lines have now been added to improve the readability of Table 1. We hope that makes the table easier to follow and satisfies the request. We assume that further modifications to the table appearance may occur during publishing process in accordance with the publisher’s guidelines. 

      - In many figures, data points are shown in different shapes without an explanation of what the shapes represent. 

      We thank the reviewer for this comment and apologize for not adding this information earlier. We have added explanations of the symbols to captions of Figures 2, 3, 5, and 6 in the main text:

      “Fig. 2. Mechanical properties of divergent cell states in five biological systems. Schematic overviews of the systems used in our study, alongside with the cell stiffness of individual cell states parametrized by Young’s moduli E. (…) Statistical analysis was performed using generalized linear mixed effects model. The symbol shapes represent measurements of cell lines derived from three different patients (A), matched experimental replicates (C), two different reprogramming series (D), and four different cell isolations (E). Data presented in (A) and (D) were previously published in ref (29) and (30), respectively.”

      “Fig. 3. Identification of putative targets involved in cell mechanics regulation. (A) Glioblastoma and iPSC transcriptomes used for the target prediction intersect at 9,452 genes. (B, C) PCA separation along two first principal components of the mechanically distinct cell states in the glioblastoma (B) and iPSC (C) datasets. The analysis was performed using the gene expression data from the intersection presented in (A). The symbol shapes in (B) represent cell lines derived from three different patients. (…)”

      “Fig. 5. Perturbing levels of CAV1 affects the mechanical phenotype of intestine carcinoma cells. (…) In (E), (F), (I), and (J), the symbol shapes represent experiment replicates.”

      “Fig. 6. Perturbations of CAV1 levels in MCF10A-ER-Src cells result in cell stiffness changes. (…)  Statistical analysis was performed using a two-sided Wilcoxon rank sum test. In (B), (D), and (E), the symbol shapes represent experiment replicates.”

      As well as to Figures S2, S9, and S11 in the supplementary material (in Figure S2, the symbol explanation was added to the legends in the figure panels as well): 

      “Fig. S2. Plots of area vs deformation for different cell states in the characterized systems. Panels correspond to the following systems: (A) glioblastoma, (B) carcinoma, (C) non-tumorigenic breast epithelia MCF10A, (D) induced pluripotent stem cells (iPSCs), and (E) developing neurons. 95%- and 50% density contours of data pooled from all measurements of given cell state are indicated by shaded areas and continuous lines, respectively. Datapoints indicate medians of individual measurements. The symbol shapes represent cell lines derived from three different patients (A), two different reprogramming series (D), and four different cell isolations (E), as indicated in the respective panels. (…).”

      “Fig. S9. CAV1 knock-out mouse embryonic fibroblasts (CAV1KO) have lower stiffness compared to the wild type cells (WT). (…) (C) Apparent Young’s modulus values estimated for WT and CAV1KO cells using areadeformation data in (B). The symbol shapes represent experimental replicates. (…)”

      “Fig. S11. Plots of area vs deformation from RT-DC measurements of cells with perturbed CAV1 levels. Panels correspond to the following experiments: (A and B) CAV1 knock-down in TGBC cells using esiRNA (A) and ONTarget siRNA (B), (C and D) transient CAV1 overexpression in ECC4 cells (C) and TGBC cells (D). Datapoints indicate medians of individual measurement replicates. The isoelasticity lines in the background (gray) indicate regions of the same apparent Young’s moduli. The symbol shapes represent experimental replicates.”

      - In Figure 2, the difference in stiffness appears bigger than it actually is because the y-axes are not starting at 0. 

      While we acknowledge that starting the y-axes at a value other than 0 is generally not ideal, we chose this approach to better display data variability and minimize empty space in the plots.

      A similar effect can be achieved with logarithmic scaling, which is a common practice (see  Author response image 21 for visualization). We believe our choice of axes cut-off enhances the interpretability of the data without misleading the viewer.

      Author response image 21.

      Visualization of different axis scaling strategies applied to the five datasets presented in Figure 2 of the manuscript. 

      Of note, apparent Young’s moduli obtained from RT-DC measurements typically span 0.5-3.0 kPa (see Figure 2.3 from Urbanska et al 2021, PhD thesis). Differences between treatments rarely exceed a few hundred pascals. For example, in an siRNA screen of mitotic cell mechanics regulators in Drosophila cells (Kc167), the strongest hits (e.g., Rho1, Rok, dia) showed changes in stiffness of 100-150 Pa (see Supplementary Figure 11 from Rosendahl, Plak et al 2018, Nature Methods 15(5): 355-358).
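
      As a rough illustration of the axis-scaling point above, the following sketch computes how much taller one plotted value appears than another under three choices: a linear axis starting at 0, a linear axis with a raised lower limit, and a logarithmic axis with the same lower limit. The function name `drawn_height_ratio` and the example values (1.0 and 1.2 kPa, lower limit 0.8 kPa) are hypothetical; they were chosen only to fall within the range quoted above and are not taken from the manuscript.

```python
import math


def drawn_height_ratio(low, high, axis_min=0.0, log_scale=False):
    """Ratio of the on-screen heights of two values measured from the axis floor.

    low, high : the two plotted values (same units, e.g. kPa)
    axis_min  : value at which the y-axis starts (hypothetical choice)
    log_scale : if True, heights are measured on a log10 axis
    """
    if low <= axis_min or high <= axis_min:
        raise ValueError("both values must lie above the axis minimum")
    if log_scale:
        if axis_min <= 0:
            raise ValueError("a logarithmic axis needs a positive lower limit")
        # On a log axis, drawn height is proportional to log10(value / axis_min).
        return math.log10(high / axis_min) / math.log10(low / axis_min)
    # On a linear axis, drawn height is proportional to (value - axis_min).
    return (high - axis_min) / (low - axis_min)


# Hypothetical example: 1.0 kPa vs 1.2 kPa (a 20% difference).
from_zero = drawn_height_ratio(1.0, 1.2)                      # axis starts at 0
truncated = drawn_height_ratio(1.0, 1.2, axis_min=0.8)        # axis starts at 0.8
log_axis = drawn_height_ratio(1.0, 1.2, axis_min=0.8, log_scale=True)
print(from_zero, truncated, log_axis)
```

      With these assumed numbers, the zero-based axis shows the true ratio of 1.2, whereas the truncated linear axis roughly doubles the apparent height and the logarithmic axis with the same lower limit gives a ratio of about 1.8. This is only meant to make the trade-off between readability and visual exaggeration concrete.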

      - In Figure 3, I don't personally see the benefit of showing different cut-offs for PC-corr. In the end, the paper focuses on the 5 genes in the pentagram. I think only showing one of the cutoffs and better explaining why those target genes were picked would be sufficient and make it clearer for the reader. 

      We believe it is beneficial to show the extended networks for a few reasons. First, it demonstrates how the selected targets connect to the broader panel of genes, and that the selected module is indeed much more interconnected than other nodes. Second, the chosen PC-corr cut-off is somewhat arbitrary, and it may be interesting to look through the genes from the extended network as well, as they are likely also important for regulating cell mechanics. This broader view may help readers identify familiar genes and recognize the connections to relevant signaling networks and processes of interest.

      - In Figure 4C, I suggest explaining why the FANTOM5 and not another dataset was used for the visualization here and mentioning whether the other datasets were similar. 

      In Figure 4C, we have chosen to present data corresponding to FANTOM5, because that was the only carcinoma dataset in which all the cell lines tested mechanically are represented. We have now added this information to the caption of Figure 4. Additionally, the clustergrams corresponding to the remaining carcinoma datasets (CCLE RNASeq, Genentech) are presented in supplementary figures S4-S6.

      “The target genes show clear differences in expression levels between the soft and stiff cell states and provide for clustering of the samples corresponding to different cell stiffnesses in both prediction and validation datasets (Fig. 4, Figs. S4-S6).”

      Typos 

      We would like to thank Reviewer #3 for their detailed comments on the typos and details listed below. This is much appreciated, as it improved the quality of our manuscript.

      -  In the first paragraph of the results section the 'and' should be removed from this sentence: Each dataset encompasses two or more cell states characterized by a distinct mechanical phenotype, and for which transcriptomic data is available. 

      The sentence has been corrected and now reads:

      “Each dataset encompasses two or more cell states characterized by a distinct mechanical phenotype, and for which transcriptomic data is available.”

      -  In the methods in the MCF10A PIK3CA cell lines part, it says cell liens instead of cell lines. 

      The sentence has been corrected and now reads:

      “The wt cells were additionally supplemented with 10 ng ml<sup>−1</sup> EGF (E9644, Sigma-Aldrich), while mutant cell lines were maintained without EGF.”

      -  In the legend of Figure 6 "accession number: GSE17941, data previously published in ())" the reference is missing. 

      The reference has been added.

      -  In the legend of Figure 5 "(E) Verification of CAV1 knock-down in TGBC cells using two knock-down system" 'a' between using and two is missing. 

      The legend has been corrected (no ‘a’ is missing, but it should say systems (plural)):

      -  In Figure 5B one horizontal line is missing. 

      The Figure 5B has been corrected accordingly. 

      -  Terms such as de novo or in silico should be written in cursive. 

      We thank the Reviewer for this comment; however, we believe that in the style used by eLife, common Latin expressions such as de novo or in vitro are used in regular font.

      -  In the heading of Table 4 "The results presented in this table can be reproducible using the code and data available under the GitHub link reported in the methods section." It should say reproduced instead of reproducible. 

      Yes, indeed. It has been corrected.

      -  The citation of reference 20 contains several author names multiple times. 

      Indeed, it has been fixed now:

      -  In Figure S2 there is a vertical line in the zeros of the y axis labels. 

      We are not sure whether there was a rendering issue, but we did not see a vertical line in the zeros of the y axis labels in Figure S2.

      - The text in Figure S4 is too small.

      We thank the reviewer for pointing this out. We have now revised Figure S7 (formerly Figure S4) to increase the text size, ensuring better readability. (It has also been updated to include additional fits as requested by Reviewer #2).

      - In Table 3 "positive hypothesis II markers are discriminative of samples with stiff/soft independent of data source" the words 'mechanical phenotype' are missing. 

      The column headings in Table 3 have now been updated accordingly.

      - In Table S3 explain in the table headline what vi1, vi2 and v are. I assume the loading for PC1, the loading for PC2 and the average of the previous two values. But it should be mentioned somewhere.

      The caption of Table S3 has been updated to explain the meaning of vi1, vi2 and v.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors provide strong evidence that the cell surface E3 ubiquitin ligases RNF43 and ZNRF3, which are well known for their role in regulating cell surface levels of WNT receptors encoded by FZD genes, also target EGFR for degradation. This is a newly identified function for these ubiquitin ligases beyond their role in regulating WNT signaling. Loss of RNF43/ZNRF3 expression leads to elevated EGFR levels and signaling, suggesting a potential new axis to drive tumorigenesis, whereas overexpression of RNF43 or ZNRF3 decreases EGFR levels and signaling. Furthermore, RNF43 and ZNRF3 directly interact with EGFR through their extracellular domains.

      Strengths:

      The data showing that RNF43 and ZNRF3 interact with EGFR and regulate its levels and activity are thorough and convincing, and the conclusions are largely supported.

      Weaknesses:

      While the data support that EGFR is a target for RNF43/ZNRF3, some of the authors' interpretations of the data on EGFR's role relative to WNT's roles downstream of RNF43/ZNRF3 are overstated. The authors, perhaps not intentionally, promote the effect of RNF43/ZNRF3 on EGFR while minimizing their role in WNT signaling. This is the case in most of the biological assays (cell and organoid growth and mouse tumor models). For example, the conclusion of "no substantial activation of Wnt signaling" (page 14) in the prostate cancer model is currently not supported by the data and requires further examination. In fact, examination of the data presented here indicates effects on WNT/b-catenin signaling, consistent with previous studies.

      Cancers in which RNF43 or ZNRF3 are deleted are often considered to be "WNT addicted", and inhibition of WNT signaling generally potently inhibits tumor growth. In particular, treatment of WNT-addicted tumors with Porcupine inhibitors leads to tumor regression. The authors should test to what extent PORCN inhibition affects tumor (and APC-min intestinal organoid) growth. If the biological effects of RNF43/ZNRF3 loss are mediated primarily or predominantly through EGFR, then PORCN inhibition should not affect tumor or organoid growth.

      We thank the reviewer for appreciating the key strength of our study. We fully agree with the reviewer that RNF43/ZNRF3 play key roles in restraining WNT signaling and that their deletion activates WNT signaling, which leads to cancer promotion, as discussed and cited in our manuscript (Hao et al, 2012; Koo et al, 2012). We have revised the language in this manuscript to avoid any confusion or appearance of downplaying this known signaling pathway in cancer progression.

      What we would like to highlight in this work is that our study uncovered an effect of RNF43/ZNRF3 on EGFR, leading to biological impact in multiple model systems. In particular, we included the APC-mutated human cancer cell line HT29 and Apc min mouse intestinal tumor organoids. In the context of APC mutations, β-catenin stabilization and the activation of WNT target genes are essentially decoupled from upstream WNT ligand binding to WNT receptors, so we could focus primarily on the effect of RNF43/ZNRF3 on EGFR. Our statement of “no substantial activation of WNT signaling”, as cited by the reviewer, was made in describing the data in Fig. 7E, where we did not observe β-catenin accumulation in the nucleus and therefore inferred that there was no substantial activation of canonical WNT signaling. We agree that further examination would help strengthen the conclusion and appreciate the reviewer’s suggestion of PORCN inhibition experiments. While PORCN inhibition is a valuable experiment in models with an abundance of WNT ligands/receptors and non-mutationally activated regulators of WNT signaling (Yu et al, 2020), in biological scenarios with existing APC mutations, another group has previously demonstrated that PORCN inhibition had no observable effect on WNT signaling in APC-deficient cells (PMID: 29533772). In our initial submission, we confirmed this predicted low response to manipulation of WNT signaling components upstream of a mutated APC. We showed that addition of RSPO1 in Apc min mouse intestinal tumor organoids failed to further activate WNT target expression (Fig. 6G). Furthermore, in this revised manuscript, we added new data on EGFR inhibition and PORCN inhibition in WT and Znrf3 KO MEFs (Fig. 6L). PORCN inhibition had no impact on cell growth in either WT or Znrf3 KO MEFs, suggesting that the growth-promoting effect of Znrf3 KO in MEFs is WNT-independent. In contrast, inhibition of EGFR downstream signaling components (Fig. 6L) significantly blocked MEF growth and abolished the impact of Znrf3 KO on MEF growth. This new evidence further supports our main conclusion that RNF43/ZNRF3 controls EGFR signaling to regulate cell growth.

      Reviewer #2 (Public Review):

      Using proteogenomic analysis of human cancer datasets, Yu et al. found that EGFR protein levels negatively correlate with ZNRF3/RNF43 expression across multiple cancers. Interestingly, they found that CRC harbouring the frequent RNF43 G659Vfs*41 mutation exhibits higher levels of EGFR when compared to RNF43 wild-type tumors. This is highly interesting since this mutation is generally not thought to influence Frizzled levels and Wnt-bcatenin pathway activity. Using CRISPR knockouts and overexpression experiments, the authors show that EGFR levels are modulated by ZNRF3/RNF43. Supporting these findings, modulation of ZNRF3/RNF43 activity using Rspondin also leads to increased EGFR levels. Mechanistically, the authors show that ZNRF3/RNF43 ubiquitinate EGFR and lead to its degradation. Finally, the authors present functional evidence that loss of ZNRF3/RNF43 unleashes EGFR-mediated cell growth in 2D culture and organoids and promotes tumor growth in vivo.

      Overall, the conclusions of the manuscript are well supported by the data presented, but some aspects of the mechanism presented need to be reinforced to fully support the claims made by the authors. Additionally, the title of the paper suggests that ZNRF3 and RNF43 loss leads to the hyperactivity of EGFR and that its signalling activity contributes to cancer initiation/progression. I don't think the authors convincingly showed this in their study.

      We thank the reviewer for commenting that our “conclusions of the manuscript are well supported by the data presented.” We address the concerns raised by this reviewer point by point below:

      Major points:

      (1) EGFR ubiquitination. All of the experiments supporting that ZNRF3/RNF43 mediates EGFR ubiquitination are performed under overexpression conditions. A major caveat is also that none of the ubiquitination experiments are performed under denaturing conditions. Therefore, it is impossible to claim that the ubiquitin immunoreactivity observed on the western blots presented in Figure 4 corresponds to ubiquitinated-EGFR species. Another issue is that in Figure 4A, the experiments suggest that the RNF43-dependent ubiquitination of EGFR is promoted by EGF. However, there is no control showing the ubiquitination of EGFR in the absence of EGF but under RNF43 overexpression. According to the other experiments presented in Figures 4B, 4C, and 4F, there seems to be a constitutive ubiquitination of EGFR upon overexpression. How do the authors reconcile the role of ZNRF3/RNF43 vs c-cbl?

      We agree with the reviewer about the limitations of overexpression experiments. In this manuscript, we leveraged both overexpression and knockout systems to demonstrate that ZNRF3/RNF43 regulates EGFR ubiquitination: in Fig 4A, we showed that overexpression of RNF43 increased EGFR ubiquitination; in Fig 4B&C and Fig S3A, we showed that RNF43 knockout decreased EGFR ubiquitination; in Fig 4F, we showed that overexpression of ZNRF3 WT increased EGFR ubiquitination but overexpression of the ZNRF3 RING domain deletion mutant failed to increase EGFR ubiquitination.

      We also appreciate the rigor with which the reviewer has approached our methodology. We acknowledge that denaturing conditions can provide additional validation, but the technical challenges associated with denaturing conditions include the potential disruption of epitope structures recognized by these antibodies. Our methodology was chosen to balance the need for accurate detection with the preservation of protein structure and function, which are crucial for understanding the biological implications of EGFR ubiquitination. Moreover, our immunoprecipitation and subsequent Western blotting were stringent with high SDS and 2-ME, optimized to minimize non-specific binding and enhance the specificity of detection. We believe that the data presented are robust and contribute significantly to the existing body of knowledge on EGFR ubiquitination.

      CBL is a well-known E3 ligase of EGFR, and it induces EGFR ubiquitination upon EGF ligand stimulation. Therefore, in order to have a fair comparison of RNF43 and CBL on EGFR ubiquitination, we designed Fig 4A and related experiments in the setting of EGF stimulation. We observed that RNF43 overexpression increased EGFR ubiquitination as potently as CBL did. Following this result, we further demonstrated that knockout of RNF43 decreased the endogenous ubiquitinated EGFR level in the unstimulated/basal condition (Fig 4B) as well as in the EGF-stimulated condition (Fig 4C). We acknowledge the importance of, and interest in, fully understanding how ZNRF3/RNF43 interplays with the functions of CBL in regulating EGFR ubiquitination. This line of investigation indeed holds the potential to uncover novel regulatory mechanisms in detail. However, the primary focus of the current study was to establish a foundational understanding of the role of ZNRF3/RNF43 in regulating EGFR ubiquitination. We look forward to exploring this further in future work.

      (2) EGFR degradation vs internalization. In Figure 3C, the authors show experiments that demonstrate that RNF43 KO increases steady-state levels of EGFR and prevents its EGF-dependent proteolysis. Using flow cytometry they then present evidence that the reduction in cell surface levels of EGFR mediated by EGF is inhibited in the absence of RNF43. The authors conclude that this is due to inhibition of EGF-induced internalization of surface EGFR. However, the experiments are not designed to study internalization and rather merely examine steady-state levels of surface EGFR pre and post-treatment. These changes are an integration of many things (retrograde and anterograde transport mechanisms presumably modulated by EGF). What process(es) is/are specifically affected by ZNRF3/RNF43? Are these processes differently regulated by c-cbl? If the authors are specifically interested in internalization/recycling, the use of cell surface biotinylation experiments and time courses are needed to examine the effect of EGF in the presence or absence of the E3 ligases.

      We agree that our study design primarily assesses EGFR levels on the cell surface before and after EGF treatment and does not comprehensively measure the whole internalization process. In response to the reviewer’s comments, we have revised the relevant sections of the manuscript to clarify that our current findings are focused on changes in cell surface EGFR and do not extend to the detailed mechanisms of EGF-induced internalization or recycling.

      (3) RNF43 G659fs*41. The authors make a point in Figure 1D that this mutant leads to elevated EGFR in cancers but do not present evidence that this mutant is ineffective in mediating ubiquitination and degradation of EGFR. As this mutant maintains its ability to promote Frizzled ubiquitination and degradation, it would be important to show side by side that it does not affect EGFR. This would perhaps imply differential mechanisms for these two substrates.

      Fig 1D is based on bioinformatic analysis of colon cancer patient samples, showing that RNF43 G659Vfs*41 mutant tumors exhibited significantly higher levels of EGFR protein compared to RNF43 WT tumors. Following this lead, we investigated whether the RNF43 G659Vfs*41 hotspot mutant lost its role in downregulating EGFR. To this end, we transfected the same amount of control vector, RNF43 WT, RING deletion mutant, or G659Vfs*41 mutant DNA into 293T cells and measured the level of EGFR (co-transfected). As shown in Author response image 1, overexpression of RNF43 WT decreased the EGFR level, while overexpression of the RING deletion mutant had no impact on the EGFR level as compared with the Vector group, which is consistent with our findings in the manuscript. Cells transfected with the RNF43 G659Vfs*41 mutant exhibited nearly normal levels of EGFR; however, we also observed that RNF43 G659Vfs*41 was less expressed than WT, even though the same amounts of DNA were transfected. Therefore, the insubstantial impact on EGFR levels could be attributed to either functional loss or compromised stability of RNF43 G659Vfs*41 mRNA or protein. Further investigation of RNF43 G659Vfs*41 mRNA and protein stability vs. RNF43 G659Vfs*41 protein function is needed to draw a solid conclusion.

      Author response image 1.

      (4) "Unleashing EGFR activity". The title of the paper implies that ZNRF3/RNF43 loss leads to increased EGFR expression and hence increased activity that underlies cancer. However, I could find only one piece of direct evidence, showing that increased proliferation of the HT29 cell line mutant for RNF43 could be inhibited by the EGFR inhibitor Erlotinib. All the other evidence presented that I could find is correlative or indirect (e.g. RPPA showing increased phosphorylation of pathway members upon RNF43 KO, increased proliferation of a cell line upon ZNRF3/ RNF43 KO, decreased proliferation of a cell line upon ZNRF3/RNF43 OE in vitro or in xeno...). Importantly, the authors claim that cancer initiation/ progression in ZNRF3/RNF43 mutants may in some contexts be independent of their regulation of Wnt-bcatenin signaling and rely on EGFR activity upregulation. However, this has not been tested directly. Could the authors leverage their Znrf3/Rnf43 prostate cancer model to test whether EGFR inhibition could lead to reduced cancer burden whereas a Frizzled or Wnt inhibitor does not?

      More broadly, if EGFR signaling were to be unleashed in cancer, then one prediction would be that these cells would be more sensitive to EGFR pathway inhibition. Could the authors provide evidence that this is the case? Perhaps using isogenic cell lines or a panel of patient-derived organoids (with known genotypes).

      We appreciate the reviewer’s suggestion to provide more direct evidence demonstrating the importance of the ZNRF3/RNF43-EGFR axis in cancer cell proliferation. In this revised manuscript, we further studied this issue in WT vs. Znrf3 KO MEF cells. We observed that treatment with the EGFR inhibitor erlotinib did not affect WT MEFs but abolished the growth advantage of Znrf3 KO MEF cells (Fig. 6L). On the other hand, treatment with the porcupine inhibitor C59 did not impact either WT or Znrf3 KO MEF cells (Fig. 6L), suggesting a more important role of the ZNRF3/RNF43-EGFR axis in mediating the enhanced growth of MEFs caused by Znrf3 knockout. Furthermore, considering that EGFR is often mutated in human cancer, to increase the clinical relevance of our study, we also tested the effect of RNF43 knockout on EGFR L858R (Fig. 2D), a common oncogenic EGFR mutant, and found that RNF43 knockout in HT29 boosted levels of this EGFR mutant detected by its FLAG tag, suggesting that RNF43 degrades both WT and mutated EGFR and that its loss can enhance signaling of both WT EGFR and its oncogenic mutant. However, we emphasize again that this manuscript is in no way written to diminish the proven importance of the ZNRF3/RNF43-WNT-β-catenin axis in cancer and development.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The main conclusion that EGFR is targeted for degradation by RNF43 and ZNRF3 is well supported and documented. Figures 1-5 and associated supplemental figures contain largely convincing data. Figures 6 and 7, however, require some modifications, as follows in order of appearance:

      Figure 6C: Growth of intestinal tumor organoids from Apcmin mice does not require Rspo, however, the authors show that these organoids grow larger in the presence of Rspo, an effect they attribute to increased EGFR activity, rather than increased WNT activity. While this conclusion may be correct, the authors should address this possibility by treating the organoids with PORCN inhibitor. The prediction would be that Rspo treatment still increases organoid size in the presence of PORCN inhibition. A further prediction would be that blocking EGFR (e.g. with Cetuximab) will abrogate the RSPO1 effect.

      Yes, we attributed the impact of Rspo on Apc min organoid growth to enhanced EGFR activity because we observed increased EGFR levels (Fig 6F) but no detectable increase in the eight WNT target genes assayed. We agree that further pharmacologic experiments would strengthen our conclusion, but our few attempts at treating organoids encountered technical difficulties. Hence, we switched to testing PORCN inhibition vs EGFR inhibition in WT and Znrf3 KO MEFs. As shown in the revised Fig. 6L, EGFR inhibition significantly reversed the growth advantage caused by Znrf3 KO but C59 did not.

      Figure 6G: It is unclear why the authors provide "8-day RSPO1 treatment" data. Here, EGFR mRNA appears to be elevated 2-fold (perhaps not statistically significant), and the Wnt targets Lef1 and Axin2 are decreased, as indicated by the statistical significance. What point is being made here?

      Our observations of increased size of Apc min mouse intestinal tumor organoids and increased EGFR protein levels were made at 8 days of RSPO1 treatment. Therefore, we measured mRNA levels at the same time point, with the 2-day time point also included for comparison. The goal of this qPCR experiment was to detect the contribution of WNT signaling, and we did not detect an increased transcriptional readout. We included EGFR mRNA levels for comparison, and we did not detect a statistically significant increase, consistent with our experiments concluding that ZNRF3/RNF43 regulate EGFR at the protein level. As stated in the preceding response, these data led us to attribute the impact of Rspo on Apc min organoid growth to enhanced EGFR activity.

      Figure 7A: This requires quantitation. How many mice were used per cell line? The data shown is not particularly convincing, with ZNRF3 overexpressing HT29 cells growing detectably. Showing representative mice is fine, but this should be supplemented with quantitation of all mice.

      We had provided this data. The BLI signal quantification was shown below the representative BLI images. Seven mice were used per cell line, as annotated at the top of the graph.

      Figure 7B: The authors assert that "canonical WNT signaling, based on levels of active-β-Catenin (non-phosphorylated at Ser33/37/Thr41; Figure 7B), remained unaffected". As shown, 2 of the 3 Myc-Znrf3 tumors have increased active-b-catenin signal over the GFP tumors. This indicates to me that canonical Wnt signaling was affected. The authors either need to present quantitative data that supports this claim or modify their conclusions. As presented, I don't think it is appropriate to decouple the effect of Znrf3 overexpression on EGFR from its effect on WNT.

      As requested, we have quantified the level of non-phospho β-Catenin (Ser33/37/Thr41) and found no significant difference (p > 0.05) between the control group and the ZNRF3 overexpression group. We once again note that our manuscript was not meant to dispute the proven signaling and biological significance of WNT signaling regulation by ZNRF3/RNF43, and we have proofread the manuscript multiple times to ensure that we did not make any generalized or misleading statements in this respect.

      Author response image 2.

      Figure 7E: Here the authors assert that "no substantial activation of canonical Wnt signaling" in the Z&R KO tumors, however, the figure shows a substantial increase in active b-catenin staining. The current resolution is insufficient to claim that there is no increase in nuclear b-catenin. The authors' claim that WNT signaling is not involved here is not supported by the data presented here. One way to demonstrate that this effect is through EGFR activation and not through WNT activation is to treat mice with PORCN inhibitor. WNT-addicted tumors, such as by Rnf43 or Znrf3 deletion, regress upon PORCN inhibition. In this case, if the effect of Z&R KO is mediated through EGFR rather than WNT, then there should be no effect on tumor growth upon PORCN inhibition. This is a critical experiment in order to make this point.

      We appreciate the reviewer’s comments and suggestion of experiments. We based our initial statement on insubstantial nuclear β-catenin staining, but we agree that immunohistochemical staining lacks the resolution suitable for quantification. We could not generate an adequate number of KO animals for these in vivo experiments in the window of time planned for this revision. Instead, as shown in the newly added Fig. 6L, we tested EGFR inhibition and PORCN inhibition in Znrf3 KO MEFs and obtained strong data further supporting a role for EGFR in mediating the promotion of MEF growth by Znrf3 KO. Nevertheless, we have carefully revised our description of the in vivo data in Fig 7E to avoid any confusion or over-interpretation.

      Minor points:

      Figure 2A: provide quantitation of this immunoblot.

      We have revised the manuscript, with the quantification result shown next to the immunoblot.

      Figure 2B: provide more detail in the figure legend and in the Materials and Methods section on how the KO MEFs were generated. Confirmation that Znrf3 (or in cases of Rnf43 KO) expression is lost in KO would be advisable.

      We have confirmed Znrf3 KO by genotyping and RNF43 KO by immunofluorescent staining. We have also tested multiple commercial anti-ZNRF3 antibodies and anti-RNF43 antibodies for Western blotting, but they all failed.

      Figure 4C is a little misleading. The schematic indicates that ECD-TM and TM-ICD truncations were analyzed for both ZNRF3 and RNF43. However, Figure 4 only shows data for ZNRF3, and the corresponding Figure S4 lacks data for the TM-ICD of Rnf43. A recommendation is to show only those schematics for which data is presented in that figure. On a related topic, the results using the deltaRING constructs (Figure S5) are not mentioned/described in the text.

      We think that the reviewer meant Fig 5C. We have revised Fig 5C by removing the RNF43 label, and we confirm that the Results section does include the data in Fig S5.

      Figure S4A: Only ZNRF3 is indicated in this figure. Please explain why RNF43 is not represented here. Also, indicate what is plotted along the x-axis.

      We only detected the endogenous ZNRF3-EGFR interaction, possibly because the RNF43 protein level is relatively low in the cell line we used for the mass spec experiment. The x-axis shows the proteins ordered by their y-axis values, as detailed in the figure legend: each data point was arranged along the x-axis based on the log2 fold change of iBAQ of EGFR-associated proteins identified in EGF-stimulated vs. control conditions, from low to high (left to right). We have added the label “Proteins detected by Mass-Spec” to the x-axis.
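
      The x-axis ordering described above can be sketched as follows. This is a minimal illustration only: the function name `rank_by_log2_fold_change`, the placeholder protein labels, and the intensity values are all hypothetical, and skipping proteins with a missing or non-positive intensity is an assumption of this sketch rather than a description of how the original analysis handled such cases.

```python
import math


def rank_by_log2_fold_change(ibaq_stimulated, ibaq_control):
    """Order proteins from low to high log2(iBAQ stimulated / iBAQ control).

    Both arguments map a protein label to an iBAQ intensity.
    Returns a list of (protein, log2_fold_change) tuples; the list index
    corresponds to the protein's position along the x-axis.
    """
    ranked = []
    for protein, stimulated in ibaq_stimulated.items():
        control = ibaq_control.get(protein)
        # Assumption of this sketch: proteins without a positive intensity
        # in both conditions are skipped, since the ratio is undefined.
        if control is None or control <= 0 or stimulated <= 0:
            continue
        ranked.append((protein, math.log2(stimulated / control)))
    # Sort ascending so the lowest fold change sits leftmost.
    ranked.sort(key=lambda item: item[1])
    return ranked


# Hypothetical intensities for three placeholder proteins.
stimulated = {"PROT_A": 8.0, "PROT_B": 1.0, "PROT_C": 2.0}
control = {"PROT_A": 1.0, "PROT_B": 2.0, "PROT_C": 2.0}
for position, (protein, log2_fc) in enumerate(
    rank_by_log2_fold_change(stimulated, control), start=1
):
    print(position, protein, log2_fc)
```

      In this toy input, the protein with the strongest enrichment upon stimulation ends up at the right-hand end of the axis, which is where an EGF-dependent interactor would be expected to appear in such a plot.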

      Reviewer #2 (Recommendations For The Authors):

      Minor Points.

      (1) In Figure 2B, the authors claim that Znrf3 KO enhanced both EGFR and p-EGFR levels both in the absence and presence of EGF. Although it is clear in the presence of EGF, the increase in p-EGFR in the absence of EGF is less than clear.

      We have revised the manuscript to more clearly state the result in Fig 2B.

      (2) Importantly the authors validated their findings using three independent RNF43 gRNA (fig S2D) but they do not show the editing efficiency obtained with the gRNA.

      We did not include an RNF43 IB in this figure due to the lack of specific antibodies for detecting RNF43 in IB. We have no reason to doubt adequate knockout efficiency, since EGFR was increased compared to the control group. As a result, we did not perform deep sequencing to validate knockout efficiency.

      (3) In S2E, the authors show that KO of either ZNRF3 or RNF43 enhance HER2 levels. This suggests that there is no redundancy between these E3 ligases, at least in this context. How do the authors reconcile that?

      The reviewer raised an interesting issue. Due to the lack of WB antibodies for these two proteins, we could not easily assess the feedback impact of knocking out either gene on the protein level of the other. We speculate that there may be a threshold level of the sum of the two proteins that is needed for adequate degradation of HER2, leading to an increase in HER2 when either gene is knocked out. A detailed study of this issue is beyond the scope of the current work.

      (4) Experiments performed in Fig 3C are performed in only one clone. The authors need to repeat in an additional clone or rescue this phenotype using a RNF43 cDNA.

      Our RNF43 KO HT29 line is a pool of KO cells, not a single clone.

      (5) In Figure 7E, the authors suggest that the absence of nuclear bcatenin means that canonical Wnt signaling is unaffected. It is widely known that nuclear bcatenin often does not correlate with pathway activity.

      As stated above, we have revised the manuscript to avoid confusion and misinterpretation.

      (6) What is the nature of the error bars in Fig 3c? Are the differences statistically significant?

      As mentioned in the figure legend, the error bars are SEM. The result is statistically significant, and the p-value is noted in the graph.

      (7) In the Figure legends, it should be stated clearly how many biological replicates were performed for each experiment and single data points should be plotted where applicable (e.g. qPCR data). It would be helpful if the uncropped and unprocessed Western blot membranes and replicates that are not shown would be accessible to allow the reader a more comprehensive view of the acquired data, especially for blots that were quantified (e.g. Figure 2F, Figure 3C, there is clearly some defect on the blot).

      For WB representation, it would be helpful to include more size markers on the Western blots (especially on the IPs that show a ubiquitin smear) and in general to use a reference protein (GAPDH, Actin, Vinculin) that is closer in size to the protein being assessed.

      More details should be added in the Methods section to explain how protocols were performed in detail. For example, it should be explained how the viruses used for infecting cells were produced (which plasmids were transfected using which transfection reagent, how long was the virus collected for, etc). Then, it should be stated how long the cells were undergoing selection before being harvested. Because the expression of the viral constructs potentially has an effect on cell proliferation through EGFR, this information is quite relevant. This is just an example, there are details missing in nearly every section (Flow: washing protocols, gating protocols (Live/dead stain?), WB: RIPA lysis buffer composition? How much protein was loaded on blots? How was protein quantification done? IP: how were washes performed and how often repeated?)

      Missing: antibody dilutions for IF, IHC, and WB, plasmid backbones, sequences and availability, qPCR primer sequences from Origene.

      Incucyte experiments are not described.

      We have revised the relevant sections to include more details.

      (8) Line 141: revise text: 2x mRNA abundance in the same sentence.

      Line 162: define intermediate expression better.

      Line 197/198: revise text ('the predominant one'?).

      Line 218/219: revise text (Internalisation of surface EGFR?).

      Line 245: clarify in text that it is endogenous EGFR that is being pulled down.

      Line 264: typo: conserved instead of conservative.

      Line 324: revise text (What does 'unknown significance' mean).

      Line 396/397: revise text: 2x Co-IP in the same sentence.

      Figure 3 D/E: more details on the Method in the figure legend.

      We have revised them accordingly.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the Authors):

      The authors provide their data and code via GitHub, and Shiny apps allow easy access to their data. However, after spending a few minutes with the snRNA-seq app, I could not figure out how to search for individual genes (e.g. DBH) on their web interface. Some changes could help to make this app more user-friendly.

      While it was not possible to easily modify the user interface of the snRNA-seq app itself, we have instead added two additional supplementary figures displaying screenshots and schematics with sequential instructions that provide a short tutorial showing how to search for individual genes and display either spatial gene expression (for the Visium SRT data) or gene expression by cluster or population (for the snRNA-seq data) in each interactive web app (Figure 3-figure supplement 20-21). We hope this makes the apps more accessible and assists users to more easily query specific genes that they are interested in.

      The first sentence of the abstract and line 70 on page 2 need to be revised for language / grammar / clarity.

      We have revised these two sentences. Line 70 on page 2 contained a typo / copy-paste error. Thank you for pointing this out.

      Reviewer #2 (Recommendations For The Authors):

      While the efforts of the authors to identify NE neurons in the LC is appreciated, the data fall a little short of conclusively calling these neurons solely noradrenergic as there is an apparent lack of overlap between TH and SLC6A2 in the spots. Undoubtedly, some spots contain both which is consistent with the RNA scope results, but there is clearly a pattern that shows spots that don't contain both. It would be worth testing the presence of other catecholamines in some of these certain spots particularly dopamine (Kempadoo et al. 2016, Takeuchi et al., 2016, Devoto et al. 2005).

      We agree this is an important point. To more rigorously investigate whether TH is co-expressed within cells that produce other catecholamines, particularly dopamine (DA) in addition to norepinephrine (NE), we have included additional analyses of the snRNA-seq and Visium data, as well as generated additional RNAscope data in the revised manuscript, as follows.

      (i) We investigated the spatial expression of DA neuron marker genes besides TH, including SLC6A3 (encoding the dopamine transporter), ALDH1A1, and SLC26A7 in the Visium samples (Figure 3-figure supplement 15), which shows that these genes are not strongly expressed within the manually annotated LC regions in the Visium samples (see Figure 2-figure supplement 1).

      (ii) We investigated expression of DA neuron marker genes SLC6A3, ALDH1A1, and SLC26A7 in the snRNA-seq clustering (updated heatmap in Figure 3-figure supplement 8), which shows minimal expression of these genes within the NE neuron cluster (cluster 6).

      (iii) Despite the data above suggesting little expression of markers for DA neurons within the human LC, we wanted to investigate this question more thoroughly with an orthogonal method given that relatively lower coverage in the sequencing approaches may miss expression, particularly for more lowly expressed transcripts. We generated new high-resolution RNAscope smFISH images at 40x magnification for samples from 3 additional donors (Br8689, Br5529, and Br5426) showing expression of NE neuron marker genes (DBH and TH), a 5-HT neuron marker gene (TPH2), and a DA neuron marker gene (SLC6A3) within individual cells within the LC regions in these samples. Expression of SLC6A3 within individual NE neurons (identified by co-expression of DBH and TH) was not apparent in these RNAscope images (Figure 3-figure supplement 16).

      Together with the previous high-magnification RNAscope images showing co-expression of NE neuron marker genes (DBH, TH, and SLC6A2) within individual NE neurons (Figure 3-figure supplement 4), these new results further strengthen the conclusion that the observed TH+ cells we profiled in the LC are NE-producing neurons. In our view, the lack of observed co-expression of TH and SLC6A2 within some individual Visium spots is likely due to sampling variability and relatively lower sequencing coverage in the Visium data, rather than a true lack of co-expression. We have included additional text in the Results and Discussion further discussing this issue.

      Likewise, given the low throughput of RNA scope, and the fact that it was not done in a systematic manner, it does not conclusively identify the cell types in the region. It might be worth a systematic survey of the cells in the region with both NE and DA markers. Otherwise, it is suggested that the authors be more conservative with their annotations.

      As discussed above, we have now generated additional high-magnification RNAscope images for 3 independent donors (Br8689, Br5529, and Br5426), visualizing expression of two NE neuron marker genes (DBH and TH), one 5-HT neuron marker gene (TPH2), and one DA neuron marker gene (SLC6A3, encoding the dopamine transporter) within individual cells within the LC region in each sample (Figure 3-figure supplement 16). Expression of the DA neuron marker gene (SLC6A3) within individual NE neuron cell bodies (identified by co-expression of DBH and TH) was not apparent in these RNAscope images. Together with our previous RNAscope images showing co-expression of DBH, TH, and SLC6A2 within individual cells (Figure 3-figure supplement 4), in our view, these results provide strong evidence that the observed TH+ cells in the LC are NE-producing neurons, and the data do not provide supporting evidence for the existence of DA-synthesizing neurons in the human LC.

      For the manual annotation, it would be useful to include HE tissue images to better understand how the annotations were derived especially because the annotations are not well corroborated by the clustering.

      We have now included the H&E stained histology images for the Visium samples in Figure 2-figure supplement 2A, which can be compared with the previous figures showing the manual annotations for the LC regions (Figure 2-figure supplement 1). The histology images can also be viewed at higher resolution through the Shiny web app (https://libd.shinyapps.io/locus-c_Visium/).

      The unsupervised clustering is certainly contingent on the number of genes detected, which is in turn dependent on the quality of the material and the success of the experiment. It is unclear from the methods whether the samples were pooled for clustering. If they were pooled, the author might consider using only the samples with UMIs > 500. The low UMI may represent free-floating RNA, suggesting issues with tissue permeabilization in turn influencing the ability to confidently associate genes with spots. Sticking with the higher quality sample may improve the ability to perform unsupervised clustering.

      For the spot-level unsupervised clustering using BayesSpace, our aim was to demonstrate whether it is feasible to segment the LC and non-LC regions in the Visium samples in a data-driven manner using a spatial clustering algorithm, instead of relying on manual annotations. We performed clustering across samples (i.e. pooled) -- we have included additional wording in the text and figure caption to clarify this. We agree with the reviewer there may be further optimizations possible, such as filtering out spots or samples with low UMI counts. However, filtering out low-UMI spots may also confound the clustering if low-UMI spots are associated with biological signal (e.g. preferentially located in white matter regions).
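      The concern about low-UMI filtering can be checked with a short sketch such as the one below. It is only an illustration, not the analysis code used in the study: the 500-UMI threshold comes from the reviewer's suggestion, while the spot records, region labels, and function names are hypothetical. The idea is to report what fraction of spots a threshold would remove in each annotated region, so that a filter that preferentially removes one region (e.g. white matter) is visible before clustering.

```python
from collections import Counter

def filter_spots_by_umi(spots, min_umi=500):
    """Split spots into (kept, removed) using a total-UMI threshold."""
    kept = [s for s in spots if s["total_umi"] > min_umi]
    removed = [s for s in spots if s["total_umi"] <= min_umi]
    return kept, removed

def removed_fraction_by_region(spots, min_umi=500):
    """Fraction of spots removed by the threshold, per annotated region."""
    totals = Counter(s["region"] for s in spots)
    _, removed = filter_spots_by_umi(spots, min_umi)
    dropped = Counter(s["region"] for s in removed)
    # Counter returns 0 for regions in which no spot was removed.
    return {region: dropped[region] / totals[region] for region in totals}

# Toy data (hypothetical values, for illustration only).
spots = [
    {"id": "s1", "region": "LC", "total_umi": 1200},
    {"id": "s2", "region": "LC", "total_umi": 800},
    {"id": "s3", "region": "white_matter", "total_umi": 300},
    {"id": "s4", "region": "white_matter", "total_umi": 650},
]

print(removed_fraction_by_region(spots))
```

      A strongly uneven removal fraction across regions would suggest that low UMI counts carry biological signal, in which case filtering could confound the clustering rather than improve it.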

      Overall, we found that applying data-driven methods such as BayesSpace to segment the LC and non-LC regions did not perform sufficiently to rely on for our downstream analyses (Figure 2-figure supplement 6), and, in our view, further incremental optimizations were unlikely to reach sufficient performance and robustness, so we chose to rely on the manual annotations instead. In addition, as noted in the Results, this avoids potentially inflated false discoveries due to issues of circularity when performing differential gene expression testing between regions defined by unsupervised clustering on the same sets of genes (Gao et al. 2022). We included the BayesSpace results (Figure 2-figure supplement 6) to provide information and ideas to method developers interested in using this dataset as a test case for further development of spatial clustering algorithms. However, further adapting or optimizing these spatial clustering algorithms ourselves was not within the scope of our current work.

      It is not entirely clear why the authors used FANS, especially with the scored tissue. Do the authors think this could have negatively influenced the capture of the desired cell type since FANS can compromise the integrity of the nuclei? In other words, have the authors considered that this may have resulted in a loss rather than enrichment? The proportion of "NE" neurons in the snRNA-Seq data is less than 2% in all cases and at its lowest in sample 6522 which does not correspond well with the proportion of tissue that was manually annotated as containing NE cells, even when taken into consideration the potential size difference of cells. In the same vein, in some samples, there are more "5-HT" neurons in the region than "NE" according to the numbers.

      As noted in our initial response to reviewers (“Response to Public Review Comments”), we used FANS to enrich for neurons based on our previous success with this approach to identify relatively rare neuronal populations in other brain regions (e.g. nucleus accumbens and amygdala; Tran and Maynard et al. 2021). Based on this previous work, our rationale was that without neuronal enrichment, we could potentially miss the LC-NE population, given the relative scarcity and low absolute number of this neuronal population (e.g. estimates of ~50K total in the entire human LC).

      We do not have a definitive answer to the question of whether our use of FANS to enrich for neurons may have led to damage and contributed to the low recovery rate of LC-NE neurons (as well as the relatively increased levels of mitochondrial contamination compared to other brain regions / preparations in the human brain in our hands). Due to our limited tissue resources for this study, we did not have sufficient tissue to perform a direct comparison with non-sorted data. However, we agree with the reviewer that this is plausible, and warrants further investigation in future work. In particular, the relatively large size and fragility of LC-NE neurons, as well as our use of a standard cell straining approach (70 µm, which may not be ideal for this population), may also be contributing factors.

      Systematically optimizing the preparation to increase the recovery rate (and decrease mitochondrial contamination) is an important avenue for future work, and we have decided to share our data and experiences now to assist other groups performing related work. We have included additional wording in the Discussion to further highlight these issues.

      The majority of the snRNA-seq remained unannotated "ambiguous" neurons. It would be highly advantageous to include an annotation for these numerous cells.

      These nuclei were unidentifiable due to ambiguous marker gene expression profiles, i.e. expression of pan-neuronal marker genes without clear expression of either excitatory or inhibitory neuronal marker genes (see Figure 3A and Figure 3-figure supplement 8). Since we were not able to clearly identify these clusters, and due to our additional concerns regarding the data quality (e.g. low recovery rate of the NE neuron population of interest, potential cell damage, and mitochondrial contamination), we decided to label these neuronal clusters as “ambiguous” instead of assigning low-confidence cluster labels. We have included additional wording in the Results section to explain this issue.

      The most likely explanation for identifying serotonergic neurons in these samples is the inclusion of the Raphe Nucleus within the dissection, especially since these cells do not map to the LC per se. As such, is there a way to neuroanatomically define the potential inclusion of this region from these tissue blocks used? Or to the contrary, definitively demonstrate the exclusion of the Raphe?

      As noted in our initial response to reviewers (“Response to Public Review Comments”), our dissection strategy in this initial study precluded the ability to keep track of the exact orientation of the tissue sections on the Visium arrays with respect to their location within the brainstem. Therefore, it is not possible to definitively answer the question of whether the dissections included the raphe nucleus, and if so, which portion of it, based on neuroanatomy from the tissue blocks.

      However, during the course of this study and in parallel, ongoing work for other small, challenging brain regions, we developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies, e.g. keeping track of the orientation of the dissections and potential inclusion of adjacent neuroanatomical structures. We have included additional details on this issue in the Discussion.

      Given that one sample (Visium capture area) was excluded as it did not seem to contain a representation of the LC for the profiling of "NE" cells, does it make sense to include this sample in the analysis of 5HT cells given the authors are trying to make claims about the cell composition in and around the LC? Since there appears to be little 5HT contribution from this sample and its inclusion results in inconsistency across experiments and not any notable advantages, the authors might want to reconsider its inclusion in the results.

      We identified a cluster of 5-HT neurons in the snRNA-seq data (Figure 3) and used the Visium samples to further investigate the spatial distribution of this population (Figure 3-figure supplement 9). For the enrichment analyses in the Visium data (Figure 3-figure supplement 9C), we used only the 8 Visium samples that passed quality control (QC). We included the 9th sample (which did not pass QC) in the spot plot visualizations (Figure 3-figure supplement 9A-B) for completeness, but did not base our main conclusions on this sample (in this sample, the tissue resource was likely depleted during earlier sections, so the section for the Visium sample was taken slightly past the extent of the LC within this tissue block). We have included additional wording in the Results section and figure captions to clarify this issue.

      For the RNAscope images, it would be useful to include (draw) the manual annotation of the LC to facilitate interpretation. This is especially useful for demonstrating the separate populations of 5HT and "NE" cells. In general, it would be useful to keep a hashed line perimeter for all sections processed by Visium.

      We have now added a dashed outline indicating the manually annotated LC region in the RNAscope image showing the full tissue section (Figure 3-figure supplement 11). The high-magnification RNAscope images (Figure 3-figure supplement 4, 16, and 17) show regions entirely within the LC regions -- we have included additional wording to note this in the figure captions. For the Visium spot plots, we either labeled spots within the annotated regions within the figures or included additional wording in the figure captions to refer to the figures showing the annotations (Figure 2-figure supplement 1).

      The authors state that they successfully mapped the NE neuron population from snRNA-seq to the manually annotated regions on the Visium slides. Based on the color-coded map, these results are not very convincing since the abundance of the given transcript profile is extremely low. Here again, it would help to draw a hashed line perimeter on the slide to denote the manually annotated region. Perhaps the authors could try a different strategy for mapping snRNA signal to the slide? However, it appears that the mapping worked better for the capture areas with higher UMI/genes counts. Perhaps the authors should consider using only the slides with high gene/UMI counts.

      We agree that the performance of these analyses (Figure 3-figure supplement 14) was not clearly described in the previous version of the manuscript. We have rewritten the corresponding paragraph in the Results section to make it more clear that the mapping (spot-level deconvolution) performance was relatively poor overall, and that we did not use these results for further downstream analyses. We did however want to include these results from the cell2location algorithm to provide information and data for method developers on the challenges of these types of analyses in our dataset (e.g. due to the presence of rare populations, relatively subtle differences in expression profiles between neuronal subpopulations, and potential issues due to large nuclei size and high transcriptional activity for NE neurons). While further approaches for these types of analyses exist, and additional optimizations such as subsetting samples or spots with high UMI counts could also be investigated, in our view, these further optimizations lie outside the scope of our current work. We have also added wording in the figure caption to refer to Figure 2-figure supplement 1, which displays the corresponding annotated LC regions per sample.

      It is hard to see if the RNA scope image Supplementary Figure 11 shows co-localization of SLC6A2, TH, and DBH. Having the individual image from each microscope filter along with the merged image is required to properly assess the colocalization of the signals.

      We updated the multi-channel RNAscope images to show both the merged channels and individual channels in separate panels (Figure 3-figure supplement 4, 16, and 17), which makes the visualization more clear. Thank you for this suggestion. (Note that the previous Supplementary Figure 11 has been re-numbered to Figure 3-figure supplement 4.)

      In the heatmap showing the level of marker transcripts, the expression of the specific markers TH, DBH, and SLC6A2 in NE vs. other clusters looks surprisingly low (particularly TH), while the much broader marker SLC18A2 (monoamine transporter) is considerably more differential. What do the authors make of this finding?

      This is correct. In the snRNA-seq data, we observed that SLC18A2 is one of the most highly differentially expressed (DE) genes in the NE neuron cluster vs. other neuronal clusters, with a high level of expression in the NE neuron cluster (Figure 3C). Note that this heatmap shows the top 70 DE genes (excluding mitochondrial genes) out of the full list of 327 statistically significant DE genes with elevated expression in the NE neuron cluster (the full list of 327 genes is provided in Supplementary File 2C). While all four of these genes (DBH, TH, SLC6A2, and SLC18A2) are identified as statistically significant DE genes, SLC18A2 is the most highly DE out of these and has an especially high level of expression in the NE neuron cluster, as noted by the reviewer (Figure 3C). This could be due to the fact that SLC18A2 transcripts are expressed at higher absolute levels in these neurons than the transcripts that are more specific to LC-NE neurons. While it is true that SLC18A2 is a “broader” marker in the sense that it is found in more cell types -- e.g. cell types within brain nuclei that contain monoaminergic as well as brain nuclei that contain catecholaminergic cells -- expression of SLC18A2 within the LC is highly specific to the catecholaminergic LC-NE neurons given its specialized functional role within monoamine and catecholamine neurons in packaging amine neurotransmitters into synaptic vesicles. We note that SLC18A2 plays a specialized role that is critical to the core function of LC-NE neurons, and hence we are not particularly surprised with this finding and think that one possibility is that this differential expression appears more robustly due to higher absolute levels of the marker.

      While it is understandable that the authors decided to include cells/nuclei with high mitochondrial reads, further work is needed to ensure these cells are of sufficient quality to use in an unbiased way knowing that a high percentage of mitochondrial reads in nuclei sequencing is usually indicative of low-quality nuclei. This can be assessed by evaluating the quality of the nuclei with GWA, which stains an intact nuclear membrane acting as a measure of the integrity of the nuclei.

      To further investigate these results, we added additional analyses evaluating quality control (QC) metrics for the NE neuron cluster in the snRNA-seq data, which had an unusually high proportion of mitochondrial reads (Figure 3-figure supplement 2, shown also below in comments for Reviewer 3) (see also related Figure 3-figure supplement 1, 3, which were included in the manuscript previously). These additional QC analyses do not show any other problematic values for this cluster, other than the high mitochondrial proportion, so we do not believe this is purely a data quality issue. We are aware that this is an unexpected result -- in most cell populations, a high proportion of mitochondrial reads would be indicative of cell damage and poor data quality. However, we have recently also observed high mitochondrial proportions in other relatively rare neuronal populations characterized by large size and high metabolic demand. As discussed below for Reviewer 3, we believe that this is mitochondrial “contamination”, as there should be no mitochondrial reads per se within the nuclear compartment.

      However, it is possible that, in cell populations with abundant mitochondria and high expression of mitochondrial transcripts in the cell body, the likelihood of ambient capture of mitochondrial transcripts during nuclear preparation is higher than for cell types with lower expression of mitochondrial transcripts. Hence, we believe that our interpretation is likely correct, i.e. that a combination of technical and biological factors contributes to the inclusion of a relatively high amount of mitochondrial RNA within the droplets for these nuclei. We agree with the reviewer that this finding warrants further investigation in future work. However, in our current study, the tissue resource is depleted for any further experimental validation of this question, so we preferred to provide our data to the community in its current form, while transparently noting this unexpected finding in our results. We have included additional text in the Results section describing the new QC analyses shown in Figure 3-figure supplement 2.
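      For readers unfamiliar with this QC metric, the per-cluster mitochondrial proportion discussed above can be computed along the lines of the following sketch. It is a minimal illustration under stated assumptions, not the pipeline used for Figure 3-figure supplement 2: mitochondrial genes are identified by the "MT-" prefix used for human gene symbols, and the nuclei, counts, cluster labels, and function names are hypothetical.

```python
from statistics import median

def mito_fraction(counts):
    """Fraction of a nucleus's UMI counts that map to mitochondrial genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Assumption: human mitochondrial gene symbols start with "MT-".
    mito = sum(n for gene, n in counts.items() if gene.startswith("MT-"))
    return mito / total

def median_mito_by_cluster(nuclei):
    """Median mitochondrial fraction across the nuclei of each cluster."""
    by_cluster = {}
    for nucleus in nuclei:
        fraction = mito_fraction(nucleus["counts"])
        by_cluster.setdefault(nucleus["cluster"], []).append(fraction)
    return {cluster: median(values) for cluster, values in by_cluster.items()}

# Toy data (hypothetical counts, for illustration only).
nuclei = [
    {"cluster": "NE", "counts": {"MT-CO1": 25, "DBH": 50, "TH": 25}},
    {"cluster": "NE", "counts": {"MT-ND1": 50, "DBH": 50}},
    {"cluster": "other", "counts": {"MT-CO1": 5, "SNAP25": 95}},
]

print(median_mito_by_cluster(nuclei))
```

      Comparing this summary across clusters alongside other QC metrics (e.g. total UMIs and detected genes) is one way to judge whether a high mitochondrial proportion is isolated to one population or accompanied by other signs of poor data quality.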

      Minor comments:

      Line 319-321 could be written more clearly to indicate that due to the lack of resolution in a given spot, there are "contaminating reads" that reduce the precision of the cell profile. This reduced precision is likely what results in the "lack of conservation" across species.

      We have added additional wording to this sentence to clarify this point.

      In the discussion, the authors write that the analyses "unbiasedly identified a number of genes enriched in human LC", however, given the manual annotation of the region for each capture area, this resulted in a biased assessment of the spots.

      We have replaced this wording to refer to “untargeted, transcriptome-wide” analyses (i.e. analyses that are not based on a targeted panel of genes) instead of “unbiased”. We agree that the meaning of “unbiased” is ambiguous in this context.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      Overall, the discovery of some cells in the LC region that express serotonergic markers is intriguing. However, no evidence is presented that these neurons actually produce 5-HT. Perhaps more conservative language would be appropriate (i.e. "cells that possess mRNA signatures of serotonergic neurons" or something like that). Did these cells co-express other markers one would expect in 5-HT neurons like 5-HT autoreceptors and SLC6A18? Also would be useful to compare expression profiles of these putative 5-HT neurons with any published material on bona fide dorsal raphe 5-HT neurons. For the RNAscope confirmation in the supplementary material, it would be helpful to show each marker separately as well as the overlay, and to include representative higher magnification images like were provided for the ACH markers.

      Thank you for this comment. In order to further investigate the identity of these cells, we have investigated the expression of several additional genes including SLC6A18, 5-HT autoreceptor genes (HTR1A, HTR1B), marker genes for 5-HT neurons (SLC18A2, FEV), and marker genes for 5-HT neuronal subpopulations within the dorsal and median raphe nuclei from the literature (Ren et al. 2019), in both the Visium and the snRNA-seq data.

      We observed some expression of SLC18A2 and FEV within the same areas as SLC6A4 and TPH2 in the Visium samples (Figure 3-figure supplement 10A-B, reproduced below; note that SLC18A2 is also a marker gene for NE neurons located within the LC regions), consistent with Ren et al. (2019). However, we did not observe a strong or consistent expression signal for the 5-HT autoreceptors (HTR1A, HTR1B) (Figure 3-figure supplement 10C-D, reproduced below), and we observed zero expression of SLC6A18 in the Visium samples. In the snRNA-seq data, within the cluster identified as 5-HT neurons, we observed some expression of SLC18A2, low expression of FEV, and almost zero expression of SLC6A18 (Figure 3-figure supplement 8, reproduced below; note that SLC6A18 is not shown since it was removed during filtering for low-expressed genes). Similarly, we observed very low expression of the 5-HT autoreceptors (HTR1A, HTR1B) and the additional marker genes for 5-HT neuronal subpopulations from Ren et al. (2019) -- with the possible exception of the neuropeptide receptor gene HCRTR2, which was identified by Ren et al. (2019) within several clusters in both the dorsal and median raphe in mice (Figure 3-figure supplement 8, reproduced below).

      Overall, these additional results give us some further confidence that these are likely 5-HT neurons (due to expression of SLC18A2 and FEV), while also raising further questions (due to the absence of 5-HT autoreceptor genes HTR1A, HTR1B and 5-HT neuronal subpopulation marker genes). While we believe that the most likely explanation is the inclusion of 5-HT neurons from the edges of the adjacent dorsal raphe nuclei in our samples, we acknowledge that the evidence presented is not fully conclusive and does not identify specific subpopulations of 5-HT neurons. In addition, the limited size of our dataset (number of samples and cells) and the lack of information on sample orientation precludes any definitive identification of subpopulations based on their association with specific anatomical regions within the dorsal raphe nuclei. We have updated the manuscript by (i) adjusting our language in the Results and Discussion, (ii) including the additional analyses, supplementary figures, and reference to the literature (Ren et al. 2019) discussed above, and (iii) including additional wording in the Discussion on improvements to the dissection strategy that would allow these questions to be addressed in future studies via a focused molecular profiling of the dorsal raphe nuclei across the rostral-caudal axis.

      Regarding the RNAscope images, we have included additional images showing channels side-by-side and higher magnification, as suggested (and also discussed above for Reviewers 1 and 2). In addition, we have added an outline highlighting the LC region in Figure 3-figure supplement 11 (as suggested above by Reviewer 2), and included an additional high-magnification RNAscope image demonstrating co-expression of 5-HT neuron marker genes (TPH2 and SLC6A4) within individual cells (Figure 3-figure supplement 12).

      Concerning the snRNA-seq experiments, why were only 3 of the 5 donors used, particularly given the low number of LC-NE nuclear transcriptomes obtained? How were the 3 donors chosen from the 5 total donors and how many 100 um sections were used from each donor? Are the 295 nuclei obtained truly representative of the LC population or are they just the most resilient LC nuclei? How many LC nuclei would be estimated to be captured from staining the 100 um tissue sections?

      As discussed in our previous response to reviewers (“Response to Public Review Comments”), the reason we included only 3 of the 5 donors for the snRNA-seq assays was due to tissue availability on the tissue blocks. In this study, we were working with a finite tissue resource. Due to the logistics and thickness of the required tissue sections for Visium (10 μm) and snRNA-seq (100 μm), running Visium first allowed us to ensure that we could collect data from both assays -- if we ran snRNA-seq first and captured no neurons, the tissue block would be depleted. Due to resource depletion, we did not have sufficient available tissue remaining on all tissue blocks to run the snRNA-seq assay for all donors. We have conducted extensive piloting in other brain regions on the amount (mg) of tissue that is needed from various sized cryosections, and the LC is particularly difficult since these are small tissue blocks and the extent of the structure is small. Hence, in some of the subjects, we did not have sufficient tissue available for the snRNA-seq assay.

      We have included details on the number of 100 μm sections used for each donor in Methods -- this varied between 10-15 sections per donor, approximating 50-80 mg of tissue per donor.

      Regarding the question about the representativeness / resilience of the LC nuclei -- as discussed in our previous response to reviewers (“Response to Public Review Comments”) and above for Reviewer 2, we agree that this is a concern. As discussed above for Reviewer 2, it is plausible that our use of FANS may have contributed to cell damage and the low recovery rate of LC-NE neurons. The relatively large size and fragility of LC-NE neurons, as well as our use of a standard cell straining approach (70 µm, which may not be ideal for this population), may also be contributing factors. Due to our limited tissue resource, we did not have sufficient tissue to perform a direct comparison with non-sorted data.

      Systematically optimizing the preparation to attempt to increase recovery rate is an important avenue for future work. We have included additional discussion of this issue in the Discussion.

      Regarding the question about the number of expected nuclei, we have now included estimates of the number of cells per spot within the LC regions in the Visium data (see also related point below, and Figure 2-figure supplement 2B reproduced below), based on the H&E stained histology images and use of cell segmentation software (VistoSeg; Tippani et al. 2022). While we do not have any confident estimates of the number of expected nuclei in the snRNA-seq data, these estimates of cell density from the Visium data could, together with information on additional factors such as the accuracy of the tissue scoring and the effectiveness of FANS, be used to help derive an expected number of nuclei in future studies. We have included additional wording in the Discussion to note that these estimates could be used in this manner during future studies.

      The LC displays rostral/caudal and dorsal/ventral differences, including where they project, which functions they regulate, and which parts are vulnerable in neurodegenerative disease (e.g. Loughlin et al., Neuroscience 18:291-306, 1986; Dahl et al., Nat Hum Behav 3:1203-14, 2019; Beardmore et al., J Alzheimer's Dis 83:5-22, 2021; Gilvesy et al., Acta Neuropathol 144:651-76, 2022; Madelung et al., Mov Disord 37:479-89, 2022). Which part(s) of the LC was captured for the SRT and snRNAseq experiments?

      As discussed in our previous response to reviewers (“Response to Public Review Comments”), a limitation of this study was that we did not record the orientation of the anatomy of the tissue sections, precluding our ability to annotate the tissue sections with the rostral/caudal and dorsal/ventral axis labels. We agree with the reviewer that additional spatial studies, in future work, could offer needed and important information about expression profiles across the spatial axes (rostral/caudal, ventral/dorsal) of the LC. Our study provides us with insight about optimizing the dissections for spatial assays, as well as bringing to light a number of technical and logistical issues that we had not initially foreseen. For example, during the course of this study and parallel, ongoing work in other, small, challenging regions, we have now developed a number of specialized technical and logistical strategies for keeping track of orientation and mounting serial sections from the same tissue block onto a single spatial array, which is extremely technically challenging. We are now well-prepared for addressing these issues in future studies with larger numbers of donors and samples in order to make these types of insights. We have included additional details in the Discussion to further discuss this point.

      The authors mention that in other human SRT studies, there are typically between 1-10 cells per expression spot. I imagine that this depends heavily on the part of the brain being studied and neuronal density. In this specific case, can the authors estimate how many LC cells were contained in each expression spot?

      We have now performed additional analyses to provide an estimate of the number of cells per spot in the Visium data (Figure 2-figure supplement 2B), based on the application of cell segmentation software (VistoSeg; Tippani et al. 2022) to identify cell bodies in the H&E stained histology images. We applied this methodology and calculated summary statistics within the annotated LC regions for 6 samples (see Methods), and found that the median number of cells per spot within the LC regions ranged from 2 to 5 per sample. We note that these estimates include both NE neurons and other cell types within the LC regions, and that applying cell segmentation software in this brain region is particularly challenging due to the wide range in cell body sizes, with NE neurons being especially large. We have included these updated estimates in the Results and Discussion, and additional details in Methods.
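      As a minimal sketch of the summary statistic described above, the following Python snippet computes the median number of segmented cells per spot within annotated LC regions, separately for each sample. The sample names, the cell counts, and the tuple layout are hypothetical and chosen only for illustration; this is not the VistoSeg pipeline or the analysis code used in the study.

```python
from statistics import median


def median_cells_per_spot(spots):
    """Return {sample_id: median cell count} over spots inside the LC region.

    `spots` is an iterable of (sample_id, in_lc_region, cell_count) tuples.
    This layout is an assumption made for the example.
    """
    per_sample = {}
    for sample_id, in_lc_region, cell_count in spots:
        if in_lc_region:  # ignore spots outside the annotated LC region
            per_sample.setdefault(sample_id, []).append(cell_count)
    return {sample: median(counts) for sample, counts in per_sample.items()}


# Toy data (hypothetical): two samples, one spot lying outside the LC region.
toy_spots = [
    ("sample_A", True, 2), ("sample_A", True, 3), ("sample_A", True, 1),
    ("sample_A", False, 9),
    ("sample_B", True, 5), ("sample_B", True, 4), ("sample_B", True, 6),
]
lc_medians = median_cells_per_spot(toy_spots)
```

      The median is used rather than the mean because a few spots containing many small cells would otherwise dominate the per-sample summary.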

      Regarding comparison of human LC-associated genes with rat or mouse LC-associated genes (Fig. 2D-F), the authors speculate that the modest degree of overlap may be due to species differences between rodent and human and/or methodological differences (SRT vs microarray vs TRAP). Was there greater overlap between mouse and rat than between mouse/rat and human? If so, that is evidence for the former. If not, that is evidence for the latter. Also would be useful for more in-depth comparison with snRNA-seq data from mouse LC. https://www.biorxiv.org/content/10.1101/2022.06.30.498327v1

      Our comparisons with the mouse (Mulvey et al. 2018) and rat (Grimm et al. 2004) data showed that we observed a relatively higher overlap between the human vs. mouse data than the human vs. rat data (Figures 2F-G and 3D-E). However, we note that the substantially different technologies used (TRAP-seq in mouse vs. laser capture microdissection and microarrays in rat) make it difficult to confidently interpret the degree of overlap between the two studies, and a direct comparison of these alternative platforms (TRAP-seq vs. LCM / microarray) or species (mouse vs. rat) lies outside the scope of our study. We have included updated wording in the Results and Discussion to explain this issue and help interpret these results.

      Regarding the newer mouse study using snRNA-seq (Luskin and Li et al. 2022), we have extended our analyses to perform a more in-depth comparison with this study. Specifically, we have evaluated the expression of an additional set of GABAergic neuron marker genes from this study within our secondary clustering of inhibitory neurons in the snRNA-seq data (Figure 3-figure supplement 13B). We observe some evidence of cluster-specific expression of several genes, including CCK, PCSK1, PCSK2, PCSK1N, PENK, PNOC, SST, and TAC1. We have also included additional text describing these results in the Results section.

      The finding of ACHE expression in LC neurons is intriguing. Susan Greenfield has published a series of papers suggesting that ACHE has functions independent of ACH metabolism that contributes to cellular vulnerability in neurodegenerative disease. This might be worth mentioning.

      We thank the reviewer for pointing this out. We were very surprised too by the observed expression of SLC5A7 and ACHE in the LC regions (Visium data) and within the LC-NE neuron cluster (snRNA-seq data), coupled with absence of other typical cholinergic marker genes (e.g. CHAT, SLC18A3), and we do not have a compelling explanation or theory for this. Hence, the work of Susan Greenfield and colleagues suggesting non-cholinergic actions of ACHE, particularly in other catecholaminergic neuron populations (e.g. dopaminergic neurons in the substantia nigra) is very interesting. We have included references to this work and how it could inform interpretation of this expression (Greenfield 1991; Halliday and Greenfield 2012) in the Discussion.

      High mitochondrial reads from snRNA-seq can indicate lower quality. Can the authors comment on this and explain why they are confident in the snRNA-seq data from presumptive LC-NE neurons?

      As mentioned above for Reviewer 2, we have included additional analyses to further compare quality control (QC) metrics for the NE neuron cluster (which had an unusually high proportion of mitochondrial reads) against other neuronal and non-neuronal clusters and nuclei in the snRNA-seq data (Figure 3-figure supplement 2). These additional QC analyses do not show any other problematic values for this cluster. Specifically, we show that the QC metric values for sum UMIs and detected genes per droplet for the NE neuron cluster fall within the range for (A) other neurons and (B) all other nuclei (excluding droplets with ambiguous / unidentifiable neuronal signatures). In addition, we observe that the droplets with the highest mitochondrial percentages (>75%) (C-D), which also have an unusually low number of detected genes (D), tend to be from the ambiguous category (droplets with ambiguous / unidentifiable neuronal signatures), suggesting that true low-quality droplets are correctly identified and included within the ambiguous category (e.g. consisting of a mixture of debris from partially damaged nuclei) instead of as NE neurons. Since our QC analyses for the NE neuron cluster do not show any problems other than the high mitochondrial percentage, we do not believe these are simply mis-classified low-quality droplets. We also note that we have recently observed high mitochondrial proportions in other relatively rare neuronal populations characterized by large size and high metabolic demand in human data. We believe that our interpretation is correct -- i.e. that a combination of technical and biological factors has led to the inclusion of a relatively high amount of mitochondrial RNA within the droplets for these nuclei. We have included these additional QC analyses (Figure 3-figure supplement 2) and further discussion of this issue in the Results section.
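      The logic of this QC comparison can be sketched in a few lines of Python. The example below checks whether the sum-UMI and detected-gene values of one cluster fall within the range spanned by a reference group, and lists droplets whose mitochondrial fraction exceeds 75%. The droplet records, the field names, and the labels are all hypothetical; this is an illustration of the reasoning, not the code used for Figure 3-figure supplement 2.

```python
def within_range(values, reference):
    """True if every value lies between the min and max of the reference."""
    lo, hi = min(reference), max(reference)
    return all(lo <= v <= hi for v in values)


def flag_high_mito(droplets, threshold=0.75):
    """Return ids of droplets whose mitochondrial fraction exceeds threshold."""
    return [d["id"] for d in droplets if d["mito_frac"] > threshold]


# Toy droplets (hypothetical values and field names).
droplets = [
    {"id": "d1", "label": "NE", "sum_umis": 5000, "detected_genes": 2500, "mito_frac": 0.40},
    {"id": "d2", "label": "NE", "sum_umis": 8000, "detected_genes": 3200, "mito_frac": 0.35},
    {"id": "d3", "label": "other_neuron", "sum_umis": 3000, "detected_genes": 1800, "mito_frac": 0.05},
    {"id": "d4", "label": "other_neuron", "sum_umis": 12000, "detected_genes": 4500, "mito_frac": 0.02},
    {"id": "d5", "label": "ambiguous", "sum_umis": 400, "detected_genes": 150, "mito_frac": 0.82},
]

ne = [d for d in droplets if d["label"] == "NE"]
other = [d for d in droplets if d["label"] == "other_neuron"]

# Do the NE-cluster metrics fall inside the range seen for other neurons?
ne_umis_ok = within_range([d["sum_umis"] for d in ne], [d["sum_umis"] for d in other])
ne_genes_ok = within_range([d["detected_genes"] for d in ne], [d["detected_genes"] for d in other])

# Which droplets exceed the 75% mitochondrial threshold?
high_mito_ids = flag_high_mito(droplets)
```

      In this toy data the only droplet above the mitochondrial threshold carries the ambiguous label, mirroring the pattern described in the response.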

      The Discussion could be expanded. Because there is a lot known and/or assumed about the LC, discussing all of it is certainly beyond the scope of this manuscript. However, perhaps the authors could pick a few more for confirmation and hypothesis generation. For example, one of the most well studied and important aspects of the LC is its regulation by neuromodulatory inputs. It would be interesting for the authors to discuss the expression of receptors for CRF, cannabinoids, orexin, galanin, 5-HT, etc, particularly when compared with the available rodent TRAP and snRNA-seq data (https://www.biorxiv.org/content/10.1101/2022.06.30.498327v1) contained some surprises, such as very low expression of CRF1 in LC-NE neurons, suggesting that the powerful activation of LC cells by CRF is indirect. Does this hold up in humans?

      We have expanded the Discussion to include additional discussion and references on several points, as discussed also above. Indeed these are interesting questions and these neuromodulatory systems are all of interest in the context of signaling within the LC in terms of function of the LC-NE system. We note that the manuscript serves primarily as a data resource and will be useful in many different ways depending on the different goals and interests of the readers. This is precisely why we wanted to take the time to make accessible and easy-to-use tools to interrogate and visualize the data. We have provided screenshots in Author response images 1-4 from the Shiny visualization app for the Visium data (https://libd.shinyapps.io/locus-c_Visium/) querying several main receptors of the neuromodulatory systems that this reviewer is particularly interested in to illustrate how the visualization apps can readily be used to query specific genes and systems of interest.

      Author response image 1.

      CRHR1:

      Author response image 2.

      CNR1:

      Author response image 3.

      OXR1:

      Author response image 4.

      GALR1:

      Minor points:

      Line 46 add stress responses to the key functions of LC neurons

      We have added this point and included additional references to support the findings.

      Line 47 add that the LC was so named "blue spot" because of its signature production of neuromelanin pigment

      We have added this point.

      Line 49 LC's capacity to synthesize NE is not "unique" - several other brainstem/medullary nuclei also synthesize NE (e.g. A1-A7; LC is A6)

      We have updated this wording.

      Line 54 Although prior evidence indicated age-related LC cell loss in people without frank neurodegenerative disease, recent studies that are better powered and used unbiased stereological methods have refuted the idea that LC neurons die during normal aging (reviewed in Matchett et al., Acta Neuropathologica 141:631-50, 2021)

      We have updated this part of the Introduction to focus on cell loss in the LC in neurodegenerative disease and removed the older references describing studies that suggested LC neurons die in normal aging.

      Line 62 Would also be worth mentioning the role of the LC in other mood disorders where adrenergic drugs are often prescribed, such as PTSD (e.g. prazosin), opioid withdrawal (e.g. lofexidine), anxiety and depression (e.g. NE reuptake inhibitors).

      We have added additional references to these disorders and their treatment with noradrenergic drugs in the Introduction.

      Additional updates from Public Review Comments:

      We have also included the following updates, in response to additional reviewer comments received during the initial round of “Public Review Comments” and which are not already described in the responses to the “Recommendations for the Authors” above.

      ● We included updated wording in the Results section and Figure 1C caption to more clearly describe the number of donors included in the final SRT and snRNA-seq data used for analyses after all quality control (QC) steps (4 donors for SRT data, 3 donors for snRNA-seq data).

      ● Figure 3-figure supplement 1D (number of nuclei per cluster in unsupervised clustering of snRNA-seq data) has been updated to show percentages of nuclei per cluster.

      ● We have added comparisons between the lists of differentially expressed (DE) genes identified in the Visium and snRNA-seq data. To make these sets comparable, we have added (i) snRNA-seq DE testing results between the NE neuron cluster and all other clusters (instead of other neuronal clusters only, as shown in the main results in Figure 3) (excluding ambiguous neuronal) (Figure 3-figure supplement 6 and Supplementary File 2D), and (ii) calculated overlaps and comparisons between the sets of DE genes between the Visium data (pseudobulked LC vs. non-LC regions) and the snRNA-seq data (NE neuron cluster vs. all other clusters excluding ambiguous neuronal). This comparison generated a list of 51 genes that were identified as statistically significant DE genes (FDR < 0.05 and FC > 2) in both the Visium and the snRNA-seq data (Figure 3-figure supplement 7 and Supplementary File 2E).
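      The overlap described in the last point above can be illustrated with a short Python sketch: each result table is filtered with the same thresholds (FDR < 0.05 and FC > 2), and the two significant sets are intersected. The gene labels and values are placeholders, and the dictionary layout is an assumption made for the example; this is not the code or the gene list from the study.

```python
def significant(results, fdr_max=0.05, fc_min=2.0):
    """Return the set of genes passing both the FDR and fold-change cutoffs.

    `results` maps gene -> (fdr, fold_change); this layout is hypothetical.
    """
    return {gene for gene, (fdr, fc) in results.items()
            if fdr < fdr_max and fc > fc_min}


# Placeholder DE results (hypothetical values, not data from the study).
visium_de = {
    "GENE_A": (0.001, 4.0),
    "GENE_B": (0.03, 2.5),
    "GENE_C": (0.20, 3.0),   # fails the FDR cutoff
    "GENE_D": (0.01, 1.5),   # fails the fold-change cutoff
}
snrnaseq_de = {
    "GENE_A": (0.002, 8.0),
    "GENE_B": (0.04, 1.2),   # fails the fold-change cutoff
    "GENE_C": (0.01, 5.0),
    "GENE_E": (0.001, 3.0),
}

# Genes significant in both assays.
shared = sorted(significant(visium_de) & significant(snrnaseq_de))
```

      Applying identical thresholds to both tables before intersecting keeps the two gene sets comparable, which is the purpose of the additional DE tests described above.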

      Other additional updates:

      We have added an additional data repository (Globus). Raw data files (FASTQ sequencing data files and high-resolution TIF image files) are now available via Globus from the WeberDivecha2023_locus_coeruleus data collection from the jhpce#globus01 Globus endpoint, which is also listed at http://research.libd.org/globus/. The Globus repository is not publicly accessible due to individually identifiable donor genetic variants in the FASTQ files. Approved users may request access from the corresponding authors. This data repository is listed in the Data Availability section.

    1. Author Response

      The following is the authors’ response to the current reviews.

      I greatly appreciate your time and attention to our manuscript. I have carefully considered the reviewers’ comments and made modifications. Below are my responses to each comment and the revisions I have made.

      Reviewer #2 (Recommendations for The Authors):

      1) The authors addressed most of my concerns well. I am fine with most of the responses except the one to question 8. Actin is also reported to be located in the nucleus (PMID: 31481797). It would be better to utilize other markers, such as GAPDH. Moreover, the authors did not address the issue of LXRa. I strongly suggest that the authors repeat this experiment to get a more solid result.

      Thank you for the comment! Actin is frequently used as a negative control for nuclear protein in many publications, such as DOI:10.1038/s41419-018-0428-x. Beta-actin is so abundant in the cytoplasm that an exposure of only a few seconds reveals a strong band when western blotting is performed on cytoplasmic protein. In contrast, in many studies, including this one, no actin band appears even when western blots of nuclear protein are exposed for minutes. Although actin is, as mentioned, also located in the nucleus, the amount there is so small that it may not be detected by western blot with an exposure of seconds. If the nuclear protein were contaminated with total cell lysate, however, actin would be readily detected. As a result, the use of actin as a negative control for nuclear protein is well accepted.

      Author response image 1.

      2) In addition, the authors mentioned IL-1b but present IL-6 in the figure of Figure. 2F. Please correct.

      We appreciate your attention to detail. “IL-1b” has been corrected to “IL-6”.


      The following is the authors’ response to the original reviews.

      I greatly appreciate the time you and the reviewers have taken to review my paper and provide detailed feedback and suggestions. I have carefully considered the reviewers’ comments and made thorough modifications to the paper. Below are my responses to each comment and the revisions I have made.

      Reviewer #1 (Recommendations for The Authors):

      Although the paper has strengths in understanding better the pathway of activation leading to polarization, the mechanisms contributing to cytokine storm are weak. In the context of cellular in vitro changes, it would be very interesting to map these molecular changes to strengthen the pathways affected in this model. In vivo, stronger evidence is required to bridge the gap between the in vitro model and mechanisms regulating in vivo disease development. Reporting of experiments needs to be considerably strengthened. Individual data points are shown, however, it is unclear whether these represent biological or technical replicates, or how many experiments have been undertaken. The addition of this information is essential for understanding the robustness and repeatability of findings. Currently, these cannot be assessed from the information provided. Furthermore, it is unclear whether the error bars represent s.e.m or s.d. which greatly impacts data interpretation.

      Answer: thank you for the valuable comments! We have added some in vivo experiments to strengthen the bridge between the in vitro and in vivo models. 1) Macrophages were depleted by i.v. injection of clodronate-liposomes (CLL) in endotoxemic mice given leucine. The alleviation of LPS-induced cytokine production by leucine was muted by macrophage depletion (Figure 2E, F), suggesting that the anti-inflammatory effect of leucine was exerted via the regulation of macrophages. 2) The LXRα inhibitor GSK2033 was administered to mice via i.v. injection prior to LPS challenge. In GSK2033-treated mice, the effects of leucine on the serum levels of inflammatory cytokines were neutralized (Supplementary Figure 4), partially indicating the importance of LXRα in the regulation of cytokine release. We acknowledge the limitation of LXRα inhibition by GSK2033 in this study. In our future study, we plan to use monocyte-specific LXRα knockout mice generated with LysM-cre to elucidate the importance of LXRα in the progression of CSS, and specifically to focus on the molecular mechanism by which mTORC1 interacts with LXRα to modulate M2 macrophage polarization. Additionally, we modified the manuscript to clarify that the error bars represent the standard error of the mean (SEM) (line 416).

      Reviewer #2 (Recommendations for The Authors):

      1. The whole manuscript is based on the 2% leucine from feed and 5% leucine from water. Is there any rationale for using these two types of different concentrations in this study? Often, a dose-dependent treatment is utilized in vivo in pharmacological study. Therefore, the authors should at least test two different concentrations in each type to confirm the conclusion.

      Answer: thank you for your comment and suggestion. The 2% leucine in feed and 5% leucine in water used in this study were based on the literature. In those studies, leucine at these concentrations was reported to activate mTORC1 and regulate metabolism, as shown below, although studies of leucine in the regulation of macrophage activation are lacking. In this study, we found that leucine supplementation by either route significantly increased the average body weight gain of mice, suggesting that leucine promoted growth and was not toxic to mice.

      (1) Jiang X, Zhang Y, Hu W, Liang Y, Zheng L, Zheng J, Wang B, Guo X. 2021. Different Effects of Leucine Supplementation and/or Exercise on Systemic Insulin Sensitivity in Mice. Front Endocrinol (Lausanne) 12:651303. doi:10.3389/fendo.2021.651303

      (2) Holler M, Grottke A, Mueck K, Manes J, Jücker M, Rodemann HP, Toulany M. 2016. Dual Targeting of Akt and mTORC1 Impairs Repair of DNA Double-Strand Breaks and Increases Radiation Sensitivity of Human Tumor Cells. PLoS One 11:e0154745. doi:10.1371/journal.pone.0154745

      2. The authors focus on macrophage polarization as the major cellular event affected by leucine treatment; however, they also report that the proportion of multiple immune cell types has been suppressed by leucine treatment. As some of these immune cells can also produce inflammatory cytokines, the authors should confirm the anti-inflammatory effects of leucine were mainly mediated by modulating macrophage polarization as they suggested in the manuscript. For example, the authors could utilize Anti-CSF1 or clodronate to deplete macrophages and observe whether leucine-reduced inflammatory cytokine production was largely diminished.

      Answer: thank you for your valuable suggestion! We used clodronate-liposome (CLL) i.v. injection to deplete macrophages to further validate the specific contribution of macrophage polarization to the anti-inflammatory effects of leucine. The results revealed that clodronate treatment decreased blood monocyte counts and eliminated the effect of leucine in lowering the serum inflammatory factors IL-6, IFN-γ and TNF-α (Figure 2E-F), suggesting the importance of leucine-mediated macrophage activation for the anti-inflammatory effect.

      3. It would be important to examine whether 10 mM leucine would exhibit cytotoxicity to bone marrow derived monocytes/macrophages. This would clarify whether leucine treatment directly suppresses inflammatory cytokine production or reduces cell viability to indirectly modulate inflammatory responses.

      Answer: thank you for your valuable suggestion! We performed cell viability assays after treating BMDM with 2 mM and 10 mM leucine for 6h or 24h (consistent with the timing of leucine treatment in the article). The results showed that at 6h, 2 mM leucine significantly increased cell viability, while 10 mM leucine had no significant effect on cell viability. At 24h, both 2 mM and 10 mM leucine significantly increased cell viability. In conclusion, 2 mM and 10 mM leucine were not cytotoxic to BMDM, and the anti-inflammatory effect of leucine was not derived from a reduction in cell viability (Supplementary Figure 2).

      4. The authors found that leucine promotes mTORC1-LXRα for arginase-1 transcription and M2 polarization. The pathway the authors elucidated is not surprising, which has already been reported in other studies. What about the other M2 markers? The authors could examine whether arginase-1 deficiency would deplete leucine-increased expression of other M2 marker genes. Moreover, what about the molecular mechanism for leucine-reduced M1 polarization?

      Answer: Thank you for the valuable comments! To clarify, Arginase-1 activity and the mRNA expression of Fizz1, Mgl1, Mgl2, and Ym1 are well-established markers of M2 macrophages. Specifically, Arginase-1 activity is important for defining M2 functionality. These markers were used to define the level of M2 macrophage polarization. Only a few studies have indicated the involvement of mTORC1 in M2 polarization, as shown below; however, no molecular mechanism for how mTORC1 modulates this process has been described. In this study, we provide evidence that LXRα mediates mTORC1-associated M2 polarization, and that leucine regulates mTORC1-LXRα to promote M2 polarization independently of IL-4-induced STAT6 signaling. In our future studies, we will focus on the molecular mechanism by which mTORC1 interacts with LXRα to modulate M2 macrophage polarization.

      (1) Byles V, Covarrubias AJ, Ben-Sahra I, Lamming DW, Sabatini DM, Manning BD, Horng T. 2013. The TSC-mTOR pathway regulates macrophage polarization. Nat Commun 4:2834. doi:10.1038/ncomms3834

      (2) Kimura T, Nada S, Takegahara N, Okuno T, Nojima S, Kang S, Ito D, Morimoto K, Hosokawa T, Hayama Y, Mitsui Y, Sakurai N, Sarashina-Kida H, Nishide M, Maeda Y, Takamatsu H, Okuzaki D, Yamada M, Okada M, Kumanogoh A. 2016. Polarization of M2 macrophages requires Lamtor1 that integrates cytokine and amino-acid signals. Nat Commun 7:13130. doi:10.1038/ncomms13130

      5. In Fig. 1A, what's the P-value among these two groups? Moreover, what about the result with combination treatment as the authors performed in other panels?

      Answer: thank you for the valuable comments from the reviewer! In Figure 1A, the P-value between the LPS and LPS+2% Leucine groups is 0.0031, and the P-value between the LPS and LPS+5% Leucine groups is 0.0009. I have marked the significance in Figure 1A accordingly. Due to the limited number of mice, we treated mice with each of the two methods separately. Initially, we performed the survival experiment and observed that the addition of leucine prolonged the survival of mice given a lethal dose. Based on these findings, we further investigated whether a combination of the two methods would yield better results in the regulation of inflammation. The combination had a similar effect on cytokine production, so we did not consider it necessary to repeat the survival experiment with the combination.

      6. It seems not much difference could be observed between 2% leucine from feed and 5% leucine from water in the expression of inflammatory genes and anti-inflammation-related markers. However, it seems that 5% leucine from water would exhibit a better survival rate than 2% leucine from feed. The authors should explain potential reasons and at least examine it in vitro.

      Answer: we appreciate the valuable comments from the reviewer! There are two possible reasons: 1) when a lethal dose of LPS was applied, mice were too weak to eat but still drank a small amount of water; 2) leucine was absorbed much more easily from the water than from the feed, so leucine from the water was much more effective in the short period of the survival experiment. On the other hand, cytokine levels and expression were measured in non-lethal experiments, in which mice were in much better condition for leucine absorption.

      7. In Fig. 4A, the authors examined the expression of p-mTOR. The authors should further examine the expression of p-AKT (S473, T308) and p-S6 to clarify whether mTORC1 or mTORC2 has been modulated. As reported, leucine should act on GATOR2 for mTORC1 activation. However, the authors reported that Torin, a mTORC1/mTORC2 inhibitor, inhibited M2 polarization more significantly compared to rapamycin, a mTORC1 inhibitor. These observations seem to indicate that leucine has other targets except mTORC1, such as mTORC2, which might raise novel mechanisms that have never been reported before.

      Answer: thank you for the valuable comments! Akt-mTORC1 signaling integrates metabolic inputs to control macrophage activation. Inhibition of AKT by wortmannin was followed by inhibition of M2 polarization, suggesting that AKT signaling is involved in M2 polarization. Studies have reported that mTORC1 activation inhibits p-Akt (T308), and that inhibition of mTORC1 in turn activates Akt (1), promoting M2 polarization as feedback that compensates for the suppression of M2 polarization induced by mTORC1 inhibition. mTORC2 directly phosphorylates Akt at S473, and inhibition of mTORC2 inhibits p-Akt (S473) (2), further inhibiting M2 polarization. Torin1 inhibits both complexes, while rapamycin is specific for mTORC1 (3). This explanation was included in Lines 252-262.

      (1) Leontieva OV, Demidenko ZN, Blagosklonny MV. 2014. Rapamycin reverses insulin resistance (IR) in high-glucose medium without causing IR in normoglycemic medium. Cell Death Dis 5:e1214. doi:10.1038/cddis.2014.178

      (2) Holler M, Grottke A, Mueck K, Manes J, Jücker M, Rodemann HP, Toulany M. 2016. Dual Targeting of Akt and mTORC1 Impairs Repair of DNA Double-Strand Breaks and Increases Radiation Sensitivity of Human Tumor Cells. PLoS One 11:e0154745. doi:10.1371/journal.pone.0154745

      (3) Byles V, Covarrubias AJ, Ben-Sahra I, Lamming DW, Sabatini DM, Manning BD, Horng T. 2013. The TSC-mTOR pathway regulates macrophage polarization. Nat Commun 4:2834. doi:10.1038/ncomms3834

      8. In Fig. 5B, frankly speaking, I do not observe much difference in LXRα expression. Also, the actin band is too poor to get any conclusion.

      Answer: thank you for the valuable comments from the reviewer! In Fig. 5B, the extracted protein is nuclear protein, as stated in the text. Actin is expressed mainly in the cytoplasm, while histone is expressed in the nucleus. The near absence of actin in the figure demonstrates the purity of the extracted nuclear protein.

      9. In Fig. 5C and 5D, it is surprising that GSK2033 would reduce urea production to well below the basal condition (lane 1). As GSK2033 normalized the urea production raised by IL-4 or by IL-4 in combination with leucine in cells, how could GSK2033 reduce urea in the medium? The authors should explain this discrepancy.

      Answer: thank you for the valuable comments from the reviewer! In Fig. 5C, urea production was measured directly in the culture medium using a commercial assay kit, and GSK2033 indeed led to a significant decrease in urea production. In Fig. 5D, on the other hand, we assessed the activity of arginase-1 by lysing the cells, activating arginase-1, providing the substrate arginine, and then measuring urea production. In response to your question, the explanation is that in the assay measuring arginase-1 activity, we supplied a sufficient amount of substrate arginine, which may better reflect the enzyme’s activity and the results were consistent with our expectations. Additionally, when GSK2033 was used in combination with IL-4 or IL-4 plus leucine, it might interact with the IL-4 signaling pathway or leucine metabolism pathway, leading to an increase in urea production. This is just our preliminary explanation for the contradictory results, and we acknowledge that further research is needed to explore the mechanism of action of GSK2033 and its interactions with IL-4 or leucine.

      10. Line 98, "INF-gamma" should be IFN-gamma.

      Answer: We appreciate your attention to detail. We apologize for the error in line 98, where “INF-gamma” should indeed be corrected to “IFN-gamma (IFN-γ).” We will make the necessary correction in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tamoxifen resistance is a common problem in partially ER-positive patients undergoing endocrine therapy, and this manuscript has important research significance as it is based on clinical practical issues. The manuscript discovered that the absence of FRMD8 in breast epithelial cells can promote the progression of breast cancer, thus proposing the hypothesis that FRMD8 affects tamoxifen resistance and validating this hypothesis through a series of experiments. The manuscript has a certain theoretical reference value.

      Strengths:

      At present, research on the role of FRMD8 in breast cancer is very limited. This manuscript leverages the MMTV-Cre+;Frmd8fl/fl;PyMT mouse model to study the role of FRMD8 in tamoxifen resistance, and single-cell sequencing technology discovered the interaction between FRMD8 and ESR1. At the mechanistic level, this manuscript has demonstrated two ways in which FRMD8 affects ERα, providing some new insights into the development of ER-positive breast cancer in patients who are resistant to tamoxifen.

      Weaknesses:

      This manuscript repeatedly emphasizes the role of FRMD8/FOXO3A in tamoxifen resistance in ER-positive breast cancer, but the specific mechanisms have not yet been fully elucidated. Whether FRMD8 can become a biomarker should be verified in large clinical samples or clinical data.

      We appreciate your recognition and valuable suggestions. The proliferation of ERα-positive breast cancer cells is contingent upon the expression of ERα. Tamoxifen, a selective estrogen receptor modulator, competitively binds to ERα, thereby inhibiting the activation of the proliferation signaling pathway. Previous studies have demonstrated that the downregulation of ERα expression results in a reduction in the sensitivity of breast cancer cells to tamoxifen (PMID: 15894097; PMID: 922747). Our study revealed the molecular mechanism by which FRMD8 regulates ERα expression through FOXO3A and UBE3A, and thus FRMD8 deficiency is a cause of tamoxifen treatment resistance. 

      In this study, our results showed that low expression of FRMD8 predicts poor prognosis in breast cancer patients. We agree with this reviewer and will validate the role of FRMD8 in more patient samples and expand its application in different cancer types.

      Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable finding on the impact of FRMD8 loss on tumor progression and the resistance to tamoxifen therapy. The author conducted systematic experiments to explore the role of FRMD8 in breast cancer and its potential regulatory mechanisms, confirming that FRMD8 could serve as a potential target to reverse tamoxifen resistance.

      Strengths:

      The majority of the research is logically clear, smooth, and persuasive.

      Weaknesses:

      Some research in the article lacks depth and some sentences are poorly organized.

      Thank you for your helpful suggestion. We have carefully revised the manuscript again. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      This manuscript suggests that the resistance of tamoxifen in breast cancer is linked to the loss of function of FRMD8. This is a relatively good and valuable contribution. However, there are several points that confused me.

      (1) The subfigures with important conclusions should include quantitative analysis, for example, Figure 4D, 4E, and 6A. In Figure 6F, which subtypes of normal and tumor tissues were investigated.

      Thank you for your helpful suggestions. We have quantified the bands in Figure 4D, 4E, and 6A and labelled them in the figures. 

      We have also provided details of the tumor samples in Table S3 and the “Materials and Methods” section. The majority of tumor tissues are invasive ductal carcinomas.

      (2) In the luminal epithelium-specific Frmd8 knockout mice (MMTV-Cre+; Frmd8fl/fl), the authors demonstrated that the loss of FRMD8 promotes the growth of breast tumors. In Figure 3A, the expression of ERα and PR in tumors is nearly negative. However, why was the validation of the mechanism performed in breast tumor cell lines and not in epithelial cells?

      Thanks for the question. Early-stage mammary tumors in MMTV-PyMT mice express ERα, while ERα is negative in advanced tumors of MMTV-PyMT mice. Figure 3A shows the results of tumors from four-month-old mice. Meanwhile, our supplementary results showed that loss of Frmd8 also decreased ERα expression in normal and atypical hyperplasia mammary tissues from 7-week-old MMTV-PyMT mice, when the mice had no palpable tumors and ERα is positive (Fig. S3E). We believe that the absence of FRMD8 contributes to the acceleration of the malignant progression during the dynamic evolution of breast cancer. Because the normal breast epithelial cell line MCF10A is difficult to transfect, we explored the subsequent mechanisms mainly in breast cancer cells and HEK293, a human embryonic kidney cell line. In addition, Figure S3E showed the regulation of ERα expression by Frmd8 in mouse mammary epithelial cells.

      (3) To explore the mechanism by which FRMD8 inhibits ERα degradation, what is the reason for choosing HEK293A?

      Thank you for the good question. The HEK293 cell line is commonly used in mechanistic studies. We also employed the breast cancer cell line T47D to verify the observations in HEK293 cells. Furthermore, the mass spectrometry result of HEK293A cells presented in Figure 5E was an additional experiment performed when we were exploring the regulation of the cell cycle by FRMD8, which is published in Cell Reports (PMID: 37527040). Based on the mass spectrometry result, we assumed that FRMD8 may influence ERα degradation mediated by UBE3A.

      Reviewer #2 (Recommendations for the authors):

      Introduction

      (1) In order for the reader to better understand the content of the article, it is better to briefly describe the role of ERα in the progression of breast cancer.

      Thank you for your suggestion. We have provided a brief description of the role of ERα in the introduction of revised manuscript:

      “ERα is a ligand-activated transcription factor that is activated by oestrogen, and promotes cell proliferation during breast cancer development (Harbeck et al., 2019).”

      (2) As ESR1 is mentioned in the second paragraph, a brief description of the relationship between ESR1 and ERα can make the article more logical.

      Thank you for the suggestion. We have added the description in the introduction:

      “Multiple transcription factors, such as AP-2γ, FOXO3, FOXM1, and GATA3, have been reported to bind to the promoter region of ESR1, the gene encoding ERα, and participate in transcriptional regulation of ESR1 (Jia et al., 2019; Koš et al., 2001).”

      (3) In the text, there are two variations of the term FRMD8: 'FRMD8' and 'Frmd8'. It is best to standardize on one form throughout the document.

      We apologize for any confusion. The terms "FRMD8" and "Frmd8" are used to indicate proteins derived from human and mouse, respectively.

      Results

      (4) In Figure 2L, there is no noticeable difference in the expression levels of Pgr and Esr1 between the Cre+ tumor and Cre- tumor groups. Figure S2E is more suitable for inclusion in the main text compared to Figure 2L.

      Thank you for this suggestion. ERα and PR are positive in early-stage mammary tumors of MMTV-PyMT mice, while ERα and PR are gradually lost as the tumor progresses. In Figure 2, mammary tumors from 4-month-old MMTV-PyMT mice were subjected to scRNA-seq analysis. Since the expression of ERα was very low in tumor cells at this time, there appears to be no difference between the two groups. We have exchanged Figure 2L and Figure S2E in the manuscript.

      (5) The CNV score can be used to assess the malignancy of cells, it would be better to compare the malignancy levels between the two groups.

      This is a very good suggestion. However, copy number variations usually occur randomly and have a high degree of heterogeneity. Due to the limited sample size in our study, we did not compare the difference between the two groups.

      (6) Enrichment analysis is crucial for single-cell sequencing studies. It is recommended to perform differential gene analysis and enrichment analysis between the Cre+ and Cre- groups to further explore the impact of FRMD8 deficiency on the functions of malignant cells.

      Thank you for your suggestion. We have performed differential gene analysis and biological process enrichment analysis on the scRNA-seq results using the gene ontology (GO) database. Our results showed that upregulated genes in luminal progenitor (Lp) epithelial cells were enriched in epithelial cell proliferation and transmembrane receptor protein serine/threonine kinase signaling pathways, suggesting that Frmd8 deficiency significantly promotes epithelial cell proliferation in MMTV-PyMT mice.

      Author response image 1.
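
      As background for the enrichment analysis mentioned in this answer, a GO over-representation test is commonly framed as a one-sided hypergeometric test. The sketch below is only an illustration of that idea and not the authors' pipeline: the function name and the gene counts in the usage example are hypothetical, and real analyses would also apply a multiple-testing correction across GO terms.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """One-sided over-representation p-value for a single GO term.

    n_background: genes in the background set (hypothetical input)
    n_term:       background genes annotated to the GO term
    n_selected:   genes in the selected list (e.g. upregulated genes)
    n_overlap:    selected genes that carry the GO term
    Returns P(X >= n_overlap) for a hypergeometric X.
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    # Sum the exact tail probability using integer binomial coefficients.
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    p = hypergeom_enrichment_p(20000, 400, 150, 12)
    print(f"enrichment p-value: {p:.3g}")
```

      A small p-value indicates that the term appears among the selected genes more often than expected by chance under random sampling from the background.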

      (7) The coherent logic in lines 300 to 308 should be that FRMD8 is expressed at higher levels in normal Hsd epithelial cells in mice, hence further verification was conducted to examine the expression levels of FRMD8 in various human breast cancer cell lines.

      We have revised the figures and text as suggested.  

      Discussion

      (8) In lines 352 to 360, the background narrative in the first half seems to have little connection with the research findings in the second half; it is suggested to reorganize the language of this section.

      Thank you for the advice. We have rewritten this paragraph in the manuscript:

      “In MMTV-PyMT mice, early-stage mammary tumors express ERα and PR, but these receptors are gradually lost as the tumor progresses (Lapidus et al., 1998). Our scRNA-seq results revealed that mammary tumor epithelial cells in MMTV-PyMT mice fall into four clusters, with only Hsd epithelial cells showing ERα and PR expression. Additionally, Hsd epithelial cells exhibited the lowest CNV score, indicating a closer resemblance to normal epithelial cells. The loss of Frmd8 reduced the proportion of Hsd epithelial cells and led to a downregulation of ERα and PR expression, implying that Frmd8 deficiency promotes the loss of luminal features in the mammary gland and accelerates mammary tumor progression.”

      (9) As stated in the result section, the depletion of FRMD8 may lead to the decrease of the Hsd epithelial cells proportion, it might be beneficial to discuss the significance of this finding.

      We have added a discussion of the Hsd epithelial cell proportion in the third paragraph of this section (please refer to the above question (8) ).

      Figures

      (10) The structural layout of Figure 4 should be reorganized to make it more aesthetically pleasing.

      Thank you for this suggestion. We have rearranged Figure 4 as suggested.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper presents a model of the whole somatosensory non-barrel cortex of the rat, with 4.2 million morphologically and electrically detailed neurons, with many aspects of the model constrained by a variety of data. The paper focuses on simulation experiments, testing a range of observations. These experiments are aimed at understanding how the multiscale organization of the cortical network shapes neural activity.

      Strengths:

      (1) The model is very large and detailed. With 4.2 million neurons and 13.2 billion synapses, as well as the level of biophysical realism employed, it is a highly comprehensive computational representation of the cortical network.

      (2) Large scope of work - the authors cover a variety of properties of the network structure and activity in this paper, from dendritic and synaptic physiology to multi-area neural activity.

      (3) Direct comparisons with experiments, shown throughout the paper, are laudable.

      (4) The authors make a number of observations, like describing how high-dimensional connectivity motifs shape patterns of neural activity, which can be useful for thinking about the relations between the structure and the function of the cortical network.

      (5) Sharing the simulation tools and a "large subvolume of the model" is appreciated.

      We thank the reviewer for these comments and are pleased they appreciated these aspects of the work.

      Weaknesses:

      (1) A substantial part of this paper - the first few figures - focuses on single-cell and single-synapse properties, with high similarity to what was shown in Markram et al., 2015. Details may differ, but overall it is quite similar.

      We thank the reviewer for this useful comment and agree that it is important to better highlight the incremental improvements to the model’s low-level physiology. The validity of any model can continuously be improved at all spatial scales and the validity of emergent network activity increases with improved validity at lower levels. For this reason, we felt it was valuable to improve the low-level physiology of the model.

      Regarding neuron physiology, we have added the following in Section 2.1 on page 5:

      “2.1 Improved modeling and validation of neuron physiology

      Similarly to Markram et al. (2015), electrical properties of single neurons were modelled by optimizing ion channel densities in specific compartment-types (soma, axon initial segment (AIS), basal dendrite, and apical dendrite) (Figure 2B) using an evolutionary algorithm (IBEA; Van Geit et al., 2016) so that each neuron recreates electrical features of its corresponding electrical type (e-type) under multiple standardized protocols. Compared to Markram et al. (2015), electrical models were optimized and validated using 1) additional in vitro data, features and protocols, 2) ion channel and electrophysiological data corrected for the liquid junction potential, and 3) stochastic channels (StochKv3) now including inactivation profiles. The methodology and resulting electrical models are described in Reva et al. (2023) (see Methods), and generated quantitatively more accurate electrical activity, including improved attenuation of excitatory postsynaptic potentials (EPSPs) and back-propagating action potentials.”

      And page 8:

      “The new neuron models saw a 5-fold improvement in generalizability compared to Markram et al. (2015) (Reva et al., 2023).”

      We have also made the descriptions of the improvements to synaptic physiology more explicit in Section 2.2 on page 9:

      “2.2 Improved modeling and validation of synaptic physiology

      The biological realism of synaptic physiology was improved relative to Markram et al. (2015) using additional data sources and by extending the stochastic version of the Tsodyks-Markram model (Tsodyks and Markram, 1997; Markram et al., 1998; Fuhrmann et al., 2002; Loebel et al., 2009) to feature multi-vesicular release, which in turn improved the accuracy of the coefficient of variations (CV; std/mean) of postsynaptic potentials (PSPs) as described in Barros-Zulaica et al. (2019) and Ecker et al. (2020). The model assumes a pool of available vesicles that is utilized by a presynaptic action potential, with a release probability dependent on the extracellular calcium concentration ([Ca<sup>2+</sup>]<sub>o</sub>; Ohana and Sakmann, 1998; Rozov et al., 2001; Borst, 2010). Additionally, single vesicles spontaneously release as an additional source of variability with a low frequency (with improved calibration relative to Markram et al. (2015)). The utilization of vesicles leads to a postsynaptic conductance with bi-exponential kinetics. Short-term plasticity (STP) dynamics in response to sustained presynaptic activation are either facilitating (E1/I1), depressing (E2/I2), or pseudo-linear (I3). E synaptic currents consist of both AMPA and NMDA components, whilst I currents consist of a single GABA<sub>A</sub> component, except for neurogliaform cells, whose synapses also feature a slow GABA<sub>B</sub> component. The NMDA component of E synaptic currents depends on the state of the Mg<sup>2+</sup> block (Jahr and Stevens, 1990), with the improved fitting of parameters to cortical recordings from Vargas-Caballero and Robinson (2003) by Chindemi et al. (2022).”
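
      For readers less familiar with this class of model, the release process described in the quoted passage can be sketched roughly as follows. This is a minimal illustration of a stochastic Tsodyks-Markram-style synapse with several release sites, not the implementation used in the model. The function name, the parameter values (`n_sites`, `u_se`, `tau_rec_ms`, `tau_fac_ms`) and the omission of spontaneous release, calcium dependence and conductance kinetics are all assumptions made for brevity.

```python
import math
import random


def simulate_tm_synapse(spike_times_ms, n_sites=4, u_se=0.5,
                        tau_rec_ms=670.0, tau_fac_ms=20.0, seed=0):
    """Return the number of vesicles released at each presynaptic spike.

    Each release site holds at most one vesicle. Between spikes, an empty
    site refills with probability 1 - exp(-dt / tau_rec_ms) and the
    utilization variable u decays towards zero with tau_fac_ms. At a
    spike, u jumps by u_se * (1 - u) and every occupied site releases
    independently with probability u (multi-vesicular release).
    """
    rng = random.Random(seed)
    available = [True] * n_sites   # all sites occupied at the start
    u = 0.0
    last_t = None
    released_per_spike = []
    for t in spike_times_ms:
        if last_t is not None:
            dt = t - last_t
            u *= math.exp(-dt / tau_fac_ms)
            p_recover = 1.0 - math.exp(-dt / tau_rec_ms)
            available = [a or (rng.random() < p_recover) for a in available]
        u += u_se * (1.0 - u)      # facilitation step at the spike
        released = 0
        for i in range(n_sites):
            if available[i] and rng.random() < u:
                available[i] = False
                released += 1
        released_per_spike.append(released)
        last_t = t
    return released_per_spike


if __name__ == "__main__":
    # Hypothetical 50 Hz train of five spikes.
    print(simulate_tm_synapse([0.0, 20.0, 40.0, 60.0, 80.0]))
```

      Repeating the simulation with different seeds and computing std/mean of the released counts gives a trial-to-trial CV, which is the quantity the passage says multi-vesicular release helps to match.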

      (2) Although the paper is about the model of the whole non-barrel somatosensory cortex, out of all figures, only one deals with simulations of the whole non-barrel somatosensory cortex. Most figures focus on simulations that involve one or a few "microcolumns". Again, it is rather similar to what was done by Markram et al., 2015 and constitutes relatively incremental progress.

      We thank the reviewer for this comment and have added the following text to the Discussion on page 33 to explain our rationale:

      “In keeping with the philosophy of compartmentalization of parameters and continuous model refinement (see Introduction), it was essential to improve validity at the columnar scale (relative to Markram et al. (2015)) as part of demonstrating validity of the full nbS1. Indeed, improved parametrization and validation at smaller scales was essential to parameterizing background input which generated robust nbS1 activity within realistic [Ca<sup>2+</sup>]<sub>o</sub> and firing rate ranges. We view this as a major achievement, as it was unknown whether the model would achieve a stable and meaningful regime at the start of our investigation. Whilst we would have liked to go further, our primary goal was to publish a well characterized model as an open resource that others could use to undertake further in-depth studies. In this regard, we are pleased that the parametrization of the nbS1 model has already been used to study EEG signals (Tharayil et al., 2024), as well as propagation of activity between two subregions (Bolaños-Puchet and Reimann, 2024).”

      We also make it clearer in the Introduction on page 4 that the improved validation of the emergent columnar regime was essential to stable activity at the larger scale:

      “These initial validations demonstrated that the model was in a more accurate regime compared to Markram et al. (2015) – an essential step before testing more complex or larger-scale validations. For example, under the same parameterization we then observed selective propagation of stimulus-evoked activity to downstream areas, and…”

      (3) With a model like this, one has an opportunity to investigate computations and interactions across an extensive cortical network in an in vivo-like context. However, the simulations presented are not addressing realistic specific situations corresponding to animals performing a task or perceiving a relevant somatosensory stimulus. This makes the insights into the roles of cell types or connectivity architecture less interesting, as they are presented for relatively abstract situations. It is hard to see their relationship to important questions that the community would be excited about - theoretical concepts like predictive coding, biophysical mechanisms like dendritic nonlinearities, or circuit properties like feedforward, lateral, and feedback processing across interacting cortical areas. In other words, what do we learn from this work conceptually, especially, about the whole non-barrel somatosensory cortex?

      We thank the reviewer for this comment and agree that it would be very interesting to explore such topics. In the Introduction on page 4, we have updated the list of papers which have so far used the model for more in depth studies:

      “…propagation of activity between cortical areas (Bolaños-Puchet and Reimann, 2024) the role of non-random connectivity motifs on network activity (Pokorny et al., 2024) and reliability (Egas Santander et al., 2024), the composition of high-level electrical signals such as the EEG (Tharayil et al., 2024), and how spike sorting biases population codes (Laquitaine et al., 2024).”

      In the Discussion on page 33 we also add our additional thoughts on this topic:

      “Whilst we would have liked to go further, our primary goal was to publish a well characterized model as an open resource that others could use to undertake further in-depth studies. In this regard, we are pleased that the parametrization of the nbS1 model has already been used to study EEG signals (Tharayil et al., 2024), as well as propagation of activity between two subregions (Bolaños-Puchet and Reimann, 2024). Investigation, improvement and validation must be continued at all spatial scales in follow up papers with detailed description, figures and analysis, which cannot be covered in this manuscript. Each new study increases the scope and validity of future investigations. In this way, this model and paper act as a stepping stone towards more complex questions of interest to the community such as perception, task performance, predictive coding and dendritic processing. This was similar for Markram et al. (2015) where the initial paper was followed by more detailed studies. Unlike the Markram et al. (2015) model, the new model can also be exploited by the community and has already been used in a number of follow up papers studying (Ecker et al., 2024a,b; Bolaños-Puchet and Reimann, 2024; Pokorny et al., 2024; Egas Santander et al., 2024; Tharayil et al., 2024; Laquitaine et al., 2024). We believe that the number of use cases for such a general model is vast, and is made larger by the increased size of the model.”

      (4) Most comparisons with in vivo-like activity are done using experimental data for whisker deflection (plus some from the visual stimulation in V1). But this model is for the non-barrel somatosensory cortex, so exactly the part of the cortex that has less to do with whiskers (or vision). Is it not possible to find any in vivo neural activity data from the non-barrel cortex?

      We agree with the reviewer that this is a weakness. We have expanded our discussion of the need to mix data sources to also consider our view for network level activity:

      “This paper and its companion paper serve to present a methodology for modeling micro- and mesoscale anatomy and physiology, which can be applied for other cortical regions and species. With the rapid increase in openly available data, efforts are already in progress to build models of mouse brain regions with reduced reliance on data mixing thanks to much larger quantities of available atlas-based data. This also includes data for the validation of emergent network level activity. Here we chose to compare network-level activity to data mostly from the barrel cortex, as well as a single study from primary visual cortex. Whilst a lot of the data used to build the model was from the barrel cortex, the barrel cortex also represents a very well characterized model of cortical processing for simple and controlled sensory stimuli. The initial comparison of population-wise responses in response to accurate thalamic input for single whisker deflections was essential to demonstrating that the model was closer to in vivo, and we were unaware of similar data for non-barrel somatosensory regions. Moreover, our optogenetic & lesion study demonstrated the capacity to compare and extend studies of canonical cortical processing in the whisker system.”

      (5) The authors almost do not show raw spike rasters or firing rates. I am sure most readers would want to decide for themselves whether the model makes sense, and for that, the first thing to do is to look at raster plots and distributions of firing rates. Instead, the authors show comparisons with in vivo data using highly processed, normalized metrics.

      We thank the reviewer for this comment and agree that better visualizations of the network activity under different conditions are essential for helping the reader assess the work. In addition to the raster plots in Video 1, Video 3, Fig 6, Fig 5C, Fig S9a and S16a, we have:

      a) Changed the histograms of spontaneous activity in Fig 4G on page 13 to raster plots for the seven column subvolume for two contrasting meta-parameter regimes.

      b) Added 4 new videos (Video 6a,b and 8a,b) showing all spontaneous and evoked meta-parameter combinations in hex0 and hex39 of the nbS1:

      We have added improved plots showing the distributions of firing rates in the seven column subvolume on page 74:

      With more detailed consideration in the Results on page 15:

      “Long-tailed population firing rate distributions with means ∼ 1Hz

      To study the firing rate distributions of different subpopulations and m-types, we ran 50s simulations for the meta-parameter combinations: [Ca<sup>2+</sup>]<sub>o</sub>: 1.05mM, R<sub>OU</sub>: 0.4, P<sub>FR</sub>: 0.3, 0.7 (Figure S4). Different subpopulations showed different sparsity levels (proportion of neurons spiking at least once) ranging from 6.6 to 42.5%. Wohrer et al. (2013) considered in detail the biases and challenges in obtaining ground truth firing rate distributions in vivo, and discuss the wide heterogeneity of reports in different modalities using different recording techniques. They conclude that most evidence points towards long-tailed distributions with peaks just below 1Hz. We confirmed that spontaneous firing rate distributions were long-tailed (approximately lognormally distributed) with means on the order of 1Hz for most subpopulations. Importantly the layer-wise means were just below 1Hz in all layers for the P<sub>FR</sub> = 0.3 meta-parameter combination. Moreover, our recent work applying spike sorting to extracellular activity using this meta-parameter combination found spike sorted firing rate distributions to be lognormally distributed and very similar to in vivo distributions obtained using the same probe geometry and spike sorter (Laquitaine et al., 2024).”
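
      The two summary quantities used in this passage, sparsity (the proportion of neurons spiking at least once) and the shape of the firing rate distribution on a log scale, can be computed from per-neuron spike counts along the following lines. This is a hedged sketch rather than the authors' analysis code: the function name, the returned keys and the example spike counts are hypothetical.

```python
import math
import statistics


def firing_rate_summary(spike_counts, duration_s):
    """Summarize per-neuron spike counts from one simulation window.

    sparsity is the fraction of neurons with at least one spike.
    For an approximately lognormal distribution, log10 of the non-zero
    rates should look roughly Gaussian, so its mean and standard
    deviation are reported as well.
    """
    rates = [c / duration_s for c in spike_counts]
    active = [r for r in rates if r > 0]
    log_rates = [math.log10(r) for r in active]
    return {
        "sparsity": len(active) / len(rates),
        "mean_rate_hz": statistics.fmean(rates),
        "log10_mean": statistics.fmean(log_rates) if log_rates else None,
        "log10_sd": statistics.stdev(log_rates) if len(log_rates) > 1 else None,
    }


if __name__ == "__main__":
    # Hypothetical spike counts for four neurons over a 50 s window.
    print(firing_rate_summary([0, 0, 50, 500], duration_s=50.0))
```

      Silent neurons are excluded from the log-scale statistics because their rate is zero, which is why sparsity is reported separately.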

      (6) While the authors claim that their model with one set of parameters reproduces many experimentally established metrics, that is not entirely what one finds. Instead, they provide different levels of overall stimulation to their model (adjusting the target "P_FR" parameter, with values from 0 to 1, and other parameters), and that influences results. If I get this right (the figures could really be improved with better organization and labeling), simulations with P<sub>FR</sub> closer to 1 provide more realistic firing rate levels for a few different cases, however, P<sub>FR</sub> of 0.3 and possibly above tends to cause highly synchronized activity - what the authors call bursting, but which also could be called epileptic-like activity in the network.

      We thank the reviewer for this comment. We can now see that the motivation for the P<sub>FR</sub> parameter was introduced very briefly in the Results and that the results of the calibration and analysis of the spontaneous activity regime were not interpreted in relation to this parameter.

      To address this, we have given more detail where it is first introduced in the Results on page 12:

      “to account for uncertainty in the firing rate bias during spontaneous activity from extracellular spike sorted recordings…”

      We then reconsider that it represents an unknown bias when interpreting the calibration and spontaneous activity results on page 15:

      “We reemphasize that the [Ca<sup>2+</sup>]<sub>o</sub>, R<sub>OU</sub> and P<sub>FR</sub> meta-parameters account for uncertainty of in vivo extracellular calcium concentration, the nature of inputs from other brain regions and the bias of extracellularly recorded firing rates. Whilst estimates for [Ca<sup>2+</sup>]<sub>o</sub> are between 1.0 - 1.1mM (Jones and Keep, 1988; Massimini and Amzica, 2001; Amzica et al., 2002; Gonzalez et al., 2022) and estimates for P<sub>FR</sub> are in the range of 0.1 - 0.3 (Olshausen and Field, 2006), combinations of these parameters supporting in vivo-like stimulus responses in later sections will offer a prediction for the true values of these parameters. Both these later results and our recent analysis of spike sorting bias using this model (Laquitaine et al., 2024) predict a spike sorting bias corresponding to P<sub>FR</sub> ∼ 0.3, confirming the prediction of Olshausen and Field (2006).”

      And in relation to the stimulus evoked responses on page 17:

      “Specifically, simulations with P<sub>FR</sub> from 0.1 to 0.5 robustly support realistic stimulus responses, with the middle of this range (0.3) corresponding with estimates of in vivo recording bias; both the previous estimates of Olshausen and Field (2006) and from a spike sorting study using this model (Laquitaine et al., 2024).”

      Following these considerations, the remainder of the experiments using the seven column subvolume only use a single meta-parameter on page 19.

      For the full nbS1 we further discuss the importance of a P<sub>FR</sub> value between 0.1 and 0.3 in the Results on page 26:

      “Stable spontaneous activity only emerges in nbS1 at predicted in vivo firing rates

      After calibrating the model of extrinsic synaptic input for the seven column subvolume, we tested to what degree the calibration generalizes to the entire nbS1. Notably, this included the addition of mid-range connectivity (Reimann et al., 2024). The total number of local and mid-range synapses in the model was 9.138 billion and 4.075 billion, i.e., on average full model simulations increased the number of intrinsic synapses onto a neuron by 45%. Particularly, we ran simulations for P<sub>FR</sub> ∈ [0.1, 0.15, ..., 0.3] using the OU parameters calibrated for the seven column subvolume for [Ca<sup>2+</sup>]<sub>o</sub> = 1.05mM and R<sub>OU</sub> = 0.4. Each of these full nbS1 simulations produced stable non-bursting activity (Figure 8A), except for the simulation for P<sub>FR</sub> = 0.3, which produced network-wide bursting activity (Video 6). Activity levels in the simulations of spontaneous activity were heterogeneous (Figure 8B, Video 7). In some areas, firing rates were equal to the target P<sub>FR</sub>, whilst in others they increased above the target (Figure 8C). In the more active regions, mean firing rates (averaged over layers) were on the order of 30-35% of the in vivo references for the maximum non-bursting P<sub>FR</sub> simulation (target P<sub>FR</sub> : 0.25). This range of firing rates again fits with the estimate of firing rate bias from our paper studying spike sorting bias (Laquitaine et al., 2024) and the meta-parameter range supporting realistic stimulus responses in the seven column subvolume. This also predicts that the nbS1 cannot sustain higher firing rates without entering a bursting regime.”
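
      The synapse counts in this passage can be cross-checked with a short calculation. The sketch assumes the two counts are meant as 9.138 and 4.075 billion, a reading that matches the 13.2 billion total synapses quoted in Reviewer #1's summary; the function name is hypothetical.

```python
def synapse_totals(local_billion, midrange_billion):
    """Return (total synapses, relative increase due to mid-range synapses)."""
    total = local_billion + midrange_billion
    relative_increase = midrange_billion / local_billion
    return total, relative_increase


if __name__ == "__main__":
    # Counts in billions, under the assumption stated above.
    total, increase = synapse_totals(9.138, 4.075)
    print(f"total synapses: {total:.3f} billion")
    print(f"increase from mid-range connectivity: {increase:.1%}")
```

      Under that reading the total is about 13.2 billion and mid-range connectivity adds roughly 45% to the local synapse count, in agreement with the figures given in the text.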

      Finally, we also added to our discussion of biases in extracellular firing rates in the Discussion on page 32:

      “This is also in line with our recent work using the model, which estimated a spike sorting bias corresponding to P<sub>FR</sub> = 0.3 using virtual extracellular electrodes (Laquitaine et al., 2024).”

      We also thank the reviewer for pointing out that we did not define the term “bursting” in the main text. We have added the following definition and discussion in the Results on page 15:

      “Note that the most correlated meta-parameter combination [Ca<sup>2+</sup>]<sub>o</sub>: 1.1mM, R<sub>OU</sub>: 0.2, P<sub>FR</sub>: 1.0 produced network-wide “bursting” activity, which we define as highly synchronous all or nothing events (Video 1). Such activity, which may be characteristic of epileptic activity, can be studied with the model but is not the focus of this study.”

      (7) The authors mention that the model is available online, but the "Resource availability" section does not describe that in substantial detail. As they mention in the Abstract, it is only a subvolume that is available. That might be fine, but more detail in appropriate parts of the paper would be useful.

      Firstly, we are pleased to say that the full nbS1 model is now available to download, in addition to the seven hexagon subvolume. In the manuscript, we have:

      a) Added to the Introduction at the bottom of page 4:

      “To provide a framework for further studies and integration of experimental data, the full model is made available with simulation tools, as well as a smaller subvolume with the optional new connectome capturing inhibitory targeting rules from electron microscopy”.

      b) Updated the open source panel of Figure 1:

      Secondly, we thank the reviewer for noticing that the available model is not well described in the “Resource availability” statement and have addressed this by:

      a) Adding the following to the “Resource availability” statement on page 36:

      “Both the full nbS1 model and smaller seven hexagon subvolume are available on Harvard Dataverse and Zenodo respectively in SONATA format (Dai et al., 2020) with simulation code. DOIs are listed under the heading “Final simulatable models” in the Key resources table. An additional link is provided to the SM-Connectome with instructions on how to use it with the seven hexagon subvolume model.”

      b) Creating a new subheading in the “Key resources table” titled: “Final simulatable models” to make it clearer which links refer to the final models.

      Reviewer #2 (Public review):

      Summary:

      This paper is a companion to Reimann et al. (2022), presenting a large-scale, data-driven, biophysically detailed model of the non-barrel primary somatosensory cortex (nbS1). To achieve this unprecedented scale of a bottom-up model, approximately 140 times larger than the previous model (Markram et al., 2015), they developed new methods to account for inputs from missing brain areas, among other improvements. Isbister et al. focus on detailing these methodological advancements and describing the model's ability to reproduce in vivo-like spontaneous, stimulus-evoked, and optogenetically modified activity.

      Strengths:

      The model generated a series of predictions that are currently impossible in vivo, as summarized in Table S1. Additionally, the tools used in this study are made available online, fostering community-based exploration. Together with the companion paper, this study makes significant contributions by detailing the model's constraints, validations, and potential caveats, which are likely to serve as a basis for advancing further research in this area.

      We thank the reviewer for these comments, and are pleased they appreciate these aspects of the work.

      Weaknesses:

      That said, I have several suggestions to improve clarity and strengthen the validation of the model's in vivo relevance.

      Major:

      (1) For the stimulus-response simulations, the authors should also reference, analyze, and compare data from O'Connor et al. (2010; https://pubmed.ncbi.nlm.nih.gov/20869600/) and Yu et al. (2016; https://pubmed.ncbi.nlm.nih.gov/27749825/) in addition to Yu et al. 2019, which is the only data source the authors consider for an awake response. The authors mentioned bias in spike rate measurements, but O'Connor et al. used cell-attached recordings, which do not suffer from activity-based selection bias (in addition, they also performed Ca2+ imaging of L2/3). This was done in the exact same task as Yu et al., 2019, and they recorded from over 100 neurons across layers. Combining this data with Yu et al., 2019 would provide a comprehensive view of activity across layers and inhibitory cell types. Additionally, Yu et al. (2016) recorded VPM neurons in the same task, alongside whole-cell recordings in L4, showing that L4 PV neurons filter movement-related signals encoded in thalamocortical inputs during active touch. This dataset is more suitable for extracting VPM activity, as it was collected under the same behavior and from the same species (unlike Diamond et al., 1992, which used anesthetized rats). Furthermore, this filtering is an interesting computation performed by the network the authors modeled. The validation would be significantly strengthened and more biologically interesting if the authors could also reproduce the filtering properties, membrane potential dynamics, and variability in the encoding of touch across neurons, not just the latency (which is likely largely determined by the distance and number of synapses).

      We thank the reviewer for pointing out these very useful studies. We have taken on board this suggestion for a future model of the mouse barrel cortex.

      (2) The authors mention that in the model, the response of the main activated downstream area was confined to L6. Is this consistent with in vivo observations? Additionally, is there any in vivo characterization of the distance dependence of spiking correlation to validate Figure 8I?

      We are not aware of data confirming the propagation of activity to downstream areas being confined to layer 6 but have considered the connectivity further between these two regions on page 27, as well as studying this further in follow up work:

      “Stable propagation of evoked activity through mid-range connectivity only emerges in nbS1 at predicted in vivo firing rates

      We repeated the previous single whisker deflection evoked activity experiment in the full model, providing a synchronous thalamic input into the forelimb sub-region (S1FL; Figure 8E; Video 8 & 9). Responses in S1FL were remarkably similar to the ones in the seven column subvolume, including the delays and decays of activity (Figure 8F). However, in addition to a localized primary response in S1FL within 350μm of the stimulus, we found several secondary responses at distal locations (Figure 8E; Video 9), which was suggestive of selective propagation of the stimulus-evoked signal to downstream areas efferently connected by mid-range connectivity. The response of the main activated downstream area (visible in Figure 8E) was confined to L6 (Figure 8G). In a follow up study using the model to explore the propagation of activity between cortical regions (Bolaños-Puchet and Reimann, 2024), it is described how the model contains both a feedforward projection pattern, which projects principally to synapses in L1 & L23, and a feedback type pattern, which principally projects to synapses in L1 & L6. On visualizing the innervation profile from the stimulated hexagon to the downstream hexagon we can see that we have stimulated a feedback pathway (Figure S16)”

      With referenced Figure S16 on page 85:

      We did find in vivo evidence of similar layer-wise and distance dependence of correlations in the somatosensory cortex discussed on page 27 of the Results:

      “The distance dependence of correlations followed a similar profile to that observed in a dataset characterizing spontaneous activity in the somatosensory cortex (Reyes-Puerta et al., 2015a) (compare red line in Figure 8I with Figure S16). In the in vivo dataset spiking correlation was also low but highest in lower layers, with short “up-states” in spiking activity constrained to L5 & 6 (see Figure 1E,F in (Reyes-Puerta et al., 2015a)). In the model, they are constrained to L6.”

      With Figure S16a on page 85 showing the distance dependence of correlations in the anaesthetized barrel cortex during spontaneous activity (digitization from the reference paper):

      (3) Across the figures, activity is averaged across neurons within layers and E or I cell types, with a limited description of single-cell type and single-cell responses. Were there any predictions regarding the responses of particular cell types that significantly differ from others in the same layer? Such predictions could be valuable for future investigations and could showcase the advantages of a data-driven, biophysically detailed model.

      We thank the reviewer for this comment. In addition to new analyses at higher granularity addressed in other comments, we have added the following comparison of stimulus-evoked membrane potential dynamics in different subpopulations for the original connectome and SM-connectome in Figure 7 on page 24.

      This gave interesting results discussed in a new subsection on page 26:

      “EM targeting trends hyperpolarize Sst+ and 5HT3aR+ late response, and disinhibit L5/6 E

      Studying somatic membrane potentials for different subpopulations in response to whisker deflections shows that PV+, L23E and L4E subpopulations are largely unaffected in the SM-connectome (Figure 7E). Interestingly, Sst+ and 5HT3aR+ subpopulations show a strong hyperpolarization in the late response that isn’t present in the original connectome. Interestingly, this corresponds with a stronger late response in L5/6 E populations, which could be caused by disinhibition due to the Sst+ and 5HT3aR+ hyperpolarization. This could be explored further in follow up studies using our connectome manipulator tool (Pokorny et al., 2024).”

      (4) 2.4: Are there caveats to assuming the OU process as a model for missing inputs? Inputs to the cortex are usually correlated and low-dimensional (i.e., communication subspace between cortical regions), but the OU process assumes independent conductance injection. Can (weakly) correlated inputs give rise to different activity regimes in the model? Can you add a discussion on this?

      We agree with the reviewer that there are caveats to assuming an OU process for the model of missing inputs and have added the following to the Discussion on page 31:

      “The calibration framework could optimize per population parameters for other compensation methods, whilst still offering an interpretable spectrum of firing rate regimes at different levels of P<sub>FR</sub>. For example, more realistic compensation schemes could be explored which introduce a) correlations between the inputs received by different neurons and b) compensation distributed across dendrites, as well as at the soma. We predict that such changes would make spontaneous activity more correlated at the lower spontaneous firing rates which supported in vivo like responses (P<sub>FR</sub> : 0.1 − 0.5), which would in turn make stimulus-responses more noise correlated.”
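To make option (a) above more concrete, here is a minimal sketch of how weakly correlated OU inputs could be generated. It is only an illustration under stated assumptions, not the model's implementation: the function name `correlated_ou_traces`, the parameters `mu`, `sigma`, `tau`, and the mixing fraction `c` are all hypothetical, and the shared-noise mixing scheme is one of several possible ways to introduce correlations.

```python
import math
import random


def correlated_ou_traces(n_neurons, n_steps, dt, tau, mu, sigma, c, seed=0):
    """Simulate OU traces whose driving noise shares a fraction c of its variance.

    All names here are hypothetical. c = 0 gives independent inputs (the
    assumption questioned by the reviewer); c = 1 gives identical inputs.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)                      # exact OU decay over one step
    noise_scale = sigma * math.sqrt(1.0 - decay ** 2)  # keeps stationary std at sigma
    traces = [[mu] for _ in range(n_neurons)]        # every trace starts at the mean
    for _ in range(n_steps):
        shared = rng.gauss(0.0, 1.0)                 # noise common to all neurons
        for trace in traces:
            private = rng.gauss(0.0, 1.0)            # noise specific to this neuron
            xi = math.sqrt(c) * shared + math.sqrt(1.0 - c) * private
            trace.append(mu + (trace[-1] - mu) * decay + noise_scale * xi)
    return traces
```

Because the shared and private terms are mixed with weights sqrt(c) and sqrt(1 - c), each neuron still receives unit-variance noise, so the per-neuron mean and standard deviation are unchanged while the pairwise noise correlation is c.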

      (5) 2.6: The network structure is well characterized in the companion paper, where the authors report that correlations in higher dimensions were driven by a small number of neurons with high participation ratios. It would be interesting to identify which cell types exhibit high node participation in high-dimensional simplices and examine the spiking activity of cells within these motifs. This could generate testable predictions and inform theoretical cell-type-specific point neuron models for excitatory/inhibitory balanced networks and cortical processing.

      We thank the reviewer for this suggestion. We have added two supplementary figures to address this suggestion, which are discussed in the Results on Page 16:

      “Additionally, we studied the structural effect on the firing rate (here measured as the inverse of the inter-spike interval, ISI, which can be thought of as a proxy of non-zero firing rate). We found that for the connected circuit, the firing rate increases with simplex dimension; in contrast with the disconnected circuit, where this relationship remains flat (see Figure S6 red vs. blue curves and Methods).

      This also demonstrates high variability between neurons, in line with biology, both structurally (Towlson et al., 2013; Nigam et al., 2016) and functionally (Wohrer et al., 2013; Buzsáki and Mizuseki, 2014). We next identified the cell types that are overexpressed in the group of neurons that have the 5% highest values of node participation across dimensions (Figure S7). This could inform theoretical point neuron models with cell-type specificity, for example. We found that while in dimension one (i.e., node degree) this consists mostly of inhibitory cells, in higher dimensions the cell types concentrate in layers 4, 5 and 6, especially for TPC neurons. This is in line with our structural layer-wise findings in Figure 8B in Reimann et al. (2024).”

      Which reference new Figures S6 and S7:

      With the methodology for S6 described on page 49 of the Methods:

      “For any numeric property of neurons, e.g., firing rate, we evaluate the effect of dimension on it by taking weighted averages across dimensions. That is, for each dimension k, we take the weighted average of the property across neurons where the weights are given by node participation on dimension k. More precisely, let N be the number of neurons and V ∈ ℝ<sup>N</sup> be a vector of a property on all the neurons, e.g., the vector of firing rates. Then in each dimension k we compute

      Where is the vector of node participation on dimension k for all neurons and ・ is the dot product.

      To measure the over and underexpression of the different m-types among those with the highest 5% of values of node participation, we used the hypergeometric distribution to determine the expected distribution of m-types in a random sample of the same size. More precisely, for each dimension k and m-type m, let N<sub>total</sub> be the total number of neurons in the circuit, N<sub>m</sub> be the number of neurons of m-type m in the circuit, C<sub>top</sub> be the number of neurons with the highest 5% values of node participation in dimension k, C<sub>m</sub> the number of neurons of m-type m among these, and let P = hypergeom(N<sub>total</sub>, N<sub>m</sub>, C<sub>top</sub>) be the hypergeometric distribution.

      By definition, P(x) describes the probability of sampling x neurons of m-type m in a random sample of size C<sub>top</sub>. Therefore, using the cumulative distribution F(x) = P(Counts ≤ x), we can compute the p-values as follows:

      Small values indicate under and over representation respectively….”
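The two computations described in the quoted Methods text can be sketched as follows. This is a hedged illustration, not the authors' code: the displayed formulas are not reproduced in this response, so the sketch assumes (i) that the weighted average is the dot product of the property vector with the node-participation vector divided by the sum of the weights, and (ii) that the two p-values are F(C<sub>m</sub>) for under-representation and 1 − F(C<sub>m</sub> − 1) for over-representation. The function names are hypothetical.

```python
from math import comb


def participation_weighted_mean(values, participation):
    """Average of a per-neuron property, weighted by node participation in one dimension."""
    total = sum(participation)
    if total == 0:
        raise ValueError("node participation sums to zero in this dimension")
    # Dot product of the two vectors, normalized by the sum of the weights (assumption).
    return sum(v * p for v, p in zip(values, participation)) / total


def hypergeom_cdf(x, n_total, n_m, c_top):
    """F(x) = P(Counts <= x) when drawing c_top neurons out of n_total, n_m of them of type m."""
    lo = max(0, c_top - (n_total - n_m))   # smallest possible count of type m
    hi = min(n_m, c_top)                   # largest possible count of type m
    numerator = sum(
        comb(n_m, k) * comb(n_total - n_m, c_top - k)
        for k in range(lo, min(x, hi) + 1)
    )
    return numerator / comb(n_total, c_top)


def representation_pvalues(n_total, n_m, c_top, c_m):
    """Return (p_under, p_over) for observing c_m neurons of type m among the top c_top."""
    p_under = hypergeom_cdf(c_m, n_total, n_m, c_top)            # P(Counts <= c_m)
    p_over = 1.0 - hypergeom_cdf(c_m - 1, n_total, n_m, c_top)   # P(Counts >= c_m)
    return p_under, p_over
```

For example, `representation_pvalues(10, 5, 4, 4)` asks how surprising it is that all 4 top-participation neurons belong to an m-type that makes up half of a 10-neuron circuit; a small `p_over` would flag that m-type as over-represented.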

      Minor:

      (1) Since the previous model was published in 2015, the neuroscience field has seen significant advancements in single-cell and single-nucleus sequencing, leading to the clustering of transcriptomic cell types in the entire mouse brain. For instance, the Allen Institute has identified ~10 distinct glutamatergic cell types in layer 5, which exceeds the number incorporated into the current model. Could you discuss 1) the relationship between the modeled me-types and these transcriptomic cell types, and 2) how future models will evolve to integrate this new information? If there are gaps in knowledge in order to incorporate some transcriptome cell types into your model, it would be helpful to highlight them so that efforts can be directed toward addressing these areas.

      We thank the reviewer for this suggestion, particularly the idea to describe what types of data would be valuable towards improving the model in future. We have added the following to the Discussion on page 33:

      “In our previous work (Roussel et al., 2023) we linked mouse inhibitory me-models to transcriptomic types (t-types) in a whole mouse cortex transcriptomic dataset (Gouwens et al., 2019). This can provide a direct correspondence in future large-scale mouse models. As we model only a single electrical type for pyramidal cells there is no one-to-one correspondence between our me-models and the 10 different pyramidal cell types identified there. We are not currently aware of any method which can recreate the electrical features of different types of pyramidal cells using only generic ion channel models. To achieve the firing pattern behavior of more specific electrical types, usually ion channel kinetics are tweaked, and this would violate the compartmentalization of parameters. In future we hope to build morpho-electric-transcriptomic type (met-type) models by selecting gene-specific ion channel models (Ranjan et al., 2019, 2024) based on the met-type’s gene expression. Data specific to different neuron sections (i.e. soma, AIS, apical/basal dendrites) of different met-types, such as gene expression, distribution of ion channels, and voltage recordings under standard single cell protocols would be particularly useful.”

      (2) For the optogenetic manipulation, it would be interesting if the model could reproduce the paradoxical effects (for example, Mahrach et al. reported paradoxical effects caused by PV manipulation in S1; https://pubmed.ncbi.nlm.nih.gov/31951197/). This seems a more relevant and non-trivial network phenomenon than the V1 manipulation the authors attempted to replicate.

      We thank the reviewer for this valuable idea. Indeed, our model is able to reproduce paradoxical effects under certain conditions. We added the following new supplementary Figure S12 demonstrating this finding (black arrows).

      Which we discuss in the Results on page 22:

      “However, at high contrasts, we observed a paradoxical effect of the optogenetic stimulation on L6 PV+ neurons, reducing their activity with increasing stimulation strength (Figure S12B; cf. Mahrach et al. (2020)). This effect did not occur under grey screen conditions (i.e., at contrast 0.0) with a constant background firing rate of 0.2 Hz or 5 Hz respectively (not shown). The individual…”

      and added to the Discussion on page 32:

      “Also, we predicted a paradoxical effect of optogenetic stimulation on L6 PV+ interneurons, namely a decrease in firing with increased stimulus strength. This is reminiscent of the paradoxical responses found by Mahrach et al. (2020) in the mouse anterior lateral motor cortex (in L5, but not in L2/3) and barrel cortex (no layer distinction) respectively. While Mahrach et al. (2020) conducted their recordings in awake mice not engaged in any behavior, we observed this effect only when drifting grating patterns with high contrast were presented. Nevertheless, consistent with their findings, we found the effect only in deep but not in superficial layers, and only for PV+ interneurons but not for PCs. Our model could therefore be used to improve the understanding of this paradoxical effect in follow up studies. These examples demonstrate that the approach of modeling entire brain regions can be used to further probe the topics of the original articles and cortical processing.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      My specific comments are in the Public Review. The summarizing point is that this is a sprawling paper, and it is easy for readers to get confused. Focusing on specific connections between known functional properties and findings in this model, especially for the full-scale model, will be helpful.

      We thank the reviewer for this comment and for their related recommendation (4) below, and have added subheadings throughout the results.

      Reviewer #2 (Recommendations for the authors):

      (1) P4. What are the 10 free parameters?

      We thank the reviewer for pointing out that it would be useful to summarize the 10 parameters at this stage of the text, and have adjusted the sentence to:

      “As a result, the emerging in-vivo like activity is the consequence of only 10 free parameters representing the strength of extrinsic input from other brain regions into 9 layer-specific excitatory and inhibitory populations, and a parameter controlling the noise structure of this extrinsic input.”

      (2) Table 1 and S1 are extremely useful. Could you provide a table summarizing the major assumptions or gaps in the model, their potential influence on the results, and possible ways to collect data that could support or challenge these assumptions? Currently, this information is scattered throughout the manuscript.

      We thank the reviewer for this very useful suggestion and have added a Table S8 on page 68:

      (3) Figure 4F is important, but the legend is unclear. What is the unit on the x-axis? The values seem too large to represent per-neuron measurements.

      Thank you to the reviewer for raising this. Indeed, the values are estimated mean numbers of missing synapses per neuron by population. Such numbers are difficult to estimate, but we have further discussed our rationale, justification and consideration of whether these numbers are accurate in the Results, as follows:

      “Heterogeneity in synaptic density within and across neuron classes and sections makes estimating the number of missing synapses challenging (DeFelipe and Fariñas, 1992). Changing the assumed synaptic density value of 1.1 synapses/μm would only change the slope of the relationship, however. Estimates of mean number of existing and missing synapses per population were within reasonable ranges; even the larger estimate for L5 E (due to higher dendritic length; Figure S3) was within biological estimates of 13,000 ± 3,500 total afferent synapses (DeFelipe and Fariñas, 1992).”

      This text references the new supplementary Figure S3:

      Moreover, these numbers represent the number of synapses, rather than the number of connections. The number of connections is usually used for quantifications such as indegree, and is usually much lower.
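As a rough illustration of why the assumed density only changes the slope, the estimate can be sketched as a linear function of dendritic length. This is a simplified sketch under assumptions, not the procedure used in the manuscript: it assumes the expected synapse count is density times total dendritic length and that the missing count is what remains after subtracting synapses already present in the model. The function name and arguments are hypothetical.

```python
def estimate_missing_synapses(dendritic_length_um, modeled_synapses, density_per_um=1.1):
    """Estimate missing afferent synapses for one neuron (hypothetical helper).

    dendritic_length_um: total dendritic length in micrometres.
    modeled_synapses: synapses already present on the neuron in the model.
    density_per_um: assumed synapse density (1.1 synapses/um in the quoted text).
    """
    expected_total = density_per_um * dendritic_length_um  # linear in length; density is the slope
    return expected_total - modeled_synapses
```

With a hypothetical neuron of 10,000 μm dendritic length and 6,000 modeled synapses, the sketch returns about 5,000 missing synapses; doubling the assumed density would double the expected total without changing the linear form of the relationship.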

      We have also updated the caption and axis labels of the original figure:

      (4) Including additional subsections or improving the indexing in the Results section could be beneficial. In its current format, it's difficult to distinguish where the model description ends and where the validation begins. Some readers may want to focus more on the validation than other parts, so clearer segmentation would improve readability.

      We have addressed this comment in our response to Reviewer #1 at the start of the “Recommendations for the authors” section.

      (5) P4. 2nd paragraph. Original vs rewired connectome. The term "rewired connectome" may give the impression that it refers to an artificial manipulation rather than a modification based on the latest data. It might be helpful to use a different term (e.g., SM-connectome as described later in the paper?).

      We have adjusted the text in the introduction:

      “Additionally, we generated a new connectome which captured recently characterized spatially-specific targeting rules for different inhibitory neuron types (Schneider-Mizell et al., 2023) in the MICrONS electron microscopy dataset (MICrONS-Consortium et al., 2021), such as increased perisomatic targeting by PV+ neurons, and increased targeting of inhibitory populations by VIP+ neurons. Comparing activity to the original connectome gave predictions about the role of these additional targeting rules.”

      (6) Figures 7 B, C, D: what is v1/v2? Original vs SM-Connectome?

      We thank the reviewer for noticing this and have corrected the figure to use “Orig” and “SM” consistent with the rest of the figure.

      (7) Page 23, 2.10: what is phi?

      We thank the reviewer for noticing this inconsistency with the earlier text, and have updated the text to read: “Particularly, we ran simulations for P<sub>FR</sub> ∈ [0.1, 0.15, ..., 0.3] using the OU parameters calibrated for the seven column subvolume for [Ca<sup>2+</sup>] = 1.05 mM and R<sub>OU</sub> = 0.4.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank you for sending our manuscript for the second round of review.  We are encouraged by the comments from reviewer #2 that our supplementary work on naïve T cells and antibody blockade work satisfied their previous concerns and is important for our work.

      The Editors raised concerns that we have shared preliminary data on Nrn1 and AMPAR double knockout mice.  We apologize for our enthusiasm for these studies.  Because of the publication model by eLife, we shared that data not because we needed to persuade the reviewer for publication purposes but rather to agree with the reviewer that the molecular target of Nrn1 is important, and we are progressing in understanding this subject.


      The following is the authors’ response to the original reviews.

      To Reviewer #1:

      Thank you for your thorough review and comments on our work, in which you noted that “the role of neuritin in T cell biology studied here is new and interesting.” We have summarized your comments into two categories: (i) biology and investigation approach, and (ii) experimental rigor and data presentation.

      Biology and Investigation approach comments:

      (1) Questions regarding the T cell anergy model:

      Major point “(4) Figure 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      T cell anergy is a well-established concept first described by Schwartz’s group. It refers to the hyporesponsive T cell functional state in antigen-experienced CD4 T cells (Chappert and Schwartz, 2010; Fathman and Lineberry, 2007; Jenkins and Schwartz, 1987; Quill and Schwartz, 1987). Anergic T cells are characterized by their inability to expand and to produce IL2 upon subsequent antigen re-challenge. In this paper, we have borrowed the existing in vivo T cell anergy induction model used by Mueller’s group (Vanasek et al., 2006). Specifically, Thy1.1+ Ctrl or Nrn1-/- TCR transgenic OTII cells were co-transferred with the congenically marked Thy1.2+ WT polyclonal Treg cells into TCR-/- mice. After anergy induction, the congenically marked TCR transgenic T cells were recovered by sorting based on the Thy1.1+ congenic marker, and subsequently restimulated ex vivo with OVA323-339 peptide. We evaluated the T cell anergic state based on OTII cell expansion in vivo and IL2 production upon OVA323-339 restimulation ex vivo.

      “The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this.”

      Because the anergy model by Mueller's group is well established (Vanasek et al., 2006), we did not feel that additional effort was required to validate this model as the reviewer suggested. Moreover, the limited IL2 production among the control cells upon restimulation confirms the validity of this model.

      “The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status”.

      Cells from Ctrl and Nrn1-/- mice on a homogeneous TCR transgenic (OTII) background were used in these experiments. The possibility that substantial variability of TCR expression or different expression levels of the transgenic TCR could have impacted IL2 production rather than anergy induction is unlikely.

      Overall, we used this in vivo anergy model to evaluate the Nrn1-/- T cell functional state in comparison to Ctrl cells under the anergy induction condition following the evaluation of Nrn1 expression, particularly in anergic T cells.  Through studies using this anergy model, we observed a significant change in Treg induction among OTII cells. We decided to pursue the role of Nrn1 in Treg cell development and function rather than the biology of T cell anergy as evidenced by subsequent experiments.

      Minor points “(6) On which markers are anergic cells sorted for RNAseq analysis?”

      Cells were sorted out based on their congenic marker marking Ctrl or Nrn1-/- OTII cells transferred into the host mice.  We did not specifically isolate anergic cells for sequencing.

      (2) Question regarding the validity of iTreg differentiation model.

      Major point: “(5) Figure 2A-C and Figure 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, that is why I would suggest generating data with purified nTreg. Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript. Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”.

      We thank Reviewer #1 for their feedback. While it is true that iTregs made in vitro and in vivo generated pTregs display several distinctions (e.g., differences in Foxp3 expression stability), we strongly disagree with this statement by Reviewer #1: “The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance.” The induced Treg cell (iTreg) model was established over 20 years ago (Chen et al., 2003; Zheng et al., 2002), and the model is widely adopted with over 2000 citations. Further, it has been instrumental in understanding different aspects of regulatory T cell biology (Hurrell et al., 2022; John et al., 2022; Schmitt and Williams, 2013; Sugiura et al., 2022).

      Because we have observed reduced pTreg generation in vivo, we chose to use the in vitro iTreg model system to understand the mechanistic changes involved in Treg cell differentiation and function, specifically, neuritin’s role in this process. We have made no claim that iTreg cell biology is identical to pTreg generated in vivo or nTreg cells. However, the iTreg culture system has proved to be a good in vitro system for deciphering molecular events involved in complex processes. As such, it remains a commonly used approach by many research groups in the Treg cell field (Hurrell et al., 2022; John et al., 2022; Sugiura et al., 2022). Moreover, applying the iTreg in vitro culture system has been instrumental in helping us identify the cell electrical state change in Nrn1-/- CD4 cells and revealed the biological link between Nrn1 and the ionotropic AMPA receptor (AMPAR), which we will discuss in the subsequent discussion. It is technically challenging to use nTreg cells for T cell electrical state studies due to their heterogeneous nature from development in an in vivo environment and the effect of manipulation during the nTreg cell isolation process, which can both affect the T cell electrical state.

      “Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript.” 

      We have also carried out nTreg studies in vitro in addition to iTreg cells. Similar to Gonzalez-Figueroa et al.'s findings, we did not observe differences in suppression function between Nrn1-/- and WT nTreg using the in vitro suppression assay. However, Nrn1-/- nTreg cells revealed reduced suppression function in vivo (Fig. 2D-L). In fact, Gonzalez-Figueroa et al. observed reduced plasma cell formation after OVA immunization in Treg-specific Nrn1-/- mice, implicating reduced suppression from Nrn1-/- follicular regulatory T (Tfr) cells. Thus, our observation of the reduced suppression function of Nrn1-/- nTreg toward effector T cell expansion, as presented in Fig. 2D-L, does not contradict the results from Gonzalez-Figueroa et al. Rather, the conclusions of these two studies agree that Nrn1 can play important roles in immune suppression observable in vivo that are not captured readily by the in vitro suppression assay.

      “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      We have stated in the manuscript on page 7 line 208 that “Similar proportions of Foxp3+ cells were observed in Nrn1-/- and Ctrl cells under the iTreg culture condition, suggesting that Nrn1 deficiency does not significantly impact Foxp3+ cell differentiation”. In the revised manuscript, we will include the data on the proportion of Foxp3+ cells before iTreg restimulation.

      (3) Confirmation of transcriptomic data regarding amino acids or electrolytes transport change

      Minor point “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion to “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have indeed already performed such experiments, which corroborate the transcriptomics data on differential amino acid and nutrient transporter expression. Specifically, we loaded either iTreg or Th0 cells with membrane potential (MP) dye and measured MP level change after adding the complete set of amino acids (complete AA). Upon entry, the charge carried by AAs may transiently affect cell membrane potential. Different AA transporter expression patterns may produce different MP change patterns upon AA entry, as shown in Author response image 1. We observed reduced MP change in Nrn1-/- iTreg compared to the Ctrl, whereas in the context of Th0 cells, Nrn1-/- cells showed an enhanced MP change compared to the Ctrl. We can certainly include these data in the revised manuscript.

      Author response image 1.

      Membrane potential change induced by amino acid entry. a. Nrn1-/- or WT iTreg cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs. b. Nrn1-/- or WT Th0 cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs.

      (4) EAE experiment data assessment

      Minor point “(5) Figure 5F. How are cells re-stimulated? If polyclonal stimulation is used, the experiment is not interesting because the analysis is done with lymph node cells. This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.”

      In the EAE study, the Nrn1-/- mice exhibit similar disease onset but a protracted non-resolving disease phenotype compared to the WT control mice. Several reasons may contribute to this phenotype: 1. Enhanced T effector cell infiltration/persistence in the central nervous system (CNS); 2. Reduced Treg cell-mediated suppression of the T effector cells in the CNS; 3. Protracted non-resolving inflammation at the immunization site has the potential to continue sending T effector cells into the CNS, contributing to persistent inflammation. Based on this reasoning, we examined the infiltrating T effector cell number and Treg cell proportion in the CNS. We also restimulated cells from draining lymph nodes close to the inflammation site, looking for evidence of persistent inflammation. When mice were harvested around day 16 after immunization, the inflammation at the local draining lymph node should be at the contraction stage. We stimulated cells with PMA and ionomycin with the intention of observing all potential T effector cells involved in the draining lymph node rather than only MOG antigen-specific cells. We disagree with Reviewer #1’s assumption that “This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.” We think the experimental approach we have taken has been appropriately tailored to the biological questions we intended to answer.

      Experimental rigor and data presentation.

      (1) data labeling and additional supporting data

      Major points

      (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figures 1A-C to have single-cell and quantitative data as well.

      Minor points  

      (1) Line 119, 120 of the text. It is said that one of the most up-regulated genes in anergic cells is Nrn1 but the data is not shown.

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We can adapt the labeling and provide additional data, including Nrn1 staining on Treg cells and flow graphs for pmTOR and pS6 staining (Fig. 3H), as requested by Reviewer #1.

      (2) Experimental rigor:

      General comments:

      “However, it is disappointing that reading this manuscript leaves an impression of incomplete work done too quickly.”

      We were discouraged to receive the comment, “this manuscript leaves an impression of incomplete work done too quickly.” Our study of this novel molecule began without any existing biological tools such as antibodies, knockout mice, etc. Over the past several years, we have established our own antibodies for Nrn1 detection, obtained and characterized Nrn1 knockout mice, and utilized multiple approaches to identify the molecular mechanism of Nrn1 function. Through the use of the in vitro iTreg system described in this manuscript, we identified the association of Nrn1 deficiency with cell electrical state change, potentially connected to AMPAR function. We have further corroborated our findings by generating Nrn1 and AMPAR T cell specific double knockout mice and confirmed that T cell specific AMPAR deletion could abrogate the phenotype caused by the Nrn1 deficiency (see Support Figure 2). We did not include the double knockout data in the current manuscript because AMPAR function has not yet been studied thoroughly in T cell biology, and we feel this topic warrants examination in its own right. However, the unpublished data support the finding that Nrn1 modulates the T cell electrical state and, consequently, metabolism, ultimately influencing tolerance and immunity. In its current form, the manuscript represents the first characterization of the novel molecule Nrn1 in anergic cells, Tregs, and effector T cells. While this work has led to several exciting additional questions, we disagree that the novel characterization we have presented is incomplete. We feel that our present data set, which squarely highlights Nrn1’s role as an important immune regulator while shedding unprecedented light on the molecular events involved, will be of considerable interest to a broad field of researchers.

      “Multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms.”

      We have indeed used multiple in vivo models to reveal Nrn1's function in Treg differentiation, Treg suppression function, T effector cell differentiation and function, and the overall impact on autoimmune disease. Because the impact of ion channel function is often context-dependent, we examined the biological outcome of Nrn1 deficiency in several in vivo contexts. We would appreciate it if Reviewer #1 would provide a specific example, given the Nrn1 phenotype, of how to investigate the electrical change more deeply in the in vivo models.

      “Major points

      (1) A real weakness of this work is the fact that in most of the results shown, there are few biological replicates with differences that are often small between Ctrl and Nrn1 -/-. The systematic use of student's t-test may lead to thinking that the differences are significant, which is often misleading given the small number of samples, which makes it impossible to know whether the distributions are Gaussian and whether a parametric test can be used. RNAseq bulk data are based on biological duplicates, which is open to criticism.”

      We respectfully disagree with Reviewer #1 on the question of the statistical power and significance of our work. We have used 5-8 mice/group for each in vivo model and 3-4 technical replicates for the in vitro studies, with a minimum of 2-3 replicate experiments. These group sizes and replication numbers are in line with those seen in high-impact publications. While some differences between Ctrl and Nrn1-/- appear small, they have significant biological consequences, as evidenced by the various Nrn1-/- in vivo phenotypes. Furthermore, we believe we have subjected our data to the appropriate statistical tests to ensure rigorous analysis and representation of our findings.

      To Reviewer #2.

      We thank Reviewer #2 for the careful review of the manuscript. We especially appreciate the comments that “The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well done to strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.”

      “The major weakness of this study stems from a lack of a clear molecular mechanism involving Nrn1.”

      We fully understand this comment from Reviewer #2. The main mechanism we identified contributing to the functional defect of Nrn1-/- T cells involves novel effects on the electric and metabolic state of the cells. Although we referenced neuronal studies that indicate Nrn1 is the auxiliary protein for the ionotropic AMPA-type glutamate receptor (AMPAR) and may affect AMPAR function, we did not provide any evidence in this manuscript as the topic requires further in-depth study.   

      For the benefit of this discussion, we include our preliminary Nrn1 and AMPAR double knockout data (Author response image 2), which indicates that abrogating AMPAR expression can compensate for the defect caused by Nrn1 deficiency in vitro and in vivo. This preliminary data supports the notion that Nrn1 modulates AMPAR function, which causes changes in T cell electric and metabolic state, influencing T cell differentiation and function.  

      Author response image 2.

      Deletion of AMPAR expression in T cells compensates for the defect caused by Nrn1 deficiency. Nrn1-/- mice were crossed with T cell-specific AMPAR knockout mice (AMPARfl/flCD4Cre+). The following mice were generated and used in the experiment: T cell-specific AMPAR knockout and Nrn1 knockout mice (AKONKO), Nrn1 knockout mice (AWTNKO), Ctrl mice (AWTNWT). a. Deletion of AMPAR compensates for the iTreg cell defect observed in Nrn1-/- CD4 cells. iTreg live cell proportion, cell number, and Ki67 expression among Foxp3+ cells 3 days after aCD3 restimulation. b. Deletion of AMPAR in T cells abrogates the enhanced autoimmune response in Nrn1-/- mice in the EAE disease model. Mouse relative weight change and disease score progression after EAE disease induction.

      Ion channels can influence cell metabolism through multiple means (Vaeth and Feske, 2018; Wang et al., 2020). First, ion channels are involved in maintaining cell resting membrane potential. This electrical potential difference across the cell membrane is essential for various cellular processes, including metabolism (Abdul Kadir et al., 2018; Blackiston et al., 2009; Nagy et al., 2018; Yu et al., 2022). Second, ion channels facilitate the movement of ions across cell membranes. These ions are essential for various metabolic processes. For example, ions like calcium (Ca2+), potassium (K+), and sodium (Na+) play crucial roles in signaling pathways that regulate metabolism (Kahlfuss et al., 2020). Third, ion channel activity can influence cellular energy balance due to ATP consumption associated with ion transport to maintain ion balances (Erecińska and Dagani, 1990; Gerkau et al., 2019). This, in turn, can impact processes like ATP production, which is central to cellular metabolism. Thus, ion channel expression and function determine the cell’s bioelectric state and contribute to cell metabolism (Levin, 2021).

      Because the AMPAR function has not been thoroughly studied using a genetic approach in T cells, we do not intend to include the double knockout data in this manuscript before fully characterizing the T cell-specific AMPAR knockout mice.  

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We appreciate the reviewer’s comments. This comment reflects two concerns in data interpretation:

      (1) Are Nrn1-/- naïve T cells fundamentally different from WT cells? Does this fundamental difference contribute to the observed electrical and metabolic phenotype in iTreg or Th0 cells? This is a very good question, and we will perform the experiments as the reviewer suggested. While Nrn1 is expressed at a basal (low) level in naïve T cells, deletion of Nrn1 may cause changes in naïve T cell phenotype.

      (2) Is the Nrn1-/- phenotype caused by Nrn1 functional deficiency or due to the secondary effect of Nrn1 deletion, such as non-physiological cell membrane structure changes?

      We have done the following experiment to address this concern.  We have cultured WT T cells in the presence of Nrn1 antibody and compared the outcome with Nrn1-/- iTreg cells (Figure 3-figure supplement 2D,E,F). WT iTreg cells under antibody blockade exhibited similar changes as Nrn1-/- iTreg cells, confirming the physiological relevance of the Nrn1-/- phenotype.

      Manuscript Revision based on the Reviewer’s suggestions:

      Reviewer #1:

      Major points (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. 

      Following the suggestion by Reviewer #1, we have included the Nrn1 Ab staining on activated Nrn1-/- CD4 cells in Figure 1D. We have also added the staining of cell surface Nrn1 on Treg cells in Figure 1-figure supplement 1D.

      Major point: (5) “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      In the revised manuscript, we have included the proportion of Foxp3+ cells among Nrn1-/- and ctrl iTreg cells developed under the iTreg culture condition in Figure 2A.

      Minor points  

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      Following Reviewer #1’s suggestion, we have changed the Y-axis label in all the relevant figures.

      “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion to “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have used AA-induced cellular MP changes to confirm differential AA transporter expression patterns and their impact on cellular MP levels. The data are included in the revised manuscript in Figure 3H and Figure 4K.

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We appreciated Reviewer #1’s suggestion and have included the histogram staining data for Figure 3E. We have moved the original Figure 3H to the supplemental figure and included the histogram staining data in Figure 3-figure supplement 1C.  Similarly, we have included the histogram staining data in Figure 4-figure supplement 1C.

      Reviewer#2:

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We greatly appreciate Reviewer #2’s suggestion and have carried out experiments on naïve CD4 cells derived from Nrn1-/- and WT mice. We have compared membrane potential and AA-induced MP change between Nrn1-/- and WT naïve T cells, as well as the metabolic state of Nrn1-/- and WT naïve T cells, by carrying out glucose stress tests and mitochondria stress tests using a Seahorse assay. Moreover, to investigate whether the phenotype revealed in Nrn1-/- CD4 cells was caused by a secondary effect of cell membrane structure change due to Nrn1 deletion, we carried out Nrn1 antibody blockade in WT CD4 cells and investigated the phenotypic change. These new results are included in Figure 3-figure supplement 2.

      References:

      Abdul Kadir, L., M. Stacey, and R. Barrett-Jolley. 2018. Emerging Roles of the Membrane Potential: Action Beyond the Action Potential. Front Physiol 9:1661.

      Blackiston, D.J., K.A. McLaughlin, and M. Levin. 2009. Bioelectric controls of cell proliferation: ion channels, membrane voltage and the cell cycle. Cell Cycle 8:3527-3536.

      Chappert, P., and R.H. Schwartz. 2010. Induction of T cell anergy: integration of environmental cues and infectious tolerance. Current opinion in immunology 22:552-559.

      Chen, W., W. Jin, N. Hardegen, K.J. Lei, L. Li, N. Marinos, G. McGrady, and S.M. Wahl. 2003. Conversion of peripheral CD4+CD25- naive T cells to CD4+CD25+ regulatory T cells by TGF-beta induction of transcription factor Foxp3. The Journal of experimental medicine 198:1875-1886.

      Erecińska, M., and F. Dagani. 1990. Relationships between the neuronal sodium/potassium pump and energy metabolism. Effects of K+, Na+, and adenosine triphosphate in isolated brain synaptosomes. J Gen Physiol 95:591-616.

      Fathman, C.G., and N.B. Lineberry. 2007. Molecular mechanisms of CD4+ T-cell anergy. Nat Rev Immunol 7:599-609.

      Gerkau, N.J., R. Lerchundi, J.S.E. Nelson, M. Lantermann, J. Meyer, J. Hirrlinger, and C.R. Rose. 2019. Relation between activity-induced intracellular sodium transients and ATP dynamics in mouse hippocampal neurons. The Journal of physiology 597:5687-5705.

      Hurrell, B.P., D.G. Helou, E. Howard, J.D. Painter, P. Shafiei-Jahani, A.H. Sharpe, and O. Akbari. 2022. PD-L2 controls peripherally induced regulatory T cells by maintaining metabolic activity and Foxp3 stability. Nature communications 13:5118.

      Jenkins, M.K., and R.H. Schwartz. 1987. Antigen presentation by chemically modified splenocytes induces antigen-specific T cell unresponsiveness in vitro and in vivo. The Journal of experimental medicine 165:302-319.

      John, P., M.C. Pulanco, P.M. Galbo, Jr., Y. Wei, K.C. Ohaegbulam, D. Zheng, and X. Zang. 2022. The immune checkpoint B7x expands tumor-infiltrating Tregs and promotes resistance to anti-CTLA-4 therapy. Nature communications 13:2506.

      Kahlfuss, S., U. Kaufmann, A.R. Concepcion, L. Noyer, D. Raphael, M. Vaeth, J. Yang, P. Pancholi, M. Maus, J. Muller, L. Kozhaya, A. Khodadadi-Jamayran, Z. Sun, P. Shaw, D. Unutmaz, P.B. Stathopulos, C. Feist, S.B. Cameron, S.E. Turvey, and S. Feske. 2020. STIM1-mediated calcium influx controls antifungal immunity and the metabolic function of nonpathogenic Th17 cells. EMBO molecular medicine 12:e11592.

      Levin, M. 2021. Bioelectric signaling: Reprogrammable circuits underlying embryogenesis, regeneration, and cancer. Cell 184:1971-1989.

      Nagy, E., G. Mocsar, V. Sebestyen, J. Volko, F. Papp, K. Toth, S. Damjanovich, G. Panyi, T.A. Waldmann, A. Bodnar, and G. Vamosi. 2018. Membrane Potential Distinctly Modulates Mobility and Signaling of IL-2 and IL-15 Receptors in T Cells. Biophys J 114:2473-2482.

      Quill, H., and R.H. Schwartz. 1987. Stimulation of normal inducer T cell clones with antigen presented by purified Ia molecules in planar lipid membranes: specific induction of a long-lived state of proliferative nonresponsiveness. Journal of immunology (Baltimore, Md. : 1950) 138:3704-3712.

      Schmitt, E.G., and C.B. Williams. 2013. Generation and function of induced regulatory T cells. Frontiers in immunology 4:152.

      Sugiura, A., G. Andrejeva, K. Voss, D.R. Heintzman, X. Xu, M.Z. Madden, X. Ye, K.L. Beier, N.U. Chowdhury, M.M. Wolf, A.C. Young, D.L. Greenwood, A.E. Sewell, S.K. Shahi, S.N. Freedman, A.M. Cameron, P. Foerch, T. Bourne, J.C. Garcia-Canaveras, J. Karijolich, D.C. Newcomb, A.K. Mangalam, J.D. Rabinowitz, and J.C. Rathmell. 2022. MTHFD2 is a metabolic checkpoint controlling effector and regulatory T cell fate and function. Immunity 55:65-81.e69.

      Vaeth, M., and S. Feske. 2018. Ion channelopathies of the immune system. Current opinion in immunology 52:39-50.

      Vanasek, T.L., S.L. Nandiwada, M.K. Jenkins, and D.L. Mueller. 2006. CD25+Foxp3+ regulatory T cells facilitate CD4+ T cell clonal anergy induction during the recovery from lymphopenia. Journal of immunology (Baltimore, Md. : 1950) 176:5880-5889.

      Wang, Y., A. Tao, M. Vaeth, and S. Feske. 2020. Calcium regulation of T cell metabolism. Current opinion in physiology 17:207-223.

      Yu, W., Z. Wang, X. Yu, Y. Zhao, Z. Xie, K. Zhang, Z. Chi, S. Chen, T. Xu, D. Jiang, X. Guo, M. Li, J. Zhang, H. Fang, D. Yang, Y. Guo, X. Yang, X. Zhang, Y. Wu, W. Yang, and D. Wang. 2022. Kir2.1-mediated membrane potential promotes nutrient acquisition and inflammation through regulation of nutrient transporters. Nature communications 13:3544.

      Zheng, S.G., J.D. Gray, K. Ohtsuka, S. Yamagiwa, and D.A. Horwitz. 2002. Generation ex vivo of TGF-beta-producing regulatory T cells from CD4+CD25- precursors. Journal of immunology (Baltimore, Md. : 1950) 169:4183-4189.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Koumoundourou et al. identify a pathway downstream of Bcl11b that controls synapse morphology and plasticity of hippocampal mossy fiber synapses. Using an elegant combination of in vivo, ex vivo, and in vitro approaches, the authors build on their previous work that indicated C1ql2 as a functional target of Bcl11b (De Bruyckere et al., 2018). Here, they examine the functional implications of C1ql2 at MF synapses in Bcl11b cKO mice and following C1ql2 shRNA. The authors find that Bcl11b KO and shRNA against C1ql2 significantly reduces the recruitment of synaptic vesicles and impairs LTP at MF synapses. Importantly, the authors test a role for the previously identified C1ql2 binding partner, exon 25b-containing Nrxn3 (Matsuda et al., 2016), as relevant at MF synapses to maintain synaptic vesicle recruitment. To test this, the authors developed a K262E C1ql2 mutant that disrupts binding to Nrxn3. Curiously, while Bcl11b KO and C1ql2 KD largely phenocopy (reduced vesicle recruitment and impaired LTP), only vesicle recruitment is dependent on C1ql2-Nrxn3 interactions. These findings provide new insight into the functional role of C1ql2 at MF synapses. While the authors convincingly demonstrate a role for C1ql2-Nrxn3(25b+) interaction for vesicle recruitment and a Nrxn3(25b+)-independent role for C1ql2 in LTP, the underlying mechanisms remain inconclusive. Additionally, a discussion of how these findings relate to previous work on C1ql2 at mossy fiber synapses and how the findings contribute to the biology of Nrxn3 would increase the interpretability of this work.

      As suggested by reviewer #1, we extended our discussion of previous work on C1ql2 and additionally discussed the biology of Nrxn3 and how our work relates to it. Moreover, we extended our mechanistic analysis of how Bcl11b/C1ql2/Nrxn3 pathway controls synaptic vesicle recruitment as well as LTP (please see also response to reviewer #2 points 5 and 8 and reviewer #3 point 4 of public reviews below for detailed discussion).

      Reviewer #2 (Public Review):

      This manuscript describes experiments that further investigate the actions of the transcription factor Bcl11b in regulating mossy fiber (MF) synapses in the hippocampus. Prior work from the same group had demonstrated that loss of Bcl11b results in loss of MF synapses as well as a decrease in LTP. Here the authors focus on a target of Bcl11b a secreted synaptic organizer C1ql2 which is almost completely lost in Bcl11b KO. Viral reintroduction of C1ql2 rescues the synaptic phenotypes, whereas direct KD of C1ql2 recapitulates the Bcl1 phenotype. C1ql2 itself interacts directly with Nrxn3 and replacement with a binding deficient mutant C1q was not able to rescue the Bcl11b KO phenotype. Overall there are some interesting observations in the study, however there are also some concerns about the measures and interpretation of data.

      The authors state that they used a differential transcriptomic analysis to screen for candidate targets of Bcl11b, yet they do not present any details of this screen. This should be included and at the very least a table of all DE genes included. It is likely that many other genes are also regulated by Bcl11b so it would be important to the reader to see the rationale for focusing attention on C1ql2 in this study.

      The transcriptome analysis mentioned in our manuscript was published in detail in our previous study (De Bruyckere et al., 2018), including chromatin-immunoprecipitation that revealed C1ql2 as a direct transcriptional target of Bcl11b. Upon revision of the manuscript, we made sure that this was clearly stated within the main text module to avoid future confusion. In the same publication (De Bruyckere et al., 2018), we discuss in detail several identified candidate genes such as Sema5b, Ptgs2, Pdyn and Penk as putative effectors of Bcl11b in the structural and functional integrity of MFS. C1ql2 has been previously demonstrated to be almost exclusively expressed in DG neurons and localized to the MFS.

      There it bridges the pre- and post-synaptic sides through interaction with Nrxn3 and KAR subunits, respectively, and regulates synaptic function (Matsuda et al., 2016). Taken together, C1ql2 was a very good candidate to study as a potential effector downstream of Bcl11b in the maintenance of MFS structure and function. However, as our data reveal, not all Bcl11b mutant phenotypes were rescued by C1ql2 (see supplementary figures 2d-f of revised manuscript). We expect additional candidate genes, identified in our transcriptomic screen, to act downstream of Bcl11b in the control of MFS.

      All viral-mediated expression uses AAVs which are known to ablate neurogenesis in the DG (Johnston DOI: 10.7554/eLife.59291) through the ITR regions and leads to hyperexcitability of the dentate. While it is not clear how this would impact the measurements the authors make in MF-CA3 synapses, this should be acknowledged as a potential caveat in this study.

      We agree with reviewer #2 and are aware that it has been demonstrated that AAV-mediated gene expression ablates neurogenesis in the DG. To avoid potential interference of the AAVs with the interpretability of our phenotypes, we made sure during the design of the study that all of our control groups were treated in the same way as our groups of interest, and were, thus, injected with control AAVs. Moreover, the observed phenotypes were first described in Bcl11b mutants that were not injected with AAVs (De Bruyckere et al., 2018). Finally, we thoroughly examined the individual components of the proposed mechanism (rescue of C1ql2 expression, over-expression of C1ql3 and introduction of mutant C1ql2 in Bcl11b cKOs, KD of C1ql2 in WT mice, and Nrxn123 cKO) and reached similar conclusions. Together, this strongly supports that the observed phenotypes occur as a result of the physiological function of the proteins involved in the described mechanism and not due to interference of the AAVs with these biological processes. We have now addressed this point in the main text module of the revised manuscript.

      The authors claim that the viral re-introduction "restored C1ql2 protein expression to control levels. This is misleading given that the mean of the data is 2.5x the control (Figure 1d and also see Figure 6c). The low n and large variance are a problem for these data. Moreover, they are marked ns but the authors should report p values for these. At the least, this likely large overexpression and variability should be acknowledged. In addition, the use of clipped bands on Western blots should be avoided. Please show the complete protein gel in primary figures of supplemental information.

      We agree with reviewer #2 that C1ql2 expression after its re-introduction in Bcl11b cKO mice was higher compared to controls and that this should be taken into consideration for proper interpretation of the data. To address this, based also on the suggestion of reviewer #3 point 1 below, we overexpressed C1ql2 in DG neurons of control animals. We found no changes in synaptic vesicle organization upon C1ql2 over-expression compared to controls. This further supports that the observed effect upon rescue of C1ql2 expression in Bcl11b cKOs is due to the physiological function of C1ql2 and not a result of the overexpression. These data are included in supplementary figure 2g-j and are described in detail in the results part of the revised manuscript.

      Additionally, we looked at the effects of C1ql2 overexpression in Bcl11b cKO DGN on basal synaptic transmission. We plotted fEPSP slopes versus fiber volley amplitudes, measured in slices from rescue animals, as we had previously done for the control and Bcl11b cKO (Author response image 1a). Although regression analysis revealed a trend towards steeper slopes in the rescue mice (Author response image 1a and b), the observation did not prove to be statistically significant, indicating that C1ql2 overexpression in Bcl11b cKO animals does not strongly alter basal synaptic transmission at MFS. Overall, our previous and new findings support that the observed effects of the C1ql2 rescue are not caused by the artificially elevated levels of C1ql2, as compared to controls, but are rather a result of the physiological function of C1ql2.

      Following the suggestion of reviewer #2, all clipped western blot bands were exchanged for images of the full blot. This includes figures 1c, 4c, 6b and supplementary figure 2g of the revised manuscript. The p-value for Figure 1d has now been included.

      Author response image 1.

      C1ql2 reintroduction in Bcl11b cKO DGN does not significantly alter basal synaptic transmission at mossy fiber-CA3 synapses. a Input-output curves generated by plotting fEPSP slope against fiber volley amplitude at increasing stimulation intensities. b Quantification of regression line slopes for input-output curves for all three conditions. Control+EGFP, 35 slices from 16 mice; Bcl11b cKO+EGFP, 32 slices from 14 mice; Bcl11b cKO+EGFP-2A-C1ql2, 22 slices from 11 mice. The data are presented as means, error bars represent SEM. Kruskal-Wallis test (non-parametric ANOVA) followed by Dunn’s post hoc pairwise comparisons. p=0.106; ns, not significant.
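
      For readers who want to see the shape of the analysis named in this legend, the following is a minimal sketch, not the authors' analysis code. It fits a least-squares slope to one input-output curve and compares per-slice slopes across three groups with a tie-corrected Kruskal-Wallis test. All function names and all numbers are hypothetical and chosen only for illustration; the p-value uses the chi-square approximation with 2 degrees of freedom, which is rough for small groups, and Dunn's post hoc comparisons are not implemented here.

```python
import math


def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """Return (H, p) for exactly three groups, with tie correction.

    For three groups, H is referred to a chi-square distribution with
    2 degrees of freedom, whose upper tail equals exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the three groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, math.exp(-h / 2)


if __name__ == "__main__":
    # Hypothetical single slice: fiber volley amplitude vs fEPSP slope.
    fiber_volley = [0.1, 0.2, 0.3, 0.4]
    fepsp_slope = [0.05, 0.11, 0.14, 0.21]
    print("input-output slope:", regression_slope(fiber_volley, fepsp_slope))

    # Hypothetical per-slice slopes for three conditions.
    control = [0.48, 0.52, 0.55, 0.50]
    cko = [0.47, 0.51, 0.49, 0.53]
    rescue = [0.56, 0.54, 0.60, 0.58]
    h, p = kruskal_wallis_three_groups([control, cko, rescue])
    print("H =", h, "p =", p)
```

      A rank-based test is used in this sketch because it makes no assumption that the per-slice slopes are normally distributed.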

      Measurement of EM micrographs: As prior work suggested that MF synapse structure is disrupted the authors should report active zone length as this may itself affect "synapse score" defined by the number of vesicles docked. More concerning is that the example KO micrographs seem to have lost all the densely clustered synaptic vesicles that are away from the AZ in normal MF synapses e.g. compare control and KO terminals in Fig 2a or 6f or 7f. These terminals look aberrant and suggest that the important measure is not what is docked but what is present in the terminal cytoplasm that normally makes up the reserve pool. This needs to be addressed with further analysis and modifications to the manuscript.

      As requested by reviewer #2 we analyzed and reported in the revised manuscript the active zone length. We found that the active zone length remained unchanged in all conditions (control/Bcl11b cKO/C1ql2 rescue, WT/C1ql2 KD, control/K262E and control/Nrxn123 cKO), strengthening our results that the described Bcl11b/C1ql2/Nrxn3 mechanism is involved in the recruitment of synaptic vesicles. These data have been included in supplementary figures 2c, 4h, 5f and 6g and are described in the results part of the revised manuscript.

      We want to clarify that the synapse score is not defined by the number of vesicles docked to the plasma membrane. The synapse score, which is described in detail in our materials and methods and has been previously published (De Bruyckere et al., 2018), rates MFS based on the number of synaptic vesicles and their distance from the active zone and was designed according to previously described properties of the vesicle pools at the MFS. The EM micrographs illustrate the general misdistribution of SV in the proximity of MFS. Upon revision of the manuscript, we made sure that this is clearly stated in the main text to avoid further confusion.

      The study also presents correlated changes in MF LTP in Bcl11b KO which are rescued by C1ql2 expression. It is not clear whether the structural and functional deficits are causally linked and this should be made clearer in the manuscript. It is also not apparent why this functional measure was chosen as it is unlikely that C1ql2 plays a direct role in presynaptic plasticity mechanisms that are through a cAMP/ PKA pathway and likely disrupted LTP is due to dysfunctional synapses rather than a specific LTP effect.

      The inclusion of functional experiments in this and our previous study (De Bruyckere et al., 2018) was first and foremost intended to determine whether the structural alterations observed at MFB disrupt MFS signaling. Of the signaling properties we tested, basal synaptic transmission (this study) and short-term potentiation (De Bruyckere et al., 2018) were unaltered by Bcl11b KO, whereas MF LTP was found to be abolished (De Bruyckere et al., 2018). Indeed, because MF LTP largely depends on presynaptic mechanisms, including the redistribution of the readily releasable pool and recruitment of new active zones (Orlando et al., 2021; Vandael et al., 2020), it appears to be particularly sensitive to the specific structural changes we observed. We therefore believe that it is valuable information that MF LTP is affected in Bcl11b cKO animals: it provides evidence for the functional importance of the observed morphological alterations, while basic transmission remains largely normal. Furthermore, it provided a functional marker for testing whether the reintroduction of C1ql2 in Bcl11b cKO animals or the KD of C1ql2 in WT animals can functionally recapitulate the control or the Bcl11b KO phenotype, respectively.

      We fully agree with the reviewer that C1ql2 is unlikely to directly participate in the cAMP/PKA pathway and that the ablation of C1ql2 likely disrupts MF LTP through an alternative mode of action. Our original wording in the paragraph describing the results of the forskolin-induced LTP experiment might have overstressed the importance of the cAMP pathway. We have now rephrased that paragraph to better describe the main idea behind the forskolin experiment, namely to circumvent the initial Ca2+ influx in order to test whether deficient presynaptic Ca2+ channel/KAR signaling might be responsible for the loss of LTP in Bcl11b cKO. The results are strongly indicative of a downstream mechanism, and further investigation is needed to determine the specific mechanisms by which C1ql2 regulates MF LTP, especially in light of the result that C1ql2.K262E rescued LTP while it was unable to rescue the SV recruitment at the MF presynapse. This raises the possibility that C1ql2 can influence MF LTP through additional, yet uncharacterized mechanisms, independent of SV recruitment. As such, a causal link between the structural and functional deficits remains tentative, and we have now emphasized that point by adding a corresponding sentence to the discussion of our revised manuscript. Nevertheless, we again want to stress that the main rationale behind the LTP experiments was to assess the functional significance of structural changes at MFS and not to elucidate the mechanisms by which MF LTP is established.

      The authors should consider measures that might support the role of Bcl11b targets in SV recruitment during the depletion of synapses or measurements of the readily releasable pool size that would complement their findings in structural studies.

      We fully agree that functional measurements of the readily releasable pool (RRP) size would be a valuable addition to the reported redistribution of SV in structural studies. We have, in fact, attempted to use high-frequency stimulus trains in both field and single-cell recordings (details on single-cell experiments are described in the response to point 8) to evaluate potential differences in RRP size between the control and Bcl11b KO (Author response image 2a and b). Under both recording conditions we see a trend towards lower values of the intersection between a regression line of late responses and the y-axis. This could be taken as an indication of slightly smaller RRP size in Bcl11b mutant animals compared to controls. However, for several technical reasons we are extremely cautious about drawing such far-reaching conclusions based on these data. At most, they suffice to conclude that the availability of release-ready vesicles in the KO is likely not dramatically smaller than in the control.

      The primary issue with using high-frequency stimulus trains for RRP measurements at MFS is the particularly low initial release probability (Pr) at these synapses. This means that a large number of stimulations is required to deplete the RRP. As the RRP is constantly replenished, it remains unclear when steady-state responses are reached (reviewed by Kaeser and Regehr, 2017). This is clearly visible in our single-cell recordings (Author response image 2b), which were additionally complicated by prominent asynchronous release at later stages of the stimulus train and by a large variability in the shapes of cumulative amplitude curves between cells. In contrast, while the cumulative amplitude curves for field potential recordings do reach a steady state (Author response image 2a), field potential recordings in this context are not a reliable substitute for single-cell or, in the case of MFB, single-bouton recordings. Postsynaptic cells in field potential recordings are not clamped, meaning that the massive release of glutamate due to continuous stimulation depolarizes the postsynaptic cells and reduces the driving force for Na+, irrespective of depletion of the RRP. This is supported by the fact that we consistently observed a recovery of fEPSP amplitudes later in the trains where RRP had presumably been maximally depleted. In summary, high-frequency stimulus trains at the field potential level are not a valid and established technique for estimating RRP size at MFS.

      Specialized laboratories have used highly advanced techniques, such as paired recordings between individual MFB and postsynaptic CA3 pyramidal cells, to estimate the RRP size of MFB (Vandael et al., 2020). These approaches are outside the scope of our present study which, while elucidating functional changes following Bcl11b depletion and C1ql2 rescue, does not aim to provide a high-end biophysical analysis of the presynaptic mechanisms involved.

      Author response image 2.

      Estimation of RRP size using high-frequency stimulus trains at mossy fiber-CA3 synapses. a Results from field potential recordings. Cumulative fEPSP amplitude in response to a train of 40 stimuli at 100 Hz. All subsequent peak amplitudes were normalized to the amplitude of the first peak. Data points corresponding to putative steady-state responses were fit with linear regression (RRP size is indirectly reflected by the intersection of the regression line with the y-axis). Control+EGFP, 6 slices from 5 mice; Bcl11b cKO+EGFP, 6 slices from 3 mice. b Results from single-cell recordings. Cumulative EPSC amplitude in response to a train of 15 stimuli at 50 Hz. The last four stimuli were fit with linear regression. Control, 5 cells from 4 mice; Bcl11b cKO, 3 cells from 3 mice. Note the shallow onset of response amplitudes and the subsequent frequency potentiation. Due to the resulting increase in slope at higher stimulus numbers, intersection with the y-axis occurs at negative values. The differences shown were not found to be statistically significant; unpaired t-test or Mann-Whitney U-test.
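The back-extrapolation described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions: amplitudes are normalized to the first response, stimuli are numbered from 1 on the x-axis, and the last `n_fit` points are treated as the putative steady state. The function name `rrp_estimate` and the toy train are hypothetical and do not reproduce the recorded data.

```python
def rrp_estimate(peak_amplitudes, n_fit):
    """Back-extrapolate the late part of a cumulative amplitude curve.

    Returns (intercept, slope) of a least-squares line through the last
    n_fit cumulative points; the y-intercept is the indirect RRP readout.
    """
    if n_fit < 2 or n_fit > len(peak_amplitudes):
        raise ValueError("n_fit must be between 2 and the train length")
    first = peak_amplitudes[0]
    cumulative = []
    total = 0.0
    for amplitude in peak_amplitudes:
        total += amplitude / first          # normalize to the first peak
        cumulative.append(total)
    n = len(cumulative)
    xs = list(range(n - n_fit + 1, n + 1))  # 1-based stimulus numbers
    ys = cumulative[-n_fit:]
    mean_x = sum(xs) / n_fit
    mean_y = sum(ys) / n_fit
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy depressing train (invented values) that settles at 40% of the first peak.
train = [1.0, 0.8, 0.6, 0.4, 0.4, 0.4, 0.4, 0.4]
rrp_intercept, refill_slope = rrp_estimate(train, n_fit=4)
```

If responses facilitate during the train instead of depressing, the late slope steepens and the same calculation can return a negative intercept, which is the behavior noted for the single-cell recordings.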

      Bcl11b KO reduces the number of synapses, yet the I-O curve reported in Supp Fig 2 is not changed. How is that possible? This should be explained.

      We agree with reviewer #2: this apparent discrepancy has indeed struck us as a counterintuitive result. It might be that synapses that are preferentially eliminated in Bcl11b cKO are predominantly silent or have weak coupling strength, such that their loss has only a minimal effect on basal synaptic transmission. Although perplexing, the result is fully supported by our single-cell data, which show no significant differences in MF EPSC amplitudes recorded from CA3 pyramidal cells between controls and Bcl11b mutants (Author response image 3; please see the response below for details and also our response to Reviewer #1 question 2).

      Matsuda et al DOI: 10.1016/j.neuron.2016.04.001 previously reported that C1ql2 organizes MF synapses by aligning postsynaptic kainate receptors with presynaptic elements. As this may have consequences for the functional properties of MF synapses including their plasticity, the authors should report whether they see deficient postsynaptic glutamate receptor signaling in the Bcl11b KO and rescue in the C1ql2 re-expression.

      We agree that the study by Matsuda et al. is of key importance for our present work. Although MF LTP is governed by presynaptic mechanisms and we previously did not see differences in short-term plasticity between the control and Bcl11b cKO (De Bruyckere et al., 2018), the clustering of postsynaptic kainate receptors by C1ql2 is indeed an important detail that could potentially alter synaptic signaling at MFS in Bcl11b KO. We, therefore, re-analyzed previously recorded single-cell data by performing a kinetic analysis on MF EPSCs recorded from CA3 pyramidal cells in control and Bcl11b cKO mice (Author response image 3a) to evaluate postsynaptic AMPA and kainate receptor responses in both conditions. We took advantage of the fact that AMPA receptors deactivate roughly 10 times faster than kainate receptors, allowing the contributions of the two receptors to mossy fiber EPSCs to be separated (Castillo et al., 1997 and reviewed by Lerma, 2003). We fit the decay phase of the second (larger) EPSC evoked by paired-pulse stimulation with a double exponential function, yielding a fast and a slow component, which roughly correspond to the fractional currents evoked by AMPA and kainate receptors, respectively. Analysis of both fast and slow time constants and the corresponding fractional amplitudes revealed no significant differences between controls and Bcl11b mutants (Author response image 3e-h), indicating that both AMPA and kainate receptor signaling are unaffected by the ablation of C1ql2 following Bcl11b KO.

      Importantly, MF EPSC amplitudes evoked by the first and the second pulse (Author response image 3b), paired-pulse facilitation (Author response image 3c) and failure rates (Author response image 3d) were all comparable between controls and Bcl11b mutants. These results further corroborate our observations from field recordings that basal synaptic transmission at MFS is unaltered by Bcl11b KO.
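The two per-cell measures named here, paired-pulse ratio and failure rate, can be sketched as below. The sketch assumes that the ratio is taken between mean amplitudes across sweeps and that a sweep counts as a failure when its first-pulse amplitude is below the 5 pA noise threshold given in the legend of Author response image 3. Function names and sweep values are hypothetical.

```python
def paired_pulse_ratio(first_amplitudes, second_amplitudes):
    """Mean second-pulse amplitude divided by mean first-pulse amplitude."""
    mean_first = sum(first_amplitudes) / len(first_amplitudes)
    mean_second = sum(second_amplitudes) / len(second_amplitudes)
    return mean_second / mean_first


def failure_rate(first_amplitudes, threshold_pa=5.0):
    """Percentage of sweeps whose first-pulse amplitude is below threshold."""
    failures = sum(1 for a in first_amplitudes if abs(a) < threshold_pa)
    return 100.0 * failures / len(first_amplitudes)


# Invented absolute EPSC amplitudes (pA) for ten sweeps of one cell.
first_pa = [0.0, 12.0, 3.0, 20.0, 0.0, 25.0, 4.0, 15.0, 27.0, 14.0]
second_pa = [30.0, 40.0, 20.0, 50.0, 10.0, 60.0, 30.0, 40.0, 70.0, 50.0]

ppr = paired_pulse_ratio(first_pa, second_pa)
failures_percent = failure_rate(first_pa)
```

Averaging before dividing keeps failures (amplitudes near zero) from producing undefined per-sweep ratios, which matters at a synapse with low initial release probability.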

      We note that the results from single cell recordings regarding basal synaptic transmission merely confirm the observations from field potential recordings, and that the attempted measurement of RRP size at the single cell level was not successful. Thus, our single-cell data do not add new information about the mechanisms underlying the effects of Bcl11b-deficiency and we therefore decided not to report these data in the manuscript.

      Author response image 3.

      Basal synaptic transmission at mossy fiber-CA3 synapses is unaltered in Bcl11b cKO mice. a Representative average trace (20 sweeps) recorded from CA3 pyramidal cells in control and Bcl11b cKO mice at minimal stimulation conditions, showing EPSCs in response to paired-pulse stimulation (PPS) at an interstimulus interval of 40 ms. The signal is almost entirely blocked by the application of 2 μM DCG-IV (red). b Quantification of MF EPSC amplitudes in response to PPS for both the first and the second pulse. c Ratio between the amplitude of the second over the first EPSC. d Percentage of stimulation events resulting in no detectable EPSCs for the first pulse. Events <5 pA were considered as noise. e Fast decay time constant obtained by fitting the average second EPSC with the following double exponential function: $I(t) = A_{fast} e^{-t/\tau_{fast}} + A_{slow} e^{-t/\tau_{slow}} + C$, where $I$ is the recorded current amplitude after time $t$, $A_{fast}$ and $A_{slow}$ represent fractional current amplitudes decaying with the fast ($\tau_{fast}$) and slow ($\tau_{slow}$) time constant, respectively, and $C$ is the offset. Starting from the peak of the EPSC, the first 200 ms of the decaying trace were used for fitting. f Fractional current amplitude decaying with the fast time constant. g-h Slow decay time constant and fractional current amplitude decaying with the slow time constant. For all figures: Control, 8 cells from 4 mice; Bcl11b cKO, 8 cells from 6 mice. All data are presented as means, error bars indicate SEM. None of the differences shown were found to be statistically significant; Mann-Whitney U-test for non-normally and unpaired t-test for normally distributed data.
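The double exponential fit named in panel e can be illustrated with a small, dependency-free sketch. The fitting routine used by the authors is not described, so the approach below is an assumption chosen for transparency: candidate time-constant pairs are taken from a coarse grid, and for each pair the two amplitudes and the offset are solved by linear least squares. In practice a nonlinear least-squares routine would normally be used. All names, the grid, and the synthetic trace are hypothetical.

```python
import math


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (a[i][3] - tail) / a[i][i]
    return x


def fit_double_exp(times_ms, currents_pa, tau_grid_ms):
    """Fit I(t) = A_fast*exp(-t/tau_fast) + A_slow*exp(-t/tau_slow) + C.

    Every pair tau_fast < tau_slow from the grid is tried; amplitudes and
    offset follow from the normal equations. Returns the best fit as a dict.
    """
    best = None
    grid = sorted(tau_grid_ms)
    for i, tau_fast in enumerate(grid):
        for tau_slow in grid[i + 1:]:
            basis = [(math.exp(-t / tau_fast), math.exp(-t / tau_slow), 1.0)
                     for t in times_ms]
            normal = [[sum(b[r] * b[c] for b in basis) for c in range(3)]
                      for r in range(3)]
            target = [sum(b[r] * y for b, y in zip(basis, currents_pa))
                      for r in range(3)]
            a_fast, a_slow, offset = solve3(normal, target)
            sse = sum((y - (a_fast * b[0] + a_slow * b[1] + offset)) ** 2
                      for b, y in zip(basis, currents_pa))
            if best is None or sse < best["sse"]:
                best = {"sse": sse, "a_fast": a_fast, "tau_fast": tau_fast,
                        "a_slow": a_slow, "tau_slow": tau_slow,
                        "offset": offset}
    return best


# Synthetic decay over the first 200 ms after the peak (invented parameters).
times = [float(t) for t in range(0, 201)]
trace = [-80.0 * math.exp(-t / 5.0) - 20.0 * math.exp(-t / 50.0)
         for t in times]
fit = fit_double_exp(times, trace, [2.0, 5.0, 10.0, 20.0, 50.0, 100.0])
fast_fraction = fit["a_fast"] / (fit["a_fast"] + fit["a_slow"])
```

The fast fraction is the share of the fitted current that decays with the fast time constant, which the response attributes roughly to AMPA receptors, with the remainder attributed to kainate receptors.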

      Reviewer #3 (Public Review):

      Overall, this is a strong manuscript that uses multiple current techniques to provide specific mechanistic insight into prior discoveries of the contributions of the Bcl11b transcription factor to mossy fiber synapses of dentate gyrus granule cells. The authors employ an adult deletion of Bcl11b via Tamoxifen-inducible Cre and use immunohistochemical, electron microscopy, and electrophysiological studies of synaptic plasticity, together with viral rescue of C1ql2, a direct transcriptional target of Bcl11b or Nrxn3, to construct a molecular cascade downstream of Bcl11b for DG mossy fiber synapse development. They find that C1ql2 re-expression in Bcl11b cKOs can rescue the synaptic vesicle docking phenotype and the impairments in MF-LTP of these mutants. They also show that C1ql2 knockdown in DG neurons can phenocopy the vesicle docking and plasticity phenotypes of the Bcl11b cKO. They also use artificial synapse formation assays to suggest that C1ql2 functions together with a specific Nrxn3 splice isoform in mediating MF axon development, extending these data with a C1ql2-K262E mutant that purports to specifically disrupt interactions with Nrxn3. All of the molecules involved in this cascade are disease-associated and this study provides an excellent blueprint for uncovering downstream mediators of transcription factor disruption. Together this makes this work of great interest to the field. Strengths are the sophisticated use of viral replacement and multi-level phenotypic analysis while weaknesses include the linkage of C1ql2 with a specific Nrxn3 splice variant in mediating these effects.

      Here is an appraisal of the main claims and conclusions:

      1) C1ql2 is a downstream target of Bcl11b which mediates the synaptic vesicle recruitment and synaptic plasticity phenotypes seen in these cKOs. This is supported by the clear rescue phenotypes of synapse anatomy (Fig.2) and MF synaptic plasticity (Fig.3). One weakness here is the absence of a control assessing over-expression phenotypes of C1ql2. It's clear from Fig.1D that viral rescue is often greater than WT expression (totally expected). In the case where you are trying to suppress a LoF phenotype, it is important to make sure that enhanced expression of C1ql2 in a WT background does not cause your rescue phenotype. A strong overexpression phenotype in WT would weaken the claim that C1ql2 is the main mediator of the Bcl11b phenotype for MF synapse phenotypes.

      As suggested by reviewer #3, we carried out C1ql2 over-expression experiments in control animals. We show that the over-expression of C1ql2 in the DG of control animals had no effect on the synaptic vesicle organization in the proximity of MFS. This further supports that the observed effect upon rescue of C1ql2 expression in Bcl11b cKOs is due to the physiological function of C1ql2 and not a result of the artificial overexpression. These data are now included in supplementary figure 2g-j and are described in detail in the results part of the revised manuscript. Please also see response to point 3 of reviewer #2.

      2) Knockdown of C1ql2 via 4 shRNAs is sufficient to produce the synaptic vesicle recruitment and MF-LTP phenotypes. This is supported by clear effects in the shRNA-C1ql2 groups as compared to nonsense-EGFP controls. One concern (particularly given the use of 4 distinct shRNAs) is the potential for off-target effects, which is best controlled for by a rescue experiment with RNA insensitive C1ql2 cDNA as opposed to nonsense sequences, which may not elicit the same off-target effects.

      We agree with reviewer #3 that the use of shRNAs could potentially create unexpected off-target effects and that the introduction of a shRNA-insensitive C1ql2 in parallel to the expression of the shRNA cassette would be a very effective control experiment. However, the suggested experiment would require an additional 6 months (2 months for AAV production, 2-3 months from animal injection to sacrifice and 1-2 months for EM imaging/analysis and LTP measurements) and a high number of additional animals (minimum 8 for EM and 8 for LTP measurements). We note here that before the production of the shRNA-C1ql2 and the shRNA-NS, the individual sequences were systematically checked for off-target binding on the murine exome with up to two mismatches and showed no target other than the intended one (C1ql2 for shRNA-C1ql2 and no target for shRNA-NS). Taking into consideration our in-silico analysis, we feel that the interpretation of our findings is valid without this (very reasonable) additional control experiment.
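The kind of in-silico check described here, a search for sites within two mismatches of an shRNA sequence, can be sketched as a sliding-window comparison. The tool and parameters actually used are not stated in the response, so this is only an illustration of the principle. It scans a single strand, ignores reverse complements and seed-region weighting, and uses invented 10-nt sequences in place of real shRNA targets and exome transcripts.

```python
def mismatch_hits(query, transcripts, max_mismatches=2):
    """List (name, start, mismatches) for windows close to the query.

    transcripts maps a transcript name to its sequence. A window is a hit
    when it differs from the query at no more than max_mismatches positions.
    """
    hits = []
    width = len(query)
    for name, sequence in transcripts.items():
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            mismatches = sum(1 for q, s in zip(query, window) if q != s)
            if mismatches <= max_mismatches:
                hits.append((name, start, mismatches))
    return hits


# Hypothetical sequences; real shRNA targets are longer (about 19-21 nt).
query_seq = "GATTACAGGC"
toy_transcripts = {
    "intended_target": "CCGATTACAGGCTT",   # perfect site at position 2
    "near_match": "AAGATTTCAGCCAA",        # site with two mismatches
    "unrelated": "GATTTCAGCG",             # three mismatches, not reported
}
hits = mismatch_hits(query_seq, toy_transcripts)
```

A sequence passes such a screen when the only reported hits lie in the intended transcript; here the toy `near_match` entry would be flagged for follow-up.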

      3) C1ql2 interacts with Nrxn3(25b+) to facilitate MF terminal SV clustering. This claim is theoretically supported by the HEK cell artificial synapse formation assay (Fig.5), the inability of the K262-C1ql2 mutation to rescue the Bcl11b phenotype (Fig.6), and the altered localization of C1ql2 in the Nrxn1-3 deletion mice (Fig.7). Each of these lines of experimental evidence has caveats that should be acknowledged and addressed. Given the hypothesis that C1ql2 and Nrxn3b(25b) are expressed in DG neurons and work together, the heterologous co-culture experiment seems strange. Up till now, the authors are looking at pre-synaptic function of C1ql2 since they are re-expressing it in DGNs. The phenotypes they are seeing are also pre-synaptic and/or consistent with pre-synaptic dysfunction. In Fig.5, they are testing whether C1ql2 can induce pre-synaptic differentiation in trans, i.e. theoretically being released from the 293 cells "post-synaptically". But the post-synaptic ligands (Nlgn1 and GluKs) are not present in the 293 cells, so a heterologous synapse assay doesn't really make sense here. The effect that the authors are seeing likely reflects the fact that C1ql2 and Nrxn3 do bind to each other, so C1ql2 is acting as an artificial post-synaptic ligand, in that it can cluster Nrxn3 which in turn clusters synaptic vesicles. But this does not test the model that the authors propose (i.e. C1ql2 and Nrxn3 are both expressed in MF terminals). Perhaps a heterologous assay where GluK2 is put into HEK cells and the C1ql2 and Nrxn3 are simultaneously or individually manipulated in DG neurons?

      C1ql2 is expressed by DG neurons and is then secreted into the MFS synaptic cleft, while Nrxn3, which is also expressed by DG neurons, is anchored at the presynaptic side. In our work we used the well-established co-culture assay and cultured HEK293 cells secreting C1ql2 (an IgK secretion sequence was inserted at the N-terminus of C1ql2) together with hippocampal neurons expressing Nrxn3(25b+). We used the HEK293 cells as a delivery system of secreted C1ql2 to the neurons to create regions of high C1ql2 concentration. By interfering with the C1ql2-Nrxn3 interaction in this system, either by expression of the non-binding mutant C1ql2 variant in the HEK cells or by manipulating Nrxn expression in the neurons, we could show that C1ql2 binding to Nrxn3(25b+) is necessary for the accumulation of vGlut1. However, we did not examine and do not claim within our manuscript that the interaction between C1ql2 and Nrxn3(25b+) induces presynaptic differentiation. Our experiment only aimed to analyze the ability of C1ql2 to cluster SV through interaction with Nrxn3. Moreover, by not expressing potential postsynaptic interaction partners of C1ql2 in our system, we could show that C1ql2 controls SV recruitment through a purely presynaptic mechanism. Co-culturing GluK2-expressing HEK cells with simultaneous manipulation of C1ql2 and/or Nrxn3 in neurons would not allow us to appropriately answer our scientific question, but would rather focus on the potential synaptogenic function of the Nrxn3/C1ql2/GluK2 complex and the role of the postsynaptic ligand in it. Thus, we feel that the proposed experiment, while very interesting for characterizing additional putative functions of C1ql2, may not provide additional information for the point we were addressing. In the revised manuscript we tried to make the aim and methodological approach of this set of experiments clearer.

      4) K262-C1ql2 mutation blocks the normal rescue through a Nrxn3(25b) mechanism (Fig.6). The strength of this experiment rests upon the specificity of this mutation for disrupting Nrxn3b binding (presynaptic) as opposed to any of the known postsynaptic C1ql2 ligands such as GluK2. While this is not relevant for interpreting the heterologous assay (Fig.5), it is relevant for the in vivo phenotypes in Fig.6. Similar approaches as employed in this paper can test whether binding to other known postsynaptic targets is altered by this point mutation.

      It has been previously shown that C1ql2 together with C1ql3 recruit postsynaptic GluK2 at the MFS. However, loss of just C1ql2 did not affect the recruitment of GluK2, which was disrupted only upon loss of both C1ql2 and C1ql3 (Matsuda et al., 2018). In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 can recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (KARs and BAI3; Fig.5; please also see response above). Furthermore, we have now performed a kinetic analysis on single-cell data which we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b KO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling is altered upon the loss of C1ql2 following Bcl11b cKO (Author response image 3e-h; please also see our response to reviewer #2 point 8). Thus, we have no experimental evidence supporting the idea that a loss of interaction between C1ql2.K262E and GluK2 would interfere with the examined phenotype. However, to exclude that the K262E mutation disrupts interaction between C1ql2 and GluK2, we performed co-immunoprecipitation from protein lysate of HEK293 cells expressing GluK2-myc-flag and GFP-C1ql2 or GluK2-myc-flag and GFP-K262E and could show that both C1ql2 and K262E had GluK2 bound when precipitated. These data are included in supplementary figure 5k of the revised manuscript.

      5) Altered localization of C1ql2 in Nrxn1-3 cKOs. These data are presented to suggest that Nrx3(25b) is important for localizing C1ql2 to the SL of CA3. Weaknesses of this data include both the lack of Nrxn specificity in the triple a/b KOs as well as the profound effects of Nrxn LoF on the total levels of C1ql2 protein. Some measure that isn't biased by this large difference in C1ql2 levels should be attempted (something like in Fig.1F).

      We acknowledge that the lack of specificity in the Nrxn123 model makes it difficult to interpret our data. We have now examined the mRNA levels of Nrxn1 and Nrxn2 upon stereotaxic injection of Cre in the DG of Nrxn123flox/flox animals and found that Nrxn1 was only mildly reduced. At the same time Nrxn2 showed a tendency for reduction that was not significant (data included in supplementary figure 6a of revised manuscript). Only Nrxn3 expression was strongly suppressed. Of course, this does not exclude that the mild reduction of Nrxn1 and Nrxn2 interferes with the C1ql2 localization at the MFS. We further examined the mRNA levels of C1ql2 in control and Nrxn123 mutants to ensure that the observed changes in C1ql2 protein levels at the MFS are not due to reduced mRNA expression and found no changes (data are included in supplementary figure 6b of the revised manuscript), suggesting that overall protein C1ql2 expression is normal.

      The reduced C1ql2 fluorescence intensity at the MFS was first observed when non-binding C1ql2 variant K262E was introduced to Bcl11b cKO mice that lack endogenous C1ql2 (Fig.6). In these experiments, we found that despite the overall high protein levels of C1ql2.K262E in the hippocampus (Fig. 6c), its fluorescence intensity at the SL was significantly reduced compared to WT C1ql2 (Fig. 6d-e). The remaining signal of the C1ql2.K262E at the SL was equally distributed and in a punctate form, similar to WT C1ql2. Together, this suggests that loss of C1ql2-Nrxn3 interaction interferes with the localization of C1ql2 at the MFS, but not with the expression of C1ql2. Of course, this does not exclude that other mechanisms are involved in the synaptic localization of C1ql2, beyond the interaction with Nrxn3, as both the mutant C1ql2 in Bcl11b cKO and the endogenous C1ql2 in Nrxn123 cKOs show residual immunofluorescence at the SL. Further studies are required to determine how C1ql2-Nrxn3 interaction regulates C1ql2 localization at the MFS.

      Reviewer #1 (Recommendations For The Authors):

      In addition to addressing the comments below, this study would benefit significantly from providing insight and discussion into the relevant potential postsynaptic signaling components controlled exclusively by C1ql2 (postsynaptic kainate receptors and the BAI family of proteins).

      We have now performed a kinetic analysis on single-cell data that we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b cKO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling is altered upon the loss of C1ql2 following Bcl11b cKO (Author response image 3e-h; please also see our response to Reviewer #2 point 8). This agrees with previous findings that C1ql2 regulates postsynaptic GluK2 recruitment together with C1ql3 and only loss of both C1ql2 and C1ql3 results in a disruption of KAR signaling (Matsuda et al., 2018). In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 can recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (KARs and BAI3; Fig.5; please also see our response to reviewer #3 point 4 above). We believe that further studies are needed to fully understand both the pre- and the postsynaptic functions of C1ql2. Because the focus of this manuscript was on the role of the C1ql2-Nrxn3 interaction and our investigation of postsynaptic functions of C1ql2 was incomplete, we did not include our findings on postsynaptic current kinetics in our revised manuscript. However, we expanded the discussion of the known postsynaptic partners of C1ql2 in the revised manuscript to improve the interpretability of our results.

      Major Comments:

      The authors demonstrate that the ultrastructural properties of presynaptic boutons are altered after Bcl11b KO and C1ql2 KD. However, whether C1ql2 functions as part of a tripartite complex and the identity of the postsynaptic receptor (BAI, KAR) should be examined.

      Matsuda and colleagues have nicely demonstrated in their 2016 (Neuron) study that C1ql2 is part of a tripartite complex with presynaptic Nrxn3 and postsynaptic KARs. Moreover, they demonstrated that C1ql2, together with C1ql3, recruit postsynaptic KARs at the MFS, while the KO of just C1ql2 did not affect the KAR localization. In our study we demonstrate a purely presynaptic function of C1ql2 through Nrxn3 in the synaptic vesicle recruitment. This function is independent of C1ql3, as C1ql3 expression is unchanged in all of our models and its over-expression did not compensate for C1ql2 functions (Fig. 2, 3a-c). Our in vitro experiments also reveal that C1ql2 is able to recruit both Nrxn3 and vGlut1 in the absence of any known postsynaptic C1ql2 partner (Fig. 5; please also see our response to reviewer #3 point 4 above). Moreover, we were able to show that the SV recruitment depends on C1ql2 interaction with Nrxn3 through the expression of a non-binding C1ql2 (Fig. 6) that retains the ability to interact with GluK2 (supplementary figure 5k of revised manuscript) or by KO of Nrxns (Fig. 7). Furthermore, we have now performed a kinetic analysis on single-cell data which we had previously collected to evaluate postsynaptic AMPA and kainate receptor responses in both the control and Bcl11b cKO. Our analysis reveals no significant differences in postsynaptic current kinetics, making it unlikely that AMPA and kainate receptor signaling differ between controls and Bcl11b mutants (Author response image 3e-h; please also see our response to Reviewer #2 question 8). Together, we have no experimental evidence so far that would support that the postsynaptic partners of C1ql2 are involved in the observed phenotype. While it would be very interesting to characterize the postsynaptic partners of C1ql2 in depth, we feel this would be beyond the scope of the present study.

      Figure 1f: For a more comprehensive understanding of the Bcl11b KO phenotype and the potential role for C1ql2 on MF synapse number, a complete quantification of vGlut1 and Homer1 for all conditions (Supplement Figure 2e) should be included in the main text.

      In our study we focused on the role of C1ql2 in the structural and functional integrity of the MFS downstream of Bcl11b. Bcl11b ablation leads to several phenotypes in the MFS that have been thoroughly described in our previous study (De Bruyckere et al., 2018). As expected, re-expression of C1ql2 only partially rescued these phenotypes, with full recovery of the SV recruitment (Fig. 2) and of the LTP (Fig. 3), but had no effect on the reduced numbers of MFS nor the structural complexity of the MFB created by the Bcl11b KO (supplementary figure 2d-f of revised manuscript). We understand that including the quantification of vGlut1 and Homer1 co-localization in the main figures would help with a better understanding of the Bcl11b mutant phenotype. However, in our manuscript we investigate C1ql2 as an effector of Bcl11b and thus we focus on its functions in SV recruitment and LTP. As we did not find a link between C1ql2 and the number of MFS/MFB upon re-expression of C1ql2 in Bcl11b cKO or now also in C1ql2 KD (see response to comment #4 below), we believe it is more suitable to present these data in the supplement.

      Figure 3/4: Given the striking reduction in the numbers of synapses (Supplement Figure 2e) and docked vesicles (Figure 2d) in the Bcl11b KO and C1ql2 KD (Figure 4e-f), it is extremely surprising that basal synaptic transmission is unaffected (Supplement Figure 2g). The authors should determine the EPSP input-output relationship following C1ql2 KD and measure EPSPs following trains of stimuli at various high frequencies.

      We fully acknowledge that this is an unexpected result. It is, however, well feasible that the modest displacement of SV fails to noticeably influence basal synaptic transmission. This would be the case, for example, if only a low number of vesicles are released by single stimuli, in line with the very low initial Pr at MFS. In contrast, the reduction in synapse numbers in the Bcl11b mutant might indeed be expected to reflect in the input-output relationship. It is possible, however, that synapses that are preferentially eliminated in Bcl11b cKO are predominantly silent or have weak coupling strength, such that their loss has only a minimal effect on basal synaptic transmission. Finally, we cannot exclude compensatory mechanisms (homeostatic plasticity) at the remaining synapses. A detailed analysis of these potential mechanisms would be a whole project in its own right.

      As additional information, we can say that the largely unchanged input-output-relation in Bcl11b cKO is also present in the single-cell level data (Author response image 3; details on single-cell experiments are described in the response to Reviewer #2 point 8).

      As suggested by the reviewer, we have now additionally analyzed the input-output relationship following C1ql2 KD and again did not observe any significant difference between control and KD animals. We have incorporated the respective input-output curves into the revised manuscript under Supplementary figure 3c-d.

      Figure 4: Does C1ql2 shRNA also reduce the number of MFBs? This should be tested to further identify C1ql2-dependent and independent functions.

      As requested by reviewer #1 we quantified the number of MFBs upon C1ql2 KD. We show that C1ql2 KD in WT animals does not alter the number of MFBs. The data are presented in supplementary figure 4d of the revised manuscript. Re-expression of C1ql2 in Bcl11b cKO did not rescue the loss of MFS created by the Bcl11b mutation. Moreover, C1ql2 re-expression did not rescue the complexity of the MFB ultrastructure perturbed by the Bcl11b ablation. Together, this suggests that Bcl11b regulates MFS maintenance through additional C1ql2-independent pathways. In our previously published work (De Bruyckere et al., 2018) we identified and discussed in detail several candidate genes such as Sema5b, Ptgs2, Pdyn and Penk as putative effectors of Bcl11b in the structural and functional integrity of MFS (please also see response to reviewer #2- point 1 of public reviews).

      Figure 5: Clarification is required regarding the experimental design of the HEK/Neuron co-culture: 1. C1ql2 is a secreted soluble protein - how is the protein anchored to the HEK cell membrane to recruit Nrxn3(25b+) binding and, subsequently, vGlut1?

      C1ql2 was secreted by the HEK293 cells through an IgK signaling peptide at the N-terminus of C1ql2. The high concentration of C1ql2 close to the secretion site together with the sparse coculturing of the HEK293 cells on the neurons allows for the quantification of accumulation of neuronal proteins. We have now described the experimental conditions in greater detail in the main text module of the revised manuscript.

      2) Why are the neurons transfected and not infected? Transfection efficiency of neurons with lipofectamine is usually poor (1-5%; Karra et al., 2010), while infection of neurons with lentiviruses or AAVs encoding cDNAs routinely are >90% efficient. Thus, interpretation of the recruitment assays may be influenced by the density of neurons transfected near a HEK cell.

      We agree with reviewer #1 that viral infection of the neurons would have been a more effective way of expressing our constructs. However, due to safety allowances in the used facility and time limitation at the time of conception of this set of experiments, a lipofectamine transfection was chosen.

      However, as all of our examined groups were handled in the same way and multiple cells from three independent experiments were examined for each experimental set, we believe that possible biases introduced by the transfection efficiency have been eliminated and thus have trust in our interpretation of these results.

      3) Surface labeling of HEK cells for wild-type C1ql2 and K262 C1ql2 would be helpful to assess the trafficking of the mutant.

      We recognize that potential changes to the trafficking of C1ql2 caused by the K262E mutation would be important to characterize, in light of the reduced localization of the mutant protein at the SL in the in vivo experiments (Fig. 6e). In our culture system, C1ql2 and K262E were secreted by the HEK cells through insertion of an IgK signaling peptide at the N-terminus of the myc-tagged C1ql2/K262E. Thus, trafficking analysis on this system would not be informative, as the system is highly artificial compared to the in vivo model. Further studies are needed to characterize C1ql2 trafficking in neurons to understand how C1ql2-Nrxn3 interaction regulates the localization of C1ql2. However, labeling of the myc-tag in C1ql2 or K262E expressing HEK cells of the co-culture model reveals a similar signal for the two proteins (Fig. 5a,c). Nrxn-null mutation in neurons co-cultured with C1ql2-expressing HEK cells disrupted C1ql2 mediated vGlut1 accumulation in the neurons. Selective expression of Nrxn3(25b) in the Nrxn-null neurons restored vGlut1 clustering (Fig. 5e-f). Together, these data suggest that it is the interaction between C1ql2 and Nrxn3 that drives the accumulation of vGlut1.

      Figure 6: Bcl11b KO should also be included in 6f-h.

      As suggested by reviewer #1, we included the Bcl11b cKO in figures 6f-h and in corresponding supplementary figures 5c-j.

      Figure 7b: What is the abundance of mRNA for Nrxn1 and Nrxn2 as well as the abundance of Nrxns after EGFP-Cre injection into DG?

      We addressed this point raised by reviewer #1 by quantifying the relative mRNA levels of Nrxn1 and Nrxn2 via qPCR upon stereotaxic injection of EGFP-Cre in the DG of Nrxn123flox/flox animals. We found that Nrxn1 was only mildly reduced, while Nrxn2 showed a tendency for reduction that was not significant. The data are presented in supplementary figure 6a of the revised manuscript.

      Minor Comments for readability:

      Synapse score is referred to frequently in the text and should be defined within the text for clarification.

      'n' numbers should be better defined in the figure legends. For example, for protein expression analysis in 1c, n=3. Is this a biological or technical triplicate? For electrophysiology (e.g. 3c), does "n=7" reflect the number of animals or the number of slices? n/N (slices/animals) should be presented.

      Figure 7a: Should the diagrams of the cre viruses be EGFP-Inactive or active Cre and not CRE-EGFP as shown in the diagram?

      Figure 7b: the region used for the inset should be identified in the larger image.

      All minor points have been fixed in the revised manuscript according to the suggestions.

      Reviewer #3 (Recommendations For The Authors):

      -Please describe the 'synapse score' somewhere in the text - it is too prominently featured to not have a clear description of what it is.

      The description of the synapse score has been included in the main text module of the revised manuscript.

      -The claim that Bcl11b controls SV recruitment "specifically" through C1ql2 is a bit stronger than is warranted by the data. Particularly given that C1ql2 is expressed at 2.5X control levels in their rescue experiments. See pt.2

      Please see response to reviewer #3 point 1 of public reviews. To address this, we over-expressed C1ql2 in control animals and found no changes in the synaptic vesicle distribution (supplementary figure 2g-j of revised manuscript). This supports that the observed rescue of synaptic vesicle recruitment by re-expression of C1ql2 is due to its physiological function and not due to the artificially elevated protein levels. Of course, we cannot exclude the possibility that other, C1ql2-independent, mechanisms also contribute to the SV recruitment downstream of Bcl11b. Our data from the C1ql2 rescue, C1ql2 KD, the in vitro experiments and the interruption of C1ql2-Nrxn3 in vivo, strongly suggest C1ql2 to be an important regulator of SV recruitment.

      -Does Bcl11b regulate Nrxn3 expression? Considering the apparent loss of C1ql2 expression in the Nrxn KO mice, this is an important detail.

      We agree with reviewer #3 that this is an important point. We have previously done differential transcriptomics from DG neurons of Bcl11b cKOs compared to controls and did not find Nrxn3 among the differentially expressed genes. To further validate this, we now quantified the Nrxn3 mRNA levels via qPCR in Bcl11b cKOs compared to controls and found no differences. These data are included in supplementary figure 5a of the revised manuscript.

      -It appears that C1ql2 expression is much lower in the Nrxn123 KO mice. Since the authors are trying to test whether Nrxn3 is required for the correct targeting of C1ql2, this is a confounding factor. We can't really tell if what we are seeing is a "mistargeting" of C1ql2, loss of expression, or both. If the authors did a similar analysis to what they did in Figure 1 where they looked at the synaptic localization of C1ql2 (and quantified it) that could provide more evidence to support or refute the "mistargeting" claim.

      Please also see response to reviewer #3 point 5 of public reviews. To exclude that reduction of fluorescence intensity of C1ql2 at the SL in Nrxn123 KO mice is due to loss of C1ql2 expression, we examined the mRNA levels of C1ql2 in control and Nrxn123 mutants and found no changes (data are included in supplementary figure 6b of the revised manuscript), suggesting that C1ql2 gene expression is normal. The reduced C1ql2 fluorescence intensity at the MFS was first observed when non-binding C1ql2 variant K262E was introduced to Bcl11b cKO mice that lack endogenous C1ql2 (Fig.6). In these experiments, we found that despite the overall high protein levels of C1ql2.K262E in the hippocampus (Fig. 6c), its fluorescence intensity at the SL was significantly reduced compared to WT C1ql2 (Fig. 6d-e). The remaining C1ql2.K262E signal in the SL was equally distributed and in a punctate form, similar to WT C1ql2. Together, this indicates that the loss of C1ql2-Nrxn3 interaction interferes with the localization of C1ql2 along the MFS, but not with expression of C1ql2. Of course, this does not exclude that additional mechanisms regulate C1ql2 localization at the synapse, as both the mutant C1ql2 in Bcl11b cKO and the endogenous C1ql2 in Nrxn123 cKO show residual immunofluorescence at the SL.

      We note here that we have not previously quantified the co-localization of C1ql2 with individual synapses. C1ql2 is a secreted molecule that localizes at the MFS synaptic cleft. However, not much is known about the number of MFS that are positive for C1ql2 nor about the mechanisms regulating C1ql2 targeting, transport, and secretion to the MFS. Whether C1ql2 interaction with Nrxn3 is necessary for the protection of C1ql2 from degradation, its surface presentation and transport or stabilization to the synapse is currently unclear. Upon revision of our manuscript, we realized that we might have overstated this particular finding and have now rephrased the specific parts within the results to appropriately describe the observation and have also included a sentence in the discussion referring to the lack of understanding of the mechanism behind this observation.

      -Title of Figure S5 is "Nrxn KO perturbs C1ql2 localization and SV recruitment at the MFS", but there is no data on C1ql2 localization.

      This issue has been fixed in the revised manuscript.

      -S5 should be labeled more clearly than just Cre+/-

      This issue has been fixed in the revised manuscript.

      References

      Castillo, P.E., Malenka, R.C., Nicoll, R.A., 1997. Kainate receptors mediate a slow postsynaptic current in hippocampal CA3 neurons. Nature 388, 182–186. https://doi.org/10.1038/40645

      De Bruyckere, E., Simon, R., Nestel, S., Heimrich, B., Kätzel, D., Egorov, A.V., Liu, P., Jenkins, N.A., Copeland, N.G., Schwegler, H., Draguhn, A., Britsch, S., 2018. Stability and Function of Hippocampal Mossy Fiber Synapses Depend on Bcl11b/Ctip2. Front. Mol. Neurosci. 11. https://doi.org/10.3389/fnmol.2018.00103

      Kaeser, P.S., Regehr, W.G., 2017. The readily releasable pool of synaptic vesicles. Curr. Opin. Neurobiol. 43, 63–70. https://doi.org/10.1016/j.conb.2016.12.012

      Lerma, J., 2003. Roles and rules of kainate receptors in synaptic transmission. Nat. Rev. Neurosci. 4, 481–495. https://doi.org/10.1038/nrn1118

      Orlando, M., Dvorzhak, A., Bruentgens, F., Maglione, M., Rost, B.R., Sigrist, S.J., Breustedt, J., Schmitz, D., 2021. Recruitment of release sites underlies chemical presynaptic potentiation at hippocampal mossy fiber boutons. PLoS Biol. 19, e3001149. https://doi.org/10.1371/journal.pbio.3001149

      Vandael, D., Borges-Merjane, C., Zhang, X., Jonas, P., 2020. Short-Term Plasticity at Hippocampal Mossy Fiber Synapses Is Induced by Natural Activity Patterns and Associated with Vesicle Pool Engram Formation. Neuron 107, 509-521.e7. https://doi.org/10.1016/j.neuron.2020.05.013

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their thoughtful comments on the manuscript and to the editors for their assessment.

      We thank the reviewers for their positive feedback and appreciate that they consider our method a valid addition to previously established systems for generating recombinant RNA viruses.

      To strengthen this point, we have now included additional validation by the rescue of recombinant Chikungunya and Dengue virus from viral RNA directly, using the CLEVER protocol. This strengthens the potential of this method as a reverse genetics platform for positive-stranded viruses in general.

      The supportive data has been amended in the Results section, taken into account in Materials and Methods, and the corresponding supplementary figure (Figure S4) has been added.

      One key point raised by one of the reviewers, a comparison with different systems, could not be addressed in this manuscript as our lab does not perform BAC cloning at all. We currently do not have the necessary expertise to conduct an unbiased side-by-side comparison.

      All other comments were addressed in detail, either by including additional data or through specific clarification in the revised text. We are grateful for the careful review and constructive criticisms raised by the reviewers and feel that the corrections and additions have significantly improved the manuscript.

      We have revised the latest version posted May 30, 2023 on bioRxiv (https://doi.org/10.1101/2023.05.11.540343).

      Reviewer #1:

      Public Review:

      In this manuscript, Kipfer et al describe a method for a fast and accurate SARS-CoV2 rescue and mutagenesis. This work is based on a published method termed ISA (infectious subgenomic amplicons), in which partially overlapping DNA fragments covering the entire viral genome and additional 5' and 3' sequences are transfected into mammalian cell lines. These DNA fragments recombine in the cells, express the full length viral genomic RNA and launch replication and rescue of infectious virus.

      CLEVER, the method described here significantly improves on the ISA method to generate infectious SARS-CoV2, making it widely useful to the virology community.

      Specifically, the strengths of this method are:

      1) The successful use of various cell lines and transfection methods.

      2) Generation of a four-fragment system, which significantly improves the method efficiency due to lower number of required recombination events.

      3) Flexibility in choice of overlapping sequences, making this system more versatile.

      4) The authors demonstrated how this system can be used to introduce point mutations as well as insertion of a tag and deletion of a viral gene.

      5) Fast-tracking generation of infectious virus directly from RNA of clinical isolates by RT-PCR, without the need for cloning the fragments or using synthetic sequences.

      One weakness of the latter point, which is also pointed out by the authors, is that the direct rescue of clinical isolates was not tested for sequence fidelity.

      The manuscript clearly presents the findings, and the proof-of-concept experiments are well designed.

      Overall, this is a very useful method for SARS-CoV2 research. Importantly, it can be applicable to many other viruses, speeding up the response to newly emerging viruses that threaten the public health.

      We thank the reviewer for this positive feedback and the summary of the main points. Nevertheless, we would like to comment on point 5): “the direct rescue of clinical isolates was not tested for sequence fidelity”

      This impression by the reviewer suggests that the data was not sufficient on this point. However, the sequence fidelity after direct rescue from RNA was indeed tested in this study, even on a clonal level (please see: Table S2, or raw NGS data SRX20303605 - SRX20303607). For higher clarity, we added the following sentence to the manuscript: “Indeed, a slight increase of unintentional mutations was observed when sequencing clonal virus populations rescued from RNA directly”.

      Recommendations for the authors:

      Minor Points:

      1) On page 8, the authors write: "levels correlated very well with the viral phenotype". This sentence is not clear. Please clarify what you mean by "viral phenotype". Do you mean CPE on Vero cells?

      We corrected the sentence to: “(…) staining intensity and patterns correlated very well with the wild-type phenotype.”

      2) Page 9: "sequences were analyzed with a cut-off of 10%". Cutoff of what? Please clarify.

      The sentence was rephrased to: “(…)mutations with a relative abundance of >10% in the entire virus population were analyzed”

      3) Page 15: The authors refer to the time required for completion of each step of the process. It would be helpful and informative for the readers to include a panel in figure 4, visualizing the timelines.

      We included a timeline in Figure 4, Panel A.

      4) Materials and methods, first paragraph: Please specify which human samples were collected. Do the authors refer to clinical virus isolates?

      We added the following information to the Materials and Methods section: “Human serum samples for neutralization assays were collected from SARS-CoV-2 vaccinated anonymous donors (…)”

      Clinical virus isolates (Material and Methods; Virus) were used for control experiments, neutralization assays, or as templates for RT-PCR.

      5) Supplementary figure 4A: The color scheme makes it hard to differentiate between the BA.1 and BA.5 fragments. Please choose colors that are not as similar to each other.

      Colors were adapted for better distinction.

      Reviewer #2:

      Public Review:

      The authors of the manuscript have developed and used a cloning-free method. It is not entirely novel (rather, it is based on the previously described ISA method), but it is clearly efficient and a useful complement to already existing methods. One of the strong points of the approach used by the authors is that it is very versatile, i.e. it can be used in combination with already existing methods and tools. I find this important, as many laboratories have already established their favorite methods to manipulate the SARS-CoV-2 genome and are probably unwilling to change their approach entirely. Though the authors highlight the benefits of their method, these are probably not absolute: other methods may be as efficient or as fast. Still, I find myself thinking that for certain purposes I would like to complement my current approach with elements from the authors' CLEVER method.

      The work does not contain much novel biological data, which is expected for a paper dedicated to the development of a new method (or to improving an existing one). This may be something of a shortcoming, as it is commonly expected that authors who have developed a new method apply it to discover something novel. The work stops at the step of rescuing the viruses and confirming their biological properties. This part is done very well and represents a strength of the study. The properties of the rescued viruses were also studied using NGS methods, which revealed the high accuracy of the method used. This is very important, as the method relies on PCR, which is known to generate random mistakes and is therefore not always the method of choice.

      What I found missing is a real head-to-head comparison of the developed system with existing alternatives, preferably some PCR-free standard methods such as the use of BAC clones. There are a lot of comparisons, but they are not direct; data from different studies have simply been compared. The authors could also be more open in discussing the limitations of the method. One of these seems to be the rather low rescue efficiency: 1 rescue event per 11,000 transfected cells. This is much lower than for infectious plasmids (about 1 event per 100 cells or so) and infectious RNAs (often 1 event per 10 cells; for smaller genomes most of the transfected cells become infected). This makes the CLEVER method poorly suited to the generation of large infectious virus libraries and excludes its use for studies of mutant viruses that harbor strongly attenuating mutations. Many such mutations may reduce virus genome infectivity by 3-4 orders of magnitude; with current efficiencies the use of the CLEVER approach may result in false conclusions (mutant viruses will be classified as non-viable while in reality they are just strongly attenuated).
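      The arithmetic behind this concern can be sketched as follows. This is only an illustration, not an analysis from the manuscript: it assumes that rescue events are independent and rare (so that the number of events per transfection is approximately Poisson distributed), that an attenuating mutation simply divides the per-cell rescue rate by a fixed factor, and that a hypothetical one million cells are transfected. Only the three per-cell efficiencies are taken from the text above.

```python
import math


def p_at_least_one_rescue(cells_transfected, events_per_cell, attenuation_fold=1.0):
    """Probability of observing >= 1 rescue event in one transfection.

    Assumes independent, rare events (Poisson approximation); the
    attenuation factor is modelled as a simple division of the rate.
    """
    expected_events = cells_transfected * events_per_cell / attenuation_fold
    return 1.0 - math.exp(-expected_events)


# Per-cell efficiencies quoted in the review text.
EFFICIENCY = {
    "CLEVER": 1 / 11_000,
    "infectious plasmid": 1 / 100,
    "infectious RNA": 1 / 10,
}

if __name__ == "__main__":
    cells = 1_000_000  # hypothetical number of transfected cells
    for method, rate in EFFICIENCY.items():
        for fold in (1, 1_000, 10_000):  # 0, 3 and 4 orders of magnitude
            p = p_at_least_one_rescue(cells, rate, fold)
            print(f"{method:>18}  attenuation x{fold:<6}  P(rescue) = {p:.3f}")
```

      Under these assumptions, the probability of rescue for a strongly attenuated mutant depends heavily on how many cells are transfected, which is why a failed rescue alone would not distinguish a non-viable mutant from a strongly attenuated one.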

      We thank reviewer 2 for the careful review of our work and the valuable feedback. We agree that a direct comparison with other (PCR-free) methods such as BAC cloning, could be useful for demonstrating the unique benefits of the CLEVER method. However, as our laboratory does not use any BAC or YAC cloning methods, we could not ensure an unbiased side-by-side comparison using different techniques.

      We would like to highlight the avoidance of any yeast/bacterial cloning steps that render the CLEVER protocol significantly faster and easier to handle. A visualization of the key steps that could be skipped using CLEVER in comparison to common reverse genetics methods is given in Figure 6.

      Further, we firmly believe that the benefits of the CLEVER method become especially apparent for large viral genomes such as the one of SARS-CoV-2, where assembly, genome amplification and sequence verification of plasmid DNA are highly inefficient and more time-consuming than for small viruses like DENV, CHIKV or HIV.

      We agree with the reviewer that the overall transfection and recombination efficiencies observed with CLEVER seemed rather low. Although data on transfection/rescue efficiency is known for many techniques and viruses, we did not find any published data on the reconstitution of SARS-CoV-2 or viruses with similar genome sizes. Therefore, a useful comparator for our observations in relation to other techniques is currently simply missing. We therefore emphasize that the efficiencies of CLEVER were achieved with one of the largest plus-stranded RNA virus genomes, and our data can’t be directly compared to transfection efficiencies of short infectious RNAs.

      On the contrary, it was rather interesting to observe the very high rescue efficiency of infectious virus progeny. During the two years of establishing and validating the CLEVER protocol, we reached success rates for the genome reconstitution after transfection of >95 %. This was even obtained with highly attenuated mutants including rCoV2∆ORF3678 (joint deletion of ORF3a, ORF6, ORF7a, and ORF8) (Liu et al., 2022)(see Author response image 1). We amended this data in response to the reviewers’ comment and as an example of the successful rescue of an attenuated virus from five overlapping genome fragments (fragments A, B, C, D1, and D2∆ORF3678).

      The latter data were not added to the main manuscript since in this case the deletions were introduced using a different method: from the plasmid-based DNA fragment D2∆ORF3678 and not directly from PCR-based mutagenesis.

      Further, CLEVER was used for related substantial manipulations, including the complete deletion of the Envelope gene (E) which led to the creation of a single-cycle virus that may serve as a live, replication-incompetent vaccine candidate (Lett et al., 2023).

      Author response image 1.

      rCoV2∆ORF3678. Detection of intracellular SARS-CoV-2 nucleocapsid protein (N, green) and nuclei (Hoechst, blue) in Vero E6TMPRSS2 cells infected with rCoV2∆ORF3678 by immunocytochemistry. Scalebar is 200 µm in overview and 50 µm in ROI images.

      Recommendations for the authors:

      The work is nicely presented and the method the authors have developed is clearly valuable. As indicated in the Public review section, the work would benefit from a direct comparison of CLEVER with infectious plasmid (or RNA) based methods; a direct comparison of data would be more convincing than an indirect one. The authors should also discuss possible limitations of the method, as this is helpful for the reader.

      We were not able to perform a direct comparison of CLEVER with other methods (see our statement above).

      We added the following section to the discussion: “Along with the advantages of the CLEVER protocol, limitations must be considered: Interestingly, virus was never rescued after transfecting Vero E6 cells, as has been observed previously (Mélade et al., 2022). Whether this is due to low transfection efficiency or the cell’s inability to recombine remains to be elucidated. Other cell lines not tested within this study will have to be tested for efficient recombination and virus production first. Further, the high sequence integrity of rescued virus is highly dependent on the fidelity of the DNA polymerase used for amplification. The use of other enzymes might negatively influence the sequence integrity of recombinant virus, as it has been observed for the direct rescue from viral RNA using a commercially available one-step RT-PCR kit. Another limitation when performing direct mutagenesis is the synthesis of long oligos to create an overlapping region. Repetitive sequences, for example, can impair synthesis, and self-annealing and hairpin formation increase with prolonged oligos.”

      Some technical corrections of the text would be beneficial. In all parts of the text, terms applicable only to DNA or only to RNA are mixed, which creates some confusion. For example, the authors state that "the human cytomegalovirus promoter (CMV) was cloned upstream of 5' UTR and poly(A) tail, the hepatitis delta ribozyme (HDVr) and the simian virus 40 polyadenylation signal downstream of the 3' UTR". Strictly speaking this is impossible, as such a construct would contain a dsDNA sequence (CMV promoter) followed by ssRNA (5'UTR, polyA tail and HDV ribozyme) and then again dsDNA (SV40 terminator). So it would be better to be correct and add "sequences corresponding to" or "dsDNA copies of" to the description of RNA elements.

      We thank the reviewer for the advice but would like to state that in scientific language it is common to assume that nucleic acid cloning is based on DNA.

      We have corrected the description in the Methods section: “The human cytomegalovirus promoter (CMV) was cloned upstream of the DNA sequence of the viral 5’UTR; herein, the first five nucleotides (ATATT) correspond to the 5’UTR of SARS-CoV. Sequences corresponding to the poly(A) tail (n=35), the hepatitis delta virus ribozyme (HDVr), and the simian virus 40 polyadenylation signal (SV40pA) were cloned immediately downstream of the DNA sequence of the viral 3’UTR.”

      For ease of reading and for consistent terminology, we kept the original spelling in the rest of the manuscript.

      In the description of the neutralization assay, the authors used a temperature of 34 °C for incubation of virus with antibodies as well as for the subsequent incubation of infected cells. Why was this temperature used?

      The following sentence was added (Materials and Methods; Cells): “A lower incubation temperature was chosen based on previous studies (V’kovski et al., 2021).”

      References

      Lett MJ, Otte F, Hauser D, Schön J, Kipfer ET, Hoffmann D, Halwe NJ, Ulrich L, Zhang Y, Cmiljanovic V, Wylezich C, Urda L, Lang C, Beer M, Mittelholzer C, Klimkait T. 2023. Single-cycle SARS-CoV-2 vaccine elicits high protection and sterilizing immunity in hamsters. doi:10.1101/2023.05.17.541127

      Liu Y, Zhang X, Liu J, Xia H, Zou J, Muruato AE, Periasamy S, Kurhade C, Plante JA, Bopp NE, Kalveram B, Bukreyev A, Ren P, Wang T, Menachery VD, Plante KS, Xie X, Weaver SC, Shi P-Y. 2022. A live-attenuated SARS-CoV-2 vaccine candidate with accessory protein deletions. Nat Commun 13:4337. doi:10.1038/s41467-022-31930-z

      V’kovski P, Gultom M, Kelly JN, Steiner S, Russeil J, Mangeat B, Cora E, Pezoldt J, Holwerda M, Kratzel A, Laloli L, Wider M, Portmann J, Tran T, Ebert N, Stalder H, Hartmann R, Gardeux V, Alpern D, Deplancke B, Thiel V, Dijkman R. 2021. Disparate temperature-dependent virus–host dynamics for SARS-CoV-2 and SARS-CoV in the human respiratory epithelium. PLoS Biol 19:e3001158. doi:10.1371/journal.pbio.3001158

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study addresses how faces and bodies are integrated in two STS face areas revealed by fMRI in the primate brain. It builds upon recordings and analysis of the responses of large populations of neurons to three sets of images, that vary face and body positions. These sets allowed the authors to thoroughly investigate invariance to position on the screen (MC HC), to pose (P1 P2), to rotation (0 45 90 135 180 225 270 315), to inversion, to possible and impossible postures (all vs straight), to the presentation of head and body together or in isolation. By analyzing neuronal responses, they found that different neurons showed preferences for body orientation, head orientation, or the interaction between the two. By using a linear support vector machine classifier, they show that the neuronal population can decode head-body angle presented across orientations, in the anterior aSTS patch (but not middle mSTS patch), except for mirror orientation.

      Strengths:

      These results extend prior work on the role of Anterior STS fundus face area in face-body integration and its invariance to mirror symmetry, with a rigorous set of stimuli revealing the workings of these neuronal populations in processing individuals as a whole, in an important series of carefully designed conditions.

      Minor issues and questions that could be addressed by the authors:

      (1) Methods. While monkeys certainly infer/recognize that individual pictures refer to the same pose with varying orientations based on prior studies (Wang et al.), I am wondering whether in this study monkeys saw a full rotation of each of the monkey poses as a video before seeing the individual pictures of the different orientations, during recordings.

      The monkeys had not been exposed to videos of a rotating monkey pose before the recordings. However, they were reared and housed with other monkeys, providing them with ample experience of monkey poses from different viewpoints.

      (2) Experiment 1. The authors mention that neurons are preselected as face-selective, body-selective, or both-selective. Do the Monkey Sum Index and ANOVA main effects change per Neuron type?

      We have performed a new analysis to assess whether the Monkey Sum Index is related to the response strength for the face versus the body as measured in the Selectivity Test of Experiment 1. To do this, we selected face- and body-category selective neurons, as well as neurons responding selectively to both faces and bodies. First, we selected those neurons that responded significantly to either faces, bodies, or the two control object categories, using a split-plot ANOVA for these 40 stimuli. From those neurons, we selected face-selective ones having at least a twofold larger mean net response to faces compared to bodies (faces > 2 * bodies) and the control objects for faces (faces > 2 * objects). Similarly, a body-selective neuron was defined by a twofold larger mean net response to bodies compared to faces and the control objects for bodies. A body-and-face selective neuron was defined as having a twofold larger net response to the faces compared to their control objects, and to bodies compared to their control objects, with the ratio between mean response to bodies and faces being less than twofold. Then, we compared the distribution of the Monkey Sum Index (MSI) for each region (aSTS; mSTS), pose (P1, P2), and centering (head- (HC) or monkey-centered (MC)) condition. Too few body-and-face selective neurons were present in each combination of region, pose, and centering (a maximum of 7) to allow a comparison of their MSI distribution with the other neuron types. The Figure below shows the distribution of the MSI for the different orientation-neuron combinations for the body- and face-selective neurons (same format as in Figure 3a, main text). The number of body-selective neurons, according to the employed criteria, varied from 21 to 29, whereas the number of face-selective neurons ranged from 14 to 24 (pooled across monkeys). 
The data of the two subjects are shown in different colors and the number of cases for each subject is indicated (n1: number of cases for M1; n2: number of cases for M2). The arrows indicate the medians for the data pooled across the monkey subjects. For the MC condition, the MSI tended to be more negative (i.e. relatively less response to the monkey compared to the sum of the body and face responses) for the face compared to the body cells, but this was significant only for mSTS and P1 (p = 0.043; Wilcoxon rank sum test; tested after averaging the indices per neuron to avoid dependence of indices within a neuron). No consistent or significant tendencies were observed for the HC stimuli. This absence of a consistent relationship between MSI and face- versus body-selectivity is in line with the absence of a correlation between the MSI and face- versus body-selectivity using natural images of monkeys in a previous study (Zafirova Y, Bognár A, Vogels R. Configuration-sensitive face-body interactions in primate visual cortex. Prog Neurobiol. 2024 Jan;232:102545).
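
      The twofold selection rule described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name, argument names, and the "unclassified" label are hypothetical, the inputs are assumed to be positive mean net responses, and the preceding split-plot ANOVA responsiveness check is assumed to have been applied already.

```python
def classify_neuron(face, body, face_ctrl, body_ctrl, ratio=2.0):
    """Label one neuron from its mean net responses (hypothetical inputs).

    face, body           : mean net response to faces and to bodies
    face_ctrl, body_ctrl : mean net response to the control objects
                           for faces and for bodies
    """
    face_vs_ctrl = face > ratio * face_ctrl
    body_vs_ctrl = body > ratio * body_ctrl

    # Face-selective: faces > 2 * bodies and faces > 2 * control objects.
    if face_vs_ctrl and face > ratio * body:
        return "face"
    # Body-selective: bodies > 2 * faces and bodies > 2 * control objects.
    if body_vs_ctrl and body > ratio * face:
        return "body"
    # Body-and-face: both categories beat their controls twofold, and
    # (given the two checks above failed) neither beats the other twofold.
    if face_vs_ctrl and body_vs_ctrl:
        return "body-and-face"
    return "unclassified"
```

      For example, under these assumptions a neuron with mean net responses of 10 to faces, 8 to bodies, and 2 to each control category would be labeled "body-and-face".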

      We did not perform a similar analysis for the main effects of the two-way ANOVA because the very large majority of neurons showed a significant effect of body orientation and thus no meaningful difference between the two neuron types can be expected.

      Author response image 1.

      (3) I might have missed this information, but the correlation between P1 and P2 seems to not be tested although they carry similar behavioral relevance in terms of where attention is allocated and where the body is facing for each given head-body orientation.

      Indeed, we did not compute this correlation between the responses to the sitting (P1) and standing (P2) pose avatar images. However, as pointed out by the reviewer, one might expect such correlations because of the same head orientations and body-facing directions. Thus, we computed the correlation between the 64 head-body orientation conditions of P1 and P2 for those neurons that were tested with both poses and showed a response for both poses (split-plot ANOVA). This was performed for the Head-Centered and Monkey-Centered tests of Experiment 1 for each monkey and region. Note that not all neurons were tested with both poses (because of failure to maintain isolation of the single unit in both tests or because the monkey stopped working) and not all neurons that were recorded in both tests showed a significant response for both poses, which is not unexpected since these neurons can be pose selective. The distribution of the Pearson correlation coefficients of the neurons with a significant response in both tests is shown in Figure S1. The median correlation coefficient was significantly larger than zero for each region, monkey, and centering condition (Wilcoxon tests of whether the median differed from zero; p1: p-value for M1; p2: p-value for M2; see the Figure), indicating that the effect of head and/or body orientation generalizes across pose. We have noted this now in the Results (page 12) and added the Figure (new Figure S1) to the Suppl. Material.
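
      A minimal sketch of the per-neuron correlation across poses, assuming each neuron is summarized by its 64 mean responses (one per head-body orientation condition) for P1 and for P2. The function names and data layout are hypothetical, and the Wilcoxon test on the resulting coefficients is not included.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pose_correlations(p1_responses, p2_responses):
    """Correlate P1 and P2 tuning for each neuron.

    p1_responses[i] and p2_responses[i] hold the 64 condition means of
    neuron i for the two poses (hypothetical layout). Returns the list of
    per-neuron coefficients and their median.
    """
    coefficients = [pearson(a, b) for a, b in zip(p1_responses, p2_responses)]
    return coefficients, statistics.median(coefficients)
```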

      (4) Is the invariance for position HC-MC larger in aSTS neurons compared to mSTS neurons, as could be expected from their larger receptive fields?

      Yes, the position tolerance of the interaction of body and head orientation was significantly larger for aSTS compared to mSTS neurons, as we described on pages 11 and 12 of the Results. This is in line with larger receptive fields in aSTS than in mSTS. However, we did not plot receptive fields in the present study.

      (5) L492 "The body-inversion effect likely results from greater exposure to upright than inverted bodies during development". Monkeys display more hanging upside-down behavior than humans, however, does the head appear more tilted in these natural configurations?

      Indeed, infant monkeys do spend some time hanging upside down from their mother's belly. While we lack quantitative data on this behavior, casual observations suggest that even young monkeys spend more time upright. The tilt of the head while hanging upside down can vary, just as it does in standing or sitting monkeys (as when they search for food or orient to other individuals). To our knowledge, no quantitative data exist on the frequency of head tilts in upright versus upside-down monkeys. Therefore, we refrain from further speculation on this interesting point, which warrants more attention.

      (6) Methods in Experiment 1. SVM. How many neurons are sufficient to decode the orientation?

      The number of neurons that are needed to decode the head-body orientation angle depends on which neurons are included, as we show in a novel analysis of the data of Experiment 1. We employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that less than half contributed positively to decoding. We show also the ten “best” neurons for each centering condition and pose. These have a variety of tuning patterns for head and body orientation suggesting that the decoding of head-body orientation angle depends on a population code. Notably, the best-ranked (rank N) neuron alone achieved above-chance accuracy. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (new Figure S3).
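
      The neuron-dropping procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: a nearest-centroid classifier stands in for the cross-validated linear SVM, the 2000 resamplings of pseudo-population vectors are omitted, and all names and the data layout (a dictionary mapping each label to a list of trials, each trial being a list with one response per neuron) are hypothetical.

```python
def centroid(trials, neurons):
    """Mean response of each selected neuron across trials."""
    return [sum(t[n] for t in trials) / len(trials) for n in neurons]


def accuracy(train, test, neurons):
    """Nearest-centroid accuracy (stand-in for the linear SVM)."""
    centroids = {lab: centroid(trials, neurons) for lab, trials in train.items()}
    correct = total = 0
    for label, trials in test.items():
        for trial in trials:
            dist = {
                lab: sum((trial[n] - m) ** 2 for n, m in zip(neurons, cen))
                for lab, cen in centroids.items()
            }
            correct += min(dist, key=dist.get) == label
            total += 1
    return correct / total


def neuron_dropping(train, test, n_neurons):
    """Rank neurons by leave-one-out accuracy, then add them worst to best."""
    everyone = list(range(n_neurons))
    # Step 1: decode N times, each time leaving out a different neuron.
    left_out = {
        n: accuracy(train, test, [m for m in everyone if m != n])
        for n in everyone
    }
    # Step 2: highest accuracy without a neuron = least useful neuron
    # ("worst", rank 1); lowest accuracy without it = "best" (rank N).
    ranked = sorted(everyone, key=lambda n: left_out[n], reverse=True)
    # Step 3: decode with 1..N neurons, starting from the worst-ranked one.
    curve = [accuracy(train, test, ranked[:k]) for k in range(1, n_neurons + 1)]
    return ranked, curve
```

      In this sketch, `train` would hold the MC trials and `test` the HC trials, with the two labels being the zero and straight head-body angles.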

      (7) Figure 3D 3E. Could the authors please indicate for each of these neurons whether they show a main effect of face, body, or interaction, as well as their median corrected correlation to get a flavor of these numbers for these examples?

      We have indicated these now in Figure 3.

      (8) Methods and Figure 1A. It could be informative to specify whether the recordings were carried out in the lateral part of the STS or in the fundus of the STS, both for aSTS and mSTS, for comparison to other studies that use these distinctions (AF, AL, MF, ML).

      In experiment 1, the recording locations were not as medial as the fundus. For experiments 2 and 3, the ventral part of the fundus was included, as described in the Methods. We have added this to the Methods now (page 31).

      Wang, G., Obama, S., Yamashita, W. et al. Prior experience of rotation is not required for recognizing objects seen from different angles. Nat Neurosci 8, 1768-1775 (2005). https://doi-org.insb.bib.cnrs.fr/10.1038/nn1600

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the neuronal encoding of the relationship between head and body orientations in the brain. Specifically, the authors focus on the angular relationship between the head and body by employing virtual avatars. Neuronal responses were recorded electrophysiologically from two fMRI-defined areas in the superior temporal sulcus and analyzed using decoding methods. They found that: (1) anterior STS neurons encode head-body angle configurations; (2) these neurons distinguish aligned and opposite head-body configurations effectively, whereas mirror-symmetric configurations are more difficult to differentiate; and (3) an upside-down inversion diminishes the encoding of head-body angles. These findings advance our understanding of how visual perception of individuals is mediated, providing a fundamental clue as to how the primate brain processes the relationship between head and body - a process that is crucial for social communication.

      Strengths:

      The paper is clearly written, and the experimental design is thoughtfully constructed and detailed. The use of electrophysiological recordings from fMRI-defined areas elucidated the mechanism of head-body angle encoding at the level of local neuronal populations. Multiple experiments, control conditions, and detailed analyses thoroughly examined various factors that could affect the decoding results. The decoding methods effectively and consistently revealed the encoding of head-body angles in the anterior STS neurons. Consequently, this study offers valuable insights into the neuronal mechanisms underlying our capacity to integrate head and body cues for social cognition, a topic that is likely to captivate readers in this field.

      Weaknesses:

      I did not identify any major weaknesses in this paper; I only have a few minor comments and suggestions to enhance clarity and further strengthen the manuscript, as detailed in the Private Recommendations section.

      Reviewer #3 (Public review):

      Summary:

      Zafirova et al. investigated the interaction of head and body orientation in the macaque superior temporal sulcus (STS). Combining fMRI and electrophysiology, they recorded responses of visual neurons to a monkey avatar with varying head and body orientations. They found that STS neurons integrate head and body information in a nonlinear way, showing selectivity for specific combinations of head-body orientations. Head-body configuration angles can be reliably decoded, particularly for neurons in the anterior STS. Furthermore, body inversion resulted in reduced decoding of head-body configuration angles. Compared to previous work that examined face or body alone, this study demonstrates how head and body information are integrated to compute a socially meaningful signal.

      Strengths:

      This work presents an elegant design of visual stimuli, with a monkey avatar of varying head and body orientations, making the analysis and interpretation straightforward. Together with several control experiments, the authors systematically investigated different aspects of head-body integration in the macaque STS. The results and analyses of the paper are mostly convincing.

      Weaknesses:

      (1) Using ANOVA, the authors demonstrate the existence of nonlinear interactions between head and body orientations. While this is a conventional way of identifying nonlinear interactions, it does not specify the exact type of the interaction. Although the computation of the head-body configuration angle requires some nonlinearity, it's unclear whether these interactions actually contribute. Figure 3 shows some example neurons, but a more detailed analysis is needed to reveal the diversity of the interactions. One suggestion would be to examine the relationship between the presence of an interaction and the neural encoding of the configuration angle.

      This is an excellent suggestion. To do this, one needs to identify the neurons that contribute to the decoding of head-body orientation angles. For that, we employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721.) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that less than half contributed positively to decoding (see Figure S3). We examined the tuning for head and body orientation of the 10 “best” neurons (Figure S3). For half or more of those the two-way ANOVA showed a significant interaction. These are indicated by the red color in the Figure. They showed a variety of tuning patterns for head and body orientation, suggesting that the decoding of the head-body orientation angle results from a combination of neurons with different tuning profiles. Based on a suggestion from reviewer 2, we performed for each neuron of experiment 1 a one-way ANOVA with as factor head-body orientation angle. To do that, we combined all 64 trials that had the same head-body orientation angle. 
The percentage of neurons (required to be responsive in the tested condition) for which this one-way ANOVA was significant was low but larger than the expected 5% (Type 1 error), with a median of 16.5% (range: 3 to 23%) in aSTS and 8% for mSTS (range: 0-19%). However, a higher percentage of the 10 best neurons for each pose (indicated by the star) showed a significant one-way ANOVA for angle (for P1, MC: 50% (95% confidence interval (CI): 19%–81%); P1, HC: 70% (CI: 35%–93%); P2, MC: 70% (CI: 35%–93%); P2, HC: 50% (CI: 19%–81%)). These percentages were significantly higher than expected for a random sample from the population of neurons for each pose-centering combination (expected percentages listed in the same order as above: 16%, 13%, 16%, and 10%; all outside CI). Thus, for at least half of the “best” neurons, the response differed significantly among the head-body orientation angles at the single neuron level. Nonetheless, the tuning profiles were diverse, suggesting a population code for head-body orientation angle. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (Figure S3).
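
      The response does not state how the confidence intervals were computed. As a hedged check, the reported values for 5 of 10 and 7 of 10 neurons are consistent with an exact (Clopper-Pearson) binomial interval, which the sketch below computes by bisection; the choice of this interval and the function names are assumptions.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(func, target):
    """Solve func(p) == target on [0, 1] for a function increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for k successes out of n."""
    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper
```

      Under this assumption, an expected percentage counts as significantly lower than the observed one when it falls below the lower bound, as is the case for the expected values listed above.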

      (2) Figure 4 of the paper shows a better decoding of the configuration angle in the anterior STS than in the middle STS. This is an interesting result, suggesting a transformation in the neural representation between these two areas. However, some control analyses are needed to further elucidate the nature of this transformation. For example, what about the decoding of head and body orientations - does absolute orientation information decrease along the hierarchy, accompanying the increase in configuration information?

      We have now performed two additional analyses, one in which we decoded the orientation of the head and another one in which we decoded the orientation of the body. We employed the responses to the avatar of experiment 1, using the same sample of neurons for which we decoded the head-body orientation angle. To decode the head orientation, the trials with identical head orientation, irrespective of their body orientation, were given the same label. For this, we employed only responses in the head-centered condition. To decode the body orientation, the trials with identical body orientation, irrespective of their head orientation, had the same label, and we employed only responses in the body-centered condition. The decoding was performed separately for each pose (P1 and P2) and region. We decoded either the responses of 20 neurons (10 randomly sampled from each monkey for each of the 1000 resamplings), 40 neurons (20 randomly sampled per monkey), or 60 neurons (30 neurons per monkey) since the sample of 60 neurons yielded close to ceiling performance for the body orientation decoding. For each pose, the body orientation decoding was worse for aSTS than for mSTS, although this difference reached significance only for P1 and for the 40-neuron sample of P2 (p < 0.025; two-tailed test; same procedure as employed for testing the significance of the decoding of whole-body orientation for upright versus inverted avatars (Experiment 3)). Face orientation decoding was significantly worse for aSTS compared to mSTS. These results are in line with the previously reported decreased decoding of face orientation in the anterior compared to mid-STS face patches (Meyers EM, Borzello M, Freiwald WA, Tsao D. Intelligent information loss: the coding of facial identity, head pose, and non-face information in the macaque face patch system. J Neurosci. 
2015 May 6;35(18):7069-81), and decreased decoding of body orientation in anterior compared to mid-STS body patches (Kumar S, Popivanov ID, Vogels R. Transformation of Visual Representations Across Ventral Stream Body-selective Patches. Cereb Cortex. 2019 Jan 1;29(1):215-229). As mentioned by the reviewer, this contrasts with the decoding of the head-body orientation angle, which increases when moving more anteriorly. We mention this finding now in the Discussion (page 27) and present the new Figure S10 in the Suppl. Material.    

      (3) While this work has characterized the neural integration of head and body information in detail, it's unclear how the neural representation relates to the animal's perception. Behavioural experiments using the same set of stimuli could help address this question, but I agree that these additional experiments may be beyond the scope of the current paper. I think the authors should at least discuss the potential outcomes of such experiments, which can be tested in future studies.

      Unfortunately, we do not have behavioral data. One prediction would be that the discrimination of head-body orientation angle, irrespective of the viewpoint of the avatar, would be more accurate for zero versus straight angles compared to the right versus left angles. We have added this to the Discussion (page 28).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) P22 L373. It should read Figure S5C instead of S4C.

      Thanks; corrected.

      (2) Figure 7B. All inverted decoding accuracies, although significantly lower than upright decoding accuracies, appear significantly above baseline. Should the title be amended accordingly?

      Thanks for pointing this out. To avoid future misunderstanding we have changed the title to:

      “Integration of head and body orientations in the macaque superior temporal sulcus is stronger for upright bodies”

      (3) Discussion L432-33. "with some neurons being tuned to a particular orientation of both the head and the body". Wouldn't that be visible as a diagonal profile on the normalized net responses in Fig 3D? Or can the Anova evidence such a tuning?

      We meant to say that some neurons were tuned to a particular combination of head and body orientation, like the third aSTS example neuron shown in Figure 3D. We have corrected the sentence.

      Reviewer #2 (Recommendations for the authors):

      Major comment:

      This paper effectively demonstrates that the angular relationship between the head and body can be decoded from population responses in the anterior STS. In other words, these neurons encode information about the head-body angle. However, how exactly do these neurons encode this information? Given that the study employed electrophysiological recordings from a local population of neurons, it might be possible to provide additional data on the response patterns of individual neurons to shed light on the underlying encoding mechanisms.

      Although the paper already presents example response patterns (Figures 3D, E) and shows that STS neurons encode interactions between head and body orientations (Figure 3B), it remains unclear whether the angle difference between the head and body has a systematic effect on neuronal responses. For instance, a description of whether some neurons preferentially encode specific head-body angle differences (e.g., a "45-degree angle neuron"), or additional population analyses such as a one-way ANOVA with angle difference as the main effect (or two-way ANOVA with angle difference as one of the main effects), would be very informative. Such data could offer valuable insights into how individual neurons contribute to the encoding of head-body angle differences, a detail that may also be reflected in the decoding results. Alternatively, it is possible that the encoding of head-body angle is inherently complex and only discernible via decoding methods applied to population activity. Either scenario would provide interesting and useful information to the field.

      We have performed two additional analyses which are relevant to this comment. First, we attempted to relate the tuning for body and head orientation with the decoding of the head-body orientation angle. To do this, one needs to identify the neurons that contribute to the decoding of head-body orientation angles. For that, we employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721.) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that less than half contributed positively to decoding (see Figure S3). We examined the tuning for head and body orientation of the 10 “best” neurons (Figure S3). For half or more of those the two-way ANOVA showed a significant interaction. These are indicated by the red color in the Figure. They showed a variety of tuning patterns for head and body orientation, suggesting that the decoding of the head-body orientation angle results from a combination of neurons with different tuning profiles.

      Second, we have followed the suggestion of the reviewer to perform for each neuron of experiment 1 a one-way ANOVA with as factor head-body orientation angle. To do that, we combined all 64 trials that had the same head-body orientation angle. The percentage of neurons (required to be responsive in the tested condition) for which this one-way ANOVA was significant is shown in the Tables below for each region, separately for each pose (P1, P2), centering condition (MC = monkey-centered; HC = head-centered) and monkey subject (M1, M2). The percentages were low but larger than the expected 5% (Type 1 error), with a median of 16.5% (range: 3 to 23%) in aSTS and 8% for mSTS (range: 0-19%).

      Author response table 1.

      Interestingly, a higher percentage of the 10 best neurons for each pose (indicated by the star in the Figure above) showed a significant one-way ANOVA for angle (for P1, MC: 50% (95% confidence interval (CI): 19%–81%); P1, HC: 70% (CI: 35%–93%); P2, MC: 70% (CI: 35%–93%); P2, HC: 50% (CI: 19%–81%)). These percentages were significantly higher than expected for a random sample from the population of neurons for each pose-centering combination (expected percentages listed in the same order as above: 16%, 13%, 16%, and 10%; all outside CI). Thus, for at least half of the “best” neurons, the response differed significantly among the head-body orientation angles at the single neuron level. Nonetheless, the tuning profiles were quite diverse, suggesting population coding of head-body orientation angle. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (Figure S3).

      Minor comments:

      (1) Figure 4A, Fourth Row Example (Zero Angle vs. Straight Angle, Bottom of the P2 Examples): The order of the example stimuli might be incorrect - the 0° head with 180° body stimulus (leftmost) might be swapped with the 180° head with 0° body stimulus (5th from the left). While this ordering may be acceptable, please double-check whether it reflects the authors' intended arrangement.

      We have changed the order of the two stimuli in Figure 4A, following the suggestion of the reviewer.

      (2) Page 12, Lines 192-194: The text states, "Interestingly, some neurons (e.g. Figure 3D) were tuned to a particular combination of a head and body irrespective of centering." However, Figure 3D displays data for a total of 10 neurons. Could you please specify which of these neurons are being referred to in this context?

      The wording was not optimal. We meant to say that some neurons were tuned to a particular combination of head and body orientation, like the third aSTS example neuron of Figure 3D. We have rephrased the sentence and clarified which example neuron we referred to.

      (3) Page 28, Lines 470-471: The text states, "We observed no difference in response strength between anatomically possible and impossible configurations." Please clarify which data were compared for response strength, as I could not locate the corresponding analyses.

      The anatomically possible and impossible configurations differ in the head-body orientation angle. However, as we reported before in the Results, there was no effect of head-body orientation angle on mean response strength across poses (Friedman ANOVA; all p-values for both poses and centerings > 0.1). We have clarified this now in the Discussion (page 28).

      (4) Pages 40-43, Decoding Analyses: In experiments 2 and 3, were the decoding analyses performed on simultaneously recorded neurons? If so, such analyses might leverage trial-by-trial correlations and thus avoid confounds from trial-to-trial variability. In contrast, experiment 1, which used single-shank electrodes, would lack this temporal information. Please clarify how trial numbers were assigned to neurons in each experiment and how this assignment may have influenced the decoding performance.

      For the decoding analyses of experiments 2 and 3, we combined data from different daily penetrations, with only units from the same penetration being recorded simultaneously. In the decoding analyses of each experiment, the trials were assigned randomly to the pseudo-population vectors, shuffling on each resampling the trial order per neuron. This shuffling abolishes noise correlations in the analysis of each experiment.
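
      A minimal sketch of how pseudo-population vectors can be assembled with the trial order shuffled independently per neuron, which is what removes noise correlations between simultaneously recorded units. The function name and data layout are hypothetical, and a single stimulus condition is shown.

```python
import random


def pseudo_population(responses, n_trials, rng):
    """Build pseudo-population vectors for one stimulus condition.

    responses[i] is the list of single-trial responses of neuron i
    (hypothetical layout). Each neuron's trials are drawn in an
    independent random order, so trial k of one neuron is paired with an
    arbitrary trial of every other neuron.
    """
    columns = [rng.sample(trials, n_trials) for trials in responses]
    # Transpose: one vector per pseudo-trial, one entry per neuron.
    return [list(vector) for vector in zip(*columns)]
```

      Calling this function anew on every resampling, with a fresh random draw, corresponds to shuffling the trial order per neuron on each resampling.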

      (5) Page 41, Lines 792-802: The authors state that "To assess the significance of the differences in classification scores between pairs of angles ... we computed the difference in classification score between the two pairs for each resampling and the percentile of 0 difference corresponded to the p-value." In a two-sided test under the null hypothesis of no difference between the distributions, the conventional approach would be to compute the p-value as the proportion of resampled differences that are as extreme or more extreme than the observed difference. Since a zero difference might be relatively rare, relying solely on its percentile could potentially misrepresent the tail probabilities relevant to a two-sided test. Could you clarify how their method addresses this issue?

      This test is based on the distribution of the difference between classification accuracies across resamplings, similar to computing the confidence interval of a difference. Thus, we assess whether the theoretical zero value (no difference, i.e., the null hypothesis) lies outside the interval between the 2.5th and 97.5th percentiles of the distribution of the empirically observed differences. We have now clarified in the Methods (page 41) that for a two-tailed test the computed p-value (the percentile of the zero value) should be smaller than 0.025.
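
      The percentile-of-zero test can be sketched as follows. The handling of ties at exactly zero and the use of the smaller of the two tails are assumptions made for illustration, and the function names are hypothetical.

```python
def percentile_of_zero(differences):
    """Tail proportion of the resampled differences on the far side of zero.

    differences holds one accuracy difference per resampling. The smaller
    of the two tail proportions is returned (an assumption of this sketch).
    """
    n = len(differences)
    at_or_below = sum(d <= 0 for d in differences) / n
    at_or_above = sum(d >= 0 for d in differences) / n
    return min(at_or_below, at_or_above)


def significant_two_tailed(differences, alpha=0.05):
    """Zero lies outside the central (1 - alpha) interval of the differences."""
    return percentile_of_zero(differences) < alpha / 2
```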

      (6) Page 43, Lines 829-834: The manuscript explains: "The mean of 10 classification accuracies (i.e., of 10 resamplings) was employed to obtain a distribution (n=100) of the differences in classification accuracy ... The reported standard deviations of the classification accuracies are computed using also the means of 10 resamplings." I am unfamiliar with this type of analysis and am unclear about the rationale for calculating distributions and standard deviations based on the means of 10 resamplings rather than using the original distribution of classification accuracies. This resampling procedure appears to yield a narrower distribution and smaller standard deviations than the original data. Could you please justify this approach?

      The logic of the analysis is to reduce the noise in the data by averaging across 10 randomly selected resamplings, while still keeping a sufficient number of data points (100 values) for a test.
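
      The effect of this averaging on the spread can be sketched as follows. This is a hedged illustration with simulated values, not the authors' pipeline: it assumes 1000 independent resampled values grouped into 100 blocks of 10, and the variable names are hypothetical. Under that independence assumption, the standard deviation of the block means is expected to be roughly 1/sqrt(10) of the standard deviation of the raw values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated (hypothetical) accuracy differences, one per resampling.
raw = rng.normal(0.05, 0.04, size=1000)

# Average consecutive groups of 10 resamplings, giving 100 values.
block_means = raw.reshape(100, 10).mean(axis=1)

sd_raw = raw.std(ddof=1)
sd_means = block_means.std(ddof=1)

print(block_means.shape)
print(sd_raw, sd_means, sd_means / sd_raw)
```

      The overall mean is unchanged by the grouping, while the spread of the 100 averaged values is narrower than that of the 1000 raw values, which is the behavior the reviewer asked about.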

      Reviewer #3 (Recommendations for the authors):

      (1) Some sentences are too long and difficult to parse. For example, in line 177: "the correlations between the responses to the 64 head-body orientation conditions of the two centerings for the neuron and pose combinations showing significant head-body interactions for the two centerings were similar to those observed for the whole population."

      We have modified this sentence: For neuron and pose combinations with significant head-body interactions in both centerings, the correlations between responses to the 64 head-body orientation conditions were similar to those observed in the whole population.

      (2) The authors argue in line 485: "in our study, a search bias cannot explain the body-inversion effect since we selected responsive units using both upright and inverted images." However, the body-selective patches were localized using upright images, correct?

      The monkey-selective patches were indeed localized using upright images. However, in experiment 3 (and 2) we also recorded outside the localized patches (as we noted before in the Methods: “In experiments 2 and 3 we recorded from a wider region, which overlapped with the two monkey patches and the recording locations of experiment 1”). Furthermore, the preference for upright monkey images is not an all-or-nothing phenomenon: most units still responded to inverted monkeys. Also, we believe it is likely that the mean responses to the inverted bodies in the monkey patches, defined by upright bodies versus objects, would be larger than those to objects, and we would be surprised to learn that there is a patch selective for inverted bodies that we would have missed with our localizer.

      (3) Typo: line 447, "this independent"->"is independent"?

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Reviewer #1:

      Summary:

      The Roco proteins are a family of GTPases characterized by the conserved presence of an ROC-COR tandem domain. How GTP binding alters the structure and activity of Roco proteins remains unclear. In this study, Galicia C et al. took advantage of conformation-specific nanobodies to trap CtRoco, a bacterial Roco, in an active monomeric state and determined its high-resolution structure by cryo-EM. This study, in combination with the previous inactive dimeric CtRoco, revealed the molecular basis of CtRoco activation through GTP-binding and dimer-to-monomer transition.

      Strengths:

      The reviewer is impressed by the authors' deep understanding of the CtRoco protein. Capturing Roco proteins in a GTP-bound state is a major breakthrough in the mechanistic understanding of the activation mechanism of Roco proteins and shows similarity with the activation mechanism of LRRK2, a key molecule in Parkinson's disease. Furthermore, the methodology the authors used in this manuscript - using conformation-specific nanobodies to trap the active conformation, which is otherwise flexible and resistant to single-particle average - is highly valuable and inspiring.

      Weakness:

      Though written with good clarity, the paper will benefit from some clarifications.

      (1) The angular distribution of particles for the 3D reconstructions should be provided (Figure 1 - Sup. 1 & Sup. 2).

      Figure 1 – Figure supplements 1 and 2 now contain particle distribution plots.

      (2) The B-factors for protein and ligand of the model, Map sharpening factor, and molprobity score should be provided (Table 1).

      Table 1 now contains B-factors and molprobity scores.

      The map used to interpret the model was post-processed by density modification, and therefore no data concerning sharpening factors are provided in the output.

      (3) A supplemental Figure to Figure 2B, illustrating how a0-helix interacts with COR-A&LRR before and after GTP binding in atomic details, will be helpful for the readers to understand the critical role of a0-helix during CtRoco activation.

      This is now illustrated in the new Figure 2 – Figure Supplement 1.

      (4) For the following statement, "On the other hand, only relatively small changes are observed in the orientation of the Roc a3 helix. This helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022), is located at the interface of the Roc and CORB domains and harbors the residues H554 and Y558, orthologous to the LRRK2 PD mutation sites N1337 and R1441, respectively." It is not surprising the a3-helix of the ROC domain only has small changes when the ROC domain is aligned (Figure 2E). However, in the study by Zhu et al (DOI: 10.1126/science.adi9926), it was shown that a3-helix has a "see-saw" motion when the COR-B domain is aligned. Is this motion conserved in CtRoco from inactive to active state?

      We indeed describe the conformational changes from the perspective of the Roc domain. When using the COR-B domain for structural alignment, a rotational movement of Roc (including a “seesaw”-like movement of the α3-helix around His554) with respect to COR-B is correspondingly observed.

      This is now added to Figure 2E. Additionally, the text was adapted to:

      “Interestingly, this rotational movement of CORB seems to use the H554-Y558-Y804 triad on the interface of Roc and CORB as a pivot point (Figure 2E). Mutation of either of the corresponding residues in LRRK2 (N1437, R1441, Y1699, respectively) is associated with PD and leads to LRRK2 activation. Residues H554 and Y558 are located on the Roc a3 helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022). Indeed, while the orientation of the a3 helix with respect to the rest of the Roc domain only undergoes small changes upon GTPgS binding, it can be observed that this helix undergoes a “seesaw-like” movement with respect to the CORB domain. A similar rearrangement was previously also observed for Rab29-mediated activation of human LRRK2 (Störmer et al., 2023; Zhu et al., 2022).”

      (5) A supplemental figure showing the positions of and distances between NbRoco1 K91 and Roc K443, K583, and K611 would help the following statement. "Also multiple crosslinks between the Nbs and CtRoco, as well as between both nanobodies were found. ... NbRoco1-K69 also forms crosslinks with two lysines within the Roc domain (K583 and K611), and NbRoco1-K91 is crosslinked to K583".

      A figure displaying these crosslinks is now provided as Figure 4–figure supplement 1. However, in interpreting these crosslinks it should be taken into consideration that the additive length of the DSSO spacer and the lysine side chains leads to a theoretical upper limit of ∼26 Å for the distance between the α carbon atoms of cross-linked lysines (and even a cut-off distance of 35 Å when taking into account protein dynamics).
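
      The distance criterion mentioned above can be sketched as a simple check. This is an illustrative example only: the two thresholds are the values quoted in the response (∼26 Å and 35 Å between α carbon atoms), while the function names and the coordinates in the usage lines are hypothetical and do not come from the CtRoco model.

```python
import math

CA_LIMIT_STATIC = 26.0   # Å, theoretical upper limit quoted in the response
CA_LIMIT_DYNAMIC = 35.0  # Å, cut-off quoted when allowing for protein dynamics


def ca_distance(ca_a, ca_b):
    """Euclidean distance in Å between two α carbon positions given as (x, y, z)."""
    return math.dist(ca_a, ca_b)


def classify_crosslink(ca_a, ca_b):
    """Label a lysine pair by how its Cα-Cα distance compares with the two limits."""
    d = ca_distance(ca_a, ca_b)
    if d <= CA_LIMIT_STATIC:
        return "within theoretical limit"
    if d <= CA_LIMIT_DYNAMIC:
        return "compatible when allowing for dynamics"
    return "exceeds cut-off"


# Hypothetical coordinates, chosen only to exercise the three outcomes.
print(classify_crosslink((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(classify_crosslink((0.0, 0.0, 0.0), (30.0, 0.0, 0.0)))
print(classify_crosslink((0.0, 0.0, 0.0), (0.0, 40.0, 0.0)))
```

      In practice the coordinates would be read from the deposited model; the sketch only shows how the two thresholds partition the observed distances.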

      (6) It would be informative to show the position of CtRoco-L487 in the NF and GTP-bound state and comment on why this mutation favors GTP hydrolysis.

      L487 is located in Switch 1, which is a critical region for nucleotide binding and hydrolysis. Unfortunately, most probably due to flexibility, the Switch 1 region could not be entirely modeled (in neither nucleotide state). Since L487 is located on the edge of the interpretable portion of the Switch 1 in both structures (see Author response image 1 below), any interpretation regarding the role of this residue would be highly speculative.

      Author response image 1.

      The following text was added to the Results section:

      “Also the Switch 1 loop could not be fully modeled in our structure, presumably indicating some flexibility in this region despite the presence of a GTP analogue. Interestingly, the Switch 1 loop harbors the site of the PD-analogous L487A mutation that leads to a stabilization of the CtRoco dimer with a concomitant decrease in GTPase activity (Deyaert et al., 2019). Unfortunately, an exact interpretation of this effect of the L487A mutation is hampered by the lack of a well resolved Switch 1 loop.”

      Reviewer #2:

      Summary

      The manuscript by Galicia et al describes the structure of the bacterial GTPyS-bound CtRoco protein in the presence of nanobodies. The major relevance of this study is in the fact that the CtRoco protein is a homolog of the human LRRK2 protein with mutations that are associated with Parkinson's disease. The structure and activation mechanisms of these proteins are very complex and not well understood. Especially lacking is a structure of the protein in the GTP-bound state. Previously the authors have shown that two conformational nanobodies can be used to bring/stabilize the protein in a monomer GTPyS-bound state. In this manuscript, the authors use these nanobodies to obtain the GTPyS-bound structure and importantly discuss their results in the context of the mammalian LRRK2 activation mechanism and mutations leading to Parkinson's disease. The work is well performed and clearly described. In general, the conclusions on the structure are reasonable and well-discussed in the context of the LRRK2 activation mechanism.

      Strengths:

      The strong points are the innovative use of nanobodies to stabilize the otherwise flexible protein and the new GTPyS-bound structure that helps enormously in understanding the activation cycle of these proteins.

      Weakness:

      The strong point of the use of nanobodies is also a potential weak point; these nanobodies may have induced some conformational changes in a part of the protein that will not be present in a GTPyS-bound protein in the absence of nanobodies.

      Two major points need further attention.

      (1) Several parts of the protein are very flexible during the monomer-dimer activity cycle. This flexibility is crucial for protein function, but obviously hampers structure resolution. Forced experiments to reduce flexibility may allow better structure resolution, but at the same time may impede the activation cycle. Therefore, careful experiments and interpretation are very critical for this type of work. This especially relates to the influence of the nanobodies on the structure that may not occur during the "normal" monomer-dimer activation cycle in the absence of the nanobodies (see also point 2). So what is the evidence that the nanobody-bound GTPyS-bound state is biochemically a reliable representative of the "normal" GTP-bound state in the absence of nanobodies, and that the obtained structure can therefore be confidently used to interpret the activation mechanism as done in the manuscript?

      See below for an answer to remark 1 and 2.

      (2) The obtained structure with two nanobodies reveals that the nanobodies NbRoco1 and NbRoco2 bind to parts of the protein by which a dimer is impossible, respectively to the a0-helix of the linker between Roc-COR and LRR, and to the cavity of the LRR that in the dimer binds to the dimerizing domain CORB. It is likely the open monomer GTP-bound structure is recognized by the nanobodies in the camelid, suggesting that overall the open monomer structure is a true GTP-bound state. However, it is also likely that the binding energy of the nanobody is used to stabilize the monomer structure. It is not automatically obvious that in the details the obtained nanobody-Roco-GTPyS structure will be identical to the "normal" Roco-GTPyS structure. What is the influence of nanobody-binding on the conformation of the domains where they bind? The binding energy may be used to stabilize a conformation that is not present in the absence of the nanobody. For instance, NbRoco1 binds to the a0 helix of the linker; what is here the "normal" active state of the Roco protein, and is e.g. the angle between Roc-COR and LRR also rotated by 135 degrees? Furthermore, nanobody NbRoco2 in the LRR domain is expected to stabilize the LRR domain; it may allow a position of the LRR domain relative to the rest of the protein that is not present without nanobody in the LRR domain. I am convinced that the observed open structure is a correct representation of the active state, but many important details have to be supported by e.g. their CX-MS experiments, and in the end probably need confirmation by more structures of other active Roco proteins or confirmation by a more dynamic sampling of the active states by e.g. molecular dynamics or NMR.

      Recently, nanobodies have increasingly been used successfully to obtain structural insights into protein conformational states (reviewed in Uchański et al, Curr. Opin. Struc. Biol. 2020). As Reviewer #2 points out, the concern is sometimes raised that antibodies could distort a protein into non-native conformations. Here, it is important to note that the nanobodies were raised by immunizing a llama with the fully native CtRoco protein bound to a non-hydrolysable GTP analogue, after which the nanobodies were selected by phage display using the same fully native and functional form of the protein. As clearly explained in Manglik et al. Annu Rev Pharmacol Toxicol. 2017, the probability of an in vivo matured nanobody inducing a non-native conformation of the antigen is low, although it is possible that it selects a high-energy, low-population conformation of a dynamic protein. Immature B cells require engagement of displayed antibodies with antigen to proliferate and differentiate during clonal selection. Antibodies that induce non-native conformations of the antigen pay a substantial energetic penalty in this process, and B cell clones displaying such antibodies will have a significantly lower probability of proliferation and differentiation into mature antibody-secreting B lymphocytes. Hence, many recent experiments and observations give credence to the notion that nanobodies bind antigens primarily by conformational selection and not induced fit (e.g. Smirnova et al. PNAS 2015).

      Extrapolated to the case of CtRoco, which is clearly very flexible in its GTP-bound form, this means that the nanobodies are able to trap and stabilize one conformational state that is representative of the “active state” ensemble of the protein. In this respect, it is clear from our experiments (XL-MS, affinity and effect on GTPase activity) that the effects of NbRoco1 and NbRoco2 are additive (or even cooperative), meaning that both nanobodies recognize different features of the same CtRoco “active state”. Correspondingly, the monomeric, elongated “open” conformation is also observed in the structure of CtRoco bound to NbRoco1 only (Figure1 - supplement 2), albeit that this structure still displays more flexibility. The monomerization and conformational changes that we observe and describe in the current paper at high resolution are also in very good agreement with earlier observations for CtRoco in the GTP-bound form in absence of any nanobodies, including negative stain EM (Deyaert et al. Nature Commun, 2017), hydrogen-deuterium exchange experiments (Deyaert et al. Biochem. J. 2019) and native MS (Leemans et al. Biochem J. 2020).

      In the revised manuscript we added the following text to the discussion:

      “To decrease this flexibility, we have now used two previously developed conformation-specific nanobodies (NbRoco1 and NbRoco2) to stabilize the protein in the GTP-state (Leemans et al., 2020), allowing us to solve its structure using cryo-EM (Figure 1). Recently, Nbs have successfully been used to obtain structural insights in the conformational states of a number of highly dynamic proteins (Uchański et al, 2020). These studies established that Nbs bind antigens primarily by conformational selection rather than by induced fit (Manglik et al., 2017; Smirnova et al., 2015). Since NbRoco1 and NbRoco2 were generated by immunization with fully native CtRoco bound to a nonhydrolysable GTP analogue, and subsequently selected by phage display using the same functional protein, it is thus safe to assume that these Nbs bind to and stabilize a relevant conformation that is present within the “active” CtRoco conformational space (Leemans et al., 2020). Moreover, our current structures are also in very good agreement with previous biochemical studies and data from HDX-MS and negative stain EM (Deyaert et al., 2019; Deyaert, Wauters, et al., 2017).”

      Recommendations for the authors:

      Reviewer #1:

      (1) Figure 2C: please label the residues with meshes (switch 2).

      Labels have been added to figure 2C.

      (2) A supplemental figure for the following statement will be helpful "A remarkable feature of the CtRoco dimer structure was the dimer-stabilized orientation of the P-loop, which would hamper direct nucleotide binding on the dimer. Correspondingly, in the current structure, the P-loop changes orientation, allowing GTPgS to bind, although the EM map does not allow unambiguous placement of the entire P-loop. Surprisingly, also the Switch 1 loop could not be fully modeled, which could indicate some flexibility in this region despite the presence of a GTP analog".

      An additional Figure 2–figure supplement 2 has been added to illustrate this.

      (3) A supplemental figure for the following statement will be helpful "A final important observation in the Roc domain concerns the very C-terminal part of Switch 2 (residues 520 to 533), which could not be modeled in our GTP bound structure due to flexibility, while in the nucleotide-free dimer structure this region is structured and located at the interface of the Roc domain with the LRR-Roc linker and CORA. In this way, the conformational changes induced by GTPgS binding could be relayed via the Switch 2 toward the LRR and CORA domains, and vice versa."

      An additional Figure 2–figure supplement 2 has been added to illustrate this.

      (4) A structural comparison of each domain (LRR, ROC, COR) between NF and GTP-bound states will be greatly useful to understand statements in the manuscript, such as "In addition to the C-terminal dimerization part of CORB that becomes unstructured, also other large conformational changes are observed in the CORA and CORB domains of CtRoco upon GTPgS binding."

      We would like to clarify that with this statement we refer to changes in the relative orientation of the domains between the nucleotide-free and GTPgS-bound states, rather than to conformational changes within each domain. These changes in relative orientation are illustrated in Figure 2 and the associated Figure supplements.

      (5) The statement "to a lesser extent, also between CDR1 and the LRR-Roc linker" is not clearly illustrated in Figure 3B.

      The reviewer is correct, and we now also show CDR1 in Figure 3B.

      (6) Extra panels can be added in Figure 1 Sup. 4 to illustrate the following statement "In the density map NbRoco2 can easily be identified and placed on the concave side of the LRR domain... N-terminal and C-terminal b-strands interacting with the very C-terminal repeat of the LRR".

      We believe the density map corresponding to NbRoco2 is clearly shown in Figure 1 – supplement 4A. A reference to this figure panel is now added to the main text.

      (7) "In the presence of both Nbs, the hydrolysis rate was increased 4-fold compared to CtRoco-L487A alone and 2-fold compared to CtRoco-L487A in the presence of NbRoco1 only, again illustrating a collaboration between the Nbs (Figure 5C)" Here, is it 6-fold instead of 4-fold?

      The reviewer is correct. We changed this accordingly in the manuscript.

      Reviewer #2:

      (1) At many places in the manuscript the lack of structural details is explained by the assumed local flexibility of the protein. This may be true for many cases (such as linker regions), but is probably not always correct; several other explanations are possible to get no local structural details.

      See our answer to point 2, below.

      (2) At several other places in the manuscript the high flexibility is used to explain the lack of structural details (so the reasoning is reversed compared to point 1); this would require that a priori it is known that the region is flexible and therefore no structure can be expected. An example is found mid-page 8: "A final important observation in the Roc domain concerns the very C-terminal part of Switch 2 (residues 520 to 533), which could not be modeled in our GTP bound structure due to flexibility, while in the nucleotide-free dimer structure this region is structured and located at the interface of the Roc domain with the LRR-Roc linker and CORA." As written there must be a reference to experiments showing the "due to flexibility".

      The reviewer is correct that additional factors might affect the interpretability of the map, such as the small size of the regions used for the focused refinements (around 50 kDa each) or a preferential distribution of orientation of the particles in the grid. Particle distribution plots are now shown in Figure 1 – Figure supplements 1 and 2. However, due to the intrinsic flexible nature of the Switch 1 and Switch 2 regions, we assume this flexibility to be the major cause of lack of features in the EM maps, especially since some of the neighboring regions display well-resolved maps.

      Nevertheless, in the manuscript we reworded our statements to be more careful. For example, on page 8:

      “Also the Switch 1 loop could not be fully modeled in our structure, presumably indicating some flexibility in this region despite the presence of a GTP analogue.”

      “… potentially due to flexibility of this region in the new position of the Switch 2…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      Weaknesses:

      The authors do not discuss their findings in light of genomic information; the genomes of the cichlids from the three lakes have been decoded and are therefore available. However, the species in Lake Tanganyika and Lake Malawi/Victoria are indeed genetically distant from each other, so a comparative genome analysis would not have yielded the results presented here. I recommend adding such a discussion to the Discussion.

      We appreciate your comment. We added a discussion of the genomic aspect of parallel evolution.

      Line 386-393: “From a genomic perspective, several studies have investigated the genetic basis of hypertrophied lip cichlids (Masonick et al., 2023; Nakamura et al., 2021). Importantly, some Wnt pathway-related genes (tcf4 and daam2) and ECM-related genes (postna, col12a1a, and col12a1b) have been found to be under positive selection in cichlids with hypertrophied lips of Lake Victoria (see Nakamura et al., 2021 Table S3). For future research, examining whether these genes are under selection in other lakes is crucial to understand the genetic mechanisms underlying the parallel evolution of hypertrophied lips.”

      Minor comments:

      Line 30, the Wnt --> the genes in Wnt

      We appreciate your comment. According to the comment, we corrected the sentence.

      Line 30: “the Wnt signaling pathway” -> “the genes in Wnt signaling pathway”

      Line 42-44, "It is considered that the same direction of natural selection drives phenotypic changes among species since it is unlikely that these complex phenotypes have been acquired repeatedly just by neutral evolution". How about "Since it is unlikely that such a complex phenotype was acquired repeatedly by neutral evolution alone, the same direction of natural selection among species is likely to drive the parallel phenotypic change."?

      We agree with your suggestion and have corrected the sentence in our manuscript.

      Line 42-44: “It is considered that the same direction of natural selection drives phenotypic changes among species since it is unlikely that these complex phenotypes have been acquired repeatedly just by neutral evolution”

      “Since it is unlikely that such a complex phenotype was acquired repeatedly by neutral evolution alone, the same direction of natural selection among species is likely to drive the parallel phenotypic change”

      Line 60, polygenic --> likely to be polygenic

      We appreciate your comment. Indeed, it is better to weaken the wording.

      Line 60: “most traits are polygenic” -> “most traits are likely to be polygenic”

      Line 91, the Wnt --> the genes in Wnt

      We appreciate your correction. The last paragraph of the Introduction has been corrected according to the suggestion of Reviewer 2 (Q1).

      Line 230, NovaSeq --> Illumina NovaSeq

      We appreciate your correction.

      Line 222: “NovaSeq 6000” -> “Illumina NovaSeq 6000”

      Line 231 "mRNA Library Prep Kit". Please add a company name.

      We appreciate your correction. We added the company name.

      Line 223: “a TruSeq stranded mRNA Library Prep Kit.” -> “a TruSeq stranded mRNA Library Prep Kit (Illumina)”

      Line 267, as for the tip of hypertrophied lips, could you add and point out which part is the tip?

      We dissected the hypertrophied lips into two halves, anterior (tip) and posterior (base). We added this information to the Materials and Methods section.

      Line 156-158: “The lips of H. chilotes were analyzed separately for the base and tip.” -> “The lips of H. chilotes were dissected in two half anterior (tip) and half posterior (base), which are analyzed separately.”

      Line 272, "133 proteins upregulated and 5 proteins downregulated" in hypertrophied lip or normal lip?

      We appreciate your correction. We added the sentence as follows.

      Line 264: “133 proteins upregulated and 5 proteins downregulated”

      “133 proteins upregulated and 5 proteins downregulated in the hypertrophied lip”

      Line 274, "hypertrophied lips" means tip of hypertrophied lips?

      We appreciate your correction. We corrected the sentence as follows.

      Line 266: “hypertrophied lips are abundant” -> “tip of hypertrophied lips is abundant”

      Line 277, Did you perform multiple testing correction for statistical significance?

      We appreciate your comment about multiple testing corrections. We did not apply multiple testing corrections in our “exploratory” proteomics analysis, so as not to miss biologically important candidates given the limited sample size (n=3). We have now calculated multiple-testing-corrected p-values using the Benjamini-Hochberg method (Author response image 1, right). The result suggests that almost the same proteoglycans and related proteins that we focused on are highly accumulated in the hypertrophied lips under a milder threshold (significance level of 0.1).

      Author response image 1.

      Thus, our main conclusions remain unchanged even with the correction applied. However, the overall balance of the volcano plot is not visually appealing (Author response image 1, right).

      It is important to note that we selected the Top 20 proteins based on fold change rather than statistical significance. In addition, our proteomic findings are consistent with our histological and transcriptome data, providing biological validation from several angles. While we understand the potential benefits of multiple testing correction, our current approach without multiple testing correction still offers valuable and fair data for proposing hypotheses on the molecular mechanisms of lip hypertrophy in cichlids. Therefore, we would like to retain the original figure without multiple testing correction. We greatly appreciate the reviewer’s understanding.
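
      For readers unfamiliar with the correction discussed above, a minimal sketch of the Benjamini-Hochberg adjustment follows. It is a generic illustration, not the authors' analysis: the p-values are invented, the function name `benjamini_hochberg` is hypothetical, and the 0.1 threshold is the milder significance level mentioned in the response.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    scaled = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # Enforce monotonicity: each adjusted value is the minimum of itself
    # and all values at larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Invented p-values for four hypothetical proteins.
pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvals)

print(adjusted)          # adjusted p-values
print(adjusted < 0.1)    # which proteins pass at a significance level of 0.1
```

      With only three samples per group, raw p-values tend to be coarse, so the adjusted values can cluster; this is one reason a corrected volcano plot may look unbalanced.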

      Line 349-351, "The results of the enrichment analysis suggested that the genes that were categorized into both canonical and non-canonical Wnt signaling pathways, were highly expressed in the hypertrophied lips of juvenile and adult cichlids."

      The wnt category was enriched by analyzing the highly expressed genes, so isn't it natural that the wnt category is highly expressed?

      Did you mean to say as in the following sentence?

      "Enrichment of genes categorized in the canonical and noncanonical Wnt signaling pathways suggested that high expression of genes in the Wnt signaling pathway is likely to be involved in the hypertrophied lips of juvenile and adult fish."

      Thank you for your comments. We corrected our manuscript as follows.

      Line 341-344: “The results of the enrichment analysis suggested that the genes that were categorized into both canonical and non-canonical Wnt signaling pathways, were highly expressed in the hypertrophied lips of juvenile and adult cichlids.”

      “As a result of enrichment analysis, DEGs were categorized in the canonical and noncanonical Wnt signaling pathways, suggesting that high expression of genes in the Wnt signaling pathway is likely to be involved in the hypertrophied lips of juvenile and adult fish.”

      Line 403-404, "several other pathways may be involved in the development of hypertrophied lips". Do you have any evidence?

      We appreciate your comment regarding possible evidence for the involvement of multiple pathways in hypertrophied lip development. Our statement was based on two main points:

      (1) While we highlighted the Wnt pathway because this pathway is known to increase proteoglycan expression, we cannot exclude the possibility of the involvement of other pathways. For instance, our enrichment analysis in adult cichlids identified VEGF-related pathways, which could contribute to lip hypertrophy by increasing vascularization and nutrient supply to the lip tissue.

      (2) Previous quantitative trait locus (QTL) analysis by Henning et al. (2017) concluded that lip hypertrophy is likely influenced by numerous loci with small additive effects. This indicates that lip hypertrophy is a complex phenotype shaped by multiple genetic factors, some of which probably correspond to different molecular pathways.

      Given these points, we draw a conclusion that emphasizes the importance of the Wnt pathway while also recognizing the potential cooperative interaction of multiple pathways in the development of lip hypertrophy. To keep the two statements distinct, we corrected our manuscript as follows.

      Line 398-412: “We uncovered the apparent relationships between hypertrophied lips and the expression profiles of ECM proteins, in particularly proteoglycans. The trends for the overall expression of ECM-related genes were similar across hypertrophied lip species, but we rarely observed a specific gene that was commonly expressed at high or low levels in all three examples of hypertrophied lips across all East African Great Lakes. Furthermore, although we focused primarily on the relationship between the Wnt signaling pathway and lip hypertrophy, several other pathways may be involved in the development of hypertrophied lips. These findings imply that although enlargement of proteoglycan-rich loose connective tissue is common in hypertrophied lips, the developmental pathways to accomplish this are diverse in each lake.”

      “We uncovered the apparent relationships between hypertrophied lips and the expression profiles of ECM proteins, in particularly proteoglycans. The trends for the overall expression of ECM-related genes were similar across hypertrophied lip species, but we rarely observed a specific gene that was commonly expressed at high or low levels in all three examples of hypertrophied lips across all East African Great Lakes. Furthermore, although we focused primarily on the relationship between the Wnt signaling pathway and lip hypertrophy, several other pathways may be involved in the development of hypertrophied lips. For example, our enrichment analysis in adult cichlids identified VEGF-related pathways, which could contribute to lip hypertrophy by increasing vascularization and nutrient supply to the lip tissue. In addition, previous quantitative trait locus (QTL) analysis by Henning et al. (2017) concluded that lip hypertrophy is likely influenced by numerous loci with small additive effects. These lines of data imply that although enlargement of proteoglycan-rich loose connective tissue is common in hypertrophied lips, the developmental pathways to accomplish this are diverse in each lake.”

      Reviewer 2:

      Minor comments:

      Last paragraph of Introduction: Remove the results of this study.

      We appreciate your suggestion. We have removed the specific results from the last paragraph.

      “In this study, we comprehensively compared the hypertrophied lips of cichlids across all East African Great Lakes using histology, proteomics, and transcriptomics. Histological and proteomic analyses revealed a distinct microstructure of hypertrophied lips compared to normal lips, and primary candidate proteins were identified. Transcriptome analysis at different developmental stages showed that the genes in Wnt signaling pathway was highly expressed in cichlids with hypertrophied lips at both the juvenile and adult stages. It is noteworthy that the distinct expression profiles observed in the proteome and transcriptome analyses of hypertrophied lips were similar among cichlids from each of the East African Great Lakes. The present study, which integrates comprehensive analyses for cichlids from all East African Great Lakes, provides insight for a better understanding of the molecular basis of a typical example of parallel evolution.”

      Line 87-91: “In this study, we comprehensively compared the hypertrophied and normal lips of cichlids across all East African Great Lakes at various biological levels using histology, proteomics, and transcriptomics. As a result, we showed that a novel key pathway commonly involved in the formation of hypertrophied lips, providing insight into a better understanding of the molecular basis of a typical example of parallel evolution.”

      Line 156: Italicize the scientific names.

      We appreciate your correction.

      Line 148: “M. zebra and O. niloticus” -> “<i>M. zebra</i> and <i>O. niloticus</i>”

      Line 261: Remove the period after "Victoria."

      We appreciate your correction.

      Line 253: “Lake Victoria. (Figure 1; Figure S2).” -> “Lake Victoria (Figure 1; Figure S2).”

      Line 416: Remove the period after "tissue."

      We appreciate your correction.

      Line 420: “tissue. (A,B)” -> “tissue (A,B)”

      Line 646: Probably "the anterior side to the left."

      We apologize for our mistake. As you commented, the anterior side is to the left. We have corrected the manuscript as follows.

      Line 648: “the anterior side to the right” -> “the anterior side to the left”

      Fig. S2: Based on Fig. 1, the VG stained area appears larger in the Hypertrophied lip species; however, it is the opposite in Fig. S2.

      We appreciate your comment. The apparent discrepancy arises because we calculated the ratio of the VG-stained area to the whole lip area. While the absolute VG-stained area is larger in hypertrophied lips, the proportion of the VG-stained area relative to the total lip area is smaller. Normalizing by the whole lip area allows us to compare the degree of lip hypertrophy among species directly.
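      The distinction between absolute and relative stained area can be illustrated with a minimal sketch. The function name and the area values below are hypothetical, chosen only to show how a larger absolute area can still correspond to a smaller proportion; they are not measurements from the study.

```python
def stained_fraction(stained_area: float, total_area: float) -> float:
    """Return the stained area as a fraction of the whole lip area."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return stained_area / total_area


# Hypothetical areas in arbitrary units, for illustration only.
normal = {"stained": 2.0, "total": 4.0}
hypertrophied = {"stained": 3.0, "total": 12.0}

normal_fraction = stained_fraction(normal["stained"], normal["total"])
hyper_fraction = stained_fraction(hypertrophied["stained"], hypertrophied["total"])

# The hypertrophied lip has the larger absolute stained area
# but the smaller stained fraction.
print(normal_fraction, hyper_fraction)
```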

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Public Review

      Summary:

      (1) This work describes a simple mechanical model of worm locomotion, using a series of rigid segments connected by damped torsional springs and immersed in a viscous fluid.

      (2) It uses this model to simulate forward crawling movement, as well as omega turns.

      Strengths:

      (3) The primary strength is in applying a biomechanical model to omega-turn behaviors.

      (4) The biomechanics of nematode turning behaviors are relatively less well described and understood than forward crawling.

      (5) The model itself may be a useful implementation to other researchers, particularly owing to its simplicity.

      Weaknesses:

      (6) The strength of the model presented in this work relative to prior approaches is not well supported, and in general, the paper would be improved with a better description of the broader context of existing modeling literature related to undulatory locomotion.

      (7) This paper claims to improve on previous approaches to taking body shapes as inputs.

      (8) However, the sole nematode model cited aims to do something different, and arguably more significant, which is to use experimentally derived parameters to model both the neural circuits that induce locomotion as well as the biomechanics and to subsequently compare the model to experimental data.

      (9) Other modeling approaches do take experimental body kinematics as inputs and use them to produce force fields, however, they are not cited or discussed.

      (10) Finally, the overall novelty of the approach is questionable.

      (11) A functionally similar approach was developed in 2012 to describe worm locomotion in lattices (Majmudar, 2012, Roy. Soc. Int.), which is not discussed and would provide an interesting comparison and needed context.

      9-11: The paper you recommended and our manuscript have some similarities and differences.

      Similarities

      Firstly, the components constituting the worm are similar in both models. ElegansBot models the worm as a chain of n rods, while the study by Majmudar et al. (2012) models it as a chain of n beads. Each bead in the Majmudar et al. model has a directional vector, making it very similar to ElegansBot's rod. However, there's a notable difference: in the Majmudar et al. model, each bead has an area for detecting contact between the obstacle and the bead, while in ElegansBot, the rod does not feature such an area.

      Secondly, the types of forces and torques acting on the components constituting the worm are similar. Each rod in ElegansBot receives frictional force, muscle force, and joint force. Each bead in the Majmudar et al. model receives a constraint force, viscous force, and a repulsive force from obstacles. Each rod in ElegansBot receives frictional torque, muscle torque, and joint torque. Each bead in the Majmudar et al. model receives elastic torque, constraint torque, drive torque, and viscous torque. The Majmudar et al. model's constraint force and torque are similar to ElegansBot's joint force and torque in that they prevent two connected components of the worm from separating. The Majmudar et al. model's viscous force and torque are similar to ElegansBot's frictional force and torque in that they are forces exchanged between the worm and its surrounding environment (ground surface). The Majmudar et al. model's drive torque is similar to ElegansBot's muscle force and muscle torque as a cause of the worm's motion. However, unlike ElegansBot, the Majmudar et al. model did not consider the force generating the drive torque, and there are differences in how each force and torque is calculated. This will be discussed in more detail below.

      Differences

      Firstly, the medium in which the worm locomotes is different. ElegansBot is a model describing motion in a homogeneous medium like agar or water without obstacles, while the Majmudar et al. model describes motion in water with circular obstacles fixed at each lattice point. This is because the purposes of the models are different. ElegansBot analyzes locomotion patterns based on the friction coefficient, while the Majmudar et al. model analyzes locomotion patterns based on the characteristics of the obstacle lattice, such as the distance between obstacles. Also, for this reason, the Majmudar et al. model's bead, unlike ElegansBot's rod, receives a repulsive force from obstacles.

      Secondly, the specific methods of calculating similar types of forces differ. ElegansBot calculates joint forces by substituting frictional forces, muscle forces, frictional torques, and muscle torques into an equation obtained by differentiating twice with respect to time the boundary condition that two neighboring rods always meet at one point. This determines how the various forces and torques are transmitted across the worm: specifically, how the frictional forces and torques, as well as the muscle forces and torques acting on each rod, are distributed along the entire length of the worm. In contrast, the Majmudar et al. model uses the method of Lagrange multipliers, based on the boundary condition that the curve length determined by each bead's tangential angle does not change, to calculate the constraint force and torque before calculating the drive torque and viscous force. This implies that the Majmudar et al. model did not consider the mechanism by which the drive torque and viscous force received by one bead are distributed throughout the worm. In addition, ElegansBot's rod receives an anisotropic Stokes frictional force from the ground surface, while the Majmudar et al. model considered the frictional force according to the Navier-Stokes equation for an incompressible fluid, assuming the fluid velocity at the bead's location to be the bead's velocity.

      Thirdly, unlike the Majmudar et al. model, ElegansBot considers the inertia of the worm components. Therefore, ElegansBot can run simulations regardless of how low or high the friction coefficient of the ground surface is. The Majmudar et al. model does not have this property.
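      As an illustration of what an anisotropic Stokes-type friction on a rod means, the following minimal sketch splits the rod's velocity into components along and across its axis and applies a separate linear drag coefficient to each. The function name, argument names, and coefficient values are assumptions made for this sketch and are not taken from the ElegansBot source code.

```python
import math


def anisotropic_drag(vx, vy, heading, b_par, b_perp):
    """Linear (Stokes-type) drag on a rod moving with velocity (vx, vy).

    heading: rod orientation in radians.
    b_par, b_perp: hypothetical drag coefficients along and across the rod.
    Returns the drag force (fx, fy), which opposes the motion.
    """
    tx, ty = math.cos(heading), math.sin(heading)  # unit tangent of the rod
    nx, ny = -ty, tx                               # unit normal of the rod
    v_par = vx * tx + vy * ty                      # velocity along the rod
    v_perp = vx * nx + vy * ny                     # velocity across the rod
    fx = -(b_par * v_par * tx + b_perp * v_perp * nx)
    fy = -(b_par * v_par * ty + b_perp * v_perp * ny)
    return fx, fy
```

      When the perpendicular coefficient exceeds the parallel one, sideways slip is resisted more strongly than sliding along the body axis, which is the condition under which a travelling body wave can produce net thrust.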

      (12) The idea of applying biomechanical models to describe omega turns in C. elegans is a good one, however, the kinematic basis of the model as used in this paper (the authors do note that the control angle could be connected to a neural model, but don't do so in this work) limits the generation of neuromechanical control hypotheses.

      8, 12: We do not agree with the claim that ElegansBot could limit other researchers in generating neuromechanical control hypotheses. The term θ_("ctrl" ,i)^((t) ) used in our model is designed to be replaceable with neuromechanical control in the future.
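      One way to picture a replaceable control term is to treat the control angle as a function of joint index and time, so that any source with the same signature can be substituted. The sketch below is purely illustrative: the type alias, the function names, and the sinusoidal travelling-wave controller are assumptions for this example and do not describe the ElegansBot interface.

```python
import math
from typing import Callable, List

# A controller maps (joint index, time) to a target bending angle in radians.
ControlAngle = Callable[[int, float], float]


def traveling_wave(amplitude: float, wavelength_joints: float, period: float) -> ControlAngle:
    """Hypothetical kinematic controller: a sinusoid travelling from head to tail."""
    def theta_ctrl(i: int, t: float) -> float:
        phase = 2.0 * math.pi * (t / period - i / wavelength_joints)
        return amplitude * math.sin(phase)
    return theta_ctrl


def sample_targets(controller: ControlAngle, n_joints: int, t: float) -> List[float]:
    """Evaluate the controller at every joint for a single time point."""
    return [controller(i, t) for i in range(n_joints)]
```

      Under this design, a neural circuit model would only need to expose the same `(joint index, time) -> angle` signature to replace the kinematic controller.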

      (13) The model may provide insights into the biomechanics of such behaviors, however, the results described are very minimal and are purely qualitative.

      (14-1) Overall, direct comparisons to the experiments are lacking or unclear.

      14-1: The text explaining Figs. 2 and 5 (Figs. 2 and 4 in the old version) directly compares the velocity, wave number, and period, as numerical indicators of the worm's behavior, between the experiment and ElegansBot.

      (14-2) Furthermore, the paper claims the value of the model is to produce the force fields from a given body shape, but the force fields from omega turns are only pictured qualitatively.

      13, 14-2: We gratefully accept the point that our analysis of the omega-turn is qualitative. Therefore, we have conducted additional quantitative analysis on the omega-turn and inserted the results into the new Fig. 4. We have considered the term 'Force field' as referring to the force vector received by each rod. We have created numerical indicators representing various behaviors of the worm and included them in the revised manuscript.

      (15) No comparison is made to other behaviors (the force experienced during crawling relative to turning for example might be interesting to consider) and the dependence of the behavior on the model parameters is not explored (for example, how does the omega turn change as the drag coefficients are changed).

      Thank you for the great idea. Comparing behaviors first requires a clear criterion for distinguishing them. Therefore, we have created a new mathematical definition for behavior classification in the revised manuscript (“Defining Behavioral Categories” in Method). We then compared the force and power (rate of energy consumption) among forward locomotion, backward locomotion, and the omega-turn (Fig. 4). In the revised manuscript, we also newly analyzed how the turning behavior changes with variations in the friction coefficients (Figs. S4-S7).

      (16) If the purpose of this paper is to recapitulate the swim-to-crawl transition with a simple model, and then apply the model to new behaviors, a more detailed analysis of the behavior of the model variables and their dependence on the variables would make for a stronger result.

      In our revised manuscript, we have quantitatively analyzed the changes occurring in turning behavior from water to agar, and the results are presented in Figs. S9 and S10.

      (17) In some sense, because the model takes kinematics as an input and uses previously established techniques to model mechanics, it is unsurprising that it can reproduce experimentally observed kinematics, however, the forces calculated and the variation of parameters could be of interest.

      (18) Relatedly, a justification of why the drag coefficients had to be changed by a factor of 100 should be explored.

      (19) Plate conditions are difficult to replicate and the rheology of plates likely depends on a number of factors, but is for example, changes in hydration level likely to produce a 100-fold change in drag? or something more interesting/subtle within the model producing the discrepancy?

      18, 19: As mentioned in the paper, we do not know whether the friction coefficients in the study of Boyle et al. (2012) and those in the experiment of Stephens et al. (2016) are the same. In our revised manuscript, we have explored in more detail the effects of the friction coefficient's scale factor and explained why we chose a scale factor of 1/100 (“Proper Selection of Friction Coefficients” in Supplementary Information). In summary, we analyzed the changes in trajectory due to scaling of the friction coefficient and chose the scale factor 1/100 because it allowed ElegansBot to accurately reproduce the worm's trajectory while also remaining close to the friction coefficients in the Boyle et al. paper.

      (20) Finally, the language used to distinguish different modeling approaches was often unclear.

      (21) For example, it was unclear in what sense the model presented in Boyle, 2012 was a "kinetic model" and in many situations, it appeared that the term kinematic might have been more appropriate.

      Thank you for the feedback. As you pointed out, we have corrected that part to 'kinematic' in the revised manuscript.

      (22) Other phrases like "frictional forces caused by the tension of its muscles" were unclear at first glance, and might benefit from revision and more canonical usage of terms.

      We agree that the expression may not be immediately clear. This is due to the word limit for the abstract (the abstract of eLife VOR should be under 200 words, and our paper's abstract is 198 words), which forced us to convey the causality in a limited number of words. Therefore, although we will not change the abstract, the expression in question means that the muscle tension, which is the cause of the worm's locomotion, ultimately generates the frictional force between the worm and the ground surface.

      Recommendations For The Authors

      (23) As I stated in my public review, I think the paper could be made much stronger if a more detailed exploration of turning mechanics was presented.

      (24) Relatedly, rather than restricting the analysis to individual videos of turning behaviors, I wonder if a parameterized model of the turning kinematics would be fruitful to study, to try to understand how different turning gaits might be more or less energetically favorable.

      We thank the reviewer once again for their suggestion. Thanks to their proposal, we were able to conduct additional quantitative analysis on turning behavior.

      Reviewer #2

      Public Review

      Summary:

      (1) Developing a mechanical model of C. elegans is difficult to do from basic principles because it moves at a low (but not very small) Reynolds number, is itself visco-elastic, and often is measured moving at a solid/liquid interface.

      (2) The ElegansBot is a good first step at a kinetic model that reproduces a wide range of C. elegans motility behavior.

      Strengths:

      (3) The model is general due to its simplicity and likely useful for various undulatory movements.

      (4) The model reproduces experimental movement data using realistic physical parameters (e.g. drags, forces, etc).

      (5) The model is predictive (semi?) as shown in the liquid-to-solid gait transition.

      (6) The model is straightforward in implementation and so likely is adaptable to modification and addition of control circuits.

      Weaknesses:

      (7) Since the inputs to the model are the actual shape changes in time, parameterized as angles (or curvature), the ability of the model to reproduce a realistic facsimile of C. elegans motion is not really a huge surprise.

      (8) The authors do not include some important physical parameters in the model and should explain in the text these assumptions.

      (9. 1) The cuticle stiffness is significant and has been measured [1].

      (10. 2) The body of C. elegans is under high hydrostatic pressure which adds an additional stiffness [2].

      (11. 3) The visco-elasticity of C. elegans body has been measured. [3]

      Thank you for asking. The stiffness of C. elegans is an important consideration. We took this into account when creating ElegansBot, but did not explain it in the paper. The detailed explanation is as follows. C. elegans indeed has stiffness due to its cuticle and internal pressure. This stiffness is treated as a passive elastic force (elastic force term of lateral passive body force) in the paper of Boyle et al. (2012). However, the maximum spring constant of the passive elastic force is 1/20 of the maximum spring constant of the active elastic force. If we consider this fact in our model, the elastic term of the muscle torque is as follows: ( is the active torque elasticity coefficient, is the passive torque elasticity coefficient)

      where

      Therefore, there is no need to describe the active and passive terms separately in

      Furthermore, since , assuming , then and .
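      The argument that the active and passive elastic terms need not be described separately can be sketched as follows. The notation is assumed for this sketch and may differ from the manuscript's: $\kappa_a$ and $\kappa_p$ denote the active and passive torque elasticity coefficients, $\theta_i$ the bending angle at joint $i$, and $\theta_{\mathrm{ctrl},i}$ the control angle. The sketch further assumes that the passive term restores the joint toward a straight posture ($\theta_i = 0$) and adopts a sign convention in which elastic torques oppose the deviation.

```latex
\begin{align}
\tau_i^{\text{elastic}}
  &= -\kappa_a\left(\theta_i - \theta_{\mathrm{ctrl},i}\right) - \kappa_p\,\theta_i \\
  &= -\left(\kappa_a + \kappa_p\right)
     \left(\theta_i - \frac{\kappa_a}{\kappa_a + \kappa_p}\,\theta_{\mathrm{ctrl},i}\right)
\end{align}
```

      Under these assumptions the two terms collapse into a single spring with effective coefficient $\kappa_a + \kappa_p$ and a rescaled control angle. With $\kappa_p = \kappa_a/20$, as in the ratio quoted above, the effective coefficient is $\tfrac{21}{20}\kappa_a$ and the control angle is scaled by $\tfrac{20}{21} \approx 0.95$, so both stay close to their active-only values.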

      (12) There is only a very brief mention of proprioception.

      (13) The lack of inclusion of proprioception in the model should be mentioned and referenced in more detail in my opinion.

      As you emphasized, proprioception is an important aspect in the study of C. elegans' locomotion. In our paper, its importance is briefly introduced with a sentence each in the introduction and discussion. However, our research is a model about the process of the creation of body motion originated from muscle forces, and it does not model the sensory system that senses body posture. Therefore, there is no mention of using proprioception in our paper's results section. What is mentioned in the discussion is that ElegansBot can be applied as the kinetic body model part in a combination model of a kinetic body model and a neuronal circuit model that receives proprioception as a sensory signal.

      (14) These are just suggested references.

      (15) There may be more relevant ones available.

      The papers you provided contain specific information about the Young's modulus of the C. elegans body. The first paper (Rahimi et al., 2022) measured the Young's modulus of the cuticle after chemically isolating it from C. elegans, while the second paper (Park et al., 2007) and third paper (Backholm et al., 2013) measured the elasticity and Young's modulus of C. elegans without separating the cuticle. Based on the Young's modulus provided in each paper (although the second and third papers did not measure stiffness in the longitudinal direction), we derived the elastic coefficient (assuming a worm radius of 25 μm, a cuticle thickness of 0.5 μm, and a longitudinal cuticle segment length of 40 μm, i.e., 1/25 of the cuticle's longitudinal length). The range was quite broad, from 9.82 × 10<sup>11</sup> μg/sec<sup>2</sup> (from the first paper) to 2.16 × 10<sup>8</sup> μg/sec<sup>2</sup> (from the third paper). The elastic coefficient value in our paper falls within this range. Because the range is wide, the elastic coefficient in our model can be updated and the model reapplied if more accurate values become known in the future.

      Reviewer #3

      Public Review

      Summary:

      (1) A mechanical model is used with input force patterns to generate output curvature patterns, corresponding to a number of different locomotion behaviors in C. elegans

      Strengths:

      (2) The use of a mechanical model to study a variety of locomotor sequences and the grounding in empirical data are strengths.

      (3) The matching of speeds (though qualitative and shown only on agar) is a strength.

      Weaknesses:

      (4) What is the relation between input and output data?

      ElegansBot takes the worm's body control angle as the input, and produces trajectory and force of each segment of the worm as the output.

      (5) How does the input-output relation depend on the parameters of the model?

      If 'parameter' is understood as vertical and horizontal friction coefficients, then the explanation for this can be found in Fig. 5 (Fig. 4 in the old version).

      (6) What biological questions are addressed and can significant model predictions be made?

      The model provides an equation of motion that describes the locomotion of C. elegans, including turning behaviors, which have been relatively less well understood.

      Recommendations For The Authors

      (7) The novelty and significance of the paper should be clarified.

      We have added quantitative analyses of turning behavior in the revised manuscript, and we hope this will be helpful to you.

      (8) Previously much more detailed models have been published, as compared to this one.

      We hope the reviewer can point out any previous model that we may have missed.

      (9) The mechanics here are simplified (e.g. no information about dorsal/ventral innervation but only a bending angle) setting limitations on the capacity for model predictiveness.

      (10) Such limitations should be discussed.

      We view the difference between dorsal/ventral innervation and bending angle not as a matter of simplification, but rather as a reflection of the hierarchy that our model implements. Our model does not consider dorsal/ventral innervation, but it uses the bending angle to reproduce behavior in various input and frictional environments, which signifies the strong predictiveness of ElegansBot (Figure 2, 3, 5 (2, 3, 4 in the old version)). Moreover, if the midline of C. elegans is incompressible, then modeling by dividing into dorsal/ventral, as opposed to modeling solely with the bending angle, does not increase the degree of freedom of the worm model, and therefore does not increase its predictiveness.
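      The degree-of-freedom argument can be made concrete with a small geometric sketch. It assumes that each segment bends into a circular arc, that the midline length and body half-width are fixed, and that a positive bending angle shortens the ventral side; the function and parameter names are hypothetical. Under these assumptions the dorsal and ventral lengths both follow from the single bending angle.

```python
def side_lengths(midline_length: float, half_width: float, bend_angle: float):
    """Dorsal and ventral arc lengths of one segment bent into a circular arc.

    bend_angle is the total angle (radians) subtended by the segment.
    For an arc of midline length L and angle a, the radius is L / a, so the
    outer and inner arcs have lengths L + r * a and L - r * a for half-width r.
    """
    dorsal = midline_length + half_width * bend_angle
    ventral = midline_length - half_width * bend_angle
    return dorsal, ventral


# Illustrative values only: a 40 um segment with 25 um half-width bent by 0.2 rad.
dorsal, ventral = side_lengths(40.0, 25.0, 0.2)
```

      Because the two side lengths always average to the fixed midline length, specifying them separately adds no independent variable beyond the bending angle.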

      (11) The aims of the paper and results need to be supported quantitatively and analyzed through parameter sweeps and intervention.

      We have conducted additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

      (12) The methods are given only in broad brushstrokes, and need to be much more clear (and ideally sharing all code).

      We have thoroughly detailed every aspect of this research, from deriving the physical constants of C. elegans, agar, and water to developing the formulas and proofs necessary for operating ElegansBot and its applications. This comprehensive information is all presented in the Results, Methods, and Supplementary Information sections, as well as in the source code. Moreover, we have already ensured that our research can be easily reproduced by providing detailed explanations and by making ElegansBot accessible through public software databases (PyPI, GitHub). To further aid in its application and understanding, especially for those less familiar with the subject, we have also included minimal code as examples in the database. This code is designed to simplify the process of reproducing the results of the paper, thereby making our research more accessible and understandable. Therefore, we believe that readers will easily gain significant assistance from the extensive information we have provided. Should readers require further help, they can always contact us, and we will be readily available to offer support.

      (13) The supporting figures and movies need to include a detailed analysis to evidence the claims.

      We have conducted and provided additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen et al. used cryo-ET and in vitro reconstituted system to demonstrate that the autoinhibited form of LRRK2 can also assemble into filaments that wrap around the microtubule, although the filaments are typically shorter and less regular compared to the previously reported active-LRRK2 filaments. The structure revealed a new interface involving the N-terminal repeats that were disordered in the previous active-LRRK2 filament structure. The autoinhibited-LRRK2 filament also has different helical parameters compared to the active form.

      Strengths:

      The structure obtained in this study is the highest resolution of LRRK2 filaments done by subtomogram averaging, representing a major technical advance compared to the previous Cell paper from the same group. Overall, I think the data are well presented with beautiful graphic rendering, and valuable insights can be gained from this structural study.

      Weaknesses:

      (1) There are only three main figures, together with 9 supplemental figures. The authors may consider breaking the currently overwhelming Figures 1 and 3 into smaller figures and moving some of the supplemental figures to the main figure, e.g., Figure S7.

      (2) The key analysis of this manuscript is to compare the current structure with the previous active-LRRK2 filament structure. Currently, such a comparison is buried in Figure 3H. It should be part of Figure 1.

      We thank the reviewer for this suggestion. As suggested, we have rearranged the figures, split Figure 1 and 3 into smaller Figures, and moved the comparison analysis in Figure 3H to the new Figure 1. Specifically, the old Figure 1 is separated into two figures, introducing the model-building process and describing the two symmetric axes. The old Figure 3 is also separated into two small figures, describing the geometric analysis and model comparison, respectively.

      Reviewer #2 (Public review):

      The authors of this paper have done much pioneering work to decipher and understand LRRK2 structure and function, to uncover the mechanism by which LRRK2 binds to microtubules, and to study the roles that this may play in biology. Their previous data demonstrated that LRRK2 in the active conformation (pathogenic mutation or Type I inhibitor complex) bound to microtubule filaments in an ordered helical arrangement. This they showed induced a "roadblock" in the microtubule impacting vesicular trafficking. The authors have postulated that this is a potentially serious flaw with Type 1 inhibitors and that companies should consider generating Type 2 inhibitors in which the LRRK2 is trapped in the inactive conformation. Indeed the authors have published much data that LRRK2 complexed to Type 2 inhibitors does not seem to associate with microtubules and cause roadblocks in parallel experiments to those undertaken with type 1 inhibitors published above.

      In the current study, the authors have undertaken an in vitro reconstitution of microtubule-bound filaments of LRRK2 in the inactive conformation, which surprisingly revealed that inactive LRRK2 can also interact with microtubules in its auto-inhibited state. The authors' data shows that while the same interfaces are seen with both the active LRRK2 and inactive microtubule bound forms of LRRK2, they identified a new interface that involves the WD40-ARM-ANK- domains that reportedly contributes to the ability of the inactive form of LRRK2 to bind to microtubule filaments. The structures of the inactive LRRK2 complexed to microtubules are of medium resolution and do not allow visualisation of side chains.

      This study is extremely well-written and the figures are incredibly clear and well-presented. The finding that LRRK2 in the inactive autoinhibited form can be associated with microtubules is an important observation that merits further investigation. This new observation makes an important contribution to the literature and builds upon the pioneering research that this team of researchers has contributed to the LRRK2 fields. However, in my opinion, there is still significant work that could be considered to further investigate this question and understand the physiological significance of this observation.

      We thank the reviewer for the positive comments and we agree that more work can be done next to understand the physiological significance of the autoinhibited LRRK2 in cellular environments. We are actively working on understanding how the stability of autoinhibited full-length LRRK2 is regulated, especially how the transfer between autoinhibited and active forms of LRRK2 can happen. Our in situ data (Watanabe et al. 2020) indicates that overexpressed hyperactive PD-mutant LRRK2 mainly adopts its active-like conformation in cells. Thus, learning how the state transfer occurs will allow us to target autoinhibited LRRK2 specifically and efficiently in cells and study its structure and function in physiological conditions.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chen et al examines the structure of the inactive LRRK2 bound to microtubules using cryo-EM tomography. Mutations in this protein have been shown to be linked to Parkinson's Disease. It is already shown that the active-like conformation of LRRK2 binds to the MT lattice, but this investigation shows that full-length LRRK2 can oligomerize on MTs in its autoinhibited state with different helical parameters than were observed with the active-like state. The structural studies suggest that the autoinhibited state is less stable on MTs.

      Strengths:

      The protein of interest is very important biomedically and a novel conformational binding to microtubules is proposed.

      Weaknesses:

      (1) The structures are all low resolution.

      We thank the reviewer for the comments on both the strengths and weaknesses of the manuscript. We agree with the reviewer that higher resolution would provide more information about how LRRK2 interacts with microtubules and oligomerizes in its autoinhibited form. However, with the current resolution, our model-building benefited significantly from the published high-resolution models and the AlphaFold predictions. We used cryo-ET and subtomogram analysis to solve the structure because this filament is less regular than the right-handed active LRRK2 filament, preventing us from using conventional single-particle analysis. As highlighted by reviewer 1, being able to push the resolution to sub-nanometer is an important advance reflecting state-of-the-art subtomogram analysis, especially for a heterogeneous sample. Notably, the microtubule reconstruction reached higher resolution, comparable to our previous single-particle studies on LRRK2-RCKW (Snead and Matyszewski et al.), confirming the data quality.

      (2) There are no measurements of the affinity of the various LRRK2 molecules (with and without inhibitors) to microtubules. This should be addressed through biochemical sedimentation assay.

      We thank the reviewer for the suggestion and we agree that learning the binding affinity between LRRK2 and microtubules would be informative. We attempted to purify LRRK2 carrying mutations at the WD40:ARM/ANK interface we identified in the manuscript. Unfortunately, for both LRRK2 and LRRK2<sup>I2020T</sup> carrying the N-terminal mutations (R521A/F573A/E854K), the yield and purity of the final samples were significantly worse than in our routine LRRK2 prep. Our chromatography and gel electrophoresis results indicate that the proteins degrade during purification.

      Author response image 1.

      While we have attached the results here, and it would be interesting to investigate why N-terminal mutations destabilize LRRK2, we anticipate that significant efforts would be required for further experiments, which we respectfully consider outside of the scope of this manuscript. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure S9, the graphic definition of "chain length" in panel A is misleading. The authors can simply note in the figure legend that "chain length is the number of asymmetric units in a continuous chain".

      We thank the reviewer for the suggestion. The updated figure and legend have incorporated the changes.

      (2) In Figure S7B, the conformation changes of the 'G-loop' and the 'DYG' motifs are not so convincing at the current resolution.

      We thank the reviewer for pointing it out. We agree that our model resolution is not high enough to support the unbiased observation of the conformation changes of the key kinase motifs. In the revised manuscript, we avoided emphasizing the comparison between the two models. Instead, we state that for both the MLi-2 bound map and the GZD-824 bound map, the corresponding published high-resolution models fit into each kinase map, but the MLi-2 bound model doesn’t fit as well in the GZD-824 bound map, with the correlation value dropping from 0.44 to 0.4, supporting our statement that “full-length LRRK2 bound to microtubules is in its autoinhibited state in our reconstituted system”.

      Reviewer #2 (Recommendations for the authors):

      (1) Are there any cellular experiments that could be done to demonstrate that inactive LRRK2 associates with microtubules in cells?

      We thank the reviewer for pointing out this direction for future studies. We are studying the physiological significance of the autoinhibited LRRK2 in cells, but haven’t yet been successful at demonstrating physiological binding to microtubules. Further, as noted in our response to reviewer #3, we are also actively working on understanding how the stability of autoinhibited full-length LRRK2 is regulated, especially how the transition between autoinhibited and active forms of LRRK2 can happen. Our in situ data (Watanabe et al. 2020) indicate that hyperactive PD-mutant overexpressed LRRK2 mainly adopts its active-like conformation in cells. Thus, learning how the state transition occurs will allow us to target autoinhibited LRRK2 specifically and efficiently in cells and study its structure and function in physiological conditions.

      (2) Previous work that the authors and others have undertaken has suggested that only LRRK2 in its active conformation can associate with microtubule filaments and the authors have shown that this leads to a roadblock in vesicular transport only when LRRK2 is complexed with Type 1 but not Type 2 inhibitors. There seems to be some discrepancy here that is not addressed in the paper as based on the current results one would also expect LRRK2 bound to Type 2 inhibitors to induce roadblocks in microtubule filaments. How can this be explained?

      We thank the reviewer for raising this important question. Taking all of our published data together, we believe that LRRK2 can introduce roadblocks with Type 1 inhibitor bound in the active-like conformation, where N-terminus LRRK2 domains are flexible and don’t block the kinase active site. In other words, full-length LRRK2 can form roadblocks when it behaves more like the truncated LRRK2<sup>RCKW</sup> variant. The autoinhibited LRRK2 forms shorter and less stable oligomers on microtubules, making it harder to block transport. Consistent with this, our in situ LRRK2-microtubule structure was observed in cells where LRRK2 is in an active-like conformation, and the LRRK2 N-terminus appeared to be flexible and away from the microtubule when forming right-handed filaments.

      (3) Does the finding that inactive LRRK2 only binds to microtubules as a short filament, explain the differences between the inactive and active forms of LRRK2 binding to microtubules and causing roadblocks?

      We thank the reviewer for discussing this point with us and asking the question. As we replied in the previous comment, the reviewer’s conclusion explains how the roadblock phenomenon occurs only under certain circumstances. We expanded our discussion to add the following and address the question:

      “Notably, we previously demonstrated that active‐like LRRK2, when bound to a Type I inhibitor, can form roadblocks that impair vesicular transport. Since autoinhibited LRRK2 assembles into shorter, less stable oligomers on microtubules, we anticipate it will exert reduced road‐blocking effects in cells, regardless of the inhibitor bound.”

      (4) Could the authors undertake further characterization of the new WD40-ARM-ANK interphase that they have identified? Is this important for the binding of the autoinhibited mutant? Could mutants be made in this interphase to see if this prevents the autoinhibited but not the active conformation of LRRK2 binding to microtubules?

      We thank the reviewer for the comment. As mentioned in our response to Reviewer #2, public comment #2, we attempted multiple times to purify LRRK2 with mutations at the WD40:ARM/ANK interface we identified in the manuscript. Unfortunately, for both LRRK2 and LRRK2<sup>I2020T</sup> carrying the N-terminal mutations (R521A/F573A/E854K), the yield and purity of the final samples were significantly worse than for our routine LRRK2 prep. Our chromatography and gel electrophoresis results indicate that the proteins degrade during purification.

      (5) The authors identify several disease-relevant missense mutations that appear to lie within the novel interphase that the authors have characterised in this study. Although this is discussed in the Discussion, some experimental data demonstrating how these missense mutations impact the ability of inactive LRRK2 to bind to microtubule filaments in the presence or absence of Type 1 and Type 2 compounds could provide further experimental data that emphasises the physiological importance of the results presented in this study.

      We thank the reviewer for discussing this interesting direction. The disease-relevant missense mutations can have a direct or indirect impact on the binding of autoinhibited LRRK2 to microtubules, and we agree that it would be interesting to test it out in the future. However, we anticipate that significant effort would be required for further experiments. Alas, our funding for this project ended suddenly and we want to report our results to the community.

      (6) For the data that is shown in Figure 1, could the authors explain how this differs from results in previous papers of the authors showing that the active form of LRRK2 binds microtubules? How does the binding observed here differ from that observed in the previous studies? To a non-specialist reader, the data looks fairly like what has previously been reported.

      We thank the reviewer for asking the question. As mentioned in the response to the public review, the detailed comparison between the data and the previous papers is described in Figure 3, and we agree that it is helpful to incorporate this information in Figure 1. In the revised manuscript, we have incorporated the comparison panel in Figure 1.

      (7) The finding that the autoinhibited LRRK2 forms short and sparse oligomers on microtubules raises the question of how physiological this observation is. Having some data that suggests that this is physiologically relevant would boost the impact of this study.

      We agree with the reviewer on this comment. As discussed in the response to the first comment from the reviewer, we have not been able to assess the physiological relevance of LRRK2 binding to microtubules in either active or inactive state, but continue to pursue this line of research. We are aware and regret that this lessens the impact of this work.

      (8) For the more general reader the authors could potentially better highlight why the key finding in this paper is important.

      We thank the reviewer for the suggestion. To further address the significance of the key findings, especially how it can open up more possibilities for inhibitor-based drug development, we expand our discussion section to include the following:

      “Understanding how Type I and Type II inhibitors’ binding to LRRK2 affects its mechanism is vital to the design of inhibitor-based PD drug development strategies. Our findings revealed that different LRRK2 kinase inhibitors bind to autoinhibited LRRK2 similarly either in solution or on microtubules. Furthermore, the observation of autoinhibited LRRK2 forming short, less stable oligomers on microtubules opens new possibilities to inhibit LRRK2 activity in PD patients. A Type I inhibitor specifically targeting autoinhibited LRRK2 may alleviate the effect of LRRK2 roadblocks on microtubules. Alternatively, a promising strategy of LRRK2 inhibitor design can focus on the stabilization of allosteric N-terminus blocking on the kinase domain, which favors the formation of autoinhibited LRRK2 oligomers on microtubules and causes fewer side effects.”

      Reviewer #3 (Recommendations for the authors):

      In the third paragraph of the introduction, expand on whether type-1 inhibitors which "capture kinases in a closed, "active-like" conformation still inhibit the kinase activity.

      We thank the reviewer for the request to expand this paragraph. We added the following explanation for better understanding in the third paragraph:

      “Type-I inhibitors bind to the ATP binding site and target the kinase in its ‘active-like' conformation, inhibiting its kinase activity.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly per the following comments:

      1. Pro1153Leu is extremely common in the general population (allele frequency in gnomAD is 0.5). Further discussion is warranted to justify the possibility that this variant contributes to a phenotype documented in 1.5-3% of the population. Is it possible that this variant is tagging other rare SNPs in the COL11A1 locus, and could any of the existing exome sequencing data be mined for rare nonsynonymous variants?

      One possible avenue for future work is to return to any existing exome sequencing data to query for rare variants at the COL11A1 locus. This should be possible for the USA MO case-control cohort. Any rare nonsynonymous variants identified should then be subjected to mutational burden testing, ideally after functional testing to diminish any noise introduced by rare benign variants in both cases and controls. If there is a significant association of rare variation in AIS cases, then they should consider returning to the other cohorts for targeted COL11A1 gene sequencing or whole exome sequencing (whichever approach is easier/less expensive) to demonstrate replication of the association.

      Response: Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for the reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R<sup>2</sup> > 0.6) with the top SNP rs3753841. We next reviewed rare (MAF ≤ 0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 KB LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (see table below). Two of them (NM_080629.2:c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD=25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.

      Author response table 1.
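
      The extrapolation from the sequenced subset to the full discovery cohort can be checked with a few lines of arithmetic. The sketch below assumes that the "4-5 individuals" estimate is based on the two carriers of predicted-deleterious variants among the 625 exomes; that reading is an interpretation of the response, and the variable names are illustrative only.

```python
# Back-of-the-envelope check of the extrapolation described in the response.
# Counts are taken from the text; the choice of numerator is an assumption.
exomes_available = 625        # exomes reviewed
discovery_cases = 1358        # total cases in the discovery cohort
deleterious_carriers = 2      # individuals with a predicted-deleterious rare variant

# Share of discovery cases that had exome data
fraction_sequenced = exomes_available / discovery_cases

# Scale the observed carrier rate up to the whole cohort
expected_carriers = deleterious_carriers / exomes_available * discovery_cases

print(f"fraction of cases with exomes: {fraction_sequenced:.0%}")
print(f"expected carriers in full cohort: {expected_carriers:.1f}")
```

      Under this assumption the scaled count falls between 4 and 5, in line with the range quoted in the response.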

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We did conduct a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKATO P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      2. COL11A1 p.Pro1335Leu is pursued as a direct candidate susceptibility locus, but the functional validation involves both: (a) a complementation assay in mouse GPCs, Figure 5; and (b) cultured rib cartilage cells from Col11a1-Ad5 Cre mice (Figure 4). Please address the following:

      2A. Is Pro1335Leu a loss of function, gain of function, or dominant negative variant? Further rationale for modeling this change in a Col11a1 loss of function cell line would be helpful.

      Response: Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, we have noted that spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

      2B. Expression appears to be augmented compared WT in Fig 5B, but there is no direct comparison of WT with variant.

      Response: Expression of the mutant (from the lentiviral expression vector) is increased compared to wildtype. We observed this effect in repeated experiments. Sequencing confirmed that the mutant and wildtype constructs differed only at the position of the rs3753841 SNP. At this time, we cannot explain the difference in expression levels. Nonetheless, even when the variant COL11A1 is relatively overexpressed, it fails to suppress MMP3 expression as observed for the wildtype form.

      2C. How do the authors know that their complementation data in Figure 5 are specific? Repetition of this experiment with an alternative common nonsynonymous variant in COL11A1 (such as rs1676486) would be helpful as a comparison with the expectation that it would be similar to WT.

      Response: We agree that testing an allelic series throughout COL11A1 could be informative, but we have shifted our resources toward in vivo experiments that we believe will ultimately be more informative for deciphering the mechanistic role of COL11A1 in MMP3 regulation and spine deformity.

      2D. The y-axes of histograms in panel A need attention and clarification. What is meant by power? Do you mean fold change?

      Response: Power is directly comparable to fold change but allows comparison of absolute expression levels between different genes.

      2E. Figure 5: how many technical and biological replicates? Confirm that these are stated throughout the figures.

      Response: Thank you for pointing out this oversight. This information has been added throughout.

      3. Figure 2: What does the gross anatomy of the IVD look like? Could the authors address this by showing an H&E of an adjacent section of the Fig. 2 A panels?

      Response: Panel 2 shows H&E staining. Perhaps the reviewer is referring to the WT and Pax1 KO images in Figure 3? We have now added H&E staining of WT and Pax1 KO IVD as supplemental Figure 3E to clarify the IVD anatomy.

      4. Page 9: "Cells within the IVD were negative for Pax1 staining ..." There seems to be specific PAX1 expression in many cells within the IVD, which is concerning if this is indeed a supposed null allele of Pax1. This data seems to support that the allele is not null.

      Response: We have now added updated images for the COL11A1 and PAX1 staining to include negative controls in which we omitted primary antibodies. As can be seen, there is faint autofluorescence in the PAX1 negative control that appears to explain the “specific staining” referred to by the reviewer. These images confirm that the allele is truly a null.

      5. There is currently a lack of evidence supporting the claim that "Col11a1 is positively regulated by Pax1 in mouse spine and tail". Therefore, it is necessary to conduct further research to determine the direct regulatory role of Pax1 on Col11a1.

      Response: We agree with the reviewer and have clarified that Pax1 may have either a direct or indirect role in Col11a1 regulation.

      6. There is no data linking loss of COL11A1 function and spine defects in the mouse model. Furthermore, due to the absence of P1335L point mutant mice, it cannot be confirmed whether P1335L can actually cause AIS, and the pathogenicity of this mutation cannot be directly verified. These limitations need to be clearly stated and discussed. A Col11a1 mouse mutant called chondrodysplasia (cho) was shown to be perinatal lethal with severe endochondral defects (https://pubmed.ncbi.nlm.nih.gov/4100752/). This information may help contextualize this study.

      Response: We partially agree with the reviewer. Spine defects are reported in the cho mouse (for example, please see reference 36 Hafez et al). We appreciate the suggestion to cite the original Seegmiller et al 1971 reference and have added it to the manuscript.

      7. A recent article (PMID37462524) reported mutations in COL11A2 associated with AIS and functionally tested in zebrafish. That study should be cited and discussed as it is directly relevant for this manuscript.

      Response: We agree with the reviewer that this study provides important information supporting loss of function of type XI collagen in spinal deformity. Language to this effect has been added to the manuscript and this study is now cited in the paper.

      8. Please reconcile the following result on page 10 of the results: "Interestingly, the AIS-associated gene Adgrg6 was amongst the most significantly dysregulated genes in the RNA-seq analysis (Figure 3c). By qRT-PCR analysis, expression of Col11a1, Adgrg6, and Sox6 were significantly reduced in female and male Pax1-/- mice compared to wild-type mice (Figure 3d-g)." In Figure 3f, the downregulation of Adgrg6 appears to be modest so how can it possibly be highlighted as one of the most significantly downregulated transcripts in the RNAseq data?

      Response: By “significant” we were referring to the P-value significance in RNAseq analysis, not in absolute change in expression. This language was clearly confusing, and we have removed it from the manuscript.

      9. It is incorrect to refer to the primary cell culture work as growth plate chondrocytes (GPCs), instead, these are primary costal chondrocyte cultures. These primary cultures have a mixture of chondrocytes at differing levels of differentiation, which may change differentiation status during the culturing on plastic. In sum, these cells are at best chondrocytes, and not specifically growth plate chondrocytes. This needs to be corrected in the abstract and throughout the manuscript. Moreover, on page 11 these cells are referred to as costal cartilage, which is confusing to the reader.

      Response: Thank you for pointing out these inconsistencies. We have changed the manuscript to say “costal chondrocytes” throughout.

      Minor points

      • On page 10 of the Results: "These data support a mechanistic link between Pax1 and Col11a1, and the AIS-associated genes Gpr126 and Sox6, in affected tissue of the developing tail." qRT-PCR validation of Sox6, although significant, appears to be very modestly downregulated in KO. Please soften this statement in the text.

      Response: We have softened this statement.

      • Have you got any information about how the immortalized (SV40) costal cartilage affected chondrogenic differentiation? The expression of SV40 seemed to stimulate Mmp13 expression. Do these cells still make cartilage nodules? Some feedback on this process and how it affects the nature of the culture would be appreciated.

      Response: The “+ or –” in Figure 5 refers to Ad5-cre. Each experiment was performed in SV40-immortalized costal chondrocytes. We have removed SV40 from the figure and have clarified the legend to say “qRT-PCR of human COL11A1 and endogenous mouse Mmp3 in SV40 immortalized mouse costal chondrocytes transduced with the lentiviral vector only (lanes 1,2), human WT COL11A1 (lane 3), or COL11A1<sup>P1335L</sup>.” Otherwise, we absolutely agree that understanding Mmp13 regulation during chondrocyte differentiation is important. We plan to study this using in vivo systems.

      • Figure 1: is the average Odds ratio, can this be stated in the figure legend?

      Response: We are not sure what is being asked here. The “combined odds ratio” is calculated as a weighted average of the log of the odds.
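
      As a rough illustration of a combined odds ratio computed as a weighted average on the log scale, the sketch below uses inverse-variance weights, as in a standard fixed-effect meta-analysis. The weighting scheme is an assumption (the response does not state which weights were used), and the function name and input values are hypothetical.

```python
import math

def combined_odds_ratio(odds_ratios, std_errors):
    """Pool per-cohort odds ratios as a weighted mean of log(OR).

    Assumes inverse-variance weights (1 / SE^2 of each log odds ratio);
    the actual weights used in the study are not specified in the text.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    log_ors = [math.log(o) for o in odds_ratios]
    pooled_log = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Back-transform the pooled log odds ratio to the odds-ratio scale
    return math.exp(pooled_log), pooled_se

# Hypothetical per-cohort estimates, for illustration only (not study data)
pooled_or, pooled_se = combined_odds_ratio([1.20, 1.10, 1.30], [0.05, 0.08, 0.10])
print(f"combined OR = {pooled_or:.3f} (SE of log OR = {pooled_se:.3f})")
```

      With equal weights this reduces to the geometric mean of the per-cohort odds ratios, which is why the combined value is an average of the logs and not of the odds ratios themselves.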

      • A more consistent use of established nomenclature for mouse versus human genes and proteins is needed.

      Human:GENE/PROTEIN

      Mouse: Gene/PROTEIN

      Response: Thank you for pointing this out. The nomenclature has been corrected throughout the manuscript.

      • There is no Figure 5c, but a reference to results in the main text. Please reconcile. There is no Figure 5-figure supplement 5a, but there is a reference to it in the main text. Please reconcile.

      Response: Figure references have been corrected.

      • Please indicate dilutions of all antibodies used when listed in the methods.

      Response: Antibody dilutions have been added where missing.

      • On page 25, there is a partial sentence missing information in the Histologic methods; "#S36964 Invitrogen, CA, USA)). All images were taken..."

      Response: We apologize for the error. It has been removed.

      • Table 1: please define all acronyms, including cohort names.

      Response: We apologize for the oversight. The legend to the Table has been updated with definitions of all acronyms.

      • Figure 2: Indicate that blue staining is DAPI in panel B. Clarify that "-ab" as an abbreviation is primary antibody negative.

      Response: A color code for DAPI and COL11A1 staining has been added and “-ab” is now defined.

      • Page 4: ADGRG6 (also known as GPR126)...the authors set this up for ADGRG6 but then use GPR126 in the manuscript, which is confusing. For clarity, please use the gene name Adgrg6 consistently, rather than alternating with Gpr126.

      Response: Thank you for pointing this out. GPR126 has now been changed to ADGRG6 throughout the manuscript.

      • REF 4: Richards, B.S., Sucato, D.J., Johnston C.E. Scoliosis, (Elsevier, 2020). Is this a book, can you provide more clarity in the Reference listing?

      Response: Thank you for pointing this out. This reference has been corrected.

      • While isolation was addressed, the methods for culturing Rat cartilage endplate and costal chondrocytes are poorly described and should be given more text.

      Response: Details about the cartilage endplate and costal chondrocyte isolation and culture have been added to the Methods.

      • Page 11: 1st paragraph, last sentence "These results suggest that Mmp3 expression"... this sentence needs attention. As written, I am not clear what the authors are trying to say.

      Response: This sentence has been clarified and now reads “These results suggest that Mmp3 expression is negatively regulated by Col11a1 in mouse costal chondrocytes.”

      • Page 13: line 4 from the bottom, "ECM-clearing"? This is confusing do you mean ECM degrading?

      Response: Yes and thank you. We have changed to “ECM-degrading”.

      • Please use version numbers for RefSeq IDs: e.g. NM_080629.3 instead of NM_080629

      Response: This change has been made in the revised manuscript.

      • It would be helpful for readers if the ethnicity of the discovery case cohort was clearly stated as European ancestry in the Results main text.

      Response: “European ancestry” has been added at first description of the discovery cohort in the manuscript.

      • Avoid using the term "mutation" and use "variant" instead.

      Response: Thank you for pointing this out. “Variant” is now used throughout the manuscript.

      • Define error bars for all bar charts throughout and include individual data points overlaid onto bars.

      Response: Thank you. Error bars are now clarified in the Figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reply to reviewer comments:

      (1) Given the interpretations of this study hinge on the specificity of the antibodies used in immune fluorescence, the authors should provide full western-blot images of all their antibodies in supplementary information. 

      The commercial antibodies have been validated by the provider. 

      Additionally, we did our own tests. Of note is that proper validation of any antibody is only possible by using a knockout mouse for each protein analyzed (i.e. for pPKA wt vs. pka ko mice). This is not possible, because we do not have all these knock-out strains. However, specific proteins like pPKA, pCAMKII, and pCAMKIV are known to be increased by a light pulse. We show by western blot that pPKA (Fig. 2a, b) and pCamKII (Fig. S2a, b) are increased in wt animals mirroring what we observed in the immunofluorescence. These results suggest that the signal is specific to these antibodies. We provide a full panel of western blots, including the other proteins studied by immunofluorescence such as pCamKIV, pCREB, CaV 3.1, and pDARP32 and show that they detect a protein of the expected size. Full Western-blots mentioned in the manuscript are shown in Supplementary Figure 7. Below are additional validations of antibodies used in the immunofluorescence experiments.

      Author response image 1.

      Author response image 2.

      (2) The explanation in the results section surrounding Fig. 4 seems to be specific for the representative trace rather than the group. Specifically, does the following statement apply to all the replicates?  " A Ca2+ transient was observed right before the light was given at ZT14 (Fig. 4b), which showed the same magnitude as those observed during and after the light stimulus". 

      If not this should be corrected.  

      We have now replaced Fig. 4b with an average trace of all experiments. The individual traces can be seen in Supplementary Figure 4d.

      (3) Are lines 236 -244 and figure 5A/B demonstrating shCDK5 being similar to no-calcium or EGTA conditions at the level of CREB not contradicting Figure 3 which argues that the reason behind the increase in CAMK-phosphorylation and pCREB following shCDK5 is increased basal calcium? If this is the case then why does removing the external calcium phenocopy shCDK5 in these cells? The authors need to clarify this and give an explanation. 

      (4) The authors should explain why they see an equivalent level (or more) of CREB activation, 5 minutes following forskolin activation in Ca2+-free condition (apparent in the case of shCDK5 and EGTA) in the FRET assay. Does this not imply PKA is the most likely candidate mediating this reaction at this stage? Given this interaction has been demonstrated in multiple (other) experiments including in vitro isolated enzyme experiments involving CREB and PKA (E.G. fig 6A in PMID: 2900470) an absence of p-PKA pulldown is not sufficient to justify the non-involvement of PKA (PMID: 22583753). This statement needs support in the form of positive data or acknowledging the limitations in the text (conditions, single technique, etc). 

      (5) The authors should better explain the fret pairs used in the experiments involving ICAP for the reader's benefit - a reduction in fluorescence as a function of CREB activation is non-intuitive.

      We answer all three questions (3-5) together since they belong to the same concept.

      (1) How FRET works.

      The Förster resonance energy transfer (FRET) technique is widely used to investigate molecular interactions between proteins, such as CREB:CBP, in living cells. We used a sensor called ICAP (an Indicator of CREB Activation due to Phosphorylation) published by Friedrich and colleagues in 2010 (https://doi.org/10.1074/jbc.M110.124545). The sensor is composed of three elements: 1) the KID domain of CREB containing Ser-133, which is phosphorylated upon forskolin induction in our experimental setup, 2) the KIX domain of CBP, which is responsible for the dimerization with phospho-CREB, and 3) a short linker that separates the KID and KIX domains. KID is flanked by a cyan fluorescent protein (CFP), while KIX is flanked by a yellow fluorescent protein (YFP). When KID is not phosphorylated, the ICAP conformation allows CFP, stimulated by blue UV light, to transfer energy to YFP, producing FRET that results in yellow light emission. Therefore, the ratiometric analysis FRET/CFP shows FRET > CFP. After a stimulus (forskolin), Ser-133 in KID is phosphorylated and KID can bind to KIX. This binding separates CFP from YFP, resulting in decreased FRET and increased CFP-dependent blue light emission (see Author response image 3 below). Therefore, the ratiometric analysis FRET/CFP shows FRET < CFP over time (usually within 20 min after the forskolin stimulus).

      Author response image 3.

      FRET model. On the left is a schematic representation of how ICAP works. On the right, an example of the quantified FRET decrease associated with increased KID: KIX interaction.

      (2) The ‘apparent’ contradiction between Figure 5A and Fig 3.

      As mentioned before, the chosen FRET method is ratiometric, meaning that a relative FRET signal in fluorescence is measured compared to the baseline (absence of forskolin, assay buffer). The FRET experiment can only tell whether there is a change in the phosphorylation state of KID during the live imaging, comparing the baseline to the period after the forskolin treatment. The result produces a delta [(time after forskolin) - (baseline)]. The higher the delta, the more KID is phosphorylated after forskolin treatment. If KID phosphorylation is not increased compared to the baseline, the FRET signal tends to return to the baseline, with a reduced delta [(time after forskolin) - (baseline)]. Therefore, the experiment does not tell, at the quantitative level, the amount of KID (CREB domain) phosphorylation before the stimulus. It only tells whether the phosphorylation is increased after the stimulus, producing a delta or not. This means that the lack of delta can be caused by: A) high KID phosphorylation in the baseline which does not further increase after the forskolin stimulus; B) very low KID phosphorylation in the baseline which does not increase after the forskolin stimulus. In Fig. 5A, wt cells (orange trace, lines, and double arrow) show a higher delta compared to the ko cells (blue trace, lines, and double arrow). The result indicates that the phosphorylation of CREB (KID domain) is increased after the forskolin stimulus only in the wt. To that extent, the results are in line with the experiment that we show in Figure 3. Indeed, the increased delta in CREB phosphorylation is observed only in the scramble animals, whereas it is lost in the ko (the blue double arrow indicates the delta in the scramble).

      Author response image 4.

      (3) The FRET signal within 3 minutes after forskolin stimulation

      The signal mentioned by the reviewers at 5’ is an artifact caused by light diffraction upon the addition of forskolin in DMSO, which propagates through the plate. The same effect is observed with the DMSO-only treatment (Fig. S5). Therefore, it need not be taken into account. The amplitude of this signal in this time window depends on many independent variables (buffer composition, cell shape, room temperature, pipetting), so no conclusion can be drawn from it. We never consider this time window when describing our results.

      Author response image 5.

      (4) Role of PKA and considerations about experiments performed in Fig. 5a and b

      To answer the question about the role of PKA, we believe it is a pivotal player. Our results indicate that PKA might promote CaV3.1, the entrance of calcium, and therefore, CAM Kinase pathway activation leading to CREB phosphorylation (Fig. 5). However, if the calcium is depleted, even a channel activation mediated by PKA cannot propagate the signal. For that reason, when we deplete calcium in wt cells as we do in the experiment performed in Figure 5B the activation of PKA alone cannot promote the CREB phosphorylation associated with a reduction of the FRET signal. As mentioned before, the FRET method gives a binary answer. It means either a higher or lower delta comparing time after forskolin to baseline. It cannot give stoichiometric info about the level of calcium and/or phosphorylation in the baseline. To that extent, the FRET experiment in Figure 5A cannot be connected to the experiment in Figure 5B. The method is the same, but the scientific questions are different. In Figure 5A we demonstrate that CDK5 plays a role in the PKA activation pathway. In Figure 5B we demonstrate that the general pathway needs calcium.

      We modified the text accordingly.

      (6) The presentation of the data in Figure 6 seems to be divergent from the rest of the data presentations. Please make it more consistent and also provide more explanations. Specifically, the authors suggest increased P-CREB nuclear localization (and an increase in phosphorylated PKA/CAMK) following shCDK5. Won't this lead to an increase in Per1, Dec1, cFos, and Sik1 basally (pre-light pulse)?

      We followed the reviewer's suggestion and present data in Figure 6 as done before in the manuscript. The reviewers should also consider our papers published before (Brenna et al., 2019; Brenna et al., 2021). In these papers, we demonstrate two important concepts that are in line with this manuscript. First, the lack of CDK5 promotes PER2 degradation and lack of nuclear translocation (Brenna et al., 2019). Second, PER2 plays a scaffold role in promoting the formation of the CREB transcriptional complex involved in the regulation of the expression of light-dependent genes (Brenna et al., 2021). Therefore, the take-home message here is that even if a lack of Cdk5 promotes a higher basal level of CREB phosphorylation, it also promotes PER2 degradation. Therefore, without PER2, the CREB-dependent gene expression is reduced. For this reason, we say that CDK5 gates phase shift (via PKA-CAM Kinases-CREB axis) of the circadian clock (via PER2).

      (7) The authors should discuss why calcium-sensitive phosphatases such as PP2A (PMID: 23752926) or calcineurin (PMID: 10217279) are not considered candidates for dephosphorylation of DARPP32 as these are described previously (CDK5) and conditions of increased calcium as seen here would favour these enzymes. The phospho-T75 data are supportive, but such additional discussion could be important given the past demonstrations.

      We thank the reviewers for the great insight. The pathway that promotes the T75 phosphorylation/dephosphorylation indeed includes many players, such as calcineurin and PP2A. We now mention this in the discussion as follows:

      However, phosphatases such as PP2A and calcineurin, which de-phosphorylate DARPP32 including the Cdk5 phosphorylation site, may be involved in this process as well (Girault and Nairn, 2021). Upon light treatment and increase of Ca2+ these phosphatases would dephosphorylate DARPP32 and thereby inactivate it, leading to PKA activation. This process may occur in parallel to the Cdk5 regulation of DARPP32 contributing to a sustained activation of the light signaling pathway via PKA activation.

      (8) additional details on the knock-downs would be helpful: 

      - the relative amount of reduction in gene expression upon shRNA treatment should be provided

      - How was the exact viral delivery and reduction in shRNA-induced knock-down confirmed for the individual animals?

      The Cdk5 knockdown was extensively validated in the previous paper (Brenna et al., 2019, Fig2-Fig supp1, and Fig3-Fig suppl2). We used the same mice. We also confirmed the efficiency of the silencing in supp figure 1A of the current paper.

      (9) The authors only focus on male mice. This is rather incomplete, as it leaves away an important half of biological reality. Testing relevant aspects of the work in female mice would close this significant gap and also increase the number of biological replicates, which can still be considered relatively low. 

      We thank the reviewers for the suggestion. We injected female mice, performed the Aschoff type II light pulse experiment at ZT14, and observed the same phenotype as for male mice. This is now stated in the paper and the data are shown in supplemental figure 1 e-f.

      (10) Given the roles of CdK5 in circadian clock period length regulation, but also light-induced phase delays, it would be interesting for a broader audience to discuss possible expectations of CdK5's roles, e.g. 

      (a) How will other circadian parameters, eg. activity bouts (numbers, length, activity onset/ offset) be affected? 

      (b) How does that relate to sleep, sleep phases? 

      (c) What is the expected impact on other physiological rhythms, eg food intake, cortisol levels? 

      (d) What are the expected effects on circadian oscillation of gene expression in other brain regions, organs? 

      We thank the reviewers for the observations. 

      a) The activity was discussed in the previous paper (Brenna et al. 2019). ShCdk5 mice show reduced activity in both DD and LD 12:12 compared to wt, mirroring the Per2 brdm phenotype (Figure- Suppl3), with the difference mostly observed at night time (Figure 2-suppl4).

      We also demonstrate in Suppl Fig1 b, c of the current paper that a light pulse does not affect the period length in either scramble or shCdk5 mice.

      b) We performed preliminary experiments with SCN shCdk5 knock-down animals and compared them to scr control mice using the Piezo sleep system. Total sleep was not different, however during the dark phase shCdk5 animals tended to sleep a bit more, similar to the neuronal Per2 KO animals (Wendrich et al., 2023 https://doi.org/10.3390/clockssleep5020017 ). After sleep-deprivation no differences were observed between shCdk5 and scr animals. This was comparable to the neuronal Per2 KO animals that also showed no phenotype after sleep deprivation.

      c) and d) We did not investigate food intake, cortisol, or other parameters involving peripheral clocks. We did not investigate gene expression in other brain regions because the SCN is the main brain region involved in the regulation of the circadian clock phase shift. However, future studies will address these questions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Campbell et al investigated the effects of light on the human brain, in particular the subcortical part of the hypothalamus during auditory cognitive tasks. The mechanisms and neuronal circuits underlying light effects in non-image forming responses are so far mostly studied in rodents but are not easily translated in humans. Therefore, this is a fundamental study aiming to establish the impact light illuminance has on the subcortical structures using the high-resolution 7T fMRI. The authors found that parts of the hypothalamus are differently responding to illuminance. In particular, they found that the activity of the posterior hypothalamus increases while the activity of the anterior and ventral parts of the hypothalamus decreases under high illuminance. The authors also report that the performance of the 2-back executive task was significantly better in higher illuminance conditions. However, it seems that the activity of the posterior hypothalamus subpart is negatively related to the performance of the executive task, implying that it is unlikely that this part of the hypothalamus is directly involved in the positive impact of light on performance observed. Interestingly, the activity of the posterior hypothalamus was, however, associated with an increased behavioural response to emotional stimuli. This suggests that the role of this posterior part of the hypothalamus is not as simple regarding light effects on cognitive and emotional responses. This study is a fundamental step towards our better understanding of the mechanisms underlying light effects on cognition and consequently optimising lighting standards. 

      Strengths: 

      While it is still impossible to distinguish individual hypothalamic nuclei, even with the high-resolution fMRI, the authors split the hypothalamus into five areas encompassing five groups of hypothalamic nuclei. This allowed them to reveal that different parts of the hypothalamus respond differently to an increase in illuminance. They found that higher illuminance increased the activity of the posterior part of the hypothalamus encompassing the MB and parts of the LH and TMN, while decreasing the activity of the anterior parts encompassing the SCN and another part of TMN. These findings are somewhat in line with studies in animals. It was shown that parts of the hypothalamus such as SCN, LH, and PVN receive direct retinal input in particular from ipRGCs. Also, acute chemogenetic activation of ipRGCs was shown to induce activation of LH and also increased arousal in mice. 

      Weaknesses: 

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin and/or other photoreceptors. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. This may be something to consider when designing the follow-up studies. 

      Reviewer #1 (Recommendations For The Authors): 

      (1) As suggested in the public review more information regarding the reasons behind the chosen light condition is needed. 

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin or cone opsins. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. 

      (2) In support of this work, it was shown in mice that acute activation of ipRGCs using chemogenetics induces c-fos in some of the hypothalamic brain areas discussed here including LH (Milosavljevic et al, 2016 Curr Biol). Another study to consider including in the discussion is by Sonoda et al 2020 Science, in which the authors showed that a subset of ipRGCs release GABA. 

      (3) Figure 1 looks squashed, especially the axes. Also, Figure 2 looks somewhat blurry. I would suggest that the authors edit the figures to correct this.

      We thank the reviewer for their positive comments and agree with the weaknesses they pointed out. 

      (1) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      The revised discussion makes clear that these choices limit the interpretation about the photoreceptors involved (PAGES 12-13): “We based our rationale and part of our interpretations on ipRGC projections, which have been demonstrated in rodents to channel the NIF biological impact of light and incorporate the inputs from rods and cones with their intrinsic photosensitivity into a light signal that can impact the brain (Güler et al., 2008; Tri & Do, 2019). Given the polychromatic nature of the light we used, classical photoreceptors and their projections to visual brain areas are, however, very likely to have directly or indirectly contributed to the modulation by light of the regional activity of the hypothalamus.”

      The discussion also points out the promises of silent substitution (PAGE 13): “Future human studies could isolate the contribution of each photoreceptor class to the impact of light on cognitive brain functions by manipulating prior light history (Chellappa et al., 2014) or through the use of silent substitutions between metameric light exposures (Viénot et al., 2012)”.

      (2) We now refer to the studies by Milosavljevic et al. and Sonoda et al. 

      PAGE 9: “Our data may therefore be compatible with an increase in orexin release by the LH with increasing illuminance. In line with this assumption, chemoactivation of ipRGCs lead to increase c-fos production, a marker of cellular activation, over several nuclei of the hypothalamus, including the lateral hypothalamus (Milosavljevic et al., 2016). If this initial effect of light we observe over the posterior part of the hypothalamus was maintained over a longer period of exposure, this would stimulate cognition and maintain or increase alertness (Campbell et al., 2023) and may also be part of the mechanisms through which daytime light increases the amplitude in circadian variations of several physiological features (Bano-Otalora et al., 2021; Dijk et al., 2012).”

      PAGE 10: “Chemoactivation of ipRGCs in rodents led to an increase activity of the SCN, over the inferior anterior hypothalamus, but had no impact on the activity of the VLPO, over the superior anterior hypothalamus (Milosavljevic et al., 2016). How our findings fit with these fine-grained observations and whether there are species-specific differences in the responses to light over the different part of the hypothalamus remains to be established.”

      PAGE 10: “In terms of chemical communication, these changes in activity could be the results of an inhibitory signal from a subclass of ipRGCs, potentially through the release of γ-aminobutyric acid (GABA), as a rodent study found that a subset of ipRGCs release GABA at brain targets including the SCN (and intergeniculate leaflet and ventral lateral geniculate nucleus), leading to a reduction in the ability of light to affect pupil size and circadian photoentrainment (Sonoda et al., 2020). Whatever the signalling of ipRGC, our finding over the anterior hypothalamus could correspond to a modification of GABA signalling of the SCN which has been reported to have excitatory properties, such that the BOLD signal changes we report may correspond to a reduction in excitation arising in part from the SCN (Albers et al., 2017).”

      (3) Figures 1 and 2 were modified. We hope their quality is now satisfactory. We are willing to provide separate figures prior to publication of the Version of Record.

      Reviewer #2 (Public Review): 

      Summary 

      The interplay between environmental factors and cognitive performance has been a focal point of neuroscientific research, with illuminance emerging as a significant variable of interest. The hypothalamus, a brain region integral to regulating circadian rhythms, sleep, and alertness, has been posited to mediate the effects of light exposure on cognitive functions. Previous studies have illuminated the role of the hypothalamus in orchestrating bodily responses to light, implicating specific neural pathways such as the orexin and histamine systems, which are crucial for maintaining wakefulness and processing environmental cues. Despite advancements in our understanding, the specific mechanisms through which varying levels of light exposure influence hypothalamic activity and, in turn, cognitive performance, remain inadequately explored. This gap in knowledge underscores the need for high-resolution investigations that can dissect the nuanced impacts of illuminance on different hypothalamic regions. Utilizing state-of-the-art 7 Tesla functional magnetic resonance imaging (fMRI), the present study aims to elucidate the differential effects of light on the hypothalamic dynamics and establish a link between regional hypothalamic activity and cognitive outcomes in healthy young adults. By shedding light on these complex interactions, this research endeavours to contribute to the foundational knowledge necessary for developing innovative therapeutic strategies aimed at enhancing cognitive function through environmental modulation. 

      Strengths: 

      (1) Considerable Sample Size and Detailed Analysis: The study leverages a robust sample size and conducts a thorough analysis of hypothalamic dynamics, which enhances the reliability and depth of the findings. 

      (2) Use of High-Resolution Imaging: Utilizing 7 Tesla fMRI to analyze brain activity during cognitive tasks offers high-resolution insights into the differential effects of illuminance on hypothalamic activity, showcasing the methodological rigor of the study. 

      (3) Novel Insights into Illuminance Effects: The manuscript reveals new understandings of how different regions of the hypothalamus respond to varying illuminance levels, contributing valuable knowledge to the field. 

      (4) Exploration of Potential Therapeutic Applications: Discussing the potential therapeutic applications of light modulation based on the findings suggests practical implications and future research directions. 

      Weaknesses: 

      (1) Foundation for Claims about Orexin and Histamine Systems: The manuscript needs to provide a clearer theoretical or empirical foundation for claims regarding the impact of light on the orexin and histamine systems in the abstract. 

      (2) Inclusion of Cortical Correlates: While focused on the hypothalamus, the manuscript may benefit from discussing the role of cortical activation in cognitive performance, suggesting an opportunity to expand the scope of the manuscript. 

      (3) Details of Light Exposure Control: More detailed information about how light exposure was controlled and standardized is needed to ensure the replicability and validity of the experimental conditions. 

      (4) Rationale Behind Different Exposure Protocols: To clarify methodological choices, the manuscript should include more in-depth reasoning behind using different protocols of light exposure for executive and emotional tasks. 

      Reviewer #2 (Recommendations For The Authors): 

      Attention to English language precision and correction of typographical errors, such as "hypothalamic nuclei" instead of "hypothalamus nuclei," is necessary for enhancing the manuscript.

      We thank the reviewer for recognising the interest and strength of our study.

      (1) As detailed in the discussion, we do believe orexin and histamine are excellent candidates for mediating the results we report. As also pointed out, however, we are in no position to know which neurons, nuclei, neurotransmitters and neuromodulators underlie the results. The last sentence of the abstract (PAGE 2) was therefore removed as we agree the statement was too strong. We carefully reconsidered the discussion and believe that no such overstatement was present.

      (2) Hypothalamus nuclei are connected to multiple cortical (and subcortical) structures. The relevance of these projections will vary with the cognitive task considered. In addition, we have not yet considered the cortex in our analyses such that truly integrating cortical structures appears premature. 

      We nevertheless added the following short statement (PAGE 11): “Subcortical structures, and particularly those receiving direct retinal projections, including those of the hypothalamus, are likely to receive light illuminance signal first before passing on the light modulation to the cortical regions involved in the ongoing cognitive process (Campbell et al., 2023).”

      (3) We now include the following as part of the method section (PAGES 16-17): “Illuminance and spectra could not be directly measured within the MRI scanner due to the ferromagnetic nature of measurement systems. The coil of the MRI and the light stand, together with the lighting system were therefore placed outside of the MR room to reproduce the experimental conditions in a completely dark room. A sensor was placed 2 cm away from the mirror of the coil that is mounted at eye level, i.e. where the eye of the first author of the paper would be positioned, to measure illuminance and spectra. The procedure was repeated 4 times for illuminance and twice for spectra and measurements were averaged. This procedure does not take into account interindividual variation in head size and orbit shape such that the reported illuminance levels may have varied slightly across subjects. The relative differences between illuminance are, however, very unlikely to vary substantially across participants such that statistics consisting of tests for the impact of relative differences in illuminance were not affected. The detailed values reported in Supplementary Table 2 were computed combining spectra and illuminance using the Excel calculator associated with a published work (Lucas et al., 2014).”

      (4) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      (5) The manuscript was thoroughly rechecked, and we hope to have spotted all typos and language errors.

      Reviewer #3 (Public Review): 

      Summary: 

      Campbell and colleagues use a combination of high-resolution fMRI, cognitive tasks, and different intensities of light illumination to test the hypothesis that the intensity of illumination differentially impacts hypothalamic substructures that, in turn, promote alterations in arousal that affect cognitive and affective performance. The authors find evidence in support of a posterior-to-anterior gradient of increased blood flow in the hypothalamus during task performance that they later relate to performance on two different tasks. The results provide an enticing link between light levels, hypothalamic activity, and cognitive/affective function, however, clarification of some methodological choices will help to improve confidence in the findings. 

      Strengths: 

      * The authors' focus on the hypothalamus and its relationship to light intensity is an important and understudied question in neuroscience. 

      Weaknesses: 

      (1) I found it challenging to relate the authors' hypotheses, which I found to be quite compelling, to the apparatus used to test the hypotheses - namely, the use of orange light vs. different light intensities; and the specific choice of the executive and emotional tasks, which differed in key features (e.g., block-related vs. event-related designs) that were orthogonal to the psychological constructs being challenged in each task. 

      (4) Given the small size of the hypothalamus and the irregular size of the hypothalamic parcels, I wondered whether a more data-driven examination of the hypothalamic time series would have provided a more parsimonious test of their hypothesis. 

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors may wish to explain the importance of the orange light condition in the early section of the results -- i.e., when they first present the task structure. As it stands, I don't have a good appreciation of why the orange light was included -- was it a control condition? And if the differences between the light conditions (e.g., the narrow- vs. wide-band of light) were indeed ignored by focussing on the illuminance levels, are there any potential issues that the authors could then mitigate against with further experiments/analyses? 

      (2) Are there other explanations for why illuminance levels might improve cognitive performance? For instance, the capacity to more easily perceive the stimuli in an experiment could plausibly make it easier to complete a given task. If this is the case, can the authors conceptualise a way to rule out this hypothesis? 

      (3) Did the authors control for the differences in the number of voxels in each hypothalamic subregion? Or perhaps consider estimating the variance across voxels within the larger parcels, to determine whether the mean time series was comparable to the time series of the smaller parcels? 

      (4) An alternative strategy that would mitigate against the differences in the size of hypothalamic parcels would be to conduct analyses on the hypothalamus without parcellation, but instead using dimensionality reduction techniques to observe the natural spread of responses across the hypothalamus. From the authors' results, my intuition is that these analyses will lead to similar conclusions, albeit without any of the potential issues with respect to differently-sized parcels. 

      We thank the reviewer for acknowledging the originality and interest of our study. We agree that some methodological choices needed more explanation. We will address the weaknesses they pointed out as follows:

      (1) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      The revised discussion makes clear that these choices limit the interpretation about the photoreceptors involved (PAGE 12-13): “We based our rationale and part of our interpretations on ipRGC projections, which have been demonstrated in rodents to channel the NIF biological impact of light and incorporate the inputs from rods and cones with their intrinsic photosensitivity into a light signal that can impact the brain (Güler et al., 2008; Tri & Do, 2019). Given the polychromatic nature of the light we used, classical photoreceptors and their projections to visual brain areas are, however, very likely to have directly or indirectly contributed to the modulation by light of the regional activity of the hypothalamus.”

      We further mention that (PAGE 13): “Furthermore, we cannot exclude that colour and/or spectral differences between the orange and 3 blue-enriched light conditions may have contributed to our findings. Research in rodent model demonstrated that variation in the spectral composition of light was perceived by the suprachiasmatic nucleus to set circadian timing (Walmsley et al., 2015). No such demonstration has, however, been reported yet for the acute impact of light on alertness, attention, cognition or affective state.”

      Regarding the choice of tasks, we added the following to the method section (PAGE 18): “Prior work of our team showed that the n-back task and emotional task included in the present protocol were successful probes to demonstrate that light illuminance modulates cognitive activity, including within subcortical structures (though resolution did not allow precise isolation of nuclei or subparts) (e.g. (Vandewalle et al., 2007, 2010)). When taking the step of ultra-high-field imaging, we therefore opted for these tasks as our goal was to show that illuminance affects brain activity across cognitive domains while not testing for task-specific aspects of these domains.”

      We further added to the discussion (PAGE 8): “The pattern of light-induced changes was consistent across an executive and an emotional task which consisted of block and an event-related fMRI design, respectively. This suggests that a robust anterior-posterior gradient of activity modulation by illuminance is present in hypothalamus across cognitive domains.”

      (2) We are unsure what the reviewer refers to when they state that the experiment could make it easier to perceive a stimulus. Aside from the fact that illuminance can increase alertness and attention such that a stimulus may be better or more easily perceived/processed, we do not see how blocks of ambient light, i.e. a long-lasting visual stimulus, may render auditory stimulation (letters or pseudo-words in the present study) easier to perceive. To our knowledge, multimodal or cross-modal integration has been robustly demonstrated for short visual/auditory cues that would precede or accompany auditory/visual stimulation. 

      We are willing to clarify this issue in the text if we receive additional explanation from the reviewer.

      (3) We added subpart size as covariate in the analyses (instead of subpart number) and it did not affect the output of the statistical analyses (Author response table 1). 

      For completeness, we further computed the standard deviation of the activity estimates of the voxels within each parcel for the main analysis of the n-back task and found a main effect of subpart (Author response table 2), indicating that the variability of the estimates varied across subparts. Post hoc contrasts and the display included in Author response image 1 show, however, that the differences were not related to subpart size per se. It is in fact the largest subpart (subpart 4) that shows the largest variability, while one of the smallest subparts (subpart 2) shows the lowest variability. Though it may have contributed, subpart size is therefore unlikely to explain our findings. We consider the analyses reported in Author response tables 1 and 2 and Author response image 1 as very technical and did not include them in the supplementary material for conciseness. If the reviewer judges it essential, we can reconsider our decision.
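      As a rough illustration of the kind of check described above, the following sketch computes, for each parcel, the number of voxels and the standard deviation of the voxel-wise activity estimates. It is only a sketch under stated assumptions: the function name `parcel_variability`, the parcel labels, and all values are hypothetical, and it does not reproduce the GLMM reported in Author response table 2.

```python
from statistics import stdev


def parcel_variability(estimates_by_parcel):
    """Map each parcel label to (number of voxels, sample SD of estimates).

    estimates_by_parcel: dict of parcel label -> list of voxel-wise
        activity estimates (hypothetical values, at least 2 per parcel).
    """
    return {
        parcel: (len(values), stdev(values))
        for parcel, values in estimates_by_parcel.items()
    }


# Hypothetical voxel-wise estimates for three parcels of different sizes.
toy_estimates = {
    "small": [0.1, 0.2, 0.3],
    "medium": [0.0, 0.5, 1.0, 1.5],
    "large": [0.2, 0.2, 0.3, 0.3, 0.2, 0.3],
}

variability = parcel_variability(toy_estimates)

# List parcels from smallest to largest to see whether SD tracks size.
for parcel, (n_voxels, sd) in sorted(variability.items(), key=lambda kv: kv[1][0]):
    print(parcel, n_voxels, round(sd, 3))
```

      In this toy data the variability does not increase monotonically with parcel size, which is the qualitative pattern one would inspect before attributing an effect to parcel size alone.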

      While computing these analyses, we realized that there were errors in Table 1, which reports the statistical outcomes of the main analyses of the emotional task. The main statistical outputs remain the same except for a nominal main effect of the task (emotional vs. neutral) and the fact that post hoc tests show a consistent difference between the posterior subpart (subpart 3) and all the other subparts, rather than all the other subparts except for the difference with the superior tubular hypothalamus subpart: p-corrected = 0.09. We apologise for this slight error, whose origin we were unable to isolate. It does not modify the rest of the analyses (which were also rechecked) or the interpretations.

      Author response table 1.

      Recomputations of the main GLMMs using subpart sizes rather than subpart numbers as covariate of interest.

      Author response image 1.

      Activity estimate variability per hypothalamus subpart and subpart size.  

      Author response table 2.

      Difference in activity estimate standard deviation between hypothalamus subparts during the n-back task.

      Outputs of the generalized linear mixed model (GLMM) with subject as the random factor (intercept and slope), and task and subpart as repeated measures (ar(1) autocorrelation).

      * The corrected p-value for multiple comparisons over 2 tests is p < 0.025.

      # Refer to Fig.2A for correspondence of subpart numbers

      The text referring to Table 1 was modified accordingly (PAGE 5): “A nominal main effect of the task was detected for the emotional task [p = 0.049; Table 1] but not for the n-back task. For both tasks, there was no significant main effect for any of the other covariates and post hoc analyses showed that the index of the illuminance impact was consistently different in the posterior hypothalamus subpart compared to the other subparts [pcorrected ≤ 0.05]”.

      (4) We agree that a data-driven approach could have constituted an alternative means to test our hypothesis. We opted for the approach that we mastered best, while still allowing us to conclusively test for regional differences in activity across the hypothalamus. Examination of the time series of the very same data we used would mainly confirm the results of our analyses (an anterior-posterior gradient in the impact of illuminance), while it may yield slight differences in the borders of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance. While the suggested approach may have been envisaged if we had been facing negative results (i.e. no differences between subparts, potentially because subparts would not reflect functional differences in response to illuminance change), it would constitute a circular confirmation of our main findings (i.e. using the same data). While we truly appreciate the suggestion, we do not consider that it would constitute a more parsimonious test of our hypothesis, now that we have successfully applied the GLM/parcellation and GLMM approaches.

      We added the following statement to the discussion to take this comment into account (PAGE 12): “Future research may consider data-driven analyses of hypothalamus voxels time series as an alternative to the parcellation approach we adopted here. This may refine the delineation of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance.”

      Response references

      Albers, H. E., Walton, J. C., Gamble, K. L., McNeill, J. K., & Hummer, D. L. (2017). The dynamics of GABA signaling: Revelations from the circadian pacemaker in the suprachiasmatic nucleus. Frontiers in Neuroendocrinology, 44, 35–82. https://doi.org/10.1016/J.YFRNE.2016.11.003

      Bano-Otalora, B., Martial, F., Harding, C., Bechtold, D. A., Allen, A. E., Brown, T. M., Belle, M. D. C., & Lucas, R. J. (2021). Bright daytime light enhances circadian amplitude in a diurnal mammal. Proceedings of the National Academy of Sciences of the United States of America, 118(22), e2100094118. https://doi.org/10.1073/PNAS.2100094118/SUPPL_FILE/PNAS.2100094118.SAPP.PDF

      Campbell, I., Sharifpour, R., & Vandewalle, G. (2023). Light as a Modulator of Non-Image-Forming Brain Functions Positive and Negative Impacts of Increasing Light Availability. Clocks & Sleep, 5(1), 116. https://doi.org/10.3390/CLOCKSSLEEP5010012

      Chellappa, S. L., Ly, J. Q. M., Meyer, C., Balteau, E., Degueldre, C., Luxen, A., Phillips, C., Cooper, H. M., & Vandewalle, G. (2014). Photic memory for executive brain responses. Proceedings of the National Academy of Sciences of the United States of America, 111(16), 6087–6091. https://doi.org/10.1073/pnas.1320005111

      Dijk, D. J., Duffy, J. F., Silva, E. J., Shanahan, T. L., Boivin, D. B., & Czeisler, C. A. (2012). Amplitude reduction and phase shifts of melatonin, cortisol and other circadian rhythms after a gradual advance of sleep and light exposure in humans. PloS One, 7(2). https://doi.org/10.1371/JOURNAL.PONE.0030037

      Güler, A. D., Ecker, J. L., Lall, G. S., Haq, S., Altimus, C. M., Liao, H. W., Barnard, A. R., Cahill, H., Badea, T. C., Zhao, H., Hankins, M. W., Berson, D. M., Lucas, R. J., Yau, K. W., & Hattar, S. (2008). Melanopsin cells are the principal conduits for rod-cone input to non-image-forming vision. Nature, 453(7191), 102–105. https://doi.org/10.1038/nature06829

      Lucas, R. J., Peirson, S. N., Berson, D. M., Brown, T. M., Cooper, H. M., Czeisler, C. A., Figueiro, M. G., Gamlin, P. D., Lockley, S. W., O’Hagan, J. B., Price, L. L. A., Provencio, I., Skene, D. J., & Brainard, G. C. (2014). Measuring and using light in the melanopsin age. Trends in Neurosciences, 37(1), 1–9. https://doi.org/10.1016/j.tins.2013.10.004

      Milosavljevic, N., Cehajic-Kapetanovic, J., Procyk, C. A., & Lucas, R. J. (2016). Chemogenetic Activation of Melanopsin Retinal Ganglion Cells Induces Signatures of Arousal and/or Anxiety in Mice. Current Biology, 26(17), 2358–2363. https://doi.org/10.1016/j.cub.2016.06.057

      Sonoda, T., Li, J. Y., Hayes, N. W., Chan, J. C., Okabe, Y., Belin, S., Nawabi, H., & Schmidt, T. M. (2020). A noncanonical inhibitory circuit dampens behavioral sensitivity to light. Science (New York, N.Y.), 368(6490), 527–531. https://doi.org/10.1126/SCIENCE.AAY3152

      Tri, M., & Do, H. (2019). Melanopsin and the Intrinsically Photosensitive Retinal Ganglion Cells: Biophysics to Behavior. Neuron, 104, 205–226. https://doi.org/10.1016/j.neuron.2019.07.016

      Vandewalle, G., Hébert, M., Beaulieu, C., Richard, L., Daneault, V., Garon, M. Lou, Leblanc, J., Grandjean, D., Maquet, P., Schwartz, S., Dumont, M., Doyon, J., & Carrier, J. (2011). Abnormal hypothalamic response to light in seasonal affective disorder. Biological Psychiatry, 70(10), 954–961. https://doi.org/10.1016/j.biopsych.2011.06.022

      Vandewalle, G., Schmidt, C., Albouy, G., Sterpenich, V., Darsaud, A., Rauchs, G., Berken, P. Y., Balteau, E., Dagueldre, C., Luxen, A., Maquet, P., & Dijk, D. J. (2007). Brain responses to violet, blue, and green monochromatic light exposures in humans: Prominent role of blue light and the brainstem. PLoS ONE, 2(11), e1247. https://doi.org/10.1371/journal.pone.0001247

      Vandewalle, G., Schwartz, S., Grandjean, D., Wuillaume, C., Balteau, E., Degueldre, C., Schabus, M., Phillips, C., Luxen, A., Dijk, D. J., & Maquet, P. (2010). Spectral quality of light modulates emotional brain responses in humans. Proceedings of the National Academy of Sciences of the United States of America, 107(45), 19549–19554. https://doi.org/10.1073/pnas.1010180107

      Viénot, F., Brettel, H., Dang, T.-V., & Le Rohellec, J. (2012). Domain of metamers exciting intrinsically photosensitive retinal ganglion cells (ipRGCs) and rods. Journal of the Optical Society of America A, 29(2), A366. https://doi.org/10.1364/josaa.29.00a366

      Walmsley, L., Hanna, L., Mouland, J., Martial, F., West, A., Smedley, A. R., Bechtold, D. A., Webb, A. R., Lucas, R. J., & Brown, T. M. (2015). Colour As a Signal for Entraining the Mammalian Circadian Clock. PLOS Biology, 13(4), e1002127. https://doi.org/10.1371/journal.pbio.1002127

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The aim of this paper is to describe a novel method for genetic labelling of animals or cell populations, using a system of DNA/RNA barcodes.

      Strengths:

      • The authors' attempt at providing a straightforward method for multiplexing Drosophila samples prior to scRNA-seq is commendable. The perspective of being able to load multiple samples on a 10X Chromium without antibody labelling is appealing.

      • The authors are generally honest about potential issues in their method, and areas that would benefit from future improvement.

      • The article reads well. Graphs and figures are clear and easy to understand.

      We thank the reviewer for these positive comments.

      Weaknesses:

      • The usefulness of TaG-EM for phototaxis, egg laying or fecundity experiments is questionable. The behaviours presented here are all easily quantifiable, either manually or using automated image-based quantification, even when they include a relatively large number of groups and replicates. Despite their claims (e.g., L311-313), the authors do not present any real evidence about the cost- or time-effectiveness of their method in comparison to existing quantification methods.

      While the behaviors that were quantified in the original manuscript were indeed relatively easy to quantify through other methods, they nonetheless demonstrated that sequencing-based TaG-EM measurements faithfully recapitulated manual behavioral measurements. In response to the reviewer’s comment, we have added additional experiments that demonstrate the utility of TaG-EM-based behavioral quantification in the context of a more labor-intensive phenotypic assay (measuring gut motility via food transit times in Drosophila larvae, Figure 4, Supplemental Figure 7). We found that food transit times in the presence and absence of caffeine are subtly different and that, as with larger effect size behaviors, TaG-EM data recapitulates the results of the manual assay. This experiment demonstrates both that TaG-EM can be used to streamline labor-intensive behavioral assays (we have included an estimate of the savings in hands-on labor for this assay by using a multiplexed sequencing approach, Supplemental Figure 8) and that TaG-EM can quantify small differences between experimental groups. We also note in the discussion that an additional benefit of TaG-EM-based behavioral assays is that the observer is blinded as to the experimental conditions, as they are intermingled in a single multiplexed assay. We have added the following text to the paper describing these experiments.

      Results:

      “Quantifying food transit time in the larval gut using TaG-EM

      Gut motility defects underlie a number of functional gastrointestinal disorders in humans (Keller et al., 2018). To study gut motility in Drosophila, we have developed an assay based on the time it takes a food bolus to transit the larval gut (Figure 4A), similar to approaches that have been employed for studying the role of the microbiome in human gut motility (Asnicar et al., 2021). Third instar larvae were starved for 90 minutes and then fed food containing a blue dye. After 60 minutes, larvae in which a blue bolus of food was visible were transferred to plates containing non-dyed food, and food transit (indicated by loss of the blue food bolus) was scored every 30 minutes for five hours (Supplemental Figure 7). 

      Because this assay is highly labor-intensive and requires hands-on effort for the entire five-hour observation period, there is a limit on how many conditions or replicates can be scored in one session (~8 plates maximum). Thus, we decided to test whether food transit could be quantified in a more streamlined and scalable fashion by using TaG-EM (Figure 4B). Using the manual assay, we observed that while caffeine-containing food is aversive to larvae, the presence of caffeine reduces transit time through the gut (Figure 4C, Supplemental Figure 7). This is consistent with previous observations in adult flies that bitter compounds (including caffeine) activate enteric neurons via serotonin-mediated signaling and promote gut motility (Yao and Scott, 2022). We tested whether TaG-EM could be used to measure the effect of caffeine on food transit time in larvae. As with prior behavioral tests, the TaG-EM data recapitulated the results seen in the manual assay (Figure 4D). Conducting the transit assay via TaG-EM enables several labor-saving steps. First, rather than counting the number of larvae with and without a food bolus at each time point, one simply needs to transfer non-bolus-containing larvae to a collection tube. Second, because the TaG-EM lines are genetically barcoded, all the conditions can be tested at once on a single plate, removing the need to separately count each replicate of each experimental condition. This reduces the hands-on time for the assay to just a few minutes per hour. A summary of the anticipated cost and labor savings for the TaG-EM-based food transit assay is shown in Supplemental Figure 8.”

      Discussion:

      “While the utility of TaG-EM barcode-based quantification will vary based on the number of conditions being analyzed and the ease of quantifying the behavior or phenotype by other means, we demonstrate that TaG-EM can be employed to cost-effectively streamline labor-intensive assays and to quantify phenotypes with small effect sizes (Figure 4, Supplemental Figure 8). An additional benefit of multiplexed TaG-EM behavioral measurements is that the experimental conditions are effectively blinded as the multiplexed conditions are intermingled in a single assay.”

      Methods:

      “Larval gut motility experiments

      Preparing Yeast Food Plates

      Yeast agar plates were prepared by making a solution containing 20% Red Star Active Dry Yeast 32oz (Red Star Yeast) and 2.4% Agar Powder/Flakes (Fisher) and a separate solution containing 20% Glucose (Sigma-Aldrich). Both mixtures were autoclaved with a 45-minute liquid cycle and then transferred to a water bath at 55ºC. After cooling to 55ºC, the solutions were combined and mixed, and approximately 5 mL of the combined solution was transferred into 100 x 15 mm petri dishes (VWR) in a PCR hood or contamination-free area. For blue-dyed yeast food plates, 0.4% Blue Food Color (McCormick) was added to the yeast solution. For the caffeine assays, 300 µL of a solution of 100 mM 99% pure caffeine (Sigma-Aldrich) was pipetted onto the blue-dyed yeast plate and allowed to absorb into the food during the 90-minute starvation period.

      Manual Gut Motility Assay

      Third instar Drosophila larvae were transferred to empty conical tubes that had been misted with water to prevent the larvae from drying out. After a 90-minute starvation period the larvae were moved from the conical to a blue-dyed yeast plate with or without caffeine and allowed to feed for 60 minutes. Following the feeding period, the larvae were transferred to an undyed yeast plate. Larvae were scored for the presence or absence of a food bolus every 30 minutes over a 5-hour period. Up to 8 experimental replicates/conditions were scored simultaneously. 

      TaG-EM Gut Motility Assay

      Third instar larvae were starved and fed blue dye-containing food with or without caffeine as described above. An equal number of larvae from each experimental condition/replicate were transferred to an undyed yeast plate. During the 5-hour observation period, larvae were examined every 30 minutes and larvae lacking a food bolus were transferred to a microcentrifuge tube labeled for the timepoint. Any larvae that died during the experiment were placed in a separate microcentrifuge tube and any larvae that failed to pass the food bolus were transferred to a microcentrifuge tube at the end of the experiment. DNA was extracted from the larvae in each tube and TaG-EM barcode libraries were prepared and sequenced as described above.”

      • Behavioural assays presented in this article have clear outcomes, with large effect sizes, and therefore do not really challenge the efficiency of TaG-EM. By showing a T-maze in Fig 1B, the authors suggest that their method could be used to quantify more complex behaviours. Not exploring this possibility in this manuscript seems like a missed opportunity.

      See the response to the previous point.

      • Experiments in Figs S3 and S6 suggest that some tags have a detrimental effect on certain behaviours or on GFP expression. Whereas the authors rightly acknowledge these issues, they do not investigate their causes. Unfortunately, this calls into question the overall suitability of TaG-EM, as other barcodes may also affect certain aspects of the animal's physiology or behaviour. Revising barcode design will be crucial to make sure that sequences with potential regulatory function are excluded.

      We have determined that the barcode (BC#8) that had no detectable Gal4-induced gene expression in Figure S6 (now Supplemental Figure 9) has a deletion in the GFP coding region that ablates GFP function. Interestingly, the expressed TaG-EM barcode transcript is still detectable in single cell sequencing experiments, but obviously this line cannot be used for cell enrichment (at least based solely on GFP expression from the TaG-EM construct). While it is unclear how this line came to have a lesion in the GFP gene, we have subsequently generated >150 additional TaG-EM stocks and we have tested the GFP expression of these newly established stocks by crossing them to Mhc-Gal4. All of the additional stocks had GFP expression in the expected pattern, indicating that the BC#8 construct is an outlier with respect to inducibility of GFP. We have added the following text to the results section to address this point:

      “No GFP expression was visible for TaG-EM barcode number 8, which upon molecular characterization had an 853 bp deletion within the GFP coding region (data not shown). We generated and tested GFP expression of an additional 156 TaG-EM barcode lines (Alegria et al., 2024), by crossing them to Mhc-Gal4 and observing expression in the adult thorax. All 156 additional TaG-EM lines had robust GFP expression (data not shown).”

      It is certainly the case that future improvements to the construct design may be necessary or desirable and that back-crossing could likely be used to alleviate line-to-line differences for specific phenotypes. We also address this point in the discussion with the following text:

      “We excluded this poor performing barcode line from the fecundity tests, however, backcrossing is often used to bring reagents into a consistent genetic background for behavioral experiments and could also potentially be used to address behavior-specific issues with specific TaG-EM lines. In addition, other strategies such as averaging across multiple barcode lines or permutation of barcode assignment across replicates could also mitigate such deficiencies.”

      • For their single-cell experiments, the authors have used the 10X Genomics method, which relies on sequencing just a short segment of each transcript (usually 50-250bp - unknown for this study as read length information was not provided) to enable its identification, with the matching paired-end read providing cell barcode and UMI information (Macosko et al., 2015). With average fragment length after tagmentation usually ranging from 300-700bp, a large number of GFP reads will likely not include the 14bp TaG-EM barcode. 

      The 10x Genomics 3’ workflow that was used for sequencing TaG-EM samples reads the cell barcode and UMI in read one and the expressed RNA sequence in read two. We sequenced the samples shown in Figure 5 in the initial manuscript using a run configuration that generated 150 bp for read two. The TaG-EM barcodes are located just upstream of the poly-adenylation sites (based on the sequencing data, we observe two different poly-A sites and the TaG-EM barcode is located 35 and 60 bp upstream of these sites). Based on the location of the TaG-EM barcodes, 150 bp reads are sufficient to see the barcode in any GFP-associated read (when using the 3’ gene expression workflow). In addition to detecting the expression of the TaG-EM barcodes in the 10x Genomics gene expression library, it is possible to make a separate library that enriches the barcode sequence (similar to hashtag or CITE-Seq feature barcode libraries). We have added experimental data where we successfully performed an enrichment of the TaG-EM barcodes and sequenced this as a separate hashtag library (Supplemental Figure 18). We have added text to the results describing this work and also included detailed information in the methods for performing TaG-EM barcode enrichment during 10x library prep.

      Results:

      “In antibody-conjugated oligo cell hashing approaches, sparsity of barcode representation is overcome by spiking in an additional primer at the cDNA amplification step and amplifying the hashtag oligo by PCR. We employed a similar approach to attempt to enrich for TaG-EM barcodes in an additional library sequenced separately from the 10x Genomics gene expression library. Our initial attempts at barcode enrichment using spike-in and enrichment primers corresponding to the TaG-EM PCR handle were unsuccessful (Supplemental Figure 18). However, we subsequently optimized the TaG-EM barcode enrichment by 1) using a longer spike-in primer that more closely matches the annealing temperature used during the 10x Genomics cDNA creation step, and 2) using a nested PCR approach to amplify the cell-barcode and unique molecular identifier (UMI)-labeled TaG-EM barcodes (Supplemental Figure 18). Using the enriched library, TaG-EM barcodes were detected in nearly 100% of the cells at high sequencing depths (Supplemental Figure 19). However, although we used a polymerase that has been engineered to have high processivity and that has been shown to reduce the formation of chimeric reads in other contexts (Gohl et al., 2016), it is possible that PCR chimeras could lead to unreliable detection events for some cells. Indeed, many cells had a mixture of barcodes detected with low counts and single or low numbers of associated UMIs. To assess the reliability of detection, we analyzed the correlation between barcodes detected in the gene expression library and the enriched TaG-EM barcode library as a function of the purity of TaG-EM barcode detection for each cell (the percentage of the most abundant detected TaG-EM barcode, Supplemental Figure 19). For TaG-EM barcode detections where the most abundant barcode was a high percentage of the total barcode reads detected (~75%-99.99%), there was a high correlation between the barcode detected in the gene expression library and the enriched TaG-EM barcode library. Below this threshold, the correlation was substantially reduced.

      In the enriched library, we identified 26.8% of cells with a TaG-EM barcode reliably detected, a very modest improvement over the gene expression library alone (23.96%), indicating that at least for this experiment, the main constraint is sufficient expression of the TaG-EM barcode and not detection. To identify TaG-EM barcodes in the combined data set, we counted a positive detection as any barcode either identified in the gene expression library or any barcode identified in the enriched library with a purity of >75%. In the case of conflicting barcode calls, we assigned the barcode that was detected directly in the gene expression library. This increased the total fraction of cells where a barcode was identified to approximately 37% (Figure 6B).”
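      The combination rule described in the passage above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`top_barcode`, `purity`, `call_barcode`), the input layout (per-cell dictionaries mapping barcode names to counts), and the choice to take the most abundant barcode as the gene expression call are all hypothetical. Only the purity threshold of >75% for the enriched library and the rule that the gene expression library wins a conflict come from the text.

```python
from typing import Dict, Optional

# From the text: enriched-library calls are accepted only at purity > 75%.
PURITY_THRESHOLD = 0.75


def top_barcode(counts: Dict[str, int]) -> Optional[str]:
    """Return the most abundant barcode, or None if no barcode was detected."""
    detected = {bc: n for bc, n in counts.items() if n > 0}
    if not detected:
        return None
    return max(detected, key=detected.get)


def purity(counts: Dict[str, int]) -> float:
    """Fraction of all barcode counts that belong to the most abundant barcode."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total


def call_barcode(gex_counts: Dict[str, int],
                 enriched_counts: Dict[str, int]) -> Optional[str]:
    """Assign one barcode to a cell.

    A barcode seen in the gene expression (GEX) library is accepted directly
    and takes precedence over the enriched library (assumption: the most
    abundant GEX barcode is used). Otherwise the enriched-library call is
    accepted only if its purity exceeds the threshold.
    """
    gex_call = top_barcode(gex_counts)
    if gex_call is not None:
        return gex_call
    if purity(enriched_counts) > PURITY_THRESHOLD:
        return top_barcode(enriched_counts)
    return None


# Hypothetical cell: GEX and enriched libraries disagree, so GEX wins.
print(call_barcode({"BC4": 3}, {"BC7": 90, "BC4": 10}))
```

      Giving precedence to the gene expression library reflects the concern raised in the text that PCR chimeras in the enriched library could produce unreliable detections for some cells.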

      Methods:

      “The resulting pool was prepared for sequencing following the 10x Genomics Single Cell 3’ protocol (version CG000315 Rev C). At step 2.2 of the protocol, cDNA amplification, 1 µl of TaG-EM spike-in primer (10 µM) was added to the reaction to amplify cDNA with the TaG-EM barcode. Gene expression cDNA and TaG-EM cDNA were separated using a double-sided SPRIselect (Beckman Coulter) bead clean up following 10x Genomics Single Cell 3’ Feature Barcode protocol, step 2.3 (version CG000317 Rev E). The gene expression cDNA was made into a library following the CG000315 Rev C protocol starting at section 3. Custom nested primers were used for enrichment of TaG-EM barcodes after cDNA creation using PCR. The following primers were tested (see Supplemental Figure 18):

      UMGC_IL_TaGEM_SpikeIn_v1:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTTCCAACAACCGGAAGT*G*A

      UMGC_IL_TaGEM_SpikeIn_v2:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGGAAGT*G*A

      UMGC_IL_TaGEM_SpikeIn_v3:

      TGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGGAAGT*G*A

      D701_TaGEM:

      CAAGCAGAAGACGGCATACGAGATCGAGTAATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGC*T*T

      SI PCR Primer:

      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGC*T*C

      UMGC_IL_DoubleNest:

      GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGCAGCTTATAACTTCCAACAACCGG*A*A

      P5: AATGATACGGCGACCACCGA

      D701:

      GATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATCTCGTATGCCGTCTTCTGCTTG

      D702:

      GATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGGAGAATCTCGTATGCCGTCTTCTGCTTG

      After multiple optimization trials, the following steps yielded ~96% on-target reads for the TaG-EM library (Supplemental Figure 18; note that for the enriched barcode data shown in Figure 6 and Supplemental Figure 19, a similar amplification protocol was used, but TaG-EM barcodes were amplified from the gene expression library cDNA and not the SPRI-selected barcode pool). TaG-EM cDNA was amplified with the following PCR reaction: 5 µl purified TaG-EM cDNA, 50 µl 2x KAPA HiFi ReadyMix (Roche), 2.5 µl UMGC_IL_DoubleNest primer (10 µM), 2.5 µl SI_PCR primer (10 µM), and 40 µl nuclease-free water. The reaction was amplified using the following cycling conditions: 98ºC for 2 minutes, followed by 15 cycles of 98ºC for 20 seconds, 63ºC for 30 seconds, 72ºC for 20 seconds, followed by 72ºC for 5 minutes. After the first PCR, the amplified cDNA was purified with a 1.2x SPRIselect (Beckman Coulter) bead cleanup with 80% ethanol washes and eluted into 40 µl of nuclease-free water. A second round of PCR was run with the following reaction: 5 µl purified TaG-EM cDNA, 50 µl 2x KAPA HiFi ReadyMix (Roche), 2.5 µl D702 primer (10 µM), 2.5 µl P5 primer (10 µM), and 40 µl nuclease-free water. The reaction was amplified using the following cycling conditions: 98ºC for 2 minutes, followed by 10 cycles of 98ºC for 20 seconds, 63ºC for 30 seconds, 72ºC for 20 seconds, followed by 72ºC for 5 minutes. After the second PCR, the amplified cDNA was purified with a 1.2x SPRIselect (Beckman Coulter) bead cleanup with 80% ethanol washes and eluted into 40 µl of nuclease-free water. The resulting 3’ gene expression library and TaG-EM enrichment library were sequenced together following Scenario 1 of the BioLegend “Total-Seq-A Antibodies and Cell Hashing with 10x Single Cell 3’ Reagents Kit v3 or v3.1” protocol. Additional sequencing of the enriched TaG-EM library was also done following Scenario 2 from the same protocol.”
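      As a quick arithmetic check of the reaction setup quoted above, the following Python sketch adds up the listed volumes and derives the final concentrations. The component volumes and stock concentrations are taken from the text; the helper name `final_concentration` and the dictionary layout are assumptions made for this illustration.

```python
def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Dilution formula C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul


# Volumes (in microlitres) as listed for each of the two PCR rounds.
components_ul = {
    "purified TaG-EM cDNA": 5.0,
    "2x KAPA HiFi ReadyMix": 50.0,
    "first primer (10 uM stock)": 2.5,
    "second primer (10 uM stock)": 2.5,
    "nuclease-free water": 40.0,
}

total_ul = sum(components_ul.values())                  # total reaction volume
primer_uM = final_concentration(10.0, 2.5, total_ul)   # each primer, in uM
readymix_x = final_concentration(2.0, 50.0, total_ul)  # ReadyMix, in "x" units
total_cycles = 15 + 10                                  # first plus second round

print(f"total volume: {total_ul} ul")
print(f"each primer: {primer_uM} uM final")
print(f"ReadyMix: {readymix_x}x final")
print(f"amplification cycles across both rounds: {total_cycles}")
```

      With the listed volumes, each reaction comes to 100 µl, each primer is at 0.25 µM, and the 2x ReadyMix is diluted to 1x.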

      When a given cell barcode is not associated with any TaG-EM barcode, then demultiplexing is impossible. This is a major problem, which is particularly visible in Figs 5 and S13. In 5F, BC4 is only detected in a couple of dozen cells, even though the Jon99Ciii marker of enterocytes is present in a much larger population (Fig 5C). Therefore, in this particular case, TaG-EM fails to detect most of the GFP-expressing cells. 

      Figure 5 in the original manuscript represented data from an experiment in which there were eight different TaG-EM barcoded samples present, including four replicates of the pan-midgut driver (each of which included enterocyte populations). One would not expect the BC4 enterocyte driver expression to be observed in all of the Jon99Ciii cells, since the majority of the GFP+ cells shown in the UMAP plot were likely derived from and are labeled by the pan-midgut driver-associated barcodes. Thus, the design and presentation of this particular experiment (in particular, the presence of eight distinct samples in the data set) is making the detection of the TaG-EM barcodes look sparser than it actually is. We have added a panel in both Figure 6B and Supplemental Figure 17B that shows the overall detection of barcodes in the enriched barcode library and gene expression library or the gene expression library only, respectively, for this experiment.

      However, the reviewer’s overall point regarding barcode detection is still valid in that if we consider all eight barcodes, we only see TaG-EM barcode labeling associated with about a quarter of all the cells in this gene expression library, or about 37% of cells when we include the enriched TaG-EM barcode library. While improving barcode detection will improve the yield and is necessary for some applications (such as robust detection of multiplets), we would argue that even at the current level of success this approach has significant utility. First, if one’s goal is to unambiguously label a cell cluster and trace it to a defined cell population in vivo, sparse labeling may be sufficient. Second, demultiplexing is still possible (as we demonstrate) but involves a trade-off in yield (not every cell is recovered and there is some extra sequencing cost as some sequenced cells cannot be assigned to a barcode).

      Similarly, in S13, most cells should express one of the four barcodes, however many of them (maybe up to half - this should be quantified) do not. Therefore, the claim (L277-278) that "the pan-midgut driver were broadly distributed across the cell clusters" is misleading. Moreover, the hypothesis that "low expressing driver lines may result in particularly sparse labelling" (L331-333) is at least partially wrong, as Fig S13 shows that the same Gal4 driver can lead to very different levels of barcode coverage.

      As described above, since this experiment included eight different TaG-EM barcodes expressed by five different drivers, the expectation is that only about half of the cells in Figure S13 (now Figure S20) should express a TaG-EM barcode. It is not clear why BC2 is underrepresented in terms of the number of cells labeled and BC7 is overrepresented. We agree with the reviewer that this should be described more accurately in the paper and that it does impact our interpretation related to driver strength and barcode detection. We have revised this sentence in the discussion and also added additional text in the results describing the within-driver variability seen in this experiment.

      Results text:

      “As expected, the barcodes expressed by the pan-midgut driver were broadly distributed across the cell clusters (Supplemental Figure 20). However, the number of cells recovered varied significantly among the four pan-midgut driver associated barcodes.”

      Discussion text:

      “It is likely that the strength of the Gal4 driver contributes to the labeling density. However, we also observed variable recovery of TaG-EM barcodes that were all driven by the same pan-midgut Gal4 driver (Supplemental Figure 20).”

      • Comparisons between TaG-EM and other, simpler methods for labelling individual cell populations are missing. For example, how would TaG-EM compare with expression of different fluorescent reporters, or a strategy based on the brainbow/flybow principle?

      The advantage of TaG-EM is that an arbitrarily large number of DNA barcodes can be used (contingent upon the availability of transgenic lines – we described 20 barcoded lines in our initial manuscript and we have now extended this collection to over 170 lines), while the number of distinguishable FPs is much lower. Brainbow/Flybow uses combinatorial expression of different FPs, but because this combinatorial expression is stochastic, tracing a single cell transcriptome to a defined cell population in vivo based on the FP signature of a Brainbow animal would likely not be possible (and would almost certainly be impossible at scale).

      • FACS data is missing throughout the paper. The authors should include data from their comparative flow cytometry experiment of TaG-EM cells with or without additional hexameric GFP, as well as FSC/SSC and fluorescence scatter plots for the FACS steps that they performed prior to scRNA-seq, at least in supplementary figures.

      We have added Supplemental Figures with the FACS data for all of the single cell sequencing data presented in the manuscript (Supplemental Figures 12 and 14).

      • The authors should show the whole data described in L229, including the cluster that they chose to delete. At least, they should provide more information about how many cells were removed. In any case, the fact that their data still contains a large number of debris and dead cells despite sorting out PI negative cells with FACS and filtering low abundance barcodes with Cellranger is concerning.

      This description was referring to the unprocessed Cellranger output (not filtered for low abundance barcodes). Prior to filtering for cell barcodes with high mitochondria or rRNA (or other processing in Seurat/Scanpy), we saw two clusters, one with low UMI counts and enrichment of mitochondrial genes (see Cellranger report below). 

      Author response image 1.

      These cell barcodes were removed by downstream quality filtering and the remaining cells showed expression of expected intestinal stem cell and enteroblast marker genes.

      Overall, although a method for genetic tagging cell populations prior to multiplexing in single-cell experiments would be extremely useful, the method presented here is inadequate. However, despite all the weaknesses listed above, the idea of barcodes expressed specifically in cells of interest deserves more consideration. If the authors manage to improve their design to resolve the major issues and demonstrate the benefits of their method more clearly, then TaG-EM could become an interesting option for certain applications.

      We thank the reviewer for this comment and hope that the above responses and additional experiments and data that we have added have helped to alleviate the noted weaknesses.

      Reviewer #2 (Public Review):

      In this manuscript, Mendana et al developed a multiplexing method - Targeted Genetically-Encoded Multiplexing or TaG-EM - by inserting a DNA barcode upstream of the polyadenylation site in a Gal4-inducible UAS-GFP construct. This multiplexing method can be used for population-scale behavioral measurements or can potentially be used in single-cell sequencing experiments to pool flies from different populations. The authors created 20 distinctly barcoded fly lines. First, TaG-EM was used to measure phototaxis and oviposition behaviors. Then, TaG-EM was applied to the fly gut cell types to demonstrate its applications in single-cell RNA-seq for cell type annotation and cell-origin retrieval.

      This TaG-EM system can be useful for multiplexed behavioral studies using next-generation sequencing (NGS) of pooled samples and for transcriptomic studies. I don't have major concerns for the first application, but I think the scRNA-seq part has several major issues and needs to be further optimized.

      Major concerns:

      (1) It seems the barcode detection rate is low according to Fig S9 and Fig 5F, J and N. Could the authors evaluate the detection rate? If the detection rate is too low, it can cause problems when it is used to decode cell types.

      See responses to Reviewer #1 on this topic above.  

      (2) Unsuccessful amplification of TaG-EM barcodes: The authors attempted to amplify the TaG-EM barcodes in parallel to the gene expression library preparation but encountered difficulties, as the resulting sequencing reads were predominantly off-target. This unsuccessful amplification raises concerns about the reliability and feasibility of this amplification approach, which could affect the detection and analysis of the TaG-EM barcodes in future experiments.

      As noted above, we have now established a successful amplification protocol for the TaG-EM barcodes. These data are shown in Figure 6 and Supplemental Figures 18-19, and we have included detailed information in the methods for performing TaG-EM barcode enrichment during 10x library prep. We have also included code in the paper’s Github repository for assigning TaG-EM barcodes from the enriched library to the associated 10x Genomics cell barcodes.
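
      As an illustration only, the assignment step described above could look roughly like the following sketch. This is not the script from the repository: the input format (already-parsed tuples of cell barcode, UMI, and TaG-EM barcode), the function name, and the `min_umis` threshold are all assumptions made for the example.

```python
from collections import defaultdict


def assign_tagem_barcodes(reads, min_umis=2):
    """Map each 10x cell barcode to the TaG-EM barcodes it supports.

    reads: iterable of (cell_barcode, umi, tagem_barcode) tuples, assumed to
    be already parsed from the enriched library (hypothetical input format).
    A TaG-EM barcode is kept for a cell only when at least `min_umis`
    distinct UMIs support it (hypothetical threshold).
    """
    # Collect the distinct UMIs seen for every (cell, TaG-EM barcode) pair.
    umis = defaultdict(set)
    for cell, umi, tagem in reads:
        umis[(cell, tagem)].add(umi)

    # Keep only the pairs with enough independent UMI support.
    calls = defaultdict(dict)
    for (cell, tagem), seen in umis.items():
        if len(seen) >= min_umis:
            calls[cell][tagem] = len(seen)
    return dict(calls)
```

      Counting distinct UMIs rather than raw reads keeps PCR duplicates from inflating support for a barcode; the threshold itself would need to be tuned on real data.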

      (3) For Fig 5, the single-cell clusters are not annotated. It is not clear which cell types correspond to which clusters. So, it is difficult to evaluate the accuracy of the assignment of barcodes.

      We have added annotation information for the cell clusters based on expression of cell-type-specific marker genes (Figure 6A, Supplemental Figures 16-17).

      (4) The scRNA-seq UMAP in Fig 5 is a bit strange to me. The fly gut epithelium contains only a few major cell types, including ISC, EB, EC, and EE. However, the authors showed 38 clusters in Fig 5B. It is true that some cell types, like EE (Guo et al., 2019, Cell Reports), have sub-populations, but I don't expect they will form this many subtypes. There are many peripheral small clusters that are not shown in other gut scRNA-seq studies (Hung et al., 2020; Li et al., 2022 Fly Cell Atlas; Lu et al., 2023 Aging Fly Cell Atlas). I suggest the authors try different data-processing methods to validate their clustering result.

      For all of the single cell experiments, after doublet and ambient RNA removal (as suggested below), we have reclustered the datasets and evaluated different resolutions using Clustree. As the Reviewer points out, there are different EE subtypes, as well as regionalized expression differences in EC and other cell populations, so more than four clusters are expected (an analysis of the adult midgut identified 22 distinct cell types). With this revised analysis our results more closely match the cell populations observed in other studies (though it should be noted that the referenced studies largely focus on the adult and not the larval stage).  

      (5) Different gut drivers, PMC-, PC-, EB-, EC-, and EE-GAL4, were used. The authors should carefully characterize these GAL4 expression in larval guts and validate sequencing data. For example, does the ratio of each cell type in Fig 5B reflect the in vivo cell type ratio? The authors used cell-type markers mostly based on the knowledge from adult guts, but there are significant morphological and cell ratio differences between larval and adult guts (e.g., Mathur...Ohlstein, 2010 Science).

      We have characterized in detail, in larval guts, the PC driver (highlighted in Supplemental Figure 13) and the EC and EE drivers (highlighted in Figure 6G-N), and have added these data to the paper (Supplemental Figure 21). The EB driver was not characterized histologically as EB-specific antibodies are not currently available. The PMG-Gal4 line exhibits strong expression throughout the larval gut (Figure 5B), and barcodes are recovered from essentially all of the larval gut cell clusters using this driver (Supplemental Figure 20). We don’t necessarily expect the ratios of cells observed in the scRNA-Seq data to reflect the ratios typically observed in the gut: we performed pooled flow sorting on a multiplexed set of eight genotypes, and driver expression levels, flow sorting, and possibly other processing steps could all influence the relative abundance of different cell types. However, detailed characterization of these driver lines did reveal spatial expression patterns that help explain aspects of the scRNA-Seq data. We have also added the following text to the paper to further describe the characterization of the drivers:

      Results:

      “Detailed characterization of the EC-Gal4 line indicated that although this line labeled a high percentage of enterocytes, expression was restricted to an area at the anterior and middle of the midgut, with gaps between these regions and at the posterior (Supplemental Figure 21). This could explain the absence of subsets of enterocytes, such as those labeled by betaTry, which exhibits regional expression in R2 of the adult midgut (Buchon et al., 2013).”

      “Detailed characterization of the EE-Gal4 driver line indicated that ~80-85% of Prospero-positive enteroendocrine cells are labeled in the anterior and middle of the larval midgut, with a lower percentage (~65%) of Prospero-positive cells labeled in the posterior midgut (Supplemental Figure 21). As with the enterocyte labeling, and consistent with the Gal4 driver expression pattern, the EE-Gal4 expressed TaG-EM barcode 9 did not label all classes of enteroendocrine cells and other clusters of presumptive enteroendocrine cells expressing other neuropeptides such as Orcokinin, AstA, and AstC, or neuropeptide receptors such as CCHa2 (not shown) were also observed.”

      Methods:

      “Dissection and immunostaining

      Midguts from third instar larvae of driver lines crossed to UAS-GFP.nls or UAS-mCherry were dissected in 1xPBS and fixed with 4% paraformaldehyde (PFA) overnight at 4ºC. Fixed samples were washed with 0.1% PBTx (1xPBS + 0.1% Triton X-100) three times for 10 minutes each and blocked in PBTxGS (0.1% PBTx + 3% Normal Goat Serum) for 2–4 hours at RT. After blocking, midguts were incubated in primary antibody solution overnight at 4ºC. The next day samples were washed with 0.1% PBTx three times for 20 minutes each and were incubated in secondary antibody solution for 2–3 hours at RT (protected from light) followed by three washes with 0.1% PBTx for 20 minutes each. One µg/ml DAPI solution prepared in 0.1% PBTx was added to the sample and incubated for 10 minutes followed by washing with 0.1% PBTx three times for 10 minutes each. Finally, samples were mounted on a slide glass with 70% glycerol and imaged using a Nikon AX R confocal microscope. Confocal images were processed using Fiji software. 

      The primary antibodies used were rabbit anti-GFP (A6455, 1:1000, Invitrogen), mouse anti-mCherry (3A11, 1:20, DSHB), mouse anti-Prospero (MR1A, 1:50, DSHB) and mouse anti-Pdm1 (Nub 2D4, 1:30, DSHB). The secondary antibodies used were goat anti-mouse and goat anti-rabbit IgG conjugated to Alexa 647 and Alexa 488 (1:200) (Invitrogen), respectively. Five larval gut specimens per Gal4 line were dissected and examined.”

      (6) Doublets are removed based on the co-expression of two barcodes in Fig 5A. However, there are also other possible doublets, for example, from the same barcode cells or when one cell doesn't have a detectable barcode. Did the authors try other computational approaches to remove doublets, like DoubletFinder (McGinnis et al., 2019) and Scrublet (Wolock et al., 2019)?

      We have included DoubletFinder-based doublet removal in our data analysis pipeline. This is now described in the methods (see below).
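
      To make the reviewer's distinction concrete, the barcode co-expression criterion can be sketched as below. The function and label names are hypothetical and the input is assumed to be a mapping from cell barcode to the TaG-EM barcodes detected in that droplet; this is an illustration, not the analysis code used for the paper.

```python
def classify_droplets(barcodes_per_cell):
    """Label each droplet by the number of distinct TaG-EM barcodes detected.

    barcodes_per_cell: dict mapping a cell barcode to an iterable of TaG-EM
    barcode names detected in that droplet (hypothetical input format).
    """
    labels = {}
    for cell, barcodes in barcodes_per_cell.items():
        n_distinct = len(set(barcodes))
        if n_distinct == 0:
            # No barcode detected: this criterion cannot say anything.
            labels[cell] = "unlabeled"
        elif n_distinct == 1:
            # May still be a doublet of two cells carrying the same barcode.
            labels[cell] = "singlet_candidate"
        else:
            # Two or more different barcodes in one droplet.
            labels[cell] = "multiplet"
    return labels
```

      Droplets labeled `unlabeled` or `singlet_candidate` can still be multiplets, which is why an expression-based tool such as DoubletFinder is a useful complement.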

      (7) Did the authors remove ambient RNA which is a common issue for scRNA-seq experiments?

      We have also used DecontX to remove ambient RNA. This is now described in the methods:

      “Datasets were first mapped and analyzed using the Cell Ranger analysis pipeline (10x Genomics). A custom Drosophila genome reference was made by combining the BDGP.28 reference genome assembly and Ensembl gene annotations. Custom gene definitions for each of the TaG-EM barcodes were added to the fasta genome file and .gtf gene annotation file. A Cell Ranger reference package was generated with the Cell Ranger mkref command. Subsequent single-cell data analysis was performed using the R package Seurat (Satija et al., 2015). Cells expressing fewer than 200 genes and genes expressed in fewer than three cells were filtered from the expression matrix. Next, percent mitochondrial reads, percent ribosomal reads, cell counts, and cell features were graphed to determine optimal filtering parameters. DecontX (Yang et al., 2020) was used to identify empty droplets, to evaluate ambient RNA contamination, and to remove empty cells and cells with high ambient RNA expression. DoubletFinder (McGinnis et al., 2019) was used to identify droplet multiplets and remove cells classified as multiplets. Clustree (Zappia and Oshlack, 2018) was used to visualize different clustering resolutions and to determine the optimal clustering resolution for downstream analysis. Finally, SingleR (Aran et al., 2019) was used for automated cell annotation with a gut single-cell reference from the Fly Cell Atlas (Li et al., 2022). The dataset was manually annotated using the expression patterns of marker genes known to be associated with cell types of interest. To correlate TaG-EM barcodes with cell IDs in the enriched TaG-EM barcode library, a custom Python script was used (TaGEM_barcode_Cell_barcode_correlation.py), which is available via Github: https://github.com/darylgohl/TaG-EM.”
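
      The initial filtering thresholds quoted above (at least 200 detected genes per cell, at least three cells per gene) can be illustrated with a small, dependency-free sketch. The analysis itself was done in Seurat; the dictionary-based input, the function name, and the order of the two filters (cells first, then genes) are assumptions made for this example.

```python
def filter_expression(counts, min_genes=200, min_cells=3):
    """Apply the two basic QC filters to a sparse count table.

    counts: dict mapping cell -> {gene: count} (hypothetical input format).
    Cells with fewer than `min_genes` detected genes are dropped first;
    genes detected in fewer than `min_cells` remaining cells are dropped next.
    """
    # Keep only non-zero entries so "detected" means count > 0.
    detected = {
        cell: {gene: n for gene, n in genes.items() if n > 0}
        for cell, genes in counts.items()
    }
    kept_cells = {
        cell: genes for cell, genes in detected.items() if len(genes) >= min_genes
    }

    # Count in how many surviving cells each gene is detected.
    cells_per_gene = {}
    for genes in kept_cells.values():
        for gene in genes:
            cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {gene for gene, n in cells_per_gene.items() if n >= min_cells}

    return {
        cell: {gene: n for gene, n in genes.items() if gene in kept_genes}
        for cell, genes in kept_cells.items()
    }
```

      The thresholds are parameters so that the same function can be exercised on toy data with much smaller cutoffs.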

      (8) Why does TaG-EM barcode #4, driven by EC-GAL4, not label other classes of enterocyte cells such as betaTry+ ECs (Figures 5D-E)? Similarly, why does TaG-EM barcode #9, driven by EE-GAL4, not label all EEs? Again, it is difficult to evaluate this part without proper data processing and accurate cell type annotation.

      As noted in the response to a comment by Reviewer #1 above, part of this apparent sparsity of labeling is due to the way that this experiment was designed and visualized. We have added a new Figure panel in both Figure 6B and Supplemental Figure 17B that shows the overall detection of barcodes in the enriched barcode library and gene expression library or the gene expression library only, respectively, to better illustrate the efficacy of barcode detection. See also the response to point 5 above. Both the lack of labelling of betaTry+ ECs and subsets of EEs is consistent with the expression patterns of the EC-Gal4 and EE-Gal4 drivers.

      (9) For Figure 2, the authors tested different combinations of groups with various numbers of barcodes and found remarkable consistency for the even groups. Once the numbers start to increase to 64, barcode abundance becomes highly variable (range of 12-18% for both male and female). I think this would be problematic because the differences seen in two groups, for example, may be due to the barcode selection rather than an actual biologically meaningful difference.

      While there is some barcode-to-barcode variability for different amplification conditions, the magnitude of this variation is relatively consistent across the conditions tested. We looked at the coefficient of variation for the evenly pooled barcodes or for the staggered barcodes pooled at different relative levels. While the absolute magnitude of the variation is higher for the highly abundant barcodes in the staggered conditions, the CVs for these conditions (0.186 for female flies and 0.163 for male flies) were only slightly above the mean CV (0.125) for all conditions (see Supplemental Figure 3).

      We have added this analysis as Supplemental Figure 3 and added the following text to the paper:

      “The coefficients of variation were largely consistent for groups of TaG-EM barcodes pooled evenly or at different levels within the staggered pools (Supplemental Figure 3).”
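
      For readers unfamiliar with the metric, the coefficient of variation is the standard deviation divided by the mean. A minimal sketch follows; whether the reported values used the sample or the population standard deviation is not stated, so the use of the sample standard deviation here is an assumption, as is the function name.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean.

    values: observed abundances (e.g. read fractions) for barcodes that
    were pooled at the same nominal level.
    """
    average = mean(values)
    if average == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / average
```

      Because the CV is scaled by the mean, it allows the spread of highly abundant and rare barcodes in the staggered pools to be compared on the same footing.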

      (10) Barcode #14 cannot be reliably detected in oviposition experiment. This suggests that the BC 14 fly line might have additional mutations in the attp2 chromosome arm that affects this behavior. Perhaps other barcode lines also have unknown mutations and would cause issues for other untested behaviors. One possible solution is to backcross all 20 lines with the same genetic background wild-type flies for >7 generations to make all these lines to have the same (or very similar) genetic background. This strategy is common for aging and behavior assays.

      See response to Reviewer #1 above on this topic.

      Reviewer #3 (Public Review):

      The work addresses challenges in linking anatomical information to transcriptomic data in single-cell sequencing. It proposes a method called Targeted Genetically-Encoded Multiplexing (TaG-EM), which uses genetic barcoding in Drosophila to label specific cell populations in vivo. By inserting a DNA barcode near the polyadenylation site in a UAS-GFP construct, cells of interest can be identified during single-cell sequencing. TaG-EM enables various applications, including cell type identification, multiplet droplet detection, and barcoding experimental parameters. The study demonstrates that TaG-EM barcodes can be decoded using next-generation sequencing for large-scale behavioral measurements. Overall, the results are solid in supporting the claims and will be useful for a broader fly community. I have only a few comments below:

      We thank the reviewer for these positive comments.

      Specific comments:

      (1) The authors mentioned that the results of structure pool tests in Fig. 2 showed a high level of quantitative accuracy in detecting the TaG-EM barcode abundance. Although the data were generally consistent with the input values in most cases, there were some obvious exceptions such as barcode 1 (under-represented) and barcodes 15, 20 (overrepresented). It would be great if the authors could comment on these and provide a guideline for choosing the appropriate barcode lines when implementing this TaG-EM method.

      See the response to point 9 from Reviewer 2. Although there seem to be some systematic differences in barcode amplification, the coefficient of variation was relatively consistent across all of the barcode combinations and relative input levels that we examined. Our recommendation (described in the text) is to average across 3-4 independent barcodes (which yielded R² values of >0.99 with expected abundance in the structured pooled tests).
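
      The recommendation to average across several independent barcodes can be sketched as follows. The sketch assumes that R² is computed as the squared Pearson correlation between expected and averaged observed abundance, which may differ from how the reported values were obtained; the function names and the nested-list input format are hypothetical.

```python
from statistics import mean


def average_replicate_barcodes(observed_by_condition):
    """Average the observed abundances of the barcodes marking each condition.

    observed_by_condition: list with one inner list per condition, holding
    the observed abundance of each independent barcode for that condition.
    """
    return [mean(values) for values in observed_by_condition]


def r_squared(expected, observed):
    """Squared Pearson correlation between expected and observed values."""
    mean_x, mean_y = mean(expected), mean(observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    return (sxy * sxy) / (sxx * syy)
```

      Averaging before computing the correlation is what lets barcode-specific amplification biases partially cancel out.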

      (2) In Supplemental Figure 6, the authors showed GFP antibody staining data with 20 different TaG-EM barcode lines. The variability in GFP antibody staining results among these different TaG-EM barcode lines concerns the use of these TaG-EM barcode lines for sequencing followed by FACS sorting of native GFP. I expected the native GFP expression would be weaker and much more variable than the GFP antibody staining results shown in Supplemental Figure 6. If this is the case, variation of tissue-specific expression of TaG-EM barcode lines will likely be a confounding factor.

      Aside from barcode 8, which had a mutation in the GFP coding sequence, we did not see significant variability in expression levels in the wing disc. Subtle differences seen in this figure most likely result from differences in larval staging. Similarly consistent native (unstained) GFP expression of the TaG-EM constructs was seen in crosses with Mhc-Gal4 (described above).

      (3) As the authors mentioned in the manuscript, multiple barcodes for one experimental condition would be a better experimental design. Could the authors suggest a recommended number of barcodes for each experimental condition? 3? 4? Or more?

      See response to Reviewer #3, point number 1 above.

      (3b) Also, it would be great if the authors could provide a short discussion on the cost of such TaG-EM method. For example, for the phototaxis assay, if it is much more expensive to perform TaG-EM as compared to manually scoring the preference index by videotaping, what would be the practical considerations or benefits of doing TaG-EM over manual scoring?

      While this will vary depending on the assay and the scale at which one is conducting experiments, we have added an analysis of labor savings for the larval gut motility assay (Supplemental Figure 8). We have also added the following text to the Discussion describing some of the trade-offs to consider in assessing the potential benefit of incorporating TaG-EM into behavioral measurements:

      “While the utility of TaG-EM barcode-based quantification will vary based on the number of conditions being analyzed and the ease of quantifying the behavior or phenotype by other means, we demonstrate that TaG-EM can be employed to cost-effectively streamline labor-intensive assays and to quantify phenotypes with small effect sizes (Figure 4, Supplemental Figure 8).”

      Recommendations for the authors:  

      While recognising the potential of the TaG-EM methodology, we had a few major concerns that the authors might want to consider addressing:

      As stated above, we are grateful to the reviewers and editor for their thoughtful comments. We have addressed many of the points below in our responses above, so we will briefly respond to these points and where relevant direct the reader to comments above.

      (1) We were concerned about the efficacy of TaG-EM in assessing more complex behaviours than oviposition and phototaxis. We note that Barcode #14 cannot be reliably detected in oviposition experiment. This suggests that the BC 14 fly line might have additional mutations in the attp2 chromosome arm that affects this behavior. Perhaps other barcode lines also have unknown mutations and would cause issues for other untested behaviors. One possible solution is to back-cross all 20 lines with the same genetic background wild-type flies for >7 generations to make all these lines to have the same (or very similar) genetic background. This strategy is common for aging and behavior assays.

      See response to Reviewer #1 and Reviewer #2, item 10, above.

      (2) We were unable to assess the drop-out rates of the TaG-EM barcode from the sequencing. The barcode detection rate is low (Fig S9 and Fig 5F, J and N). This would be a considerable drawback (relating to both experimental design and cost), if a large proportion of the cells could not be assigned an identity.

      See comments above addressing this point.

      (3) TaG-EM scRNA-seq on the larval gut is not very effective - the cells are not well annotated, the barcodes seem not to have labelled expected cell types (ECs and EEs), and there is no validation of the Gal4 drivers in vivo.

      See previous comments. We have addressed specific comments above on data processing and annotation, included a visualization of the overall effectiveness of labeling, added a protocol and data on enriched TaG-EM barcode libraries, and have added detailed characterization of the Gal4 drivers in the larval gut (Figure 6, Supplemental Figures 17-21).

      (4) A formal assessment of the cost-effectiveness would be an important consideration in broad uptake of the methodology.

      While this is difficult to do in a comprehensive manner given the breadth of potential applications, we have included estimates of labor savings for one of the behavioral assays that we tested (Supplemental Figure 8). We have also included a discussion of some of the factors that would make TaG-EM useful or cost-effective to apply for behavioral assays (see response to Reviewer #3, comment 3b, above). We have also added the following text to the discussion to address the cost considerations in applying TaG-EM for scRNA-Seq:

      “For single cell RNA-Seq experiments, the cost savings of multiplexing is roughly the cost of a run divided by the number of independent lines multiplexed, plus labor savings by also being able to multiplex upstream flow cytometry, minus loss of unbarcoded cells. Our experiments indicated that for the specific drivers we tested TaG-EM barcodes are detected in around one quarter of the cells if relying on endogenous expression in the gene expression library, though this fraction was higher (~37%) if sequencing an enriched TaG-EM barcode library in parallel (Figure 6, Supplemental Figures 18-19).”
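
      The arithmetic behind this estimate can be sketched as below. The run cost and the number of recovered cells are invented placeholders; only the eight multiplexed genotypes and the approximate detection fractions (about one quarter, or about 37% with the enriched library) come from the text. Labor savings are not modeled, and the function name is hypothetical.

```python
def multiplexing_estimate(run_cost, n_lines, cells_recovered, detection_fraction):
    """Rough per-line cost and barcode-assigned cell yield for one pooled run."""
    if n_lines <= 0:
        raise ValueError("n_lines must be positive")
    if not 0.0 <= detection_fraction <= 1.0:
        raise ValueError("detection_fraction must be between 0 and 1")
    return {
        # Cost of the run shared evenly across the multiplexed lines.
        "cost_per_line": run_cost / n_lines,
        # Cells that can be traced back to a line via a detected barcode.
        "assigned_cells": cells_recovered * detection_fraction,
        # Cells lost to the analysis because no barcode was detected.
        "unassigned_cells": cells_recovered * (1.0 - detection_fraction),
    }


# Placeholder run cost and cell count; 8 lines and ~25% detection follow the text.
estimate = multiplexing_estimate(
    run_cost=2000.0, n_lines=8, cells_recovered=10000, detection_fraction=0.25
)
```

      Re-running the function with `detection_fraction=0.37` shows how sequencing the enriched barcode library in parallel changes the number of assignable cells for the same run.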

      (5) Similarly, a formal assessment of the effect of the insertion on the variability in GFP expression and the behaviour needs to be documented.

      See responses to Reviewer #1, Reviewer #2, item 9, and Reviewer #3, item 2 above.

      Reviewer #1 (Recommendations For The Authors):

      (in no particular order of importance)

      • L84-85: the authors should either expand, or remove this statement. Indeed, lack of replicates is only true if one ignores that each cell in an atlas is indeed a replicate. Therefore, depending on the approach or question, this statement is inaccurate.

      This sentence was meant to refer to experiments where different experimental conditions are being compared and not to more descriptive studies such as cell atlases. We have revised this sentence to clarify.

      “Outside of descriptive studies, these costs are also a barrier to including replicates to assess biological variability; consequently, a lack of biological replicates derived from independent samples is a common shortcoming of single-cell sequencing experiments.”

      • L103-104: this sentence is unclear.

      We have revised this sentence as follows:

      “Genetically barcoded fly lines can also be used to enable highly multiplexed behavioral assays which can be read out using high throughput sequencing.”

      • In Fig S1 it is unclear why there are more than 20 different sequences in panel B where the text and panel A only mention the generation of 20 distinct constructs. This should be better explained.

      The following text was added to the Figure legend to explain this discrepancy:

      “Because the TaG-EM barcode constructs were injected as a pool of 29 purified plasmids, some of the transgenic lines had inserts of the same construct. In total 20 unique lines were recovered from this round of injection.”

      • It would be interesting to compare the efficiency of TaG-EM driven doublet removal (Fig 5A) with standard doublet-removing software (e.g., DoubletFinder, McGinnis et al., 2019).

      We have done this comparison, which is now shown in Supplemental Figure 15.

      • I would encourage the authors to check whether barcode representation in Fig S13  can be correlated to average library size, as one would expect libraries with shorter reads to be more likely to include the 14-bp barcode and therefore more accurately recapitulate TaG-EM barcode expression.

      These are not independent sequencing libraries, but rather data from barcodes that were multiplexed in a single flow sort, 10x droplet capture, and sequencing library. Thus, there must be some other variable that explains the differential recovery of these barcodes.

      • Fig 4A should appear earlier in the paper.

      We have moved Figure 4A from the previous manuscript (a schematic showing the detailed design of the TaG-EM construct) to Figure 1A in the revised version.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) There is a typo for Fig S13 figure legends: BC1, BC1, BC3... should be BC1, BC2, BC3.

      Fixed.

      Reviewer #3 (Recommendations For The Authors):

      Comments to authors:

      (1) It would be great if the authors could provide an additional explanation on how these 29 barcode sequences were determined.

      Response: This information is in the Methods section. For the original cloned plasmids:

      “Expected construct size was verified by diagnostic digest with EcoRI and ApaLI. DNA concentration was determined using a Quant-iT PicoGreen dsDNA assay (Thermo Fisher Scientific) and the randomer barcode for each of the constructs was determined by Sanger sequencing using the following primers:

      SV40_post_R: GCCAGATCGATCCAGACATGA

      SV40_5F: CTCCCCCTGAACCTGAAACA”

      For transgenic flies, after DNA extraction and PCR enrichment (details also in the Methods section):

      “The barcode sequence for each of the independent transgenic lines was determined by Sanger sequencing using the SV40_5F and SV40_PostR primers.”

      (2) Why did the authors choose myr-GFP as the backbone instead of nls-GFP if the downstream application is to perform sequencing?

      We initially chose myr::GFP as we planned to conduct single cell and not single nucleus sequencing and myr::GFP has the advantage of labeling cell membranes, which could facilitate the characterization or confirmation of cell type-specific expression, particularly in the nervous system. However, we have considered making a version of the TaG-EM construct with a nuclear targeted GFP (thereby enabling “NucEM”). In the Discussion, we mention this possibility as well as the possibility of using a second nuclear-GFP construct in conjunction with TaG-EM lines if nuclear enrichment is desired:

      “In addition, while the original TaG-EM lines were made using a membrane-localized myr::GFP construct, variants that express GFP in other cell compartments such as the cytoplasm or nucleus could be constructed to enable increased expression levels or purification of nuclei. Nuclear labeling could also be achieved by co-expressing a nuclear GFP construct with existing TaG-EM lines in analogy to the use of hexameric GFP described above.”

      Minor comments:

      (1) Line 193, Supplemental Figure 4 should be Supplemental Figure 5

      Fixed.

      (2) Scale bars should be added in Figure 4, Supplemental Figures 6, 7, and 8A.

      We have added scale bars to these figures and also included scale bars in additional Supplemental Figures detailing characterization of the gut driver lines.

      (3) Were Figure 4C and Supplemental Figure 7 data stained with a GFP antibody?

      No, this is endogenous GFP signal. This is now noted in the Figure legends.

      (4) Line 220, specify the three barcode lines (lines #7, 8, 9) in the text. 

      Added this information.

      Same for Lines 251-254. Line 258, which 8 barcode Gal4 line combinations?

      (5) Line 994, typo: (BC1, BC1, BC3, and BC7)-> (BC1, BC2, BC3, and BC7)

      Fixed.

      (6) Figure 5 F, J and N, add EC-Gal4, EB-Gal4, and EE-Gal4 above each panel to improve readability.

      We have added labels of the cell type being targeted (leftmost panels), the barcode, and the marker gene name to Figure 6 C-N.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews: 

      Reviewer #1 Comments on revisions: 

      The authors have addressed my concerns so I am fine with revision in principle.

      Thank you for taking the time to review our work and for your thoughtful feedback. We’re glad to hear that your concerns have been addressed.

      Reviewer #2 Comments on revisions:

      The authors have addressed many of the concerns raised in the initial review and provided alternative analytical approaches to address the relevant questions in this revision. Some of these are useful; however, they have not fully addressed one critical point. 

      In my original critique, I noted that the maternal KO might not be suitable as a control, given that there is no significant phenotypic difference between the maternal-only KO and the maternal-zygotic KO. We did not dispute the molecular differences presented in Figure 2, so how can the authors conclude in the Response that "embryos with a maternal KO or zygotic heterozygous KO of Oct4 or Sox2 show no noticeable ... molecular difference (Figure 2-figure supplement 4A)"? The authors should recheck whether this is a typographical error or a valid statement.

      Additionally, I recommend the removal of phrases such as "absolutely priority" and "pivotal" throughout the manuscript, as these terms are overly assertive without sufficient supporting evidence.

      We sincerely appreciate the reviewer’s feedback and would like to take this opportunity to provide further clarification, as there might have been a misunderstanding.

      We respectfully disagree with the reviewer’s statement that “there is no significant phenotypic difference between the maternal-only KO and the maternal-zygotic KO.” Based on previous publications, there is clear evidence that maternal-zygotic KO embryos exhibit significant defects: they fail to form a healthy primitive endoderm, are unable to give rise to embryonic stem cells (ESCs) in vitro, and die shortly after implantation (Frum et al., Dev Cell 2013; Wu et al., Nat Cell Biol 2013; Le Bin et al., Development 2014; Wicklow et al., PLoS Genet 2014). In contrast, maternal-only KO embryos develop as healthily as wild-type (WT) embryos and do not display any of these phenotypic abnormalities. We believe that this distinction validates our use of maternal KO embryos as proper controls in our experiments.

      To address the reviewer’s concerns and ensure clarity, we have also revised the following statement in the manuscript.

      Original manuscript: “Mouse embryos with a maternal KO or zygotic heterozygous KO of either factor show no noticeable phenotype or molecular difference (Figure 2-figure supplement 4A) (Avilion et al., 2003; Frum et al., 2013; Kehler et al, 2004; Nichols et al., 1998; Wicklow et al., 2014; Wu et al., 2013).” 

      Revised manuscript: “Maternal KO embryos (circles in Figure 2—figure supplement 4A) clustered together with wildtype embryos (triangles and squares) in the PCA analysis, consistent with previous studies reporting no observable phenotype in maternal KO embryos (Avilion et al., 2003; Frum et al., 2013; Kehler et al, 2004; Nichols et al., 1998; Wicklow et al., 2014; Wu et al., 2013).”

      While we acknowledge the potential for using maternal-only KO controls to underestimate differences between control and KO samples, we believe this approach does not introduce false positives in our RNA-seq and ATAC-seq experiments, only the possibility of more conservative conclusions. This minimizes the risk of overestimating the molecular impact.

      We appreciate the reviewer’s recommendation regarding the use of overly assertive terms. Upon careful review of the manuscript and response letter, we could not find instances of the term “absolutely priority.” However, we do use the term “pivotal” and would prefer to retain it as we believe it accurately reflects the importance of the findings presented in our manuscript.

      Thank you for your thoughtful comments and suggestions! We hope this response clarifies our rationale and addresses the concerns.

      ---

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 in the establishment of pluripotency during reprogramming. Due to the difficulty of sample collection and of RNA-seq with low cell numbers, the precise mechanisms remain unclear in early embryos. This manuscript reports the role of OCT4 and SOX2 in early mouse embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared to the control, chromatin accessibility and the transcriptome were affected when Oct4 and Sox2 were deleted in the early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of TF motifs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, by deep analysis of the ATAC-seq and RNA-seq data, the authors found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they also uncovered the role of OS during development from the morula to the ICM, which provides the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data; however, there are some issues that need to be addressed.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided to clarify itself. The knockout strategy in Fig1 is somewhat obscure, such as how is OCT4 inactivated in Oct4mKO2 heterozygotes. As shown in Figure 1, the exon of OCT4 is not deleted, and its promoter is not destroyed. Therefore, how does OCT4 inactivate to form heterozygotes?

      Thank you for helping clarify this. We will add a detailed description of the knockout strategy in the legends for Figure 1A and 1B, as shown below. Note that the same strategy was used by Nichols et al (Cell, 1998).

      Figure 1A. Schemes of mKO2-labeled Oct4 KO (Oct4<sup>mKO2</sup>) and Oct4<sup>flox</sup> alleles. In the Oct4<sup>mKO2</sup> allele, a PGK-pac∆tk-P2A-mKO2-pA cassette was inserted 3.6 kb upstream of the Oct4 transcription start site (TSS) and a promoter-less FRT-SA-IRES-hph-P2A-Venus-pA cassette was inserted into Oct4 intron 1. The inclusion of a stop codon followed by three sets of polyadenylation signal sequences (pA) after the Venus cassette ensures both transcriptional and translational termination, effectively blocking the expression of Oct4 exons 2–5.

      Figure 1B. Schemes of EGFP-labeled Sox2 KO (Sox2<sup>EGFP</sup>) and Sox2<sup>flox</sup> alleles. In the Sox2<sup>EGFP</sup> allele, the 5’ untranslated region (UTR), coding sequence and a portion of the 3’ UTR of Sox2 were deleted and replaced with a PGK-EGFP-pA cassette. Notably, 1,023 bp of the Sox2 3’ UTR remain intact.

      (2) Is ZP3-Cre expressed in the zygotes? Is there any residual protein?

      This is indeed a very important issue. Here is why we think we are on the safe side. ZP3 is specifically expressed in growing oocytes, thus making ZP3-Cre a widely used tool for deleting maternally inherited alleles. When we crossed Oct4<sup>flox/flox</sup>; ZP3-Cre<sup>-</sup> females with Oct4<sup>flox/flox</sup>; ZP3-Cre<sup>+</sup> males, we got ZP3-Cre<sup>+</sup> Oct4<sup>flox/flox</sup> but no Oct4<sup>flox/∆</sup> or Oct4<sup>∆/∆</sup> pups, suggesting that the paternally inherited ZP3-Cre allele is not functionally active in zygotes, which is consistent with reports from other researchers (e.g. Frum, et al., Dev Cell 2013; Wu, et al., Nat Cell Biol 2013).

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      The motifs enriched in the rising ATAC-seq peaks in Oct4 KO and Sox2 KO ICMs are the GATA, TEAD, EOMES and KLF motifs, as shown in Figure 4A and Figure supplement 7.

      (4) The ordinate of Fig4c is lost.

      Thank you for pointing this out. The y-axis shows the average normalized signal (reads-per-million-normalized pileup signal). We will add it in the revised version.
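      As an illustration of what a reads-per-million-normalized average signal means, the following minimal sketch scales raw pileup counts by sequencing depth and then averages the scaled profiles across peaks. The function names, bin layout and toy counts are hypothetical and are not taken from the authors' pipeline.

```python
def rpm_normalize(pileup, total_mapped_reads):
    """Scale raw pileup counts to reads per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in pileup]


def average_profile(profiles):
    """Average equal-length signal profiles position by position."""
    if not profiles:
        raise ValueError("need at least one profile")
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("profiles must have equal length")
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Hypothetical toy data: raw pileup over 5 bins centered on two peaks,
# from a library assumed to contain 2,000,000 mapped reads.
peak_a = rpm_normalize([2, 4, 10, 4, 2], total_mapped_reads=2_000_000)
peak_b = rpm_normalize([0, 2, 6, 2, 0], total_mapped_reads=2_000_000)
mean_signal = average_profile([peak_a, peak_b])
```

      Dividing by library size before averaging keeps samples sequenced to different depths on a comparable y-axis scale.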

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      Thank you for this insightful comment. We analyzed the published H3K27ac ChIP-seq data of mouse ICM at 94-96 h post hCG (B. Liu, et al., Nat Cell Biol 2024) to assess the enrichment of H3K27ac around our ATAC-seq peaks. Unfortunately, the data quality is poor, e.g., inconsistent across replicates (Author response image 1A), and shows little enrichment around the well-defined enhancers (Author response image 1B). Nevertheless, as we admit not all the distal ATAC-seq peaks or open chromatin regions are enhancers, we have replaced “enhancers” with “open chromatin regions”, “ATAC-seq peaks” or “putative enhancers”.

      Author response image 1.

      Analysis of the published H3K27ac ChIP-seq dataset of mouse ICM at 94-96 h post hCG (B. Liu, et al., Nat Cell Biol 2024). A. ChIP-seq profiles of H3K27ac over the decreased, unchanged and increased ATAC-seq peaks in our Oct4-KO late ICMs. To exclude spurious peaks, only strong unchanged peaks (57,512 out of 142,096) were used in the analysis. B. IGV tracks displaying ATAC-seq and H3K27ac ChIP-seq profiles around Dppa3 and Oct4. Red boxes mark the known OCT-SOX enhancers.

      (6) If Oct4 and Sox2 truly activate Sap30 and Uhrf1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

      This is indeed an interesting question. Unfortunately, we have not conducted this specific experiment, so we do not have direct results. However, Sap30 is a key component of the mSin3A corepressor complex, while Uhrf1 regulates the establishment and maintenance of DNA methylation. Both proteins are known to function as repressors. Therefore, we hypothesize that interfering with these two genes could alleviate repression of some genes, such as trophectoderm markers, similar to what we have observed in Oct4 KO and Sox2 KO ICMs.

      Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Major Points:

      (1) Although the authors claim that both maternal KO and maternal KO/zygotic hetero KO mice develop normally, the molecular changes in these groups appear overestimated. A wildtype control is recommended for a more robust comparison. (A complementary comment from the reviewer: “Both maternal KO and maternal-zygotic KO in this study exhibited phenotypic consistency but molecular disparity. Specifically, both KO and control groups could develop normally; however, their chromatin landscapes and transcriptomic profiles were different. This raises the question of whether the molecular differences are real. We suggest that inclusion of a completely wild-type control group would make the comparison more robust.”)

      Thank you for your feedback as this point was obviously not clear in the manuscript. Here is our explanation: Mouse embryos with a maternal KO or zygotic heterozygous KO of Oct4 or Sox2 show no noticeable phenotype or molecular difference (Figure 2-figure supplement 4A) (Avilion et al., 2003; Frum et al., 2013; Kehler et al, 2004; Nichols et al., 1998; Wicklow et al., 2014; Wu et al., 2013). We have clarified this point in the revised manuscript.

      (2) The authors assert that OCT4 and SOX2 activate the pluripotent network via the OCT-SOX enhancer. However, the definition of this enhancer is based solely on proximity to TSSs, which is a rough approximation. Canonical enhancers are typically located in intronic and intergenic regions and marked by H3K4me1 or H3K27ac. Re-analyzing enhancer regions with these standards could be beneficial. Additionally, the definitions of "close to" or "near" in lines 183-184 are unclear and not defined in the legends or methods.

      Thank you for this insightful and helpful comment. As stated in the response to Reviewer #1’s point (5), we have replaced “enhancers” with “open chromatin regions”, “ATAC-seq peaks” or “putative enhancers”.

      The definition of "close to" or "near" in lines 183-184 is in the legend of Figure 2E and Methods. In the GSEA analysis, Ensembl protein-coding genes with TSSs located within 10 kb of ATAC-seq peak centers were included, so that some of the intronic ATAC-seq peaks were taken into consideration. We have also added the information in the main text of the revised manuscript.
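      The 10 kb rule described above can be sketched as follows. This is a minimal illustration, assuming that "within 10 kb" is inclusive and that peaks and TSSs are matched per chromosome; the gene names and coordinates are made up, and the function is not the authors' actual implementation.

```python
from bisect import bisect_left, bisect_right

WINDOW_BP = 10_000  # distance cutoff stated in the response


def genes_near_peaks(tss_by_gene, peak_centers, window=WINDOW_BP):
    """Return genes whose TSS lies within `window` bp of any peak center.

    tss_by_gene: dict mapping gene -> (chromosome, TSS position)
    peak_centers: iterable of (chromosome, peak center position)
    """
    # Group peak centers by chromosome and sort them for binary search.
    centers = {}
    for chrom, pos in peak_centers:
        centers.setdefault(chrom, []).append(pos)
    for positions in centers.values():
        positions.sort()

    selected = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        positions = centers.get(chrom, [])
        lo = bisect_left(positions, tss - window)
        hi = bisect_right(positions, tss + window)
        if hi > lo:  # at least one peak center falls inside the window
            selected.add(gene)
    return selected


# Hypothetical genes and coordinates, for illustration only.
tss = {
    "geneA": ("chr1", 50_000),
    "geneB": ("chr1", 200_000),
    "geneC": ("chr2", 50_000),
}
peaks = [("chr1", 58_000), ("chr1", 120_000), ("chr2", 61_000)]
nearby = genes_near_peaks(tss, peaks)
```

      Because the window extends on both sides of the TSS, a peak located in an intron downstream of the TSS is captured as long as it sits within the cutoff.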

      (3) There is no evidence that the decreased peaks/enhancers could be the direct targets of Oct4 and Sox2 throughout this manuscript. Figures 2 and 4 show only minimal peak annotations related to OCT and SOX motifs, and there is a lack of chromatin IP data. Therefore, claims about direct targets are not substantiated and should be appropriately revised.

      Yes indeed, you have a point. In Figure Supplement 3C, we analyzed the published Sox2 CUT&RUN data from E4.5 ICMs (Li et al., Science, 2023), which demonstrates that the reduced ATAC-seq peaks in our Sox2 KO ICMs are enriched with the Sox2 CUT&RUN signals. Unfortunately, we did not find similar published data for Oct4 in embryos. We have removed the statement indicating that these are the direct targets in the revised manuscript.

      (4) Lines 143-146 lack direct data to support the claim. Actually, the main difference in cluster 1, 11 and 3, 8, 14 is whether the peak contains OCT-SOX motif. However, the reviewer cannot get any information of peaks activated by OCT4 rather than SOX2 in cluster 1, 11.

      Thank you for the comment that we hope we can clarify.

      Lines 143-146 are: “Notably, the peaks activated by Oct4 but not by Sox2 in the ICM tended to be already open at the morula stage (Figure 2B, clusters 1 and 11), whereas those dependent on both Oct4 and Sox2 became open in the ICM (Figure 2B, clusters 3, 8 and 14).”

      We agree with you that clusters 3/8/14 are more enriched in OCT-SOX motifs than clusters 1/11. However, this is consistent with our observation that accessibility of peaks in clusters 1 and 11 relies mainly on Oct4, while accessibility in clusters 3, 8, 14 depends on both Oct4 and Sox2. But maybe the term “activate” is misleading. We have rephrased the text as below:

      “Notably, compared to the peaks that depend on Oct4 but not Sox2 (Figure 2B, clusters 1 and 11), those reliant on both Oct4 and Sox2 show greater enrichment of the OCT-SOX motif (Figure 2B, clusters 3, 8 and 14). The former group was generally already open in the morula, while the latter group only became open in the ICM.”

      Minor Points:

      (1) Lines 153-159: The figure panel does not show obvious enrichment of SOX2 signals or significant differences in H3K27ac signals across clusters, thus not supporting the claim.

      We hope to be able to explain this.

      Lines 153-159 refer to two datasets: Figure Supplement 3C and 3D.

      In Figure Supplement 3C, the average plots above the heatmaps show that the decreased ATAC-seq peaks (the indigo lines) have higher enrichment with Sox2 CUT&RUN signals than the increased or unchanged peaks (the yellow and light blue lines, respectively).

      In Figure Supplement 3D, the average plots indicate that H3K27ac signals around the center of the decreased ATAC-seq peaks (the indigo line) show higher enrichment compared to the unaltered and increased groups (the light blue and yellow lines, respectively). Notably, H3K27ac enrichment appears slightly offset from the central nucleosome-free regions.

      (2) Lines 189-190: The term "identify" is overstated for the integrative analysis of RNA-seq and ATAC-seq, which typically helps infer TF targets rather than definitively identifying them.

      You are right. We have replaced “identify” with “infer” in the revised manuscript.

      (3) The Discussion is lengthy and should be condensed.

      We have shortened the discussion in the revised manuscript.

    1. Author Response

      Review 1:

      Major concerns that need to be addressed:

      Investigate the effects of Malat1 on the clearance of Listeria or LCMV.

      In our prior publication (Gagnon et al, Cell Reports) we showed that miR-15/16 deficiency in T cells does not affect the clearance of LCMV, and that transferred memory T cells formed in these mice can function normally to clear a secondary infection with Listeria expressing the LCMV gp33 peptide. However, the size of the memory pool was clearly changed, as was the programming of memory cells. Here, we show that disrupting miR15/16 binding to MALAT1 induces a reciprocal phenotype, validating a biological function for this RNA:RNA interaction. We employed these systems because they are widely used to reveal key aspects of T cell memory, but both infections are readily cleared by the host. These changes in the memory response likely play a limiting role in some biological context(s), and we agree that further investigation to uncover such situations would further validate the importance of this RNA circuit.

      Demonstrate that Malat1 shuttles to the cytosol, this will strengthen the conclusions that Malat1 sponges miR15/16.

      The location of miR-15/16 interaction with Malat1 is an interesting area for future study. Many prior studies have shown clearly that Malat1 is primarily located in the nucleus, but since T cells express such a large excess of this lncRNA, even the remaining fraction detected in the cytosol may be sufficient to “sponge” a significant amount of miR-15/16. Alternatively, these molecules may interact in the nucleus, or during mitosis. As the reviewer suggests, Malat1 may shuttle between compartments, raising the intriguing possibility that it could not only “sponge” but “drag” miR-15/16 away from its targets into the nucleus. A proper analysis of the mechanism of ceRNA function is beyond the scope of this paper, but we do believe that this circuit may be an especially good one for further study.

      Through flow cytometry or immunoblot analyses, investigate the effects of Malat1-miR15/16 on genes listed in table 3. This would add credence to the sequencing and CLIP data.

      We thank the reviewer for bringing to our attention the manuscript’s overemphasis on the former Table 3 gene set, which represented just a few of the hundreds of genes for which our data provide evidence for miR-15/16 binding and inhibition of expression. We have removed this table to avoid the appearance of suggesting an oversimplified model for how miR-15/16 regulate T cell responses, and replaced it with a short description of two targets (Pik3r1 and Mapk8) that link the roles of miR-15/16 in T cell activation and tumor suppression. Like transcription factors, miRNAs function as network regulators of gene expression, gaining biological power through their ability to coregulate many genes with convergent effects on cell behavior. In the case of miR-15/16, our published data, reinforced by the data in this manuscript, indicate that the relevant target network is very large, and that even very small changes in the expression of these targets are sufficient to alter the fate of antigen-responsive T cells in the setting of acute infection.

      This comment also raises the important issue of target validation, which is often difficult, since the effect size for each miRNA target is small (typically 10-30%, sometimes reaching 50% reduction). The expected effect of Malat1 inhibition of miR-15/16 is some fraction of that. Nevertheless, in Figure 3 and Figure 7, we validated two direct targets (CD28 and Bcl2) using flow cytometry, a technique that facilitates precise sampling of protein expression on a large number of individual cells.

      Minor concerns:

      The discussion is too broad and does not address the limitations of the study.

      We added a sentence to acknowledge the limitation regarding small effect sizes and the shortcomings of the acute infection models used in this study:

      “The magnitude of this effect was modest in acute LCMV and Listeria infection, two models that feature robust pathogen clearance, allowing assessment of memory T cells in the absence of chronic antigen persistence. Further work is needed to assess other settings in which Malat1:miR-15/16 interaction may have a bigger impact on the outcome of immune responses.”

      Reviewer 2:

      1) Given the lack of an effect on microRNA or Malat1 levels following the genetic modification, is it possible that Malat1 is actually not directly bound by the miRNA? Could the knock-out of the miRNA induce Ago2 loss on Malat1 by indirect mechanisms? If there is any room for doubt about a direct interaction, the authors should at least mention or discuss it.

      There is very little room for doubt about the direct interaction between miR-15/16 and Malat1. The AHC data we report indicates that the loss of Ago2 binding to the mutant Malat1 occurs predominantly at the site containing the miR-15/16 binding site of interest. This suggests that the mutation we created does not affect global Ago2 levels or occupancy across the rest of the transcript. Further, the miR-15/16 KO data directly support this result, showing that miR-15/16 is necessary for Ago2 binding at that site. If loss of miR15/16 resulted in a non-specific indirect loss of binding to Malat1, we would expect that other binding events would be affected as well, which we do not observe.

      In the Results, the authors write: "miR-15/16 has not been previously shown to interact with Malat1", but they should cite/discuss: MALAT1 regulates the transcriptional and translational levels of proto-oncogene RUNX2 in colorectal cancer metastasis, Qing Ji et al, 2019.

      We thank the reviewer for bringing this study to our attention, and we have cited it in our updated version of the manuscript. While the interaction between miR-15/16 and Malat1 has been shown before, our study represents a significant step beyond this study in two important ways: The rigorous biochemical mapping of the miR-15/16:Malat1 interaction site, and direct evidence for the role of a miR:lncRNA interaction in an in vivo physiological phenotype.

      2) The authors write: "Only a few studies demonstrate sequence dependent function of lncRNAs (Elguindy and Mendell, 2021; Kleaveland et al., 2018; Lee et al., 1999)". But this seems more common than the statement implies (see for example this review: https://www.sciencedirect.com/science/article/pii/S0022283612008960#s0065). Moreover, SNPs in lncRNAs are associated with pathologies (see for example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306726/, where SNPs in Malat1 are also presented). The authors could acknowledge this by reformulating their sentence and citing these.

      A large number of studies have uncovered lncRNA functions without identifying the RNA sequences responsible for that activity, but evidence for sequence-specific effects remains rare. We thank the reviewer for providing direction to additional sequence-specific studies and we have now cited several of them in the updated version of the introduction:

      “Studies demonstrating sequence dependent function of lncRNAs are comparatively rare (Carrieri et al., 2012; Elguindy and Mendell, 2021; Faghihi et al., 2008; Gong and Maquat, 2011; Kleaveland et al., 2018; Lee et al., 1999; Yoon et al., 2012).”

      In particular, association of important SNPs with lncRNA loci is an exciting motivator in the study of lncRNAs and can be informative in the dissection of lncRNA function. For Malat1 in the linked Minotti et al publication, we do not believe the SNPs referenced represent indications of sequence-specific transcript function. The SNPs identified for Malat1 are rs1194338, rs4102217, and rs591291. In the UCSC genome browser screenshot in Author response image 1, you can see that all of these SNPs are upstream of Malat1 and in regions of extremely dense H3K27Ac, suggesting enhancer function. These SNPs do not represent sequence specific function of the Malat1 transcript, but rather more likely genomic sequence regulation of Malat1 (or nearby gene) expression.

      Author response image 1.

      • Figure 2H: In the figure legend, could the authors clarify what they mean by "same conditions as in F"?

      We have updated the figure legend for clarity.

      • Figure 3 panel labels B, C, D don't match figure.

      We have corrected this and provided an updated figure.

      • Figure 4 D, E, F: Can the authors comment more about why in their opinion early activation genes are not significantly decreased in Malat1 scr/scr?

      Figure 4A shows that interrupting Malat1 interaction with miR-15/16 does affect the early induction of the immediate early gene CD69. Even miR-15/16 deficiency did not affect Nur77 expression, indicating that Malat1 and miR-15/16 regulate specific cues and signaling pathways involved in T cell activation. In particular, the transcriptomic analysis led us to focus on effects on costimulation-induced genes (Figure 3). Figure panels 4D, E, and F show the production of cytokines, including IL-2, which has been well documented to be responsive to CD28 signaling and clearly did so in our experiments. These data show a consistent increase in miR-15/16-deficient T cells, despite considerable noise in the assay. The trend toward reduced IL-2 in Malat1<sup>scr/scr</sup> T cells is of smaller magnitude, as expected, and not statistically significant. Repeating this assay to obtain a better p value doesn’t seem warranted. However, we did independently observe decreased IL-2 production in Malat1<sup>scr/scr</sup> T cells in an ex vivo cytokine capture assay (Figure 7F-G).

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their careful review of our manuscript and the constructive comments. We have addressed the majority of comments with either new experiments, analyses, and/or text revisions. A summary of the major changes is listed below, followed by our point-by-point responses to the reviewer comments.

      Major changes:

      (1) We sought to gain insight into the potential mechanistic cause of the increased intrinsic excitability of Cntnap2<sup>-/-</sup> dSPNs. Given that Kv1.1 and 1.2 potassium channels are known to interact with Caspr2 (the protein encoded by Cntnap2), we hypothesized that altered number, location, and/or function of these channels may underlie the excitability change in these cells. To investigate this, we performed new analyses of the initial dataset to assess action potential (AP) properties known to be impacted by potassium channel function. Indeed, we found that AP frequency was increased, and rheobase current, AP latency and AP threshold were decreased in Cntnap2<sup>-/-</sup> dSPNs, suggestive of altered Kv1.2 function. These data are in the new Supplemental Fig. 4. We also performed new electrophysiology experiments in which we pharmacologically blocked Kv1.1 and 1.2 to assess whether the effects of blocking these channels would be occluded in Cntnap2<sup>-/-</sup> dSPNs. We found that 1) WT dSPNs responded to blockade of Kv1.1/1.2 channels by increasing their excitability but Cntnap2<sup>-/-</sup> dSPNs did not and 2) Kv1.1/1.2 channels were more important contributors to the excitability of dSPNs compared to iSPNs. These new data are presented in the revised Fig. 4 and Supplemental Figs. 5 and 6.

      (2) We performed additional experiments to assess excitatory synaptic properties, specifically AMPA/NMDA receptor ratio. This has been added to Fig. 1.

      (3) We performed more rigorous statistical analyses of the initial physiology datasets to align with the statistics performed for the revision experiments. This applies to Fig. 1, Fig. 2, Fig. 3, Fig. 5, and Supp. Fig. 2.

      (4) In the discussion section, we now highlight potential limitations of the study and further discuss the variable impact that Cntnap2 loss has on different cell types and brain regions.  

      Reviewer #1 (Public Review):

      Summary:

      Cording et al. investigated how deletion of CNTNAP2, a gene associated with autism spectrum disorder, alters corticostriatal engagement and behavior. Specifically, the authors present slice electrophysiology data showing that striatal projection neurons (SPNs) are more readily driven to fire action potentials in response to stimulation of corticostriatal afferents, and this is due to increases in SPN intrinsic excitability rather than changes in excitatory or inhibitory synaptic inputs. The authors show that CNTNAP2 mice display repetitive behaviors, enhanced motor learning, and cognitive inflexibility. Overall the authors' conclusions are supported by their data, but a few claims could use some more evidence to be convincing.

      Strengths:

      The use of multiple behavioral techniques, both traditional and cutting-edge machine learning-based analyses, provides a powerful means of assessing repetitive behaviors and behavioral transitions/rigidity.

      Characterization of both excitatory and inhibitory synaptic responses in slice electrophysiology experiments offers a broad survey of the synaptic alterations that may lead to increased corticostriatal engagement of SPNs.

      Weaknesses:

      (1) The authors conclude that increased cortical engagement of SPNs is due to changes in SPN intrinsic excitability rather than synaptic strength (either excitatory or inhibitory). One weakness is that only AMPA receptor-mediated responses were measured. Though the holding potential used for experiments in Figure 1FI wasn't clear, recordings were presumably performed at a hyperpolarized potential that limits NMDA receptor-mediated responses. Because the input-output experiments used to conclude that corticostriatal engagement of SPNs is elevated (Figure 1B-E) were conducted in the current clamp, it is possible that enhanced NMDA receptor engagement contributed to increased SPN responses to cortical stimulation. Confirming that NMDA receptor-mediated EPSC components are not altered would strengthen the main conclusion.

      The reviewer is correct, the initial optically-evoked EPSC assessments were performed at a hyperpolarized potential (-70mV), thus measuring primarily AMPAR-mediated currents. We agree that assessing potential changes in the NMDAR-mediated EPSC component is important and we have completed new experiments to assess this. We find no differences in NMDAR-mediated EPSCs assessed at +40mV or the AMPA:NMDA ratio.

      These results have been added to Fig. 1. An expanded analysis of these results is shown in Author response image 1. We note that the previous AMPAR-mediated EPSC results have been replicated in this additional experiment, again showing no change in Cntnap2<sup>-/-</sup> SPNs. 

      Author response image 1.

      AMPA and NMDA receptor-mediated EPSCs are unchanged in Cntnap2<sup>-/-</sup> SPNs. (A) Quantification (mean ± SEM) of AMPA:NMDA ratio per cell for Cntnap2<sup>+/+</sup> and Cntnap2<sup>-/-</sup> dSPNs, p=0.9537, Mann-Whitney test. (B) dSPN AMPA current per cell, p=0.6172, Mann-Whitney test. (C) dSPN NMDA current per cell, p=0.6009, Mann-Whitney test. (D) dSPN AMPA:NMDA ratio averaged by animal, p=0.8413, Mann-Whitney test. (E) dSPN AMPA current averaged by animal, p>0.9999, Mann-Whitney test. (F) dSPN NMDA current averaged by animal, p=0.6905, Mann-Whitney test. (G) Quantification (mean ± SEM) of AMPA:NMDA ratio per cell for Cntnap2<sup>+/+</sup> and Cntnap2<sup>-/-</sup> iSPNs, p=0.4104, Mann-Whitney test. (H) iSPN AMPA current per cell, p=0.9010, Mann-Whitney test. (I) iSPN NMDA current per cell, p=0.9512, two-tailed unpaired t test. (J) iSPN AMPA:NMDA ratio averaged by animal, p=0.3095, Mann-Whitney test. (K) iSPN AMPA current averaged by animal, p>0.9999, Mann-Whitney test. (L) iSPN NMDA current averaged by animal, p=0.8413, Mann-Whitney test. All values were recorded using 20% blue light intensity. For dSPNs: Cntnap2<sup>+/+</sup> n=22 cells from 5 mice, Cntnap2<sup>-/-</sup> n=22 cells from 5 mice. For iSPNs: Cntnap2<sup>+/+</sup> n=21 cells from 5 mice, Cntnap2<sup>-/-</sup> n=21 cells from 5 mice.
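      The legend reports each measure both per cell and averaged by animal. A minimal sketch of that aggregation step is shown below, assuming the AMPA:NMDA ratio is computed from absolute peak amplitudes and that cells from the same mouse are averaged with equal weight. The amplitudes and animal labels are made up for illustration and are not data from the study.

```python
from collections import defaultdict
from statistics import mean


def ampa_nmda_ratio(ampa_pa, nmda_pa):
    """AMPA:NMDA ratio for one cell, using absolute peak amplitudes (pA)."""
    if nmda_pa == 0:
        raise ValueError("NMDA amplitude must be non-zero")
    return abs(ampa_pa) / abs(nmda_pa)


def average_by_animal(cells):
    """Collapse per-cell ratios to one mean ratio per animal.

    cells: iterable of (animal_id, ampa_pa, nmda_pa)
    """
    per_animal = defaultdict(list)
    for animal_id, ampa_pa, nmda_pa in cells:
        per_animal[animal_id].append(ampa_nmda_ratio(ampa_pa, nmda_pa))
    return {animal: mean(ratios) for animal, ratios in per_animal.items()}


# Made-up amplitudes in pA: inward AMPA current (negative) and
# outward NMDA current (positive). Illustration only.
cells = [
    ("mouse1", -200.0, 100.0),
    ("mouse1", -300.0, 100.0),
    ("mouse2", -150.0, 100.0),
]
by_animal = average_by_animal(cells)
```

      Reporting both levels guards against a result that is driven by many cells from a single animal, since the per-animal analysis treats each mouse as one observation.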

      (2) Data clearly show that SPN intrinsic excitability is increased in knockout mice. Given that CNTNAP2 has been linked to potassium channel regulation, it would be helpful to show and quantify additional related electrophysiology data such as negative IV curve responses and action potential hyperpolarization.

      We appreciate this suggestion. As indicated by the reviewer, Caspr2 has previously been shown to control the clustering of Kv1-family potassium channels in axons isolated from optic nerve and corpus callosum (PMIDs: 10624965, 12963709, 29300891). In particular, Caspr2 is known to associate directly with Kv1.2 (PMID: 29300891). To assess a potential contribution of Kv1.2 to the excitability phenotype, we performed additional analyses of our original dataset to quantify AP properties known to be impacted by changes in Kv1.2 function (i.e. latency to fire and AP threshold, new Supp. Fig. 4). We identified several changes in Cntnap2<sup>-/-</sup> dSPNs resembling those that occur in wild-type cells when Kv1.2 is blocked (i.e. reduced threshold and reduced latency to fire, Supp. Fig. 4).

      We then performed a pharmacological experiment, blocking Kv1.2 using α-dendrotoxin (α-DTX) while recording intrinsic excitability to assess whether the effects of this drug on dSPN excitability were occluded in Cntnap2<sup>-/-</sup> cells. Indeed, we found that while blocking Kv1.2 in wild-type dSPNs significantly reduced threshold and increased intrinsic excitability, these effects were not seen in Cntnap2<sup>-/-</sup> dSPNs (new Fig. 4). We believe that this suggests an altered contribution of Kv1.2 to the intrinsic excitability of mutant dSPNs, owing to a change in the clustering, number, or function of these channels. Therefore, loss-of-function of Kv1.2 is a likely explanation for the enhanced intrinsic excitability of Cntnap2<sup>-/-</sup> dSPNs. Interestingly, we found that α-DTX had only subtle effects on iSPNs (Cntnap2 WT or mutant), suggesting a lesser contribution of this channel in controlling the excitability of indirect pathway cells. This finding can account for the relatively stronger effect of Cntnap2 loss on dSPN physiology. The results of these new experiments and analyses are presented in the new Fig. 4, Supp. Fig. 5 and Supp. Fig. 6. 

      (3) As it stands, the reported changes in dorsolateral striatum SPN excitability are only correlative with reported changes in repetitive behaviors, motor learning, and cognitive flexibility.

      We agree that we have not identified a causative relationship between the change in dorsolateral dSPN excitability and the behaviors that we measured in Cntnap2<sup>-/-</sup> mice. That said, in a previous study, we showed that selective deletion of the autism spectrum disorder (ASD) risk gene Tsc1 from dorsal striatal dSPNs resulted in increased corticostriatal drive and this was sufficient to increase rotarod motor learning (PMID: 34380034). Therefore, while we have not demonstrated causality in this study, we hypothesize that changes in dSPN excitability are likely to contribute to the behavioral phenotypes observed in Cntnap2<sup>-/-</sup> mice. 

      Reviewer #2 (Public Review):

      Summary:

      This is an important study characterizing striatal dysfunction and behavioral deficits in Cntnap2<sup>-/-</sup> mice. There is growing evidence suggesting that striatal dysfunction underlies core symptoms of ASD but the specific cellular and circuit level abnormalities disrupted by different risk genes remain unclear. This study addresses how the deletion of Cntnap2 affects the intrinsic properties and synaptic connectivity of striatal spiny projection neurons (SPN) of the direct (dSPN) and indirect (iSPN) pathways. Using Thy1-ChR2 mice and optogenetics the authors found increased firing of both types of SPNs in response to cortical afferent stimulation. However, there was no significant difference in the amplitude of optically-evoked excitatory postsynaptic currents (EPSCs) or spine density between Cntnap2<sup>-/-</sup> and WT SPNs, suggesting that the increased corticostriatal coupling might be due to changes in intrinsic excitability. Indeed, the authors found Cntnap2<sup>-/-</sup> SPNs, particularly dSPNs, exhibited higher intrinsic excitability, reduced rheobase current, and increased membrane resistance compared to WT SPNs. The enhanced spiking probability in Cntnap2<sup>-/-</sup> SPNs is not due to reduced inhibition. Despite previous reports of decreased parvalbumin-expressing (PV) interneurons in various brain regions of Cntnap2<sup>-/-</sup> mice, the number and function (IPSC amplitude and intrinsic excitability) of these interneurons in the striatum were comparable to WT controls.

      This study also includes a comprehensive behavioral analysis of striatal-related behaviors. Cntnap2<sup>-/-</sup> mice demonstrated increased repetitive behaviors (RRBs), including more grooming bouts, increased marble burying, and increased nose poking in the holeboard assay. MoSeq analysis of behavior further showed signs of altered grooming behaviors and sequencing of behavioral syllables. Cntnap2<sup>-/-</sup> mice also displayed cognitive inflexibility in a four-choice odor-based reversal learning assay. While they performed similarly to WT controls during acquisition and recall phases, they required significantly more trials to learn a new odor-reward association during reversal, consistent with potential deficits in corticostriatal function.

      Strengths:

      This study provides significant contributions to the field. The finding of altered SPN excitability, the detailed characterization of striatal inhibition, and the comprehensive behavioral analysis are novel and valuable to understanding the pathophysiology of Cntnap2<sup>-/-</sup> mice.

      Weaknesses:

      (1) The approach based on Thy-ChR2 mice has the advantage of overcoming issues caused by injection efficiency and targeting variability. However, the spread of oEPSC amplitudes across mice shown in panels of Figure 1 G/I is very high with almost one order of magnitude difference between some mice. Given this is one of the most important points of the study it will be important to further analyze and discuss what this variability might be due to. Typically, in acute slice recordings, the within-animal variability is larger than the variability across animals. From the sample sizes reported it seems the authors sampled a large number of animals, but with a relatively low number of neurons per animal (per condition). Could this be one of the reasons for this variability?

      We agree with the reviewer that the variability in these experiments is quite large. We have replicated these experiments in the process of performing AMPA:NMDA ratio recordings (see above response to Reviewer 1’s comment). We again find no differences in AMPAR-mediated EPSC amplitude between WT and mutant SPNs (Author response image 2). Notably, these experiments also demonstrate a large amount of variability. In the original dataset, a small number of cells were collected from each animal (~1-3 cells/mouse). However, the variability remains in the new dataset, in which more cells were collected from each animal (~4-6 cells/mouse). We find both within-animal and between-animal variability, as can be seen in Author response image 2 (recordings made from the same animal are color-coordinated). Potential sources of variability in this experiment include: 1) variable expression of ChR2 per mouse, 2) variable innervation of ChR2-expressing terminals onto any given recorded cell, and/or 3) differences in prior plasticity state between cells (i.e. some neurons may have recently undergone corticostriatal LTP or LTD).

      Author response image 2.

      Optically-evoked AMPAR EPSCs exhibit within- and between-animal variability. (A) Quantification of EPSC amplitude evoked in dSPNs at different light intensities from the original dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=17 cells from 8 mice, Cntnap2<sup>-/-</sup> n=13 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 56) = 0.3879, geno F (1, 28) = 0.8098, stim F (1.047, 29.32) = 76.56. (B) Quantification of EPSC amplitude evoked in dSPNs, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=8 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 22) = 0.2154, geno F (1, 11) = 0.2585, stim F (1.053, 11.58) = 49.68. (C) Quantification of EPSC amplitude in dSPNs from the revision dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=22 cells from 5 mice, Cntnap2<sup>-/-</sup> n=22 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 84) = 0.01885, geno F (1, 42) = 0.002732, stim F (1.863, 78.26) = 20.93. (D) Quantification of EPSC amplitude in dSPNs from the revision dataset, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=5 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 16) = 0.06288, geno F (1, 8) = 0.006548, stim F (1.585, 12.68) = 16.97. (E) Quantification of EPSC amplitude evoked in iSPNs from the original dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=13 cells from 6 mice, Cntnap2<sup>-/-</sup> n=11 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 44) = 0.9414, geno F (1, 22) = 1.333, stim F (1.099, 24.18) = 52.26. (F) Quantification of EPSC amplitude evoked in iSPNs from original dataset, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=6 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 18) = 0.4428, geno F (1, 9) = 0.5635, stim F (1.095, 9.851) = 23.82. (G) Quantification of EPSC amplitude evoked in iSPNs from the revision dataset, plotted by cell (line represents the mean, dots/squares represent average EPSC amplitude for each recorded cell). Cntnap2<sup>+/+</sup> n=21 cells from 5 mice, Cntnap2<sup>-/-</sup> n=21 cells from 5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 80) = 0.04134, geno F (1, 40) = 0.007025, stim F (1.208, 48.31) = 102.9. (H) Quantification of EPSC amplitude evoked in iSPNs from the revision dataset, averaged by mouse (line represents the mean, dots/squares represent average EPSC amplitude for each mouse). Cntnap2<sup>+/+</sup> n=5 mice, Cntnap2<sup>-/-</sup> n=5 mice. Repeated measures two-way ANOVA p values are shown; g x s F (2, 16) = 0.001865, geno F (1, 8) = 0.1004, stim F (1.179, 9.433) = 61.31.
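      The distinction between per-cell and per-mouse summaries in this figure can be illustrated with a small variance decomposition. The sketch below is not the authors' analysis: the amplitudes are made-up placeholder values, and the function name `variance_components` is hypothetical. It splits the total (population) variance of per-cell values into a between-mouse and a within-mouse part, which by the law of total variance add up to the total.

```python
from statistics import mean, pvariance

# Hypothetical EPSC amplitudes (pA) grouped by mouse.
# Illustrative placeholder values only, NOT data from the study.
cells_by_mouse = {
    "mouse_a": [120.0, 480.0, 300.0, 210.0],
    "mouse_b": [900.0, 150.0, 400.0, 650.0, 330.0],
    "mouse_c": [260.0, 700.0, 95.0, 510.0],
}


def variance_components(groups):
    """Return (between_group, within_group) population variance components."""
    values = [v for cells in groups.values() for v in cells]
    grand_mean = mean(values)
    n_total = len(values)
    # Between-mouse part: spread of mouse means around the grand mean,
    # weighted by the number of cells recorded in each mouse.
    between = sum(
        len(cells) * (mean(cells) - grand_mean) ** 2 for cells in groups.values()
    ) / n_total
    # Within-mouse part: spread of cells around their own mouse mean.
    within = sum(
        (v - mean(cells)) ** 2 for cells in groups.values() for v in cells
    ) / n_total
    return between, within


between, within = variance_components(cells_by_mouse)
total = pvariance([v for cells in cells_by_mouse.values() for v in cells])
mouse_means = {mouse: mean(cells) for mouse, cells in cells_by_mouse.items()}

print(f"between-mouse variance: {between:.1f}")
print(f"within-mouse variance:  {within:.1f}")
print(f"total variance:         {total:.1f}")
```

      Plotting by cell shows both components together, whereas averaging by mouse (as in `mouse_means`) removes the within-mouse part, so comparing the two views indicates where most of the spread arises.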

      (2) This is particularly important because the analysis of corticostriatal evoked APs in panels C and E is performed on pooled data without considering the variability in evoked current amplitudes across animals shown in G and I. Were the neurons in panels C/E recorded from the same mice as shown in G/I? If so, it would be informative to regress AP firing data (say at 20% LED) to the average oEPSC amplitude recorded on those mice at the same light intensity. However, if the low number of neurons recorded per mouse is due to technical limitations, then increasing the sample size of these experiments would strengthen the study.

      We appreciate this point; however, the evoked AP experiment and the evoked EPSC experiment were performed on different mice, so it is not possible to correlate the data across experiments. While the evoked AP experiments were performed using a potassium-based internal solution, we used a cesium-based internal solution to measure AMPAR-mediated EPSCs to more accurately detect synaptic currents. We note that the evoked AP experiments share a similar amount of variability as the evoked EPSC experiments, again possibly owing to variable expression of channelrhodopsin per mouse, variable innervation of ChR2-positive terminals onto individual cells, and/or differences in prior plasticity status between cells.

      (3) On a similar note, there is no discussion of why iSPNs also show increased corticostriatal evoked firing in Figure 1E, despite the difference in intrinsic excitability shown in Figure 3. This suggests other potential mechanisms that might underlie altered corticostriatal responses. Given the role of Caspr2 in clustering K channels in axons, altered presynaptic function or excitability could also contribute to this phenotype, but potential changes in PPR have not been explored in this study.

      We have now applied more rigorous statistics to the data in Fig. 1 (repeated measures two-way ANOVA). With this analysis, the difference in corticostriatal evoked firing in Cntnap2<sup>-/-</sup> iSPNs no longer reaches statistical significance. This is consistent with the modest but statistically non-significant effect of Cntnap2 loss on iSPN intrinsic excitability. We agree with the reviewer that presynaptic alterations could potentially contribute to the changes in cortically-driven action potentials, especially as this experiment was performed without any synaptic blockers present, and Cntnap2 is deleted from all cells. That said, if changes in presynaptic release probability accounted for the increased corticostriatal drive, we would expect to see differences in cortically-evoked EPSCs onto SPNs.

      While we can’t rule out the possibility of pre-synaptic changes, a straightforward explanation for our findings is that loss or alteration of Kv1.2 channel function is responsible for the increased excitability of Cntnap2<sup>-/-</sup> dSPNs, resulting in enhanced spiking in response to cortical input. Given the fact that Kv1.2 channels appear less important for regulating iSPN excitability (see new Fig. 4 and Supp. Fig. 6), this can explain the greater impact of Cntnap2 loss on dSPN physiology.

      (4) Male and female SPNs have different intrinsic properties but the number and/or balance of M/F mice used for each experiment is not reported.

      We agree that this is an important consideration. Author response table 1 provides the sex breakdown for the intrinsic excitability experiments. While we did not explicitly power the experiments to test for sex differences, Author response image 3 shows the data separated by sex and genotype for the intrinsic excitability experiments. Within genotype, we find no significant differences between males and females, except for Cntnap2<sup>-/-</sup> iSPNs which showed a significant interaction between sex and current step (Author response image 3F). Interestingly, while present in both sexes, the excitability shift of Cntnap2<sup>-/-</sup> dSPNs may be slightly more pronounced in females compared to males (Author response image 3C and D). However, this result would require further validation with a greater sample size.

      Author response table 1.

      Numbers of male and female mice used for the intrinsic excitability experiments.

      Author response image 3.

      Enhanced excitability of Cntnap2<sup>-/-</sup> dSPNs is present in both males and females. (A) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>+/+</sup> males and females at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=12 cells from 4 mice, Cntnap2<sup>+/+</sup> females n=8 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; s x c F (28, 560) = 0.8992, sex F (1, 20) = 0.3754, current F (1.279, 25.57) = 56.85. (B) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>-/-</sup> males and females at different current step amplitudes. Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=11 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; s x c F (28, 588) = 0.6752, sex F (1, 21) = 0.04534, current F (2.198, 46.15) = 78.89. (C) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>+/+</sup> males and Cntnap2<sup>-/-</sup> males at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 672) = 2.233, geno F (1, 24) = 3.746, current F (1.708, 40.98) = 79.82. (D) Quantification (mean ± SEM) of the number of APs evoked in dSPNs in Cntnap2<sup>+/+</sup> females and Cntnap2<sup>-/-</sup> females at different current step amplitudes. Cntnap2<sup>+/+</sup> females n=8 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=11 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 476) = 1.547, geno F (1, 17) = 5.912, current F (1.892, 32.17) = 58.76. (E) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>+/+</sup> males and females at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=10 cells from 4 mice, Cntnap2<sup>+/+</sup> females n=12 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; s x c F (28, 560) = 1.236, sex F (1, 20) = 1.074, current F (2.217, 44.34) = 179.6. (F) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>-/-</sup> males and females at different current step amplitudes. Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=9 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; s x c F (28, 532) = 2.513, sex F (1, 19) = 2.639, current F (1.858, 35.31) = 152.5. (G) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>+/+</sup> males and Cntnap2<sup>-/-</sup> males at different current step amplitudes. Cntnap2<sup>+/+</sup> males n=10 cells from 4 mice, Cntnap2<sup>-/-</sup> males n=12 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 560) = 0.4723, geno F (1, 20) = 0.5675, current F (2.423, 48.47) = 301.7. (H) Quantification (mean ± SEM) of the number of APs evoked in iSPNs in Cntnap2<sup>+/+</sup> females and Cntnap2<sup>-/-</sup> females at different current step amplitudes. Cntnap2<sup>+/+</sup> females n=12 cells from 4 mice, Cntnap2<sup>-/-</sup> females n=9 cells from 4 mice. Repeated measures two-way ANOVA p values are shown; g x c F (28, 532) = 1.655, geno F (1, 19) = 0.2322, current F (2.081, 39.55) = 99.45.

      (5) There is no mention of how membrane resistance was calculated, and no I/V plots are shown.

      Passive properties were calculated from the average of five -5 mV, 100 ms long test pulse steps applied at the beginning of every experiment. Membrane resistance was calculated from the double exponential curve fit. This has now been added to the methods section.
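      As a rough illustration of how a resistance estimate can be derived from an averaged test pulse, the sketch below averages five sweeps and applies Ohm's law to the steady-state current. This is a simplification under stated assumptions, not the authors' procedure: it substitutes a steady-state estimate for the double exponential fit described above, the resulting value is the total resistance seen by the amplifier (access plus membrane), and the trace values, sample indices, and function names are hypothetical.

```python
from statistics import mean


def average_sweeps(sweeps):
    """Average several equally long current traces sample by sample."""
    n = len(sweeps)
    return [sum(samples) / n for samples in zip(*sweeps)]


def input_resistance_mohm(trace_pa, step_mv, baseline_slice, steady_slice):
    """Estimate resistance (MOhm) from a voltage-clamp test pulse.

    trace_pa: current trace in pA; step_mv: command step in mV.
    baseline_slice / steady_slice: (start, stop) sample indices.
    """
    baseline = mean(trace_pa[baseline_slice[0]:baseline_slice[1]])
    steady = mean(trace_pa[steady_slice[0]:steady_slice[1]])
    delta_i_pa = steady - baseline
    # Ohm's law: mV / pA gives GOhm; multiply by 1000 to express in MOhm.
    return step_mv / delta_i_pa * 1000.0


# Hypothetical response to a -5 mV step: 10 baseline samples, a capacitive
# transient, then a -50 pA steady-state current. Illustrative values only.
template = [0.0] * 10 + [-200.0, -120.0, -80.0, -60.0, -52.0] + [-50.0] * 15
# Five sweeps that differ by a small holding-current offset.
sweeps = [[x + offset for x in template] for offset in (-2.0, -1.0, 0.0, 1.0, 2.0)]

averaged = average_sweeps(sweeps)
resistance = input_resistance_mohm(
    averaged, step_mv=-5.0, baseline_slice=(0, 10), steady_slice=(20, 30)
)
print(f"estimated resistance: {resistance:.1f} MOhm")
```

      A curve fit to the current transient, as described in the response, would additionally separate access resistance from membrane resistance and yield the membrane time constants; the steady-state shortcut here does not.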

      (6) It would be interesting to see which behavior transitions most contribute to the decrease in entropy. Are these caused by repeated or perseverative grooming bouts? Or is this inflexibility also observed across other behaviors? The transition map in Figure S5 shows the overall number of syllables and transitions but not their sequence during behavior. Can this be analyzed by calculating the ratio of individual $u_i \times p_{i,j} \times \log_2 p_{i,j}$ factors across genotypes?

      We thank the reviewer for raising an insightful question. Here we use a finite state Markov chain model to describe the syllable transitions in animal behavior. To quantify the randomness in the system, we calculated the entropy of the Markov chain (see methods section). The reviewer suggested calculating the partial entropy of the transition matrix, which would allow us to estimate the contribution of a subset of states to the entropy of the whole system, given by the equation:
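      The equation itself is not reproduced in this text. A form consistent with the per-pair term quoted by the reviewer would be the following, assuming the standard entropy rate of a finite-state Markov chain with stationary distribution $u$ and transition probabilities $p_{i,j}$, and with the partial entropy restricted to transitions inside a subset $S$ of syllables. The symbols $H$ and $H_S$ are labels introduced here for illustration.

```latex
H = -\sum_{i}\sum_{j} u_i \, p_{i,j} \log_2 p_{i,j},
\qquad
H_S = -\sum_{i \in S}\sum_{j \in S} u_i \, p_{i,j} \log_2 p_{i,j}
```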

      The partial entropy can indeed quantify the stochasticity, or “flexibility” in our context, of the sub-system containing only a subset of the behavior syllables. However, there are two main limitations to this approach:

      (1) The partial entropy fails to account for the transitions connecting the subset with the rest of the states in the system.

      (2) The stationary distribution may not reflect the actual probabilities in the isolated sub-system S.

      Consequently, the partial entropy cannot be directly interpreted as the fraction of contributions from specific syllable pairs or sub-system to the entropy of the whole system. To be more specific, while a significant difference between the same sub-system in WT and KO groups could indicate that the sub-system contributes significantly to the difference in overall entropy, a non-significant result does not mean that the sub-system does not contribute to the overall entropy difference, as interactions between the sub-system and other states not considered are not accounted for.
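      The first limitation can be made concrete with a toy example. The sketch below is illustrative only: the three-state transition matrix is invented, the function names are hypothetical, and the stationary distribution is obtained by simple power iteration rather than by whatever method the study used. It computes the per-pair terms of the Markov chain entropy and shows that the partial entropies of a subset and of its complement do not add up to the whole, because transitions crossing between the two are left out.

```python
import math


def stationary_distribution(p, iters=2000):
    """Approximate the stationary distribution u (u = u P) by power iteration."""
    n = len(p)
    u = [1.0 / n] * n
    for _ in range(iters):
        u = [sum(u[i] * p[i][j] for i in range(n)) for j in range(n)]
    return u


def pair_terms(p, u):
    """Per-pair entropy terms -u_i * p_ij * log2(p_ij), with 0 where p_ij is 0."""
    n = len(p)
    return [
        [-u[i] * p[i][j] * math.log2(p[i][j]) if p[i][j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def partial_entropy(terms, subset):
    """Sum the per-pair terms over transitions that start and end inside subset."""
    return sum(terms[i][j] for i in subset for j in subset)


# Hypothetical 3-syllable transition matrix (each row sums to 1).
p = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

u = stationary_distribution(p)
terms = pair_terms(p, u)
total_entropy = sum(sum(row) for row in terms)
inside_subset = partial_entropy(terms, [0, 1])
inside_rest = partial_entropy(terms, [2])
crossing = total_entropy - inside_subset - inside_rest

print(f"total entropy (bits):         {total_entropy:.4f}")
print(f"partial entropy, states 0-1:  {inside_subset:.4f}")
print(f"partial entropy, state 2:     {inside_rest:.4f}")
print(f"transitions crossing subsets: {crossing:.4f}")
```

      The nonzero `crossing` value is the share of the overall entropy that neither partial entropy captures, which is why a non-significant partial entropy difference cannot rule out a contribution from that sub-system.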

      Author response image 4.

      Grooming syllables contribute to some but not all differences in syllable transitions in Cntnap2<sup>-/-</sup> mice. We calculated the entropy of each syllable pair using $u_i \times p_{i,j} \times \log_2 p_{i,j}$ for every syllable pair and every animal. We then statistically tested the difference between genotypes for each syllable pair using Mann-Whitney tests. This plot displays those adjusted p-values for each syllable pair between WT and KO groups. The significant p-values suggest that the transitions to syllables 24 and 25 are different between genotypes (note that these correspond to grooming syllables, see Fig. 5N). However, since the overall entropy is a summation of every pair, it is difficult to conclude that syllables 24 and 25 are the sole contributors to the different entropy we observed.

      Reviewer #3 (Public Review):

      Summary:

      The authors analyzed Cntnap2 KO mice to determine whether loss of the ASD risk gene CNTNAP2 alters the dorsal striatum's function.

      Strengths:

      The results demonstrate that loss of Cntnap2 results in increased excitability of striatal projection neurons (SPNs) and altered striatal-dependent behaviors, such as repetitive, inflexible behaviors. Unlike other brain areas and cell types, synaptic inputs onto SPNs were normal in Cntnap2 KO mice. The experiments are well-designed, and the results support the authors' conclusions.

      Weaknesses:

      The mechanism underlying SPN hyperexcitability was not explored, and it is unclear whether this cellular phenotype alone can account for the behavioral alterations in Cntnap2 KO mice. No clear explanation emerges for the variable phenotype in different brain areas and cell types.

      We agree that identifying the mechanism by which Cntnap2 loss affects intrinsic excitability is interesting and important. We have added experiments to address this and conclude that the improper clustering, number, or function of Kv1.2 channels in Cntnap2<sup>-/-</sup> dSPNs is likely responsible for their increased excitability. These channels are known to be clustered/organized in part by Caspr2 (PMIDs: 10624965, 12963709, 29300891), and Kv1.2 channels are known to play an important role in regulating excitability in SPNs (PMIDs: 13679409, 32075716). In the case of dSPNs, blocking these channels with α-DTX significantly increased the excitability of WT cells (as has been previously reported); however, this effect was occluded in mutant cells, perhaps owing to a decreased contribution of Kv1.2 channels to excitability in Cntnap2<sup>-/-</sup> dSPNs. In addition, we found that blockade of these channels with α-DTX only modestly affected the excitability of iSPNs. Therefore, this can explain why loss of Cntnap2 more strongly affects the excitability of dSPNs. Please see new Fig. 4, Supp. Fig. 5 and Supp. Fig. 6 for these new data. 

      We agree with the reviewer that we have not identified a causative relationship between the change in dSPN excitability and the behavioral alterations in Cntnap2<sup>-/-</sup> mice. This is a limitation of the study. 

      It is interesting to speculate on the root of the varying impacts to excitability that occur across different brain regions and cell types in Cntnap2<sup>-/-</sup> mice. Increased excitability, as we see in dSPNs, has been identified in cerebellar Purkinje cells and L2/3 pyramidal neurons in somatosensory cortex in the context of Cntnap2 loss (PMIDs: 34593517, 30679017, 36793543). However, other cell types in Cntnap2<sup>-/-</sup> mice have exhibited no change in excitability (mPFC, L2/3 pyramidal neurons, PMID: 31141683) or hypoexcitability (subset of L5/6 pyramidal neurons, PMID: 29112191). While all of these cell types express Kv1.2 channels, they fundamentally vary in their intrinsic properties, owing to the role that other ion channels play in membrane excitability. As a result, loss of Cntnap2 is expected to have a variable effect on excitability depending on the cell type and the complement of other ion channels that are present. In addition, an initial change in excitability may drive secondary, potentially compensatory, changes in other channels that lead to a different excitability state. These changes are also expected to be cell type-specific. We do note that both of the cell types that show increased excitability in the context of Cntnap2 loss have been shown to exhibit an α-DTX-sensitive Kv1 channel current, such that application of α-DTX results in increased firing of these cells (cerebellar Purkinje cells; PMIDs: 17087603, 16210348 and L2/3 pyramidal neurons in somatosensory cortex; PMID: 17215507). These findings are consistent with our results in Cntnap2<sup>-/-</sup> dSPNs. 

      Reviewer #1 (Recommendations For The Authors):

      More thorough analysis of some of the manually quantified behaviors would be helpful. For example, only the grooming bout number was presented- what about the duration of bouts and total time grooming? Similarly, for the open field the number of center entries was reported but what about the total time in the center?

      We have quantified the time spent grooming and total time spent in the center during the open field test from our original data (Author response image 5). These data were not originally included in the manuscript because they were recorded for only a subset of the total animals. For each of these measures we find trend-level changes, which are consistent with the primary measures reported in the main manuscript.

      Author response image 5.

      Time in center and time spent grooming trend towards an increase in Cntnap2<sup>-/-</sup> mice. (A) Quantification (mean ± SEM) of total time spent in the center of the open field during a 60-minute test, p=0.0656, Mann-Whitney test. (B) Time spent grooming during the first 20 minutes of the open field test, p=0.0611, Mann-Whitney test. For both measurements, Cntnap2<sup>+/+</sup> n=18 mice, Cntnap2<sup>-/-</sup> n=19 mice.

      Reviewer #3 (Recommendations For The Authors):

      What accounts for the hyperexcitability observed in Cntnap2-deficient SPNs? The authors noted that excitability is reportedly increased, reduced, or unchanged in different brain areas. What accounts for this disparity? Is it about the subcellular localization of Kv1 channels? The authors may want to test this possibility experimentally. At least, they may want to test whether Kv1 channels are mislocalized in SPNs.

      We agree that this is an important point, and we have performed additional experiments to address this. We find that the Kv1.2 blocker α-DTX significantly increases the excitability of WT dSPNs but not Cntnap2<sup>-/-</sup> dSPNs. This suggests that the mechanism underlying dSPN hyperexcitability in Cntnap2 mutants is the improper clustering, number, or function of Kv1.2 channels. These channels are known to be clustered and organized in part by Caspr2 (PMIDs: 10624965, 12963709, 29300891) and have been shown to play an important role in regulating the excitability of SPNs (PMIDs: 13679409, 32075716). Interestingly, we find that α-DTX has less of an effect on the excitability of iSPNs, which may account for the greater impact of Cntnap2 loss on dSPNs. Please see new Fig. 4, Supp. Fig. 5 and Supp. Fig. 6 for these added data and analyses.

      Please see above response to Reviewer #3 for our speculation on the variable impact of Cntnap2 loss on different cell types and brain regions. 

      We agree with the reviewer that assessing potential differences in subcellular localization of Kv1 channels in our model would bolster the conclusion that these channels are mislocalized in the Cntnap2<sup>-/-</sup> striatum. We piloted these experiments using immunohistochemistry to stain for Kv1.1 and Kv1.2 but found that without very high-resolution imaging, it would be challenging to accurately quantify Kv1 puncta in a cell type-specific manner. We instead chose to investigate the functional contribution of Kv1 channels to the dSPN hyperexcitability phenotype through the α-DTX experiments outlined above. α-DTX strongly inhibits Kv1.2 channels, but also Kv1.1 channels to some extent (PMIDs: 12042352, 13679409). We find that the effects of α-DTX on SPN excitability are occluded in Cntnap2<sup>-/-</sup> dSPNs; therefore, we conclude that Kv1.2 (and possibly Kv1.1) channels have reduced function in these cells. Further work will be needed to determine if this is a result of channel mislocalization or another type of alteration.

      The authors did not detect synaptic changes in Cntnap2-deficient SPNs. This important observation should be briefly discussed in the context of previous work in other brain regions and cell types. For example, some studies reported structural and functional changes at excitatory synapses. The variable impact on synapses suggests distinct compensatory mechanisms in different brain areas.

      Given the prior literature showing effects of Cntnap2 loss on synapses in other brain regions, we were surprised that striatal synapses were not impacted in our model. We agree with the reviewer that the variable changes in synaptic properties across brain regions in Cntnap2 mutant mice are likely a result of distinct compensatory changes in these regions. Differences may also arise depending on whether the synaptic changes originate from the post-synaptic cell or from pre-synaptic changes. An interesting direction for future studies would be to explore the developmental trajectory of excitability and synaptic changes to determine which may be initial perturbations versus those that are secondary and potentially compensatory.

      Line 138: "synaptic excitability". How is this term defined? Consider "synaptic changes" instead.

      “Synaptic excitability” was used to mean a change in the number and/or function of glutamate receptors. We have now changed this term to “excitatory synaptic changes.”

      Consider a short paragraph to highlight some limitations of this study. For example, it is unclear whether SPN hyperexcitability results from a compensatory change in Cntnap2 KO mice and whether the behavioral phenotype is solely due to this cellular phenotype. The study focuses on cortical projections onto SPNs, but these cells receive inputs from other brain areas that were not explored. Lastly, no clear explanation emerges for the variable phenotype in different brain areas and cell types.

      We thank the reviewer for this suggestion and have added several paragraphs to the discussion highlighting some limitations of this study.

      We hypothesize that the dSPN hyperexcitability in Cntnap2<sup>-/-</sup> mice is a primary change, due to the direct relationship between Caspr2 and Kv1.2 channels. The results of our α-DTX experiments suggest that the function and/or contribution of these channels to excitability is altered in Cntnap2<sup>-/-</sup> dSPNs. However, it is possible that there are additional changes in dSPNs that occur as a result of Cntnap2 loss and contribute to the hyperexcitability of these cells. Rather surprisingly, we don’t find evidence for altered excitatory (specifically from cortical inputs) or inhibitory synaptic function, suggesting lack of engagement of homeostatic mechanisms at the synaptic level.

      We have not yet determined whether there is a causative relationship between the change in dSPN excitability and the behavioral alterations in Cntnap2<sup>-/-</sup> mice. This is a limitation of the current study. In our discussion section, we highlight that the dSPN changes we observe in dorsolateral striatum (DLS) are known to be sufficient to enhance rotarod learning in other mouse models and thus support a connection between this cellular change and behavior. For the other behaviors we measured, we acknowledge that both DLS and other striatal or extra-striatal brain regions have been implicated in these behaviors, and therefore less of a direct connection can be made.

      In terms of the inputs, we focused on cortical inputs given their known role in mediating motor and habit learning (PMID: 15242609, 16237445, 19198605). Notably, corticostriatal synapses have been shown to be altered across a variety of mouse models with mutations in ASD risk genes and therefore may be a point of convergence for disparate genetic insults (PMID: 31758607). We agree that the striatum receives inputs from a variety of brain regions, notably the thalamus, which we did not explore in this study. This would be an interesting area for future studies.

      Finally, it is difficult to speculate on the root of the varying impacts to excitability that occur across different brain regions and cell types in Cntnap2<sup>-/-</sup> mice. Please see above response to Reviewer #3 for some speculation on this point in regard to the potential involvement of Kv1.2 in the excitability changes in various Cntnap2<sup>-/-</sup> cell types. To expand upon this, it is known that ASD-associated mutations can have varying impacts on cell function even across similar cell types within a given brain region. We have seen this between dSPNs and iSPNs (this study, PMIDs: 34380034, 39358043), as have other groups studying ASD risk gene mutations in striatum (PMID: 24995986). This differential impact of the same mutation on intrinsic and/or synaptic physiology across cell types has been identified in other brain regions as well (PMID: 22884327, 26601124). Differences in transcriptional programs, protein expression, neuronal morphology, synaptic inputs and plasticity state make up a non-exhaustive set of variables that will impact the physiological function of a neuron, in terms of both the direct and the indirect consequences of an ASD risk gene mutation. To better address this important question, future studies would benefit from a systematic approach to assessing physiological changes in a given ASD mouse model, both across development and across brain regions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for the detailed and constructive reviews. We revised the paper accordingly, and a point-by-point reply appears below. The main changes are:

      • An extended discussion section that places our work in context with other related developments in theory and modeling.

      • A new results section that demonstrates a substantial improvement in performance from a non-linear activation function. This led to addition of a co-author.

      • The mathematical proof that the resolvent of the adjacency matrix leads to the shortest path distances has been moved to a separate article, available as a preprint and attached to this resubmission. This allows us to present that work in the context of graph theory, and focus the present paper on neural modeling.

      Reviewer #1 (Public Review):

      This paper presents a highly compelling and novel hypothesis for how the brain could generate signals to guide navigation towards remembered goals. Under this hypothesis, which the authors call "Endotaxis", the brain co-opts its ancient ability to navigate up odor gradients (chemotaxis) by generating a "virtual odor" that grows stronger the closer the animal is to a goal location. This idea is compelling from an evolutionary perspective and a mechanistic perspective. The paper is well-written and delightful to read.

      The authors develop a detailed model of how the brain may perform "Endotaxis", using a variety of interconnected cell types (point, map, and goal cells) to inform the chemotaxis system. They tested the ability of this model to navigate in several state spaces, representing both physical mazes and abstract cognitive tasks. The Endotaxis model performed reasonably well across different environments and different types of goals.

      The authors further tested the model using parameter sweeps and discovered a critical level of network gain, beyond which task performance drops. This critical level approximately matched analytical derivations.

      My main concern with this paper is that the analysis of the critical gain value (gamma_c) is incomplete, making the implications of these analyses unclear. There are several different reasonable ways in which the Endotaxis map cell representations might be normalized, which I suspect may lead to different results. Specifically, the recurrent connections between map cells may either be an adjacency matrix, or a normalized transition matrix. In the current submission, the recurrent connections are an unnormalized adjacency matrix. In a previous preprint version of the Endotaxis manuscript, the recurrent connections between the map cells were learned using Oja's rule, which results in a normalized state-transition matrix (see "Appendix 5: Endotaxis model and the successor representation" in "Neural learning rules for generating flexible predictions and computing the successor representation", your reference 17). The authors state "In summary, this sensitivity analysis shows that the optimal parameter set for endotaxis does depend on the environment". Is this statement, and the other conclusions of the sensitivity analysis, still true if the learned recurrent connections are a properly normalized state-transition matrix?

      Yes, this is an interesting topic. In v.1 of our bioRxiv preprint we used Oja’s rule for learning, which will converge on a map connectivity that reflects the transition probabilities. The matrix M becomes a left-normalized or right-normalized stochastic matrix, depending on whether one uses the pre-synaptic or the post-synaptic version of Oja’s rule. This is explained well in Appendix 5 of Fang 2023.

      In the present version of the model we use a rule that learns the adjacency matrix A, not the transition matrix T. The motivation is that we want to explain instances of one-shot learning, where an agent acquires a route after traversing it just once. For example, we had found experimentally that mice can execute a complex homing route on the first attempt.

      An agent can establish whether two nodes are connected (adjacency) the very first time it travels from one node to the other, whereas it can evaluate the transition probability for that link only after trying this and all the other available links on multiple occasions. Hence the normalization terms in Oja’s rule, or in the rule used by Fang 2023, all involve some time-averaging over multiple visits to the same node. This implements a gradual learning process over many experiences, rather than a one-shot acquisition on the first experience.
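
      The contrast between the two quantities can be illustrated with a minimal sketch. It is not the rule from the manuscript: the 4-node star graph, the function names, and the simple count-based estimate of T are all assumptions made for illustration. The adjacency entry saturates on the first traversal, while the count-based estimate is still far from the true probabilities after one pass.

```python
import random

def learn_from_walk(walk, n_nodes):
    """Return (adjacency, transition estimate) from one sequence of visited nodes."""
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for src, dst in zip(walk, walk[1:]):
        adjacency[src][dst] = adjacency[dst][src] = 1  # saturates on first traversal
        counts[src][dst] += 1                          # keeps accumulating
    transition = []
    for row in counts:
        total = sum(row)
        transition.append([c / total if total else 0.0 for c in row])
    return adjacency, transition

# Hypothetical 4-node star graph: node 0 is linked to nodes 1, 2 and 3.
NEIGHBORS = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

def random_walk(start, n_steps, rng):
    walk = [start]
    for _ in range(n_steps):
        walk.append(rng.choice(NEIGHBORS[walk[-1]]))
    return walk

rng = random.Random(0)
short_walk = [1, 0, 2]                 # a single traversal 1 -> 0 -> 2
long_walk = random_walk(0, 6000, rng)  # many repeated visits to node 0

adj_short, trans_short = learn_from_walk(short_walk, 4)
adj_long, trans_long = learn_from_walk(long_walk, 4)

print(adj_short[0][2], trans_short[0])        # edge is known; estimate is all-or-none
print([round(p, 2) for p in trans_long[0]])   # close to 1/3 for each neighbor
```

      After the single pass the edge 0–2 is already recorded, but the estimated probability of leaving node 0 toward node 2 is 1 rather than 1/3. Only the long walk brings the estimate near the true value.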

      Still one may ask whether there are advantages to learning the transition matrix rather than the adjacency matrix. We looked into this with the following results:

      • The result that (1/γ − A)⁻¹ is monotonically related to the graph distances D in the limit of small γ (a proof now moved to the Meister 2023 preprint) holds also for the transition matrix T. The proof follows the same steps. So in the small gain limit, the navigation model would work with T as well.

      • If one uses the transition matrix to compute the network output (1/γ − T)⁻¹, then the critical gain value is γ_c = 1. It is well known that the largest eigenvalue of any Markov transition matrix is 1, and the critical gain γ_c is the inverse of that. This result is independent of the graph. So this offers the promise that the network could use the same gain parameter γ regardless of the environment.

      • In practice, however, the goal signal turned out to be less robust when based on T than when based on A. We illustrate this with the attached Author response image 1. This replicates the analysis in Figure 3 of the manuscript, using the transition matrix instead of the adjacency matrix. Some observations:

      • Panel B: The goal signal follows an exponential dependence on graph distance much more robustly for the model with A than with T. This holds even for small gain values where the exponential decay is steep.

      • Panel C: As one raises the gain closer to the critical value, the goal signal based on T scatters much more than when based on A.

      • Panels D, E: Navigation based on A works better than based on T. For example, using the highest practical gain value, and a readout noise of ϵ = 0.01, navigation based on T has a range of only 8 steps on this graph, whereas navigation based on A ranges over 12 steps, the full size of this graph.
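
      The eigenvalue argument for the critical gain can be checked with a minimal sketch. It is not the notebook code: the 7-node binary tree, the row normalization of T, and the shifted power iteration are assumptions chosen for illustration. The critical gain is taken as the inverse of the largest eigenvalue, as stated above.

```python
EDGES = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]  # hypothetical binary tree
N = 7

A = [[0.0] * N for _ in range(N)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1.0          # symmetric adjacency matrix

# Row-normalized transition matrix: T[i][j] = A[i][j] / degree(i)
T = [[a / sum(row) for a in row] for row in A]

def largest_eigenvalue(m, n_iter=500):
    """Power iteration on (m + I); the shift avoids oscillation on bipartite graphs."""
    x = [1.0] * len(m)
    estimate = 1.0
    for _ in range(n_iter):
        # y = (m + I) x
        y = [sum(a * b for a, b in zip(row, x)) + xi for row, xi in zip(m, x)]
        estimate = max(abs(v) for v in y)
        x = [v / estimate for v in y]
    return estimate - 1.0            # undo the shift

gamma_c_A = 1.0 / largest_eigenvalue(A)
gamma_c_T = 1.0 / largest_eigenvalue(T)
print(round(gamma_c_A, 6), round(gamma_c_T, 6))
```

      For this tree the adjacency matrix has largest eigenvalue 2, so the critical gain based on A is about 0.5 and would change with the graph, whereas the value based on T is 1.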

      We have added a section “Choice of learning rule” to explain this. Author response image 1 is part of the code notebook on GitHub.

      Author response image 1.

      Overall, this paper provides a very compelling model for how neural circuits may have evolved the ability to navigate towards remembered goals, using ancient chemotaxis circuits.

      This framework will likely be very important for understanding how the hippocampus (and other memory/navigation-related circuits) interfaces with other processes in the brain, giving rise to memory-guided behavior.

      Reviewer #2 (Public Review):

      The manuscript presents a computational model of how an organism might learn a map of the structure of its environment and the location of valuable resources through synaptic plasticity, and how this map could subsequently be used for goal-directed navigation.

      The model is composed of 'map cells', which learn the structure of the environment in their recurrent connections, and 'goal cells', which store the location of valued resources with respect to the map cell population. Each map cell corresponds to a particular location in the environment due to receiving external excitatory input at this location. The synaptic plasticity rule between map cells potentiates synapses when activity above a specified threshold at the pre-synaptic neuron is followed by above-threshold activity at the post-synaptic neuron. The threshold is set such that map neurons are only driven above this plasticity threshold by the external excitatory input, causing synapses to only be potentiated between a pair of map neurons when the organism moves directly between the locations they represent. This causes the weight matrix between the map neurons to learn the adjacency for the graph of locations in the environment, i.e. after learning, the synaptic weight matrix matches the environment's adjacency matrix. Recurrent activity in the map neuron population then causes a bump of activity centred on the current location, which drops off exponentially with the diffusion distance on the graph. Each goal cell receives input from the map cells, and also from a 'resource cell' whose activity indicates the presence or absence of a given valued resource at the current location. Synaptic plasticity potentiates map-cell to goal-cell synapses in proportion to the activity of the map cells at time points when the resource cell is active. This causes goal cell activity to increase when the activity of the map cell population is similar to the activity where the resource was obtained. The upshot of all this is that after learning, the activity of goal cells decreases exponentially with the diffusion distance from the corresponding goal location.

      The organism can therefore navigate to a given goal by doing gradient ascent on the activity of the corresponding goal cell. The process of evaluating these gradients and using them to select actions is not modelled explicitly, but the authors point to the similarity of this mechanism to chemotaxis (ascending a gradient of odour concentration to reach the odour source), and the widespread capacity for chemotaxis in the animal kingdom, to argue for its biological plausibility.
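
      The mechanism summarised above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the 3 x 3 grid, the gain value, the function names, and the greedy neighbour-sampling policy are hypothetical, and the map weights are assumed to already equal the adjacency matrix.

```python
GAMMA = 0.1   # map-cell gain, chosen well below the critical value for this graph
SIDE = 3      # hypothetical 3 x 3 grid of locations, numbered row by row

def grid_neighbors(node):
    row, col = divmod(node, SIDE)
    steps = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [r * SIDE + c for r, c in steps if 0 <= r < SIDE and 0 <= c < SIDE]

N = SIDE * SIDE
# Map-cell weights, assumed already learned: M equals the adjacency matrix.
M = [[1.0 if j in grid_neighbors(i) else 0.0 for j in range(N)] for i in range(N)]

def map_activity(location, n_iter=100):
    """Steady state of v = GAMMA * (u + M v) with one-hot input u at `location`."""
    u = [1.0 if i == location else 0.0 for i in range(N)]
    v = [0.0] * N
    for _ in range(n_iter):
        v = [GAMMA * (u[i] + sum(M[i][j] * v[j] for j in range(N))) for i in range(N)]
    return v

def goal_signal(location, goal_weights):
    """Goal-cell output: weighted sum of map activity at `location`."""
    return sum(g * v for g, v in zip(goal_weights, map_activity(location)))

def navigate(start, goal_weights, max_steps=20):
    """Greedy ascent: sample each neighbor and step to the one with the largest signal."""
    path = [start]
    for _ in range(max_steps):
        here = path[-1]
        best = max(grid_neighbors(here), key=lambda n: goal_signal(n, goal_weights))
        if goal_signal(best, goal_weights) <= goal_signal(here, goal_weights):
            break  # no neighbor is better: a maximum of the goal signal
        path.append(best)
    return path

GOAL = 8
goal_weights = map_activity(GOAL)   # goal synapses set from map activity at the goal
path = navigate(0, goal_weights)
print(path)
```

      Starting in the opposite corner, the simulated agent climbs the goal signal and reaches node 8 in four steps, which is the shortest route on this grid.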

      The ideas are interesting and the presentation in the manuscript is generally clear. The two principal limitations of the manuscript are: i) Many of the ideas that the model implements have been explored in previous work. ii) The mapping of the circuit model onto real biological systems is pretty speculative, particularly with respect to the cerebellum.

      Regarding the novelty of the work, the idea of flexibly navigating to goals by descending distance gradients dates back to at least Kaelbling (Learning to achieve goals, IJCAI, 1993), and is closely related to both the successor representation (cited in manuscript) and Linear Markov Decision Processes (LMDPs) (Piray and Daw, 2021, https://doi.org/10.1038/s41467-021-25123-3; Todorov, 2009, https://doi.org/10.1073/pnas.0710743106). The specific proposal of navigating to goals by doing gradient descent on diffusion distances, computed as powers of the adjacency matrix, is explored in Baram et al. 2018 (https://doi.org/10.1101/421461), and the idea that recurrent neural networks whose weights are the adjacency matrix can compute diffusion distances is explored in Fang et al. 2022 (https://doi.org/10.1101/2022.05.18.492543). Similar ideas about route planning using the spread of recurrent activity are also explored in Corneil and Gerstner (2015, cited in manuscript). Further exploration of this space of ideas is no bad thing, but it is important to be clear where prior literature has proposed closely related ideas.

      We have added a discussion section on “Theories and models of spatial learning” with a survey of ideas in this domain and how they come together in the Endotaxis model.

      Regarding whether the proposed circuit model might plausibly map onto a real biological system, I will focus on the mammalian brain as I don't know the relevant insect literature. It was not completely clear to me how the authors think their model corresponds to mammalian brain circuits. When they initially discuss brain circuits they point to the cerebellum as a plausible candidate structure (lines 520-546). Though the correspondence between cerebellar and model cell types is not very clearly outlined, my understanding is they propose that cerebellar granule cells are the 'map-cells' and Purkinje cells are the 'goal-cells'. I'm no cerebellum expert, but my understanding is that the granule cells do not have recurrent excitatory connections needed by the map cells. I am also not aware of reports of place-field-like firing in these cell populations that would be predicted by this correspondence. If the authors think the cerebellum is the substrate for the proposed mechanism they should clearly outline the proposed correspondence between cerebellar and model cell types and support the argument with reference to the circuit architecture, firing properties, lesion studies, etc.

      On further thought we agree that the cerebellum-like circuits are not a plausible substrate for the endotaxis algorithm. The anatomy looks compelling, but plasticity at the synapse is anti-Hebbian, and, as the reviewer points out, there is little evidence for recurrence among the inputs. We changed the discussion text accordingly.

      The authors also discuss the possibility that the hippocampal formation might implement the proposed model, though confusingly they state 'we do not presume that endotaxis is localized to that structure' (line 564).

      We have removed that confusing bit of text.

      A correspondence with the hippocampus appears more plausible than the cerebellum, given the spatial tuning properties of hippocampal cells, and the profound effect of lesions on navigation behaviours. When discussing the possible relationship of the model to hippocampal circuits it would be useful to address internally generated sequential activity in the hippocampus. During active navigation, and when animals exhibit vicarious trial and error at decision points, internally generated sequential activity of hippocampal place cells appears to explore different possible routes ahead of the animal (Kay et al. 2020, https://doi.org/10.1016/j.cell.2020.01.014; Redish 2016, https://doi.org/10.1038/nrn.2015.30). Given the emphasis the model places on sampling possible future locations to evaluate goal-distance gradients, this seems highly relevant.

      In our model, the possible future locations are sampled in real life, with the agent moving there or at least in that direction, e.g. via VTE movements. In this simple form the model has no provision for internal planning, and the animal never learns any specific route sequence. One can envision extending such a model with some form of sequence learning that would then support an internal planning mechanism. We mention this in the revised discussion section, along with citation of these relevant articles.

      Also, given the strong emphasis the authors place on the relationship of their model to chemotaxis/odour-guided navigation, it would be useful to discuss brain circuits involved in chemotaxis, and whether/how these circuits relate to those involved in goal-directed navigation, and the proposed model.

      The neural basis of goal-directed navigation is probably best understood in the insect brain. There the locomotor decisions seem to be initiated in the central complex, whose circuitry is getting revealed by the fly connectome projects. This area receives input from diverse sensory areas that deliver the signal on which the decisions are based. That includes the mushroom body, which we argue has the anatomical structure to implement the endotaxis algorithm. It remains a mystery how the insect chooses a particular goal for pursuit via its decisions. It could be revealing to force a change in goals (the mode switch in the endotaxis circuit) while recording from brain areas like the central complex. Our discussion now elaborates on this.

      Finally, it would be useful to clarify two aspects of the behaviour of the proposed algorithm:

      1) When discussing the relationship of the model to the successor representation (lines 620-627), the authors emphasise that learning in the model is independent of the policy followed by the agent during learning, while the successor representation is policy dependent. The policy independence of the model is achieved by making the synapses between map cells binary (0 or 1 weight) and setting them to 1 following a single transition between two locations. This makes the model unsuitable for learning the structure of graphs with probabilistic transitions, e.g. it would not behave adaptively in the widely used two-step task (Daw et al. 2011, https://doi.org/10.1016/j.neuron.2011.02.027) as it would fail to differentiate between common and rare transitions. This limitation should be made clear and is particularly relevant to claims that the model can handle cognitive tasks in general. It is also worth noting that there are algorithms that are closely related to the successor representation, but which learn about the structure of the environment independent of the subject's policy, e.g. the work of Kaelbling which learns shortest path distances, and the default representation in the work of Piray and Daw (both referenced above). Both these approaches handle probabilistic transition structures.

      Yes. Our problem statement assumes that the environment is a graph with fixed edge weights. The revised text mentions this and other assumptions in a new section “Choice of learning rule”.

      2) As the model evaluates distances using powers of the adjacency matrix, the resulting distances are diffusion distances, not shortest path distances. Though diffusion and shortest path distances are usually closely correlated, they can differ systematically for some graphs (see Baram et al. cited above).

      The recurrent network of map cells implements a specific function of the adjacency matrix, namely the resolvent (Eqn 7). We have a mathematical proof that this function delivers the shortest graph distances exactly, in the limit of small gain (γ in Eqn 7), and that this holds true for all graphs. For practical navigation in the presence of noise, one needs to raise the gain to something finite. Figure 3 analyzes how this affects deviations from the shortest graph distance, and how nonetheless the model still supports effective navigation over a surprising range. The mathematical details of the proof and further exploration of the resolvent distance at finite gain have been moved to a separate article, which is cited from here, and attached to the submission. The preprint by Baram et al. is cited in that article.
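
      The small-gain limit can be illustrated numerically with a minimal sketch. It is not the proof from the separate article: the 7-node test graph, the helper names, and the logarithmic read-out are assumptions. The read-out relies on the series (1/γ − A)⁻¹ = γ Σ_k γ^k A^k, whose leading term for a pair of nodes at distance d scales as γ^(d+1).

```python
import math
from collections import deque

EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (3, 6)]  # hypothetical graph
N = 7
A = [[0.0] * N for _ in range(N)]
for i, j in EDGES:
    A[i][j] = A[j][i] = 1.0

def resolvent_column(source, gamma, n_iter=60):
    """Column of (1/gamma - A)^-1, from the fixed point of v = gamma * (u + A v)."""
    u = [1.0 if i == source else 0.0 for i in range(N)]
    v = [0.0] * N
    for _ in range(n_iter):
        v = [gamma * (u[i] + sum(A[i][j] * v[j] for j in range(N))) for i in range(N)]
    return v

def bfs_distances(source):
    """Reference shortest-path distances by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in range(N):
            if A[node][nxt] and nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return [dist[i] for i in range(N)]

def resolvent_distances(source, gamma=1e-4):
    # Leading term is proportional to gamma**(d + 1), so d ~ log(E) / log(gamma) - 1.
    column = resolvent_column(source, gamma)
    return [round(math.log(e) / math.log(gamma)) - 1 for e in column]

for source in range(N):
    print(source, resolvent_distances(source), bfs_distances(source))
```

      With a gain this small, the two lists agree for every source node of this graph, including the pair (0, 3) that is joined by two equally short paths.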

      Reviewer #3 (Public Review):

      This paper argues that it has developed an algorithm conceptually related to chemotaxis that provides a general mechanism for goal-directed behaviour in a biologically plausible neural form.

      The method depends on substantial simplifying assumptions. The simulated animal effectively moves through an environment consisting of discrete locations and can reliably detect when it is in each location. Whenever it moves from one location to an adjacent location, it perfectly learns the connectivity between these two locations (changes the value in an adjacency matrix to 1). This creates a graph of connections that reflects the explored environment. In this graph, the current location gets input activation and this spreads to all connected nodes multiplied by a constant decay (adjusted to the branching number of the graph) so that as the number of connection steps increases the activation decreases. Some locations will be marked as goals through experiencing a resource of a specific identity there, and subsequently will be activated by an amount proportional to their distance in the graph from the current location, i.e., their activation will increase if the agent moves a step closer and decrease if it moves a step further away. Hence by making such exploratory movements, the animal can decide which way to move to obtain a specified goal.

      I note here that it was not clear what purpose, other than increasing the effective range of activation, is served by having the goal input weights set based on the activation levels when the goal is obtained. As demonstrated in the homing behaviour, it is sufficient to just have a goal connected to a single location for the mechanism to work (i.e., the activation at that location increases if the animal takes a step closer to it); and as demonstrated by adding a new graph connection, goal activation is immediately altered in an appropriate way to exploit a new shortcut, without the goal weights corresponding to this graph change needing to be relearnt.

      As the reviewer states, allowing a graded strengthening of multiple synapses from the map cells increases the effective range of the goal signal. We have now confirmed this in simulations. For example, in the analysis of Fig 3E, a single goal synapse enables perfect navigation only over a range of 7 steps, whereas the distributed goal synapses allow perfect navigation over the full 12 steps. This analysis is included in the code notebook on GitHub.

      Given the abstractions introduced, it is clear that the biological task here has been reduced to the general problem of calculating the shortest path in a graph. That is, no real-world complications such as how to reliably recognise the same location when deciding that a new node should be introduced for a new location, or how to reliably execute movements between locations are addressed. Noise is only introduced as a 1% variability in the goal signal. It is therefore surprising that the main text provides almost no discussion of the conceptual relationship of this work to decades of previous work in calculating the shortest path in graphs, including a wide range of neural- and hardware-based algorithms, many of which have been presented in the context of brain circuits.

      The connection to this work is briefly made in appendix A.1, where it is argued that the shortest path distance between two nodes in a directed graph can be calculated from equation 15, which depends only on the adjacency matrix and the decay parameter (provided the latter falls below a given value). It is not clear from the presentation whether this is a novel result. No direct reference is given for the derivation so I assume it is novel. But if this is a previously unknown solution to the general problem it deserves to be much more strongly featured and either way it needs to be appropriately set in the context of previous work.

      As far as we know this proposal for computing all-pairs-shortest-path is novel. We could not find it in textbooks or an extended literature search. We have discussed it with two graph theorist colleagues, who could not recall seeing it before, although the proof of the relationship is elementary. Inspired by the present reviewer comment, we chose to publish the result in a separate article that can focus on the mathematics and place it in the appropriate context of prior work in graph theory. For related work in the area of neural modeling please see our revised discussion section.

      Once this principle is grasped, the added value of the simulated results is somewhat limited. These show: 1) in practical terms, the spreading signal travels further for a smaller decay but becomes erratic as the decay parameter (map neuron gain) approaches its theoretical upper bound and decreases below noise levels beyond a certain distance. Both follow the theory. 2) that different graph structures can be acquired and used to approach goal locations (not surprising). 3) that simultaneous learning and exploitation of the graph only minimally affects the performance over starting with perfect knowledge of the graph. 4) that the parameters interact in expected ways. It might have been more impactful to explore whether the parameters could be dynamically tuned, based on the overall graph activity.

      This is a good summary of our simulation results, but we differ in the assessment of their value. In our experience, simulations can easily demolish an idea that seemed wonderful before exposure to numerical reality. For example, it is well known that one can build a neural integrator from a recurrent network that has feedback gain of exactly 1. In practical simulations, though, these networks tend to be fickle and unstable, and require unrealistically accurate tuning of the feedback gain. In our case, the theory predicts that there is a limited range of gains that should work, below the critical value, but large enough to avoid excessive decay of the signal. Simulation was needed to test what this practical range was, and we were pleasantly surprised that it is not ridiculously small, with robust navigation over a 10-20% range. Similarly, we did not predict that the same parameters would allow for effective acquisition of a new graph, learning of targets within the graph, and shortest-route navigation to those targets, without requiring any change in the operation of the network.
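
      The sensitivity of such an integrator to its feedback gain can be seen in a toy example. This is a deliberately simplified sketch, assuming a single discrete-time unit with a hypothetical update rule rather than a full recurrent network.

```python
def run_integrator(feedback_gain, n_steps=200, pulse_steps=10):
    """Discrete-time unit x <- feedback_gain * x + input, driven by a brief pulse."""
    x = 0.0
    for t in range(n_steps):
        drive = 1.0 if t < pulse_steps else 0.0
        x = feedback_gain * x + drive
    return x

for gain in (0.98, 1.0, 1.02):
    print(gain, round(run_integrator(gain), 3))
```

      With a gain of exactly 1 the unit holds the integrated value of 10 indefinitely. A 2% error in either direction lets the stored value decay to a small fraction or grow many-fold within 200 steps.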

      Perhaps the most biologically interesting aspect of the work is to demonstrate the effectiveness, for flexible behaviour, of keeping separate the latent learning of environmental structure and the association of specific environmental states to goals or values. This contrasts (as the authors discuss) with the standard reinforcement learning approach, for example, that tries to learn the value of states that lead to reward. Examples of flexibility include the homing behaviour (a goal state is learned before any of the map is learned) and the patrolling behaviour (a goal cell that monitors all states for how recently they were visited). It is also interesting to link the mechanism of exploration of neighbouring states to observed scanning behaviours in navigating animals.

      The mapping to brain circuits is less convincing. Specifically, for the analogy to the mushroom body, it is not clear what connectivity (in the MB) is supposed to underlie the graph structure which is crucial to the whole concept. Is it assumed that Kenyon cell connections perform the activation spreading function and that these connections are sufficiently adaptable to rapidly learn the adjacency matrix? Is there any evidence for this?

      Yes, there is good evidence for recurrent synapses among Kenyon cells (map cells in the model), and for reward-gated synaptic plasticity at the synapses onto mushroom body output cells (goal cells in our model). We have expanded this material in the discussion section. Whether those functions are sufficient to learn the structure of a spatial environment has not been explored; we hope our paper might give an impetus, and are exploring behavioral experiments on flies with colleagues.

      As discussed above, the possibility that an algorithm like 'endotaxis' could explain how the rodent place cell system could support trajectory planning has already been explored in previous work so it is not clear what additional insight is gained from the current model.

      Please see our revised discussion section on “theories and models of spatial learning”. In short, some ingredients of the model have appeared in prior work, but we believe that the present formulation offers an unexpectedly simple end-to-end solution for all components of navigation: exploration, target learning, and goal seeking.

      Reviewer #1 (Recommendations For The Authors):

      Major concern:

      See the public review. How do the results change depending on whether the recurrent connections between map cells are an adjacency matrix vs. a properly normalized state-transition matrix? I'm especially asking about results related to critical gain (gamma_c), and the dependence of the optimal parameter values on the environment.

      Please see our response above including the attached reviewer figure.

      Minor concerns:

      It is not always clear when the learning rule is symmetric vs asymmetric (undirected vs directed graph), and it seems to switch back and forth. For example, line 127 refers to a directed graph; Fig 2B and the intro describe symmetric Hebbian learning. Most (all?) of the simulations use the symmetric rule. Please make sure it's clear.

      For simplicity we now use a symmetric rule throughout, as is appropriate for undirected graphs. We mention that a directed learning rule could be used to learn directed graphs. See the section on “choice of learning rule”.

      M_ij is not defined when it's first introduced (eq 4). Consider labeling the M's and the G's in Fig 2.

      Done.

      The network gain factor (gamma, eq 4) is distributed over both external and recurrent inputs (v = gamma(u + Mv)), instead of local to the recurrent weights like in the Successor Representation. This notational choice is obviously up to the authors. I raise slight concern for two reasons -- first, distributing gamma may affect some of the parameter sweep results (see major concern), and second, it may be confusing in light of how gamma is used in the SR literature (see reviewer's paper for the derivation of how SR is computed by an RNN with gain gamma).

      In our model, gamma represents the (linear) activation function of the map neuron, from synaptic input to firing output. Because the synaptic input comes from point cells and also from other map cells, the gain factor is applied to both. See for example the Dayan & Abbott book Eqn 7.11, which at steady state becomes our Eqn 4. In the formalism of Fang 2023 (Eqn 2), the factor γ is only applied to the recurrent synaptic input J ⋅ f, but somehow not to the place cell input ϕ. Biophysically, one could imagine applying the variable gain only to the recurrent synapses and not the feed-forward ones. Instead we prefer to think of it as modulating the gain of the neurons, rather than the synapses. The SR literature follows conventions from the early reinforcement learning papers, which were unconstrained by thinking about neurons and synapses. We have added a footnote pointing the reader to the uses of γ in different papers.
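
      The two conventions can be compared with a minimal sketch. The 4-node linear track, the gain value, and the variable names are hypothetical, and both networks are simply iterated to steady state. For a fixed input, the steady states differ only by an overall factor γ, since (1/γ − M)⁻¹ = γ (1 − γM)⁻¹.

```python
GAMMA = 0.3
M = [[0, 1, 0, 0],    # hypothetical 4-node linear track
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
u = [1.0, 0.0, 0.0, 0.0]   # external input: agent at node 0

def recurrent_input(v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def steady_state(update, n_iter=300):
    v = [0.0] * len(u)
    for _ in range(n_iter):
        v = update(v)
    return v

# Gain on the neuron: applied to feed-forward and recurrent input alike.
v_neuron_gain = steady_state(
    lambda v: [GAMMA * (ui + ri) for ui, ri in zip(u, recurrent_input(v))])

# Gain on the recurrent synapses only, as in the SR-style convention.
v_synapse_gain = steady_state(
    lambda v: [ui + GAMMA * ri for ui, ri in zip(u, recurrent_input(v))])

ratios = [a / b for a, b in zip(v_neuron_gain, v_synapse_gain)]
print(ratios)
```

      Every ratio comes out at γ, so in this linear setting the spatial profile of the map output is the same under either convention and only its scale changes.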

      In eq 13, and simulations, noise is added to the output only, not to the activity of recurrently connected neurons. It is possible this underestimates the impact of noise since the same magnitude of noise in the recurrent network (map cells) could have a compounded effect on the output.

      Certainly. The equivalent output noise represents the cumulative effect of noise everywhere in the network. We argue that a cumulative effect of 1% is reasonable given the overall ability of animals at stimulus discrimination, which is also limited by noise everywhere in the network. This has been clarified in the text.

      Fig 3 E, F, it looks like the navigated distance may be capped. I ask because the error bars for graph distance = 12 are so small/nonexistent. If it's capped, this should be in the legend.

      Correct. 12 is the largest distance on this graph. This has been added to the caption.

      Fig 3D legend, what does "navigation failed" mean? These results are not shown.

      On those occasions the agent gets trapped at a local maximum of the goal signal other than the intended goal. We have removed that line as it is not needed to interpret the data.

      Line 446, typo (Lateron).

      Fixed.

      Line 475, I'm a bit confused by the discussion of birds and bats. Bird behavior in the real world does involve discrete paths between points. Even if they theoretically could fly between any points, there are costs to doing so, and in practice, they often choose discrete favorite paths. It is definitely plausible that animals that can fly could also employ Endotaxis, so it is confusing to suggest they don't have the right behavior for Endotaxis, especially given the focus on fruit flies later in the discussion.

      Good points, we removed that remark. Regarding fruit flies, they handle much important business while walking, such as tracking a mate, fighting rivals over food, finding a good oviposition site.

      Section 9.3, I'm a bit confused by the discussion of cerebellum-like structures, because I don't think they have as dense recurrent connections as needed for the map cells in Endotaxis. Are you suggesting they are analogous to the output part of Endotaxis only, not the whole thing?

      Please see our reply in the public review. We have removed this discussion of cerebellar circuits.

      Line 541, "After sufficient exploration...", clarify that this is describing learning of just the output synapses, not the recurrent connections between map cells?

      We have revised this entire section on the arthropod mushroom body.

      In lines 551-556, the discussion is confusing and possibly not consistent with current literature. How can a simulation prove that synapses in the hippocampus are only strengthened among immediately adjacent place fields? I'd suggest either removing this discussion or adding further clarification. More broadly, the connection between Endotaxis and the hippocampus is very compelling. This might also be a good point to bring up BTSP (though you do already bring it up later).

      As suggested, we removed this section.

      Line 621 "The successor representation (at least as currently discussed) is designed to improve learning under a particular policy" That's not actually accurate. Ref 17 (reviewer's manuscript, cited here) is not policy-specific, and instead just learns the transition statistics experienced by the animal, using a biologically plausible learning rule that is very similar to the Endotaxis map cell learning rule (see our Appendix 5, comparing to Endotaxis, though that was referencing the previous version of the Endotaxis preprint where Oja's rule was used).

      We have edited this section in the discussion and removed the reference to policy-specific successor representations.

      Line 636 "Endotaxis is always on" ... this was not clear earlier in the paper (e.g. line 268, and the separation of different algorithms, and "while learning do" in Algorithm 2).

      The learning rules are suspended during some simulations so we can better measure the effects of different parts of endotaxis, in particular learning vs navigating. There is no interference between these two functions, and an agent benefits from having the learning rules on all the time. The text now clarifies this in the relevant sections.

      Section 9.6, I like the idea of tracing different connected functions. But when you say "that could lead to the mode switch"... I'm a bit confused about what is meant here. A mode switch doesn't need to happen in a different brain area/network, because winner-take-all could be implemented by mutual inhibition between the different goal units.

      This is an interesting suggestion for the high-level control algorithm. A Lorenzian view is that the animal’s choice of mode depends on internal states or drives, such as thirst vs hunger, that compete with each other. In that picture the goal cells represent options to be pursued, whereas the choice among the options occurs separately. But one could imagine that the arbitrage between drives happens through a competition at the level of goal cells: For example the consumption of water could lead to adaptation of the water cell, such that it loses out in the winner-take-all competition, the food cell takes over, and the mouse now navigates towards food. In this closed-loop picture, the animal doesn’t have to “know” what it wants at any given time, it just wants the right thing. This could eliminate the homunculus entirely! Of course this is all a bit speculative. We have edited the closing comments in a way that leaves open this possibility.

      Line 697-704, I need more step-by-step explanation/derivation.

      We now derive the properties of E step by step starting from Eqn (14). The proof that leads to Eqn 14 is now in a separate article (available as a preprint and attached to this submission).

      Reviewer #3 (Recommendations For The Authors):

      • Please include discussion and comparison to previous work of graph-based trajectory planning using spreading activation from the current node and/or the goal node. Here is a (far from comprehensive) list of papers that present similar algorithms:

      Glasius, R., Komoda, A., & Gielen, S. C. (1996). A biologically inspired neural net for trajectory formation and obstacle avoidance. Biological Cybernetics, 74(6), 511-520.

      Gaussier, P., Revel, A., Banquet, J. P., & Babeau, V. (2002). From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biological cybernetics, 86(1), 15-28.

      Gorchetchnikov A, Hasselmo ME. A biophysical implementation of a bidirectional graph search algorithm to solve multiple goal navigation tasks. Connection Science. 2005;17(1-2):145-166

      Martinet, L. E., Sheynikhovich, D., Benchenane, K., & Arleo, A. (2011). Spatial learning and action planning in a prefrontal cortical network model. PLoS computational biology, 7(5), e1002045.

      Ponulak, F., & Hopfield, J. J. (2013). Rapid, parallel path planning by propagating wavefronts of spiking neural activity. Frontiers in computational neuroscience, 7, 98.

      Khajeh-Alijani, A., Urbanczik, R., & Senn, W. (2015). Scale-free navigational planning by neuronal traveling waves. PloS one, 10(7), e0127269.

      Adamatzky, A. (2017). Physical maze solvers. All twelve prototypes implement 1961 Lee algorithm. In Emergent computation (pp. 489-504). Springer, Cham.

      Please see our reply to the public review above, and the new discussion section on “Theories and models of spatial learning”, which cites most of these papers among others.

      • Please explain, if it is the case, why the goal cell learning (other than a direct link between the goal and the corresponding map location) and calculation of the overlapping 'goal signal' is necessary, or at least advantageous.

      Please see our reply in the public review above.

      • Map cells are initially introduced (line 84) as getting input from "only one or a few point cells". The rest of the paper seems to assume only one. Does it work when this is 'a few'? Does it matter that 'a few' is an option?

      We simplified the text here to “only one point cell”. A map cell with input from two distant locations creates problems. After learning the map synapses from adjacencies in the environment, the model now “believes” that those two locations are connected. This distorts the graph on which the graph distances are computed and introduces errors in the resulting goal signals. One can elaborate the present toy model with a much larger population of map cells that might convey more robustness, but that is beyond our current scope.
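      The distortion described above can be illustrated with a minimal sketch. It assumes a toy environment (ten locations on a linear track) and models a map cell driven by two distant locations by merging those two nodes of the graph; the function names and the track layout are illustrative assumptions, not part of the Endotaxis model code.

```python
from collections import deque


def graph_distances(adjacency, start):
    """Breadth-first search: number of edges from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def linear_track(n):
    """Adjacency sets for n locations on a line: 0 - 1 - ... - (n-1)."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def merge_locations(adjacency, a, b):
    """Hypothetical map cell driven by both a and b: treat the two nodes as one.

    Node b is dropped and every edge that touched b is redirected to a.
    """
    merged = {}
    for node, nbrs in adjacency.items():
        if node == b:
            continue
        merged[node] = {a if x == b else x for x in nbrs}
    merged[a] |= {a if x == b else x for x in adjacency[b]}
    for node in merged:
        merged[node].discard(node)  # no self-loops
    return merged


track = linear_track(10)
true_dist = graph_distances(track, 0)[8]          # distance in the real environment
distorted = merge_locations(track, 1, 9)          # one map cell covers locations 1 and 9
believed_dist = graph_distances(distorted, 0)[8]  # distance on the distorted graph
print(true_dist, believed_dist)
```

      In this toy case the learned graph contains a shortcut that does not exist in the environment, so any goal signal computed from graph distance would be too optimistic for locations near the merged pair.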

      • (line 539 on) Please explain what feature in the mushroom body (or other cerebellum-like) circuits is proposed to correspond to the learning of connections in the adjacency matrix in the model.

      Please see our response to this critique in the public review above. In the mushroom body, the Kenyon cells exhibit sparse responses and are recurrently connected. These would correspond to map cells in Endotaxis. For vertebrate cerebellum-like circuits, the correspondence is less compelling, and we have removed this topic from the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Lu et al. propose a direct role of LPS in inducing hepatic fat accumulation, and that the metabolism of LPS can therefore mitigate fatty liver injury. Using acyloxyacyl hydrolase (AOAH) whole-body KO mice, they demonstrated that acyloxyacyl hydrolase deletion resulted in higher hepatic fat accumulation over 8 months of high glucose/high fructose diet. Previous literature has found that KO of hepatocyte TLR4 (a main receptor for binding LPS) reduced fatty liver in the MAFLD model, and this paper complements this by showing that degradation/metabolism of LPS can also reduce fatty liver. This result suggests a very interesting mechanism, and the translational implications of utilizing acyloxyacyl hydrolase to decrease LPS exposure are intriguing.

      The strengths of the present study include that they raised a very simplistic mechanism with LPS that is of interest in many diseases. The phenotype shown in the study is strong. The mechanism proposed by the findings is generally well supported.

      There are also several shortcomings in the findings of this study. As AOAH is a whole-body KO, the source production of AOAH in MAFLD is unclear. Although the authors used published single-cell RNA-seq data and flow-isolated liver cells, physiologically LPS degradation could occur in the blood or the liver. The authors linked LPS to hepatocyte fatty acid oxidation via SREBP1. The mechanism is not explored in great depth. Is this signaling TLR4? In this model, LPS could activate macrophages and mediate the worsening of hepatocyte fatty liver injury via the paracrine effect instead of directly signaling to hepatocytes, thus it is not clear that this is a strictly hepatocyte LPS effect. It would also be very interesting to see if administration of the AOAH enzyme orally could mitigate MAFLD injury. Overall, this work will add to the current understanding of the gut-liver axis and development of MAFLD and will be of interest to many readers.

      We thank the reviewers for their important questions and comments.

      In previous studies we found that AOAH is expressed in Kupffer cells and dendritic cells (Shao et al., 2007). Single-cell RNAseq analysis of mouse livers by others has found AOAH in Kupffer cells, monocytes, NK cells and ILC1 cells (Remmerie et al., 2020). We also analyzed human liver single-cell RNAseq data and found that AOAH is expressed in monocytes, macrophages, resident and circulating NK cells, and some T cells (Ramachandran et al., 2019) (Please see new Figure 3E). Using clodronate-liposomes to deplete Kupffer cells we found that hepatic AOAH mRNA diminished and nSREBP1 increased (Please see new Figure 5D). These results suggest that Kupffer cells are the major source of AOAH in the liver and that LPS needs to be inactivated in the liver to prevent hepatocyte lipid accumulation.

      Using primary hepatocyte culture, we found that LPS can stimulate hepatocytes directly to induce mTOR activation and SREBP1 activation (new Figure 6E). Adding purified Kupffer cells to the hepatocyte culture did not further increase SREBP1 activation. These results suggest that LPS may directly stimulate hepatocytes to accumulate fat, at least in vitro.

      Both TLR4 and caspase 11 are reported to play important roles in MASLD development (Sharifnia et al., 2015; Zhu et al., 2021). We have crossed Aoah<sup>-/-</sup> mice with TLR4<sup>-/-</sup> mice and found that Aoah<sup>-/-</sup>TLR4<sup>-/-</sup> and Aoah<sup>-/-</sup> mice had similarly severe MASLD. This is probably because TLR4 is required for gut homeostasis (Rakoff-Nahoum et al., 2004); in TLR4 whole-body KO mice, compromised gut homeostasis may result in more severe MASLD. By specifically deleting TLR4 in hepatocytes, Yu et al. found that NASH-induced fibrosis was mitigated (Yu et al., 2021). In future studies we would therefore need to specifically delete TLR4 in hepatocytes to test whether excessive gut-derived LPS in Aoah<sup>-/-</sup> mice stimulates hepatic TLR4 to induce more severe MASLD. We would also test whether caspase 11 is required for hepatic fat accumulation in Aoah<sup>-/-</sup> mice.

      It is intriguing to test whether providing exogenous AOAH may mitigate MASLD. We will use an AAV expressing AOAH to test this idea.

      Reviewer #2 (Public review):

      The authors of this article investigated the impact of the host enzyme AOAH on the progression of MASLD in mice. To achieve this, they utilized whole-body Aoah<sup>-/-</sup> mice. The authors demonstrated that AOAH reduced LPS-induced lipid accumulation in the liver, probably by decreasing the expression and activation of SREBP1. In addition, AOAH reduced hepatic inflammation and minimized tissue damage.

      However, this paper is descriptive without a clear mechanistic study. Another major limitation is the use of whole-body KO mice so the cellular source of the enzyme remains undefined. Moreover, since LPS-mediated SREBP1 regulation or LPS-mediated MASLD progression is already documented, the role of AOAH in SREBP1-dependent lipid accumulation and MASLD progression is largely expected.

      Specific comments:

      (1) The overall human relevance of the current study remains unclear.

      This is a good point. We have examined human relevance and show the results in Figure 3E. AOAH expression increased in the hepatic macrophages and monocytes of MASLD patients.

      (2) Is AOAH secreted from macrophages or other immune cells? Are there any other functions of AOAH within the cells?

      AOAH can be secreted from kidney proximal tubule cells and the released AOAH can be taken up by cells that do not express AOAH (Feulner et al., 2004). AOAH can also deacylate oxidized phospholipids, DAMP molecules (Zou et al., 2021).

      (3) Due to using whole-body KO mice, the role of AOAH in specific cell types was unclear in this study, which is one of the major limitations of this study. The authors should at least conduct in vitro experiments using a co-culture system of hepatocytes and Kupffer cells (or other immune cells) isolated from WT or Aoah<sup>-/-</sup> mice.

      Thanks for the suggestion.

      Using clodronate-liposomes, we depleted Kupffer cells and found that hepatic AOAH mRNA diminished and nSREBP1 increased in the liver (Please see new Figure 5D). These results confirm that Kupffer cells are the major source of AOAH in the liver and that LPS needs to be inactivated in the liver to prevent hepatocyte lipid accumulation. Using primary hepatocyte culture, we found that LPS can stimulate hepatocytes directly to induce mTOR activation and SREBP1 activation (new Figure 6E). These results suggest that LPS may directly stimulate hepatocytes to accumulate fat, at least in vitro.

      (4) It has been well-known that intestinal tight junction permeability is increased by LPS or inflammatory cytokines. However, in Figure 3E, intestinal permeability is comparable between the groups in both diet groups. The authors should discuss more about this result. In addition, intestinal junctional protein should be determined by Western blot and IHC (or IF) to further confirm this finding.

      We have stained ZO-1 (Please see Author response image 1; ZO-1, green fluorescence) in Aoah<sup>+/+</sup> and Aoah<sup>-/-</sup> mouse colonic sections. We did not see a marked difference between the two strains of mice.

      Author response image 1.

      Feeding a high fat diet in our mouse facility for 28 weeks led to increased gut permeability, but there was no difference between Aoah<sup>+/+</sup> and Aoah<sup>-/-</sup> mice. Thus, the more severe MASLD in Aoah<sup>-/-</sup> mice is mainly caused by elevated bioactive LPS instead of increased LPS translocation from the intestine to the liver.

      (5) In Figure 6, the LPS i.g. Aoah<sup>-/-</sup> group is missing. This group should be included to better interpret the results.

      Please see new Figure 6. When we orally gavaged Aoah<sup>-/-</sup> mice with LPS, fecal LPS levels did not increase further. Their liver SREBP1 did not increase further while the SREBP1 target gene expression increased when compared with Aoah<sup>-/-</sup> mice i.g. PBS.

      (6) The term NAFLD has been suggested to be changed to MASLD as the novel nomenclature according to the guidelines of AASLD and EASL.

      Thanks for the suggestion. We have changed NAFLD to MASLD.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Consider using MAFLD rather than NAFLD.

      Thanks for the suggestion. We have changed NAFLD to MASLD.

      References

      Feulner, J.A., M. Lu, J.M. Shelton, M. Zhang, J.A. Richardson, and R.S. Munford. 2004. Identification of acyloxyacyl hydrolase, a lipopolysaccharide-detoxifying enzyme, in the murine urinary tract. Infection and Immunity 72:3171-3178.

      Zou, B., M. Goodwin, D. Saleem, W. Jiang, J. Tang, Y. Chu, R.S. Munford, and M. Lu. 2021. A highly conserved host lipase deacylates oxidized phospholipids and ameliorates acute lung injury in mice. eLife 10:

    1. Author Response

      The following is the authors’ response to the current reviews.

      Comment 1: The descriptions about body weights should be matched.

      Regrettably, we did not monitor body weights throughout the study. We have now revised the description to resolve the confusion. Importantly, we evaluated the weights of the muscle (EDL and soleus) and heart tissues in 8-month-old mice (Fig. 1A).

      Comment 2: Quantitative data for figures.

      As stated in the manuscript, the presented images are representative of at least three mice per genotype. However, assessing specific measurements such as cell sizes, diameters, or mitochondria sizes in histological tissue sections and electron microscopy fields is not feasible due to practical limitations. Unfortunately, we do not have access to specialized software for such analyses. While semi-quantification of Western blot bands is possible, implementing this for all Western blots in the manuscript would result in a substantial increase in the number of bar graphs. Below are Western blots from two additional pairs of mice for the blots shown in the figures.

      Comment 3: Confusions about “total mitochondrial content”.

      The mitochondria content in cells was assessed by quantitatively comparing the DNA level of the mitochondrial gene cytochrome B to that of the nuclear gene 18S using quantitative PCR. This method is commonly used to determine the relative number of mitochondria in cells. However, we have revised and provided a clearer description in the figure legend to avoid any potential confusion.
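      The calculation behind this kind of measurement can be sketched as follows. This is a minimal illustration that assumes the standard comparative Ct approach with perfect amplification efficiency (a doubling per cycle); the function names and the Ct values are hypothetical placeholders, not data or analysis code from the study.

```python
def relative_mtdna_content(ct_mito, ct_nuclear, efficiency=2.0):
    """Mitochondrial DNA signal relative to nuclear DNA signal.

    ct_mito:    qPCR threshold cycle for the mitochondrial gene (e.g. cytochrome B)
    ct_nuclear: qPCR threshold cycle for the nuclear reference gene (e.g. 18S)
    efficiency: amplification factor per cycle (2.0 assumes perfect doubling)
    """
    delta_ct = ct_mito - ct_nuclear
    return efficiency ** (-delta_ct)


def fold_change(sample_ratio, control_ratio):
    """Mitochondrial content of a sample expressed relative to a control."""
    return sample_ratio / control_ratio


# Hypothetical Ct values chosen only to make the arithmetic easy to follow.
control = relative_mtdna_content(ct_mito=18.0, ct_nuclear=15.0)
knockout = relative_mtdna_content(ct_mito=19.0, ct_nuclear=15.0)
ratio = fold_change(knockout, control)
print(control, knockout, ratio)
```

      With these placeholder numbers, a one-cycle delay in the mitochondrial amplicon at an unchanged nuclear Ct corresponds to half the relative mitochondrial DNA content.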

      Comment 4: Suggestions on further analyses of PGC1-alpha and TFAM. LC3-I and -II.

      We evaluated LC3-I/II levels in PTPMT1 knockout muscles, and our findings did not indicate any signs of increased autophagic activity (Supplementary Figure S3). We will examine PGC1-alpha and TFAM levels in our future studies. It is worth noting that in our previous RNA-seq analyses of PTPMT1 knockout hematopoietic cells, we did not observe any significant alterations in the expression levels of these two genes.

      Comment 5: Description on fibrotic lesions.

      Quantifying fibrotic areas poses a significant challenge. Therefore, we were only able to describe this finding.

      Comment 6: Fig 6 is not well organized and aligned.

      In response to your suggestion, we have reorganized this figure accordingly. Panels C, D, and E display mitochondrial OCR data derived from three biological replicates/genotype. We feel that these changes are sufficient to demonstrate the differences in substrate utilization between PTPMT1 knockout and control mitochondria.

      Comment 7: Descriptions on glucose oxidation and glycolysis in different types of muscle fibers are confusing

      We have followed the suggestions and revised the descriptions accordingly.

      Comment 8: A discussion about lactate utilization in cardiomyocytes would be helpful.

      Following this suggestion, we have now added a brief discussion.

      Comment 9: “Cropped” images were used in Fig 10.

      The images shown in Fig. 10 were not cropped images. In order to efficiently use the tissue and mitochondrial lysates, the Western blot membranes were intentionally cut into smaller fragments based on the molecular weights of the proteins to be detected. These smaller membrane sections were then employed for individual Western blotting purposes.

      Minor comment 1: The order of Fig 1 panels should be reorganized.

      Following this suggestion, we have now reorganized this figure.

      Minor comment 2: Suggestion for an Echocardiograph result table.

      These analyses were carried out by trained personnel at the Emory Animal Physiology Core. The data presented in our manuscript was provided by them. It is important to note that no additional parameters were measured beyond the data provided by the Core.

      Minor comment 3: Is ROS production increased in PTPMT1 knockout muscle cells?

      Yes, PTPMT1 knockout tissues showed elevated overall cellular ROS levels even at 3 months (Figure 6I).

      Minor comment 4: Typo in S10 legend.

      The typo has been corrected.


      The following is the authors’ response to the original reviews.

      Comment 1: The effects of PTPMT1 on the skeletal muscle and heart might be an embryonic defect. They might be mediated by significantly reduced mTOR signaling

      We acknowledge the valid point made by this reviewer. While both CKMM-Cre and Myh6-Cre express Cre during the embryonic stage, we did not observe any developmental defects in skeletal muscle-specific (PTPMT1fl/fl/CKMM-Cre) or heart-specific (PTPMT1fl/fl/Myh6-Cre) knockout mice. These knockout mice appeared indistinguishable from their WT littermates until the age of 3-4 months.

      Morphologically, the skeletal muscle and heart dissected from these mice showed no abnormalities. Additionally, mitochondria isolated from these tissues did not exhibit any morphological/structural defects. Undoubtedly, the late-onset phenotypes observed in the knockout mice over time were attributable to the metabolic defects arising from the loss of PTPMT1 in the embryos. Although PTPMT1 knockout muscle cells and cardiomyocytes initially maintained energy homeostasis through enhanced fatty acid and glutamate oxidation, along with metabolic adaptations or activation of alternative energy-producing pathways in the first few months, they eventually encountered substantial energy deficits. This was attributed to the subsequent occurrence of oxidative stress and mitochondrial damage. In response to this valuable feedback, we have included a brief discussion in the manuscript's discussion section to address this point.

      As mentioned in the manuscript, the late-onset phenotypes observed in our study were likely a result of subsequent damages induced by prolonged metabolic substrate shift and lipid accumulation within the cells. We agree with the reviewer that decreased mTOR activities may also contribute to these late effects, and have included a brief discussion in the discussion section.

      Comment 2: Why are the effects of the loss of PTPMT1 similar in the skeletal muscle and heart.

      The depletion of PTPMT1 yields similar effects in both tissue types; however, the manifestations occur earlier in the skeletal muscle. Although mitochondria in the skeletal muscle and heart have distinct preferences for energy sources, prolonged forced utilization of fatty acids caused by PTPMT1 depletion eventually leads to lipid accumulation and cellular damage (lipotoxicity) in both tissue types. This phenomenon underscores the importance of maintaining a balance in substrate utilization to prevent adverse effects on cellular health in the skeletal muscle and heart.

      Comment 3: AMPK is activated in PTPMT1 knockout cardiomyocytes; this should have cardioprotective effects.

      AMPK can be activated through various mechanisms. In our study, AMPK activation occurs in response to energetic stress in late-stage PTPMT1 knockout tissues that displayed significantly reduced ATP levels, aligning with its role as a bioenergetic stress sensor. It is possible that AMPK activation alone was insufficient to overcome the secondary damages induced by the prolonged metabolic switch from carbohydrate metabolism to fatty acid metabolism.

      Comment 4: Knockout skeletal muscles and hearts had lipid accumulation; why were knockout mice smaller than controls? Are there any changes in white fat, core temperature or browning of fat? Rescue experiments should be considered to prove that lipid accumulation is the cause of death in the knockout mice.

      We believe that the lipid accumulation observed in muscle cells and cardiomyocytes of the knockout mice does not necessarily imply that these tissue-specific knockout mice would be heavier or have increased body fat. We appreciate the suggestions regarding energy expenditure tests and rescue experiments. We will certainly consider incorporating these experiments into our future study.

      As stated in the manuscript, we did not observe any morphological changes in white or brown fat tissues in the adipocyte-specific PTPMT1 knockout mice. Furthermore, we assessed body temperature and its response to a cold environment (4°C), and no differences were detected between the knockout mice and the control mice.

      Comment 5: Are there sex differences in muscle and heart phenotypes in the tissue specific knockout mice?

      We did not observe significant differences in phenotypes between male and female knockout mice.

      Comment 6: What happens to UCP2 activity in PTPMT1 deleted cells and what is its function in mediating AMPK and/mTOR regulation.

      Currently, there is a lack of direct methods available to measure UCP2 activity. The relationship between UCP2 and the regulation of AMPK and mTOR has not been extensively investigated.

      Comment 7: What is the effect of PTPMT1 deletion on cardiolipin synthesis?

      PTPMT1 has been implicated in both facilitating mitochondrial utilization of pyruvate and participating in the synthesis of cardiolipin. To investigate the impact of PTPMT1 knockout on cardiolipin levels, we plan to establish a mass spectrometry assay for the quantitative analysis of cardiolipin in knockout mitochondria. Completing these experiments might require a considerable amount of time. Nonetheless, we extensively addressed this point in the discussion section.

      Minor concerns:

      Comment 8: The title needs more specificity.

      As suggested, we have revised the title to "Loss of PTPMT1 restricts mitochondrial utilization of carbohydrates and induces muscle atrophy and heart failure in tissue-specific knockout mice".

      Comment 9: Heart and skeletal muscle weights in Fig 1A should be normalized against tibia length.

      Unfortunately, we did not perform normalization in this study. However, we appreciate the suggestion and will incorporate it into our future studies. It is important to note that the lengths of tibias in the knockout mice were only marginally shorter.

      Comment 10: Low magnification and longitudinal section of the muscle should be shown in Fig 1B and 2A.

      The histological images provide supporting evidence for the conclusion, despite not being optimal in quality. We acknowledge the suggested improvements and assure you that we will integrate them into our future studies. It is crucial to emphasize that each conclusion in this study was derived from multiple experimental designs, rather than solely relying on morphological changes.

      Comment 11: Fig 1F is mislabeled as 1G.

      We have conducted a thorough review and can confidently confirm that the labeling is correct.

      Comment 12: Fig 2F and 6B should be quantified.

      As indicated in the manuscript, the images presented are representatives of at least three mice per genotype. While semi-quantification of Western blot bands is possible, implementing this for all Western blots in the manuscript would result in a substantial increase in the number of bar graphics. Below are Western blot images from additional two pairs of mice included in Fig. 2F and Fig. 6B. Furthermore, Western blot images from two additional pairs of mice in other figures are also provided below.

      Author response image 1

      Western blotting data from additional two pairs of mice in Fig. 2F.

      Author response image 2

      Western blotting data from additional two pairs of mice in Fig. 6B.

      Author response image 3

      Western blotting data from additional two pairs of mice in Supplementary Fig. 2G.

      Author response image 4

      Western blotting data from additional two pairs of mice in Supplementary Fig. 3A.

      Author response image 5

      Western blotting data from additional two pairs of mice in Supplementary Fig. 3C.

      Author response image 6

      Western blotting data from additional two pairs of mice in Supplementary Fig. 3D.

      Author response image 7

      Western blotting data from additional two pairs of mice in Supplementary Fig. 4F.

      Author response image 8

      Western blotting data from additional two pairs of mice in

      Author response image 9

      Western blotting data from additional two pairs of mice in Supplementary Fig. 7C.

      Comment 13: Knockout mice should be placed on HFD or keto diet to test for the effects of PTPMT1 depletion.

      We appreciate this thoughtful suggestion. We will certainly incorporate this suggestion into our future studies, expanding beyond the scope of the current initial report.

      Comment 14: Suggestions on Fig 4A.

      Please see our response to Comment 10.

      Comment 15: Suggestions for improving echocardiographs.

      These analyses were conducted by trained personnel at the Emory Animal Physiology Core. The data presented in our manuscript was provided by them. We appreciate bringing the issues to our attention, and we will inform them accordingly.

      Comment 16: Comment on Fig 5B.

      The tissues were sectioned at comparable, if not identical, levels. WT and PTPMT1 knockout heart sections look dramatically different because of the dilated myopathy observed in the knockout hearts.

      Comment 17: Comment on Fig 5C.

      We believe the cell death occurred predominantly in cardiomyocytes.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Gating of Kv10 channels is unique because it involves coupling between non-domain swapped voltage sensing domains, a domain-swapped cytoplasmic ring assembly formed by the N- and C-termini, and the pore domain. Recent structural data suggests that activation of the voltage sensing domain relieves a steric hindrance to pore opening, but the contribution of the cytoplasmic domain to gating is still not well understood. This aspect is of particular importance because proteins like calmodulin interact with the cytoplasmic domain to regulate channel activity. The effects of calmodulin (CaM) in WT and mutant channels with disrupted cytoplasmic gating ring assemblies are contradictory, resulting in inhibition or activation, respectively. The underlying mechanism for these discrepancies is not understood. In the present manuscript, Reham Abdelaziz and collaborators use electrophysiology, biochemistry and mathematical modeling to describe how mutations and deletions that disrupt inter-subunit interactions at the cytoplasmic gating ring assembly affect Kv10.1 channel gating and modulation by CaM. In the revised manuscript, additional information is provided to allow readers to identify within the Kv10.1 channel structure the location of E600R, one of the key channel mutants analyzed in this study. However, the mechanistic role of the cytoplasmic domains that this study focuses on, as well as the location of the ΔPASCap deletion and other perturbations investigated in the study remain difficult to visualize without additional graphical information. This can make it challenging for readers to connect the findings presented in the study with a structural mechanism of channel function.

      The authors focused mainly on two structural perturbations that disrupt interactions within the cytoplasmic domain, the E600R mutant and the ΔPASCap deletion. By expressing mutants in oocytes and recording currents using Two Electrode Voltage-Clamp (TEV), it is found that both ΔPASCap and E600R mutants have biphasic conductance-voltage (G-V) relations and exhibit activation and deactivation kinetics with multiple voltage-dependent components. Importantly, the mutant-specific component in the G-V relations is observed at negative voltages where WT channels remain closed. The authors argue that the biphasic behavior in the G-V relations is unlikely to result from two different populations of channels in the oocytes, because they found that the relative amplitude between the two components in the G-V relations was highly reproducible across individual oocytes that otherwise tend to show high variability in expression levels. Instead, the G-V relations for all mutant channels could be well described by an equation that considers two open states O1 and O2, and a transition between them; O1 appeared to be unaffected by any of the structural manipulations tested (i.e. E600R, ΔPASCap, and other deletions) whereas the parameters for O2 and the transition between the two open states were different between constructs. The O1 state is not observed in WT channels and is hypothesized to be associated with voltage sensor activation. O2 represents the open state that is normally observed in WT channels and is speculated to be associated with conformational changes within the cytoplasmic gating ring that follow voltage sensor activation, which could explain why the mutations and deletions disrupting cytoplasmic interactions affect primarily O2. 
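      To make the idea of a biphasic G-V relation with two open states concrete, the following is a minimal sketch. It is not the equation used in the manuscript: it assumes one Boltzmann term for opening and a second Boltzmann term for the shift of occupancy from O1 to O2, and all parameter names and values are arbitrary illustrations that are not fitted to any data.

```python
import math


def boltzmann(v, v_half, slope):
    """Sigmoidal voltage dependence rising from 0 to 1 around v_half (mV)."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / slope))


def biphasic_gv(v, g1, g2, vh_open, k_open, vh_trans, k_trans):
    """Illustrative two-open-state conductance-voltage relation.

    p_open: fraction of channels that have opened at voltage v
    p_o2:   fraction of open channels that have moved on from O1 to O2
    g1, g2: relative macroscopic conductance weights of O1 and O2
    """
    p_open = boltzmann(v, vh_open, k_open)
    p_o2 = boltzmann(v, vh_trans, k_trans)
    return p_open * (g1 * (1.0 - p_o2) + g2 * p_o2)


# Arbitrary parameters: opening centred at -80 mV, O1 -> O2 shift centred at +20 mV.
params = dict(g1=1.0, g2=0.6, vh_open=-80.0, k_open=10.0, vh_trans=20.0, k_trans=10.0)
for voltage in (-200.0, -30.0, 120.0):
    print(voltage, round(biphasic_gv(voltage, **params), 4))
```

      With these placeholder values the conductance first rises towards the O1 weight at negative voltages and then settles at the O2 weight at strong depolarization, which is the qualitative two-component shape described above.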

      Severing the covalent link between the voltage sensor and pore reduced O1 occupancy in one of the deletion constructs. Although this observation is consistent with the hypothesis that voltage-sensor activation drives entry into O1, this result is not conclusive. Structural as well as functional data has established that the coupling of the voltage sensor and pore does not entirely rely on the S4-S5 covalent linker between the sensor and the pore, and thus the severed construct could still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both states O1 and O2 require voltage sensor activation, it is unclear why the severed construct would affect state O1 primarily, as suggested in the manuscript, as opposed to decreasing occupancy of both open states. In line with this argument, the presence of Mg2+ in the extracellular solution affected both O1 and O2. This finding suggests that entry into both O1 and O2 requires voltage-sensor activation because Mg2+ ions are known to stabilize the voltage sensor in its most deactivated conformations. 

      We agree with the reviewer that access to both states requires a conformational change in the voltage sensor. This was stated in our revised article: “In contrast, to enter O2, all subunits must complete both voltage sensor transitions and the collective gating ring transition.” We interpret the two gating steps as sequential; the effective rotation of the intracellular ring would happen only once the sensor is in its fully activated position.

      We also agree that the S4-S5 segment cannot be the only interaction mechanism, as we demonstrated in our earlier work (Lörinczi et al., 2015; Tomczak et al., 2017).  

      Activation towards and closure from O1 is slow, whereas channels close rapidly from O2. A rapid alternating pulse protocol was used to take advantage of the difference in activation and deactivation kinetics between the two open components in the mutants and thus drive an increasing number of channels towards state O1. Currents activated by the alternating protocol reached larger amplitudes than those elicited by a long depolarization to the same voltage. This finding is interpreted as an indication that O1 has a larger macroscopic conductance than O2. In the revised manuscript, the authors performed single-channel recordings to determine why O1 and O2 have different macroscopic conductance. The results show that at voltages where the state O1 predominates, channels exhibited longer open times and overall higher open probability, whereas at more depolarized voltages where occupancy of O2 increases, channels exhibited more flickery gating behavior and decreased open probability. These results are informative but not conclusive because additional details about how experiments were conducted, and group data analysis are missing. Importantly, results showing inhibition of single ΔPASCap channels by a Kv10-specific inhibitor are mentioned but not shown or quantitated - these data are essential to establish that the new O1 conductance indeed represents Kv10 channel activity.

      We observed the activity of a channel compatible with Kv10.1 ΔPAS-Cap (long openings at low-moderate potentials, very short flickery activity at strong depolarizations) in 12 patches from oocytes obtained from different frog operations over a period of two and a half months once the experimental conditions could be established. As stated in the text, we did not proceed to generate amplitude histograms because we could not resolve clear single-channel events at strong depolarizations. Astemizole abolished the activity and (remarkably) strongly reduced the noise in traces at strong depolarizations, which we interpret as partially caused by flicker openings.

      Author response image 1.

      We include two example recordings of Astemizole application (100µM) on two different patches. Both recordings are performed at -60 mV (to decrease the likelihood that the channel visits O2) with 100 mM internal and 60 mM external K+. In both cases, the traces in Astemizole are presented in red.

      It is shown that conditioning pulses to very negative voltages result in mutant channel currents that are larger and activate more slowly than those elicited at the same voltage but starting from less negative conditioning pulses. In voltage-activated curves, O1 occupancy is shown to be favored by increasingly negative conditioning voltages. This is interpreted as indicating that O1 is primarily accessed from deeply closed states in which voltage sensors are in their most deactivated position. Consistently, a mutation that destabilizes these deactivated states is shown to largely suppress the first component in voltage-activation curves for both ΔPASCap and E600R channels.

      The authors then address the role of the hidden O1 state in channel regulation by calmodulin. Stimulating calcium entry into oocytes with ionomycin and thapsigargin, assumed to enhance CaM-dependent modulation, resulted in preferential potentiation of the first component in ΔPASCap and E600R channels. This potentiation was attenuated by including an additional mutation that disfavors deeply closed states. Together, these results are interpreted as an indication that calcium-CaM preferentially stabilizes deeply closed states from which O1 can be readily accessed in mutant channels, thus favoring current activation. In WT channels lacking a conducting O1 state, CaM stabilizes deeply closed states and is therefore inhibitory. It is found that the potentiation of ΔPASCap and E600R by CaM is more strongly attenuated by mutations in the channel that are assumed to disrupt interaction with the C-terminal lobe of CaM than mutations assumed to affect interaction with the N-terminal lobe. These results are intriguing but difficult to interpret in mechanistic terms. The strong effect that calcium-CaM had on the occupancy of the O1 state in the mutants raises the possibility that O1 can be only observed in channels that are constitutively associated with CaM. To address this, a biochemical pull-down assay was carried out to establish that only a small fraction of channels are associated with CaM under baseline conditions. These CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach utilized to activate CaM is indirect and could result in additional nonspecific effects on the oocytes that could affect the results.

      Finally, a mathematical model is proposed consisting of two layers involving two activation steps for the voltage sensor, and one conformational change in the cytoplasmic gating ring - completion of both sets of conformational changes is required to access state O2, but accessing state O1 only requires completion of the first voltage-sensor activation step in the four subunits. The model qualitatively reproduces most major findings on the mutants. Although the model used is highly symmetric and appears simple, the mathematical form used for the rate constants in the model adds a layer of complexity to the model that makes mechanistic interpretations difficult. In addition, many transitions that from a mechanistic standpoint should not depend on voltage were assigned a voltage dependence in the model. These limitations diminish the overall usefulness of the model which is prominently presented in the manuscript. The most important mechanistic assumptions in the model are not addressed experimentally, such as the proposition that entry into O1 depends on the opening of the transmembrane pore gate, whereas entry into O2 involves gating ring transitions - it is unclear why O2 would require further gating ring transitions to conduct ions given that the gating ring can already support permeation by O1 without any additional conformational changes.

      In essence, we agree with the reviewer; we have already addressed these points in our revised article:

      Regarding the voltage dependence, we write: “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search.” As you can see, we also formulated a strategy to free the model of the potentially spurious voltage dependence and (in bold here) explained why we did not follow this route in this study.

      Regarding the need for gating ring transitions after O1, we wrote, “Thus, the underlying gating events can be separated into two steps: The first gating step involves only the voltage sensor without engaging the ring and leads to a pre-open state, which is non-conducting in the WT but conducting in our mutants. The second gating event operates at higher depolarizations, involves a change in the ring, and leads to an open state both in WT and in the mutants. ” 

      We interpret your statements to mean that you expect the conducting state to remain available once O1 is reached. However, the experimental evidence argues against the pore remaining available regardless of the gating steps that follow O1. The description of model construction is informative here: “... we could exclude many possible [sites at which O1 connects to closed states] because the attachment site must be sufficiently far away from the conventional open state [O2]. Otherwise, the transition from "O1 preferred" to "O2 preferred" via a few closed intermediate states is very gradual and never produces the biphasic GV curves [that we observed].”

      In other words, voltage-dependent gating steps beyond the state that offers access to O1 appear to close the pore after it was open. That might occur because only then (for states in which at least one voltage sensor has exceeded the intermediate position) the ring is fixed in a particular state until all sensors have completed activation. In the WT, closing the pore in deactivated states might rely on an interaction that is absent in the mutant because, at least in HERG: “the interaction between the PAS domain and the C-terminus is more stable in closed than in open KV11.1 (HERG) channels, and a single chain antibody binding to the interface between PAS domain and CNBHD can access its epitope in open but not in closed channels, strongly supporting a change in conformation of the ring during gating.”

      Reviewer #3 (Public Review):

      In the present manuscript, Abdelaziz and colleagues interrogate the gating mechanisms of Kv10.1, an important voltage-gated K+ channel in cell cycle and cancer physiology. At the molecular level, Kv10.1 is regulated by voltage and Ca-CaM. Structures solved using CryoEM for Kv10.1 as well as other members of the KCNH family (Kv11 and Kv12) show channels that do not contain a structured S4-S5 linker, which imposes a non-domain-swapped architecture in the transmembrane region. However, the cytoplasmic N- and C-terminal domains interact in a domain-swapped manner, forming a gating ring. The N-terminal domain (PAS domain) of one subunit is located close to the intracellular side of the voltage sensor domain and interacts with the C-terminal domain (CNBHD domain) of the neighbor subunit. Mutations in the intracellular domains have a profound effect on channel gating. The complex network of interactions between the voltage sensor and the intracellular domains makes the PAS domain a particularly interesting domain of the channel to study as responsible for the coupling between the voltage sensor domains and the intracellular gating ring.

      The coupling between the voltage-sensor domain and the gating ring is not fully understood and the authors aim to shed light on the details of this mechanism. In order to do that, they use well-established techniques such as site-directed mutagenesis, electrophysiology, biochemistry and mathematical modeling. In the present work, the authors propose a two-open-state model that arises from functional experiments after introducing a deletion in the PAS domain (ΔPAS Cap) or a point mutation (E600R) in the CNBHD domain. The authors measure a bi-phasic G-V curve with these mutations and assign each phase to a different open state, one of them not visible in the WT and only unveiled after introducing the mutations.

      The hypothesis proposed by the authors could change the current paradigm in the current understanding for Kv10.1 and it is quite extraordinary; therefore, it requires extraordinary evidence to support it.

      STRENGTHS: The authors use adequate techniques such as electrophysiology and site-directed mutagenesis to address the gating changes introduced by the molecular manipulations. They also use appropriate mathematical modeling to build a Markov model and identify the mechanism behind the gating changes.

      WEAKNESSES: The results presented by the authors do not fully support their conclusions since they could have alternative explanations. The authors base their primary hypothesis on the bi-phasic behavior of a calculated G-V curve that does not match the tail behavior. The experimental conditions used in the present manuscript introduce uncertainties, weakening their conclusions and complicating the interpretation of the results. Therefore, their experimental conditions need to be revisited.

      We respectfully disagree. We think that your suggestions for alternative explanations are addressed in the current version of the article. We will rebut them once more below, but we feel the need to point out that our arguments are already laid out in the revised article.

      I have some concerns related to the following points:

      (1) Biphasic gating behavior

      The authors use the TEVC technique in oocytes extracted surgically from Xenopus laevis frogs. The method is well established and is adequate to address ion channel behavior. The experiments are performed in chloride-based solutions, which present a handicap when measuring outward rectifying currents at very depolarizing potentials due to the presence of calcium-activated chloride channels expressed endogenously in the oocytes; these channels will open and rectify chloride intracellularly, adding to the outward rectifying traces during the test pulse. The authors calculate their G-V curves from the test pulse steady-state current instead of using the tail currents. The conductance measurements are normally taken from the 'tail current' because tails are measured at a fixed voltage, hence maintaining the driving force constant.

      We respectfully disagree. In contrast to other channels, like HERG, a common practice for Kv10 is not to use tail currents. It has long been known that in this channel, tail currents and test-pulse steady-state currents can appear to be at odds because the channels deactivate extremely rapidly, at the border of the temporal resolution of the measurements and with intricate waveforms. This complicates the estimation of the instantaneous tail current. Therefore, the outward current is commonly used to estimate conductance (Terlau et al., 1996; Schönherr et al., 1999; Schönherr et al., 2002; Whicher and MacKinnon, 2019), while the latter authors also use the extreme of the tail for some mutants.

      Due to their activation at very negative voltage, the reversal potential in our mutants can be measured directly; we are, therefore, more confident with this approach. Nevertheless, we have determined the initial tail current in some experiments. The behavior of these is very similar to the average that we present in Figure 1. The biphasic behavior is unequivocally present.
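
For readers less familiar with the two approaches, the test-pulse estimate relies on the chord relation G = I / (V - E_rev), which is why a directly measured reversal potential matters. The sketch below shows that calculation and the subsequent normalization. The function names, the example currents, and the reversal potential are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
def chord_conductance(current_ua, voltage_mv, e_rev_mv):
    """Chord conductance G = I / (V - E_rev).

    With current in microamperes and voltages in millivolts,
    the result is in millisiemens.
    """
    driving_force = voltage_mv - e_rev_mv
    if abs(driving_force) < 1e-9:
        raise ValueError("test voltage equals E_rev; conductance is undefined")
    return current_ua / driving_force

def normalized_gv(currents_ua, voltages_mv, e_rev_mv):
    """Convert test-pulse currents to conductances normalized to their maximum."""
    g = [chord_conductance(i, v, e_rev_mv)
         for i, v in zip(currents_ua, voltages_mv)]
    g_max = max(g)
    return [x / g_max for x in g]

# Made-up example: two test voltages and an assumed E_rev of -100 mV.
example = normalized_gv([1.0, 8.0], [-50.0, 100.0], e_rev_mv=-100.0)
```

Because the result is divided by the largest conductance in the data set, a component that keeps growing at the most depolarized voltages changes the apparent relative size of the earlier component, which is the normalization concern raised later in this exchange.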

      Author response image 2.

      Calculating the conductance from the traces should not be a problem; however, in the present manuscript, the traces and the tail currents do not agree.

      The referee’s observation is perfectly in line with the long-standing experience of several labs working with KV10: tail current amplitudes in KV10 appear to be out of proportion for the WT open state (O2). Importantly, this is due to the rapid closure, which is not present in O1. As a consequence, the initial amplitude of tail currents from O1 is easier to estimate correctly, and these tails are much more obvious in the graphs. Taken together, these differences between O1 and O2 explain the misconception the reviewer describes next.

      The tail traces shown in Fig. 1E do not show an increasing current amplitude in the voltage range from +50 mV to +120 mV; they seem to have reached a 'saturation state', suggesting that the traces from the test pulse contain an inward chloride current contamination.

      As stated in the text and indicated in Author response image 3, the tail currents in Figure 1E increase in amplitude between +50 and +120 mV, as can be seen in the examples below from different experiments (+50 is presented in black, +120 in red). As stated above, the increase is not as evident as in traces from other mutants because the predominance of O2 also implies a much faster deactivation.

      Author response image 3. 

      We are aware that Ca2+-activated Cl- currents can represent a problem when interpreting electrophysiological data in oocytes. In fact, we show in Supplement 1 to Figure 8 that this can be the case during the Ca2+-CaM experiments, where the increase in Ca2+ would certainly augment Cl- contribution to the outward current. This is why we performed these experiments in Cl--free solutions. As we show in Figure 8, the biphasic behavior was also present in those experiments. 

      Importantly, Cl--free bath solutions would not correct contamination during the tail, since this would correspond to Cl- exiting the oocyte. Yet, if the outward currents were contaminated by Cl-, one would expect the contamination to increase with larger depolarizations, as the typical Ca2+-activated Cl- current in oocytes does. As the reviewer states, this does not seem to be the case.

      In addition, this second component identified by the authors as a second open state appears after +50mV and seems to never saturate. The normalization to the maximum current level during the test pulse, exaggerates this second component on the calculated G-V curve. 

      We agree that this second component continues to increase; the reviewer brought this up in the first review, and we have already addressed this in our reply and in the discussion of the revised version: “This flicker block might also offer an explanation for a feature of the mutant channels, that is not explained in the current model version: the continued increase in current amplitude, hundreds of milliseconds into a strong depolarization (Supp. 4 to Fig. 9). If the relative stability of O2 and C2 continued to change throughout depolarization, such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On ↔Cn states, or a non-Markovian modification of the model’s time evolution.” With non-Markovian, we mean a Langevin-type diffusive process. 

      It's worth noticing that the ΔPASCap mutant experiments on Fig 5 in Mes based solutions do not show that second component on the G-V.

      For the readers of this conversation, we would like to clarify that the reviewer likely refers to experiments shown in Fig. 5 of the initial submission but shown in Fig. 6 of the revised version (“Hyperpolarization promotes access to a large conductance, slowly activating open state.” Fig. 5 deals with single channels). We agree that these data look different, but this is because the voltage protocols are completely different (compare Fig. 6A (fixed test pulse, varied pre-pulse) and Fig. 2A (varied test pulse, fixed pre-pulse)). Therefore, no biphasic behavior is expected.

      Because these results are the foundation for their two-open-state hypothesis, I strongly suggest that the authors repeat all their chloride-based experiments in Mes-based solutions to eliminate the undesired chloride contribution to the mutants' currents and clarify the contribution of the mutations to Kv10.1 gating.

      In summary, we respectfully disagree with all concerns raised in point (1). Our detailed arguments rebutting them are given above, but there is a more high-level concern about this entire exchange: the referee casts doubt on observations that are not new. Several labs have reported, for a group of mutant KCNH channels, non-monotonic voltage dependence of activation (see, e.g., Fig. 6D in Zhao et al., 2017), multi-phasic tail currents (see, e.g., Fig. 4A in Whicher and MacKinnon, 2019, in CHO cells where Cl- contamination is not a concern), and activation by high [Ca2+]i (Lörinczi et al., 2016). Our study replicates those observations and hypothesizes that the existence of an additional conducting state can alone explain all previously unexplained observations. We highlight the potency of this hypothesis with a Markov model that qualitatively reproduces all phenomena. We not only factually disagree with the individual points raised, but we also think that they don't touch on the core of our contribution.

      (2) Two step gating mechanism.

      The authors interpret the results obtained with the ΔPASCap and the E600R as two step gating mechanisms containing two open states (O1 and O2) and assign them to the voltage sensor movement and gating ring rotation respectively. It is not clear, however how the authors assign the two open states.

      The results show how the first component is conserved amongst mutations; however, the second one is not. The authors attribute the second component, hence the second open state, to the movement of the gating ring. This scenario seems unlikely since there is a clear voltage dependence of the second component that will suggest an implication of a voltage-sensing current.

      We do not suggest that the gating ring motion is not voltage dependent. We would like to point out that voltage dependence can be conveyed by voltage sensor coupling to the ring; this is the widely accepted theory of how the ring can be involved. Should the reviewer mean it in a narrow sense, that the model should be constructed such that all voltage-dependent steps occur before and independently of ring reconfiguration, and that only then an additional step follows that solely reflects the (voltage-independent) reconfiguration, we would like to point the reviewer to the article, where we write: “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search.” As you can see, we also formulated a strategy to free the model from the potentially spurious voltage dependence and (in bold here) explained why we did not follow this route in this study.

      The split channel experiment is interesting but needs more explanation. I assume the authors expressed the 2 parts of the split channel (1-341 and 342-end), however Tomczak et al showed in 2017 how the split presents a constitutively activated function with inward currents that are not visible here, this point needs clarification.

      As stated in the panel heading, the figure legend, and the main text, we did not use 1-341 and 342-end as done in Tomczak et al. Instead, “we compared the behavior of ∆2-10 and ∆2-10.L341Split”. Evidently, the additional deletion (2-10) causes a shift in activation that explains the difference you point out. However, as we do not compare L341Split and ∆2-10.L341Split but ∆2-10 and ∆2-10.L341Split, our conclusion remains that “As predicted, compared to ∆2-10, ∆2-10.L341Split showed a significant reduction in the first component of the biphasic GV (Fig. 2C, D).” Remarkably, the behavior of the ∆3-9 L341Split described in Whicher and MacKinnon, 2019 (Figure 5) matches that of our ∆2-10 L341Split, which we think reinforces our case.

      Moreover, the authors assume that the mutations introduced uncover a new open state; however, the traces presented for the mutations suggest that other explanations are possible. Other gating mechanisms, like inactivation from the closed state, can be introduced by the mutations. The traces presented for ΔPASCap but especially E600R present clear 'hooked tails', a direct indicator of a population of inactive channels during the test pulse that recover from inactivation upon repolarization (Tristani-Firouzi M, Sanguinetti MC. J Physiol. 1998).

      There is a possibility that we are debating nomenclature here. In response to the suggestion that all our observations could be explained by inactivation, we attempted a disambiguation of terms in the reply and the article. As the argument is brought up again without reference to our clarification attempts, we will try to be more explicit here:

      If, starting from deeply deactivated states, an open state is reached first, and then, following further activation steps, closed states are reached, this might be termed “inactivation”. In such a reading, our model features many inactivated states. The shortest version of such a model is C-O-I. It is for instance used by Raman and Bean (2001; DOI: 10.1016/S0006-3495(01)76052-3) to explain NaV gating in Purkinje neurons. If “inactivation” is meant in the sense that a gating transition exists, which is orthogonal to an activation/deactivation axis, and that after this orthogonal transition, an open state cannot be reached anymore, then all of the upper floor in our model is inactivated with respect to the open state O1. Finally, the state C2 is an inactivated state to O2. In this view, “inactivation” explains the observed phenomena.

      However, we must disagree if the referee means that a parsimonious explanation exists in which a single conducting state is the only source for all observed currents.   

      There is a high-level reason: we found a single assumption that explains three different phenomena, while the inactivation hypothesis with one conducting state cannot explain one of them (the increase of the first component under raised CaM). But there is also a low-level reason: the tails in Tristani-Firouzi and Sanguinetti 1998 are fundamentally different from what we report herein in that they lack a third component. Thus, those tails are consistent with recovery from inactivation through a single open state, while a three-component tail is not. In the framework of a Markov model, the time constants of transitions from and to a given state (say O2), cannot change unless the voltage changes. During the tail current, the voltage does not change, yet we observe: 

      i) a rapid decrease with a time constant of at most a few milliseconds (Fig 9 S2, 1-> 2),  ii) a slow increase in current, peaking after approximately 25 milliseconds and iii) a relaxation to zero current with a time constant of >50 ms. 

      According to the reviewer’s suggestion, these processes on three timescales should all be explained by depopulating and repopulating the same open state while all rates are constant. There might well be a complicated multi-level state diagram with a single open state in different variants (such as open and open-inactivated) that could produce triphasic tails with these properties if the system had not reached a steady-state distribution at the end of the test pulse. It cannot, however, achieve this from an equilibrated system, and it certainly cannot at the same time produce “biphasic activation” and “activation by CaM”.
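
The constraint invoked here, that all rates are constant while the tail voltage is constant, can be illustrated with a small numerical sketch. The five-state scheme below is a hypothetical toy (it is not the model of the manuscript), the rate constants are arbitrary assumptions, and both open states are assigned the same conductance. It only shows that, with two open states and fixed rates, the summed open probability can fall rapidly, rise slowly, and then relax to zero.

```python
# Hypothetical scheme at a fixed tail voltage (rates in 1/s, all assumed):
#   Cdeep <- C0 <-> O1      and      C0 <- C1 <- O2
STATES = ["Cdeep", "C0", "O1", "C1", "O2"]
RATES = {
    ("O2", "C1"): 1000.0,   # fast closure from O2
    ("C1", "C0"): 40.0,     # return towards the state that gives access to O1
    ("C0", "O1"): 200.0,    # opening into O1
    ("O1", "C0"): 50.0,     # slow closure from O1
    ("C0", "Cdeep"): 20.0,  # final deactivation
}

def simulate_tail(t_end=1.0, dt=2e-5):
    """Forward-Euler integration of state occupancies with constant rates.

    Starts with all channels in O2 (an assumed end-of-test-pulse condition)
    and returns time points and the summed occupancy of O1 and O2.
    """
    p = {s: 0.0 for s in STATES}
    p["O2"] = 1.0
    times = [0.0]
    open_prob = [p["O1"] + p["O2"]]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        flux = {s: 0.0 for s in STATES}
        for (src, dst), k in RATES.items():
            moved = k * p[src] * dt      # probability moved in this time step
            flux[src] -= moved
            flux[dst] += moved
        for s in STATES:
            p[s] += flux[s]
        times.append(step * dt)
        open_prob.append(p["O1"] + p["O2"])
    return times, open_prob

times, open_prob = simulate_tail()
```

In this toy scheme the early drop reflects closure from O2, the delayed rise reflects passage through the closed state that gives access to O1, and the final decay reflects deactivation into the deep closed state. Whether a given single-open-state scheme could do the same from an equilibrated starting distribution is the point under debate and is not tested by this sketch.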

      The results presented by the authors can be alternatively explained with a change in the equilibrium between the close to inactivated/recovery from inactivation to the open state. 

      Again, we disagree. The model construction explains in detail that the transition from the first to the second phase is not gradual. Shifting equilibria cannot reproduce this. We have extensively tested that idea and can exclude this possibility.

      Finally, the authors state that they do not detect "cumulative inactivation after repeated depolarization", but that considers inactivation only from the open state and ignores the possibility of closed-state inactivation or that, as in hERG, the channel inactivates faster than it activates (Smith PL, Yellen G. J Gen Physiol. 2002).

      We respectfully disagree. We explicitly model an open state that inactivates faster (O2->C2) than it activates. Once more, this is stated in the revised article, which we point to for details. Again, this alternative mechanism does not have the potential to explain all three effects. As discussed above about the chloride contamination concerns, this inactivation hypothesis was mentioned in the first review round and, therefore, addressed in our reply and the revised article. We also explained that “inactivation” has no specific meaning in Markov models. In the absence of O1, all transitions towards the lower layer are effectively “inactivation from closed states”, because they make access to the only remaining open state less likely”. But this is semantics. What is relevant is that no network of states around a single open state can reproduce the three effects in a more parsimonious way than the assumption of the second open state does.

      (3) Single channel conductance.

      The single channels experiments are a great way to assess the different conductance of single channel openings, unfortunately the authors cannot measure accurately different conductances for the two proposed open states. The Markov Model built by the authors, disagrees with their interpretation of the experimental results assigning the exact same conductance to the two modeled open states. To interpret the mutant data, it is needed to add data with the WT for comparison and in presence of specific blockers. 

      We respectfully disagree. As previously shown, the conductance of the flickering wild-type open state is very difficult to resolve. Our recordings do not show that the two states have different single-channel conductances, and therefore the model assumes identical single-channel conductance.

      The important point is that the single-channel recordings clearly show two different gating modes associated with the voltage ranges in which we predict the two open states. One has a smaller macroscopic current due to rapid flickering (aka “inactivation”). These recordings are another proof of the existence of two open states because the two gating modes occur.  Wild-type data can be found in Bauer and Schwarz, (2001, doi:10.1007/s00232-001-0031-3) or Pardo et al., (1998, doi:10.1083/jcb.143.3.767) for comparison.

      We appreciate the effort editors and reviewers invested in assessing the revised manuscript. Yet, we think that the demanded revision of experimental conditions and quantification methods contradicts the commonly accepted practice for KV10 channels. Some of the reviewer comments are skeptical about the biphasic behavior, which is an established and replicated finding for many mutants and by many researchers. The alternative explanations for these disbelieved findings are either “semantics” or cannot quantitatively explain the measurements. Therefore, only the demand for more explanations and unprecedented resolution in single-channel recordings remains. We share these sentiments.

      ———— The following is the authors’ response to the original reviews.

      (1) The authors must show that the second open state is not just an artifact of endogenous activity but represents the activity of the same EAG channels. I suggest that the authors repeat these experiments in Mes-based solutions. 

      (2) Along the same lines, it is necessary to show that these currents can be blocked using known EAG channel blockers such as astemizole. Ultimately, it will be important to demonstrate using single-channel analysis that these do represent two distinct open states separated by a closed state. 

      We have addressed these concerns using several approaches. The most substantial change is the addition of single-channel recordings on ΔPASCap. In those experiments, we could provide evidence of the two types of events in the same patch, and the presence of an outward current at -60 mV, 50 mV below the equilibrium potential for chloride. The channels were never detected in uninjected oocytes, and astemizole silenced the activity in patches containing multiple channels. These observations, together with the maintenance of the biphasic behavior that we interpret as evidence of the presence of O1 in methanesulfonate-based solutions, strongly suggest that both O1 and O2 arise from the expression of KV10.1 mutants.

      (3) Currents should be measured by increasing the pulse lengths as needed in order to obtain the true steady-state G-V curves. 

      We agree that the endpoint of activation is ill-defined in the cases where a steady-state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data, therefore, supports the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      (4) A more clear and thorough description should be provided for how the observations with the mutant channels apply to the behavior of WT channels. How exactly does state O1 relate to WT behavior, and how exactly do the parameters of the mathematical model differ between WT and mutants? How can this be interpreted at a structural level? What could be the structural mechanism through which ΔPASCap and E600R enable conduction through O1? It seems contradictory that O1 would be associated exclusively with voltage-sensor activation and not gating ring transitions, and yet the mutations that enable cation access through O1 localize at the gating ring - this needs to be better clarified. 

      We have undertaken a thorough rewriting of all sections to clarify the structural correlates that may explain the behavior of the mutants. In brief, we propose that when all four voltage sensors move towards the extracellular side, the intracellular ring maintains the permeation path closed until it rotates. If the ring is altered, this “lock” is incompetent, and permeation can be detected (page 34). By fixing the position of the ring, calmodulin would preclude permeation in the WT and promote the population of O1 in the mutants.

      (5) Rather than the t80% risetime, exponential fits should be performed to assess the kinetics of activation. 

      We agree that the assessment of kinetics by a t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). We had planned to perform exponential fits in this revised version, but because the activation process is not exponential, the time constants we could provide would not be accurate, and the result would remain qualitative as it is now. In the experiments where we did perform the fits (Fig. 3), the values obtained support the statement made. 
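
      As an illustration only (this is not the analysis code used for the manuscript), a t80% rise time can be read directly from a trace without assuming any kinetic model. The sketch below uses a hypothetical sigmoidal activation time course, (1 - exp(-t/tau))^4 with an assumed tau of 50 ms, and takes the last sample as the final current level. The function name `rise_time` and all numerical values are assumptions made for the example.

```python
import math


def rise_time(times, current, fraction=0.8):
    """Return the first time at which `current` reaches `fraction` of its
    last sample, using linear interpolation between neighbouring samples."""
    target = fraction * current[-1]
    for i in range(1, len(current)):
        if current[i - 1] < target <= current[i]:
            span = current[i] - current[i - 1]
            dt = times[i] - times[i - 1]
            return times[i - 1] + (target - current[i - 1]) / span * dt
    raise ValueError("trace never crosses the requested fraction")


# Hypothetical, non-exponential (sigmoidal) activation sampled every 0.1 ms.
tau_ms = 50.0  # assumed value, for illustration only
times = [0.1 * k for k in range(10001)]  # 0 to 1000 ms
trace = [(1.0 - math.exp(-t / tau_ms)) ** 4 for t in times]

t80 = rise_time(times, trace)
print(f"t80% of the sigmoidal trace: {t80:.1f} ms")
```

      For a purely exponential rise, the same function returns approximately tau·ln(5), so the two measures carry the same information in that case; for a sigmoidal rise, a single time constant does not describe the trace, whereas t80% remains well defined.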

      (6) It is argued based on the G-V relations in Figure 2A that none of the mutations or deletions introduced have a major effect on state O1 properties, but rather affect state O2. However, the occupancy of state O2 is undetermined because activation curves do not reach saturation. It would be interesting to explore the fitting parameters on Fig.2B further to test whether the data on Fig 2A can indeed only be described by fits in which the parameters for O1 remain unchanged between constructs. 

      We agree that the absolute occupancy of O2 cannot be properly determined if a steady state is not reached. This is, however, a feature of the channel. During very long depolarizations in WT, the current visually appears to reach a plateau, but a closer look reveals that the current keeps increasing after very long depolarizations (up to 10 seconds; see, e.g., Fig. 1B in Garg et al., 2013, Mol Pharmacol 83, 805-813. DOI: 10.1124/mol.112.084384). Interestingly, although the model presented here does not account for this behavior, we propose changes in the model that could. “If the relative stability of O2 and C2 continued to change throughout the depolarization such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On↔Cn states or a non-Markovian modification of the model’s evolution.” Page 34.

      (7) The authors interpret the results obtained with the mutants DPASCAP and E600R -tested before by Lorinczi et al. 2016, to disrupt the interactions between the PASCap and cNBHD domains- as a two-step gating mechanism with two open states. All the results obtained with the E600R mutant and DPASCap could also be explained by inactivation/recovery from inactivation behavior and a change in the equilibrium between the closed states closed/inactivated states and open states. Moreover, the small tails between +90 to +120 mV suggest channels accumulate in an inactive state (Fig 1E). It is not convincing that the two open-state model is the mechanism underlying the mutant's behavior.  

      We respectfully disagree with the notion that a single open state can provide a plausible explanation for "All the results obtained with the E600R mutant and DPASCap". We think that our new single channel results settle the question, but even without this direct evidence, a quantitative assessment of the triphasic tail currents all but excludes the possibility of a single open state. We agree that it is, in principle, possible to obtain some form of a multiphasic tail with a single open state using the scheme suggested in this comment: at the end of the test pulse, a large fraction of the channels must be accumulated in inactive states, and a few are in the open state. The hyperpolarization to -100mV then induces a rapid depopulation of the open state, followed by slower replenishments from the inactive state. Exactly this process occurs in our model, when C2 empties through O2 (Supp. 5 to Fig 9, E600R model variant). However, this alone is highly unlikely to quantitatively explain the measured tail currents, because of the drastically different time scales of the initial current decay (submillisecond to at most a few milliseconds lifetime) and the much slower transient increase in current (several tens of milliseconds) and the final decay with time constants of >100 ms (see for instance data in Fig. 1 E for E600R +50 to +120mV test pulse). To sustain the substantial magnitude of slowly decaying current by slow replenishment of an open state with a lifetime of 1 ms requires vast amounts of inactivated channels. A rough estimation based on the current integral of the initial decay and the current integral of the slowly decaying current suggests that at the end of the test pulse, the ratio inactivated/open channels would have to be 500 to 1500 for this mechanism to quantitatively explain the observed tail currents. 
      To put this in perspective: This would suggest that without inactivation all the expressed channels in an oocyte would provide 6 mA current during the +100 mV test pulse. While theoretically possible, we consider this a less likely explanation than a second open state.
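
      The logic of this rough estimation can be sketched as follows. This is one possible reading of the charge-integral argument, not the calculation performed for the manuscript: it assumes that both tail components decay exponentially, that every channel passes through the single open state once before closing, and that each passage carries the same charge. Under those assumptions, the ratio of inactivated to open channels at the end of the test pulse equals the ratio of the two current integrals. The amplitudes and time constants below are hypothetical placeholders chosen only to fall within the ranges quoted above.

```python
def exponential_charge(amplitude, tau_ms):
    """Integral of amplitude * exp(-t / tau) from t = 0 to infinity."""
    return amplitude * tau_ms


def inactivated_to_open_ratio(fast_amp, fast_tau_ms, slow_amp, slow_tau_ms):
    """Ratio of charge in the slow tail component to charge in the fast one.

    With one open state and equal charge per channel passage, this equals
    the inactivated/open channel ratio at the end of the test pulse.
    """
    q_fast = exponential_charge(fast_amp, fast_tau_ms)
    q_slow = exponential_charge(slow_amp, slow_tau_ms)
    return q_slow / q_fast


# Hypothetical values: fast decay with a 1 ms lifetime, slow decay with a
# 150 ms time constant and five times the amplitude (arbitrary units).
ratio = inactivated_to_open_ratio(
    fast_amp=1.0, fast_tau_ms=1.0, slow_amp=5.0, slow_tau_ms=150.0
)
print(f"inactivated/open ratio ~ {ratio:.0f}")
```

      Because the slow time constant exceeds the open-state lifetime by two orders of magnitude, even a slow component of modest amplitude implies a ratio in the hundreds.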

      (8) Different models should be evaluated to establish whether the results in Figure 4 can also be explained by a model in which states O1 and O2 have the same conductance. It would be desirable if the conductance of both states were experimentally determined - noise analysis could be applied to estimate the conductance of both states. 

      In the modified model, O1 and O2 have the same single-channel conductance. The small conductance combined with the fast flickering did not allow an accurate determination, but we can state that there is no evidence that the single-channel conductance of the states is different.

      (9) Although not included, it looks like the model predicts some "conventional inactivation" This can be appreciated in Fig 8, and in the traces at -60mV. Interestingly, the traces obtained in the absence of Cl- also undergo slow inactivation, or 'conventional inactivation' as referred to by the authors. Please revise the following statement "Conventional inactivation was never detected in any mutants after repeated or prolonged depolarization. In the absence of inactivation, the pre-pulse dependent current increase at +40 mV could be related to changes in the relative occupancy of the open states". 

      We have carefully edited the manuscript to address this concern. The use of the term inactivation admittedly represents a challenge. We agree that the state that results from the flickering block (C2) could be defined as “inactivated” because it is preceded by an open state. Yet, in that case, the intermediate states that the channel travels between O1 and O2 would also be sensu stricto “inactivated”, but only in the mutants. We have made this clear on page 17.

      Recommendations for improving the writing and presentation.

      (1) Methods section: Please state the reversal potential calculated for the solution used. It looks like the authors used an Instantaneous I-V curve method to calculate the reversal potential; if that's correct, please show the I-V and the traces together with the protocol used. 

      We have provided the calculated reversal potentials for excised patches. We cannot predict the reversal potential in whole oocytes because we have no control over the intracellular solution. The reversal potential was determined in the mutants through the current at the end of the stimulus because the mutants produced measurable inward currents. The differences in reversal potential were not significant among mutants.

      Pulse protocols have been added to the figures.

      (2) Figure 1 suggestion: Combine the two panels in panel D and move the F panel up so the figure gets aligned in the lower end.

      Thank you, this has been done.

      (3) Please clarify the rationale for using the E600R-specific mutant. I assume it is based on the Lorinczi et al. 2016 effect and how this is similar to the DPASCap phenotype, or is it due to the impact of this mutation on the interactions between the N-term and the cNBHD? 

      We have explained the rationale for the use of E600R explicitly on page 6.

      (4) Fig S1A is not present in the current version of the manuscript. Include a cartoon as well as a structural figure clearly depicting the perturbations introduced by E600R, ΔPASCap, and the other deletions that are tested. Additional structural information supporting the discussion would also be helpful to establish clearer mechanistic links between the experimental observations described here and the observed conformational changes between states in Kv10 channel structures. 

      We have corrected this omission, thank you for pointing it out.

      (5) It would be informative to see the traces corresponding to the I-V shown in Fig 7 A and B at the same indicated time points (0, 60, 150, and 300s). Did the authors monitor the Ca2+ signal rise after the I&T treatment to see if it coincides with the peak in the 60s? 

      In Figure 7 (now Figure 8) we used voltage ramps instead of discrete I-V protocols because of the long time required for recording the latter. This is stated on page 19. Ca2+ was monitored through Cl- current after ionomycin/thapsigargin. The duration of the Ca2+ increase was reproducible among oocytes and in good agreement with the changes observed in the biphasic behavior of the mutants (Supplement 1 to Figure 8).

      (6) Fig 4. Please state in the legend what the different color traces correspond to in E600R and DPASCap. Is there a reason to change the interpulse on DPASCap to -20mV and not allow this mutant to close? Please state. How do the authors decide the 10 ms interval for the experiments in Fig 2? 

      Thank you for pointing this out, we have added the description. We have explained why we use a different protocol for ΔPASCap and the reason for using 10 ms interval (we believe the referee means Figure 4) on page 12.  

      (7) Fig. 5. Since the pre-pulse is supposed to be 5s, but the time scale doesn't correspond with a pre-pulse of 5 s before the test pulse to +40mV. Has the pre-pulse been trimmed for representation purposes? If so, please state. 

      The pre-pulse was 5s, but as the reviewer correctly supposed, the trace is trimmed to keep the +40 mV stimulus visible. This has now been clearly stated in the legend.

      (8) The mutant L322H is located within the S4 helix according to the Kv10.1 structure (PDB 5K7L), not in the 'S3-S4 linker'; please correct. 

      This has been done, thank you.

      The introduction of this mutant should also shift the voltage dependence toward more hyperpolarizing potentials (around 30mV, according to Schoenherr et al. 1999). It looks like that shift is present within the first component of the G-V. Still, since the max amplitude from the second component could be contaminated by endogenous Cl- currents, this effect is minimized. Repeating these experiments in the no Cl- solutions will help clarify this point and see the effect of the DPASCap and E600R in the background of a mutation that accelerates the transitions between the closed states (see Major comment 1). Did the authors record L322H alone for control purposes? 

      We have decided not to measure L322H alone or repeat the measurements in Cl--free solutions because we do not see a way to use the quantitative assessment of the voltage dependence of L322H and the L322H-variants of the eag domain mutants. Like in our answer to main point 3, we base our arguments not on the precise voltage dependence of the second component but on the shape of the G-V curves instead, specifically the consistent appearance of the first component and the local conductance minimum between the first and second components. After the introduction of L322H the first component is essentially absent.

      We think that the measurements of the L322H mutants cannot be interpreted as a hyperpolarizing shift in the first component. The peak of the first conductance component occurs around -20 mV in ΔPASCap and E600R (Fig. 7 C, D). After a -30mV shift, in L322H+DPASCap and L322H+E600R, this first peak would still be detected within the voltage range in our experiments, but it is not. A contamination of the second component would have little impact on this observation, which is why we refrain from the suggested measurements.  

      (9) The authors differentiate between an O1 vs. O2 state with different conductances, and maybe I missed it, but there's no quantitative distinction between the components; how are they different?

      Please see the response to the main comments 1 and 2. This has been addressed in single-channel recordings.

      (10) Please state the voltage protocols, holding voltages, and the solutions (K+ concentration and Cl-presence/absence) used for the experiments presented in the legends on the figures. Hence, it's easier to interpret the experiments presented. 

      Thank you, this has been done.

      (11) The authors state on page 7 that "with further depolarizations, the conductance initially declined to rise again in response to strong depolarizations. This finding matches the changes in amplitude of the tail currents, which, therefore, probably reflect a true change in conductance" However, the tails in the strong voltage range (+50 to +120 mV) for the E600R mutant argue against this result. Please review.

      The increase in the amplitude of the tail current is also present in E600R, but the relative increase is smaller. We have decided against rescaling these traces because the Figure is already rather complex. We indicated this fact with a smaller arrow and clarified it in the text (page 8).

      (12) The authors mention that the threshold of activation for the WT is around -20mV; however, the foot of the G-V is more around -30 or -40mV. Please revise. 

      Thank you. We have done this. 

      (13) The authors state on page 9 that the 'second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions" However E600R mutant that conserves the N-terminal intact has a shift as pronounced as the DPASCap and larger than the D2-10. How do the authors interpret this result? 

      We have corrected this statement on page 10: “…the second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions and when the structure of the ring is altered through disruption of the interaction between N- and C-termini (E600R)”.

      (14) The equation defined to fit the G-Vs, can also be used to describe the WT currents. If the O1 is conserved and present in the WT, this equation should also fit the WT data properly. The 1-W component shown could also be interpreted as an inactivating component that, in the WT, shifts the voltage-dependence of activation towards depolarizing potentials and is not visible. Still, the mutants do show it as if the transition from closed-inactivated states is controlled by interactions in the gating ring, and disturbing them does affect the transitions to the open state. 

      Out of the two open states in the mutant, O2 is the one that shares properties with the WT (e.g. it is inaccessible during Ca2+-CaM binding) while O1 is the open state with the voltage dependence that is conserved across the mutants. We, therefore, believe that this question is based on a mix-up of the two open states. We appreciate the core of the question: does the pattern in the mutants’ G-V curves find a continuation in the WT channel? 

      Firstly, the component that is conserved among mutants does not lead to current in the WT because the corresponding open state (O1) is not observed in WT. However, the gating event represented by this component should also occur in WT and –given its apparent insensitivity to eag domain mutations–  this gating step should occur in WT with the same voltage dependence as in all the mutants. This means that this first component sets a hard boundary for the most hyperpolarized G-V curve we can expect in the WT, based on our mutant measurements. Secondly, the second component shows a regular progression across mutants: The more intact the eag domain is, the more hyperpolarized the Vhalf values of transition term (1-W) and O2 activation. In Δ2-10, the transition term already almost coincides with O1 activation (estimated Vhalf values of -33.57 and -33.47 mV). A further shift of (1-W) in the WT is implausible because, if O1 activation is coupled to the earliest VSD displacement, the transition should not occur before O1 activation. Still, the second component might shift to more hyperpolarized values in the WT, depending on the impact of amino acids 2 to 10 on the second VSD transition.

      In summary, in WT the G-V should not be more hyperpolarized than the first component of the mutants, and the (1-W)-component probably corresponds to the Δ2-10 (1-W)-component. In WT the second component should be no more depolarized than the second component of Δ2-10. The WT G-V (Fig. 1B) meets all these predictions derived from the pattern in the mutant G-Vs: When we use Eq. 4 to fit the WT G-V with A1 = 0 (O1 is not present in WT) and the parameters of the transition term (1-W) fixed to the values attained in Δ2-10, we obtain a fit for the O2 component with Vhalf = +21 mV. This value nicely falls into the succession of Vhalf values for Δeag, ΔPASCap, and Δ2-10 (+103 mV, +80 mV, +52 mV) and, at the same time, it is not more hyperpolarized than the conserved first component (Vhalf -34 mV). Our measurements therefore support that the O2 component in the mutants corresponds to the single open state in the WT. 

      (15) Page 15, the authors state that 'The changes in amplitude and kinetics in response to rising intracellular Ca2+ support our hypothesis that Ca-CaM stabilized O1, possibly by driving the channels to deep closed states (Fig 5 and 6)' (pg 15). This statement seems contradictory; I can't quite follow the rationale since Ca2+ potentiates the current (Fig 7), and the addition of the L322H mutant in Fig 7 makes the shift of the first component to negative potentials visible.

      Please check the rationale for this section. 

      We have explained this more explicitly in the discussion (page 32). “Because access to O1 occurs from deep closed states, this could be explained by an increased occupancy of such deactivated states in response to CaM binding. This appears to be the case since CaM induces a biphasic behavior in the mutant channels that show reduced access to deep closed states; thus, L322H mutants behave like the parental variants in the presence of Ca2+-CaM. This implies a mechanistic explanation for the effect of Ca2+-CaM on WT since favoring entry into deep closed states would result in a decrease in current amplitude in the absence of (a permeable) O1”.

      Also, Figs 5 and 6 seem miscited here. 

      Thank you, we have corrected this.

      (16) For Figure 5, it would be helpful if each of the current traces corresponding to a particular voltage had a different color. That way, it will be easier to see how the initial holding voltage modulates current. 

      We have considered this suggestion, and we agree that it would make it easier to follow. Yet, since we have identified the mutants with different colors, it would be inconsistent if we used another color palette for this Figure. Supplement 3 to Figure 9 shows the differences in a clearer way.

      (17) Add zero-current levels to all current traces.

      We have done this.

      (18) The mathematical model should be described better. Particularly, the states from which O1 can be accessed should be described more clearly, as well as whether the model considers any direct connectivity between states O1 and O2. The origin of the voltage-dependence for transitions that do not involve voltage-sensor movements should be discussed. Also, the separation of kappa into kappa-l and kappa-r should be described. 

      We have extensively rewritten the description of the mathematical model to address these concerns.

      (19) Page 4, "reveals a pre-open state in which the transmembrane regions of the channel are compatible with ion permeation, but is still a nonconducting state". Also, page 27, "renders a hydrophobic constriction wider than 8 Å, enough to allow K+ flow, but still corresponds to a non-conducting state". These sentences are confusing - how can the regions be compatible with ion permeation, and still not be conducting? Is cation conductance precluded by a change in the filter, or elsewhere? How is it established that it represents a non-conducting state? 

      We have rephrased to clarify this apparent inconsistency. Page 4: “(…) in which the transmembrane regions of the channel are compatible with ion permeation (the permeation path is dilated, like in open states) but the intracellular gate is still in the same conformation as in closed states (Zhang et al., 2023).” Page 31: “The presence of an intact intracellular ring would preclude ionic flow in the WT, and its alteration would explain the permeability of this state in the mutants.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses:  

      (1) The heatmaps (for example, Figure 3A, B) are challenging to read and interpret due to their size. Is there a way to alter the visualization to improve interpretability? Perhaps coloring the heatmap by general anatomical region could help? We feel that these heatmaps are critical to the utility of the registration strategy, and hence, clear visualization is necessary. 

      We thank the reviewers for this point on aesthetic improvement, and we agree that clearer visualization of our correlation heatmaps is important. To address this point, we have incorporated the capability of grouping “child” subregions in anatomical order by their more general “parent” region into the package function, plot_correlation_heatmaps(). Parent regions can now be plotted as smaller sub-facets in the heatmaps. We have also rearranged our figures to fit enlarged heatmaps in Figures 3-5 and Supplementary Figure 10 for easier visualization. 

      (2) Additional context in the Introduction on the use of immediate early genes to label ensembles of neurons that are specifically activated during the various behavioral manipulations would enable the manuscript and methodology to be better appreciated by a broad audience. 

      We thank the reviewers for this suggestion and have revised the first part of our Introduction to reflect the broader use and appeal of immediate early genes (IEGs) for studying neural changes underlying behavior.

      (3) The authors mention that their segmentation strategies are optimized for the particular staining pattern exhibited by each reporter and demonstrate that the manually annotated cell counts match the automated analysis. They mention that alternative strategies are compatible, but don't show this data. 

      We thank the reviewers for this comment. We also appreciate that integration with alternative strategies is a major point of interest to readers, given that others may be interested in compatibility with our analysis and software package, rather than completely revising their own pre-existing pipelines. 

      Generally, we have validated the ability to import datasets generated from completely different workflows for segmentation and registration. We have since released documentation on our package website with step-by-step instructions on how to do so (https://mjin1812.github.io/SMARTTR/articles/Part5.ImportingExternalDatasets). We believe this tutorial is a major entry point to taking advantage of our analysis package, without adopting our entire workflow.

      This specific point on segmentation refers to the import_segmentation_custom()function in the package. As there is currently not a standard cell segmentation export format adopted by the field, this function still requires some data wrangling into an import format saved as a .txt file. However, we chose not to visually demonstrate this capability in the paper for a few reasons.  

      i) A figure showing the broad testing of many different segmentation algorithms, (e.g., Cellpose, Vaa3d, Trainable Weka Segmentation) would better demonstrate the efficacy of segmentation of these alternative approaches, which have already been well-documented. However, demonstrating importation compatibility is more of a demonstration of API interface, which is better shown in website documentation and tutorial notebooks.

      ii) Additionally, showing importation with one well-established segmentation approach is still a demonstration of a single use case. There would be a major burden-of-proof in establishing importation compatibility with all potential alternative platforms, their specific export formats, which may be slightly different depending on post-processing choices, and the needs of the experimenters (e.g., exporting one versus many channels, having different naming conventions, having different export formats). For example, output from Cellpose can take the form of a NumPy file (_seg.npy file), a .png, or Native ImageJ ROI archive output, and users can have chosen up to four channels. Until the field adopts a standardized file format, one flexible enough to account for all the variables of experimental interest, we currently believe it is more efficient to advise external groups on how to transform their specific data to be compatible with our generic import function.  

      (4) The authors provided highly detailed information for their segmentation strategy, but the same level of detail was not provided for the registration algorithms. Additional details would help users achieve optimal alignment.

      We apologize for this lack of detail. The registration strategy depends upon the WholeBrain (Fürth et al., 2018) package for registration to the Allen Mouse Common Coordinate Framework. While this strategy has been published and documented elsewhere, we have substantially revised our methods section on the registration process to better incorporate details of this approach.

      (5) The authors illustrate registration to the Allen atlas. Can they comment on whether the algorithm is compatible with other atlases or with alternative sectioning planes (horizontal/sagittal)? 

      Since the current registration workflow integrates WholeBrain (Fürth et al., 2018), any limitations of WholeBrain apply to our approach, which means limited support for registering non-coronal sectioning planes and reliance on the Allen Mouse Atlas (Dong, 2008). However, network analysis and plotting functions are currently compatible with the Allen Mouse Brain Atlas and the Kim Unified Mouse Brain Atlas version (2019) (Chon et al., 2019). Therefore, current limitations in registration do not preclude the usefulness of the SMARTTR software in generating valuable insights from network analysis of externally imported datasets. 

      There are a number of alternative workflows, such as the QUINT workflow (Yates et al., 2019), that support multiple different mouse atlases and registration of arbitrarily sectioned angles. We have plans to support and facilitate an entry point for this workflow in a future iteration of SMARTTR, but believe it is of benefit to the wider community to release and support SMARTTR in its current state.

      (6) Supplemental Figures S10-13 do not have a legend panel to define the bar graphs. 

      We apologize for this omission and have fixed our legends in our resubmission. Our supplement figure orders have changed and the corresponding figures are now Supplemental Figures S11-14.

      (7) When images in a z-stack were collapsed, was this a max intensity projection or average? 

      Assuming this question is in regard to our manual cell counting validation approach, the z-stacks were collapsed as a maximum intensity projection.  
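
      For readers unfamiliar with the distinction, the following is a minimal sketch (not taken from our imaging pipeline) contrasting the two ways of collapsing a z-stack. The helper names `project` and `mean` and the tiny 2 × 2 stack are hypothetical and serve only to show that a maximum intensity projection keeps the brightest value at each pixel, whereas averaging dilutes signal that is present in only one plane.

```python
def project(stack, reducer):
    """Collapse a z-stack (a list of equally sized 2D planes) pixel by pixel,
    applying `reducer` to the values found along z at each (row, col)."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [reducer([plane[r][c] for plane in stack]) for c in range(cols)]
        for r in range(rows)
    ]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical 3-plane stack of 2 x 2 pixels (arbitrary intensity units).
stack = [
    [[0, 10], [0, 0]],
    [[0, 200], [0, 30]],
    [[0, 12], [90, 0]],
]

max_projection = project(stack, max)    # brightest value along z
mean_projection = project(stack, mean)  # average value along z
print(max_projection)
print(mean_projection)
```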

      Reviewer #2 (Public review): 

      Weaknesses: 

      (1) While I was able to install the SMARTR package, after trying for the better part of one hour, I could not install the "mjin1812/wholebrain" R package as instructed in OSF. I also could not find a function to load an example dataset to easily test SMARTR. So, unfortunately, I was unable to test out any of the packages for myself. Along with the currently broken "tractatus/wholebrain" package, this is a good example of why I would strongly encourage the authors to publish SMARTR on either Bioconductor or CRAN in the future. The high standards set by Bioc/CRAN will ensure that SMARTR is able to be easily installed and used across major operating systems for the long term. 

      We greatly thank the reviewer for pointing out this weakness; long-term maintenance of this package is certainly a mutual goal. An .RData file can be loaded either by double-clicking the file in a directory window, after specifying that this file type should be opened in RStudio, or by using the load() function (e.g., load("directory/example.RData")). We have now explicitly outlined these directions in the online documentation. 

      Moreover, we have recently submitted our package to CRAN and are currently working on revisions following comments. This has required a package rebranding to “SMARTTR”, as there were naming conflicts with a previously archived repository on CRAN. Currently, SMARTTR is not dependent on the WholeBrain package, which remains optional for the registration portion of our workflow. Ultimately, this independence will allow us to maintain the analysis and visualization portion of the package independently.

      In the meantime, we have fully revised our installation instructions (https://mjin1812.github.io/SMARTTR/articles/SMARTTR). SMARTTR is now downloadable from a CRAN-like repository as a bundled .tar.gz file, which should ease the burden of installation significantly. Installation has been verified on a number of different versions of R on different platforms. Again, we hope these changes are sufficient and improve the process of installation. 

      (2) The package is quite large (several thousand lines including comments and whitespace). While impressive, this does inherently make the package more difficult to maintain - and the authors currently have not included any unit tests. The authors should add unit tests to cover a large percentage of the package to ensure code stability.

      We have added unit testing to improve the reliability of our package. Unit tests now cover over 71% of our source code base and are available for evaluation in our GitHub repository (https://github.com/mjin1812/SMARTTR). We focused on coverage of the most front-facing functions. We appreciate this feedback, which has ultimately enhanced the longevity of our software.

      (3) Why do the authors choose to perform image segmentation outside of the SMARTTR package using ImageJ macros? Leading segmentation algorithms such as CellPose and StarMap have well-documented APIs that would be easy to wrap in R. They would likely be faster as well. As noted in the discussion, making SMARTTR a one-stop shop for multi-ensemble analyses would be more appealing to a user. 

      We appreciate this feedback. We believe parts of our response to Reviewer 1, Comment 3, are relevant to this point. Interfaces for CellPose and ClusterMap (which processes in situ transcriptomic approaches, like STARmap) are both in Python, and there are currently ways to call Python from within R (https://rstudio.github.io/reticulate/index.html). We will certainly explore incorporating these APIs from R. However, we anticipate that this capability would amount to “translation” between programming languages and would not spare users from needing some familiarity with the capabilities of these Python packages, and thus with Python syntax.

      (4) Given the small number of observations for correlation analyses (n=6 per group), Pearson correlations would be highly susceptible to outliers. The authors chose to deal with potential outliers by dropping any subject per region that was > 2 SDs from the group mean. Another way to get at this would be using Spearman correlation. How do these analyses change if you use Spearman correlation instead of Pearson? It would be a valuable addition for the author to include Spearman correlations as an option in SMARTTR.

      We thank the reviewer for this suggestion. We have updated our code base so that the get_correlations() function can use Spearman’s correlation coefficient instead of Pearson’s correlation coefficient for heatmaps. Users can now set the `method` parameter to either “pearson” or “spearman”, and the choice will propagate throughout the rest of the analysis.

      Below, in Author response image 1, we show a visual comparison of the correlation heat maps for active eYFP<sup>+</sup> ensembles in the CT and IS groups using both Pearson and Spearman correlations. We see a strong qualitative similarity between the heat maps. Of course, since the statistical assumptions underlying the relationship between variables using Pearson correlation (linear) vs Spearman correlation (monotonic) are different, users should take this into account when interpreting results using different approaches.

      Author response image 1.

      Pearson and Spearman regional correlations of eYFP+ ensemble activity in the CT and IS groups.
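
      The difference between the two coefficients at n = 6 can be illustrated with a minimal sketch. The data below are hypothetical values chosen to contain one outlier; they are not taken from the study, and the functions are a plain-Python illustration rather than the implementation used by get_correlations().

```python
import math


def pearson(x, y):
    """Pearson correlation: linear association between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical normalized counts for two regions, n = 6 subjects.
# The last subject is an outlier in region_b.
region_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
region_b = [2.0, 1.0, 4.0, 3.0, 6.0, 50.0]

r_pearson = pearson(region_a, region_b)
r_spearman = spearman(region_a, region_b)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

      Because Spearman operates on ranks, the magnitude of the outlier does not matter, only its position in the ordering, which is why it is less sensitive to extreme values in small groups.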

      (5) I see the authors have incorporated the ability to adjust p-values in many of the analysis functions (and recommend the BH procedure) but did not use adjusted p-values for any of the analyses in the manuscript. Why is this? This is particularly relevant for the differential correlation analyses between groups (Figures 3P and 4P). Based on the unadjusted p-values, I assume few if any data points will still be significant after adjusting. While it's logical to highlight the regional correlations that strongly change between groups, the authors should caution which correlations are "significant" without adjusting for multiple comparisons. As this package now makes this analysis easily usable for all researchers, the authors should also provide better explanations for when and why to use adjusted p-values in the online documentation for new users.

      We appreciate the feedback and note that our dataset is presented as a demonstrative and exploratory resource for readers. As such, we accept a high tolerance for false positives in order to decrease the risk of missing potentially interesting findings. As noted by Reviewer #2, it is still “logical to highlight the regional correlations that strongly change between groups.” We have clarified in our methods that we chose to present uncorrected p-values when speaking of significance.

      We have also removed any previous recommendations for preferred methods for multiple comparisons adjustment in our function documentations, as some previous documentation was outdated. Moreover, the standard multiple comparisons adjustment approaches assume complete independence between tests, whereas this assumption is violated in our differential correlational analysis (i.e., a region with one significantly altered connection is more likely than another to have another significantly altered connection).

      Ultimately, the decision to correct for multiple comparisons with standard FDR, and the choice of significance threshold, should still be informed by standard statistical theory and by the user-defined tolerance for false positives and false negatives. This will be influenced by factors such as the nature and purpose of the study and the quality of the dataset.
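
      For readers unfamiliar with the BH procedure mentioned by the reviewer, the following is a minimal sketch of the Benjamini-Hochberg adjustment. The p-values are hypothetical and the function name bh_adjust is ours for illustration; this is not SMARTTR code and does not reproduce any result from the manuscript.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each sorted p-value p_(i) is scaled by m / i, then a running minimum is
    taken from the largest rank downward so the result is monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values from five comparisons.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = bh_adjust(raw_p)

alpha = 0.05
n_raw = sum(p < alpha for p in raw_p)
n_adj = sum(p < alpha for p in adj_p)
print(f"significant before adjustment: {n_raw}, after: {n_adj}")
```

      In this toy example, two of the four nominally significant comparisons no longer pass the threshold after adjustment, which illustrates the trade-off between false positives and missed findings discussed above.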

      (6) The package was developed in R3.6.3. This is several years and one major version behind the current R version (4.4.3). Have the authors tested if this package runs on modern R versions? If not, this could be a significant hurdle for potential users. 

      We thank reviewers for pointing out concerns regarding versioning. We have since updated our installation approach for SMARTTR, which is compatible with versions of R >= 3.6 and has been tested on Mac ARM-based (Apple silicon) architecture (R v4.4.2), and Windows 10 (R v3.6.3, v4.5.0 [devel]). 

      The recommendation for users to install R 3.6.3 is primarily for those interested in using our full workflow, which requires installation of the WholeBrain package, which is currently a suggested package. We anticipate updating and supporting the visualization and network analysis capabilities, whilst maintaining previous versioning for the full workflow presented in this paper.  

      (7) In the methods section: "Networks were constructed using igraph and tidygraph packages." - As this is a core functionality of the package, it would be informative to specify the exact package versions, functions, and parameters for network construction. 

      We thank reviewers for pointing out the necessity for these details for code reproducibility. We have since clarified our language in the manuscript on the exact functions we use in our analysis and package versions, which we also fully document in our online tutorial. Additionally, we have printed our package development and analysis environment online at https://mjin1812.github.io/SMARTTR/articles/Part7.Development.

      (8) On page 11, "Next, we examined the cross-correlations in IEG expression across brain regions, as strong co-activation or opposing activation can signify functional connectivity between two regions" - cross-correlation is a specific analysis in signal processing. To avoid confusion, the authors should simply change this to "correlations". 

      We thank the reviewer for pointing out this potentially confusing phrasing. We have changed all instances of “cross-correlation” to “correlation”.

      (9) Panels Q-V are missing in Figure 5 caption. 

      We thank the reviewer for pointing out this oversight. We have now fixed this in our revision.

      References

      Chon, U., Vanselow, D. J., Cheng, K. C., & Kim, Y. (2019). Enhanced and unified anatomical labeling for a common mouse brain atlas. Nature Communications, 10(1), 5067. https://doi.org/10.1038/s41467-019-13057-w

      Dong, H. W. (2008). The Allen reference atlas: A digital color brain atlas of the C57Bl/6J male mouse (pp. ix, 366). John Wiley & Sons Inc.

      Fürth, D., Vaissière, T., Tzortzi, O., Xuan, Y., Märtin, A., Lazaridis, I., Spigolon, G., Fisone, G., Tomer, R., Deisseroth, K., Carlén, M., Miller, C. A., Rumbaugh, G., & Meletis, K. (2018). An interactive framework for whole-brain maps at cellular resolution. Nature Neuroscience, 21(1), 139–149. https://doi.org/10.1038/s41593-017-0027-7

      Yates, S. C., Groeneboom, N. E., Coello, C., Lichtenthaler, S. F., Kuhn, P.-H., Demuth, H.-U., Hartlage-Rübsamen, M., Roßner, S., Leergaard, T., Kreshuk, A., Puchades, M. A., & Bjaalie, J. G. (2019). QUINT: Workflow for Quantification and Spatial Analysis of Features in Histological Images From Rodent Brain. Frontiers in Neuroinformatics, 13. https://www.frontiersin.org/articles/10.3389/fninf.2019.00075

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important work by Park et al. introduces an open-top two-photon light sheet microscopy (OT-TP-LSM) for less invasive evaluation of intraoperative 3D pathology. The authors provide convincing evidence for the effectiveness of this technique in investigating various human cancer cells. The paper needs some minor corrections and has the potential to be of broad interest to biologists and, specifically, pathologists utilizing 3D optical microscopy.

      We would like to thank the editor for the positive general comment. We revised the manuscript by addressing the reviewers' comments.

      Public Reviews:

      Reviewer1

      Summary:

      A2. This manuscript presents the development of a new microscope method termed "open-top two-photon light sheet microscopy (OT-TP-LSM)". While the key aspects of the new approach (open-top LSM and Two-photon microscopy) have been demonstrated separately, this is the first system of integrating the two. The integration provides better imaging depth than a single-photon excitation OT-LSM.

      Strengths:

      The use of liquid prism to minimize the aberration induced by index mismatching is interesting and potentially helpful to other researchers in the field.

      • The use of propidium iodide (PI) provided a deeper imaging depth.

      Weaknesses:

      Details are lacking on imaging time, data size, the processing time to generate large-area en face images, and inference time to generate pseudo H&E images. This makes it difficult to assess how applicable the new microscope approach might be in various pathology applications.

      B2. We would like to thank the reviewer for the critical and positive comments. We agree with the reviewer that detailed information such as processing time is missing.

      The imaging time and data size were estimated per 1 cm² area and were 7 min and 318 GB (= (7 × 60) s × 400 fps × (1850 × 512 × 2) byte) for each channel, respectively. The time for processing en-face images was relatively long, taking ~1.7 s Gb−1 after loading the image dataset at ~6.8 s Gb−1 in the current setting, and needs to be shortened for intraoperative application. The time for converting OT-TP-LSM images of 512 x 512 pixels into virtual H&E staining images was 160 ms. This study aimed to address current limitations of 3D pathology, such as imaging depth, and to develop the image processing needed to generate virtual H&E images. Further development, such as speeding up the image processing, would be needed. We added the missing information and included some discussion on limitations of the new system and further development for intraoperative applications.
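
      The data-size figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; treating the factor of 2 as bytes per pixel (16-bit pixels) and using decimal gigabytes are our assumptions.

```python
# Acquisition parameters quoted in the response.
imaging_time_s = 7 * 60      # 7 min per 1 cm^2
frame_rate_fps = 400
pixels_per_frame = 1850 * 512
bytes_per_pixel = 2          # assumption: the "x 2" factor is 16-bit pixel depth

n_frames = imaging_time_s * frame_rate_fps
total_bytes = n_frames * pixels_per_frame * bytes_per_pixel
total_gb = total_bytes / 1e9  # assumption: decimal gigabytes (10^9 bytes)

# Areal throughput implied by 1 cm^2 (100 mm^2) in 7 min.
area_mm2 = 100.0
rate_mm2_per_s = area_mm2 / imaging_time_s

print(f"{n_frames} frames, {total_gb:.0f} GB per channel")
print(f"{rate_mm2_per_s:.2f} mm^2/s")
```

      The implied throughput agrees with the acquisition rate of 0.24 mm²/s cited by Reviewer 2.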

      C1-1. Revised manuscript, Discussion, pages 14-15 and lines 320-328

      Although OT-TP-LSM enabled high-speed 3D imaging, the post-processing time of the OT-TP-LSM image datasets was relatively long due to the large data size, sequential processing of dual channel images, and manual stitching. The long post-processing time needs to be resolved for intraoperative applications. To speed up processing, these processing steps can be performed using field-programmable gate array (FPGA)-based data acquisition with graphics processing unit (GPU)-based computing. The processing time can be further reduced by coding the algorithm in a C++-based environment. Furthermore, ImageJ-based software such as the Bigstitcher plugin can be used for automatic 3D image processing [44].

      C1-2. Revised manuscript, Materials and methods, Image acquisition and post-processing, page 17 and lines 390-398

      Image acquisition and post-processing

      Raw image datasets from dual sCMOS cameras were acquired and processed on a workstation with 128 GB RAM and a 2 TB SSD drive. The imaging time and data size per 1 cm² area at 400 fps were 7 min and 318 GB (= (7 × 60) s × 400 fps × (1850 × 512 × 2) byte) for each channel, respectively. The raw image strip was sheared at 45° with respect to the sample surface, and a custom image processing algorithm was used to transform the image data in the XYZ coordinate. The processing for en-face image was conducted in MATLAB and took ~1.7 s Gb−1 after loading the image dataset at ~6.8 s Gb−1 in the current laboratory setting. Mosaic images were generated by joining the image strips manually.

      C1-3. Revised manuscript, Materials and methods, Virtual H&E staining of OT-TP-LSM via deep learning network, page 18 and lines 414-418

      The CycleGAN training and testing were performed using a Nvidia GeForce RTX 3090 with 24 GB RAM. The network was implemented using Python version 3.8.0 on a desktop computer with a Core i7-12700K CPU@3.61 GHz and 64 GB RAM, running Anaconda (version 22.9.0). The inference time for converting OT-TP-LSM patch image into virtual H&E patch image was measured as 160 ms.

      Reviewer 2

      Summary:

      A2. In this manuscript, the authors developed an open-top two-photon light sheet microscopy (OT-TP-LSM) that enables high-throughput and high-depth investigation of 3D cell structures. The data presented here shows that OT-TP-LSM could be a complementary technique to traditional imaging workflows of human cancer cells.

      Strengths:

      High-speed and high-depth imaging of human cells in an open-top configuration is the main strength of the presented study. An extended depth of field of 180 µm in 0.9 µm thickness was achieved together with an acquisition of 0.24 mm²/s. This was confirmed by 3D visualization of human cancer cells in the skin, pancreas, and prostate.

      Weaknesses:

      The complementary aspect of the presented technique in human pathological samples is not convincingly presented. The traditional hematoxylin and eosin (H&E) staining is a well-established and widely used technique to detect human cancer cells. What would be the benefit of 3D cell visualization in an OT-TP-LSM microscope for cancer detection in addition to H&E staining?

      B2. We would like to thank the reviewer for the critical and positive comments. 3D pathology has been a long-standing research direction. Current pathology is 2D: it examines H&E histology slides generated by thin sectioning of biopsied and surgical specimens at different depths. The reliability of the pathological diagnosis suffers from undersampling of specimens. Although 3D pathology is possible by serial thin-sectioning, imaging, and then combining the images in 3D, it is not practical for clinical use due to the required labor and time.

      We demonstrated the advantages of OT-TP-LSM in various human cancer tissues. The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. We revised the manuscript to explain the benefits of 3D pathology with OT-TP-LSM.

      C2-1. Revised manuscript, Results, 3D OT-TP-LSM imaging of human skin cancers, pages 8-9 and lines 176-180

      Using 3D visualization, normal glandular structures in the dermis were distinguished from BCC tumor nests (Video 1). Both eccrine and sebaceous glands could appear similar to BCC nests in 2D images at certain depths. Hence, nondestructive 3D visualization of cell structures would be important for distinguishing them, serving as a complement to the traditional 2D H&E images.

      C2-2. Revised manuscript, Results, 3D OT-TP-LSM imaging of human pancreatic cancers, pages 10-11 and lines 222-232

      Magnified images of ROI 1 (PDAC) at two different depths showed irregularly shaped glands with sharp angles and 3D structural complexity including unstable bridging structure inside (Figure 4B). An irregular and distorted architecture amidst desmoplastic stroma is one of the important diagnostic factors for PDAC [35]. The cancer glands exhibited disorganized cancer cell arrangement with nuclear membrane distortion. Magnified images of ROI 2 showed both nonneoplastic ducts and cancer glands in different cell arrangements (Figure 4C). The nonneoplastic ducts showed single-layered epithelium with small, evenly distributed cells expressing relatively high nuclear fluorescence. Cancer glands, on the other hand, had disorganized and multilayered structure with large nuclei. OT-TP-LSM visualized the 3D invasiveness of cancer glands within tissues nondestructively, which could not be identified from limited 2D information.

      C2-3. Revised manuscript, Results, 3D OT-TP-LSM imaging of human prostatic cancers, page 11 and lines 251-252

      OT-TP-LSM provided histological 3D information equivalent to that of the H&E stained image without the need for sectioning.

      C2-4. Revised manuscript, Discussion, page 12 and lines 274-276

      OT-TP-LSM was developed for the rapid and precise nondestructive 3D pathological examination of excised tissue specimens during both biopsy and surgery, as a complement to traditional 2D H&E pathology by visualizing 3D cell structures.

      C2-5. Revised manuscript, Discussion, page 13 and lines 284-288

      The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. These have been challenging with 2D histological images.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest the following points to the authors to enhance the readability of the manuscript and to provide a strong narrative to explain their findings:

      A3. Line 54: For the non-expert readers, please provide more background information about the histopathology before introducing the hematoxylin and eosin staining.

      B3. We would like to thank the reviewer for the comment. As suggested by the reviewer, we added information about the current standard method of histopathological examination and its limitations.

      C3. Revised manuscript, introduction, page 4 and lines 56-64

      Precise intraoperative cancer diagnosis is crucial for achieving optimal patient outcomes by enabling complete tumor removal. The standard method is the microscopic cellular examination of surgically excised specimens following various processing steps, including thin sectioning and hematoxylin and eosin (H&E) cell staining. However, this examination method is laborious and time-consuming. Furthermore, it has inherent artifacts that disturb accurate diagnosis, including tissue loss, limited two-dimensional (2D) information, and sampling error [1]. High-speed three-dimensional (3D) optical microscopy, which can visualize cellular structures without thin sectioning, holds promise for nondestructive 3D pathological examination as a complement of 2D pathology limitation [1-4].

      A4. Lines 66 and 71: Please briefly introduce the cited studies to give some information about the previous work. This will help the reader understand the innovative aspects of your study.

      B4. We would like to thank the reviewer for the comment. As suggested by the reviewer, we added a brief introduction about the cited studies.

      C4. Revised manuscript, introduction, pages 4-5 and lines 71-82

      As a deep tissue imaging method, two-photon microscopy (TPM) has been used in both biological and optical biopsy studies [17-19]. TPM is based on nonlinear two-photon excitation of fluorophores and achieves high imaging depths down to a few hundred micrometers by using long excitation wavelengths, which reduce light scattering. Moreover, TPM provides additional intrinsic second harmonic generation (SHG) contrast for visualizing collagen fibers within the extracellular matrix (ECM). This feature proved advantageous for high-contrast imaging of cancer tissue and microenvironmental analysis [20-22]. However, TPM has low imaging speeds due to point scanning-based imaging. To address this limitation, two-photon LSM (TP-LSM) techniques were developed for high-speed imaging [23-27]. Although TP-LSM facilitated rapid 3D imaging of cancer cells and zebrafish, its applications were limited to small samples and biological studies due to geometric limitations.

      A5. Line 72: Please mention the importance and benefit of having an open-top configuration. I think this is one of the key aspects that provide a high imaging depth in OT-TP-LSM.

      B5. We would like to thank the reviewer for the comment. Conventional LSM techniques, including TP-LSM, have a configuration in which the illumination objective is oriented in the horizontal plane and imaging is performed with orthogonally arranged objectives. However, this geometry physically limits the lateral sample size and is unsuitable for imaging centimeter-scale tissue. Therefore, we developed OT-TP-LSM for the examination of large tissue in 3D. High imaging depths were achieved with long excitation wavelengths and long emission wavelengths of fluorophores. The open-top configuration does not contribute to the improvement of imaging depth. We revised the manuscript to explain the need for open-top configuration.

      C5. Revised manuscript, introduction, page 5 and lines 82-86

      Conventional TP-LSM had a configuration of a horizontally oriented illumination objective and a vertically oriented imaging objective. This geometry imposed limitations on the sample size, rendering it unsuitable for the examination of centimeter-scale specimens. TP-LSM with open-top configuration is needed for 3D histological examination.

      A6. Line 78: It would be nice to clearly quantify the imaging depth here.

      B6. We would like to thank the reviewer for the comment. Although we considered entering the quantitative imaging depth of OT-TP-LSM in the introduction section, we decided that it would be appropriate to present the quantitative imaging depth in the Results section and discuss it in the Discussion section.

      A7. Line 146: Please clearly explain the reason why the upper layers are not resolved.

      B7. We would like to thank the reviewer for the comment and we are sorry for the missing information. The skin epidermis has various cell layers, and the superficial layers are composed of flat, less rounded cells with relatively small cytoplasm. Therefore, cells in those layers could be difficult to resolve with the current system resolution because there is little space between nuclei. Additionally, strong autofluorescence signal in the stratum corneum could prevent visualization of the cells in the superficial layer. We revised the manuscript to explain the reasons in detail.

      C7. Revised manuscript, Results, 3D OT-TP-LSM imaging of human skin cancers, page 8 and lines 159-163

      Keratinocytes in the basal layer were relatively large and individually resolved, while those in the upper layers were unresolved and appeared as a band. It could be attributed to the upper layers being comprised of flat cells with relatively small cytoplasm, resulting in little space between nuclei. Additionally, strong autofluorescence signal in the stratum corneum might prevent visualization of the cells in the superficial layer.

      A8. Line 253: Please explain the importance of visualization of 3D cell structures in cancer pathology. I think this should be stated clearly throughout the text as it is the key component of OT-TP-LSM to complement the traditional H&E staining. Also, referring to the non-destructive manner of your technique would help to emphasize this point.

      B8. We would like to thank the reviewer for the comment. As answered in A2, the current H&E histological examination has inherent limitations due to limited 2D information and sampling errors. To resolve this, OT-TP-LSM was developed for the visualization of 3D cell structures nondestructively as a complement to traditional slide-based 2D pathology. We demonstrated the advantages of OT-TP-LSM in various human cancer tissues. The relatively high imaging depths of OT-TP-LSM enabled the nondestructive visualization of detailed 3D cell structures with high contrast and without distortion and allowed a distinction between cancer and normal cell structures as well as the detection of cancer invasiveness within tissues. We revised the manuscript to explain the benefits of 3D pathology with OT-TP-LSM.

      C8. Please refer to the answer in C2-1 – C2-5.

      A9. Figures: Please clearly mark the cancer regions in the images as indicated in Figure 5. It will help the reader to easily compare the healthy and invaded tissue parts.

      B9. We would like to thank the reviewer for the comment. We confirmed that the cancer area is not marked in Figure 4 of the pancreatic cancer tissue. We modified Figure 4 to mark the cancer region. Additionally, Figure 2 of the skin cancer tissue was also modified in this regard.

      C9. Modified Figure 2 and Figure 4.

      Author response image 1.

      Author response image 2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      This research used cell-based signaling assay and Gaussian-accelerated molecular dynamics (GaMD) to study peptide-mediated signaling activation of Polycystin-1 (PC1), which is responsible for the majority of autosomal dominant polycystic kidney disease (ADPKD) cases. Synthetic peptides of various lengths derived from the N-terminal portion of the PC1 C-terminal fragment (CTF) were applied to HEK293T cells transfected with stalkless mouse CTF expression construct. It was shown that peptides including the first 7, 9, and 17 residues of the N-terminal portion could activate signaling to the NFAT reporter. To further understand the underlying mechanism, docking and peptide-GaMD simulations of peptides composed of the first 9, 17, and 21 residues from the N-terminal portion of the human PC1 CTF were performed. These simulations revealed the correlation between peptide-CTF binding and PC1 CTF activation characterized by the close contact (salt bridge interaction) between residues R3848 and E4078. Finally, a Potts statistical model was inferred from diverged PC1 homologs to identify strong/conserved interacting pairs within PC1 CTF, some of which are highly relevant to the findings from the peptide GaMD simulations. The peptide binding pockets identified in the GaMD simulations may serve as novel targets for the design of therapeutic approaches for treating ADPKD.

      We greatly appreciate the reviewer’s encouraging and positive comments. The reviewer’s specific comments are addressed pointwise below and changes to the text will be highlighted in yellow in the revised manuscript.

      (1) The GaMD simulations all include exogenous peptides, thus lacking a control where no such peptide is present (and only stalkless CTF). An earlier study (PNAS 2022 Vol. 119 No. 19 e2113786119) covered this already, but it should be mentioned here that there was no observation of close/activation for the stalkless CTF.

      We appreciate the reviewer’s concern about the lack of a control where no exogenous peptide is present. As suggested by the reviewer, we are adding more details about the study on the stalkless CTF as a control in the Introduction of the revised manuscript. 

      (2) Although 5 independent trajectories were generated for each peptide, the authors did not provide sufficient details regarding the convergence of the simulation. This leaves some uncertainties in their results. Given that the binding poses changed relative to the starting docked poses for all three peptides, it is possible that some other binding pockets and/or poses were not explored.

      We appreciate the reviewer’s comment regarding the convergence of the simulation results. This is clarified in the revised manuscript as: 

      “We have calculated free energy profiles of individual simulations for each system, including p9, p17, and p21, as shown below (Figs. S5, S6 and S8). For the p9 peptide, the “Bound” low-energy state was consistently identified in the 2D free energy profile of each individual simulation (Fig. S5). For the p17 peptide, Pep-GaMD simulations were able to refine the peptide conformation from the “Unbound” to the “Intermediate” and “Bound” states in Sim1 and Sim5, while the peptide reached only the “Intermediate” state in the other three simulations (Fig. S6). For the p21 peptide, Pep-GaMD was able to refine the peptide docking conformation to the “Bound” state in all five individual simulations (Fig. S8).”

      “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”

      (3) The free energy profiles (Figures 2 to 4) based on the selected coordinates provide important information regarding binding and CTF conformational change. However, it is a coarse-grained representation and complementary analysis such as RDFs, and/or contact maps between the peptide and CTF residues might be helpful to understand the details of their interactions. These details are currently only available in the text.

      Following the reviewer's suggestion, we have now included a set of protein contact maps showing contacts between the peptides and the TOP domain for each peptide in the representative “Bound” state in revised Supplementary Information (Fig. S4). The contact maps serve to visualize the list of contacts mentioned in the main text. This will be clarified in the revised manuscript.

      (4) The use of a stalkless CTF is necessary for studying the functions of the exogenous peptides. However, the biological relevance of the stalkless CTF to ADPKD was not clearly explained, if any.

      We appreciate the reviewer’s comment. As correctly assessed by the reviewer, the stalkless CTF is not a biological form of PC1 observed in ADPKD, but rather was used as the simplest or least complex system in which the activities and binding of exogenous peptides could be studied. However, in ADPKD, there are numerous missense mutations reported within the GPCR autoproteolysis-inducing (GAIN) domain that have been shown to prevent or inhibit cleavage at the GPCR-coupled proteolysis site (GPS). Loss of PC1 GPS cleavage, which is known to cause ADPKD, would retain or sequester the stalk tethered agonist within the interior of the GAIN domain, which would presumably interfere with interactions between stalk tethered agonist residues and the remainder of the CTF. Furthermore, there are 10 single nucleotide polymorphisms reported within the stalk sequence (ADPKD Variant Database; https://pkdb.mayo.edu/welcome), most of which we have found to significantly reduce CTF-mediated activation of the NFAT reporter (Magenheimer BS, et al., Constitutive signaling by the C-terminal fragment of polycystin-1 is mediated by a tethered peptide agonist; bioRxiv 2021.08.05.455255). In particular, the ADPKD-associated G3052R stalk mutation that was analyzed along with the stalkless CTF by GaMD simulations (Pawnikar et al, PNAS, 2022) has the same reduction in activity as the stalkless CTF in the cellular signaling reporter assays and the same loss of closed conformation interactions in GaMD analyses. As such, we believe the stalkless CTF has biological relevance from the aspect that it mimics the deficiency in signaling activation observed for PC1 CTF stalk mutants.
      This is clarified in the revised manuscript in the Introduction, page 5 (“constructs encoding a stalkless PC1 CTF (a nonbiological mutant of PC1 with deletion of the first 21 N-terminal residues of CTF) and three ADPKD-associated…”), and near the beginning of the Discussion, page 16, where the biological relevance of studying the stalkless CTF is explained.

      (5) The authors might want to clarify if a stalkless CTF is commonly seen in ADPKD, or if it is just a construct used for this study.

      The stalkless CTF is not a biological form of PC1, but rather a construct used for this study. This was clarified in the revised manuscript (see response above).

      (6) (Pages 7-8) "...we generated expression constructs of mouse (m) PC1 consisting of the CD5 signal peptide sequence fused in frame with the stalk sequence of mCTF ...". What is the CD5 signal peptide sequence here? What is its use?

      The CD5 signal peptide sequence is “MPMGSLQPLATLYLLGMLVASVLG” from the T cell surface glycoprotein, CD5. Since the N-terminus of PC1 CTF is derived from a posttranslational, autocatalytic, endoproteolytic cleavage event, this isoform is already membrane-embedded and therefore lacks its endogenous signal peptide. The CD5 signal peptide coding sequence is added to the PC1 CTF expression constructs in order to ensure translation and insertion of the encoded protein at the endoplasmic reticulum. Additional details were added to the Experimental Procedures, page 2 of Supporting Information.

      (7) (Page 8) "All peptides were appended with a C-terminal, 7-residue hydrophilic sequence (GGKKKKK) to increase solubility". How did the authors make sure that this sequence has no influence on the signaling? 

      To determine the possible effect of the hydrophilic GGKKKKK sequence on signaling, we had a ‘solubility tag’ peptide (LGGKKKKK) synthesized and purified by GenScript. It was necessary to add an N-terminal Leu residue to the 7-residue hydrophilic tag sequence in order for the highly hydrophilic peptide to be recovered. Effect of treatment with the solubility tag peptide on activation of the NFAT reporter was assessed for both empty vector- and ∆stalkCTF-transfected cells in 3 separate signaling experiments (see figure below). Each experiment also included a negative control treatment (no peptide/culture medium only addition) and a positive control treatment (stalk peptide p17). The p17 peptide we had available was derived from the stalk sequence of human PC1 that differs from the mouse PC1 sequence at residues 15 and 17, which are two poorly conserved positions within the stalk sequence (see Reviewer 2, Response 3). In the first experiment with the solubility tag and human p17 peptides (B in figure below), we inadvertently used the empty expression vector and ∆stalkCTF expression construct from mouse PC1. After realizing our error, we then performed 2 additional signaling experiments (C and D in figure below) with the ‘correct’ human ∆stalkCTF expression construct and empty vector. In the revised manuscript, we have provided the results from each of the 3 experiments as Fig. S2 (below).

      (8) (Page 9) "Using a computational model of the ΔStalk PC1 CTF developed previously". The authors might want to expand here a little to give a short review about the structure preparation.

      We appreciate the reviewer’s suggestion regarding the addition of details for structure preparation for Stalkless CTF. We have added these details in section “Docking and Pep-GaMD simulations of peptide agonist binding to stalkless PC1 CTF” on Page 10 in the revised manuscript:  “The cryo-EM structure of human PC1-PC2 complex (PDB: 6A70) was used to build the computational model for WT PC1 CTF. As the protein had several missing regions including the Stalk and several loops, homology modeling of the missing regions was done using I-TASSER web server. Using the WT PC1 CTF model, computational model for ΔStalk was generated by deleting the first 21 residues (3049-3069) of the WT PC1 and using the structure for stalkless CTF, we successfully docked the p9, p17 and p21 stalk peptides with HPEPDOCK.  The peptides all bound to the TOP domain and the interface between the TOP domain and extracellular loop 1 (ECL1) of CTF.”

      (9) How was "contact" defined when counting the number of contacts used in the 2D PMFs (Figures 2-4)?

      We appreciate the reviewer’s comment regarding the definition of the number of contacts used in the 2D PMFs. This has been clarified in the revised manuscript as: “The number of contacts is calculated between any atom pairs within 4 Å distance of the peptide and extracellular domains of PC1 protein.”
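      The contact definition quoted above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `count_contacts`, the plain-tuple coordinate format, and the toy coordinates are assumptions made for illustration. Only the 4 Å cutoff comes from the response.

```python
def count_contacts(peptide_atoms, protein_atoms, cutoff=4.0):
    """Count atom pairs (one atom from each group) separated by <= cutoff, in Angstrom.

    Coordinates are (x, y, z) tuples. Squared distances are compared so that
    no square root is needed.
    """
    cutoff_sq = cutoff * cutoff
    contacts = 0
    for p in peptide_atoms:
        for q in protein_atoms:
            dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
            if dist_sq <= cutoff_sq:
                contacts += 1
    return contacts


# Toy coordinates (hypothetical, not taken from any simulation).
peptide = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
protein = [(3.0, 0.0, 0.0), (0.0, 5.0, 0.0), (12.0, 0.0, 0.0)]
n_contacts = count_contacts(peptide, protein)
```

      In practice a trajectory analysis tool would evaluate such a count per frame; the double loop here is only meant to make the pairwise definition explicit.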

      (10) How was the ranking of GaMD clusters done? It looks from Figure 3A that the "intermediate" state is more favorable compared to the "bound" state, but it was claimed in the text the "bound" state was ranked 1st. 

      Thanks to the reviewer for this comment. It has been clarified in the revised Supplementary Information: “Three independent Pep-GaMD simulations were combined to perform structural clustering using the hierarchical agglomerative clustering algorithm in CPPTRAJ. A 3 Å RMSD cutoff was used for each peptide system. PyReweighting was then applied to calculate the original free energy values of each peptide structural cluster with a cutoff of 500 frames. The structural clusters were finally ranked according to the reweighted free energy values.” And in the revised main text: “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. The free energy values of 2D PMF minima shown in Figure 3A could differ from those in the 1D PMF minima of peptide structural clusters, especially with the usage of distinct reaction coordinates. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”
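      The idea of ranking structural clusters by free energy can be sketched as follows. This is a deliberately simplified stand-in, not PyReweighting: it converts raw cluster populations to relative free energies by Boltzmann inversion and omits the GaMD boost-potential reweighting that the authors applied. The function name, the cluster labels, the frame counts, and the reading of the 500-frame cutoff as a minimum cluster size are all assumptions.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def rank_clusters(frame_counts, temperature=300.0, min_frames=500):
    """Rank clusters by relative free energy F_i = -kT * ln(N_i / N_max).

    Clusters with fewer than `min_frames` frames are discarded first.
    Returns (name, free_energy) pairs sorted from lowest to highest energy.
    Note: no GaMD reweighting is applied in this sketch.
    """
    kept = {name: n for name, n in frame_counts.items() if n >= min_frames}
    if not kept:
        return []
    n_max = max(kept.values())
    kt = KB_KCAL * temperature
    energies = {name: -kt * math.log(n / n_max) for name, n in kept.items()}
    return sorted(energies.items(), key=lambda item: item[1])


# Hypothetical frame counts for three clusters.
counts = {"cluster_a": 4000, "cluster_b": 2000, "cluster_c": 100}
ranked = rank_clusters(counts)
```

      Because a cluster ranking of this kind is effectively a 1D quantity per cluster, its ordering need not match the depths of minima on a 2D PMF built from different reaction coordinates, which is the point made in the quoted main-text passage.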

      (11) When mentioning residue pair distances, such as in the sentence "The distance between the TOP domain residue R3848 and PL residue E4078 was 3.8 Å (Fig. 4D)" on page 12, it should be clarified if these distances are average distance, or a statistical error can be given.

      We appreciate the reviewer’s comment regarding the TOP Domain and PL distance between residues R3848-E4078. This has been clarified on page 14 in the revised manuscript as:

      “The distance between the TOP domain residue R3848 and PL residue E4078 was 3.8 Å. The distance was extracted from the top-ranked structural cluster of the p21 bound to the ΔStalk CTF, corresponding to the “Closed/Active” low-energy conformational state. (Fig. 4E)”.

      (12) More analysis of the GaMD can be performed. For example, the authors observed a single "bound" state for p21, but there must be some flexibility in the peptide and the protein itself. The authors might want to consider adding some plots illustrating the flexibility of the peptide residues (for example, a RMSD plot). Contact maps can also be added to visualize the results currently discussed in the text. 

      We thank the reviewer for their constructive suggestions. To characterize flexibility of the peptide and protein in the revised manuscript, we have added plots of the TOP-PL interaction distance between residues R3848-E4078 in PC1, the radius of gyration (Rg) of p21 and root-mean-square deviation (RMSD) of p21 relative to the starting HPEPDOCK conformation of the peptide in the new Fig. S7. The peptide-protein contact map has also been added in the new Fig. S4.
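      For readers unfamiliar with the two flexibility measures named in this response, the sketch below shows how Rg and RMSD are defined for a single frame. It is an illustration under stated assumptions, not the authors' workflow: the function names and toy coordinates are hypothetical, and the RMSD is computed without the structural superposition step that trajectory tools normally perform first.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of (x, y, z) coordinates."""
    total = sum(masses)
    return tuple(
        sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)
    )


def radius_of_gyration(coords, masses=None):
    """Rg = sqrt(sum_i m_i |r_i - r_com|^2 / sum_i m_i); unit masses by default."""
    if masses is None:
        masses = [1.0] * len(coords)
    com = center_of_mass(coords, masses)
    weighted = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / sum(masses))


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally sized coordinate sets.

    No alignment is performed here; both sets are assumed to share a frame.
    """
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    sq = sum(
        sum((a[i] - b[i]) ** 2 for i in range(3))
        for a, b in zip(coords, reference)
    )
    return math.sqrt(sq / len(coords))


# Toy two-atom example (hypothetical coordinates).
rg_value = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
rmsd_value = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                  [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```

      Evaluating both functions for every frame against the starting docked pose would give time series of the kind described for Fig. S7.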

      (13) (Page 7) In the sentence `...sampled the "Closed/Active" low-energy state relative to the large number of Stalk-TOP contacts`, I suggest using "related to" instead of "relative to"

      We thank the reviewer for the comment, and we have replaced "relative to" with “related to” in the following sentence: `...sampled the "Closed/Active" low-energy state relative to the large number of Stalk-TOP contacts`.

      (14) (Page 7) In the sentence `Our previous study utilized expression constructs of human PC1 CTF, however, in order to prepare for ...`, "PC1 CTF, however," -> "PC1 CTF. However,"

      We thank the reviewer for the comment, and we have replaced "PC1 CTF, however," with "PC1 CTF. However," in the following sentence: `Our previous study utilized expression constructs of human PC1 CTF, however, in order to prepare for ...`.

      Reviewer 2:

      The autosomal dominant polycystic kidney disease (ADPKD) is a major form of polycystic kidney disease (PKD). To provide better treatment and avoid side effects associated with currently available options, the authors investigated an interesting GPCR, polycystin-1 (PC1), as a potential therapeutic target. In vitro and in silico studies were combined to identify peptide agonists for PC1 and to elucidate their roles in PC1 signaling. Overall, regarding the significance of the findings, this work described valuable peptide agonists for PC1 and the combined in vitro and in silico approach can be useful to study a complex system like PC1. However, the strength of the evidence is incomplete, as more experiments are needed as controls to validate the computational observations. The work appears premature.

      We greatly appreciate the reviewer’s encouraging and positive comments. The reviewer’s specific comments are addressed pointwise below and changes to the text will be highlighted in yellow in the revised manuscript.

      (1) The therapeutic potential of PC1 peptide agonists is unclear in the introduction. For example, while the FDA-approved drug Jynarque was mentioned, the text was misleading as it sounded like Jynarque targeted PC1. In fact, it targets another GPCR, the vasopressin receptor 2 (V2). A clear comparison of targeting PC1 over V2 pathways and their therapeutic relevance can help the readers better understand the importance of this work. Importantly, a clear background on the relationship between PC1 agonism and treatments for ADPKD is necessary.

      We understand the confusion that was caused by the brevity of our introductory paragraph and will clarify the differences in therapeutic targeting between Jynarque and our PC1 stalk-derived peptides in the revised manuscript. We will also expound on the rationale for targeting PC1 agonism as a therapeutic approach for ADPKD versus Jynarque. For example: It is known that ADPKD disease severity is dependent on the functional levels of PC1. Jynarque is a small molecule antagonist of the arginine vasopressin receptor 2, V2R, whose signaling and production of cAMP have been shown to be increased in ADPKD. As this drug targets one of the downstream aberrant pathways, it is only capable of slowing disease progression and has numerous undesirable side effects. We reasoned that a therapeutic agent capable of stimulating and thus augmenting PC1 signaling function would be a safer, cyst initiation-proximal treatment capable of preventing cyst formation with few side effects.

      (2) PC1 is a complex membrane protein, and most figures focus on the peptide-binding site. For general readers (or readers that did not read the previous PNAS publication), it is hard to imagine the overall structure and understand where the key interactions (e.g., R3848-E4078) are in the protein and how peptide binding affects locally and globally. I suggest enhancing the illustrations.

      We thank the reviewer for the constructive comment on adding more illustrations for the PC1 protein to understand the overall structure and the location of the key interaction R3848-E4078. We have included these suggestions and modified the main figures in the revised manuscript.

      (3) The authors used the mouse construct for the cellular assays and the peptide designs in preparation for future in vivo assays. This is helpful in understanding biology, but the relevance of drug discovery is weakened. Related to Point 1, the therapeutic potential of PC1 peptide agonist is largely missing.

      The therapeutic potential of a PC1 peptide agonist is addressed in response #1 above. As mentioned in the manuscript and recognized by the reviewer, the cellular signaling assays were performed with the mouse PC1 CTF expression construct and with peptides based on the mouse PC1 stalk sequence for future, pre-clinical studies, while the peptide binding studies were performed with the human PC1 stalk sequence. We feel the relevance for drug discovery is not significantly weakened for a number of reasons: 1) as shown in Fig. 1A, the stalk sequence is highly conserved between mouse and human PC1; specifically, there are only 2 residue differences present within peptides p17 and p21. One of the differences is a ‘semi-conservative’ Gln-Arg substitution at peptide residue 15, while the second difference is a conservative Ile-Val substitution at peptide residue 17; 2) we have found that an Arg to Cys mutation within the mouse PC1 CTF stalk has the same effect on signaling as the corresponding human Gln to Cys ADPKD-associated mutation which was analyzed in Pawnikar et al., 2022; 3) both peptide residues 15 and 17 represent highly variable positions within the PC1 stalk as shown in the sequence logo (below) of the stalk sequence from 16 vertebrate species; and 4) while addressing the potential effect of the hydrophilic solubility tag on stalk peptide-mediated rescue of CTF∆stalk signaling (see Reviewer 1 comments, point #7), we utilized the ‘human’ version of p17 as a positive control and tested its activation with both mouse and human CTF∆stalk expression constructs and found that human p17 peptide was also capable of stimulating the mouse CTF∆stalk protein (Fig. S2).

      Author response image 1.

      (4) More control experiments are needed. For example, a 7-residue hydrophilic sequence (GGKKKKK) is attached to the peptide design to increase solubility. This 7-residue peptide should be tested for PC1 activation as a control. Second, there is no justification for why the peptide design must begin with residue T3041. Can other segments of the stalk also be agonists?

      As mentioned above for Reviewer 1, the hydrophilic peptide has been synthesized and tested for activation of signaling by the stalkless CTF, and the results are included in the revised manuscript as Fig. S2. The design of peptides that begin with residue T3041 of mouse PC1 CTF is modeled on numerous similar studies for the family of adhesion GPCRs. Optimization of the binding and activity of the PC1 peptide agonist will be investigated in future studies and could include such parameters as whether the peptide must include the first residue and whether subsegments of the stalk are also agonists. However, we feel these questions are beyond the scope of this initial report.

      (5) There are some major concerns about the simulations: The GaMD simulations showed different binding sites of p-21, p-17, and p-9, and the results report the simulated conformations as "active conformational states". However, these are only computational findings without structural biology or mutagenesis data to validate. Further, neither docking nor the simulation data can explain the peptide SAR. Finally, it will be interesting if the authors can use docking or GaMD and explain why some peptide designs (like P11-P15) are less active (as control simulations).

      The reviewer brings up an important observation regarding differences in binding sites between peptides p9, p17 and p21. We will include discussion of this observation and our interpretations in the revised manuscript. While the present study is focused on identification of initial peptides that are able to activate the PC1 CTF, we shall include further mutation experiments and simulations, peptide SAR and optimization of the lead peptides in future studies. This has been clarified in the revised manuscript.

      (6) Additional experiments for the controls and for validating the simulations. Additional simulations to explain the SAR.

      We appreciate the reviewer’s comment for additional experiments for the controls and additional simulations to explain the SAR. For future studies, we shall include further mutation experiments and simulations, peptide SAR and optimization of the lead peptides.

      (7) What is the selectivity of the peptides between PC1 and PC2?

      We have not tested the selectivity of the peptides for PC1 versus PC2 primarily because transfection of PC2 does not activate the NFAT reporter. However, it is possible that co-transfection of PC2 with the PC1 CTF could alter stalk peptide binding. This will be important to consider in future studies.

      Reviewer 3:

      The authors demonstrate the activation of Polycystin-1 (PC1), a G-protein coupled receptor, using small peptides derived from its original agonist, the stalk TA protein. In the experimental part of the study, the authors performed cellular assays to check the peptide-induced reactivation of a mutant form of PC1 which does not contain the stalk agonist. The experimental data is supported by computational studies using state-of-the-art Gaussian accelerated Molecular Dynamics (GaMD) and bioinformatics analysis based on sequence covariance. The computer simulations revealed the mechanistic details of the binding of the said peptides with the mutant PC1 protein and discovered different bound, unbound, and intermediate conformations depending on the peptide size and sequence. The use of reliable and well-established molecular simulation algorithms and the physiological relevance of this protein to autosomal dominant polycystic kidney disease (ADPKD) make this work particularly valuable.

      We greatly appreciate the reviewer’s encouraging and positive comments. The reviewer’s specific comments are addressed pointwise below and changes to the text will be highlighted in yellow in the revised manuscript.

      (1) No control has been used for the computational (GaMD) study as the authors only report the free energy surface for 3 highly agonistic peptides but for none of the other peptides that did not induce an agonistic effect. Therefore, in the current version, the reliability of the computational results is not foolproof.

      We appreciate the reviewer’s concern about the lack of control with the other peptides that did not induce an agonistic effect. To address the reviewer’s concern, we have included more details on the study of the stalkless CTF and the solubility tag peptide (Fig. S2) as controls in the revised manuscript.

      (2) All discussions about the residue level interactions focused only on geometric aspects (distance, angle, etc) but not the thermodynamic aspect (e.g. residue-wise interaction energy). Considering they perform a biased simulation; the lack of interaction energy analysis only provides a qualitative picture of the mechanism.

      As suggested by the reviewer, we have added MM/PBSA analysis results in the revised manuscript and SI.

      Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) analysis was performed to calculate the binding free energies of peptides p9, p17 and p21 to PC1 CTF. The analysis was performed using the trajectory in which the peptide was bound to the receptor. In MM/PBSA, the binding free energy of the ligand (L) to the receptor (R) to form the complex (RL) is calculated as:

      $$\Delta G_{bind} = G_{RL} - G_{R} - G_{L}$$

      where $G_{RL}$ is the Gibbs free energy of the complex RL, $G_{R}$ is the Gibbs free energy of the molecule R in its unbound state and $G_{L}$ is the Gibbs free energy of the molecule L in its unbound state, respectively.

      $\Delta G_{bind}$ can be divided into contributions of different interactions as:

      $$\Delta G_{bind} = \Delta H - T\Delta S$$

      in which

      $$\Delta H = \Delta E_{MM} + \Delta G_{sol}$$

      $$\Delta E_{MM} = \Delta E_{int} + \Delta E_{elec} + \Delta E_{vdW}$$

      $$\Delta G_{sol} = \Delta G_{PB/GB} + \Delta G_{SA}$$

      where $\Delta E_{MM}$, $\Delta G_{sol}$, $\Delta H$ and $-T\Delta S$ are the changes in the gas-phase molecular mechanics (MM) energy, solvation free energy, enthalpy and conformational entropy upon ligand binding, respectively. $\Delta E_{MM}$ includes the changes in the internal energies $\Delta E_{int}$ (bond, angle and dihedral energies), electrostatic energies $\Delta E_{elec}$, and the van der Waals energies $\Delta E_{vdW}$. $\Delta G_{sol}$ is the sum of the electrostatic solvation energy $\Delta G_{PB/GB}$ (polar contribution) and the nonpolar contribution $\Delta G_{SA}$ between the solute and the continuum solvent. The polar contribution is calculated using either the Poisson-Boltzmann (PB) or Generalized Born (GB) model, while the nonpolar energy is usually estimated from the solvent-accessible surface area (SASA) as $\Delta G_{SA} = \gamma \cdot \mathrm{SASA} + b$, where $\gamma$ is the surface tension coefficient and $b$ is the constant offset. The change in conformational entropy $-T\Delta S$ is usually calculated by normal-mode analysis on a set of conformational snapshots taken from MD simulations. However, owing to the large computational cost, changes in the conformational entropy are usually neglected, as we were more concerned with the relative binding free energies of the similar peptide ligands.
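      The way the terms described in this paragraph combine can be shown with a short worked example. This is only a sketch of the bookkeeping, not output from gmx_MMPBSA: the function name and all numerical values are hypothetical placeholders, and the entropy term defaults to zero to mirror the common practice of neglecting it.

```python
def mmpbsa_binding_energy(de_int, de_elec, de_vdw, dg_polar, dg_nonpolar,
                          minus_t_ds=0.0):
    """Combine MM/PBSA components (all in kcal/mol) into a binding free energy.

    dE_MM  = dE_int + dE_elec + dE_vdW      (gas-phase MM energy)
    dG_sol = dG_polar + dG_nonpolar         (solvation free energy)
    dH     = dE_MM + dG_sol
    dG     = dH + (-T*dS)                   (entropy term neglected by default)
    """
    de_mm = de_int + de_elec + de_vdw
    dg_sol = dg_polar + dg_nonpolar
    delta_h = de_mm + dg_sol
    return delta_h + minus_t_ds


# Hypothetical component values chosen only to make the arithmetic easy to follow.
example = mmpbsa_binding_energy(
    de_int=0.0, de_elec=-20.0, de_vdw=-10.0, dg_polar=18.0, dg_nonpolar=-2.0
)
```

      With these placeholder inputs the gas-phase term is -30 kcal/mol and the solvation term is +16 kcal/mol, so favorable electrostatics are largely offset by desolvation, a pattern that is typical of MM/PBSA estimates in general.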

      MM/PBSA analysis was performed using the gmx_MMPBSA software with the following command line:

      gmx_MMPBSA -O -i mmpbsa.in -cs com.tpr -ci index.ndx -cg 1 13 -ct com_traj.xtc -cp topol.top -o FINAL_RESULTS_MMPBSA.dat -eo FINAL_RESULTS_MMPBSA.csv

      Input file for running MM/PBSA analysis:

      &general

      sys_name="Prot-Pep-CHARMM",

      startframe=1, endframe=200,

      # In gmx_MMPBSA v1.5.0 we have added a new PB radii set named charmm_radii.

      # This radii set should be used only with systems prepared with CHARMM force fields. 

      # Uncomment the line below to use charmm_radii set

      # PBRadii=7,

      /

      &pb

      # radiopt=0 is recommended which means using radii from the prmtop file for both the PB calculation and for the NP

      # calculation

      istrng=0.15, fillratio=4.0, radiopt=0

      /

      The relative rank of the overall peptide binding free energies (Table S1) was consistent with the experimental signaling data, i.e., p21>p9>p17, with p21 showing the most favorable binding free energy (-40.29±6.94 kcal/mol).

      (3) It is not mentioned clearly whether the reader should interpret the free energy landscapes quantitatively or qualitatively. Considering no error analysis or convergence plots are reported for the GaMD free energy surfaces, it may be assumed the results are qualitative. The readers should consider this caveat and not try to quantitatively reproduce these free energy landscapes with other comparable techniques.

      We appreciate the reviewer’s comment whether the free energy landscapes should be interpreted quantitatively or qualitatively. The presented free energy landscapes could be considered semi-quantitative since the simulations are not fully converged. This will be clarified in the revised manuscript as: “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”

      (4) Energy decomposition analysis similar to the following paper (https://pubs.acs.org/doi/10.1021/bi201856m) should be provided to understand the residue level enthalpic contribution in the peptide-protein interaction.

      As suggested by the reviewer, we have performed residue-wise interaction energy analysis and included the analysis results in the revised manuscript and SI.

      Residue-wise interaction energy analysis was performed on peptides p9, p17 and p21 using the trajectory in which the peptide was bound to the PC1 CTF using the gmx_MMPBSA software with the following command line:

      gmx_MMPBSA -O -i mmpbsa.in -cs com.tpr -ct com_traj.xtc -ci index.ndx -cg 3 4 -cp topol.top -o FINAL_RESULTS_MMPBSA.dat -eo FINAL_RESULTS_MMPBSA.csv -do FINAL_DECOMP_MMPBSA.dat -deo FINAL_DECOMP_MMPBSA.csv

      Input file for running residue-wise energy decomposition analysis:

      &general

      sys_name="Decomposition", startframe=1, endframe=200,

      # forcefields="leaprc.protein.ff14SB"

      /

      &gb

      igb=5, saltcon=0.150,

      /

      # make sure to include at least one residue from both the receptor and peptide in the print_res mask of the &decomp section.

      # this requirement is automatically fulfilled when using the within keyword.

      # http://archive.ambermd.org/201308/0075.html

      &decomp

      idecomp=2, dec_verbose=3, print_res="A/854-862 A/1-853",

      /

      Residue-wise energy decomposition analysis allowed us to identify key residues that contributed the most to the peptide binding energies. These included residues T1 and V9 in p9 (Table S2), residues T1, R15 and V17 in p17 (Table S3), and residues P10, P11, P19 and P21 in p21 and residue W3726 in the PC1 CTF (Table S4). The energetic contributions of these residues apparently correlated to the sequence coevolution predicted from the Potts model.
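      Identifying the residues that contribute most to binding amounts to sorting the per-residue energies. The sketch below illustrates that step only; it does not parse gmx_MMPBSA output files, and the function name, residue labels, and energy values are hypothetical placeholders rather than data from Tables S2 to S4.

```python
def top_contributors(per_residue_energy, n=3):
    """Return the n residue labels with the most negative (most favorable) energies."""
    ranked = sorted(per_residue_energy.items(), key=lambda item: item[1])
    return [name for name, _ in ranked[:n]]


# Placeholder per-residue contributions in kcal/mol (not real results).
hypothetical_energies = {
    "RES_A": -1.2,
    "RES_B": -4.5,
    "RES_C": 0.3,
    "RES_D": -2.8,
}
key_residues = top_contributors(hypothetical_energies, n=2)
```

      Residues with positive contributions, such as the placeholder RES_C here, are unfavorable for binding and naturally fall to the bottom of the ranking.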

      (5) To showcase the reliability of the computational approach, the authors should perform the MD simulation studies with one peptide that did not show any significant agonistic effect in the experiment. This will work as a control for the computational protocol and will demonstrate the utility of the pep-GaMD simulation in this work.

      We appreciate the reviewer’s concern about the lack of control with the other peptides that did not induce an agonistic effect. It is difficult for us to add more MD simulations on the other peptides because the student who performed the simulations has left after completing their PhD. However, to address the reviewer’s concern, we have included more details on the study of the stalkless CTF as a control in the revised manuscript.

      (6) To assess the accuracy of the computational results the authors should mention (either in the main text or SI) whether the reported free energy surfaces were the average of the five simulations or computed from one simulation. In the latter case, free energy surfaces computed from the other four simulations should be provided in the SI. In addition, how many binding unbinding events have been observed in each simulation should be mentioned.

      We appreciate the reviewer’s comment regarding convergence of the simulation free energy surfaces. In response to Reviewer 1, we have calculated free energy profiles of individual simulations for each system, including the p9, p17, and p21 (Figs. S5, S6 and S8). 

      “We have calculated free energy profiles of individual simulations for each system, including the p9, p17, and p21 (Figs. S5, S6 and S8). For the p9 peptide, the “Bound” low-energy state was consistently identified in the 2D free energy profile of each individual simulation (Fig. S5). For the p17 peptide, Pep-GaMD simulations were able to refine the peptide conformation from the “Unbound” to the “Intermediate” and “Bound” states in Sim1 and Sim5, while the peptide reached only the “Intermediate” state in the other three simulations (Fig. S6). For the p21 peptide, Pep-GaMD was able to refine the peptide docking conformation to the “Bound” state in all the five individual simulations (Fig. S8).”

      “It is important to note that the free energy profiles calculated from GaMD simulations of PC1 CTF were not fully converged since certain variations were observed among the individual simulations. Nevertheless, these calculations allowed us to identify representative low-energy binding conformations of the peptides.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a useful report of a spatially-extended model to study the complex interactions between immune cells, fibroblasts, and cancer cells, providing insights into how fibroblast activation can influence tumor progression. The model opens up new possibilities for studying fibroblast-driven effects in diverse settings, which is crucial for understanding potential tumor microenvironment manipulations that could enhance immunotherapy efficacy. While the results presented are solid and follow logically from the model’s assumptions, some of these assumptions may require further validation, as they appear to oversimplify certain aspects in light of complex experimental findings, system geometry, and general principles of active matter research.

      We thank the editor for recognizing the usefulness of our work. This work does not aim to precisely describe the complexity of the tumor microenvironment in lung cancer, but rather to classify and rigorously calibrate a minimum number of parameters to the clinical data we collect and generate, and reproduce the global structures of the microenvironment. We identify different scenarios, and show how they depend on the local interactions within this framework. Although we started in the first version with coalescence in the main text and anisotropic geometry in the supporting information, we realized that we needed to provide more directions to better show how our model can be extended. Thus, in Section III-4 we added an analysis of a microenvironment with blood vessels, and showed how to introduce anisotropic friction as a function of fiber orientation, as well as active stress, paving the way for further studies that would make our model more complex. However, as a first step, it is crucial to start with a limited number of parameters that can be rigorously determined, and this is how this first work was conceived.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present an important work where they model some of the complex interactions between immune cells, fibroblasts and cancer cells. The model takes into account the increased ECM production of cancer-associated fibroblasts. These fibres trap the cancer but also protect it from immune system cells. In this way, these fibroblasts’ actions both promote and hinder cancer growth. By exploring different scenarios, the authors can model different cancer fates depending on the parameters regulating cancer cells, immune system cells and fibroblasts. In this way, the model explores non-trivial scenarios. An important weakness of this study is that, though it is inspired by NSCLC tumors, it is restricted to modelling circular tumor lesions and does not explore the formation of ramified tumors, as in NSCLC. In this way, it is only a general model, and it is not clear how it can be adapted to simulate more realistic tumor morphologies.

      We thank the reviewer for highligting the importance of our work. We acknowledge that although we provided anisotropic geometries and the study of the coalescence in the first version, more effort was needed to provide tools to extend our formalism to non-ideal cases. This is now added as Section III-4, where we analyze the impact of blood vessels, and the anisotropic friction due to the nematic order for the fibers; this nematic order can also be used to introduce active nematic stress.

      Reviewer #2 (Public review):

      Summary:

      The authors develop a computational model (and a simplified version thereof) to treat an extremely important issue regarding tumor growth. Specifically, it has been argued that fibroblasts have the ability to support tumor growth by creating physical conditions in the tumor microenvironment that prevent the relevant immune cells from entering into contact with, and ultimately killing, the cancer cells. This inhibition is referred to as immune exclusion. The computational approach follows standard procedures in the formulation of models for mixtures of different material species, adapted to the problem at hand by making a variety of assumptions as to the activity of different types of fibroblasts, namely ”normal” versus ”cancer-associated”. The model itself is relatively complex, but the authors do a convincing job of analyzing possible behaviors and attempting to relate these to experimental observations.

      Strengths:

      As mentioned, the authors do an excellent job of analyzing the behavior of their model both in its full form (which includes spatial variation of the concentrations of the different cellular species) and in its simplified mean field form. The model itself is formulated based on established physical principles, although the extent to which some of these principles apply to active biological systems is not clear (see Weaknesses). The results of the model do offer some significant insights into the critical factors which determine how fibroblasts might affect tumor growth; these insights could lead to new experimental ways of unraveling these complex sets of issues and enhancing immunotherapy.

      We thank the referee for this summary and for recognizing the strengths of our paper.

      Weaknesses:

      Models of the form being studied here rely on a large number of assumptions regarding cellular behavior. Some of these seemed questionable, based on what we have learned about active systems. The problem of T cell infiltration as well as the patterning of the extracellular matrix (ECM) by fibroblasts necessarily involve understanding cell motion and cell interactions due e.g. to cell signaling. Adopting an approach based purely on physical systems driven by free energies alone does not consider the special role that active processes can play, both in motility itself and in the type of self-organization that can occur due to these cell-cell interactions. This to me is the primary weakness of this paper.

      We thank the referee for this comment, which allows us to clarify an important point. Although biological materials are out of equilibrium, their behavior often resembles that dictated by thermodynamics; hence the usefulness of constructing a free energy in terms of state variables. In a first approach to deciphering the complex interactions and describing the different and sometimes non-trivial outcomes in this system, which involves many components, we must start by minimizing the number of parameters and identifying the complex processes that control the evolution of the system. The free energy that we build for this biological system therefore contains out-of-equilibrium processes that can be approximated by a “close to equilibrium” description. Our approach is a classical one in the statistical physics of active systems, namely the effort to construct an equivalent free energy for out-of-equilibrium systems. This allows us to gain a clearer insight into those complex processes.

      We have added a sentence in the main text, section III.1, to clarify this point:

      “Building a free-energy density for a biological material is justified, because, although biological materials are out of equilibrium, their behavior often resembles that dictated by thermodynamics. It is therefore useful to write a free energy in terms of state variables.”

      Nevertheless, we recognize that we should have provided more tools for making our formalism active. This is why we introduced the nematic order of the fibers in Section III-4. This nematic order can be used to introduce active stress, and we have cited previous works by some of us (see [?, ?, ?]) as references for building active processes from it.

      We must also note that cell signaling has been introduced in a minimal way in our system, to provide the cue for the arrival of T-cells and NAFs from the boundaries. However, we found that although the text had mentioned the other role of the chemicals, in the transformation from NAFs to CAFs, the details were not well explained. We have therefore corrected and added some explanations in the introduction of section III and in sections III.1 and III.2.

      A separate weakness concerns the assumption that fibroblasts affect T cell behavior primarily by just making a more dense ECM. There are a number of papers in the cancer literature (see, for some examples, Carstens, J., Correa de Sampaio, P., Yang, D. et al. Spatial computation of intratumoral T cells correlates with survival of patients with pancreatic cancer. Nat Commun 8, 15095 (2017); Sun, Xiujie, Bogang Wu, Huai-Chin Chiang, Hui Deng, Xiaowen Zhang, Wei Xiong, Junquan Liu et al. “Tumour DDR1 promotes collagen fibre alignment to instigate immune exclusion.” Nature 599, no. 7886 (2021): 673-678) that seem to indicate that density alone is not a sufficient indicator of T cell behavior. Instead, the organization of the ECM (for example, its anisotropy) could be playing a much more essential role than is given credit for here. This possibility is hinted at in the Discussion section but deserves much more emphasis.

      The referee is right, and we thank him for raising this issue. We have therefore introduced the anisotropic orientation of the fibers, which induces an anisotropic friction, in a new section III-4. In addition, the references pointed out are now included in this section. However, although the anisotropy strongly influences the fate of the tumor when the fibers are oriented perpendicular to the surface of the cancer nest, it is less effective when the fibroblasts are oriented along the surface of the cancer nest. In the latter case, which is common before cancer cells reshape the tumor microenvironment, the matrix density should correlate with the friction.

      Finally, the mixed version of the model is, from a general perspective, not very different from many other published models treating the ecology of the tumor microenvironment (for a survey, see Arabameri A, Asemani D, Hadjati J (2018), A structural methodology for modeling immune-tumor interactions including pro-and anti-tumor factors for clinical applications. Math Biosci 304:48-61). There are even papers in this literature that specifically investigate effects due to allowing cancer cells to instigate changes in other cells from being tumor-inhibiting to tumor-promoting. This feature occurs not only for fibroblasts but also for example for macrophages which can change their polarization from M1 to M2. There needed to be some more detailed comparison with this existing literature.

      The referee is right that the first part of our approach, namely the dynamical system, may be common in this kind of study, and this needs to be mentioned. We have therefore added the following sentence in the discussion: “This is in line with several similar mathematical models, that study through this lens the inhibition/activation of the immune system by cancer cells either by means of compartmental nonlinear models similar to our dynamical system, for instance regarding macrophage recruitment and cytokine signaling {arabameri2018structural} {li2019computational}, or mixture models {fotso2024mixture}. We combine the two approaches in order to rigorously derive the parameters of the model and gain insights from both.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors should address the following points:

      Major issues

      (1) The shape of tumors simulated differs immensely from the observed tumors in Fig. 2. Here, the tumor is constituted by irregular domains, not dissimilar from domains in phase separating mixtures. The domains simulated are circular. Since the authors are using the space dependent model to model the increase in tumor cells with time in the different scenarios (immune-desert, immune-excluded, immune inflamed), it should explain how non-spherical tumor structures can be observed in these scenarios. The authors introduce tumor coalescence in page 28, however, it is not expected that the structures observed in Fig 2 are the result from different tumors merging and coalescing, because that would result from an unlikely large number of initial mutation events in the same region of the tissue. The authors should explain what mechanisms present in the model can lead to non-spherical forms.

      We agree with the reviewer that real tumors are rarely round, contrary to what our numerics suggest. In fact, only the last figure of our paper, in the supporting information, was appropriate for such a discussion. We are now adding discussions and new figures to better illustrate our spatial model, see Figure 6 and section III-4. The in situ geometry of tumors depends on the shape of the host organ, the diffusive (chemical) or advected species such as T cells and fibroblasts, and on the nutrients. Thus, in our case, only cancer cells are produced locally, but during growth the tumor is strongly constrained by the microenvironment, and thus by the geometry of the domain we model in the numerics and its boundary conditions. This is also true for the chemicals responsible for growth, cellular advection and phenotypic transformation. Their concentration depends on a convection-diffusion equation and boundary conditions. For a tumor in situ, such as in the lung, the available space is a constraint that will dominate the final geometry of the tumor nests. We do not think that coalescence is controlled by mutational events, but most likely by the search for space necessary for growth. Compared to the first version, we add new figures (Figure 6) that show that the geometry of the organ, as well as the localization of blood vessels, are a cause of the irregularity of the tumor shapes. We also introduce orientational order, which, as suggested in section III-4, can induce anisotropic friction and stresses, as well as anisotropic growth. We cite (Ackermann, Joseph, and Martine Ben Amar. “Onsager’s variational principle in proliferating biological tissues, in the presence of activity and anisotropy.” The European Physical Journal Plus 138.12 (2023): 1103.) where we described active stresses and coupling related to anisotropic growth.

      (2) According to the authors, the model presented in equations (1) and onwards simulates the evolution of the fraction of tumor cells in the tissue. However the fraction of tumor cells, for example, depends itself on the variation of other cell types. For example, if fibroblasts were to proliferate with rate alpha, even without tumor cells proliferating, the fraction of tumor cells in the mixture should decrease as alpha times the tumor cells fraction. These terms are missing. The equations do not describe the evolution of the cells’ fractions but of the amount of cells of each type, normalised by the total carrying capacity of non-normal cells in the tissue. The text should be rewritten accordingly.

      We agree with the referee: our definition of cell density was not precise enough and may have appeared misleading. In paragraph II.1, we now explicitly introduce the term mass fraction, which is the correct physical quantity to use in the spatial model.

      “All these cells have the same mass density and the sum of their mass fraction satisfies the relationship S = C + T + F<sub>NA</sub> + F<sub>A</sub> = 1-N, where N is a healthy non active component as healthy cells, for example.”

      It is less intuitive than “number of cells per unit volume” but necessary for the following (III).
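
      The distinction between an amount of cells and a mass fraction can be illustrated with a minimal sketch. The numerical values and the helper name mass_fractions are hypothetical and chosen only for illustration; only the symbols C, T, F_NA, F_A, N and the constraint S = 1 - N come from the text.

```python
def mass_fractions(masses):
    """Return each component's share of the total mass."""
    total = sum(masses.values())
    return {name: m / total for name, m in masses.items()}

# Hypothetical masses in arbitrary units (illustrative values only):
# cancer cells C, T-cells T, non-activated fibroblasts F_NA,
# activated fibroblasts F_A, and the healthy non-active component N.
before = {"C": 2.0, "T": 1.0, "F_NA": 1.0, "F_A": 1.0, "N": 5.0}
# Only the activated fibroblasts grow; the amount of cancer cells is unchanged.
after = dict(before, F_A=2.0)

frac_before = mass_fractions(before)
frac_after = mass_fractions(after)

# S = C + T + F_NA + F_A, which equals 1 - N by construction.
s_before = sum(v for k, v in frac_before.items() if k != "N")

print(frac_before["C"], frac_after["C"])
```

      With the amount of cancer cells held fixed, their mass fraction still decreases when another population grows, which is why the choice between the two quantities matters.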

      (3) The authors start by calculating fixed points of different versions of the dynamical system without spatial dependence. They should explain what is the relevance of these fixed points: in a real situation, where the concentration of tumor fibroblasts and T-cells depend on position, in which conditions are these fixed points relevant?

      The referee is right and we clarify this point: the dynamical analysis helps in understanding and predicting the scenario occurring in the system. After all the steps of paragraph 2.2, we are left with 11 independent parameters for the dynamical system alone, not counting the parameters generated by the spatial modeling itself. Our estimation concerns only lung cancer, and these parameters do not appear in the literature. The parameters introduced in Sec. III, which are more related to physical interactions such as friction, cell-cell adhesion, etc., can be found in the literature or can be estimated and thus measured in in vitro experiments (see Ackermann and Ben Amar, EPJP 2023; P. Benaroch, J. Nikolic et al. 2024, bioRxiv). So what are the fixed points for? They help to get the right numbers for the spatial analysis. To recover special features of cancer evolution, we need a model, but also correct estimates of the data, in a code that is quite technical and heavy, with each simulation taking a certain amount of time. For users who only need rough predictions, the analysis in section 2 is sufficient.

      It is also important to note that the global result depends only on the source terms, and on the boundary conditions. This can be illustrated with a simple example: Consider the governing equation for the density of a component with velocity v and source term:

      Integrating the equation over a fixed volume V of surface S gives:

      This integrated equation can then be approximated by the dynamical system that we write. Thus, while the dynamical system does not give any information about the local structure of the system, it may be indicative of its global outcome.
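
      The balance argument in this response can be sketched in its standard form. The symbols $\phi$ for the density of the component, $\Gamma$ for its source term and $\mathbf{n}$ for the outward unit normal are assumptions introduced here; only $\mathbf{v}$, $V$ and $S$ come from the text.

```latex
% Local balance for a component of density \phi (assumed symbol),
% advected at velocity \mathbf{v}, with source term \Gamma (assumed symbol):
\begin{equation}
  \frac{\partial \phi}{\partial t} + \nabla \cdot (\phi\, \mathbf{v}) = \Gamma .
\end{equation}
% Integrating over a fixed volume V bounded by the surface S,
% and applying the divergence theorem to the advective term:
\begin{equation}
  \frac{d}{dt} \int_V \phi \, dV
  = - \oint_S \phi\, \mathbf{v} \cdot \mathbf{n} \, dS + \int_V \Gamma \, dV .
\end{equation}
```

      In this form the total amount of the component in $V$ changes only through the boundary flux and the integrated source, which is the sense in which a space-independent dynamical system can capture the global outcome.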

      (4)   In page 15, the authors identify that α<sub>NA</sub> is proportional to δ𝝐<sup>4</sup>. However, in equation (7), they replace α<sub>NA</sub> by δ𝝐<sup>4</sup> without the proportionality constant. This should be corrected.

      Thank you for your remark. This typo is now corrected.

      (5) The tumor cell movement should be much slower than the T-cells. Here, the authors assign a similar friction coefficient for the cancer cells and T-cells, for example. However, in lung cancer tumor cells are epithelial, and adhere to each other in the tissue. Their movement is very restricted by the basement membranes and by cell-cell adhesion. Immune cells and T-cells on the other hand move rapidly throughout the stroma. It is a gross simplification to not consider the low epithelial tissue mobility in the context of lung cancer.

      It is possible to assume different friction coefficients for each phase pair. This has been done in a previous publication, Ackermann et al., Physics Reports 2021. It is also possible to play with the cell-cell adhesion in the energy density and with the diffusion coefficient introduced in the Flory-Huggins free energy. Cell-cell adhesion is taken into account in the energy, and this makes the tumor a denser phase, while T-cells can move towards the cancer cells to which they are attracted. In the last part of the paper, we show the role of an anisotropic friction due to a nematic order for activated fibroblasts and all the other cells.

      (6) What is the biological mechanism by which the T-cells form a colony with a surface tension? In the phase-field model, the authors have a surface tension assigned to the cancer cells, T-cells and fibroblasts. Can the authors justify biologically why do they consider these surface tensions?

      T-cells form a colony because they accumulate at the outer boundary of the tumor: they are attracted to it but cannot penetrate, owing to the strong cell-cell adhesion of the tumor cells in the nest. Adding a gradient-square term is standard in continuous models: it limits sharp variations in cell density, which are not physical.
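
      A minimal sketch of the gradient-square regularization mentioned above, written in a generic Cahn-Hilliard-type form. The symbols $\phi$ (mass fraction of a given cell population), $f$ (bulk free-energy density) and $\kappa$ (gradient coefficient) are assumptions and not necessarily the notation of the manuscript.

```latex
% Generic free-energy functional with a gradient-square penalty
% (assumed notation, not taken from the manuscript):
\begin{equation}
  \mathcal{F}[\phi] = \int_V \left[ f(\phi)
  + \frac{\kappa}{2}\, |\nabla \phi|^2 \right] dV ,
  \qquad \kappa > 0 .
\end{equation}
```

      The second term penalizes steep gradients, so the interface between two phases acquires a finite width and an associated effective surface tension, without requiring a dedicated biological mechanism for each population.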

      Minor issues

      (a) Page 6 (end), characterisation of the fibre barrier produced by CAFs missing: what is the fibre density, how it can hinder the spread of cancer and T-cell motility? Is it so dense that it prevents ameboid movement? Can cells move through it using matrix degradation proteins?

      The fiber density corresponds to the fibrous organic extracellular matrix secreted by cancer-associated fibroblasts. In desmoplastic (highly fibrous) tumors such as PDAC or NSCLC, this extracellular matrix deposited around the tumor forms a physical barrier around the tumor nest, preventing cell migration as well as capillary and immune cell penetration. In these cases, the fibrous belt actually prevents ameboid movement and cells must deform significantly to migrate. The role of this barrier was particularly demonstrated in the reference (Grout, John A., et al. “Spatial positioning and matrix programs of cancer-associated fibroblasts promote T-cell exclusion in human lung tumors.” Cancer Discovery 12.11 (2022): 2606-2625.). In later stages of cancer, the tumor may adapt and develop strategies to metastasize, such as matrix degradation. This matrix can be oriented, organized or disordered. To build a minimal model, we first considered an isotropic friction and then an anisotropic friction of the nematic belt, due to the activated fibroblasts. In the case of T-cells, as mentioned in section I.1, it is true that the biological literature also considers a phenotypic transformation of the T cells by the activated fibroblasts: this concerns their proliferative capacities, their antigen recognition and their cytotoxic function. To better document the different mechanisms, we add the following publication: Cancer associated fibroblasts-an impediment to effective anti-cancer T cell immunity, by Koppensteiner, Lilian and Mathieson, Layla and O’Connor, Richard A and Akram, Ahsan R, Frontiers in immunology (2022).

      However, our goal is to build a minimal model and to characterize and quantify the physical process in which CAFs are involved, namely the role of a physical barrier, as documented above.

      (b) Page 19 (Fig 3), in the figure legend it is written ”resting fibroblasts”, should be ”non-activated fibroblasts”.

      The referee is right: it will be better to write non-activated fibroblasts. This is now changed in the main text.

      (c) Page 21 (equation), what is dΩ? It is dr?

      We thank the referee for raising this point. The text was indeed ambiguous as sometimes dΩ was replaced by dr. To be clearer, all the elements of volume are now noted dV , and the element of surface of the system are noted dS.

      In the article the units are in italic and should be in roman.

      Thank you for raising this point. It has been corrected.

      (d) Page 25 (beginning section III.3), the authors mention that the simulation is 2D, however, the simulation has radial symmetry. A 1D simulation in radial coordinates could simulate a 3D spherical system. Is the simulation of this section equivalent to a 1D radial simulation (in 2D)?

      The referee is right that in radial symmetry, a 1d equation may be written. We therefore present numerics with irregular shapes of the tumor nest in order to make the system fully 2d.

      (e) Page 26 (Fig 4). Legends inside the plots of plates A, B, C and D are not clear. Colorbar range of plates A and D is different. Would facilitate if the ranges were the same.

      The referee is right: the surface plots presented in Figure 4 would be easier to compare with the same colorbar range. As the referee noted, panels A, B and C share the same range, while panel D has a different one. This is because D represents the immune-inflamed tumor, where the cancer mass fraction nearly vanishes, resulting in values three orders of magnitude lower than those in A, B and C. These values would therefore be invisible if the colorbar range were the same as in the other panels. In the new version, we emphasize this change of scale in the legend of Figure 4.

      (f) Page 29 (Fig 5), would facilitate if the order of immune-desert, immune-excluded, immune-inflamed was maintained throughout the document. In this figure the immune-inflamed case appears first.

      We agree with the reviewer that following the same order in which the different cases are presented throughout the manuscript would be helpful in comparing the different figures. Therefore, we have modified Figure 5.

      (g) Page 31, the authors indicate that pharmacodynamics and pharmacokinetics are highly dependent on tumour spatial structure. Can they provide examples and citations?

      In the discussion, we have added references concerning pharmacodynamics.

      (h) Page 33 (Fig Sup2), would facilitate if the order of immune-desert, immune-excluded, immune-inflamed was maintained throughout the document.

      We thank the reviewer for pointing this out; the order of the different scenarios in Fig Sup 2 has now been changed.

      Reviewer #2 (Recommendations for the authors):

      Major points

      (1) Following on from the discussion in the public review, I feel that there are a number of critical issues that need to be addressed regarding modeling assumptions. I would like to understand why the authors believe it is possible to use a free energy-driven model of the microenvironment when many of the processes relevant for their study have an undeniably ”active media” flavor.

      The referee is right that processes in biology are active processes. However, it is a classical approach to model physical interactions between biological components, especially cell adhesion, with a free energy, as they often lead to quasi-stationary, equilibrium-like patterns. The free-energy approach also has the advantage of allowing complex phenomena involving many components to be derived straightforwardly. Activity can indeed be introduced in such a framework, if we know that the fibroblasts transform into myo-fibroblasts; see for example our previous publication Ackermann and Ben Amar, EPJP 2023. However, in the interest of simplification and of reducing the number of free parameters, we have not considered further complications of the model here, as a minimal model allows us to distinguish the main processes that occur. Nevertheless, introducing activity more precisely, within the nematic approach already used for the friction, is a natural continuation of our work: see the new Section III-4, where we introduce the nematic order and indicate that active nematic stresses can be written from it.

      Next, I don’t understand the assumption that T cells do not proliferate once they detect neoantigens on the cancer cells; activation of T cells usually causes them to become more proliferative.

      We thank the referee for this question. The T-cell fraction has two origins: proliferation of T-cells in situ, in the stroma or inside the tumor nest, and external arrival from the sources, which is the origin we favor. We recognize that a full analysis of the tumor microenvironment would require considering proliferation near the tumor, as well as many other processes, which is doable but requires more biological data. In addition, proliferation of T-cells is equivalent to an increase in their killing abilities, and these two effects overlap in our approach.

      In order to clarify this point, we modify the following sentence in Section II.2:

      “Although proliferation of cytotoxic T-cells has been observed, we do not consider explicitly proliferation in our study as we focus on their ability to infiltrate the tumor.”

      Rather, we consider that T-cells proliferate outside the domain boundaries, so that this proliferation is included in the boundary source contributions.

      Finally, the issue of whether the density of fibers is sufficient to understand the role of fibroblasts is not at all settled. There should be a full discussion of this issue including mentioning of the Nature paper (cited in the public review) that argues that orientation (and not density) is the key to the role of fibers, as well as the earlier cited work of Kalluri and collaborators on the role of ECM density in pancreatic cancer.

      We thank the referee for this remark. As we wrote above in the response to the public review, we introduced significant additions that aim to tackle this question in the article.

      (2) The authors present a picture of a tumor cell with fibroblasts apparently arrayed circumferentially around the tumor boundary and therefore blocking infiltration. This type of tumor structure has been seen before, for example in ”On the mechanism of long-range orientational order of fibroblasts.” Proceedings of the National Academy of Sciences 114, no. 34 (2017): 8974-8979, which should be cited. More importantly, in that paper the argument is made that positive feedback between fibroblasts and ECM geometry can cause structures like this to form. If this is indeed what is occurring, this would indicate the crucial importance of a mechanism beyond what is contained in the current model. This issue should therefore be discussed within this paper. This issue is of course connected to the previous point regarding the role of ECM structure beyond density.

      We completely agree that the interplay between the fibroblast layer and the tumor shapes the tumor boundary. One of the authors has worked recently on this precise topic (Aging and freezing of active nematic dynamics of cancer-associated fibroblasts by fibronectin matrix remodeling, C Jacques, J Ackermann, S Bell, C Hallopeau, CP Gonzalez, ... bioRxiv, 2023.11.22.568216; Ordering, spontaneous flows and aging in active fluids depositing tracks, S Bell, J Ackermann, A Maitra, R Voituriez, arXiv preprint arXiv:2409.05195). Since the fibroblast layer is an active material, it contributes an anisotropic stress that can be introduced into the model. Our first strategy was to present the simplest model in order to focus on the most important interactions, such as cell-cell adhesion and cell-tissue adhesion. However, we recognize that these questions should be discussed in the text, and we discuss them in the new section III-4.

      Minor points

      There are also a number of more minor points to consider:

      (1) Since the parameter is taken to be O(1), why exactly does it matter how the other parameters scale with it?

      It is very important to compare the orders of magnitude of the other parameters, given that the selected parameter of order O(1) is the driving parameter of the coupling. This gives a first picture of the main interactions that have to be considered.

      (2) I didn’t understand the relevance of referring specifically to IL 6 among many other possibly relevant signals, as is currently done on page 7.

      This corresponds to studies aiming to correlate lung cancer risks with the concentration of interleukins, mostly IL6 and IL8 (McKeown, D. J., et al. “The relationship between circulating concentrations of C-reactive protein, inflammatory cytokines and cytokine receptors in patients with non-small-cell lung cancer.” British journal of cancer 91.12 (2004): 1993-1995; Brenner, Darren R., et al. “Inflammatory cytokines and lung cancer risk in 3 prospective studies.” American journal of epidemiology 185.2 (2017): 86-95). In the absence of very detailed biological information, however, the modeling and its results are not modified if other chemicals intervene. We slightly modified the following phrase in section I.1:

      “In particular, in the family of inflammatory proteins, also called cytokines, Interleukin-6 (IL6) and (IL8) seem, among others, to stimulate the infiltration of CD8<sup>+</sup>.”

      (3) The authors need to mention the possibility of T-cell chemotaxis to the tumor being “self-amplified” in the T cell system, as put forth in Galeano Niño, Jorge Luis, Sophie V. Pageon, Szun S. Tay, Feyza Colakoglu, Daryan Kempe, Jack Hywood, Jessica K. Mazalo et al. “Cytotoxic T cells swarm by homotypic chemokine signalling.” eLife 9 (2020): e56554. This might again reveal a needed extension of the current modelling strategy.

      We thank the referee for his/her comment on the self-amplification of the T-cell population in the stroma, and we mention the indicated reference in our paper. This auto-chemotactic process, which induces more efficient recruitment towards the tumor, may be important for immunotherapy. A more efficient arrival of T-cells at the site of the tumor should lead to a better outcome for the patient, if the swarming organization is maintained in a desmoplastic nematic stroma.

      (4) It is not obvious to me that in sub figures 3F and 3H the tumor is enroute to being totally eradicated, as is stated in the text. The blue lines seemed to asymptote at non-zero population values.

      Looking at sub-figures 3F and 3H, we stated in the main text that the tumor is eradicated because the corresponding population approaches a zero fraction, or at least decays to values close to 0 (0.01/0.05 to be more precise). This is even more evident when compared with the other cases, where the tumor mass fraction reaches values of a higher order (up to 0.6), which leads us to distinguish between these different scenarios.

      (5) The description of the interaction of cells with fibers as being increased friction might be misleading, as the real effect could be actual trapping in the network (as opposed to just slowing down the motion).

      We thank the referee for this question, as it allows us to make an important distinction. Indeed, what the referee describes seems to correspond to a discrete event, namely a cell trapped in a network. However, coarse-graining the dynamics to a continuous model leads, in our view, to an effective friction between the two phases. Moreover, we have now also introduced an anisotropic friction, which can represent trapping. The velocities are not only directed around the tumor but can also be oriented towards the tumor, so that eventually the friction along the radius mimics trapping (see Fig.4 on top). We have introduced this anisotropic friction via a nematic model; see the appendix.
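
      How an anisotropic friction can mimic trapping may be illustrated with a minimal sketch. It uses a generic uniaxial friction tensor built from a fiber director n, which is an assumption and not necessarily the nematic model of the appendix; the function names friction_tensor and velocity and the coefficients xi_par and xi_perp are hypothetical.

```python
import math

def friction_tensor(theta, xi_par, xi_perp):
    """2x2 friction tensor xi_perp*I + (xi_par - xi_perp) n n^T for
    fibers oriented at angle theta, with director n = (cos, sin)."""
    nx, ny = math.cos(theta), math.sin(theta)
    d = xi_par - xi_perp
    return [[xi_perp + d * nx * nx, d * nx * ny],
            [d * nx * ny, xi_perp + d * ny * ny]]

def velocity(force, xi):
    """Solve the overdamped force balance xi . v = force for v."""
    (a, b), (c, d) = xi
    det = a * d - b * c
    fx, fy = force
    return ((d * fx - b * fy) / det, (a * fy - c * fx) / det)

# Hypothetical coefficients: motion along the fibers is ten times easier
# than motion across them. The fibers lie along y, i.e. tangent to the
# nest surface at a point located on the x axis.
xi = friction_tensor(math.pi / 2, xi_par=1.0, xi_perp=10.0)

v_radial = velocity((1.0, 0.0), xi)      # unit force across the fibers
v_tangential = velocity((0.0, 1.0), xi)  # unit force along the fibers

print(v_radial, v_tangential)
```

      For the same driving force, the radial velocity is smaller than the tangential one by the ratio xi_par / xi_perp, so a large friction across circumferential fibers acts as an effective trap in a continuous description.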

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Authors showed the presence of Mtb in human liver biopsy samples of TB patient and reported that chronic infection of Mtb causes immune-metabolic dysregulation. Authors showed that Mtb replicates in hepatocytes in a lipid rich environment created by up regulating transcription factor PPARγ. Authors also reported that Mtb protects itself from anti-TB drugs by inducing drug metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infect macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patient. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. Authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The comparison of gene expression data between macrophages and hepatocytes by authors is important which indicates that Mtb modulates different pathways in different cell type as in macrophages it is related to immune response whereas, in hepatocytes it is related to metabolic pathways.

      Authors also reported that Mtb residing in hepatocytes showed drug tolerance phenotype due to up regulation of enzymes involved in drug metabolism and showed that cytochrome P450 monooxygenase that metabolize rifampicin and NAT2 gene responsible for N-acetylation of isoniazid were up regulated in Mtb infected cells.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients especially in immune-compromised patients, therefore finding granuloma in human liver biopsy samples is not surprising.

      Mtb infected hepatic cells showed induced DME and NAT, and this could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells are exposed to a reduced drug concentration and show higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to the drug is possible only when the DME of the Mtb inside is upregulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host intrinsic factors rather than Mtb efflux pumps. It may be better if the drug tolerant phenotype section is rewritten to clarify the facts.

      In the revised manuscript, by immune-staining authors convincingly showed that hepatocytes are a favourable niche for replication of MTb.

      Authors have rewritten the drug tolerant phenotype section which reads better.

      Overall, this paper has new and important information on how MTb establishes a favourable niche for growth in hepatocytes and creates a drug tolerant environment.

      We thank the reviewer for the thorough and insightful review.

      Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types, including both liver cells and macrophages. It is also known that infection affects PPARγ expression and activity in hepatocytes, and that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study is along similar lines for M.tb infection. As the liver is the main site for lipid regulation, the availability of lipid resources is greater and the replication rate is higher. In short, the observations from the study confirm the earlier studies with these additional cell types. It is known that the higher the lipid content, the greater the number of lipid droplet-positive Mtb and the higher the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

      Comments on revised version:

      The authors noted that even in experiments where mice were infected with lower CFUs, the presence of Mtb colonies could still be detected in the liver. It would be beneficial to include some experimental data related to this in the supplementary information, as it could provide valuable insights for the research field.

      We thank the reviewer for the in-depth evaluation of our manuscript. As suggested, we will include the data where Mtb was detected in the liver at low CFUs.

      Reviewer #3 (Public review):

      In this revised manuscript, the authors explore how Mtb can infect hepatocytes and create a favorable niche associated with upregulation of the transcription factor PPARγ which presumably allows the bacteria to scavenge lipids from lipid droplets in host cells and upregulate drug-metabolizing enzymes to protect against its elimination. In response to the review, the authors have performed some additional immunostaining of hepatocytes, added more detail to figure legends, added experiments somewhat showing improved colocalization and staining, clarified several points and paragraphs, and updated the referenced literature and discussion.

      The current manuscript provides evidence that human miliary TB patients have infection of hepatocytes with Mtb, with evidence that the bacteria survive at least partially through upregulation of PPARγ, which significantly changes the lipid milieu of the cells. There is also an examination of transcriptomics and lipid metabolism in response to Mtb infection, as well as drug tolerance of Mtb inside hepatocytes. The current manuscript is an improvement over the previous one.

      However, although the manuscript is improved, tissue immunophenotyping of the various cells in the liver remains weak and unconvincing. This is truly a missed opportunity and lessens the rigor of the central findings and conclusions. As pointed out by another reviewer, literature has described different fates of Mtb in the liver. Given the tissue available to the authors, carefully dissecting the various cells that the bacteria are in (esp. hepatocytes versus Kupffer cells) is critical. The authors use only 2 generic markers and do not distinguish among cell types within the tissue slices. A review of the literature shows a variety of both human and mouse antibody markers. In fact, a liver atlas based on immunophenotyping has been published. Likewise, the authors comment on liver granulomas, but this is not justified without immunophenotyping.

      We would like to thank the reviewer for the in-depth and detailed suggestions. We would like to clarify that the primary aim of our study was to determine the localization of Mtb within hepatocytes and the downstream biological consequences. To this end, we employed two well-established and widely validated markers (ASPGR1 and albumin) that are consistently used to identify hepatocytes in both human and murine liver tissue. While we acknowledge the broader potential of comprehensive immunophenotyping, our focused approach was designed to specifically address the question of hepatocyte involvement, which the selected markers effectively support, as also noted by Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In my opinion this paper contains important information and no further information is required for this manuscript.

      We thank the reviewer for the insightful comments.

      Reviewer #2 (Recommendations for the authors):

      The authors noted that even in experiments where mice were infected with lower CFUs, the presence of Mtb colonies could still be detected in the liver. It would be beneficial to include some experimental data related to this in the supplementary information, as it could provide valuable insights for the research field.

      As suggested, we will include the data with the low CFUs in the updated manuscript.

      Reviewer #3 (Recommendations for the authors):

      • Line 340, the fact that PPARγ inhibition decreases bacterial load should not be surprising, as the authors cite several papers where this is already shown.

      • Line 379, the increased tolerance of Mtb to drugs in hepatocytes is only significant at the lower 2 concentrations, not at 5 ug/mL.

      • Fig S4F-H, the y axis is inappropriately not set to zero on the lower limit.

      • Fig S9B, the Y-axis states "relative" CFU, but there is no indication what the bars are normalized to, and the numbers are much more typical of standard CFU values. Was the "Relative" part left in by mistake?

      • Double check the ending of the figure legend for Figure S10 and S11.

      • Line 352, phenomenom [sic] is misspelled.

      • On re-read, several sentences throughout this manuscript need improvement regarding structure and grammar. I suggest careful editorial review.

      We thank the reviewer for pointing out these issues; they will be carefully corrected in the next version.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors showed the presence of Mtb in human liver biopsy samples of TB patients and reported that chronic infection of Mtb causes immune-metabolic dysregulation. Authors showed that Mtb replicates in hepatocytes in a lipid rich environment created by up regulating transcription factor PPARγ. Authors also reported that Mtb protects itself from anti-TB drugs by inducing drug metabolising enzymes.

      Strengths:

      It has been shown that Mtb induces storage of triacylglycerol in macrophages by induction of WNT6/ACC2 which helps in its replication and intracellular survival, however, creation of favorable replicative niche in hepatocytes by Mtb is not reported. It is known that Mtb infects macrophages and induces formation of lipid-laden foamy macrophages which eventually causes tissue destruction in TB patients. In a recent article it has been reported that "A terpene nucleoside from M. tuberculosis induces lysosomal lipid storage in foamy macrophages" that shows how Mtb manipulates host defense mechanisms for its survival. In this manuscript, authors reported the enhancement of lipid droplets in Mtb infected hepatocytes and convincingly showed that fatty acid synthesis and triacylglycerol formation is important for growth of Mtb in hepatocytes. The authors also showed the molecular mechanism for accumulation of lipid and showed that the transcription factor associated with lipid biogenesis, PPARγ and adipogenic genes were upregulated in Mtb infected cells.

      The comparison of gene expression data between macrophages and hepatocytes by authors is important which indicates that Mtb modulates different pathways in different cell type as in macrophages it is related to immune response whereas, in hepatocytes it is related to metabolic pathways.

      Authors also reported that Mtb residing in hepatocytes showed drug tolerance phenotype due to up regulation of enzymes involved in drug metabolism and showed that cytochrome P450 monooxygenase that metabolize rifampicin and NAT2 gene responsible for N-acetylation of isoniazid were up regulated in Mtb infected cells.

      We thank the reviewer for the positive feedback and for highlighting the strengths of our study.

      Weaknesses:

      There are reports of hepatic tuberculosis in pulmonary TB patients especially in immune-compromised patients, therefore finding granuloma in human liver biopsy samples is not surprising.

      Mtb infected hepatic cells showed induced DME and NAT, and this could lead to enhanced metabolism of the drug by hepatic cells; as a result, Mtb inside HepG2 cells are exposed to a reduced drug concentration and show higher tolerance to the drug. The authors mentioned that "hepatocyte resident Mtb may display higher tolerance to rifampicin". In my opinion, higher tolerance to drugs is possible only when the DME of the Mtb inside is upregulated or the target is modified. However, in the end the authors mentioned that the drug tolerance phenotype can be better attributed to host intrinsic factors rather than Mtb efflux pumps. It may be better if the drug tolerant phenotype section is rewritten to clarify the facts.

      We agree that several case studies of liver infection in pulmonary TB patients have been reported in the literature; however, this report is the first comprehensive study that establishes hepatocytes as a favourable niche for Mtb survival and growth.

      Drug tolerance is a phenomenon exhibited by the bacteria and, during host-pathogen interactions, can be influenced by both intrinsic (bacterial) and extrinsic (host-mediated) factors. Multiple examples of tolerance being attributed to host-driven factors can be found in the literature (PMID: 32546788, PMID: 28659799, PMID: 32846197). Our studies demonstrate that Mtb infected hepatocytes create a drug tolerant environment by modulating the expression of drug modifying enzymes (DMEs) in the hepatocytes.

      As suggested by the reviewer we will rewrite the drug tolerant phenotype section.

      Reviewer #2 (Public review):

      The manuscript by Sarkar et al has demonstrated the infection of liver cells/hepatocytes with Mtb and the significance of liver cells in the replication of Mtb by reprogramming lipid metabolism during tuberculosis. Besides, the present study shows that similar to Mtb infection of macrophages (reviewed in Chen et al., 2024; Toobian et al., 2021), Mtb infects liver cells but with a greater multiplication owing to consumption of enhanced lipid resources mediated by PPARg that could be cleared by its inhibitors. The strength of the study lies in the clinical evaluation of the presence of Mtb in human autopsied liver samples from individuals with miliary tuberculosis and the presence of a clear granuloma-like structure. The interesting observation is of granuloma-like structure in liver which prompts further investigations in the field.

      The modulation of lipid synthesis during Mtb infection, such as PPARg upregulation, appears generic to different cell types, including both liver cells and macrophages. It is also known that infection affects PPARγ expression and activity in hepatocytes, and that this can lead to lipid droplet accumulation in the liver and the development of fatty liver disease (as shown for HCV). This study is along similar lines for M.tb infection. As the liver is the main site for lipid regulation, the availability of lipid resources is greater and the replication rate is higher. In short, the observations from the study confirm the earlier studies with these additional cell types. It is known that the higher the lipid content, the greater the number of lipid droplet-positive Mtb and the higher the drug resistance (Mekonnen et al., 2021). The DMEs of liver cells add further to the phenotype.

      We thank the reviewer for emphasizing the strengths of our study and how it can lead to further investigations in the field.

      Reviewer #3 (Public review):

      This manuscript by Sarkar et al. examines the infection of the liver and hepatocytes during M. tuberculosis infection. They demonstrate that aerosol infection of mice and guinea pigs leads to appreciable infection of the liver as well as the lung. Transcriptomic analysis of HepG2 cells showed differential regulation of metabolic pathways including fatty acid metabolic processing. Hepatocyte infection is assisted by fatty acid synthesis in the liver and inhibiting this caused reduced Mtb growth. The nuclear receptor PPARg was upregulated by Mtb infection and inhibition or agonism of its activity caused a reduction or increase in Mtb growth, respectively, supporting data published elsewhere about the role of PPARg in lung macrophage Mtb infection. Finally, the authors show that Mtb infection of hepatocytes can cause upregulation of enzymes that metabolize antibiotics, resulting in increased tolerance of these drugs by Mtb in the liver.

      Overall, this is an interesting paper on an area of TB research where we lack understanding. However, some additions to the experiments and figures are needed to improve the rigor of the paper and further support the findings. Most importantly, although the authors show that Mtb can infect hepatocytes in vitro, they fail to describe how bacteria get from the lungs to the liver in an aerosolized infection. They also claim that "PPARg activation resulting in lipid droplets formation by Mtb might be a mechanism of prolonging survival within hepatocytes" but do not show a direct interaction between PPARg activation and lipid droplet formation and lipid metabolism, only that PPARg promotes Mtb growth. Thus, the correlations with PPARg appear to be there but causation, implied in the abstract and discussion, is not proven.

      The human photomicrographs are important and overall, well done (lung and liver from the same individuals is excellent). However, in lines 120-121, the authors comment on the absence of studies on the precise involvement of different cells in the liver. In this study there is no attempt to immunophenotype the nature of the cells harboring Mtb in these samples (esp. hepatocytes). Proving that hepatocytes specifically harbor the bacteria in these human samples would add significant rigor to the conclusions made.

      We thank the reviewer for nicely summarizing our manuscript.

      Our study establishes the involvement of liver and hepatocytes in pulmonary TB infection in mice. Understanding the mechanism of bacterial dissemination from the lung to the liver in aerosol infections demands a detailed separate study.

      Figures 6E and 6F show how a PPARγ agonist and antagonist modulate (increase and decrease, respectively) bacterial growth in hepatocytes (further supported by the CFU data in Supplementary Figure 9B). Likewise, the number of lipid droplets in hepatocytes increases with PPARγ agonist treatment and decreases with PPARγ antagonist treatment, as shown in Figures 6G and 6H. Collectively, these studies provide strong evidence that PPARγ activation leads to more lipid droplets, which support better Mtb growth.

      We thank the reviewer for finding our human photomicrographs convincing. In the manuscript, we provide evidence for the direct involvement of hepatocytes (and the liver) in Mtb infection. We have performed detailed immunophenotyping of hepatocytes in the mouse model with ASPGR1 (asialoglycoprotein receptor 1), and in the revised version of record we have further stained the infected hepatocytes with an anti-albumin antibody.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In my opinion drug tolerant phenotype section should be rewritten for better clarification. The manuscript contains important information about hepatic tuberculosis which are not reported yet.

      We have rewritten the drug tolerant phenotype section for better clarity.

      We appreciate the reviewer’s comments regarding the important information about hepatic tuberculosis.

      Reviewer #2 (Recommendations for the authors):

      The following are some observations and comments on the manuscript.

      (1) The study delves into the mechanisms related to hepatic TB/miliary TB; however, the introduction and discussion only describe and discuss the data in the context of pulmonary TB giving a sense that the mandate of the MS is the exploration of the role of liver cells in pulmonary TB. There appears a gap in the connection of findings from the Miliary TB to the pulmonary TB. A discussion of the conversion of pulmonary TB to extrapulmonary /hepatic TB in the light of the findings may be helpful.

      We have modified the discussion section to include possible mechanisms that convert pulmonary TB to hepatic TB in the light of our findings. Briefly, pulmonary tuberculosis (TB) can lead to miliary TB, probably through hematogenous dissemination, in which Mtb spreads from the infected lungs into blood vessels from a primary lung focus, reactivated TB, or caseous necrosis. Once in the blood vessels, the bacteria seed multiple organs, forming tiny granulomas characteristic of miliary TB. Liver involvement could occur either through direct hematogenous spread or through extrusion from nearby infected lymph nodes, leading to hepatic TB, which presents with granulomas and liver dysfunction. This spread underscores the severity of untreated pulmonary TB and the need for early intervention. Our in vivo infection data clearly show that pulmonary infection of mice and guinea pigs with Mtb steadily leads to significant infection of the liver and to metabolic abnormalities in the liver. The study further highlights the need for systematic studies to better understand the route and mode of dissemination from lungs to liver, both for a better pathophysiological understanding of the disease and for identifying new therapeutic targets.

      (2) The authors show the presence of Mtb in the liver autopsies of miliary tuberculosis patients. It is well known that Mtb disseminates during the late stages to several organs and liver is a major site (Sharma et al. 2005; 10.1016/S1473-3099(05)70163-8). Other clinical observations also point to the fact that although Mtb infects liver cells, it is cleared (Thandi et al., 2018, https://doi.org/10.4049/jimmunol.200.Supp.173.20). As the samples are from miliary TB, it is expected that the bacterial load must have been very high before spreading to blood. It is known that once in blood, M.tb is expected to spread to various organs, especially highly vascular ones. Were any other tissues (especially with high vasculature) stained and verified? If yes, add to the supplementary data or discuss.

      Other tissues were not collected and stained during this study. Studies are currently underway to understand whether other highly vascularized organs also harbour Mtb. In addition, several studies have shown that Mtb can infect a wide range of organs, such as the brain, kidney, and bone marrow (PMID: 33142108, PMID: 28046053, PMID: 34269789), during miliary conditions.

      (3) It is not evident from this paper whether hepatic infiltration occurs in pulmonary TB patients. It may therefore be important to discuss the status of liver infections in the primary pulmonary infection.

      Based on the available data from human biopsied liver samples, there is an indication of liver involvement in systemic tuberculosis (TB). However, to gain a more comprehensive understanding of hepatic infiltration in pulmonary TB patients, it is essential to conduct well-organized clinical studies. These studies should specifically target pulmonary TB patients and explore the extent and nature of liver involvement in these individuals. As suggested by the reviewer, this point is now addressed in the discussion.

      (4) Similarly, in the mouse model, M.tb was shown to localize to the liver when an aerosol infection was given. Were any other tissues, such as kidney, bone marrow, etc., checked? Is it because of the high dose of M.tb compared with the standard challenge dose of 50-100 CFU? Further, since the study in the mouse model is meant to mimic miliary tuberculosis of the liver, did the dissemination occur via the bloodstream, and could mycobacteremia be observed in infected mice?

      Studies are currently underway to understand the involvement of other organs, such as kidney, brain, and bone marrow, in the aerosol infection mouse model and how dissemination to those distant organs occurs.

      The focus of the current study was to understand the role of liver in systemic tuberculosis with emphasis on hepatocytes as a key cell type to be infected. We have also conducted the experiments with lower CFUs and could detect the presence of Mtb colonies in liver, so we do not think that the infection of liver is dependent on the dose of infection.

      (5) There are studies in mouse model which infer that liver carried the lowest bacterial burden, was cleared the fastest, and it is established that as compared to sites persistently seeded by M. tuberculosis, in the liver the bacteria rarely infect cell types other than professional phagocytes. As the observations in this study are contrasting, the discussion section should include a critical comparative analysis to justify why in the conditions used in the study, the hepatocytes and not Kupffer cells are infected. Other than the morphological description to indicate M.tb infection of hepatocytes in the liver section (fig 1E), it will be good to show localization of M.tb specifically to hepatocytes by using hepatocyte specific marker. Unlike as reported, why was a clearance of M.tb not observed even after 10 weeks (figure 2B).

      While some studies show that Mtb is cleared rapidly from the liver, several other studies report that the liver harbours Mtb even after 10 weeks post-infection (PMID: 22359543, PMID: 21533158, PMID: 29242198). We have consistently observed Mtb infection of the liver beyond week 10 in our infection model.

      We have performed detailed immunophenotyping of hepatocytes in the mouse model with ASPGR1 (asialoglycoprotein receptor 1), and in the revised version of record we have further stained the isolated hepatocytes with an anti-albumin antibody (albumin is a robust marker of hepatocyte identity) and have shown the presence of Mtb in them. The data have been included in the revised manuscript (Fig 2J).

      (6) While the results section mentions "individuals with miliary tuberculosis" (line 107), the legend of Figure 1 reads "Presence of Mtb in human pulmonary tuberculosis patients". This is confusing. Please clarify.

      We thank the reviewer for pointing this out. We have changed the figure legends to miliary tuberculosis, as most of the liver biopsy samples were obtained from miliary tuberculosis patients.

      (7) Supplementary Figure 2D: Corresponding control panel (uninfected) should be added, which will also verify the specificity of Ag85b. As it is known that Ag85B is secreted out from the bacteria and hence the detected signals may not confirm that Mtb is in hepatocytes. Ag85B per bacterium decreases by almost 10,000-fold at later stages of infection because of secretion (Ernst JD, Cornelius A, et al 2019 mBio). In Supl figure 2D, Ag85b signal seems to be present everywhere inside the cells. Hence, it is important that the control panel be added.

      We have included a control image below, which shows no staining of Ag85B in the uninfected sample. While we acknowledge the reviewer’s comment, Ag85B has been consistently used as a marker for Mtb presence in multiple studies. Nargan et al. use Ag85B-based staining to characterize infection in both pulmonary and EPTB samples (PMID: 38880068). Jain et al. use Ag85B to characterize Mtb infection of mesenchymal stem cells in lung biopsy samples of pulmonary TB patients (PMID: 32546788).

      Author response image 1.

      Ag85B staining in uninfected mice shows no signals

      (8) The kinetics experiments in Figure 3D-3G should have used time-lapse microscopy of a few of the infected cells, or the data should be represented as CFU. If we consider that the doubling time of H37Rv is about 22 h to 24 h, the data showing that MFI increases dramatically from 5 HPI to 120 HPI give the impression that the bacterial number inside the cells increased faster than its doubling time would allow.

      We have added the modified plot. As suggested, the CFU of Mtb within HepG2, PHCs, THP-1, RAW 264.7 and BMDMs have been included in the revised version (Supplementary Figure 4 D-H)
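
      The reviewer's concern in point (8) rests on a simple bound, which can be sketched as follows. This is only an illustration, assuming unrestricted exponential growth and the 22-24 h doubling time quoted in the comment; the function name is hypothetical and no data from the study are used. Under these assumptions, the bacterial number can rise at most roughly 28- to 37-fold between 5 and 120 HPI, so a larger increase in MFI would not reflect replication alone.

```python
def max_fold_increase(elapsed_h: float, doubling_time_h: float) -> float:
    """Upper bound on fold increase under unrestricted exponential growth."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (elapsed_h / doubling_time_h)


# Interval between the first (5 HPI) and last (120 HPI) time points.
elapsed = 120 - 5

# Doubling times taken from the reviewer's comment (assumed, not measured here).
for doubling_time in (22.0, 24.0):
    bound = max_fold_increase(elapsed, doubling_time)
    print(f"doubling time {doubling_time:.0f} h: at most {bound:.1f}-fold")
```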

      (9) What is the effect of C45 and T863 on Mtb growth in vitro? A direct effect of C45 and T863 on Mtb growth in vitro should be tested and ruled out. Is the representative image in Figure 5F from the DMSO or the C45 treated cells panel? Please specify.

      As per the reviewer’s suggestion, we tested the effect of C45 (30 µM) and T863 (25 µM) on Mtb growth in vitro and did not find any difference in the growth kinetics. The representative image in Figure 5F shows DMSO treated cells.

      Author response image 2.

      Growth kinetics of Mtb in 7H9 medium with DMSO, C75 and T863

      (10) Supplementary Figure 6B: Correct the Y-axis label from mRNA levels to Fold change (normalised to control). Please do similar changes wherever required.

      We have made the necessary changes as per the suggestion of the reviewer.

      (11) Figure 7B and 7C: How was the normalization performed? Is the data normalized to the number of bacteria that entered the specific cell type or was normalized at 48hrs with respect to DMSO? DMSO alone data should be shown.

      In the drug tolerance assays, we calculated the ratio of the bacterial burden in hepatocytes treated with drugs to that in hepatocytes treated with DMSO. Cells were infected for 48 hours, after which the infected cells were treated with the mentioned concentrations of isoniazid and rifampicin for 24 hours. CFU enumeration was conducted after this 24-hour treatment. Figure 7A gives a schematic of the experimental set-up.

      % tolerant bacterial population = (A / B) × 100, where A is the CFU of Mtb from infected hepatocytes treated with drug and B is the CFU of Mtb from infected hepatocytes treated with DMSO. Thus, the effect of MOI is negated.

      To provide further credence to the CFU data, we have also analysed these experiments by microscopy, where no cell death was observed under these conditions. Mouse BMDMs were used as a macrophage control. We calculated the % tolerance as the ratio of the mean fluorescence intensity (MFI) of GFP-Mtb per hepatocyte treated with drug to the MFI of GFP-Mtb per hepatocyte treated with DMSO (control). More than 20 fields, each consisting of more than 4 infected cells, were used for the analysis, providing additional evidence of less killing of Mtb by anti-TB drugs in hepatocytes than in BMDMs. All these details are included in the manuscript.
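
      The normalisation described in this response can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study: the function name and the numbers are hypothetical, and the same ratio is assumed to apply whether the readout is CFU or MFI of GFP-Mtb per cell.

```python
def percent_tolerant(treated: float, control: float) -> float:
    """Return the drug-treated readout as a percentage of the DMSO control.

    `treated` is A (CFU or MFI from drug-treated infected cells) and
    `control` is B (the same readout from DMSO-treated infected cells).
    """
    if control <= 0:
        raise ValueError("control readout must be positive")
    return treated / control * 100.0


# Hypothetical CFU counts, for illustration only (not data from the study).
cfu_drug = 2.0e4  # A: infected hepatocytes treated with drug
cfu_dmso = 8.0e4  # B: infected hepatocytes treated with DMSO

print(percent_tolerant(cfu_drug, cfu_dmso))  # 25.0
```

      Because each cell type is divided by its own DMSO control, differences in initial uptake between cell types cancel out of the ratio, which is the sense in which the effect of MOI is negated.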

      (12) While authors have shown the changes in mRNA levels of CYP3A4, CYP3A43, NAT2, the protein or activities of some of these should be measured to verify the effect.

      Studies are currently underway to understand the activities of the key proteins involved in isoniazid and rifampicin metabolism; these will be published as a separate manuscript.

      Reviewer #3 (Recommendations for the authors):

      Additional comments are:

      • Figure 2D, the 20X and 40X magnifications do not look appreciably different in size. Please double-check that the correct images were used.

      We thank the reviewer for pointing this out; we have corrected it in the revised version.

      • Lines 162-164: The authors state almost 100% purity. However, the contour plot in 2F appears to show 2 cell populations. Figure 2G is missing a legend of which colors correspond to which staining (and again there appears to be highly variable staining).

      We agree with the reviewer that two contours are observed in Figure 2F. Although both contours are positive for ASPGR1 protein, the level of ASPGR1 expression is variable. The corresponding confocal image (nucleus stained with DAPI; ASPGR1 stained with an ASPGR1 antibody and an Alexa Fluor 555-conjugated secondary antibody) also indicates variable staining of isolated primary hepatocytes, where some cells give a stronger signal than others, visually confirming our statement. Moreover, several studies show differential expression of ASPGR1 protein in hepatocyte-like cells (PMID: 27143754).

      To further clarify the identity of the hepatocytes, we have stained primary hepatocytes from infected mouse livers with an albumin antibody (a stable marker for hepatocytes) and Ag85B (Figure 2J).

      Multiple figures throughout the manuscript, including this one, would benefit from the use of arrows to depict what is described in the legend and text more clearly, and the use of higher power insets to better define cell architecture. Finally, some images appear blurry to the eye. Improvements are needed throughout.

      As per the suggestion, we have modified the figures and figure legends for better clarity.

      • Lines 153-155. Albumin, AST and GGT appear to be significantly up at week 8, contradicting the statement that there is no change until week 10.

      We thank the reviewer for pointing this out and have made suitable changes in the text.

      • Lines 203-205: The authors state earlier that bacteria survive in macrophage phagosomes. Do the authors know the niche for bacteria in hepatocytes that enable them to continue to grow? Transcriptome data from HepG2 cells suggest perhaps a phagosomal pathway?

      We thank the reviewer for this insightful question. As rightly pointed out by the reviewer, the transcriptomic data indeed suggest changes in several important pathways, such as macroautophagy, Golgi vesicular transport and vacuolar transport, which can affect the subcellular localisation of Mtb within hepatocytes. High-resolution microscopy studies of the subcellular localisation of labelled Mtb within primary hepatocytes (PHCs), HepG2 and THP-1 cells have been conducted, and the percentage colocalization with different intracellular compartments has been measured. An image of the colocalization of labelled Mtb within PHCs is shown below, along with the percentage colocalization within various compartments in PHCs, HepG2 and THP-1 cells.

      Author response image 3.

      Colocalisation of Mtb-GFP with various intra-cellular markers within PHCs.

      Author response image 4.

      Percentage Colocalisation of Mtb-GFP with various intra-cellular markers within PHCs, HepG2 and THP-1.

      • Validation of some critical genes found in the HepG2 cells should be done by qRTPCR in primary hepatocytes.

      Some of the key genes identified in HepG2 cells have been validated by qRT-PCR in primary hepatocytes at 24 hours post infection. The majority of the genes show a similar trend.

      Author response image 5.

      Gene expression analysis of the mentioned genes in Mtb infected PHCs as compared to the uninfected control.

      • Lines 259-260: The authors state a high degree of co-localization. The photomicrograph of a single cell in Fig. 5D is not convincing. I'm not even sure that they are really in the same subcellular compartment. Co-localization stated in Fig. S8B is also not convincing as shown.

      The image currently shown in figure 3D is a maximum intensity projection image of multiple z-stacks encompassing the entire cell.

      We agree with the reviewer with respect to Fig. S8B and will modify the text and the figure legend accordingly.

      Copywriting edits:

      • It is difficult to see individual gene names in Figures 4D and 4E. A higher resolution or larger font would be appreciated for the reader.

      An excel file with the top differentially regulated genes at both 0 hours post infection and 48 hours post infection has been added.

      • Figure 5A has a shadow on the top right image.

      We have changed the image in the revised manuscript

      • Figure 5E is difficult to read the labels on the axes; it would be better in general to make the labels separately instead of relying on the graphing software, since these labels can get stretched when the size of the graph is modified.

      We agree with the reviewer and have made necessary changes.

      • Line 163: should be "percent" and not "perfect."

      We thank the reviewer for pointing it out and have corrected it

      • Line 190: is missing a period at the end of the sentence "...for further experiments"

      We thank the reviewer for pointing it out and have corrected it

      • Line 332: should be "hepatocytes" instead of "hepatoctyte" [sic]

      We thank the reviewer for pointing it out and have corrected it

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Li et al investigated how adjuvants such as MPLA and CpG influence antigen presentation at the level of the Antigen-presenting cell and MHCII : peptide interaction. They found that the use of MPLA or CpG influences the exogenous peptide repertoire presented by MHC II molecules. Additionally, their observations included the finding that peptides with low-stability peptide:MHC interactions yielded more robust CD4+ T cell responses in mice. These phenomena were illustrated specifically for 2 pattern recognition receptor activating adjuvants. This work represents a step forward for how adjuvants program CD4+ Th responses and provides further evidence regarding the expected mechanisms of PRR adjuvants in enhancing CD4+ T cell responses in the setting of vaccination.

      Strengths:

      The authors use a variety of systems to analyze this question. Initial observations were collected in an H pylori model of vaccination with a demonstration of immunodominance differences simply by adjuvant type, followed by analysis of MHC:peptide as well as proteomic analysis with comparison by adjuvant group. Their analysis returns to peptide immunization and analysis of strength of relative CD4+ T cell responses, through calculation of IC:50 values and strength of binding. This is a comprehensive work. The logical sequence of experiments makes sense and follows an unexpected observation through to trying to understand that process further with peptide immunization and its impact on Th responses. This work will premise further studies into the mechanisms of adjuvants on T cells.

      Weaknesses:

      Comment 1. While MDP has a different manner of interaction as an adjuvant compared to CpG and MPLA, it is unclear why MDP has a different impact on peptide presentation and it should be further investigated, or at minimum highlighted in the discussion as an area that requires further investigation.

      Thank you for the suggestion. We investigated the reasons for the different effects of MDP on peptide presentation compared with those of CpG and MPLA. We found that the expression of some proteins involved in antigen processing and presentation, such as CTSS, H2-DM, Ifi30, and CD74, was substantially lower in the MDP-treated group than in the CpG- and MPLA-treated groups. To further confirm whether these proteins play a key role during adjuvant modification of peptide presentation, we knocked them down using shRNA and then performed immunopeptidomics. The original mass spectra and peptide spectrum matches have been deposited in the public proteomics repository iProX (https://www.iprox.cn/page/home.html) under accession number IPX0007611000. Unfortunately, the expected results for peptide presentation repertoires were not observed. Thus, we hypothesized that the different effects of MDP on peptide presentation might not result from differences in protein expression. We cannot exclude the possibility that some other proteins that may be important in this process were overlooked. We are still working on the mechanisms and do not have an exact conclusion. Thus, we did not present related data in this manuscript.

      The related statements were added in the Discussion section on page 13, lines 292–299: “In this study, we found that the peptide repertoires presented by APCs were significantly affected by the adjuvants CpG and MPLA, but not MDP. All three adjuvants belong to the PRR ligand adjuvant family. CpG and MPLA bind to TLRs and MDP is recognized by NOD2. Although the receptors are different, many common molecules are involved both in TLR and NLD pathway activation. Unfortunately, we did not demonstrate why the MDP had different impacts on peptide presentation compared with other adjuvants. Further investigation is required to clarify the mechanism by which MPLA, CpG, and MDP adjuvants modulate the presentation of peptides with different stabilities.”

      Comment 2. It is alluded by the authors that TLR activating adjuvants mediate selective, low affinity, exogenous peptide binding onto MHC class II molecules. However, this was not demonstrated to be related specifically to TLR binding. I wonder if some work with TLR deficient mice (TLR 4KO for example) could evaluate this phenomenon more specifically.

      Thank you for the suggestion. This is an important point that was overlooked in this study. Based on published research on the mechanisms of the PRR adjuvants CpG and MPLA, we believe that the effect of CpG and MPLA on selective epitope presentation by APCs requires binding to the corresponding receptor, although we did not reach a definitive conclusion in the manuscript.

      To confirm that TLR-activating adjuvants affect the peptides presented on MHC molecules specifically through TLR binding, we have used CRISPR-Cas9 to knock out TLR4 and TLR9 in A20 cells and repeated the experiments, as suggested. We chose TLR4- and TLR9-knockout A20 cell lines instead of TLR-deficient mice because a large number of APCs are required for immunopeptidomics. Moreover, the data observed in this study were based on the A20 cell line. However, these experiments are time-consuming, and unfortunately we were unable to provide the data in time. In addition, we believe that elucidating the downstream molecular mechanisms of TLR activation is necessary, as mentioned in comment 1. All these data will be combined and reported in our upcoming publications.

      Comment 3. It is unclear to me if this observation is H pylori model/antigen-specific. It may have been nice to characterize the phenomenon with a different set of antigens as supplemental. Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low-stability peptides among the peptides analyzed.

      Q1: It is unclear to me if this observation is H. pylori model/antigen-specific. It may have been nice to characterize the phenomenon with a different set of antigens as supplemental.

      Thank you for the comment. To confirm the effect of the adjuvant on the exogenous peptide repertoire presented by MHC II molecules, a set of antigens from another bacterium, Pseudomonas aeruginosa, was used, and the experiments were repeated. The A20 cells were treated with CpG and pulsed with Pseudomonas aeruginosa antigens. Twelve hours later, MHC-II–peptide complexes were immunoprecipitated, and immunopeptidomics was performed. The data are shown below (Author response image 1). Information on the MHC-peptides from Pseudomonas aeruginosa is given in the Supplementary Table named “Table S3 Response to comment3”. A total of 713 and 205 bacterial peptides were identified in the PBS and CpG groups, respectively (Author response image 1A). The number of exogenous peptides in the CpG-treated group was significantly lower than that in the PBS-treated control group (Author response image 1B). A total of 568 bacterial peptides were presented only in the PBS group, 60 bacterial peptides were presented only in the CpG-treated group, and 145 bacterial peptides were presented in both groups (Author response image 1C). We then analyzed the MHC-binding stability of the peptides presented in the adjuvant-treated group and that of the peptides that were no longer presented after adjuvant stimulation (deficient peptides) using the IEDB website. We found that the IC50 values of the peptides in the adjuvant-treated group were much higher than those of the deficient peptides, which indicated that the peptides presented in the CpG-treated group have lower binding stability for MHC-II (Author response image 1D). These results indicate that CpG adjuvant affects the presentation of exogenous peptides with high binding stability, which is consistent with the data reported in our manuscript. Using another set of antigens, we confirmed that our observations were not H. pylori model- or antigen-specific.

      Author response image 1.

      MHC-II peptidome measurements in adjuvant-treated APCs pulsed with Pseudomonas aeruginosa antigens. (A) Total number of bacterial peptides identified in the PBS- and CpG-treated groups. (B) The number and length distribution of bacterial peptides in different groups were compared. (C) Venn diagrams showing the distribution of bacterial peptides in different groups. (D) IC50 of the presented, deficient, and co-presented peptides post-adjuvant stimulation from immunopeptidome binding to H2-IA and H2-IE were predicted using the IEDB website. High IC50 means low binding stability. *p<0.05, **p<0.01.
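      The peptide counts reported for panels A and C can be cross-checked with simple set arithmetic. The following is a minimal sketch rather than the authors' analysis code; the function name `venn_counts` is hypothetical, and the only inputs are the totals (713 and 205) and the overlap (145) quoted in the response.

```python
def venn_counts(total_a, total_b, shared):
    """Split two set sizes into (only_a, shared, only_b) given their overlap.

    Hypothetical helper for illustration; not part of the authors' pipeline.
    """
    if shared > min(total_a, total_b):
        raise ValueError("overlap cannot exceed the size of either set")
    return total_a - shared, shared, total_b - shared


# Totals and overlap as quoted in the response (PBS group, CpG group, both).
pbs_total, cpg_total, both = 713, 205, 145
only_pbs, shared, only_cpg = venn_counts(pbs_total, cpg_total, both)
print(only_pbs, shared, only_cpg)
```

      Under these inputs the PBS-only and CpG-only counts come out as the 568 and 60 peptides stated in the response, so the totals and the Venn diagram are mutually consistent.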

      Q2: Lastly, it is unclear if the peptide immunization experiment reveals a clear pattern related to high and low-stability peptides among the peptides analyzed.

      In this study, we used a peptide immunization experiment to evaluate the responses induced by the screened peptides with different stabilities. In addition to this method, tetramer staining and ELISA have been used to assess epitope-specific T-cell proliferation and cytokine secretion. Among these, tetramer staining is often used in studies involving model antigens. However, because many peptides were screened in our study, synthesizing a sufficient number of tetramers was difficult. We believe that the experimental data obtained in this study support the conclusion; nevertheless, we agree that applying additional methods would make the pattern clearer.

      Reviewer #2 (Public Review):

      Adjuvants boost antigen-specific immune responses to vaccines. However, whether adjuvants modulate the epitope immunodominance and the mechanisms involved in adjuvant's effect on antigen processing and presentation are not fully characterized. In this manuscript, Li et al report that immunodominant epitopes recognized by antigen-specific T cells are altered by adjuvants.

      Using MPLA, CpG, and MDP adjuvants and H. pylori antigens, the authors screened the dominant epitopes of Th1 responses in mice post-vaccination with different adjuvants and found that adjuvants altered antigen-specific CD4+ T cell immunodominant epitope hierarchy. They show that adjuvants, MPLA and CpG especially, modulate the peptide repertoires presented on the surface of APCs. Surprisingly, adjuvant favored the presentation of low-stability peptides rather than high-stability peptides by APCs. As a result, the low stability peptide presented in adjuvant groups elicits T cell response effectively.

      Thanks a lot for your comments.

      Reviewer #1 (Recommendations For The Authors):

      Recommendation 1. Figure 6: The peptides considered low affinity- it would be helpful to specify from which adjuvant they were collected from. When they are pooled it is unclear if we are analyzing peptides collected from adjuvanting with any of the three adjuvants studied.

      Thank you for the suggestion. The related description in Figure 6 has been modified in the revised manuscript. Data for the peptides identified from the adjuvants MPLA- and CpG-treated groups are shown separately.

      Recommendation 2. It is unclear to me why the A20 cell line is less preferred to the J774 line for the immunopeptidome analysis - can the authors expand on this?

      We apologize for not clearly explaining this in the original manuscript. In fact, the A20 cell line is better than the J774A.1 cell line for immunopeptidomics experiments. Compared to J774A.1 cells, more MHC-II peptides were obtained from a smaller number of A20 cells using immunopeptidomics. At the beginning of this study, we chose the J774A.1 cell line as it is a macrophage cell line. J774A.1 cells (up to 5×10⁸) were pulsed with the antigens, and MHC-II–peptide complexes were eluted from the cell surface for immunopeptidomics. Unfortunately, only a few hundred peptides from the host were detected and no exogenous peptides were detected. Next, we tested the A20 cell line. In total, 10⁸ A20 cells were used in this study. More than 3500 host peptides and approximately 50 exogenous peptides were identified. These data indicate that the A20 cell line was better.

      To investigate the reasons for this, we measured MHC-II expression on cell surfaces using FACS. Our purpose was to elute peptides from MHC–peptide complexes present on the cell surface, so low MHC expression results in the elution of few peptides. We found that the MFI of MHC-II molecules on J774A.1 cells is about 500, whereas the MFI of MHC-II molecules on A20 cells is more than 300,000. These data indicate that MHC-II expression on A20 cells was much higher than that on J774A.1 cells. J774A.1 is a macrophage cell line. Macrophages have excellent antigen phagocytic capabilities; however, their ability to present antigens is relatively weak. MHC molecules on the macrophage cell surface can be upregulated upon stimulation with certain cytokines, for example, IFN-γ. In this study, we used adjuvants as stimulators and did not want to use additional cytokine stimulators. Thus, J774A.1 cells were not used in the present study.

      The related statements are reflected on page 6 lines 120–128 “We also selected another H-2d cell J774A.1, a macrophage cell line, for immunopeptidome analysis in this study. Briefly, 5×10⁸ J774A.1 cells were used for immunopeptidomics. Moreover, fewer than 350 peptides were observed at a peptide spectrum match (PSM) level of < 1.0% false discovery rate (FDR). However, more than 5500 peptides were detected in 10⁸ A20 cells at FDR < 1.0% (Figure S2A). CD86 and MHC-II molecule expression on J774A.1 cells was substantially lower than that on A20 cells (Figure S2B). Low MHC-II expression on J774A.1 cells could be the reason for the lack of peptides identified by LC–MS/MS. Thus, A20 cells instead of J774A.1 cells were used for the subsequent experiments.”
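      The comparison between the two cell lines can be made explicit by normalizing peptide yield to cell input. The sketch below is illustrative only: it reads the quoted cell counts as 5×10⁸ and 10⁸ (an assumption about the exponent notation), treats the reported bounds ("fewer than 350", "more than 5500") as point values, and uses a hypothetical helper name.

```python
def peptides_per_1e8_cells(n_peptides, n_cells):
    """Peptide identifications normalized to an input of 1e8 cells.

    Hypothetical helper for illustration; not part of the authors' pipeline.
    """
    return n_peptides / (n_cells / 1e8)


# Bounds quoted from the manuscript text, used here as point values.
j774_yield = peptides_per_1e8_cells(350, 5e8)   # upper bound for J774A.1
a20_yield = peptides_per_1e8_cells(5500, 1e8)   # lower bound for A20
fold_difference = a20_yield / j774_yield        # lower bound on the ratio
print(j774_yield, a20_yield, round(fold_difference, 1))
```

      Because the J774A.1 figure is an upper bound and the A20 figure a lower bound, the resulting ratio should be read as a conservative estimate of the per-cell difference in yield.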

      Recommendation 3. Lines 172-177, can more details be provided about the whole proteome analysis? The plots are shown for relative representation of protein expression to PBS, but it is unclear to me what examples of these proteins are (IFN pathway, Ubiquitination pathway). Could these be confirmed by protein expression analyses in supplemental?

      Thank you for the suggestion. In this study, we conducted whole proteome analysis to investigate changes in protein expression across different pathways in the adjuvant groups. Through KEGG enrichment analysis, we compared the differential expression of MHC presentation pathway proteins (such as H2-M, Ifi30, CD74, CTSS, proteasome, and peptidase subunits) between the PBS- and adjuvant-treated groups using our proteome data. In addition, we focused on IFN and ubiquitination pathways that play crucial roles in antigen presentation modification and immune response. The proteins and their relative expression in these pathways are shown in Figure S4B. Details regarding the protein names and expressions are provided in Supplemental Table S2 of the revised manuscript.

      The original statements in the results “Then, we analyzed the whole proteome data to determine whether the proteins involved in antigen presentation and processing were altered. We found that proteins involved in antigen processing, peptidase function, ubiquitination pathway, and interferon (IFN) signaling were altered post adjuvants treatment, especially in MPLA and CpG groups (Figure 5C; Figure S4B and S4C). These data suggest that adjuvants MPLA and CpG may affect the antigen processing of APCs, resulting in fewer peptides presentation.” This has been revised on page 8 lines 172–182 as “We then investigated whole-proteome data to determine the evidence of adjuvant modification of antigen presentation. We focused on the proteins involved in antigen processing, peptidase function, ubiquitination pathway, and IFN signaling. The ubiquitination pathway and IFN signaling play crucial roles in the modification of antigen presentation and immune responses. Through KEGG enrichment analysis, we found that many proteins involved in antigen processing, peptidase function, ubiquitination pathways, and IFN signaling were altered after adjuvant treatment, particularly in the MPLA- and CpG-treated groups (Figure 5C; Figure S4B). The expression of each protein is shown in Figure S4C and Supplementary Table 2. These data suggest that MPLA and CpG adjuvants may affect the antigen processing of APCs, resulting in fewer peptide presentations.”

      Recommendation 4. Lines 212-218: I think there needs to be more discussion of interpretation here. Only one of the low-stability peptides required low concentrations for CD4+ T cell responses in vitro. What about the other peptides in the analysis? Perhaps if the data is taken together there is not a clear pattern?

      Thank you for the comment. In this study, epitope-specific CD4+ T cells were expanded in vitro from the spleens of peptide-pool-immunized mice. T-cell responses to individual peptides were detected using ICS and FACS. Only one peptide with low binding stability, recA #23, and one high-stability peptide, ureA #2, induced effective T-cell responses. The high-stability peptide ureA #3 induced low Th1 responses. The other peptides did not induce IFN-γ-secreting CD4+ T cells (data are shown in Author response image 2). Thus, we compared the strength of IFN-γ responses induced by these three peptides at a set of low concentrations. Data for the other peptides, which gave no response, could not be included in this comparison.

      Author response image 2.

      The expanded CD4+ T cells from peptide-immunized mice were screened for their response to the peptides in an ICS assay.

      In this study, we used a peptide pool containing four low-stability peptides to vaccinate mice; however, only one peptide induced an effective CD4+ T-cell response. We speculate that the possible reasons are as follows. First, the number of peptides used for vaccination is too small. Only four low-stability peptides were synthesized and used to immunize mice. Three of these could not induce an effective T-cell response, possibly because of their low immunogenicity. If more peptides are synthesized and used, more peptides that induce T-cell responses may be observed. Second, epitope-specific T-cell responses are variable. Responses to the subdominant peptides can be inhibited by the dominant peptide. The subdominant peptide can become dominant by changing the peptide dose or in the absence of the dominant peptide. Thus, we believe that responses to the other three peptides may be detected if mice are immunized with a peptide pool that does not contain a response epitope.

      The corresponding statements have been added to the Discussion section on page 13 lines 287–291 as “Unfortunately, only one peptide, recA #23, with low binding stability and induced significant Th1 responses, was identified in this study. To further confirm that low-stability peptides can induce stronger and higher TCR-affinity antigen-specific T-cell clonotype responses than high-stability peptides, further studies should monitor more peptides with different stabilities.”

      Recommendation 5. There are some areas where additional editing to text would be beneficial due to grammar (eg lines 122-126; line 116, etc).

      The manuscript has been edited by a professional language editing company.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 1. It is interesting that there was no difference in IFNg responses induced by different adjuvants.

      Thank you for the comment. Possible reasons for the lack of difference in IFN-γ responses could be as follows. First, all adjuvants used in this study have been confirmed to effectively induce Th1 responses. Second, in this study, IFN-γ responses were examined using expanded antigen-specific T cells in vitro. The in vitro cell expansion efficiency may have affected these results.

      Recommendation 2. The data to support the claim that changes in exogenous peptide presentation among adjuvant groups were not due to differences in antigen phagocytosis is insufficient.

      Thank you for the comment. In this study, proteomics of A20 cells pulsed with antigens in different adjuvant-treated groups were used to determine exogenous antigens phagocytosed by cells. In addition, we used fluorescein isothiocyanate (FITC)-labeled OVA to pulse APCs and detected antigen phagocytosis by APCs after treatment with different adjuvants. The MFI of FITC was detected by FACS at different time points. The data are shown below (Author response image 3). No obvious differences in FITC MFI were detected after adjuvant stimulation, indicating that antigen phagocytosis among the adjuvant groups was almost the same.

      A20 cells, used as APCs, are a B-cell line. Antigen recognition and phagocytosis by B-cells depend on the B-cell receptor (BCR) on the cell surface. The ability of BCRs to bind to different antigens varies, leading to significant differences in the phagocytosis of different antigens by B-cells. Therefore, detecting the phagocytosis of a single antigen may not reflect the overall phagocytic state of the B-cells. Thus, in this study, we used proteomics to detect exogenous proteins in B-cells pulsed with H. pylori antigens, which contain thousands of components, to evaluate their overall phagocytic capacity. Only the proteomic data are presented in our manuscript.

      Author response image 3.

      Antigen phagocytosis by A20 cells was measured using FITC-labeled OVA. (A) A20 cells were pulsed with FITC-labeled OVA. MFI of FITC was measured after 1 h. (B) MFI of FITC was examined at different time points after adjuvant stimulation.

      Recommendation 3. It is not clear how MPLA, CpG, and MDP adjuvants modulate the presentation of low vs high stability peptides.

      Thank you for pointing this out. We acknowledge that we did not clarify the mechanisms by which adjuvants affect the stability of the peptide presentations of APCs.

      We performed experiments to detect the expression of proteins involved in antigen processing and presentation in the different adjuvant-treated groups. Furthermore, shRNAs were used to knock down the expression of key molecules. Immunopeptidomics was used to detect peptide presentation. Unfortunately, the expected results for peptide presentation repertoires were not observed. We are still working on the mechanisms.

      Please also see our response to comment 1 of Reviewer 1.

      The related statements were added in the Discussion section on page 13, lines 292–299: “In this study, we found that the peptide repertoires presented by APCs were significantly affected by the adjuvants CpG and MPLA, but not MDP. All three adjuvants belong to the PRR ligand adjuvant family. CpG and MPLA bind to TLRs and MDP is recognized by NOD2. Although the receptors are different, many common molecules are involved both in TLR and NLD pathway activation. Unfortunately, we did not demonstrate why the MDP had different impacts on peptide presentation compared with other adjuvants. Further investigation is required to clarify the mechanism by which MPLA, CpG, and MDP adjuvants modulate the presentation of peptides with different stabilities.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Valk and Engert et al. examined the potential relations between three different mental training modules, hippocampal structure and functional connectivity, and cortisol levels over a 9-month period. They found that among the three types of mental training: Presence (attention and introspective awareness), Affect (socio-emotional - compassion and prosocial motivation), and Perspective (socio-cognitive - metacognition and perspective taking) modules; Affect training most consistently related to changes in hippocampal structure and function - specifically, CA1-3 subfields of the hippocampus. Moreover, decreases in diurnal cortisol correlated to bilateral increases in volume, and decreases in diurnal and chronic cortisol left CA1-3 functional connectivity. Chronic cortisol levels also related to right CA4/DG volume and left subiculum function. The authors demonstrate that mindfulness training programs impact hippocampus and are a potential avenue for stress interventions, a potential avenue to improve health. The data contribute to the literature on plasticity of hippocampal subfields during adulthood, the impact of mental training interventions on the brain, and the link between CA1-3 and both short- and long-term stress changes. Additional clarification and extension of the methods is needed to strengthen the authors' conclusions.

      We thank the Reviewer for their positive evaluation and summary of our findings and work. We made additional changes as suggested by the Reviewer and hope this clarified any open points.

      (1) The authors thoughtfully approached the study of hippocampal subfields, utilizing a method designed for T1w images that outperformed Freesurfer 5.3 and that produced comparable results to an earlier version of ASHS. However, given the use of normalized T1-weighted images to delineate hippocampal subfield volume, some caution may be warranted (Wisse et al. 2020). While the authors note the assessment of quality control processes, the difficulty in ensuring valid measurement is an ongoing conversation in the literature. This also extends to the impact of functional co-registration using segmentations. I appreciate the inclusion of Table 5 in documenting reasons for missing data across subjects. Providing additional details on the distribution of quality ratings across subfields would help contextualize the results and ensure there is equal quality of segmentations across subfields.

      We thank the Reviewer for bringing up this point. In the current work, we assessed the overall segmentation of all six subfields per individual. Thus, unfortunately, we have no data on the segmentation quality of individual subfields beyond our holistic assessment. Indeed, registration of hippocampal subfields remains a challenge and we have further highlighted this limitation in the Discussion of the current work.

      “It is of note that the current work relies on a segmentation approach of hippocampal subfields including projection to MNI template space, an implicit correction for total brain volume through the use of a stereotaxic reference frame. Some caution for this method may be warranted, as complex hippocampal anatomy can in some cases lead to over- as well as underestimation of subfield volumes, and subfield boundaries may not always be clearly demarcated (1). Future work, studying the hippocampal surface at higher granularity, for example through unfolding the hippocampal sheet (2-5), may further help with both alignment and identification of not only subfield-specific change but also alterations as a function of the hippocampal long axis, a key dimension of hippocampal structural and functional variation that was not assessed in the current work (6, 7).”

      (2) Given the consistent pattern of finding results with CA1-3, in contrast to other subfields, it would help to know if the effects of the different training modules on subfields differed from each other statistically (i.e., not just that one is significant, and one is not) to provide an additional context of the strength of results focused on Affect training and CA1-3 (for example, those shown in Figure 3).

      Our work investigated whether the effects of the individual Training Modules differed from each other statistically. We found that the Affect Training Module showed increases in CA1-3 volume, and that these increases remained when testing effects relative to changes in this subfield following Perspective training and in retest controls. Moreover, in CA1-3 we found changes in functional connectivity when comparing the Affect to the Perspective Training Module. These changes were only present in this contrast, but were not significant in each of the Training Modules per se. To test for specificity, we additionally evaluated whether subfield-specific changes were present above and beyond changes in the other ipsilateral hippocampal subfields. Relative to other subfields, right CA1-3 showed increases in the Affect vs Perspective contrast (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015). No other subfield showed significant changes. We now include this statement in the revised Results and Supplementary Tables.

      “Moreover, associations between CA1-3 and Affect, relative to Perspective, seemed to go largely above and beyond changes in the other subfields (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015, see further Supplementary File 1h).”

      Author response table 1.

      Subfield-specific changes following the Training Modules, controlling for the other two ipsilateral subfields

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1, using different colors for the subfields than for the modules (yellow, red, green) would help, as the shared colors could lead the reader to try to draw connections between the two when the panel is merely a depiction of the delineations.

      As suggested, we updated Figure 1 accordingly and present the subfields in different shades of purple for clarity. Please find the updated figure below.

      Author response image 1.

      (2) In the Results, it was at times hard to follow when Affect or Perspective were the focus of the results. Perhaps the authors could restructure or add additional context for clarity.

      We are happy to clarify. For the first analysis on Module-specific changes in hippocampal subfield volume, we compared effects across Training Modules. Here, main contrasts were run between subjects (Presence vs active control) and within subjects (Affect versus Perspective). In additional secondary contrasts, we studied training effects vs retest control. After observing consistent increases in bilateral CA1-3 following Affect, in the following analyses we evaluated 1) intrinsic functional networks in main and supplementary contrasts and 2) diurnal cortisol measures within the individual Training Modules and in all three Training Modules combined, and also adopted 3) a multivariate approach (PLS) (see comments of Reviewer 2). We now also report effects of cortisol change on structural and functional subfield change in Presence and Perspective, for additional completeness and clarity.

      “To study whether there was any training module-specific change in hippocampal subfield volumes following mental training, we compared training effects between all three Training Modules (Presence, Affect, and Perspective). Main contrasts were: Presence vs Active control (between subjects) and Affect vs Perspective (within subjects). Supplementary comparisons were made vs retest controls and within training groups.”

      “Overall, for all hippocampal subfields, findings associated with volume increases in CA1-3 following the Affect training were most consistent across timepoints and contrasts (Supplementary File 1a-f).”

      “Subsequently, we studied whether hippocampal CA1-3 would show corresponding changes in intrinsic function following the Affect mental training.”

      “In particular, the moderately consistent CA1-3 volume increases following Affect training were complemented with differential functional connectivity alterations of this subfield when comparing Affect to Perspective training”

      “Last, we probed whether group-level changes in hippocampal subfield CA1-3 volume would correlate with individual-level changes in diurnal cortisol indices (Presence: n= 86; Affect: n=92; Perspective: n=81), given that the hippocampal formation is a nexus of the HPA-axis (8). We took a two-step approach. First, we studied associations between cortisol and subfield change, particularly focusing on the Affect module and CA1-3 volume based on increases in CA1-3 volume identified in our group-level analysis.”

      “We observed that increases in bilateral CA1-3 following Affect showed a negative association with change in total diurnal cortisol output […]”

      “We did not observe alterations in CA1-3 volume in relation to change in cortisol markers in Presence or Perspective. Yet, for Presence, we observed an association between slope and LCA4/DG change (t=-2.89, p=0.005, q=0.03) (Supplementary File 1uv).”

      “In case of intrinsic function, we also did not observe alterations in CA1-3 in relation to change in cortisol markers in Presence or Perspective, nor in other subfields (Supplementary File 1wx).”

      Author response table 2.

      Correlating change in subfield volume and diurnal cortisol indices in Presence. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold.

      Author response table 3.

      Correlating change in subfield volume and diurnal cortisol indices in Perspective. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold.

      Author response table 4.

      Association between stress-markers and within functional network sub-regions in Affect and Perspective.

      Author response table 5.

      Correlating change in subfield function and diurnal cortisol indices in Presence. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold. For these multiple comparisons (FDRq, corrected for two subfields), values are reported if uncorrected p<.05.

      Author response table 6.

      Correlating change in subfield function and diurnal cortisol indices in Perspective. The main focus was on CA1-3, based on volumetric observations; these results are highlighted in bold. For these multiple comparisons (FDRq, corrected for two subfields), values are reported if uncorrected p<.05.

      (3) In the Methods, the authors note that corrections for multiple comparisons were used where needed; however, throughout the manuscript there is some switching between corrected and uncorrected p-values. At times, this made it difficult to follow when these corrections were needed.

      For clarity, we added explicit multiple comparisons information a) in main and supplementary results, and b) wherever extra information was needed. Also, we only included main contrasts in Table 1-3 to avoid confusion and moved the information on changes in SUB and CA4/DG to the Supplementary tables.
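
      To make the distinction between uncorrected p-values and FDR-corrected q-values concrete, the following is a minimal sketch of one common FDR procedure. It assumes the Benjamini-Hochberg step-up method, which the response does not name explicitly; the function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values.

    Assumption: BH is used purely for illustration; the source text only says "FDR".
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical uncorrected p-values for four tests
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
for p, q in zip(example_p, example_q):
    print(f"p={p:.3f} -> q={q:.3f}")
```

      In this sketch a test can be nominally significant (p<0.05) while its q-value exceeds 0.05, which is the situation in which reporting both values side by side avoids confusion.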

      (4) Typically, when correcting for intracranial volume the purpose is to ensure that sexual dimorphism in the size of the brain is accounted for. I would recommend the authors assess whether sex differences are accounted for by the MNI normalization approach taken. In the reading of the original Methods paper for the patch-based algorithm used, ICV was used to transform to MNI152 space. It would help to have additional information on how the normalization was done in the current study in order to draw comparisons to other findings in the literature.

      We are happy to further clarify. In the current work, we used the same approach as in the original paper. Volumes were linearly registered to the MNI template using FSL flirt. We now provide this additional information in the revised methods.

      “Hippocampal volumes were estimated based on T1w data that were linearly registered to MNI152 using FSL flirt (http://www.fmrib.ox.ac.uk/fsl/), such that intracranial volume was implicitly controlled for.”

      We agree with the Reviewer that sex differences may still be present, and investigated this. At baseline, sex differences were found in all subfields in the left hemisphere, and right CA4/DG (FDRq<0.05). Regressing out ICV resolved remaining sex differences. We then evaluated whether main results of volumetric subfield change were impacted by ICV differences. Differences between Affect and Perspective remained stable. We have now added this additional analysis in the Supplementary Materials.

      “Although stereotaxic normalization to MNI space would in theory account for global sex differences in intra-cranial volume, we still observed sex differences in various subfield volumes at baseline. Yet, accounting for ICV did not impact our main results suggesting changes in CA1-3 following Affect were robust to sex differences in overall brain volume (Supplementary File1j).”
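
      The ICV adjustment mentioned above can be sketched as a simple residualization. This is an illustration on synthetic data under the assumption of an ordinary least-squares fit with an intercept; the function name `residualize` and all numbers are hypothetical and do not reproduce the authors' pipeline.

```python
import numpy as np


def residualize(values, covariate):
    """Remove the linear effect of one covariate (plus intercept) from values."""
    design = np.column_stack([np.ones(len(covariate)), covariate])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


# Synthetic demo (all numbers hypothetical): subfield volume scales with ICV
rng = np.random.default_rng(0)
icv = rng.normal(1500.0, 120.0, size=200)             # intracranial volume
volume = 0.9 * icv + rng.normal(0.0, 40.0, size=200)  # subfield volume
adjusted = residualize(volume, icv)

# After residualization the adjusted volumes are linearly unrelated to ICV
print("correlation before:", round(float(np.corrcoef(volume, icv)[0, 1]), 3))
print("correlation after: ", round(abs(float(np.corrcoef(adjusted, icv)[0, 1])), 3))
```

      Group differences (for example female versus male) can then be tested on the adjusted values, so that any remaining difference is not attributable to a linear effect of overall head size.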

      Author response table 7.

      Sex differences (female versus male) in hippocampal subfield volumes.

      Reviewer #2 (Public Review):

      In this study, Valk, Engert et al. investigated effects of stress-reducing behavioral intervention on hippocampal structure and function across different conditions of mental training and in relation to diurnal and chronic cortisol levels. The authors provide convincing multimodal evidence of a link between hippocampal integrity and stress regulation, showing changes in both volume and intrinsic functional connectivity, as measured by resting-state fMRI, in hippocampal subfield CA1-3 after socio-affective training as compared to training in a socio-cognitive module. In particular, increased CA1-3 volume following socio-affective training overlapped with increased functional connectivity to medial prefrontal cortex, and reductions in cortisol. The conclusions of this paper are well supported by the data, although some aspects of the data analysis would benefit from being clarified and extended.

      A main strength of the study is the rigorous design of the behavioral intervention, including test-retest cohorts, an active control group, and a previously established training paradigm, contributing to an overall high quality of included data. Similarly, systematic quality checking of hippocampal subfield segmentations contributes to a reliable foundation for structural and functional investigations.

      We thank the Reviewer for the thoughtful summary and appreciation of our work, as well as requests for further clarification and analyses. We addressed each of them in a point by point fashion below.

      Another strength of the study is the multimodal data, including both structural and functional markers of hippocampal integrity as well as both diurnal and chronic estimates of cortisol levels.

      (1) However, the included analyses are not optimally suited for elucidating multivariate interrelationships between these measures. Instead, effects of training on structure and function, and their links to cortisol, are largely characterized separately from each other. This results in the overall interpretation of results, and conclusions, being dependent on a large number of separate associations. Adopting multivariate approaches would better target the question of whether there is cortisol-related structural and functional plasticity in the hippocampus after mental training aimed at reducing stress.

      We thank the Reviewer for this suggestion. Indeed, our project combined different univariate analyses to uncover the association between hippocampal subfield structure, function, and cortisol markers. While systematic, a downside of this approach is indeed that interpretation of our results depends on a large number of analyses. To further explore the question whether there is cortisol-related structural and functional plasticity in the hippocampus, we followed the Reviewer’s suggestion and additionally adopted a multivariate partial least squares (PLS) model. We ran two complementary models: one focusing on the bilateral CA1-3, as this region showed increases in volume following Affect training and differential change between Affect and Perspective training in our resting state analyses, and one including all subfields. Both models included all stress markers. We found that both models could significantly relate stress markers to brain measures, and that in particular Affect showed strong associations with the significant latent markers. Both analyses showed inverse effects of structure and function in relation to stress markers, and both slope and AUC changes showed strongest loadings. We now include these analyses in the revised manuscript.

      Abstract

      “Of note, using a multivariate approach we found that other subfields, showing no group-level changes, also contributed to alterations in cortisol levels, suggesting circuit-level alterations within the hippocampal formation.”

      Methods

      “Partial least squares analysis

      To assess potential relationships between cortisol change and hippocampal subfield volume and functional change, we performed a partial least squares analysis (PLS) (9, 10). PLS is a multivariate associative model that optimizes the covariance between two matrices, by generating latent components (LCs), which are optimal linear combinations of the original matrices (9, 10). In our study, we utilized PLS to analyze the relationships between change in volume and intrinsic function of hippocampal subfields and diurnal cortisol measures. Here we included all Training Modules and regressed out effects of age, sex, and random effects of subject on the brain measures before conducting the PLS analysis. The PLS process involves data normalization within training groups, cross-covariance, and singular value decomposition. Subsequently, subfield and behavioral scores are computed, and permutation testing (1000 iterations) is conducted to evaluate the significance of each latent factor solution (FDR corrected). We then report the correlation of the individual hippocampal and cortisol markers with the latent factors. To estimate confidence intervals for these correlations, we applied a bootstrapping procedure that generated 100 samples with replacement from subjects’ RSFC and behavioral data.”
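
      The core PLS steps named in this paragraph (normalization, cross-covariance, singular value decomposition, score computation, and permutation testing) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, matrix sizes, and simulated effect are hypothetical, and the covariate regression, within-group normalization, FDR correction, and bootstrapping described in the text are omitted for brevity.

```python
import numpy as np


def zscore(a):
    """Column-wise z-scoring (the normalization step)."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def pls_svd(brain, cortisol):
    """SVD of the cross-covariance between two z-scored data matrices.

    Returns singular values, cortisol weights, brain weights, and the
    per-subject scores (projections onto the latent components).
    """
    xz, yz = zscore(brain), zscore(cortisol)
    cross_cov = yz.T @ xz / (xz.shape[0] - 1)  # shape: (n_cortisol, n_brain)
    u, s, vt = np.linalg.svd(cross_cov, full_matrices=False)
    return s, u, vt.T, xz @ vt.T, yz @ u


def permutation_pvalues(brain, cortisol, n_perm=1000, seed=0):
    """Permutation p-value per latent component (subject rows of `brain` shuffled)."""
    rng = np.random.default_rng(seed)
    s_obs = pls_svd(brain, cortisol)[0]
    exceed = np.zeros_like(s_obs)
    for _ in range(n_perm):
        shuffled = brain[rng.permutation(brain.shape[0])]
        exceed += pls_svd(shuffled, cortisol)[0] >= s_obs
    return (exceed + 1) / (n_perm + 1)


# Synthetic demo: 90 hypothetical subjects, 6 brain measures, 3 cortisol markers
rng = np.random.default_rng(42)
latent = rng.standard_normal(90)
brain = np.outer(latent, [1.0, 0.8, -0.6, 0.0, 0.0, 0.0]) + 0.5 * rng.standard_normal((90, 6))
cortisol = np.outer(latent, [-1.0, -0.7, 0.0]) + 0.5 * rng.standard_normal((90, 3))

s, cortisol_w, brain_w, brain_scores, cortisol_scores = pls_svd(brain, cortisol)
pvals = permutation_pvalues(brain, cortisol, n_perm=1000)
print("singular values:", np.round(s, 3))
print("permutation p-values:", np.round(pvals, 3))
```

      Correlating each original variable with the score columns gives loadings of the kind reported for the latent factors; resampling subjects with replacement and repeating that correlation would give bootstrap confidence intervals.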

      Results

      “Last, to further explore the question whether there is concordant cortisol-related structural and functional plasticity in the hippocampus we adopted a multivariate partial least squares approach, with 1000 permutations to account for stability (9, 10) and bootstrapping (100 times) with replacement. We ran two complementary models including all Training Modules whilst regressing out age, sex and random effects of subject. First, we focused on the bilateral CA1-3, as this region showed increases in volume following Affect training and differential change between Affect and Perspective training in our resting state analyses. The second model included structural and functional data of all subfields. Both models included all stress markers. We found that both models could identify significant associations between cortisol stress markers and hippocampal plasticity (FDRq<0.05), and that in particular Affect showed strongest associations with the latent markers for CA1-3 (Table 5). Both analyses showed inverse effects of subfield structure and function in relation to stress markers and both slope and AUC changes showed strongest associations with the latent factor.”

      Author response table 8.

      Multivariate PLS analyses linking cortisol markers to hippocampal subfield volume and function.

      Discussion

      “Last, performing multivariate analysis, we again observed associations between CA1-3 volume and function plasticity and stress change, strongest in Affect. Yet combining all subfields in a single model indicated that other subfields also link to stress alterations, indicating that ultimately circuit-level alterations within the hippocampal formation relate to latent changes in diurnal stress markers across Training Modules.”

      “This interpretation is also supported by our multivariate observations.”

      “In line with our observations in univariate analysis, we found multivariate associations between hippocampal subfield volume, intrinsic function and cortisol markers. Again, the contribution of volume and intrinsic function was inverse. This may possibly relate to the averaging procedure of the functional networks. Combined, outcomes of our univariate and multivariate analyses point to an association between change in hippocampal subfields and stress markers, and that these changes, at the level of the individual, ultimately reflect complex interactions within and across hippocampal subfields and may capture different aspects of diurnal stress. Future work may more comprehensively study the plasticity of the hippocampal structure, and link this to intrinsic functional change and cortisol to gain full insights in the specificity and system-level interplay across subfields, for example using more detailed hippocampal models (3). Incorporating further multivariate, computational, models is needed to further unpack and investigate the complex and nuanced association between hippocampal structure and function, in particular in relation to subfield plasticity and short and long-term stress markers.”

      “…based on univariate analysis. Our multivariate analysis further nuanced this observation, but again pointed to an overall association between hippocampal subfield changes and cortisol changes, but this time more at a systems level.”

      “Lastly, our multivariate analyses also point to a circuit level understanding of latent diurnal stress scores.”

      Author response image 2.

      Multivariate associations between changes in structure and function of hippocampal subfield volume and markers of stress change in Affect. A) Multivariate associations between bilateral CA1-3 volume and intrinsic function and stress markers. Left: Scatter of loadings, colored by Training Module; Right upper: individual correlations of stress markers; Right lower: individual correlation of subfields; B). Multivariate associations between all subfields’ volume and intrinsic function and stress markers. Left: Scatter of loadings, colored by Training Module; Right upper: individual correlations of stress markers; Right lower: individual correlation of subfields.

      (2) The authors emphasize a link between hippocampal subfield CA1-3 and stress regulation, and indeed, multiple lines of evidence converge to highlight a most consistent role of CA1-3. There are, however, some aspects of the results that limit the robustness of this conclusion. First, formal comparisons between subfields are incomplete, making it difficult to judge whether the CA1-3, to a greater degree than other subfields, display effects of training.

      We thank the Reviewer for this comment. To further test for specificity, we additionally evaluated subfield-specific changes relative to other subfields for our main contrasts (Presence versus Active Control and Affect versus Perspective). Relative to other subfields, right CA1-3 showed increases in the Affect vs Perspective contrast (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015); no other subfield showed significant changes. We now include this statement in Results and Supplementary Tables.

      “Moreover, associations between CA1-3 and Affect, relative to Perspective, seemed to go largely above and beyond changes in the other subfields (left: t-value: 2.298, p=0.022, Q>0.1; right: t-value: 3.045, p=0.0025, Q=0.015, see further Supplementary File 1h).”

      Author response table 9.

      Subfield-specific changes following the Training Modules, controlling for the other two ipsilateral subfields

      (3) Relatedly, it would be of interest to assess whether changes in CA1-3 make a significant contribution to explaining the link between hippocampal integrity and cortisol, as compared to structure and functional connectivity of the whole hippocampus.

      We thank the Reviewer for this comment. Please see the PLS analysis performed above (R2Q1). Indeed, not only CA1-3 but also other subfields seem to show a relationship with cortisol, in line with circuit level accounts on stress regulation and hippocampal circuit alterations (8, 11-15).

      (4) Second, both structural and functional effects (although functional to a greater degree), were most pronounced in the specific comparison of "Affect" and "Perspective" training conditions, possibly limiting the study's ability to inform general principles of hippocampal stress-regulation.

      We agree with the Reviewer that the association between stress and hippocampal plasticity, on the one hand, and mental training and hippocampal plasticity, on the other hand, makes it less than straightforward to inform general principles of hippocampal stress regulation. However, as underscored in the discussion, in previous work we could also link mental training to stress reductions (16-18). We hope that the additional analyses and explanations further explain the multilevel insights of the current work, on the one hand using group-level analysis to investigate and illustrate the association between mental training and hippocampal subfield volume and intrinsic function, and on the other hand using individual-level analysis to unpack the association between cortisol change and hippocampal subfield change.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the Results, the description of how the hippocampal subfields' functional networks were defined would benefit from some clarification. It is also somewhat unclear what is meant by (on page 10): "Evaluating functional connectivity changes, we found that connectivity of the right CA1-3 functional network showed differential changes when comparing Affect training to Perspective training (2.420, p=0.016, FDRq=0.032, Cohens D =0.289), but not versus retest control (Table 1 and Supplementary Table 8-14)." Were there significant changes in CA1-3 FC following both training conditions (but these differed from each other)? A description of what this difference reflected would increase the reader's understanding.

      We are happy to clarify. We included information on changes within individual Modules in the Supplementary Materials (Supplementary Tables 1, 2, 9, and 10). Changes in functional connectivity were largely due to the differences between Modules, and did not show strong effects in one Module alone. We now include information on the un-contrasted changes in Affect and Perspective in the main results text:

      “… which could be attributed to decreases in right CA1-3 mean FC following Perspective (t=-2.012, p=0.045, M:-0.024, std: 0.081, CI [-0.041 -0.006]), but not Affect (t=1.691, p=0.092, M: 0.010, std: 0.098, CI [-0.01 0.031]); changes were not present when comparing Affect training versus retest control (Table 1 and Supplementary File 1k-q).”

      (2) As described in the Public Review, the lack of multivariate assessments may risk selling the data short. Including analyses of concomitant functional and structural changes, in relation to cortisol, seems like an approach better adapted to characterize meaningful interrelationships between these measures.

      We thank the Reviewer for suggesting multivariate assessments. To understand the interrelation between behavioral intervention, hippocampal plasticity, and cortisol changes, the current work first evaluates a simpler operationalization of the relationship between hippocampal subfield structure and volume, and cortisol as a function of mental training. Thus, given the complex nature of the study, we initially opted for a model where we assess structural and functional changes independently, with structural changes as the basis of our investigations. Now we have also included a multivariate approach (PLS) to further test the association between hippocampal subfields and cortisol markers; please see our additions to the manuscript above. We now highlight multivariate associations in the Discussion as well, and suggest this as an important next step for more detailed, future investigations.

      “Incorporating further multivariate, computational, models is needed to further unpack and investigate the complex and nuanced association between hippocampal structure and function, in particular in relation to subfield plasticity and short and long-term stress markers.”

      (3) A minor comment regards the Figures. Some main effects should be visualized in a clearer manner. For instance, the scatterplots in Figure 1, panel D. Also, some of the current headings within the figures could be made more intuitive to the reader.

      We thank the Reviewer for this comment. To improve clarity, we updated figure headings. For Figure 1D, the challenge is that the data are quite scattered and we aimed to visualize our observations in a naturalistic way. Therefore, we added additional y-axis information to further clarify the figures. Creating more overlap or differentiation would make other elements of the figure less clear, hence we retained the current set-up detailing the intra- and inter-individual alterations of the current model.

      (1) Wisse LEM, Chetelat G, Daugherty AM, de Flores R, la Joie R, Mueller SG, et al. (2021): Hippocampal subfield volumetry from structural isotropic 1 mm(3) MRI scans: A note of caution. Hum Brain Mapp. 42:539-550.

      (2) DeKraker J, Kohler S, Khan AR (2021): Surface-based hippocampal subfield segmentation. Trends Neurosci. 44:856-863.

      (3) DeKraker J, Haast RAM, Yousif MD, Karat B, Lau JC, Kohler S, et al. (2022): Automated hippocampal unfolding for morphometry and subfield segmentation with HippUnfold. Elife. 11.

      (4) Vos de Wael R, Lariviere S, Caldairou B, Hong SJ, Margulies DS, Jefferies E, et al. (2018): Anatomical and microstructural determinants of hippocampal subfield functional connectome embedding. Proc Natl Acad Sci U S A. 115:10154-10159.

      (5) Bernhardt BC, Bernasconi A, Liu M, Hong SJ, Caldairou B, Goubran M, et al. (2016): The spectrum of structural and functional imaging abnormalities in temporal lobe epilepsy. Ann Neurol. 80:142-153.

      (6) Vogel JW, La Joie R, Grothe MJ, Diaz-Papkovich A, Doyle A, Vachon-Presseau E, et al. (2020): A molecular gradient along the longitudinal axis of the human hippocampus informs large-scale behavioral systems. Nat Commun. 11:960.

      (7) Genon S, Bernhardt BC, La Joie R, Amunts K, Eickhoff SB (2021): The many dimensions of human hippocampal organization and (dys)function. Trends Neurosci. 44:977-989.

      (8) McEwen BS (1999): Stress and hippocampal plasticity. Annu Rev Neurosci. 22:105-122.

      (9) Kebets V, Holmes AJ, Orban C, Tang S, Li J, Sun N, et al. (2019): Somatosensory-Motor Dysconnectivity Spans Multiple Transdiagnostic Dimensions of Psychopathology. Biol Psychiatry. 86:779-791.

      (10) McIntosh AR, Lobaugh NJ (2004): Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage. 23 Suppl 1:S250-263.

      (11) Paquola C, Benkarim O, DeKraker J, Lariviere S, Frassle S, Royer J, et al. (2020): Convergence of cortical types and functional motifs in the human mesiotemporal lobe. Elife. 9.

      (12) DeKraker J, Ferko KM, Lau JC, Kohler S, Khan AR (2018): Unfolding the hippocampus: An intrinsic coordinate system for subfield segmentations and quantitative mapping. Neuroimage. 167:408-418.

      (13) McEwen BS, Nasca C, Gray JD (2016): Stress Effects on Neuronal Structure: Hippocampus, Amygdala, and Prefrontal Cortex. Neuropsychopharmacology. 41:3-23.

      (14) Sapolsky RM (2000): Glucocorticoids and hippocampal atrophy in neuropsychiatric disorders. Arch Gen Psychiatry. 57:925-935.

      (15) Jacobson L, Sapolsky R (1991): The role of the hippocampus in feedback regulation of the hypothalamic-pituitary-adrenocortical axis. Endocr Rev. 12:118-134.

      (16) Engert V, Hoehne K, Singer T (2023): Specific reduction in the cortisol awakening response after socio-affective mental training. Mindfulness.

      (17) Puhlmann LMC, Vrticka P, Linz R, Stalder T, Kirschbaum C, Engert V, et al. (2021): Contemplative Mental Training Reduces Hair Glucocorticoid Levels in a Randomized Clinical Trial. Psychosom Med. 83:894-905.

      (18) Engert V, Kok BE, Papassotiriou I, Chrousos GP, Singer T (2017): Specific reduction in cortisol stress reactivity after social but not attention-based mental training. Sci Adv. 3:e1700495.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Govindan and Conrad use a genome-wide CRISPR screen to identify genes regulating retention of intron 4 in OGT, leveraging an intron retention reporter system previously described (PMID: 35895270). Their OGT intron 4 reporter reliably responds to O-GlcNAc levels, mirroring the endogenous splicing event. Through a genome-wide CRISPR knockout library, they uncover a range of splicing-related genes, including multiple core spliceosome components, acting as negative regulators of OGT intron 4 retention. They choose to follow up on SFSWAP, a largely understudied splicing regulator shown to undergo rapid phosphorylation in response to O-GlcNAc level changes (PMID: 32329777). RNA-sequencing reveals that SFSWAP depletion not only promotes OGT intron 4 splicing but also broadly induces exon inclusion and intron splicing, affecting decoy exon usage. While this study offers interesting insights into intron retention and O-GlcNAc signaling regulation, the RNA sequencing experiments lack the essential controls needed to provide full confidence to the authors' conclusions. 

      Strengths: 

      (1) This study presents an elegant genetic screening approach to identify regulators of intron retention, uncovering core spliceosome genes as unexpected positive regulators of intron retention. 

      (2) The work proposes a novel functional role for SFSWAP in splicing regulation, suggesting that it acts as a negative regulator of splicing and cassette exon inclusion, which contrasts with expected SR-related protein functions. 

      (3) The authors suggest an intriguing model where SFSWAP, along with other spliceosome proteins, promotes intron retention by associating with decoy exons. 

      We thank the reviewer for recognizing and detailing the strengths of our manuscript. 

      Weaknesses: 

      (1) The conclusions on SFSWAP impact on alternative splicing are based on cells treated with two pooled siRNAs for five days. This extended incubation time without independent siRNA treatments raises concerns about off-target effects and indirect effects from secondary gene expression changes, potentially limiting confidence in direct SFSWAP-dependent splicing regulation. Rescue experiments and shorter siRNA-treatment incubation times could address these issues. 

      We repeated our SFSWAP knockdown analysis and analyzed both OGT e4-e5 junction splicing and SFSWAP transcript levels by RT-qPCR (now included in Sup. Fig. S4) from day 2 to day 5 post siRNA treatment. We observed that the time point at which OGT intron 4 removal increases (day 2) coincides with the time at which SFSWAP transcript levels start to decrease, consistent with a direct effect of SFSWAP knockdown on OGT intron 4 splicing. Moreover, the effect of SFSWAP knockdown on OGT intron 4 splicing peaks between days 4 and 5, supporting our use of these longer time points to cast a wide net for SFSWAP targets.

      (2) The mechanistic role of SFSWAP in splicing would benefit from further exploration. Key questions remain, such as whether SFSWAP directly binds RNA, specifically the introns and exons (including the decoy exons) it appears to regulate. Furthermore, given that SFSWAP phosphorylation is influenced by changes in O-GlcNAc signaling, it would be interesting to investigate this relationship further. While generating specific phosphomutants may not yield definitive insights due to redundancy, and may also be beyond the scope of the study, the authors could examine whether distinct SFSWAP domains, such as the SR and SURP domains, which likely overlap with phosphorylation sites, are necessary for regulating OGT intron 4 splicing.

      We absolutely agree with the reviewer that the current work stops short of a detailed mechanistic study, and we have made every attempt to be circumspect in our interpretations to reflect that limitation. In addition, we are very interested in delving more deeply into the mechanistic aspects of this regulation. In fact, we have initiated many of the experiments suggested by the reviewer (and more), but in each case, rigorous interpretable results will require at minimum another year’s time.

      For example, we have used crosslinking and biotin labeling techniques (using previously available reagents from Eclipsebio) to test whether SFSWAP binds RNA. The results were negative, but the lack of strong SFSWAP antibodies required that we use a transiently expressed myc-tagged SFSWAP. Therefore, this negative result could be an artifact of the exogenous expression and/or tagging. Given the difficulties of “proving the negative”, considerably more work will be required to substantiate this finding. As another example, we intend to develop a complementation assay as suggested. For an essential gene, the ideal complementation system employs a degron system, and we have spent months attempting to generate a homozygous AID-tagged SFSWAP. Unfortunately, we have so far found only heterozygotes. Of course, this could be because the tag interferes with function, because the insert was not efficiently incorporated by homologous repair, or because we simply haven’t yet screened a sufficient number of clones. We’re confident that these technical issues can be addressed, but they will take a significant amount of time to resolve. While we would ideally define a mechanism, we think that the data reported here outlining functions for SFSWAP in splicing represent a body of work sufficient for publication.

      (3) Data presentation could be improved (specific suggestions are included in the recommendations section). Furthermore, Excel tables with gene expression and splicing analysis results should be provided as supplementary datasheets. Finally, a more detailed explanation of statistical analyses is necessary in certain sections. 

      We have addressed all specific suggestions as detailed in the recommendations below.

      Reviewer #2 (Public review): 

      Summary: 

      The paper describes an effort to identify the factors responsible for intron retention and alternate exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly. 

      Strengths: 

      (1) Exhaustive analysis of potential splicing factors in an unbiased screen. 

      (2) Extensive genome wide bioinformatic analysis. 

      (3) Thoughtful discussion and literature survey. 

      We thank the reviewer for recognizing and detailing the strengths of our manuscript. 

      Weaknesses: 

      (1) No firm evidence linking SFSWAP to an O-GlcNAc specific mechanism. 

      We couldn’t agree more with this critique. Indeed, our intention at the outset for the screen was to find an O-GlcNAc sensor linking OGT splicing with O-GlcNAc levels. As often occurs with high-throughput screens, we didn’t find exactly what we were looking for, but the screen nonetheless pointed us to interesting biology. Prompted by our screen, we describe new insights into the function of SFSWAP, a relatively uncharacterized essential gene. Currently, we are testing other candidates from our screen, and we are performing additional studies to identify potential O-GlcNAc sensors.  

      (2) Resulting model leaves many unanswered questions. 

      We agree (see Reviewer 1, point 2 response).  

      Reviewer #3 (Public review): 

      Summary: 

      The major novel finding in this study is that SFSWAP, a splicing factor containing an RS domain but no canonical RNA binding domain, functions as a negative regulator of splicing. More specifically, it promotes retention of specific introns in a wide variety of transcripts including transcripts from the OGT gene previously studied by the Conrad lab. The balance between OGT intron retention and OGT complete splicing is an important regulator of O-GlcNAc expression levels in cells. 

      Strengths: 

      An elegant CRISPR knockout screen employed a GFP reporter, in which GFP is efficiently expressed only when the OGT retained intron is removed (so that the transcript will be exported from the nucleus to allow for translation of GFP). Factors whose CRISPR knockdown causes decreased intron retention therefore increase GFP, and can be identified by sequencing RNA of GFP-sorted cells. SFSWAP was thus convincingly identified as a negative regulator of OGT retained intron splicing. More focused studies of OGT intron retention indicate that it may function by regulating a decoy exon previously identified in the intron, and that this may extend to other transcripts with decoy exons. 

      We thank the reviewer for recognizing the strengths of our manuscript. 

      Weaknesses: 

      The mechanism by which SFSWAP represses retained introns is unclear, although some data suggests it can operate (in OGT) at the level of a recently reported decoy exon within that intron.

      Interesting/appropriate speculation about possible mechanisms are provided and will likely be the subject of future studies. 

      We completely agree that this is a limitation of the current study (see above). Now that we have a better understanding of SFSWAP functions, we will continue to explore SFSWAP mechanisms as suggested. 

      Overall the study is well done and carefully described but some figures and some experiments should be described in more detail. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Clarify and add missing statistical details across the figures. For example, Figure S2 lacks statistical comparisons, and in Figures 4A and 4C the tests applied should be specified in the legend. 

      We have added appropriate statistical analysis wherever missing and edited figure legends to specify the tests used.

      (2) The authors are strongly encouraged to provide detailed tables of gene expression and alternative splicing analyses from RNA-Seq experiments (e.g., edgeR, rMATS, Whippet, and MAJIQ), as this would enhance transparency and facilitate data interpretation. 

      We have added tables for gene expression and alternate splicing analysis as suggested (Suppl. tables 3-6).

      (3) Although the legend sometimes indicates differently (e.g., Figure 3b, 5a, 5c, etc), the volcano plots showing the splicing changes do not contain a cutoff for marginally differential percent spliced in or intron retention values. 

      The legends have been edited to reflect the correct statistical and/or PSI cutoffs.

      (4) For consistency, use a consistent volcano plot format across all relevant figures (Figures 3b, 5a-c, S3, S4, S7, and S8), including cutoffs for differential splicing and the total count of up- and down-regulated events. 

      Due to different statistical frameworks and calculations employed by different alternate splicing pipelines, we could not use the same cutoffs for different pipelines.  However, we have now indicated the number of up- and down-regulated events for consistency among the volcano plots.

      (5) What is the overlap of differentially regulated events between the different analytical methodologies applied? 

      We analyzed the degree of overlap between the three pipelines used in the paper using a Venn diagram (added to Suppl. Fig. S7). However, as widely reported in literature (e.g., Olofsson et al., 2023; Biochem Biophys Res Commun. 2023; doi: 10.1016/j.bbrc.2023.02.053.), the degree of overlap between pipelines is quite low.

      (6) To further substantiate your conclusions, additional validations of RNA-Seq splicing data, ideally visualized on an agarose gel, would be valuable, especially for exons and introns regulated by SFSWAP, and particularly for OGT decoy exons in Figure 4c. 

      We have not included these experiments as we focused on other critiques for this resubmission. Because the RNA-seq, RT-PCR and RT-qPCR data all align, we are confident that the products we are seeing are correctly identified and orthogonally validated (Figs 2d, 4a, 4b, and 4c).  

      (7) It would be more informative if the CRISPR screen data were presented in a format where both the adjusted p-value and LFC values of the hits are presented. Perhaps a volcano plot? 

      We have now included these graphs in revised Supplementary Figure S2. 

      (8) In Figure 2d, a cartoon showing primer binding sites for each panel could aid interpretation, particularly in explaining the unexpected simultaneous increase in OGT mRNA and intron retention upon SFSWAP knockdown. 

      We have added a cartoon showing primer binding sites similar to that shown in Fig. 4a.

      (9) Page 9, line 1, states that SFSWAP autoregulates its expression by controlling intron retention. Including a Sashimi plot would provide visual support for this claim. 

      The data suggesting that SFSWAP autoregulates its own transcript abundance were reported in Zachar et al. (1994), not from our own studies. Validation of those data with our RNA-seq data is confounded by the fact that we are using siRNAs to knock down the SFSWAP RNA at the transcript level (Fig. S15). 

      (10) In the legend of Figure S2 the authors state that negative results are inconclusive because RNA knockdowns are not verified by western blotting or qRT-PCR. This is correct, but the reviewer would also argue that the positive results are also inconclusive as they are not supported by a rescue experiment to confirm that the effect is not due to off-target effects. 

      This is a fair point with respect to the siRNA experiments on their own. However, the CRISPR screen was performed with sgRNAs, and MAGeCK RRA scores are high only for those genes that have multiple sgRNAs that up-regulate the gene. Examination of the SFSWAP sgRNAs individually shows that three of four SFSWAP sgRNAs had false discovery rates ≤10^-42 for GFP upregulation. Thus, the siRNAs provide an additional orthogonal approach. It seems unlikely that the siRNAs and three independent sgRNAs will have the same off-target results. Thus, these combined observations support the conclusion that SFSWAP loss leads to decreased OGT intron retention.  

      (11) For clarity in Figure 3a, consider using differential % spliced in or intron retention bar plots with directionality (positive and negative axis) and labeling siSFSWAP as the primary condition. 

      (12) Consider presenting Figure 5D as a box plot with a Wilcoxon test for statistical comparison. 

      For both points 11 and 12, we have tried the graphs as the reviewer suggested. While these were good suggestions, in both cases we felt that the original plots presented the data more clearly (see Author response image 1).

      Author response image 1.

      (13) Please expand the Methods section to detail the Whippet and MAJIQ analyses. 

      We have expanded the methods section to include additional details of the alternate splicing analysis.

      (14) Include coordinates for the four possible OGT decoy exon combinations analyzed in the Methods section. 

      We have added the coordinates of all four decoy forms in the methods section.  

      (15) A section on SFSWAP mass spectrometry is listed in Methods but is missing from the manuscript. 

      This section has now been removed.

      Reviewer #2 (Recommendations for the authors): 

      This is an excellent contribution. The paper describes an effort to identify the factors responsible for intron retention and alternate exon splicing in a complex system known to be regulated by the O-GlcNAc cycling system. The CRISPR/Cas9 system was used to identify potential factors. The bioinformatic analysis is sophisticated and compelling. The conclusions are of general interest and advance the field significantly. 

      Some specific recommendations. 

      (1) The plots in Figure 3 describing SI and ES events are confusing to this reader. Perhaps the violin plot is not the best way to visualize these events. The same holds true for the histograms in the lower panel of Figure 3. Not sure what to make of these plots. 

      For Figure 3b, we include both scatter and violin plots to represent the same data in two distinct ways. For Figure 3d, we agree that these are not the simplest plots to understand, and we have spent significant time trying to come up with a better way of displaying these trends in GC content as they relate to SE and RI events. Unfortunately, we were unable to identify a clearer way to present these data. 

      (2) The model (Figure 6) is very useful but confusing. The legend and the Figure itself are somewhat inconsistent. The bottom line of the figure is apparent but I fear that the authors are trying to convey a more complete model than is apparent from this figure. Please revise. 

      We have simplified the figure from the previous submission. As mentioned above, we admit that mechanistic details remain unknown. However, we have tried to generate a model that reflects our data, adds some speculative elements to be tested in the future, but remains as simple as possible. We are not quite sure what the reviewer was referring to as “somewhat inconsistent”, but we have attempted to clarify the model in the revised Discussion and Figure legend.  

      (3) It is unclear how normalization of the RNA seq experiments was performed (eg. Figure S5 and 6).  

      The normalization differences in Fig. S5 and S6 (now Fig S8 and S9) were due to scaling differences during the use of rmats2sashimiplot software. We have now replaced Fig. S5 to reflect correctly scaled images.

      I am enthusiastic about the manuscript and feel that with some clarification it will be an important contribution. 

      Thank you for these positive comments about our study!

      Reviewer #3 (Recommendations for the authors): 

      (1) In Figure 1f, it is clear that siRNA-mediated knockdown of OGT greatly increases spliced RNA as the cells attempt to compensate by more efficient intron removal (three left lanes). However, there is no discussion of the various treatments with TG or OSMI. Might quantitation of these lanes not also show the desired effects of TG and OSMI on spliced transcript levels? 

      The strong effect of OGT knockdown masks the (comparatively modest) effects of subsequent inhibitor treatments on the reporter RNA. We have edited the results section to clarify this.

      (2) In Figure 2c, why is the size difference between spliced RNA and intron-retained RNA so different in the GFP-probed gel (right) compared with the OGT-probed gel (left)? Even recognizing that the GFP probe is directed against reporter transcripts, and the OGT probe (I think) is directed against endogenous OGT transcripts, shouldn't the difference between spliced and unspliced bands be the same, i.e., +/- the intron 4 sequence. Also, why does the GFP probe detect the unspliced transcript so poorly? 

      The fully spliced endogenous OGT mRNA is ~5.5 kb while the fully spliced reporter is only ~1.6kb, so the difference in size (the apparent shift relative to the mRNA) is quite different. Moreover, the two panels in Fig 2c are not precisely scaled to one another, so direct comparisons cannot be made. 

      The intron retained isoform does not accumulate to high levels in this reporter, a phenotype that we also observed with our GFP reporter designed to probe the regulation of the MAT2A retained intron (Scarborough et al., 2021). We are not certain about the reason for these observations, but suspect that the reporter RNA’s retained intron isoforms are less stable in the nucleus than their endogenous counterparts. Alternatively, the lack of splicing may affect 3´ processing of the transcripts so that they do not accumulate to the high levels observed for the wild-type genes. 

      (3) Please provide more information about the RNA-seq experiments. How many replicates were performed under each of the various conditions? The methods section says three replicates were performed for the UPF1/TG experiments; was this also true for the SFSWAP experiments?  

      All RNA-seq experiments were performed in biological triplicates. We have edited the methods section to clarify this.

      (4) Relatedly, the several IGV screenshots shown in Figure 3C presumably represent the triplicate RNA seq experiments. In part D, how many experiments does the data represent? Is it a compilation of three experiments? 

      Fig. 3d is derived from alternate splicing analysis performed on three biological replicates. We have added the number of replicates (n=3) on the figure to clarify this. We have also noted that the three IGV tracks represent biological replicates in the Figure legend for 3c.  

      (5) Please provide more details regarding the qRT-PCR experiments. 

      We have provided the positions of primer sets used for RT-qPCR analysis and cartoon depictions of target sites below the data wherever appropriate.

      (6) In the discussion of decoy exon function (in the Discussion section), several relevant observations are cited to support a model in which decoy exons promote assembly of splicing factors. One might also cite the finding that eCLIP profiling has found enriched binding of U2AF1 and U2AF2 at the 5' splice site region of decoy exons (reference 16). 

      Excellent point. This has now been added to the Discussion. 

      Minor corrections / clarifications: 

      (1) In the Figure 2A legend, CRISPR is misspelled. 

      Corrected.

      (2) In the discussion, the phrase "indirectly inhibits splicing of exons 4 and 5, but promoting stable unproductive assembly of the spliceosome", the word "but" should probably be "by". 

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper develops a phase method to obtain the excitatory and inhibitory afferents to certain neuron populations in the brainstem. The inferred contributions are then compared to the results of voltage clamp and current clamp experiments measuring the synaptic contributions to post-I, aug-E, and ramp-I neurons.

      Strengths:

      The electrophysiology part of the paper is sound and reports novel features with respect to earlier work by JC Smith et al 2012, Paton et al 2022 (and others) who have mapped circuits of the respiratory central pattern generator. Measurements on ramp-I neurons, late-I neurons, and two types of post-I neurons in Figure 2 besides measurements of synaptic inputs to these neurons in Figure 5 are to my knowledge new.

      Weaknesses:

      The phase method for inferring synaptic conductances fails to convince. The method rests on many layers of assumptions and the inferred connections in Figure 4 remain speculative. 

      We hope that the additional method justifications now incorporated in the manuscript will make our method more convincing and change this reviewer’s opinion.

      To be convincing, such a method ought to be tested first on a model CPG with known connectivity to assess how good it is at inferring known connections back from the analysis of spatio-temporal oscillations. 

      We respectfully disagree with this critique. Existing respiratory CPG models are based on a conductance-based formalism. Since the neurons recorded using our approach are typically hyperpolarized, in the model at the corresponding values of the membrane potential, all voltage-gated channels will be deactivated. Therefore, the current balance equation used in this study will closely align with the descriptions used in these models. This alignment will result in a near-exact correspondence between the synaptic conductance values inferred by our method and their model counterparts. However, we believe that such a demonstration, while predetermined to be successful, would not be convincing for a computationally savvy audience.

      For biological data, once the network connectivity has been inferred as claimed, the straightforward validation is to reconstruct the experimental oscillations (Figure 2) noting that Rybak et al (Rybak, Paton Schwaber J. Neurophysiol. 77, 1994 (1997)) have already derived models for the respiratory neurons.

      Running such simulations is beyond the scope of this paper, which focuses on our methods for extracting synaptic conductances during network activity cycles from intracellular recordings. However, the existing, largely speculative, respiratory CPG models can be validated against the "ground truth" of the inferences we present here. To illustrate how our circuit connection motifs elaborate on existing respiratory CPG models, we have now included in the manuscript a combinatorial connectivity model derived from the connectivity motifs in the supplemental figures (Figure 4 Supplemental Figure 1), with comparisons to the model schematic used by Rybak, Smith et al. in simulation studies of a rhythmic three-phase respiratory pattern. These schematics share enough mechanistically important connectivity features to suggest that our more elaborate connectivity scheme would almost certainly generate the three-phase patterns of neuronal firing and network rhythmic activity.

      The transformation from time to phase space, unlike in the Kuramoto model, is not justified here (Line 94) and is wrong. The underpinning idea that "the synaptic conductances depend on the cycle phase and not on time explicitly" is flawed because synapses have characteristic decay times and delays to response which remain fixed when the period of network oscillations increases. Synaptic properties depend on time and not on phase in the network. 

      The primary assumption of our method is that all variables within the system are periodic functions of time. Therefore, the inputs to the recorded neuron, at minimum, are fully defined by the oscillation's phase. While the transduction of the input into postsynaptic conductance may have its own time dependence, the characteristic timescale of synaptic dynamics (10-20 ms, as suggested by the reviewer) is much smaller than the period of network oscillations. This is certainly true for the test system we are using. This valid assumption of our method is now further clarified in the revised manuscript.

      One major consequence relevant to the present identification of excitatory or inhibitory behaviour, is that it cannot account for change in the behaviour of inhibitory synapses - from inhibitory to excitatory action - when the inhibitory decay time becomes commensurable to the period of network oscillations (Wang & Buzsaki Journal of Neuroscience 16, 6402 (1996), van Vreeswijk et al. J. Comp. Neuroscience 1,313 (1994), Borgers and Kopell Neural Comput. 15, 2003). 

      Our method focuses on recovering synaptic conductances rather than directly measuring presynaptic inputs. The conversion of presynaptic inputs (spike trains) into postsynaptic conductances involves its own time scales. This can lead to complex dynamical effects when synaptic delay or decay times are comparable to the oscillation period. In such cases, although our conductance calculation remains accurate, we might misinterpret the phase of the presynaptic input, as it may not align with the phase of the postsynaptic conductance peak. However, this discrepancy is not significant for applications where the synaptic delay/decay times are considerably shorter than the oscillation period.

      In addition, even small delays in the inhibitory synapse response relative to the pre-synaptic action potential also produce in-phase synchronization (Chauhan et al., Sci. Rep. 8, 11431 (2018); Borgers and Kopell, Neural Comput. 15, 509 (2003)). 

      The reviewer is referring to a phenomenon involving interspike synchronization that generates oscillations with very short periods, comparable to synaptic delay times. Our technique, in contrast, is designed for systems of asynchronously firing neurons forming functional populations whose oscillations emerge on a much longer time scale or are driven by periodic stimuli (e.g., sensory input) with a period much longer than the interspike intervals of individual neurons. The time scale difference we are addressing in our test system is two orders of magnitude.

      The present assumptions are way too simplistic because you cannot account for these commensurability effects with a single parameter like the network phase. There is therefore little confidence that this model can reliably distinguish excitatory from inhibitory synapses when their dynamic properties are not properly taken into account.

      As we explained in our previous responses, in our test system, we can reliably resolve post-synaptic conductance variations at 1/100th of the oscillation period. This is due to a >100X time scale difference between the oscillation period and the synaptic/membrane decay time constants. The efficiency of our method in other systems may vary depending on the relationship between the membrane time constant and the oscillation period. The text now provides a clearer discussion of the method's resolution.

      To interpret post-synaptic conductance profiles in terms of presynaptic inputs (e.g., to reconstruct connectivity), one should consider the input-to-conductance transduction processes. We did not aim to provide a general solution for this step in our paper (hence the title) as these processes may differ for different neurotransmitter systems and involve individual dynamics. However, in our test system, as discussed, the oscillation period is much longer than the synaptic decay times of the fast-acting neurotransmitters involved (i.e., glutamate, glycine, and GABA). This means that the possible phase difference between presynaptic neuronal activity and the corresponding postsynaptic conductances is negligible. This allows for a straightforward interpretation of conductance profiles in terms of the functional connectivity of the network. In other systems, the situation may, of course, be different and additional efforts for inferring the presynaptic activity from postsynaptic conductance profiles may be necessary.

      Line 82, Equation 1 makes extremely crude assumptions that the displacement current (CdV/dt) is negligible and that the ion channel currents are all negligible. Vm(t) is also not defined. The assumption that the activation/inactivation times of all ion channels are small compared to the 10-20ms decay time of synaptic currents is not true in general. Same for the displacement current. The leak conductance is typically g~0.05-0.09mS/cm^2 while C~1uF/cm^2. Therefore the ratio C/g leak is in the 10-20ms range - the same as the typical docking neurotransmitter time in synapses.

      We have explicitly included capacitive current in the model formulation and described the time scale separation requirement that justifies our approach. Additionally, we now explain within the text that the current injection protocol involves hyperpolarizing the recorded neuron to ensure voltage-dependent currents remain deactivated during the recording. The remarkable linearity of the current-voltage relationships observed in the vast majority of recorded neurons provides post-hoc evidence supporting this assumption. For further details, please refer to our responses to Reviewer 2 and Figure 1 Supplemental Figure 1 as an example.
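
      The per-phase decomposition implied by a linear current-voltage relationship can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single passive compartment in quasi-steady state, a known leak conductance and leak reversal potential, and fixed excitatory and inhibitory reversal potentials. The function names `decompose_conductances` and `synthetic_voltages`, and all numerical values in the usage note, are hypothetical.

```python
import numpy as np


def decompose_conductances(i_inj, v, g_leak, e_leak, e_exc, e_inh):
    """Estimate (g_exc, g_inh) for one phase bin from an I-V line fit.

    Assumes quasi-steady state, so the injected current balances the
    membrane currents: i_inj = g_tot * (v - v_rev).
    Units: pA, mV, nS (nS * mV = pA).
    """
    slope, intercept = np.polyfit(v, i_inj, 1)  # i_inj = slope * v + intercept
    g_tot = slope
    v_rev = -intercept / slope
    # Two linear constraints on the two unknown synaptic conductances:
    #   g_exc + g_inh                 = g_tot - g_leak
    #   g_exc * e_exc + g_inh * e_inh = g_tot * v_rev - g_leak * e_leak
    a = np.array([[1.0, 1.0], [e_exc, e_inh]])
    b = np.array([g_tot - g_leak, g_tot * v_rev - g_leak * e_leak])
    g_exc, g_inh = np.linalg.solve(a, b)
    return g_exc, g_inh


def synthetic_voltages(i_inj, g_leak, e_leak, g_exc, e_exc, g_inh, e_inh):
    """Steady-state voltages of the same passive model (for self-checks)."""
    g_tot = g_leak + g_exc + g_inh
    drive = g_leak * e_leak + g_exc * e_exc + g_inh * e_inh
    return (np.asarray(i_inj, dtype=float) + drive) / g_tot
```

      Feeding `synthetic_voltages` output for several hyperpolarizing current steps back into `decompose_conductances` should recover the conductances used to generate it. With recorded data, departures from a straight line in the per-phase I-V fit would signal that the passive assumption does not hold at that phase.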

      Models of brainstem CPG circuits have been known to exist for decades: JC Smith et al 2012, Paton et al 2022, Bellingham Clin. Exp. Pharm. And Physiol. 25, 847 (1998); Rubin et al., J. Neurophysiol. 101, 2146 (2009) among others. The present paper does not discuss existing knowledge on respiratory networks and gives the impression of reinventing the wheel from scratch. How will this paper add to existing knowledge?

      We appreciate this comment, and in fact, in the original submitted version of this manuscript, we discussed existing knowledge of respiratory networks, but there was editorial concern that this material was above and beyond the technical aspects that we were trying to convey and therefore may detract from the paper as a technical submission. To strike a balance, we have re-incorporated some of this material in abbreviated form into the Discussion section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture”.

      Reviewer #2 (Public review):

      Summary:

      By measuring intracellular changes in membrane voltage from a single neuron of the medulla the authors describe a method for determining the balance of excitatory and inhibitory synaptic drive onto a single neuron within this important brain region.

      Strengths:

      This approach could be valuable in describing the microcircuits that generate rhythms within this respiratory control centre. This method could more generally be used to enable microcircuits to be studied without the need for time-consuming anatomical tracing or other more involved electrophysiological techniques.

      Weaknesses:

      This approach involves assuming the reversal potential that is associated with the different permeant ions that underlie the excitation and inhibition as well as the application of Ohms law to estimate the contribution of excitation and inhibitory conductance. My first concern is that this approach relies on a linear I-V relationship between the measured voltage and the estimated reversal potential. However, open rectification is a feature of any I-V relationship generated by asymmetric distributions of ions (see the GHK current equation) and will therefore be a particular issue for the inhibition resulting from asymmetrical Cl- ion gradients across GABA-A receptors. The mixed cation conductance that underlies most synaptic excitation will also generate a non-linear I-V relationship due to the inward rectification associated with the polyamine block of AMPA receptors. Could the authors please speculate what impact these non-linearities could have on results obtained using their approach?

      In our Figure 1 Supplemental Figure 1, we illustrated that I-V relationships for each particular phase of the cycle (except for transitions between inspiration and expiration where our error estimates are greatest) are remarkably linear. 

      In Author response image 1 we compare the I-V dependence for Cl- as predicted by the GHK equation and its linear approximation using constant conductance and the Cl- Nernst potential. One can see that in the typical range of voltages used (shown by solid black vertical lines), the linear approximation appears quite adequate.

      Author response image 1.
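
      A comparison of this kind can be reproduced numerically. The sketch below is illustrative only: the temperature and the intracellular and extracellular Cl- concentrations are assumed values, not taken from the manuscript; the permeability is in arbitrary units; and the linear approximation uses the tangent conductance at the Nernst potential, which may differ from the constant conductance used for the response image. All function names are hypothetical.

```python
import numpy as np

R, F = 8.314, 96485.0        # gas constant J/(mol K), Faraday constant C/mol
T = 304.15                   # assumed temperature (K)
CL_IN, CL_OUT = 8.0, 130.0   # assumed Cl- concentrations (mM)
VT = R * T / F               # thermal voltage RT/F (V)


def nernst_cl():
    """Cl- Nernst potential (V); valence z = -1."""
    return VT * np.log(CL_IN / CL_OUT)


def ghk_cl(v, p=1.0):
    """GHK current for Cl- (arbitrary units, outward positive).

    v is in volts and must be nonzero (the expression is 0/0 at v = 0).
    """
    x = np.asarray(v, dtype=float) / VT
    return p * F * x * (CL_IN - CL_OUT * np.exp(x)) / (1.0 - np.exp(x))


def tangent_conductance(dv=1e-6):
    """Slope of the GHK curve at the Nernst potential (central difference)."""
    e_cl = nernst_cl()
    return (ghk_cl(e_cl + dv) - ghk_cl(e_cl - dv)) / (2.0 * dv)


def linear_cl(v):
    """Ohmic approximation g * (v - E_Cl) with the tangent conductance."""
    return tangent_conductance() * (np.asarray(v, dtype=float) - nernst_cl())
```

      Evaluating `ghk_cl(v) - linear_cl(v)` over the voltage window actually used in a recording shows how large the rectification error is for a given set of concentrations; the deviation grows with distance from the Nernst potential.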

      This approach has similarities to earlier studies undertaken in the visual cortex that estimated the excitatory and inhibitory synaptic conductance changes that contributed to membrane voltage changes during receptive field stimulation. However, these approaches also involved the recording of transmembrane current changes during visual stimulation that were undertaken in voltage-clamp at various command voltages to estimate the underlying conductance changes. Molkov et al have attempted to essentially deconvolve the underlying conductance changes without this information and I am concerned that this simply may not be possible. 

      This was why we compared the results of our reconstructions applied to current- and voltage-clamp recordings from the same neurons and we found, as illustrated, that the synaptic conductance profiles are qualitatively identical with both techniques.

      The current balance equation (1) cited in this study is based on the parallel conductance model developed by Hodgkin & Huxley. However, one key element of the HH equations is the inclusion of an estimate of the capacitive current generated due to the change in voltage across the membrane capacitance. I would always consider this to be the most important motivation for the development of the voltage-clamp technique in the 1930's. Indeed, without subtraction of the membrane capacitance, it is not possible to isolate the transmembrane current in the way that previous studies have done. In the current study, I feel it is important that the voltage change due to capacitive currents is taken into consideration in some way before the contribution of the underlying conductance changes are inferred.

      We have incorporated the capacitive current into the initial model formulation and established explicit requirements for time scale separation. These requirements justify the application of our method. Specifically, the membrane time constant (C/g ~ 10ms in our test system) must be substantially shorter than the period of network oscillations (T ~ 2s in our test system). Under this condition, aggregate variations in synaptic conductances can be considered slow, allowing us to treat membrane voltage as being in instantaneous equilibrium. This defines the time resolution of our method. Please refer to our responses to Reviewer 1 and the revised manuscript text for a more detailed explanation.
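
      The time scale separation argument can be checked with a small simulation. The sketch below is a hypothetical passive single-compartment model, not the authors' code: the capacitance, leak conductance, reversal potentials, and sinusoidal conductance waveforms are all assumed for illustration, chosen so that C/g is 10 ms. It integrates the full current balance equation, including the capacitive term, and reports how far the membrane voltage strays from its instantaneous equilibrium.

```python
import math


def max_tracking_error(period_s, c_pf=100.0, g_leak_ns=10.0, dt_s=1e-4):
    """Max |V - V_inf| in mV over the second of two simulated cycles.

    Integrates C dV/dt = -g_leak (V - E_leak) - g_exc (V - E_exc)
                         - g_inh (V - E_inh) with forward Euler.
    """
    e_leak, e_exc, e_inh = -65.0, 0.0, -75.0  # mV, assumed values

    def conductances(t):
        # Assumed periodic synaptic conductances (nS) with the network period.
        phase = 2.0 * math.pi * t / period_s
        return 2.0 * (1.0 + math.sin(phase)), 3.0 * (1.0 + math.cos(phase))

    def v_inf(t):
        # Instantaneous equilibrium voltage for the conductances at time t.
        g_exc, g_inh = conductances(t)
        g_tot = g_leak_ns + g_exc + g_inh
        return (g_leak_ns * e_leak + g_exc * e_exc + g_inh * e_inh) / g_tot

    n_steps = int(round(2.0 * period_s / dt_s))
    v = v_inf(0.0)
    worst = 0.0
    for k in range(n_steps):
        t = k * dt_s
        g_exc, g_inh = conductances(t)
        i_mem = (-g_leak_ns * (v - e_leak)
                 - g_exc * (v - e_exc)
                 - g_inh * (v - e_inh))       # pA
        v += dt_s * 1e3 * i_mem / c_pf        # pA / pF = mV / ms
        if t >= period_s:                     # skip the first cycle
            worst = max(worst, abs(v - v_inf(t + dt_s)))
    return worst
```

      Calling `max_tracking_error(2.0)` corresponds to the slow regime described above (period much longer than C/g), while `max_tracking_error(0.05)` puts the period within an order of magnitude of the membrane time constant, where the capacitive current is no longer negligible and the tracking error grows accordingly.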

      Studies using acute slicing preparations to examine circuit effects have often been limited to the study of small microcircuits - especially feedforward and feedback interneuron circuits. It is widely accepted that any information gained from this approach will always be compromised by the absence of patterned afferent input from outside the brain region being studied. In this study, descending control from the Pons and the neocortex will not be contributing much to the synaptic drive and ascending information from respiratory muscles will also be absent completely. This may not have been such a major concern if this study was limited to demonstrating the feasibility of a methodological approach. However, this limitation does need to be considered when using an approach of this type to speculate on the prevalence of specific circuit motifs within the medulla (Figure 4). Therefore, I would argue that some discussion of this limitation should be included in this manuscript.

      Our experimental brainstem-spinal cord in situ preparation does include important inputs from the pons that are necessary to generate the 3-phase respiratory pattern (e.g., Smith et al. (2013). Brainstem respiratory networks: building blocks and microcircuits. Trends Neurosci, 36(3), 152-162), but we agree that other inputs, such as those from the midbrain and cortex as well as important peripheral afferents, are absent, and we have now noted this limitation in the text at the end of the new section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture”. We show specific circuit motifs simply to illustrate how our readout of synaptic conductances from single neurons and the information on the main neuronal activity patterns in our experimental preparation can be interpreted. We thought that it would be useful to illustrate and interpret inferred connectivity motifs as an output of our methodological approach. As we now discuss in the section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture” in response to Reviewer #1, our circuit motifs are consistent with some sets of connections that have been proposed in the literature, but they also provide some novel information about connectivity that we have been able to infer for respiratory circuits from the complex sets of synaptic conductances indicated by our approach.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) My recommendation is to clarify how each neuron population was identified. Individual populations are very hard to identify based on morphology alone in brain slices such as Supplemental Figure 1. I assume the authors identified each population based on their phase difference relative to the inspiratory pulse in the phrenic nerve. This ought to be clarified. 

      Neuronal populations were classified based on their firing patterns within the respiratory cycle. Immunohistochemistry was only used for post-hoc identification of the transmitter phenotype in select neurons. Specifically, recorded neurons were categorized according to the phase range of the respiratory cycle in which they fired and their firing pattern during that range. For example, neurons firing during inspiration (synchronously with the phrenic nerve) with a progressively increasing firing rate were classified as ramp-I, etc., as illustrated in the figure depicting phase-dependent firing patterns. This classification is detailed in the "Firing patterns of respiratory interneurons" sub-section.

      It would also be beneficial to discuss the benefits and limitations of using this preparation relative to brainstem slices and in-vivo preparations (e.g. Moraes et al. J. Physiol. 599, 3237 (2021)) for measuring live network activity.

      We provided the reference to an important recent review (Paton et al. 2022, Advancing respiratory-cardiovascular physiology with the working heart-brainstem preparation over 25 years. J Physiol, 600(9), 2049-2075) on the benefits and limitations of using the in situ rodent brainstem-spinal cord preparation employed in our study. 

      (2) The background on inference methods is similarly thin. The works in line 47 are mainly experimental characterizations of excitatory and inhibitory cells. Techniques for estimating network conductances/parameters ought to be covered. One reference that comes to mind: Armstrong, E. Statistical data assimilation for estimating electrophysiology simultaneously with connectivity within a biological neuronal network. Physical Review E 101, 012415, 2020.

      Our technique is not intended to estimate synaptic connections between neurons from paired recordings. Instead, we calculate the dynamics of inhibitory and excitatory synaptic conductances that result from many concurrent synaptic inputs representing aggregate activities of the functionally interacting populations. The previous studies that we cited are those directly or indirectly related to this paradigm.

      (3) How the "patterns of synaptic conductances" in phase diagrams imply the network connectivity (l.244) is not clear. Are the patterns of "activity patterns" depicted in Figure 2 the only neuron populations driving the postsynaptic neurons in Figure 4? 

      Figure 2 shows all of the basic firing patterns that we have recorded in our experimental preparation. So, yes, assuming that all periodic inputs in this network originate from within the network, those 6 populations are the main sources of the corresponding patterns.

      The methodology for constructing the networks is unclear, 

      This is explained in detail in the section "Synaptic Conductances and Functional Connectome of Respiratory Interneurons". Specifically, when a neuron with a given firing pattern (and thus belonging to a corresponding population, e.g., pre-I/I) exhibits excitatory or inhibitory conductance during a particular phase of the respiratory cycle (e.g., inhibition during the first half of expiration, as in Figure 3A1), we infer that the population with the same firing pattern receives input from a population with an activity pattern matching the postsynaptic conductance profile (e.g., the pre-I/I population receives post-I inhibition, as in Figure 4A1).

      yet 6 lines later (l.251) the narrative jumps to the conclusion that "the information on inhibitory transmitter phenotypes...indeed corroborates that subsets of the presynaptic neurons are inhibitory" and further "conductance profiles, which gives additional confidence in the correlation between pre-synaptic firing patterns and likely post-synaptic interactions". The method also blends in empirical information from immune labelling. It is unclear what method can actually infer on its own.

      The functional connections that we were able to infer implied that neurons with specific firing patterns (e.g., post-I neurons) must include neurons with specific transmitter phenotypes (e.g., inhibitory). Immunolabeling results were used to show that there are indeed neurons with the corresponding firing patterns and neurotransmitter phenotypes. These results play no role in the inference method itself; they only show that our assumption that various inhibitory inputs originate from within the network is plausible.
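
      The inference rule described in the response to comment (3), matching the phase profile of a postsynaptic conductance to the population whose activity pattern it resembles, can be sketched as follows. This is a minimal illustration under stated assumptions: the coarse four-bin division of the respiratory cycle, the activity windows assigned to each population, the Jaccard overlap score, and the names `ACTIVITY` and `infer_presynaptic_population` are all hypothetical and are not taken from the manuscript.

```python
# Coarse, assumed activity windows over one respiratory cycle, divided into
# four phase bins: pre-inspiration, inspiration, early and late expiration.
ACTIVITY = {
    "pre-I/I": {"pre-I", "I"},
    "early-I": {"I"},
    "post-I": {"early-E"},
    "aug-E": {"late-E"},
}


def jaccard(a, b):
    """Overlap between two sets of phase bins (1.0 means identical windows)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_presynaptic_population(conductance_phases):
    """Return the population whose activity window best matches the phase bins
    in which a postsynaptic conductance of a given sign is elevated."""
    return max(ACTIVITY, key=lambda name: jaccard(ACTIVITY[name], conductance_phases))


if __name__ == "__main__":
    # Inhibition during the first half of expiration points to post-I neurons.
    print(infer_presynaptic_population({"early-E"}))
```

      In this toy version, an inhibitory conductance confined to early expiration is attributed to the post-I population, mirroring the pre-I/I example given in the response; the sign of the conductance would determine whether the inferred connection is excitatory or inhibitory.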

      (4) Figure 3 - why does the Early-I population which is connected by the same mutually inhibitory links as Post-I and Aug-E within the respiratory CPG have the opposite conductance activation sequence as post-I and aug-E. Namely, it receives excitatory input at phases 0,1,2 when post-I and aug-E receive inhibitory input?

      We added the section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture” discussing the correspondence and inconsistencies between our findings and existing respiratory CPG models (see Figure 4 Supplemental Figure 1). For this specific question, phases 0, 1 and 2 represent the same phase of the respiratory cycle, corresponding to the transition from expiration to inspiration. According to the Rybak et al. models, the early-I population receives excitation from the pre-I/I population, which is active at the E-I transition and throughout the entire inspiratory phase of the cycle. This is largely consistent with our findings shown in Figure 3. Also, according to Rybak et al., post-I and aug-E populations are inhibited by early-I neurons, which is also consistent with the inspiratory inhibition in all examples of these neurons that we show in Figure 3. As noted in other responses to the reviewers’ comments, the section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture” now covers some comparisons to previously inferred connectivity in the respiratory network.

      Minor comments:

      (1) l.39 - The terminology "patterns of inhibitory and excitatory synaptic conductances" used throughout the manuscript (l.66, 241, 244, 259...) is vague.

      We defined this terminology in the updated version.

      (2) Figure 1 what is the integration time of the moving median in Figure 1a?

      0.1s. Now included in the figure legend.

      (3) L.128 "rhythmic inspiratory neuron" which one is this post-I, aug-E, early-I?

      This example demonstrates a pre-I/I firing pattern, as the neuron begins firing slightly before the phrenic burst and continues throughout inspiration (as defined by phrenic nerve activity). However, this is merely an arbitrary example used to illustrate the methodology. The actual firing pattern of the recorded neuron is not considered in any way for synaptic conductance inference.

      (4) Figure 3 What the panel labelling means A1, B1, A2, etc. is not disclosed in the caption.

      These labels are used in the text to refer to specific examples. Now it is explained in the caption that the letter corresponds to the firing phenotype indicated on the top of each column and the digit refers to the example number.

      (5) L.129/ L.133 - the diagram of the medulla in Supplementary Figure 1 ought to be inserted early on in the main text when introducing the respiratory CPG, phrenic and vagal signals.

      This is a good suggestion and we have linked this figure specifically to Figure 2 as Figure 2 Supplemental Figure 1 in the main text to better orient readers.

      (6) L. 457 - Reference needed on reversal potentials.

      We report what we observed, so it is unclear what reference the reviewer means.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reply to Public Reviews:

      Reply to Reviewer #1:

      This is a carefully performed and well-documented study to indicate that the FUS protein interacts with the GGGGCC repeat sequence in Drosophila fly models, and the mechanism appears to include modulating the repeat structure and mitigating RAN translation. They suggest FUS, as well as a number of other G-quadruplex binding RNA proteins, are RNA chaperones, meaning they can alter the structure of the expanded repeat sequence to modulate its biological activities.

      Response: We would like to thank the reviewer for her/his time in evaluating our manuscript. We are very happy that the reviewer highly appreciated our manuscript.

      1. Overall this is a nicely done study with nice quantitation. It remains somewhat unclear from the data and discussions in exactly what way the authors mean that FUS is an RNA chaperone: is FUS changing the structure of the repeat or does FUS binding prevent it from folding into alternative in vivo structure?

      Response: We appreciate the reviewer’s constructive comments. Indeed, we showed that FUS changes the higher-order structures of GGGGCC [G4C2] repeat RNA in vitro, and that FUS suppresses G4C2 RNA foci formation in vivo. According to the established definition, RNA chaperones are proteins that change the structures of misfolded RNAs without using ATP, thereby maintaining proper RNA folding (Rajkowitsch et al., 2007). Thus, we consider that FUS can be classified as an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Reply to Reviewer #2:

      Fuijino et al. provide interesting data describing the RNA-binding protein, FUS, for its ability to bind the RNA produced from the hexanucleotide repeat expansion of GGGGCC (G4C2). This binding correlates with reductions in the production of toxic dipeptides and reductions in toxic phenotypes seen in (G4C2)30+ expressing Drosophila. Both FUS and G4C2 repeats of >25 are associated with ALS/FTD spectrum disorders. Thus, these data are important for increasing our understanding of potential interactions between multiple disease genes. However, further validation of some aspects of the provided data is needed, especially the expression data.

      Response: We would like to thank the reviewer for her/his time for evaluating our manuscript and also for her/his important comments that helped to strengthen our manuscript.

      Some points to consider when reading the work:

      1. The broadly expressed GMR-GAL4 driver leads to variable tissue loss in different genotypes, potentially confounding downstream analyses dependent on viable tissue/mRNA levels.

      Response: We thank the reviewer for this constructive comment. In the RT-qPCR experiments (Figures 1E, 3C, 4G, 6D and Figure 1—figure supplement 1C), the amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts expressed in the same tissue, to avoid potential confounding derived from the difference in tissue viability between genotypes, as the reviewer pointed out. To clarify this process, we have made the following change to the revised manuscript.

      (1) On page 30, line 548-550, the sentence “The amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts in the same sample” was changed to “The amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts expressed in the same tissue to avoid potential confounding derived from the difference in tissue viability between genotypes”.
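
      To illustrate why normalizing to a transcript expressed in the same tissue guards against differences in tissue viability, the following sketch computes a reference-normalized transcript level from Ct values. The response states only that repeat transcripts were normalized to gal4 transcripts; the use of the 2^-ΔCt calculation, the assumption of one doubling per cycle, the function names, and the Ct values are assumptions made here for illustration.

```python
def normalized_level(ct_target, ct_reference):
    """Target abundance relative to the reference transcript (2^-dCt),
    assuming roughly 100% amplification efficiency for both amplicons."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(sample, control):
    """Ratio of normalized levels; each argument is (ct_target, ct_reference)."""
    return normalized_level(*sample) / normalized_level(*control)


if __name__ == "__main__":
    # Hypothetical Ct values: (repeat transcript, gal4 transcript).
    control = (24.0, 22.0)
    # Less surviving tissue shifts both Ct values by the same amount,
    # so the normalized level is unchanged.
    degenerated = (25.0, 23.0)
    print(fold_change(degenerated, control))
```

      Because a loss of tissue raises the Ct of the repeat transcript and of gal4 by the same amount in this idealized case, the normalized level reflects expression per expressing cell rather than the amount of viable tissue.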

      2. The relationship between FUS and foci formation is unclear and should be interpreted carefully.

      Response: We appreciate the reviewer’s important comment. We apologize for the lack of clarity. We showed the relationship between FUS and RNA foci formation in our C9-ALS/FTD flies: FUS suppresses RNA foci formation (Figures 3A and 3B), and, conversely, knockdown of endogenous caz, a Drosophila homologue of FUS, enhances it (Figures 4E and 4F). We consider that FUS, as an RNA chaperone, suppresses RNA foci formation by altering RNA structures and preventing aggregation of misfolded G4C2 repeat RNA. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Reply to Reviewer #3:

      In this manuscript Fujino and colleagues used C9-ALS/FTD fly models to demonstrate that FUS modulates the structure of (G4C2) repeat RNA as an RNA chaperone, and regulates RAN translation, resulting in the suppression of neurodegeneration in C9-ALS/FTD. They also confirmed that FUS preferentially binds to and modulates the G-quadruplex structure of (G4C2) repeat RNA, followed by the suppression of RAN translation. The potential significance of these findings is high since C9ORF72 repeat expansion is the most common genetic cause of ALS/FTD, especially in Caucasian populations and the DPR proteins have been considered the major cause of the neurodegenerations.

      Response: We would like to thank the reviewer for her/his time for evaluating our manuscript. We are grateful to the reviewer for the insightful comments, which were very helpful for us to improve the manuscript.

      1. While the effect of RBP as an RNA chaperone on (G4C2) repeat expansion is supposed to be dose-dependent according to (G4C2)n RNA expression, the first experiment of the screening for RBPs in C9-ALS/FTD flies lacks this concept. It is uncertain if the RBPs of the groups "suppression (weak)" and "no effect" were less or no ability of RNA chaperone or if the expression of the RBP was not sufficient, and if the RBPs of the group "enhancement" exacerbated the toxicity derived from (G4C2)89 RNA or the expression of the RBP was excessive. The optimal dose of any RBPs that bind to (G4C2) repeats may be able to neutralize the toxicity without the reduction of (G4C2)n RNA.

      Response: We appreciate the reviewer’s constructive comments. We employed site-directed transgenesis to establish the RBP fly lines, to ensure equivalent expression levels of the inserted transgenes. We also evaluated the toxic effects of the overexpressed RBPs themselves by crossing them with control EGFP flies, as shown in Figure 1A. To clarify these points, we have made the following changes to the revised manuscript.

      (1) On page 8, line 166-168, the sentence “The variation in the effects of these G4C2 repeat-binding RBPs on G4C2 repeat-induced toxicity may be due to their different binding affinities to G4C2 repeat RNA, and their different roles in RNA metabolism.” was changed to “The variation in the effects of these G4C2 repeat-binding RBPs on G4C2 repeat-induced toxicity may be due to their different binding affinities to G4C2 repeat RNA, and the different toxicity of overexpressed RBPs themselves.”.

      (2) On page 29, line 519-522, the sentence “By employing site-specific transgenesis using the pUASTattB vector, each transgene was inserted into the same locus of the genome, and was expected to be expressed at the equivalent levels.” was added.

      2. In relation to issue 1, the rescue effect of FUS on the fly expressing (G4C2)89 (FUS-4) in Figure 4-figure supplement 1 seems weaker than the other flies expressing both FUS and (G4C2)89 in Figure 1 and Figure 1-figure supplement 2. The expression level of both FUS protein and (G4C2)89 RNA in each line is important from the viewpoint of therapeutic strategy for C9-ALS/FTD.

      Response: We appreciate the reviewer’s important comment. The FUS-4 transgene is expected to be expressed at the equivalent level to the FUS-3 transgene, since they are inserted into the same locus of the genome by the site-directed transgenesis. Thus, we suppose that the weaker suppressive effect of FUS-4 coexpression on G4C2 repeat-induced eye degeneration can be attributed to the C-terminal FLAG tag that is fused to FUS protein expressed in FUS-4 fly line. Since the caz fly expresses caz protein also fused to FLAG tag at the C-terminus, we used this FUS-4 fly line to directly compare the effect of caz on G4C2 repeat-induced toxicity to that of FUS.

      3. While hallmarks of C9ORF72 are the presence of DPRs and the repeat-containing RNA foci, the loss of function of C9ORF72 is also considered to somehow contribute to neurodegeneration. It is unclear if FUS reduces not only the DPRs but also the protein expression of C9ORF72 itself.

      Response: We thank the reviewer for this comment. We agree that not only DPRs, but also toxic repeat RNA and the loss-of-function of C9ORF72 jointly contribute to the pathomechanisms of C9-ALS/FTD. Since Drosophila has no homolog corresponding to the human C9orf72 gene, the effect of FUS on C9orf72 expression cannot be assessed. Our fly models are useful for evaluating toxic gain-of-function pathomechanisms such as RNA foci formation and RAN translation, and the association between FUS and loss-of-function of C9ORF72 is beyond the scope of this study.

      4. In Figure 5E-F, it cannot be distinguished whether FUS binds to GGGGCC repeats or the 5' flanking region. The same experiment should be done by using FUS-RRMmut to elucidate whether FUS binding is the major mechanism for this translational control. Authors should show that FUS binding to long GGGGCC repeats is important for RAN translation.

      Response: We would like to thank the reviewer for these insightful comments. Following the reviewer’s suggestion, we performed the in vitro translation assay again using FUS-RRMmut, which lacks the ability to bind G4C2 repeat RNA as shown by the filter binding assay (Figure 5A), instead of BSA. The results are shown in the Western blot figures below. The addition of FUS to the translation system efficiently suppressed the expression levels of GA-Myc, whereas that of FUS-RRMmut did not. FUS decreased the expression level of GA-Myc at as low as 10nM, and nearly eliminated RAN translation activity at 100nM. At 400nM, FUS-RRMmut weakly suppressed the GA-Myc expression levels, probably because of its residual RNA-binding activity. These results suggest that FUS suppresses RAN translation in vitro through direct interactions with G4C2 repeat RNA.

      Unfortunately, RAN translation from short G4C2 repeat RNA was not investigated in our translation system, although a previous study reported a low efficiency of RAN translation from short G4C2 repeat RNA (Green et al., 2017).

      Author response image 1.

      (A) Western blot analysis of the GA-Myc protein in the samples from in vitro translation. (B) Quantification of the GA-Myc protein levels.

      We have made the following changes to the revised manuscript.

      (1) Figure 5F was replaced with new Figures 5F and 5G.

      (2) On page 14-15, line 326-330, the sentence “Notably, the addition of FUS to this system decreased the expression level of GA-Myc in a dose-dependent manner, whereas the addition of the control bovine serum albumin (BSA) did not (Figure 5F).” was changed to “Notably, upon the addition to this translation system, FUS suppressed RAN translation efficiently, whereas FUS-RRMmut did not. FUS decreased the expression levels of GA-Myc at as low as 10nM, and nearly eliminated RAN translation activity at 100nM. At 400nM, FUS-RRMmut weakly suppressed the GA-Myc expression levels probably because of the residual RNA-binding activity (Figure 5F and 5G).”.

      (3) On page 15, line 330-332, the sentence “Taken together, these results indicate that FUS suppresses RAN translation from G4C2 repeat RNA in vitro as an RNA chaperone.” was changed to “Taken together, these results indicate that FUS suppresses RAN translation in vitro through direct interactions with G4C2 repeat RNA as an RNA chaperone.”.

      (4) On page 37, line 720-723, the sentence “For preparation of the FUS protein, the human FUS (WT) gene flanked at the 5′ end with an NdeI recognition site and at the 3′ end with a XhoI recognition site was amplified by PCR from pUAST-FUS.” was changed to “For preparation of the FUS proteins, the human FUS (WT) and FUS-RRMmut genes flanked at the 5′ end with an NdeI recognition site and at the 3′ end with a XhoI recognition site was amplified by PCR from pUAST-FUS and pUAST-FUS-RRMmut, respectively.”.

      (5) On page 41, line 816-819, the sentence “FUS or BSA at each concentration (10, 100, and 1,000 nM) was added for translation in the lysate.” was changed to “FUS or FUS-RRMmut at each concentration (10, 100, 200, 400, and 1,000 nM) was preincubated with mRNA for 10 min to facilitate the interaction between FUS protein and G4C2 repeat RNA, and added for translation in the lysate.”.

      5. It is not possible to conclude, as the authors have, that G-quadruplex-targeting RBPs are generally important for RAN translation (Figure 6), without showing whether RBPs that do not affect (G4C2)89 RNA levels lead to decreased DPR protein level or RNA foci.

      Response: We appreciate the reviewer’s critical comment. Following the reviewer’s suggestion, we evaluated the effect of these G-quadruplex-targeting RBPs on RAN translation. We additionally performed immunohistochemistry of the eye imaginal discs of fly larvae expressing (G4C2)89 and these G-quadruplex-targeting RBPs. As shown in the immunohistochemistry figures below, we found that coexpression of EWSR1, DDX3X, DDX5, and DDX17 significantly decreased the number of poly(GA) aggregates. The results suggest that these G-quadruplex-targeting RBPs regulate RAN translation, as FUS does.

      Author response image 2.

      (A) Immunohistochemistry of poly(GA) in the eye imaginal discs of fly larvae expressing (G4C2)89 and the indicated G-quadruplex-targeting RBPs. (B) Quantification of the number of poly(GA) aggregates.

      We have made the following changes to the revised manuscript.

      (1) Figures 6E and 6F were added.

      (2) On page 6-7, line 135-137, the sentence “In addition, other G-quadruplex-targeting RBPs also suppressed G4C2 repeat-induced toxicity in our C9-ALS/FTD flies.” was changed to “In addition, other G-quadruplex-targeting RBPs also suppressed RAN translation and G4C2 repeat-induced toxicity in our C9-ALS/FTD flies.”.

      (3) On page 15, line 344-346, the sentence “As expected, these RBPs also decreased the number of poly(GA) aggregates in the eye imaginal discs (Figures 6E and 6F).” was added.

      (4) On page 15, line 346-347, the sentence “Their effects on G4C2 repeat-induced toxicity and repeat RNA expression were consistent with those of FUS.” was changed to “Their effects on G4C2 repeat-induced toxicity, repeat RNA expression, and RAN translation were consistent with those of FUS.”

      (5) On page 16, line 355-357, the sentence “Thus, some G-quadruplex-targeting RBPs regulate G4C2 repeat-induced toxicity by binding to and possibly by modulating the G-quadruplex structure of G4C2 repeat RNA.” was changed to “Thus, some G-quadruplex-targeting RBPs regulate RAN translation and G4C2 repeat-induced toxicity by binding to and possibly by modulating the G-quadruplex structure of G4C2 repeat RNA.”

      (6) On page 19, line 417-421, the sentence “We further found that G-quadruplex-targeting RNA helicases, including DDX3X, DDX5, and DDX17, which are known to bind to G4C2 repeat RNA (Cooper-Knock et al., 2014; Haeusler et al., 2014; Mori et al., 2013a; Xu et al., 2013), also alleviate G4C2 repeat-induced toxicity without altering the expression levels of G4C2 repeat RNA in our Drosophila models.” was changed to “We further found that G-quadruplex-targeting RNA helicases, …, also suppress RAN translation and G4C2 repeat-induced toxicity without altering the expression levels of G4C2 repeat RNA in our Drosophila models.”.

      Reply to Recommendations For The Authors:

      1) It is not clear from the start that the flies they generated with the repeat have an artificial vs human intronic sequence ahead of the repeat. It would be nice if they presented somewhere the entire sequence of the insert. The reason being that it seems they also tested flies with the human intronic sequence, and the effect may not be as strong (line 234). In any case, in the future, with a new understanding of RAN translation, it would be nice to compare different transgenes, and so as much transparency as possible would be helpful regarding sequences. Can they include these data?

      Response: We thank the editors and reviewers for this comment. We apologize for the lack of clarity. We used artificially synthesized G4C2 repeat sequences when generating constructs for (G4C2)n transgenic flies, so these constructs do not contain human intronic sequence ahead of the G4C2 repeat in the C9orf72 gene, as explained in the Materials and Methods section. To clarify the difference between our C9-ALS/FTD fly models and LDS-(G4C2)44GR-GFP fly model (Goodman et al., 2019), we have made the following change to the revised manuscript.

      (1) Schema of the LDS-(G4C2)44GR-GFP construct was presented in Figure 3—figure supplement 1.

      Furthermore, to maintain transparency of the study, we have provided the entire sequence of the insert as the following source file.

      (2) The artificial sequences inserted in the pUAST vector for generation of the (G4C2)n flies were presented in Figure 1—figure supplement 1—source data 1.

      2) It is really nice how they quantitated everything and showed individual data points.

      Response: We thank the editors and reviewers for appreciating our data analysis method. All individual data points and statistical analyses are summarized in source data files.

      3) So when they call FUS an RNA chaperone, are they simply meaning it is changing the structure of the repeat, or could it just be interacting with the repeat to coat the repeat and prevent it from folding into whatever in vivo structures? Can they speculate on why some RNA chaperones lead to presumed decay of the repeat and others do not? Can they discuss these points in the discussion? Detailed mechanistic understanding of RNA chaperones that ultimately promote decay of the repeat might be of highly significant therapeutic benefit.

      Response: We appreciate these critical comments. Indeed, we showed that FUS changes the higher-order structures of G4C2 repeat RNA in vitro, and that FUS suppresses G4C2 RNA foci formation. According to the established definition, RNA chaperones are proteins that change the structures of misfolded RNAs without using ATP, thereby maintaining proper RNA folding (Rajkowitsch et al., 2007). Thus, we consider that FUS can be classified as an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Besides these RNA chaperones, we observed the expression of IGF2BP1, hnRNPA2B1, DHX9, and DHX36 decreased G4C2 repeat RNA expression levels. In addition, we recently reported that hnRNPA3 reduces G4C2 repeat RNA expression levels, leading to the suppression of neurodegeneration in C9-ALS/FTD fly models (Taminato et al., 2023). We speculate these RBPs could be involved in RNA decay pathways as components of the P-body or interactors with the RNA deadenylation machinery (Tran et al., 2004; Katahira et al., 2008; Geissler et al., 2016; Hubstenberger et al., 2017), possibly contributing to the reduced expression levels of G4C2 repeat RNA. To clarify these interpretations, we revised the manuscript as follows.

      (3) On page 18, line 392-398, the sentences “Similarly, we recently reported that hnRNPA3 reduces G4C2 repeat RNA expression levels, leading to the suppression of neurodegeneration in C9-ALS/FTD fly models (Taminato et al., 2023). Interestingly, these RBPs have been reported to be involved in RNA decay pathways as components of the P-body or interactors with the RNA deadenylation machinery (Tran et al., 2004; Katahira et al., 2008; Geissler et al., 2016; Hubstenberger et al., 2017), possibly contributing to the reduced expression levels of G4C2 repeat RNA.” were added.

      4) What is the level of the G4C2 repeat when they knock down caz? Is it possible that knockdown impacts the expression level of the repeat? Can they show this (or did they and I miss it)?

      Response: We thank the editors and reviewers for this comment. The expression levels of G4C2 repeat RNA in (G4C2)89 flies were not altered by the knockdown of caz, as shown in Figure 4G.

      5) A puzzling point is that FUS is supposed to be nuclear, so where is FUS in the brain in their lines? They suggest it modulates RAN translation, and presumably, that is in the cytoplasm. Is FUS when overexpressed now in part in the cytoplasm? Is the repeat dragging it into the cytoplasm? Can they address this in the discussion? If FUS is never found in vivo in the cytoplasm, then it raises the point that the impact they find of FUS on RAN translation might not reflect an in vivo situation with normal levels of FUS.

      Response: We appreciate these important comments. We agree with the editors and reviewers that FUS is mainly localized in the nucleus. However, FUS is known as a nucleocytoplasmic shuttling RBP that can transport RNA into the cytoplasm. Indeed, FUS is reported to facilitate transport of actin-stabilizing protein mRNAs to function in the cytoplasm (Fujii et al., 2005). Thus, we consider that FUS binds to G4C2 repeat RNA in the cytoplasm and suppresses RAN translation in this study.

      6) When they are using 2 copies of the driver and repeat, are they also using 2 copies of FUS? These are quite high levels of transgenes.

      Response: We thank the editors and reviewers for this comment. We used only 1 copy of FUS when using 2 copies of GMR-Gal4 driver. Full genotypes of the fly lines used in all experiments are described in Supplementary file 1.

      7) In Figure5-S1, FUS colocalizing with (G4C2)RNA is not clear. High-magnification images are recommended.

      Response: We appreciate this constructive comment on the figure. Following the suggestion, high-magnification images are added in Figure 5—figure supplement 1.

      8) I also suggest that the last sentence of the Discussion be revised as follows: Thus, our findings contribute not only to the elucidation of C9-ALS/FTD, but also to the elucidation of the repeat-associated pathogenic mechanisms underlying a broader range of neurodegenerative and neuropsychiatric disorders than previously thought, and it will advance the development of potential therapies for these diseases.

      Response: We appreciate this recommendation. We have made the following change based on the suggested sentence.

      (1) On page 20-21, line 455-459, “Thus, our findings contribute not only towards the elucidation of repeat-associated pathogenic mechanisms underlying a wider range of neuropsychiatric diseases than previously thought, but also towards the development of potential therapies for these diseases.” was changed to “Thus, our findings contribute to the elucidation of the repeat-associated pathogenic mechanisms underlying not only C9-ALS/FTD, but also a broader range of neuromuscular and neuropsychiatric diseases than previously thought, and will advance the development of potential therapies for these diseases.”.

      Authors’ comment on previous eLife assessment:

      We thank the editors and reviewers for appreciating our study. We mainly evaluated the function of human FUS protein on RAN translation and G4C2 repeat-induced toxicity using Drosophila expressing human FUS in vivo, and the recombinant human FUS protein in vitro. To validate that FUS functions as an endogenous regulator of RAN translation, we additionally evaluated the function of Drosophila caz protein as well. We are afraid that the first sentence of the eLife assessment, that is, “This important study demonstrates that the Drosophila FUS protein, the human homolog of which is implicated in amyotrophic lateral sclerosis (ALS) and related conditions, …” is somewhat misleading. We would appreciate it if this sentence could be modified to read “This important study demonstrates that the human FUS protein, which is implicated in amyotrophic lateral sclerosis (ALS) and related conditions, …”.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Figure 2 and related text: it would be useful to explain more explicitly what is meant by "neurogenic" and "non-neurogenic" models. I presume that the total number of neurons in non-neurogenic models is lower than in neurogenic models because no new neurons are added. It would be useful to plot the number of GCs as a function of timesteps.

      We have clarified the distinction between neurogenic and non-neurogenic models in the text (Lines 142-145), explicitly noting that in non-neurogenic models, no new GCs are added, resulting in a lower total neuron count over time. In response to the reviewer’s suggestion, we generated a plot showing the number of GCs over time (see below). Because the neurogenic model exhibits a simple linear increase, we found this plot not especially informative for inclusion in the manuscript. However, we agree with the reviewer’s later comments that similar plots are useful for interpreting specific results, and we have included those where appropriate.

      Author response image 1.

      Number of GCs over time for neurogenic (solid line) and non-neurogenic (dotted line) networks

      (2) Figure 2F, G: memory declines dramatically when the number of GCs at enrichment onset increases beyond an optimum. Why?

      We have explained the reasoning more thoroughly in the text (Lines 174-177) and added a new supplemental figure to support this reasoning (Figure S2). As the number of GCs increases, the network becomes overly inhibited and the response of abGCs to the stimuli decreases (Fig S2A). This leads to a smaller population of GCs being able to integrate with the stimulus (Fig S2B) which is expected given the activity-dependent plasticity rule. Moreover, it can be seen in Fig S2C that for networks with increasing size, the GCs that do learn only connect to MCs that are driven strongest by the stimuli until they struggle to connect to any MCs at all.

      In principle, a homeostatic mechanism like synaptic scaling could reduce activity to restore balance, but such a mechanism would also likely disrupt existing memories. Alternatively, we suggest activity-dependent apoptosis as a superior homeostatic mechanism because it leads to a stable level of activity without substantially erasing existing memories.

      (3) The paragraph describing synaptic connectivity of abGCs (related to Figure 2H) is confusing. What is the directionality of synapses considered here: mitral-to-granule, or granule-to-mitral? The text is opaque here. Connectivity matrix in Figure 2H: who is presynaptic, who is postsynaptic? If I understand correctly, these questions are actually irrelevant because all mitral-granule synapses in the network are reciprocal. This should be pointed out explicitly in the figure legend. Generally: the fact that the network is fully reciprocal (if I understand correctly) is very important but not stated with sufficient emphasis. It should be stated very explicitly in the text that connectivity matrices are fully reciprocal, and an equation clarifying this point should be included in Methods.

      (6) Connectivity matrix: to what degree was connectivity between mitral and granule cells reciprocal (fraction of connections in either direction that were paired with a connection in the opposite direction between the same cell pair)? Was connectivity shaped by experience (enrichment) reciprocal?

      (7) Directly related to the above: it would be useful to show the disynaptic connectivity matrix between mitral cells and analyze its symmetry. For the symmetric component, it should then be analyzed what fraction of this can be attributed to the reciprocal synapses, and what fraction is contributed by connectivity via different granule cells. This should then be compared to models with biologically realistic fractions of reciprocal connections. Is the model proposed here consistent with a biologically realistic fraction of reciprocal synapses between mitral-granule cell pairs?

      We appreciate these insightful and detailed comments. We agree that the assumption that MC-GC synapses were fully reciprocal was not clearly stated. We now explicitly state this in the main text (lines 90-94, 369-370, Figure 2 caption) and methods (line 561), and emphasize its importance. As the reviewer points out, this is a simplifying assumption and does not fully reflect the biology because not all synapses are reciprocal in the true system. We also note that our synaptic plasticity model does not break the reciprocity assumption: all connections added or pruned during learning remain reciprocal. As a result, the disynaptic connectivity matrix (Bottom panel below, MCs sorted by stimulus as shown in the top panel) is always symmetric.

      We have now made these statements explicit in the main text and in the methods. Regarding functional consequences of this assumption, earlier work by our group has examined the impact of the degree of reciprocity of MC-GC synapses in a similar OB model (Chow, Wick & Riecke, Plos Comp Bio 2012). The study examined three different changes in reciprocity by (1) redirecting a fraction of the inhibitory connections of each GC to randomly chosen MCs instead of the MCs that drive that GC, (2) allowing heterogeneity in reciprocal weights so that there is no relationship between the strength of the MC -> GC synapse and the GC -> MC synapse, (3) reducing the level of self-inhibition a MC receives from the GCs that it excites. The model was found to be quite robust to each of these manipulations, suggesting that our present model likely remains functionally relevant even if biological reciprocity is partial. We reference this work now in the discussion, lines 490-492.

      Author response image 2.

      Disynaptic connectivity. Top: MC activity in response to the two stimuli, sorted by MC selectivity. Bottom: Disynaptic connectivity matrix (diagonal subtracted).
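
      The symmetry argument above can be illustrated with a minimal Python sketch. It is not the model code from the manuscript: the network sizes, the connection probability, and the function names (`random_reciprocal_connectivity`, `disynaptic_matrix`, `is_symmetric`) are hypothetical and chosen only for illustration. The sketch assumes a binary MC-by-GC matrix in which each entry stands for one reciprocal synapse, so the number of GCs shared by two MCs is the same in either direction.

```python
import random


def random_reciprocal_connectivity(n_mc, n_gc, p_connect, seed=0):
    """Binary MC-by-GC matrix W (hypothetical sizes and probability).

    W[i][j] == 1 means MC i and GC j share one reciprocal synapse,
    i.e. MC i excites GC j and GC j inhibits MC i.
    """
    rng = random.Random(seed)
    return [[1 if rng.random() < p_connect else 0 for _ in range(n_gc)]
            for _ in range(n_mc)]


def disynaptic_matrix(w):
    """D[i][k] = number of GCs connected to both MC i and MC k (W W^T).

    The diagonal (self-inhibition) is set to zero, mirroring the
    'diagonal subtracted' convention in the figure caption.
    """
    n_mc = len(w)
    d = [[0] * n_mc for _ in range(n_mc)]
    for i in range(n_mc):
        for k in range(n_mc):
            if i != k:
                d[i][k] = sum(a * b for a, b in zip(w[i], w[k]))
    return d


def is_symmetric(m):
    """True if m[i][k] == m[k][i] for every pair of indices."""
    n = len(m)
    return all(m[i][k] == m[k][i] for i in range(n) for k in range(n))


W = random_reciprocal_connectivity(n_mc=8, n_gc=20, p_connect=0.3)
D = disynaptic_matrix(W)
print(is_symmetric(D))
```

      Adding or pruning a reciprocal synapse flips a single entry of `W`, so the same check holds after any such update. Symmetry would only be lost if the MC-to-GC and GC-to-MC directions were stored in separate matrices that could diverge.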

      (4) How were mitral cells sorted in Figure 2H? This needs to be explained.

      (5) Directly related to the point above: the text mentions that synaptic connectivity between GCs of the "learning cluster" and mitral cells (which direction?) is increased for mitral cells responding by enrichment odors, but this is not shown in the figure. This statement suggests that mitral cells sorted to the bottom of the y-axis respond more strongly to enrichment odors, but the information is not given directly. Please provide more information to back up your statements.

      Indeed as the reviewer inferred, MCs in Figure 2H were sorted so that those that receive the strongest stimulation from the odor were at the bottom of the y-axis. We have clarified this in the Figure 2 caption and added a subplot to Figure 2H showing the average MC input to make this more explicit.

      (8) Apoptosis (Figure 4 and related text): paragraph 231ff is somewhat difficult to comprehend because the "number" of enrichments should really be the "frequency" of enrichments. In Figure 4, it is not mentioned explicitly that each enrichment is with different random new odors.

      We agree that the term “number” of enrichments was imprecise and have revised the text to refer instead to the frequency of enrichment events (Lines 255-267). We also clarified that in Figure 4, each enrichment corresponds to a different set of randomly sampled odors, and we now state this explicitly in both the Figure 4 legend and main text (Lines 260-261).

      (9) Apoptosis: apoptosis improves memory but the underlying reason remains opaque. A simple prediction of the data in Figure 4D and 4E is that the number of GCs in 4D is lower than in 4E. It would be helpful to show this. Furthermore, an obvious question that arises is whether a higher frequency of enrichments improves memories because the total number of granule cells is kept low, or because granule cells are removed specifically based on their activity (or both). This could be addressed easily by artificially removing a random subset of granule cells in a simulation such as 4E to match granule cell numbers to the case in 4D.

      Apoptosis improves learning because it reduces the total inhibition in the network by removing GCs, and thus prevents the learning deficits that occur in Fig. 2G as GCs accumulate in the network. As the reviewer inferred, the number of GCs in Figure 4D is lower than in 4E, and this is now clarified in the text. This difference was shown implicitly in Supplementary Figure S4D (previously S3D), but we now explicitly reference this plot to support this point as well (Line 266).

      As the reviewer notes, there is a question of whether increased enrichment frequency improves memory because it limits the total number of GCs, because apoptosis selectively removes GCs based on their activity, or both. Our model supports both mechanisms. Importantly, simply reducing GC numbers through random deletion would degrade existing memories, because random removal erodes the memory representations encoded by the deleted GCs. In contrast, our age- and activity-dependent apoptosis rule targets a specific cohort of adult-born GCs. This selective removal minimizes damage to existing memories encoded by GCs outside of this cohort while keeping GC numbers within a regime that supports robust learning (as shown in Figure 2G).

      However, we note that if enrichment frequency becomes too high, even recent memories can be lost due to premature pruning of GCs that have not yet stabilized their synaptic connections. This tradeoff has been shown experimentally (Forest et al., Nat Comm 2019), and we reproduce it in our model (Figure S4).

      (10) Text related to Figure 5: "Learning flexibility...approached a steady state when the growth of the network started to saturate". Please show the growth (better: size) of the network (total number of GCs) for these simulations (and other panels in Figure 5). It would also be useful to show the total number of GCs in other figures (e.g. Figure 4; see above).

      We have now added a supplementary figure (Figure S6) that shows the total number of GCs over time for the simulations presented. This confirms that the network size approaches a steady state around the same time that learning flexibility begins to plateau, as noted in the original text (now line 275), and highlights the large number of GCs without apoptosis as well as the slightly reduced number of GCs in the permanent encoding model (line 312).

      (11) As much as I appreciate the comprehensive discussion of the results in a broader context, I feel that the discussion can be somewhat shortened. The section on lateral inhibition is not fully valid given that synaptic connectivity is reciprocal. I also feel that much of the final section (Model assumptions and outlook) can be dropped (except for the last paragraph), not because anything is irrelevant, but because these points have been made, often repeatedly, in the text above.

      We agree that the discussion could be streamlined and have revised the manuscript accordingly. Specifically, we have shortened the section on lateral inhibition and clarified that the OB relies predominantly on reciprocal connectivity (Line 370). We also agree that parts of the final section were repetitive and have removed these. However, to address comments by Reviewer 3, we also expanded on some of the model assumptions. We thank the reviewer for helping us improve the clarity and focus of the manuscript.

      (12) Figure 5: bolding every 5th curve is confusing.

      We have adjusted our figure accordingly.

      (13) "...we biased the dendritic field...": it would be helpful to explain the idea of a "dendritic field" in a bit more detail prior to this sentence.

      We now note, where we first describe the model (Line 97), that a GC’s "dendritic field" refers to the subset of MCs with which it is capable of forming synaptic connections.

      Reviewer #3:

      (1) The authors find that a network with age-dependent synaptic plasticity outperforms one with constant age-independent plasticity and that having more GC per se is not sufficient to explain this effect. In addition, having an initial higher excitability of GCs leads to increased performance. To what degree the increased excitability of abGCs is conceptually necessarily independent of them having higher synaptic plasticity rates / fast synapses?

      We thank the reviewer for this question, as the difference between excitability and plasticity rate in memory formation is something we intended to highlight in this study. We have updated the text (Lines 157-198) to clarify this.

      At the cellular level, a neuron's excitability and its rate of synaptic plasticity are mechanistically distinct: excitability is governed by factors such as ion channel expression or membrane resistance, whereas plasticity rates are influenced by molecular pathways involved in synapse and dendritic spine formation and remodeling. While these are independent properties, they are functionally coupled: most synaptic plasticity rules are activity-dependent, so greater excitability can increase the likelihood of plasticity being induced but does not itself guarantee learning.

      Our model reflects this distinction. Increased excitability biases which neurons become activated and thus eligible to undergo plasticity, but actual learning still depends on the plasticity rate itself. This can be seen by comparing the model with constant plasticity and excitability (solid blue and green curves in Figure 2C) to the model with only transient excitability (solid blue and green lines in Figure 2E). In both cases, the strength and duration of the memory remain limited by the plasticity rate. We note additionally that, in this network, neurons compete to learn new stimuli: as GCs start to learn, they suppress MC activity through recurrent inhibition, which suppresses learning in other GCs that would otherwise have been in a position to learn the odor. As a result, there is not a significant increase in the overall number of neurons recruited to learn (Figure 2J). In a different network architecture, such as a feedforward network, we would not expect this to be the case; greater excitability in a population of neurons would likely increase the memory by increasing the number of neurons recruited to learn. Transiently enhanced excitability biases which neurons join the memory engram (Figure 2J), but the extent and rate of learning still depend on the plasticity rates themselves. We did note in the original text (now lines 284-286) that this bias in recruitment subtly increases memory stability, but the extent is not great.

      In principle, a model can be engineered to rely on transiently increased excitability to encode memories in orthogonal subpopulations of neurons, and this could resolve the flexibility-stability dilemma. However, in that case, the number of memories that can be stored within a short time would be bounded by the size of this subpopulation, such that even if a large number of odors are presented, mature GCs cannot become part of the engram and the network would likely fail to learn the stimuli. However, when this was tested experimentally (Forest et al. Cereb Cor. 2020), it was found that mature GCs participated in the engram when the number of odors was sufficiently high. Our results are consistent with these experiments: for complex odor environments, neonatal GCs, which are mature during odor exposure, and abGCs both participate in the engrams.

      Author response image 3.

      Simulating learning in more complex odor environments. Top: enrichment consisted of three odor pairs presented sequentially in a random order. Bottom: enrichment consisted of five odor pairs. Left: discriminability of the odor pairs over time. Middle: connectivity between MCs (sorted by odor selectivity) and GCs (sorted by age). In both cases abGCs develop a clear connectivity structure. In more complex environments neonatal GCs also start to develop a clear connectivity structure. Right: combined engram membership across all stimuli by GC age.

      In sum, transiently increased excitability alone will not make learning any faster, so a fast learning system must have a high plasticity rate. If this plasticity rate stays high, then memories stored in these neurons, even if no longer highly excitable, will be vulnerable as the neurons can still be driven above their plasticity threshold by moderately interfering stimuli and will thus be quickly forgotten. Conversely, if the reviewer is wondering if a greater increase in the plasticity rate of new neurons can compensate for a lack of excitability, this is not the case: if a newborn neuron is not sufficiently driven by the stimulus it will not learn regardless of how high its plasticity rate is.

      (2) The authors do not mention previous theoretical work on the specificity of mitral to granule cell interactions from several groups (Koulakov & Rinberg - Neuron, 2011; Gilra & Bhalla, PLoS One, 2015; Grabska-Barwińska...Mainen, Pouget, Latham, Nat. Neurosci. 2017; Tootoonian, Schaefer, Latham, PLoS Comput. Biol., 2022), nor work on the relevance of top-down feedback from the olfactory cortex on the abGC during odor discrimination tasks (Wu & Komiyama, Sci. Adv. 2020), or of top-down regulation from the olfactory cortex on regulating the activity of the mitral/tufted cells in task-engaged mice (Lindeman et al., PLoS Comput. Biol., 2024), or in naïve mice that encounter odorants (in the absence of specific context; Boyd, et al., Cell Rep, 2015; Otazu et al., Neuron 2015, Chae et al., Neuron, 2022). In particular, the presence of rich top-down control of granule cell activity (including of abGCs) puts into question the plausibility of one of the opening statements of the authors with respect to relying solely on local circuit mechanisms to solve the flexibility-stability dilemma. I think the discussion of this work is important in order to put into context the idea of specific interactions between the abGCs and the mitral cells.

      We thank the reviewer for these detailed and thorough comments, and whole-heartedly agree that it is important to discuss the listed studies in order to contextualize our work through the broader lens of how information is processed in the OB. We have expanded our discussion to further acknowledge and integrate insight from previous theoretical and experimental work cited by the reviewer. (Lines 361-366, 493-550)

      Regarding the importance of top-down feedback, we of course recognize that in practice cortical inputs play a critical role in abGC survival and synaptic integration. However, the nature of this feedback is not quite clear and is likely variable across behavioral settings. In the paradigm that we study in the manuscript, there is likely no key reward value or contextual signal that is relayed to the OB. One plausible interpretation is that in this task, cortical feedback provides a random, variable baseline excitatory drive to GCs. This would likely be consistent with many of the listed studies, e.g.

      (1) Glomerular layer targeting of feedback would be explicitly unrelated to glomerular odor specificity, as in Boyd et al.

      (2) GC activity would decrease if these cortical inputs were silenced, resulting in stronger MC responses as in Otazu et al., Chae et al.

      (3) Silencing PCx during learning would prevent GCs from reaching activity-dependent plasticity thresholds, resulting in decreased spine density as in Wu & Komiyama.

      Likewise activating PCx would lead to increased spine density.

      In this interpretation, the effect of top-down input could be captured implicitly by adjusting model parameters such as activity or plasticity thresholds. For the purposes of our study, we opted to neglect these inputs in favor of model simplicity.

      Critically, even if top-down inputs play a substantially larger role, by perhaps even going as far as providing signals to abGCs to modulate their development, the core solution to the flexibility-stability dilemma that we describe stays local: we predict that the memory persists in the same network in which it was formed.

      (3) To what the degree of specific connectivity reflects a specific stimulus configuration, and is a good proxy for determining the stimulus discriminability and memory capacity in terms of temporal activity patterns (difference in latency/phase with respect to the respiration cycle, etc.) which may account to a substantial fraction of ability to discriminate between stimuli? The authors mention in the discussion that this is, indeed, an upper bound and specific connectivity is necessary for different temporal activity patterns, but a further expansion on this topic would help in understanding the limitations of the model.

      We thank the reviewer for raising this important point. Indeed, there have been several recent experimental studies indicating that much of the information needed for olfactory discrimination is encoded in the temporal activity patterns of mitral and tufted cells. Our model does not explicitly simulate these dynamics. It was for this reason that we defined memory in terms of the learned structure of the network rather than by firing rate activity. This is motivated by the idea that learned patterns of connectivity constrain the space of neural activity the network can support, and thus shape stimulus responses. We now make this limitation more explicit in the discussion and clarify that the specific MC–GC connectivity we analyze should be seen as a structural substrate that constrains the possible temporal transformations the network could support (Lines 492-506).

      (4) Reward or reward prediction error signals are not considered in the model. They however are ubiquitous in nature and likely to be encountered and shape the connectivity and activity patterns of the abGC-mitral cell network. Including a discussion of how the model may be adjusted to incorporate reward/error signals would strengthen the manuscript.

      We appreciate the reviewer’s suggestion and agree that reward and reward prediction error signals are critical components of many learning paradigms. We deliberately chose not to model associative learning, reward signals or top-down neuromodulation in this work. Our goal is to investigate the role of adult neurogenesis in a regime where its contribution has been shown to be experimentally necessary. Specifically, we focused on an unsupervised perceptual learning paradigm where adult neurogenesis is required for successful odor discrimination (Moreno et al. PNAS, 2009). In contrast, when the same odors are used in a rewarded learning paradigm, performance remains intact even when adult neurogenesis is ablated (Imayoshi et al., Nat. Neuro., 2008). This dissociation suggests that neurogenesis is dispensable in contexts where reward can guide learning. As such, we argue that isolating the contribution of local circuit dynamics in an unsupervised setting is critical to understanding what neurogenesis is uniquely enabling, especially given the evolutionary cost of maintaining it.

      We agree that extending this work to incorporate reward-driven plasticity or neuromodulatory influences would be a valuable direction for future research. In particular, it could help clarify how different learning paradigms engage distinct abGC cohorts (e.g., Mandairon et al., eLife 2018; Wu & Komiyama, Sci. Adv. 2020), and how task structure shapes memory allocation and engram composition. We have incorporated this into the discussion of how our model could be extended to include top-down feedback (lines 539-553).

      Specific comments

      (1) Lines 84-86; 507-509; Eq(3): Sensory input is defined by a basal parameter of MCs spontaneous activity (Sspontaneus) and the odor stimuli input (Siodor), but it is not clear from the main text or methods how sensory inputs (glomerular patterns) were modeled.

      We now clarify in the Methods section "Stimulus model" how the sensory inputs were modeled. Specifically, odor-evoked inputs to mitral cells (Siodor) were generated either as Gaussian profiles across the mitral cell population (Figs. 2,3) or as sparser random patterns (Figs. 4,5). In Figures 2 and 3, the denser Gaussian stimuli require more GCs to learn the odors, aiding in visualization of the connectivity matrix (Figure 2H) and abGC recruitment plots (Figure 2I,J; Figure 3C,E). However, real olfactory stimuli activate a sparse set of MCs, so in Figures 4 and 5, where we address learning of many stimuli, we utilize sparser, binary stimuli delivered to only 10% of MCs, in the range of experimental data (Wachowiak and Cohen, Neuron, 2001). The fact that the stimuli are binary, however, is not realistic and leads to denser representations. This leads to a worst-case scenario for the model, as denser memory representations are easier to overwrite. These points have been added explicitly to the Methods section "Stimulus model" to improve clarity.
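
      The two stimulus families described above can be sketched in a few lines of Python. This is a hedged illustration, not the code used in the manuscript: the function names (`gaussian_stimulus`, `sparse_binary_stimulus`), the population size of 200 MCs, the Gaussian width, and the amplitudes are all assumptions made for the example. Only the qualitative shapes (a Gaussian profile across MCs, and a binary pattern on 10% of MCs) come from the text.

```python
import math
import random


def gaussian_stimulus(n_mc, center, width, amplitude=1.0):
    """Dense stimulus: Gaussian profile of odor input across the MC population.

    `center` and `width` are in units of MC index (hypothetical values).
    """
    return [amplitude * math.exp(-((i - center) ** 2) / (2.0 * width ** 2))
            for i in range(n_mc)]


def sparse_binary_stimulus(n_mc, fraction=0.10, amplitude=1.0, seed=None):
    """Sparse stimulus: a random `fraction` of MCs receive input `amplitude`,
    all other MCs receive zero odor-evoked input."""
    rng = random.Random(seed)
    n_active = max(1, round(fraction * n_mc))
    active = set(rng.sample(range(n_mc), n_active))
    return [amplitude if i in active else 0.0 for i in range(n_mc)]


n_mc = 200  # hypothetical population size
dense = gaussian_stimulus(n_mc, center=100, width=15)
sparse = sparse_binary_stimulus(n_mc, fraction=0.10, seed=1)

# Compare how many MCs are driven appreciably by each stimulus type.
print(sum(1 for x in dense if x > 0.1), "MCs above 0.1 (Gaussian)")
print(sum(1 for x in sparse if x > 0.0), "MCs active (sparse binary)")
```

      A pair of similar odors could be produced, under the same assumptions, by shifting `center` slightly for the Gaussian case or by letting two binary patterns share most of their active MCs.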

      (2) Lines 118-122: The used perceptual learning task explanation is done only in the context of the discriminability of similar artificial stimuli using the Fisher discriminant and "Memory" metric. A detailed description of the logic of the perceptual learning task methods and objective, taking into account Comment 1, would help to better understand the model.

      We thank the reviewer for pointing out that we had not adequately described the task. We have updated the main text (lines 125-132) and included a new methods section, "Perceptual learning task", to describe it more explicitly. The experiments that inspired the simulation followed an ecological model of discrimination learning (Moreno et al. PNAS 2009): for one hour a day over a ten-day "enrichment period", two tea balls containing similar but distinct odors were suspended from the lid of each mouse's home cage. The mice engaged with the stimuli under self-directed conditions, therefore learning through natural experience. As a result, the mice use olfactory information to discriminate between the similar stimuli, a skill potentially relevant for navigation or social behaviors.

      In our simulations, we model these experiments as follows. During the enrichment period, the model is stimulated with a randomly selected stimulus chosen from a set of two similar stimuli, corresponding to a mouse choosing to sniff one of the tea balls. During enrichment, in between these bouts of "sniffing", the model only receives spontaneous activity, reflecting the temporal sparsity of sensory input even over the enrichment period. Outside of enrichment, the model again receives only spontaneous input.
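
      The stimulation protocol described above can be sketched as a simple input schedule. The following Python snippet is only an illustration under stated assumptions: the function name `enrichment_schedule`, the per-timestep sniff probability `p_sniff`, and the timestep counts are hypothetical and are not taken from the manuscript.

```python
import random


def enrichment_schedule(n_steps, enrich_start, enrich_end, p_sniff, seed=0):
    """Return one input label per timestep: 'A', 'B', or 'spontaneous'.

    Inside the enrichment window [enrich_start, enrich_end), each timestep is
    a sniff bout with probability `p_sniff` (assumed parameter); on a sniff
    bout one of the two similar odors is chosen at random. All other
    timesteps, inside or outside enrichment, carry only spontaneous input.
    """
    rng = random.Random(seed)
    schedule = []
    for t in range(n_steps):
        in_enrichment = enrich_start <= t < enrich_end
        if in_enrichment and rng.random() < p_sniff:
            schedule.append(rng.choice(["A", "B"]))
        else:
            schedule.append("spontaneous")
    return schedule


sched = enrichment_schedule(n_steps=100, enrich_start=20, enrich_end=60,
                            p_sniff=0.5, seed=1)
for label in ("A", "B", "spontaneous"):
    print(label, sched.count(label))
```

      Keeping the schedule separate from the network update makes it straightforward to vary enrichment frequency or to present several odor pairs without touching the plasticity rule.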

      (3) Rapid re-learning of forgotten odor pair is enabled by sensory-dependent dendritic elaboration of neurons that initially encoded the odors and the observed re-learning would occur even if neurogenesis was blocked following the first enrichment and even though the initial learning did require neurogenesis. When this would ever occur in nature? The re-learning of an odor period? Why is this highlighted in the study?

      We believe that this sort of learning is certainly relevant in nature. To clarify: by “learning,” we do not refer to the memory of an entire “odor period”, but simply an altered mapping of specific stimuli. Therefore, forgetting could occur if these specific stimuli are absent from the environment for a period of time, and re-learning would occur when these stimuli are re-encountered. Natural odor environments are highly dynamic, as environmental conditions and social contexts change over time. The odors an animal encounters also depend strongly on its own behavior; as it explores different environments, it may be exposed to particular odors intermittently: it could encounter them in one location, then not return to that location for some time before returning again.

      Such natural variability in odor exposure makes the ability to forget and re-learn especially valuable, allowing the animal to prioritize relevant information while maintaining flexibility. To this end, we show in Figure 5G that the synaptic forgetting of odors is beneficial to the performance of the model because it reduces interference in the network. Therefore we highlight that re-learning enabled by adult neurogenesis is a highly efficient strategy for memory storage and retrieval, which is why we emphasize it in this study.

      (4) Figure 2A: I understand that the ages shown at the bottom of the colored boxes represent the GC age. If so, find a better way to express that to avoid confusing 'GC ages' from the days shown in the perceptual learning task description (Figure 2B).

      We have updated the text in the figure to disambiguate the two, and now refer to the “days” shown in the perceptual learning task description as “time relative to enrichment”.

      (5) Figure 2B: Clarify how the two-dimensional arrays are arranged to represent the patterns shown. Does each point of the array represent one neuron? If so, are these neurons re-arranged to help the readers visually differentiate patterns A and B? Are the patterns of activity of MCs in the model spatially and temporally sparse as observed in experimental work?

      In Figure 2B, each point in the two-dimensional array represents the activity of a single mitral cell. The layout is purely for visualization—neurons are re-arranged to make the differences between odor patterns A and B visually apparent. This ordering does not reflect anatomical position or model architecture. We revised the Figure 2 caption to say this explicitly.

      Regarding spatial sparseness, as we mentioned in the response to the reviewer’s comment (1), the activity of mitral cells in response to odors is spatially sparse in the model. Regarding temporal sparseness, the model is not spiking and does not include temporal dynamics within the timescale of a breath; however, odor input is delivered in discrete, odor-specific epochs interleaved with periods of no input, which leads to temporally structured activity patterns. This information has been made explicit in the new methods sections "Stimulus model" and "Perceptual learning task".

      (6) Figure 3C and Line 189: potential confusion between the color code mentioned in the legend for the enrichment and developing periods.

      It appeared to be a confusion in the text and has been corrected (Lines 212-213).

      (7) Figure 5F: For clarity, this would benefit from replacing the bold line with areas in the plot to depict the enrichment periods.

      We agree that replacing the bolded line segments with shaded areas is clearer and have updated the figure accordingly. We appreciate the reviewer's suggestion.

      (8) Lines 380, 416: Potential role of cortical feedback and or neuromodulation depending on behavioral relevance or permanent exposure? Later mentioned in Lines 467 - 474.

      We have updated the text to acknowledge the role of potential cortical feedback and neuromodulation, now in lines 403-407.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Response to Reviewer Comments:

      We thank the editors and reviewers for their careful consideration of our revised manuscript. Reviewers 2 and 3 indicated that their previous comments had been satisfactorily addressed by our revisions. Reviewer 1 raised several points and our point by point responses can be found below.

      Reviewer #1 (Recommendations For The Authors):

      1) Please clarify the terminology of spontaneous recovery in your study.

      According to Rescorla RA 2004 ( http://www.learnmem.org/cgi/doi/10.1101/lm.77504.), he defines spontaneous recovery as "with the passage of time following nonreinforcement, there is some "spontaneous recovery" of the initially learned behavior. ". So in this study, I thought Test2 is spontaneous recovery while the Test1 is extinction test as most studies do. But authors seem to define spontaneous recovery from the last trial of Extinction3 to the first trial of Test1, which is confusing to me.

      We agree with the reviewer (and Rescorla, 2004) that spontaneous recovery is defined as the return of the initially learned behaviour after the passage of time. In our study, Test 1 is conducted 24 hours after the final extinction session (Extinction 3) and, in our view, the return of responding following that 24-hour delay can be considered spontaneous recovery. Rescorla (2004 and elsewhere) also points out that the magnitude of spontaneous recovery may be greater with larger delays between extinction and testing. This in part motivated our second test, 7 days following the last extinction session with optogenetic manipulation. We did not find evidence of greater spontaneous recovery in the test 7 days later; however, the additional extinction trials in Test 1 may have reduced the opportunity to detect such an effect.

      2) Why are E6-8 plots of Offset group in Figure 3E and F different?

      We apologise for this error and have corrected it. This was an artifact of an older version of the figure before final exclusions. The E6-8 data is now the same for panels 2E and 2F.

      3) Related to 2, Please clarify what type of data they are in Figure3E,F Figure5H, and I . If it's average, please add error bars. Also, it's hard to see the statistical significance at the current figure style.

      The data in these panels are the mean lever presses per trial as labeled on the y-axis of the figures. In our view, in this instance, error bars (or lines and other markers of significance) detract from the visual clarity of the figure. The statistical approach and outcomes are included in the figure legend and when presented alongside the figure in the final version of the paper should directly clarify these points.

      Reviewer #2 (Recommendations For The Authors):

      The authors have addressed my previous comments to my satisfaction.

      Reviewer #3 (Recommendations For The Authors):

      The authors have adequately addressed each of the points raised in my original review. The paper will make a nice contribution to the field.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      • It would be interesting if the authors would do calcium imaging or electrophysiology from LCNA neurons during appetitive extinction.

      Indeed these are interesting ideas. We have plans to pursue them but ongoing work is not yet ready for publication.

      • LC-NA neuronal responses during the omission period seem to be important for appetitive extinction as described in the manuscript (Park et al., 2013; Sara et al., 1994; Su & Cohen 2022). It would be nice to activate/inactivate LC-NA neurons during the omission period.

      Optogenetic manipulation was given for the duration of the stimulus (20 seconds; when reward should be expected contingent upon performance of the instrumental response). We believe the reviewer is suggesting briefer manipulation only at the precise time the pellet would have been expected but was omitted. If so, implementing that is complex: because animals were trained on random ratio schedules, exactly when the pellet(s) would have been earned was variable, so it is difficult to know when the animal experiences “omission” with better temporal specificity than was used in the current experiments. However, we agree with the reviewer that, now that we see an effect of LC manipulation, future studies could alter the behavioral task so that the timing of reward is consistent (e.g., train the animals with fixed ratio schedules or continuous reinforcement, or use a Pavlovian paradigm). A reasonable assertion could then be made about when the outcome should occur, and thus when its absence would be detected, and the manipulation could be given at that time to address this point.

      • Does LC-NA optoinhibition affect the expression of the conditioned response (the lever presses at early trials of Extinction 1)? It's hard to see this from the average of all trials.

      The eNpHR group responded numerically less overall during extinction. This effect appears greatest in the first extinction session, but fails to reach statistical significance [F(1,15)= 3.512, p=0.081]. Likewise, analysis of the trial by trial data for the first extinction session failed to reveal any group differences [F(1,15)= 3.512, p=0.081] or interaction [trial x group; F(1,15)=0.550, p=0.470].

      Comparison of responding in the first trial also failed to reveal group differences [F(1,15)=1.209, p=0.289]. Thus, while there is a trend in the data, this is not borne out by the statistical analysis, even in early trials of the session.

      • While the authors manipulate global LC-NA neurons, many people find the heterogeneous populations in the LC. It would be great if the authors could identify the subpopulation responsible for appetitive extinction.

      We agree that it would be exciting to test whether and identify which subpopulation(s) of cells or pathway(s) are responsible for appetitive extinction. While related work has found that discrete populations of LC neurons mediate different behaviours and states, and may even have opposing effects, our initial goal was to determine whether the LC was involved in appetitive extinction learning. These are certainly ideas we hope to pursue in future work.

      Minor:

      • Why do the authors choose 10Hz stimulation?

      The stimulation parameters were based on previously published work. We have added these citations to the manuscript.

      Quinlan MAL, Strong VM, Skinner DM, Martin GM, Harley CW, Walling SG. Locus Coeruleus Optogenetic Light Activation Induces Long-Term Potentiation of Perforant Path Population Spike Amplitude in Rat Dentate Gyrus. Front Syst Neurosci. 2019 Jan 9;12:67. doi: 10.3389/fnsys.2018.00067. PMID: 30687027; PMCID: PMC6333706.

      Glennon E, Carcea I, Martins ARO, Multani J, Shehu I, Svirsky MA, Froemke RC. Locus coeruleus activation accelerates perceptual learning. Brain Res. 2019 Apr 15;1709:39-49. doi: 10.1016/j.brainres.2018.05.048. Epub 2018 May 31. PMID: 29859972; PMCID: PMC6274624.

      Vazey EM, Moorman DE, Aston-Jones G. Phasic locus coeruleus activity regulates cortical encoding of salience information. Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):E9439-E9448. doi: 10.1073/pnas.1803716115. Epub 2018 Sep 19. PMID: 30232259; PMCID: PMC6176602.

      • The authors should describe the behavior task before explaining Fig1e-g results.

      We agree that introducing the task earlier would improve clarity and have added a brief summary of the task at the beginning of the results section (before reference to Figure 1) and point the reader to the schematics that summarize training for each experiment (Figures 2A and 4D).

      NOTE R2 includes specific comments in their Public review. We have considered those as their recommendations and address them here.

      1) In such discrimination training, Pavlovian (CS-Food) and instrumental (LeverPress-Food) contingencies are intermixed. It would therefore be very interesting if the authors provided evidence of other behavioural responses (e.g. magazine visits) during extinction training and tests.

      In a discriminated operant procedure, the DS (e.g. clicker) indicates when the instrumental response will be reinforced (e.g., lever-pressing is reinforced only when the stimulus is present, and not when the stimulus is absent). This is distinct from something like a Pavlovian-instrumental transfer procedure, and so we wish to clarify that there is no Pavlovian phase where the stimuli are directly paired with food. After a successful lever-press the rat must enter the magazine to collect the food, but food is only delivered contingent upon lever-pressing, and so magazine entries here are not a clear indicator of Pavlovian learning as they may be in other paradigms.

      Nonetheless, we have compiled magazine entry data which although not fully independent of the lever-press response in this paradigm, still tells us something about the animals’ expectation regarding reward delivery.

      For the ChR2 experiment, largely paralleling the results seen in the lever-press data, there were no group differences in magazine responses at the end of training [F(2,40)=2.442, p=0.100].

      Responding decreased across days of extinction (when optogenetic stimulation was given) [F(2, 80)=38.070, p<0.001], but there was no effect of group [F(2,40)=0.801, p=0.456] and no interaction between day and group [F(4,40)=1.461, p=0.222]. Although a similar pattern is seen in the test data, group differences were not statistically different in the first [F(2,40)=2.352, p=0.108] or second [F(2,40)=1.900, p=0.166] tests, perhaps because magazine responses were quite low. Thus, overall, magazine data do not present a different picture than lever-pressing, but because of the lack of statistical effects during testing, we have chosen not to include these data in the manuscript.

      For the eNpHR experiment, again a similar pattern to lever-pressing was seen. There were no group differences at the end of acquisition [F(1,15)=0.290, p=0.598]. Responding decreased across days of extinction [F(2, 30)=4.775, p=0.016] but there was no main effect of group [F(1,15)=1.188, p=0.293], and no interaction between extinction and group [F(2,30)=0.070, p=0.932]. There were no group differences in the number of magazine entries in Test 1 [F(1,15)=1.378, p=0.259] or Test 2 [F(1,15)=0.319, p=0.580].

      Author response image 1.

      Author response image 2.

      2) In Figure 1, the authors show the behavioural data of the different groups of control animals which were later collapsed in a single control group. It would be very nice if the authors could provide the data for each step of the discrimination training.

      We are a little confused by this comment. Figure 1, panels E, F, and G show the different control groups at the end of training, for each day of extinction (when manipulations occurred) and for each test, respectively. It’s not clear if there is an additional step the reviewer is interested in? We note neural manipulation only occurred during extinction sessions.

      We chose to compare the control groups initially, and finding no differences, to collapse them for subsequent analyses as this simplifies the statistical analysis substantially; when group differences are found, each of the subgroups has to be investigated (including the different controls means there are 5 groups instead of 3). It doesn’t change the story because we tested that there were not differences between controls before collapsing them, but collapsing the controls makes the presentation of the statistical data much shorter and easier to follow.

      3) Inspection of Figures 2C & 2D shows that responding in control animals is about the same at test 2 as at the end of extinction training. Therefore, could the authors provide evidence for spontaneous recovery in control animals? This is of importance given that the main conclusion of the authors is that LC stimulation during extinction training led to an increased expression of extinction memory as expressed by reduced spontaneous recovery.

      To address this we have added analyses of trial data, specifically comparison of the final 3 trials of extinction to the subsequent three trials of each test. These analyses are included on page 5 of the manuscript and additional data figures can be found as panels 2E and 2F and pasted below.

      What we observe in the trial data for controls is an increase in responding from the end of extinction to the beginning of each test, thus demonstrating spontaneous recovery. Importantly, responding in the ChR2 group does not increase from the end of extinction to the beginning of the test, illustrating that LC stimulation during extinction prevents spontaneous recovery.

      Comparison of the final three trials of Extinction to the three trials of Test 1:

      Author response image 3.

      Comparison of the final three trials of Extinction to the three trials of Test 2:

      Author response image 4.

      Halorhodopsin Experiment Tests 1 and 2, respectively.

      Author response image 5.
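
      The trial-level comparison described above (final three trials of extinction against the three trials of a test) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name is hypothetical, and the lever-press counts in the usage note are invented solely to show the arithmetic.

```python
from statistics import mean


def recovery_from_trials(extinction_trials, test_trials, n=3):
    """Change in mean responding from the end of extinction to the test.

    extinction_trials: per-trial lever presses in the last extinction session
    test_trials: per-trial lever presses in the test session
    n: number of trials compared on each side (three in the response above)

    A positive value means responding rose from the end of extinction to
    the start of the test, which is the pattern read as spontaneous recovery.
    """
    if len(extinction_trials) < n or len(test_trials) < n:
        raise ValueError("need at least n trials on each side")
    end_of_extinction = mean(extinction_trials[-n:])
    start_of_test = mean(test_trials[:n])
    return start_of_test - end_of_extinction
```

      With invented counts for an eight-trial extinction session and a three-trial test, `recovery_from_trials([10, 8, 6, 4, 3, 2, 2, 2], [5, 4, 3])` compares a mean of 2 with a mean of 4. Matching the number of trials on each side avoids comparing a whole-session extinction average against a three-trial test average.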

      4) Current evidence suggests that there are differences in LC/NA system functioning between males and females. Could the authors provide details about the allocation of male and female animals in each group?

      More females had surgical complications (excess bleeding) than males, resulting in the following allocations: control group, 14 males and 8 females; ChR2 group, 8 males and 7 females; Offset group, 6 males.

      In our dataset, we did not detect sex differences in training [no main effect of sex: F(1,38)=1.097, p=0.302; no sex x group interaction: F(1,38)=1.825, p=0.185], extinction [no effect of sex: F(1,38)=0.370, p=0.547; no sex x extinction interaction: F(2,76)=0.701, p=0.499; no sex x extinction x group interaction: F(2,76)=2.223, p=0.115] or testing [Test 1, no effect of sex: F(1,38)=1.734, p=0.196; no sex x group interaction: F(1,38)=0.009, p=0.924; Test 2, no effect of sex: F(1,38)=0.661, p=0.421; no sex x group interaction: F(1,38)=0.566, p=0.456].

      5) The histology section in both experiments looks a bit unsatisfying. Could the authors provide more details about the number of counted cells and also their distribution along the anteroposterior extent of the LC. Could the authors also take into account the sex in such an analysis?

      The antero-posterior coordinates used for cell counts and calculation of % infection rates were between -9.68 and -10.04 (Paxinos and Watson, 2007, 6th Edition) as infection rates were most consistent in this region and it was well-positioned relative to the optic probe although TH and mCherry positive cells were observed both rostral and caudal to this area. For each animal, an average of ~116+/- 25 TH-positive LC neurons as determined by DAPI and GFP positive cells were identified. Viral expression was identified by colocalized mCherry staining. Animals that did not have viral expression in the LC were not included in the experimental groups. We have added these details to the histology results on page 4.

      Males and females showed very similar infection rates (Males, 74%; Females, 72%). While sex differences, such as total number of LC cells or total LC volume have been reported (Guillamon, A. et al. 2005), Garcia-Falgueras et al. (2005) reported no differences in LC volume or number of LC neurons between male and female Long-Evans rats. So while differences may exist in the LC of Long-Evans rats, the cell counts here were comparable between groups (males, 103 +/- 27; females, 129 +/- 17; t-test, p>0.05).

      References:

      1) Garcia-Falgueras, A., Pinos, H., Collado, P., Pasaro, E., Fernandez, R., Segovia, S., & Guillamon, A. (2005). The expression of brain sexual dimorphism in artificial selection of rat strains. Brain Research, 1052(2), 130–138. https://doi.org/10.1016/j.brainres.2005.05.066

      2) Guillamon, A., De Bias, M. R., & Segovia, S. (1988). Effects of sex steroids on the development of the locus coeruleus in the rat. Developmental Brain Research, 40, 306–310.

      Reviewer #3 (Recommendations For The Authors):

      MAJOR

      1) It is worth noting that responding in Group ChR2 decreased from Extinction 3 to Test 1, while responding in the other two groups appears to have remained the same. This suggests that there was no spontaneous recovery of responding in the controls; and, as such, something more must be said about the basis of the between-group differences in responding at test. This is particularly important as each extinction session involved eight presentations of the to-be-tested stimulus, whereas the test itself consisted of just three stimulus presentations. Hence, comparing the mean levels of performance to the stimulus across its extinction and testing overestimates the true magnitude of spontaneous recovery, which is simply not clear in the results of this study. That is, it is not clear that there is any spontaneous recovery at all and, therefore, that the basis of the difference between Group ChR2 and controls at test is in terms of spontaneous recovery.

      The reviewer is correct that there were a different number of trials in extinction vs. test sessions, making direct comparison difficult, and that displaying the data as averages of the test session does not demonstrate spontaneous recovery per se. To address this we have added analyses of trial data and comparison of the final 3 trials of extinction to the subsequent three trials of each test. These analyses are included on pages 5 and 6 of the manuscript and additional data figures can be found as panels 2E and 2F and 4H and 4I, and pasted below.

      What we observe in the trial data for controls is an increase in responding from the end of extinction to the beginning of each test, thus demonstrating spontaneous recovery. Importantly, responding in the ChR2 group does not increase from the end of extinction to the beginning of the test, illustrating that LC stimulation during extinction prevents spontaneous recovery.

      Comparison of the final three trials of Extinction to the three trials of Test 1:

      Author response image 6.

      Comparison of the final three trials of Extinction to the three trials of Test 2:

      Author response image 7.

      Halorhodopsin Experiment Tests 1 and 2, respectively.

      Author response image 8.

      2a) Did the manipulations have any effect on the rates of lever-pressing outside of the stimulus?

      We did not detect any effect of the optogenetic manipulations on rates of lever pressing outside of the stimulus. This is demonstrated in the pre-CS intervals collected on stimulation days (i.e., extinction sessions), where we see similar response rates between controls and the ChR2 and Offset groups, as shown below. There was no effect of group [F(2,40)=0.156, p=0.856] or group x extinction day interaction [F(2,40)=0.146, p=0.865].

      Author response image 9.

      2b) Did the manipulations have any effect on rates of magazine entry either during or after the stimulus?

      For the ChR2 experiment, there were no group differences in magazine responses at the end of training [F(2,40)=2.442, p=0.100]. Responding decreased across days of extinction (when optogenetic stimulation was given) [F(2, 80)=38.070, p<0.001], but there was no effect of group [F(2,40)=0.801, p=0.456] and no interaction between day and group [F(4,40)=1.461, p=0.222]. Although a similar pattern is seen in the test data, group differences were not statistically different in the first [F(2,40)=2.352, p=0.108] or second [F(2,40)=1.900, p=0.166] tests, perhaps because magazine responses were quite low. Thus, overall, magazine data do not present a different picture than lever-pressing, but because of the lack of statistical effects during testing, we have chosen not to include these data in the manuscript.

      For the eNpHR experiment, again a similar pattern to lever-pressing was seen. There were no group differences at the end of acquisition [F(1,15)=0.290, p=0.598]. Responding decreased across days of extinction [F(2, 30)=4.775, p=0.016] but there was no main effect of group [F(1,15)=1.188, p=0.293], and no interaction between extinction and group [F(2,30)=0.070, p=0.932]. There were no group differences in the number of magazine entries in Test 1 [F(1,15)=1.378, p=0.259] or Test 2 [F(1,15)=0.319, p=0.580].

      Author response image 10.

      Author response image 11.

      2c) Did the manipulations affect the coupling of lever-press and magazine entry responses? I imagine that, after training, the lever-press and magazine entry responses are coupled: rats only visit the magazine after having made a lever-press response (or some number of lever-press responses). Stimulating the LC clearly had no acute effect on the performance of the lever-press response. If it also had no effect on the total number of magazine entries performed during the stimulus, it would be interesting to know whether the coupling of lever-presses and magazine entries had been disturbed in any way. One could assess this by looking at the joint distribution of lever-presses (or runs of lever-presses) and magazine visits in each extinction session, or across the three sessions of extinction. As a proxy for this, one could look at the average latency to enter the magazine following a lever-press response (or run of lever-presses). Any differences here between the Controls and Group ChR2 would be informative with respect to the effects of the LC manipulations: that is, the results shown in Figure indicate that stimulating the LC has no acute effects on lever-pressing but protects against something like spontaneous recovery; whereas the results shown in Figure 4 indicate that inhibiting the LC facilitates the loss of responding across extinction without protecting against spontaneous recovery. The additional data/analyses suggested here would indicate whether LC stimulation had any acute effects on responding that might explain the protection from spontaneous recovery; and whether LC inhibition specifically reduced lever-pressing across extinction or whether it had equivalent effects on rates of magazine entry.

      Lever-press and magazine response data were collected trial by trial but not with the temporal resolution required for the analyses suggested by the reviewer. We do not have timestamps for magazine entries nor latency data. We can collect this type of data in future studies. At the session or trial level, magazine entries generally correspond to lever-pressing; being trained on ratio schedules, and from informal observation, rats will do several lever-presses and then check the magazine. Rates of each decrease across extinction (magazine data included in response to comment 2b. above). Optogenetic manipulation appeared to have no immediate effect on either response during extinction.

      PROCEDURAL

      1) Why were there three discriminative stimuli in acquisition: a light, white noise, and clicker?

      This was done to be consistent with and apply parameters similar to previous, related studies (Rescorla, 2006; Janak & Corbit, 2011) and to allow comparison to potential future studies that may involve stimulus compounds etc. (requiring training of multiple stimuli).

      2) Why were some rats extinguished to the noise while others were extinguished to the clicker? Were the effects of LC stimulation/inhibition dependent on the identity of the extinguished stimulus?

      Because the animals were trained with multiple stimuli, it allowed us some ability to choose amongst those stimuli to best balance response rates across groups before the key manipulations. The effects of LC manipulation did not differ between animals based on the identity of the extinguished stimulus.

      3) Did the acute effects of LC inhibition on extinction vary as a function of the stimulus identity?

      No

      4) Was the ITI in extinction the same as that in acquisition?

      Yes, the ITI was the same for acquisition and extinction sessions (variable, averaging to 90 seconds). We have added a sentence to the methods (p. 11) to reflect this.

      5) For Group Offset, when was the photo-stimulation applied in relation to the extinguished stimulus: was it immediately upon offset of the stimulus or at a later point in the ITI?

      The group label “Offset” was used to be consistent with Uematsu et al. (2017), who delivered stimulation 50-70 s after a trial. Similarly, we mean it as discontinuous with the stimulus, not at the termination of the stimulus. We have revised the description of this group on page 11 to clarify the timing of the photostimulation as follows:

      “Animals in the Offset group (and relevant controls) underwent identical training with the exception that stimulation in extinction sessions occurred in the middle of the variable length ITI (45s after stimulus termination, on average).”
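
      As an illustration of this timing, the sketch below places each Offset stimulation at the midpoint of the ITI that follows a stimulus. It is a sketch under stated assumptions, not the authors' control code: the function name is hypothetical, each trial is assumed to be a 20 s stimulus followed by its ITI, and the ITI values in the usage note are invented so that they average 90 s.

```python
def offset_stimulation_onsets(iti_durations, stimulus_duration=20.0):
    """Onset times (s) of mid-ITI photostimulation for the Offset group.

    iti_durations: ITI lengths in seconds, one per trial (variable length)
    stimulus_duration: length of the discriminative stimulus in seconds

    Assumes each trial is a stimulus followed by its ITI, with stimulation
    delivered at the midpoint of that ITI.
    """
    onsets = []
    t = 0.0  # session clock at the start of the current stimulus
    for iti in iti_durations:
        stimulus_end = t + stimulus_duration
        onsets.append(stimulus_end + iti / 2.0)
        t = stimulus_end + iti  # next stimulus starts when the ITI ends
    return onsets
```

      With ITIs of 60, 90 and 120 s, stimulation falls 30, 45 and 60 s after stimulus termination, that is, 45 s on average, as in the revised description.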

      MINOR

      1) "Such recovery phenomena undermine the success of extinction-based therapies..."

      ***Perhaps a different phrasing is needed here: "These phenomena show that extinction-based therapies are not always effective in suppressing an already-established response..."

      We have revised this sentence in line with the reviewer’s suggestion:

      “These phenomena mean that extinction-based therapies are not always successful in suppressing previously-established behaviours” (first paragraph of the introduction).

      2) Typo in para 1 of results: "F(2,19)=0.0.352"

      Thank you for finding this typo. It has been corrected. (p.4)

      3) "As another example of modular functional organization, no improvements to strategy set-shifting following global LC stimulation, but improvements were observed when LC terminals in the medial prefrontal cortex were targeted (Cope et al., 2019)." ***This sentence is missing a "there were" before "no improvements".

      Thank you for finding this error. It has been corrected. (p.8)

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Butkovic et al. perform a genome-wide association (GWA) study on Arabidopsis thaliana inoculated with the natural pathogen turnip mosaic virus (TuMV) in laboratory conditions, with the aim to identify genetic associations with virus infection-related parameters. For this purpose, they use a large panel of A. thaliana inbred lines and two strains of TuMV, one naïve and one pre-adapted through experimental evolution. A strong association is found between a region in chromosome 2 (1.5 Mb) and the risk of systemic necrosis upon viral infection, although the causative gene remains to be pinpointed.

      This project is a remarkable tour de force, but the conclusions that can be reached from the results obtained are unfortunately underwhelming. Some aspects of the work could be clarified, and presentation modified, to help the reader.

      (Recommendations For The Authors):

      • It is important to note that viral accumulation and symptom development do not necessarily correlate, and that only the former is a proxy for "virus performance". These concepts need to be clear throughout the text, so as not to mislead the reader.

      This is now explained better in lines 118-120, and "virus performance" has been removed.

      • Sadly, only indirect measures of the viral infection (symptoms) are used, and not viral accumulation. It is important to note that viral accumulation and symptom development do not necessarily correlate and that only the former is a proxy for "virus performance". These concepts need to be clear throughout the text, so as not to mislead the reader. The mention of "virus performance" in line 143 is therefore not appropriate, nor is the reference to viral replication and movement in the Discussion section.

      "Virus performance" was removed. Also, the reference to viral replication and movement in the Discussion section has been removed.

      Now we mention: “We did not measure viral accumulation, but note this is significantly correlated with intensity of symptoms within the Col-0 line (Corrêa et al. 2020), although it is not clear if this correlation occurs in all lines.”

      • Since symptoms are at the center of the screen, images representing the different scores in the arbitrary scales should ideally be shown.

      Different Arabidopsis lines look different, which could mislead a reader who is not familiar with the lines. We believe that a schematic representation of the criteria we used to score symptoms is clearer to interpret. Here are some pictures of different lines showing varying symptoms:

      Author response image 1.

      • Statistical analyses could be added to the figures, to ease interpretation of the data presented.

      Statistical analysis can be found in methods. We prefer to keep the figure legend as short as possible.

      • The authors could include a table with the summary of the phenotypes measured in the panel of screened lines (mean values, range across the panel, heritability, etc.).

      These data are plotted in Fig. 1. We believe that repeating this information in tabular form would not contribute to the main message of the work. Phenotype data and the code to reproduce Figure 1 are available on GitHub (as stated in Data Availability), so anyone interested can freely explore the phenotypes of the screened lines.

      • The definition of the association peak found in chromosome 2 could be explained further: is the whole region (1.5 Mb) in linkage disequilibrium? How many genes are found within this interval, and how were the five strong candidates the authors mention in line 161 selected? It is also not clear which are these 5 candidates, apart from AT2G14080 and DRP3B - and among those in Table 1 (which, by the way, is cited only in the Discussion and not in the Results section)? Why were AT2G14080 and DRP3B in particular chosen?

      We have replaced Table 1 with an updated Table S1 listing all genes found within the range of significant SNPs for each peak. We now highlight a subset of these genes as candidate genes if they have functions related to disease resistance or defence, and mention them explicitly in the text (lines 173-179). We have explicitly described how this table was constructed in the methods (lines 525-538).

      • Concerning the validation of the association found in chromosome 2 (line 169 and onward): the two approaches followed cannot be considered independent validations; wouldn't using independent accessions, or an independent population (generated by the cross between two parental lines, showing contrasting phenotypes, for example) have been more convincing?

      We aim to compare the hypothesis that the association is due to a causal locus to the null hypothesis that the observed association is a fluke due to, for example, the small number of lines showing necrosis. If this null hypothesis is true then we would not expect to see the association if we run the experiment again using the same lines. An alternative hypothesis is that the genotype at the QTL and disease phenotypes are not directly causally linked, but are both correlated with some other factor, such as another QTL, or maternal effects. We agree that an independent sample would be required to exclude the latter hypothesis, but argue that the former is the more pertinent. We have edited the text to be explicit about the hypothesis we are testing, and altered the language to shift the focus from ‘validation’ to ‘confirming the robustness’ of the association (line 182).

      • Regarding the identification of the transposon element in the genomic region of AT2G14080: is the complementation of the knock-out mutant with the two alleles (presence/absence of the transposon) possible to confirm its potential role in the observed phenotype?

      This could be feasible but we cannot do it as none of the researchers can continue this project.

      • On the comparison between naïve and evolved viral strains: is the evolved TuMV more virulent in those accessions closer to Col-0?

      This is not something we have looked at but would certainly be an interesting follow-up investigation.

      • The Copia-element polymorphism is identified in an intron; the potential functional consequences of this insertion could be discussed. In the example the authors provide, the transposable element is inserted into the protein-coding sequence instead.

      We now state explicitly that such insertions are expected to influence expression; beyond that we can only speculate. We have removed the reference to the insertion in the coding sequence.

      • The authors state in line 398 that "susceptibility is unquestionably deleterious" - is this really the case? Are the authors considering susceptibility as the capacity to be infected, or to develop symptoms? Viral infections in nature are frequently asymptomatic, and plant viruses can confer tolerance to other stresses.

      We have toned down the expression and clarified our wording: “Given that potyvirus outbreaks are common in nature (Pagán et al., 2010) and susceptibility to symptomatic infection can be deleterious”.

      Additional minor comments:

      • In Table 1, Wu et al., 2018 should refer to DRP2A and 2B, not 3B.

      We have removed Table 1 altogether.

      • Line 126: a 23% increase in symptom severity is mentioned, but how is this calculated, considering that severity is measured in four different categories?

      This is the change in mean severity of symptoms between the two categories.
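To make the arithmetic behind such a percentage concrete, here is a minimal sketch. It assumes that the four severity categories are coded as the integers 1 to 4 and that the reported figure is the relative change in the mean score between two groups. The counts, the coding, and the names `group_a` and `group_b` are hypothetical and are not taken from the manuscript.

```python
def mean_severity(counts):
    """Mean ordinal score, given a mapping {score: number of plants}."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total


# Hypothetical counts of plants per severity score (1 = mildest, 4 = most severe).
group_a = {1: 40, 2: 30, 3: 20, 4: 10}
group_b = {1: 20, 2: 30, 3: 30, 4: 20}

mean_a = mean_severity(group_a)
mean_b = mean_severity(group_b)

# Relative change in the mean score of group_b with respect to group_a.
relative_change = (mean_b - mean_a) / mean_a
print(f"mean A = {mean_a}, mean B = {mean_b}, change = {relative_change:.0%}")
```

With these made-up counts the means are 2.0 and 2.5, so the relative change is 25%. Treating ordinal categories as equally spaced numbers is itself a modelling choice.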

      • Figure 1F: "...symptoms"

      Fixed.

      • Line 179: "...suggesting an antiviral role..."

      Changed.

      • Lines 288-300: This paragraph does not fit into the narrative and could be omitted.

      It has been removed and some of the info moved to the last paragraph of the Intro, when the two TuMV variants were presented.

      • Lines 335-337: The rationale here is unclear since DRP2B will also be in the background - wouldn't DRPB2B and 3B be functionally redundant in the viral infection?

      Our results suggest that DRP3B is redundant with DRP2B for the ancestral virus but not for the evolved viral strain. We speculate that the evolved viral isolate may have acquired the capacity to recruit DRP3B for its replication and hence produces fewer symptoms when the plant protein is missing.

      We have spotted a mistake that may have added to the confusion. Originally the text said “In contrast, loss of function of DRP3B decreased symptoms relative to those in Col-0 in response to the ancestral, but not the evolved virus”. The correct statement is “In contrast, loss of function of DRP3B decreased symptoms relative to those in Col-0 in response to the evolved, but not the ancestral virus.”

      Reviewer #2 (Public Review):

      The manuscript presents a valuable investigation of genetic associations related to plant resistance against the turnip mosaic virus (TuMV) using Arabidopsis thaliana as a model. The study infects over 1,000 A. thaliana inbred lines with both ancestral and evolved TuMV and assesses four disease-related traits: infectivity, disease progress, symptom severity, and necrosis. The findings reveal that plants infected with the evolved TuMV strain generally exhibited more severe disease symptoms than those infected with the ancestral strain. However, there was considerable variation among plant lines, highlighting the complexity of plant-virus interactions.

      A major genetic locus on chromosome 2 was identified, strongly associated with symptom severity and necrosis. This region contained several candidate genes involved in plant defense against viruses. The study also identified additional genetic loci associated with necrosis, some common to both viral isolates and others specific to individual isolates. Structural variations, including transposable element insertions, were observed in the genomic region linked to disease traits.

      Surprisingly, the minor allele associated with increased disease symptoms was geographically widespread among the studied plant lines, contrary to typical expectations of natural selection limiting the spread of deleterious alleles. Overall, this research provides valuable insights into the genetic basis of plant responses to TuMV, highlighting the complexity of these interactions and suggesting potential avenues for improving crop resilience against viral infections.

      Overall, the manuscript is well-written, and the data are generally high-quality. The study is generally well-executed and contributes to our understanding of plant-virus interactions. I suggest that the authors consider the following points in future versions of this manuscript:

      1. Major allele and minor allele definition: When these two concepts are mentioned in the figure, there is no clear definition of the two words in the text. Especially for major alleles, there is no clear definition in the whole text. It is recommended that the author further elaborate on these two concepts so that readers can more easily understand the text and figures.

      We agree that the distinction between major/minor alleles and major/minor associations in our previous manuscript may have been confusing. In the current manuscript we now define the minor allele at a locus as the less-common allele in the population (line 167). We have removed references to major/minor associations, and instead refer to strong/weak associations.

      1. Possible confusion caused by three terms (major locus / major association and major allele): Because there is no explanation of the major allele in the text, readers trying to interpret the meaning of major allele may confuse it with these two places in the text: major locus (line 149) / the major association with disease phenotypes (line 183).

      See our response to the previous comment.

      1. Discussion: The authors could provide a more detailed discussion of how the research findings might inform crop protection strategies or breeding programs.

      We would prefer to refrain from speculating about future applications in breeding programs.

      (Recommendations For The Authors):

      1. Stacked bar chart for the Fig 1F. It is recommended that the author use the form of a stacked bar chart to display the results of Fig 1F. On the one hand, it can fit in with the format of Fig 1D/E/G, on the other hand, it can also display the content more clearly.

      We think the results are easier to interpret without the stacked bar chart.

      1. Language Clarity: While there are no apparent spelling errors, some sentences could be rewritten for greater clarity, especially when explaining the results in Figure 1 and Figure 2.

      We have reviewed these sections and attempted to improve clarity where that seemed appropriate.

      There are some possibilities to explore in the future. For example: clarity of mechanisms for the future. While the study identifies genetic associations, it lacks an in-depth exploration of the underlying molecular mechanisms. Elaborating on the mechanistic aspects would enhance the scientific rigor and practical applicability of the findings.

      Yes, digging into the molecular mechanisms is an ongoing task and will be published elsewhere. It was out of the scope of this already dense manuscript.  

      Reviewer #3 (Public Review):

      Summary of Work

      This paper conducts the largest GWAS study of A. thaliana in response to a viral infection. The paper identifies a 1.5 Mb region in the chromosome associated with disease, including SNPs, structural variation, and transposon insertions. Studies further validate the association experimentally with a separate experimental infection procedure with several lines and specific T-DNA mutants. Finally, the paper presents a geographic analysis of the minor disease allele and the major association. The major take-home message of the paper is that structural variants and not only SNPs are important changes associated with disease susceptibility. The manuscript also makes a strong case for negative frequency-dependent selection maintaining a disease susceptibility locus at low frequency.

      Strengths and Weaknesses

      A major strength of this manuscript is the large sample sizes, careful experimental design, and rigor in the follow-up experiments. For instance, mentioning non-infected controls and using methods to determine if geographic locus associations were due to chance. The strong result of a GWAS-detected locus is impressive given the complex interaction between plant genotypes and strains noted in the results. In addition to the follow-up experiments, the geographic analysis added important context and broadened the scope of the study beyond typical lab-based GWAS studies. I find very few weaknesses in this manuscript.

      Support of Conclusions

      The support for the conclusions is exceptional. This is due to the massive amount of evidence for each statement and also due to the careful consideration of alternative explanations for the data.

      Significance of Work

      This manuscript will be of great significance in plant disease research, both for its findings and its experimental approach. The study has very important implications for genetic associations with disease beyond plants.

      (Recommendations For The Authors):

      Line 41 - Rephrase, not clear "being the magnitude and sign of the difference dependent on the degree of adaptation of the viral isolate to A. thaliana."

      Now it reads: “When inoculated with TuMV, loss-of-function mutant plants of this gene exhibited different symptoms than wild-type plants, where the scale of the difference and the direction of change between the symptomatology of mutant and wild-type plants depends on the degree of adaptation of the viral isolate to A. thaliana.”

      Line 236 - typo should read: "and 21-fold"

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      I would suggest that the authors focus on what I think is the main goal of the work, namely, to consider the whole cell contour when characterizing cell shape instead of only some points on the contour. A reference to the connection with Minkowski tensors and the biologically relevant mathematical consequences of this connection would suffice; a detailed definition of the Minkowski tensors does not seem to be necessary. Especially because you do not really use them. You could use the analysis of the simulation data to explain what the γ<sub>p</sub> miss and for which statements they would be sufficient.

      We argue that the explanation of Minkowski tensors is helpful and should remain in the Methods and materials section, for two reasons. First, our argument relies on the robustness and stability properties of Minkowski tensors. Introducing q<sub>p</sub> without the connection to Minkowski tensors would not allow us to make these statements. Second, Minkowski tensors seem not to be well known in the community; otherwise measures like γ<sub>p</sub> would not have been introduced. Furthermore, readers not interested in the technical details can skip this part of the manuscript and go directly to the Results section. Concerning the questions of what the γ<sub>p</sub> miss and for which statements they would be sufficient, the answer from a purely mathematical point of view is rather simple: as γ<sub>p</sub> is neither robust nor stable, it should not be used in any case! The provided results on computational and experimental data demonstrate the consequences of using such measures. In the case of the nematic-hexatic transition proposed in Armengol-Collado et al. (2023), the consequence is severe, as this transition is specific to the method used and not to the underlying physics. A second aspect, which we now highlight further, is the influence of approximating a cell by a polygon. We demonstrate that this approximation is responsible for a strong hexatic order on the cellular scale in the considered MDCK data from Armengol-Collado et al. (2023).

      It is not clear to me what we should learn about the two tissue models by using q<sub>2</sub> and q<sub>6</sub> to quantify cell shape. Can you clearly formulate one or more conclusions?

      What we can learn is how q<sub>p</sub> depends on the model parameters in the two tissue models:

      increases with higher activity or deformability

      decreases with higher activity or deformability.

      Furthermore, q<sub>2</sub> and q<sub>6</sub> are independent and describe distinct properties. Using these models as a basis to coarse-grain and derive continuous models on the tissue scale, these results indicate that more general p-atic liquid crystal theories should be used and the simplest nematic liquid crystal theories might not be sufficient.

      The experimental data and their analysis does not seem to add anything to the work. Do you report only data from independent measurements, or did you consider all images of a monolayer?

      We now also analyze experimental data from Armengol-Collado et al. (2023), which confirm our findings on the independence of q<sub>2</sub> and q<sub>6</sub> and also confirm that the proposed nematic-hexatic transition is specific to the use of γ<sub>p</sub> for characterizing the shape. Additional experimental data are therefore no longer needed. We have removed the detailed analysis of our own data and keep only the results in Fig 1 and Fig 2 and the corresponding figures in the appendix as illustrative examples.

      L13: ”P-atic liquid crystal theories offer new perspectives on how cells self-organize (...)” This is a difficult entry, because the average reader of eLife might not be familiar with p-atic liquid crystals.

      We agree that p-atic liquid crystals might not be familiar to the average reader. For this reason we introduce orientational order in the introduction with examples demonstrating that not only nematic, but also tetratic and hexatic order have been identified in tissue and introduce the different symmetries. Furthermore, we provide examples for p-atic liquid crystals from other fields and various references. In the conclusion, we also cite models for p-atic liquid crystal theories. Even if the average reader is not familiar with these theories, it should become evident that nematic order might not be sufficient to describe tissue as other symmetries are present as well.

      L32: ”nematic” needs to be introduced.

      Nematic order is already explained as orientational order that is invariant under rotation by 180°. The references cited discuss nematic liquid crystals in the context of morphological changes in tissue. We have therefore only added a standard textbook as a reference for liquid crystal theories and refrain from introducing it in more detail in the manuscript.

      Figure 1: Why do you show the data for q<sub>3</sub>, q<sub>4</sub>, and q<sub>5</sub>, which you do not really consider in this manuscript? Same for Figure 2. Why not combine the two figures? Furthermore, you show q<sub>p</sub> without having defined them yet.

      We consider all p = 2,3,4,5,6, but focus on p = 2,6 in the main text and p = 3,4,5 in the appendix. Figures 1 and 2 essentially only introduce the subject and help to relate p-atic order to cell shapes and introduce the methodology to analyze the data. Our conclusion is that all p can be important and should be considered in continuous descriptions of tissue.

      Equation 1: The notation is confusing: the domain of integration (C or ∂C) also appears as the variable you integrate.

      The equation is correct. The variable of integration is 1 or H and the domain of integration is C (cell) or ∂C (cell contour).

      L68: ”a snapshot of the considered monolayer of wild-type MDCK cells”. Did you analyse only one monolayer? Please, provide information about the number of monolayers that were imaged and how many cell shapes were analyzed.

      We have analyzed one monolayer and have added the missing information.

      L86: ”field-specific prefactors” I do not understand what is meant by these.

      Different communities (e.g. physics, mathematics, cosmology) use different prefactors in the definition. We have removed this statement.

      L89: ”Hadwiger’s characterization theorem”. What is this?

      This mathematical result is needed to claim robustness and stability; it can be found in the cited reference.

      L104: ”the essential property is the continuity”. Essential for what?

      Essential ”for our purpose” to characterize the shape of cells by a robust method.

      L120: ”the theory also guarantees robust description of p-atic orientation for p = 3,4,5,6,...” I do not understand what you mean.

      The previous examples only consider p = 2. However, the cited theoretical results also hold for p = 3,4,5,6,...

      Equations (5) and (6): You define ψ<sub>p</sub>(C) twice. Are the definitions equivalent? Why do you need both?

      This is not a different definition; Equation (6) is a reformulation that is more useful for our purpose. But we did indeed define ϑ<sub>p</sub> twice. We now use a new symbol to distinguish ϑ<sub>p</sub> in Equations 7 and 9.

      Figure 4: ”The visualization uses rotationally-symmetric direction fields (known as p-RoSy fields in computer graphics (Vaxman et al., 2016)).” I guess that you have used these fields already in Figure 1, so why introduce them only now?

      We have moved this comment to Figure 1.

      Figure 6: Using a few discrete values cannot illustrate continuity. Also, the ”jump” in γ<sub>p</sub> results from deleting a vertex, so I doubt that this is a fair comparison. Still, I think that it is important to point out to the reader that the value γ<sub>p</sub> depends on the number of vertices (here, I allow that two edges connected by a vertex are aligned).

      We adjusted the caption to make our point clearer. The last image is a triangle and, according to the definition of γ<sub>p</sub>, is therefore described by only three vertices. So it is indeed a fair comparison. The reviewer is right that the value of γ<sub>p</sub> depends strongly on the number of vertices used; this is exactly the point we are trying to make with this figure. Also, artificially adding vertices to make γ<sub>p</sub> continuous leads to more problems, as the values of γ<sub>p</sub> change if we change the number of vertices. But an equilateral triangle should be recognized as an equilateral triangle, whether or not there is an artificial fourth vertex. The triangle in our picture and the triangle that the reviewer mentioned (our triangle with an artificial fourth vertex) both have the shape of an equilateral triangle, yet for one |γ<sub>3</sub>| = 1.0 and for the other |γ<sub>3</sub>| = 0.935.

      While we agree with the reviewer's statement about continuity, we did not modify the sentence, as the meaning should be clear.
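The vertex dependence discussed above can be checked numerically. The sketch below assumes the vertex-based form of the shape function, γ<sub>p</sub> = Σ<sub>v</sub> |r<sub>v</sub>|<sup>p</sup> e<sup>ipφ<sub>v</sub></sup> / Σ<sub>v</sub> |r<sub>v</sub>|<sup>p</sup>, with positions r<sub>v</sub> measured from the mean of the vertices. It also assumes that the artificial fourth vertex sits at the midpoint of one edge. Neither assumption is spelled out in this response, and the function name `gamma_p` is illustrative only.

```python
import cmath
import math


def gamma_p(vertices, p):
    """Vertex-based shape function (assumed form, see the text above)."""
    pts = [complex(x, y) for x, y in vertices]
    # Centre of mass of the vertex set, not of the filled polygon.
    center = sum(pts) / len(pts)
    rel = [z - center for z in pts]
    num = sum(abs(z) ** p * cmath.exp(1j * p * cmath.phase(z)) for z in rel)
    den = sum(abs(z) ** p for z in rel)
    return num / den


# Equilateral triangle with unit circumradius.
triangle = [(math.cos(a), math.sin(a))
            for a in (math.pi / 2, 7 * math.pi / 6, 11 * math.pi / 6)]

# Same shape, with an extra vertex at the midpoint of the bottom edge.
midpoint = ((triangle[1][0] + triangle[2][0]) / 2,
            (triangle[1][1] + triangle[2][1]) / 2)
triangle_plus = [triangle[0], triangle[1], midpoint, triangle[2]]

g3_triangle = abs(gamma_p(triangle, 3))
g3_triangle_plus = abs(gamma_p(triangle_plus, 3))
print(round(g3_triangle, 3), round(g3_triangle_plus, 3))
```

Under these assumptions the two magnitudes come out at approximately 1.0 and 0.935, although both vertex lists describe the same geometric shape. The extra vertex also moves the vertex-based centre of mass, which is one reason the value changes.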

      L160: The definition of the center of mass is incorrect as it is not that of an extended object whose contour is defined by a polygon, but only of the set of vertices. In Figure 6 you write ”the choice of the center of mass highly influences the value of γ<sub>p</sub>” - is there really a choice of the center of mass? I thought that it was uniquely defined.

      We here only repeat the definition from Armengol-Collado et al. (2023) in order to be able to directly compare our analyses with the results presented therein. We adjusted the caption to be more clear.

      L166: What is the weighting you refer to in Equation 9?

      We apologize, the reference is to Equation 8. We have modified this.

      L312: ”Quantifying orientational order in biological tissues can be realized by Minkowsky tensors”. As mentioned above, you do not really use them, but use Equation (7), which can be defined without reference to Minkowski tensors.

      Eq. (7) is part of the irreducible representations of the Minkowski tensors. Therefore the sentence is correct.

      L318: I do not quite understand the link between being able (or not) to compare q<sub>p</sub>’s for different values of p and the interpretability of q<sub>2</sub> and q<sub>6</sub>. Also, since you introduce q<sub>p</sub>, how can the question about their comparability be a recurrent challenge? Finally, would you agree that even though a comparison between the absolute values of q<sub>2</sub> and q<sub>6</sub> is inappropriate, one can still meaningfully compare relative changes as a parameter is changed or when comparing cells in different conditions?

      We have modified the sentence. Furthermore, we agree that one can still meaningfully compare relative changes as a parameter is changed, as we do. However, our finding that q<sub>2</sub> and q<sub>6</sub> are independent does not allow one to conclude that any kind of nematic-hexatic phase transition occurs. We have now provided further evidence using the published data of Armengol-Collado et al. (2023), which unequivocally supports this statement. We would also like to remark that the detection of a phase transition requires a single order parameter, which cannot exist as q<sub>2</sub> and q<sub>6</sub> are independent.

      We have further explained this in the main text.

      Figure 7: The axes are not labeled.

      We added the labels.

      L359: ”q<sub>2</sub> and q<sub>6</sub> values cluster tightly”, L362 ”q<sub>2</sub> and q<sub>6</sub> values become highly scattered” Please, quantify.

      We kept these formulations but have added statistical measures to these qualitative descriptions, see Supplementary Figures to Fig 7 for the distance correlation and the P-values of the distance correlation. These data support our claim of independence.

      L362: ”each q<sub>2</sub> value spans a broad range of q<sub>6</sub> values and vice versa, demonstrating their independence”. Please, use a quantitative test of statistical independence.

      We have added statistical information by using the distance correlation and statistical tests, see Supplementary Figures to Fig 7. Similar results are obtained for the Pearson correlation and corresponding tests. However, they are not included as the distance correlation is more general.
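As an illustration of why the distance correlation is the more general measure, here is a minimal NumPy sketch of the sample distance correlation with a permutation test. It is not the authors' analysis code. The function names, the permutation-based p-value, and the synthetic data are assumptions made for this example only.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix, double-centred."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (Szekely et al.), a value in [0, 1]."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_p_value(x, y, n_perm=999, seed=0):
    """P-value for the null of independence, obtained by shuffling y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = distance_correlation(x, y)
    exceed = sum(distance_correlation(x, rng.permutation(y)) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


# Synthetic example: y depends on x, but not linearly.
x = np.linspace(-1.0, 1.0, 50)
y = x ** 2
print("Pearson:", np.corrcoef(x, y)[0, 1])
print("distance correlation:", distance_correlation(x, y))
```

In this synthetic case the Pearson coefficient is essentially zero because the dependence is symmetric, while the distance correlation is clearly positive. For per-cell pairs of q<sub>2</sub> and q<sub>6</sub>, a distance correlation near zero together with a large permutation p-value would be consistent with independence.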

      L371: Please, define Q<sub>2</sub> and Q<sub>6</sub> in the main text.

      We have now added the definition to the Materials and methods section.

      L420: A reference seems to be missing.

      Thanks for pointing this out. This was a formatting error; we only wanted to cite Balasubramaniam et al. (2021).

      L425: ”strong dependence of cell shape on cell density”. But q<sub>6</sub> seems to be rather independent of density, see Figure 11. Also, what do you mean by ”strong”? Can you quantify?

      The dependence of the cell shape on the cell density is shown in detail in Eckert et al. (2023). Furthermore, the values for all p are needed to describe the cell shape. So the change in q<sub>2</sub> already indicates a change in the overall cell shape, even though q<sub>6</sub> barely changes. As we have now removed these experimental results in favor of the experimental data also used in Armengol-Collado et al. (2023), we did not add further evaluations regarding cell density.

      L453 ”These divergences [nonmonotonic dependence of γ<sub>p</sub> on activity or deformability] highlight the limitations of γ<sub>p</sub> in capturing consistent patterns”. I am not sure to follow your argument here.

      Besides the quantitative differences seen when comparing Fig. 1 and Fig 2 with the corresponding figures in the appendix, these results show qualitative differences. Using a method that is neither robust nor continuous leads to qualitatively different results. The nonmonotonic dependence of γ<sub>p</sub> is specific to the method but not to the underlying physics.

      Appendix 3 - Figure 20: It is not clear how to compare this figure to Figure 3e of Armengol-Collado et al 2023. Please, provide more details.

      Appendix 3 - Figure 20 (Appendix 3 - Figure 25 in the revised version) and Figure 3e in Armengol-Collado et al. (2023) cannot be directly compared. Fig 3e shows results of experiments and multiphase field simulations for one parameter setting, whereas Fig 20 shows results of the active vertex model for various parameter settings. But both are evaluated using γ<sub>p</sub> and Γ<sub>p</sub>. We have added this computation, see Fig. 13, which indeed reproduces the results from Fig 3e. We refrain from considering plots corresponding to Fig 20 for the multiphase field model, as this first requires computing the vertices and no additional information can be expected.

      Reviewer 2:

      The manuscript lacks statistical information. The following should be addressed: How often have the experiments been performed? How many monolayers have been analyzed? How many time steps have been considered and in what duration? How many cells have been included in the analysis? What are the p-values to determine if q<sub>p</sub>’s (Figure 2, panel a) and γ<sub>p</sub>’s (Appendix 3-Figure 17, panel a) are significantly different? Same figures: How many cells and experiments have been considered here? Figure 11: What is the density of cells for each condition? Please provide the corresponding values. How significant are the differences? How many times has the experiment been repeated? Figure 12: Due to cell proliferation, the cell density changes over time. Does this need to be taken into account?

      We agree that our information has only been qualitative. We have added the missing information. In particular, we added statistical information by using the distance correlation and statistical tests, see Supplementary Figures to Fig. 7. Similar results are obtained for the Pearson correlation and corresponding tests (not included). As we have replaced the experimental results previously shown in Figure 11 and Figure 12 with the experimental data already published in Armengol-Collado et al. (2023), we did not add further statistics regarding them. We added the number of frames and cells in the text.

      The image analysis part of the Method section states that time-series were xy-drift corrected, and cells were tracked. However, the manuscript does not contain results of dynamical data, time-dependent analyses, or discussions of how q<sub>p</sub> changes over time. The authors mention that the fluidity of the tissue was confirmed by the MSD, neighbor number variance, and the self-intermediate scattering function, but none of the results are shown in the manuscript. I would like to ask the authors to provide the results and related content in the Method section.

      We have modified the description and removed all parts related to dynamical data. Because the manuscript already contains a large number of images, we refrain from providing all the results for the phase diagram that distinguishes the solid and fluid phases. These measures have been provided previously for the considered modeling approaches and serve here only as a side remark. Our results do not depend on an exact localization of a solid-fluid phase boundary.

      Additional information is missing in the Image analysis part of the Method section. Could the authors provide the information on the image analysis steps between obtaining the segmented image and inputting the parameters for the Minkowski tensor? This should include how the normal vectors have been determined and whether this has been done for all pixels along the contour.

      We added further details in the section Extraction of the contour in Experimental setup in Methods and Materials and also provide the code to compute q<sub>p</sub> for segmented images.

      The authors have analyzed low-resolution phase contrast images acquired with a 10x objective to experimentally support their introduced Minkowski tensors. This may have decreased the resolution of the cell boundary detection and its curvature. I strongly suggest imaging the tissue with higher magnification (40x or 63x) and/or fluorescent markers to visualize the cell boundaries in high quality. This would allow the authors to distinguish between circles and circle-like shapes (lines 432-434) and to further investigate differences between MDCK wild-type and MDCK E-cad KO cells.

      We agree that higher resolution of the images would be beneficial. However, we are convinced that this will not influence our findings. Instead of performing the experiments with higher magnification or using fluorescent markers, we have considered the experimental data from Armengol-Collado et al. (2023) to support our results.

      The authors have coarse-grained the shape function, Γ<sub>p</sub>, and have chosen the active vertex model (Appendix 3-Figure 20) for comparison with the Minkowski tensors, Q<sub>p</sub> (Appendix 2 Figure 13). In both figures, the hexatic-nematic crossover does not occur. Armengol-Collado et al. have previously reported that the Voronoi model failed to achieve the hexatic-nematic crossover and argued that this is due to the artificial enhancement of the polygon’s hexagonality, leading to high hexatic order at the tissue scale. Since the authors have used the Voronoi-tailing method (line 196), I would like to ask the authors to compare the multiphase field models for Γ<sub>p</sub> and Q<sub>p</sub> instead.

      We would like to mention that we do not consider a Voronoi model but an active vertex model. A Voronoi model is only used for initialization. Both models are certainly related but not identical, and claims made for a Voronoi model need not hold for an active vertex model. The suggested comparison for the multiphase field model is not an easy task, as it requires computing the vertices from the phase field variables. There are gaps between cells, and a reliable algorithm to identify the vertices is a task of its own. We therefore refrain from doing these calculations. Instead, we have used the experimental data from Armengol-Collado et al. (2023), for which the polygonal information is provided, see Figure 11. Especially for p = 6, strong differences can be seen by comparing the PDF obtained from the full shape and from the polygonal shape. Indeed, the strong hexatic order at the cellular scale is only a consequence of the approximation by polygons. Given this result, analysing the multiphase field data with γ<sub>p</sub> does not add any new information, as this first requires an approximation by polygons.

      The authors show the q<sub>p</sub> distributions for the experimental systems (Figure 2, Figure 11). For completeness, I would like to ask the authors to also coarse-grain q<sub>p</sub> and γ<sub>p</sub> of the experimental data as shown for the computational models in Appendix 2 - Figure 13 and Appendix 2 - Figure 14. It would be interesting to see if the hexatic-nematic crossover appears. I would recommend that the authors avoid using the Voronoi tailing of the experimental system, as this may fail to obtain the crossover as explained in (5) above. Instead, I suggest using the real vertex positions for γ<sub>p</sub>, which can be obtained from the segmented images.

      It remains open what is meant by ”the real vertex positions for γ<sub>p</sub>, which can be obtained from the segmented images”. Segmenting the images leads to smooth contours, partly even with gaps between cells. As the magnitude of γ<sub>p</sub> depends on the number of points used in the calculation, it is not meaningful to use all points of the contour for calculating γ<sub>p</sub>, as this would lead to artificially low values for |γ<sub>p</sub>|. Identifying the vertex positions for an approximating polygon is an issue of its own, and the consequence of this approximation is already mentioned above. For a comparison we therefore added the experimental data from Armengol-Collado et al. (2023). We used the provided vertex positions to compute q<sub>p</sub> and γ<sub>p</sub>, and we also segmented the raw data and used the resulting contours to compute q<sub>p</sub>. See Figure 11. These results confirm our findings and show that the proposed nematic-hexatic phase transition is specific to the use of γ<sub>p</sub> to characterize shape.

      In order to show that shape descriptors like the shape function, γ<sub>p</sub>, introduced by Armengol-Collado et al., ’fail to capture the nuance of irregular shapes’ (line 445), the authors have compared γ<sub>p</sub> with the Minkowski tensors, q<sub>p</sub>, using the same dataset (Figure 1 with Appendix 3 - Figure 16, Figure 2 with Appendix 3 - Figure 17, and Figure 4 with Appendix 3 - Figure 15 Appendix 3). I agree that γ<sub>p</sub> and q<sub>p</sub> are different, not showing identical values. However, I see no evidence in these figures that q<sub>p</sub> describes the symmetry of a cell better than γ<sub>p</sub>, since the values are similar and vary quite similarly between different p-atic orders. What is the quantitative difference that shows the failure of the shape function to capture the nuance of irregular shapes?

      The statement already follows from the mathematical properties of robustness and stability, which are illustrated in Fig. 6. The mentioned comparisons for simulation and experimental data only demonstrate that the lack of robustness and stability of γ<sub>p</sub> also leads to different results if applied to averages of cell measures. The differences are twofold: first, the approximation of cells by polygons leads to different results, and second, even for polygons different results follow, as only one approach is continuous and the other is not. This has strong consequences for the proposed nematic-hexatic phase transition if coarse-grained. Our added results for the experimental data from Armengol-Collado et al. (2023) show that this behavior is not a physical feature but only specific to the use of γ<sub>p</sub>.

      The authors claim that the Minkowski tensors provide a ’reliable framework’ and that this framework ’opens new pathways for understanding the role of orientational symmetries in tissue mechanics and development’ (line 78-79). However, the p-atic orders in the experimental systems peak at very low orders of q<sub>p</sub> < 0.3, which may not allow conclusions about (non-)dominant orientational symmetry(ies) of cells. Can this framework be applied to experimental systems? Since the Minkowski tensors display the independence of the hexatic and nematic symmetry, the variations of cell shapes in experimental systems are too strong to provide any additional results (line 437), as stated by the authors, and no crossover was found, while the crossover was reported by Armengol-Collado et al., what new pathways can be opened to study tissues?

      We have added a comparison with experimental data from Armengol-Collado et al. (2023) and demonstrate that the proposed nematic-hexatic transition is only specific to the use of γ<sub>p</sub> for characterizing the shape. So our results first of all essentially close the ”pathway for understanding the role of orientational symmetries in tissue mechanics and development”, which was proposed on this nematic-hexatic transition. On the other side, even if q<sub>p</sub> peaks at relatively low values, the results demonstrate independence of the measures for different p’s, for two different modeling approaches and two different sets of experimental data. This motivates to consider p-atic order for different p simultaneously. Such theories of ”multi”-p-atic liquid crystals, as proposed in the conclusions, are the mentioned new pathways.

      In principle, the introduced Minkowski tensors integrate the orientation of the normal vectors (Equation 6) and consider the perimeter of the contour (Equation 1). Do the tensors distinguish between convex and concave curvature since both are present in tissues? Does a square with 4 concave and a square with 4 convex edges (same curvature) have the same q<sub>p</sub> values?

      For the specific situation of a square with 4 concave or 4 convex edges even p would lead to the same orientation and the same value for q<sub>p</sub>, as even p have a 180 degree symmetry. Odd p would result in the same value for q<sub>p</sub> but in a different orientation ϑ<sub>p</sub>. In more general cases, e.g. shapes with concave and convex edges, no general statements can be made. In general the theoretical results on stability of q<sub>p</sub> only hold for convex shapes. However, as discussed in Methods and materials the known counterexamples for concave shapes are not relevant for cell shapes.

      In lines 169-172 and Figure 6, the authors report a jump in γ<sub>p</sub>. Why has the fourth vertex in the last image been removed? The vertices are essential for the calculation of γ<sub>p</sub>. If the fourth vertex is not removed, the following values result: γ<sub>3</sub> = 0.935 and γ<sub>4</sub> = 0.474, which leads to changes of the same order of magnitude as those of q<sub>p</sub>. I think it is therefore not the choice of the center of mass that ’heavily influences the value of γ<sub>p</sub>’, but the removal of the fourth vertex.

      We adjusted the caption to make our point clearer. The last image is a triangle and, according to the definition of γ<sub>p</sub>, is therefore described by only three vertices. The reviewer is right that the value of γ<sub>p</sub> depends strongly on the number of vertices used; this is exactly the point that we are trying to make with this figure. An equilateral triangle should be recognized as an equilateral triangle, whether or not there is an artificial fourth vertex. The triangle in our picture and the triangle that the reviewer described (our triangle with an artificial fourth vertex) both have the shape of an equilateral triangle, yet for one |γ<sub>3</sub>| = 1.0 and for the other |γ<sub>3</sub>| = 0.935. This can be seen even more clearly if further artificial vertices are added on the outline of the equilateral triangle, which decreases |γ<sub>3</sub>| even more. Furthermore, we think there was a misunderstanding regarding our statement about the center of mass. The general problem of γ<sub>p</sub>, namely the dependence of its values on the number of vertices, is independent of how the center of mass is calculated. The exact values of γ<sub>p</sub>, on the other hand, do depend on this choice. We follow Armengol-Collado et al. (2023) and use the mean of all vertex coordinates as the center of mass. If the reviewer used the center of mass of the equilateral triangle and did the same calculations, the resulting values for γ<sub>p</sub> would be different. This is what we meant by ’heavily influences the value of γ<sub>p</sub>’.

      In Appendix 3 - Figure 18, the authors show that the shape function, γ<sub>6</sub>, exhibits a non-monotonic trend as a function of activity and deformability. I have no objection to this statement. However, I would like to ask the authors to check the values for γ<sub>6</sub>. In the bottom-left corner, for example, γ<sub>6</sub> = 0.55. This value seems very low to me. In Appendix 3-Figure 20, |Q<sub>6</sub>| for R/Rcell = 2 is already in this range, while |Q<sub>6</sub>| for R/Rcell = 1 (not shown), corresponding to γ<sub>6</sub>, must be even higher. Also, the parameters p<sub>6</sub> = 3.5 and v<sub>0</sub> = 0.1 should result in a nearly hexagonal lattice, which should be captured with high γ<sub>6</sub> values. I would expect γ<sub>6</sub> to be in the same range as q<sub>6</sub>.

      Many thanks for pointing this out. This question addresses two different points. The first is whether |Γ<sub>p</sub>| is too high. We checked the values: |Γ<sub>p</sub>| = 0.5075 for R/R<sub>cell</sub> = 2, so it is lower than 0.58. The second is why γ<sub>p</sub> and q<sub>p</sub> are not in the same value range. You are right that for a perfectly hexagonal lattice both should give the same value, namely q<sub>6</sub> = |γ<sub>6</sub>| = 1.0. However, even at p<sub>6</sub> = 3.5 and v<sub>0</sub> = 0.1 this is not a perfectly hexagonal lattice anymore, and q<sub>6</sub> and |γ<sub>6</sub>| drop at different rates as the shape moves away from a perfect hexagon. As q<sub>p</sub> is stable and changes only slightly for slight changes in the shape, it makes sense that q<sub>p</sub> is still close to 1.0. We included an image, see below, of one time step at these parameters to show that the cells no longer form a perfect hexagonal lattice.

      Reviewer 3:

      Could the authors show why and how this method could bring new information which were missing so far in the understanding of morphogenesis in vitro and in vivo with the current quantification?

      The introduction provides examples of how orientational order and its topological defects can be linked to morphological changes in tissues. The orientational order emerges from the shape of the cells. Most commonly nematic order has been considered, but more recently also hexatic order and even a nematic-hexatic crossover on larger scales. This suggests a mechanical mechanism for morphogenesis, like a phase transition from hexatic to nematic, which would have consequences for the evolution of shape. We demonstrate that the measures q<sub>2</sub> and q<sub>6</sub> are independent. Furthermore, the proposed nematic-hexatic transition is specific to the use of γ<sub>p</sub> for characterizing the shape and to the coarse-graining of the associated order. These measures are not robust and therefore should not be used. Results for the robust measures q<sub>p</sub> suggest considering all p in a coarse-grained theory to model morphological changes in tissues.

      Could authors show quantitative comparisons between available methods with the same sets of data and highlight pros and cons?

      Author response image 1.

      Screenshot from p<sub>6</sub> = 3.5 and v<sub>0</sub> = 0.1

      In addition to what was already done for the simulation data, we have added data from Armengol-Collado et al. (2023) and compared the results for q<sub>p</sub> and Q<sub>p</sub> and γ<sub>p</sub> and Γ<sub>p</sub>. The theoretical results and the illustrating example in Fig. 6 already show that there are no pros for γ<sub>p</sub>. Other methods belong to the class of bond-order methods and measure neighbor relations instead of shape. We already comment that these methods are inappropriate to classify shape; see Methods and materials, last sentence, and Mickel et al. (2013) for a detailed discussion of why these methods are not robust.

      Instead of using phase contrast images, which exhibit curved cell-cell contours, could authors use data with E-cadherin staining instead - as used in many epithelial studies in vitro and in vivo? Could they show both images for wild type and for the E-cadherin KO cell lines with fluorescent readout?

      We are convinced that our results do not depend on the way the cell contours are visualized. Furthermore, the images do not provide additional information. To further strengthen the experimental part of the manuscript, we instead analyzed data from Armengol-Collado et al. (2023). They confirm our findings.

      The authors acknowledge differences in density between cell lines p. 13 so this calls for new experiments with solid readouts and analysis using comparable experimental conditions.

      Additionally, we analyzed data from Armengol-Collado et al. (2023) which confirm our findings. Our results are now supported by two different modeling approaches and two different experimental settings. Because of redundancy we removed the original experimental data from the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors employed direct RNA sequencing with nanopores, enhanced by 5' end adaptor ligation, to comprehensively interrogate the human transcriptome at single-molecule and nucleotide resolution. They conclude that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy. Contrary to the literature, they found that, unlike typical RNA decay models in normal conditions, stress-induced RNA decay is dependent on XRN1 but does not depend on the removal of the poly(A) tail. The findings presented are interesting but a substantial amount of work is needed to fully establish these paradigm-shifting findings.

      Strengths:

      These are paradigm-shifting observations using cutting-edge technologies.

      Weaknesses:

      The conclusions do not appear to be fully supported by the data presented.

      Our response to the reviewer comments is provided at the end of this document in the section "Recommendations For The Authors"

      Reviewer #2 (Public Review):

      In the manuscript "Full-length direct RNA sequencing uncovers stress-granule dependent RNA decay upon cellular stress", Dar, Malla, and colleagues use direct RNA sequencing on nanopores to characterize the transcriptome after arsenite and oxidative stress. They observe a population of transcripts that are shortened during stress. The authors hypothesize that this shortening is mediated by the 5'-3' exonuclease XRN1, as XRN1 knockdown results in longer transcripts. Interestingly, the authors do not observe a polyA-tail shortening, which is typically thought to precede decapping and XRN1-mediated transcript decay. Finally, the authors use G3BP1 knockout cells to demonstrate that stress granule formation is required for the observed transcript shortening.

      The manuscript contains intriguing findings of interest to the mRNA decay community. That said, it appears that the authors at times overinterpret the data they get from a handful of direct RNA sequencing experiments. To bolster some of the statements additional experiments might be desirable.

      A selection of comments:

      (1) Considering that the authors compare the effects of stress, stress granule formation, and XRN1 loss on transcriptome profiles, it would be desirable to use a single-cell system (and validated in a few more). Most of the direct RNAseq is performed in HeLa cells, but the experiments showing that stress granule formation is required come from U2OS cells, while short RNAseq data showing loss of coverage on mRNA 5'ends is reanalyzed from HEK293 cells. It may be plausible that the same pathways operate in all those cells, but it is not rigorously demonstrated.

      We agree with the reviewer that performing all experiments in a single cell system would be desirable. Presently, our core findings on 5’ RNA shortening all come from HeLa cells: the identification of 5’ RNA shortening, the dependence of shortening on XRN1 (shown by XRN1 silencing), the suppression of shortening by translation inhibition, and now the relationship between 5’ shortening and deadenylation/decapping through experiments described further below. Our use of other cell lines is primarily to show that 5’ shortening is a general phenomenon, and we have now done this for U2OS cells, HEK293 cells, and primary 3T3 cells from mouse.

      Regarding stress granule formation, we are unfortunately restricted by the lack of available well-characterized resources. The ΔΔG3BP1/2 U2OS line is a well-characterized cell line that has been extensively used for stress granule-related experiments. We have therefore opted to use it and performed experiments to verify both the occurrence of stress-induced RNA shortening and its rescue in the absence of stress granules. The reproducibility and breadth of the cell lines used in our analysis make us confident in the generality of our findings.

      (2) An interesting finding of the manuscript is that polyA tail shortening is not observed prior to transcript shortening. The authors would need to demonstrate that their approach is capable of detecting shortened polyA tails. Using polyA purified RNA to look at the status of polyA tail length may not be ideal (as avidity to oligodT beads may increase with polyA tail length and therefore the authors bias themselves to longer tails anyway). At the very least, the use of positive controls would be desirable; e.g. knockdown of CCR4/NOT.

      We thank the reviewer for their comment. Previous studies, using in vitro transcribed RNA molecules, have shown that direct RNA sequencing can capture and quantify poly(A) tails of varying lengths (Krause et al. 2019). Specifically, a range of 10 to 150 nt has been tested and a high concordance between known and dRNA-Seq determined values was observed. Both tailfindR and nanopolish (used in this work) showed high poly(A) tail estimation accuracy.

      Regardless, we agree with the reviewer that our method depends on poly(A) tail capture and thus may be incomplete for fully quantifying poly(A) length changes. We therefore opted to replace these data and instead follow this and other reviewers’ suggestions and perform experiments following knockdown of CCR4/NOT using cells expressing a catalytically inactive CNOT8 (CNOT8*) dominant negative mutant (Chang et al. 2019). Our new data show that stress-induced 5’ end decay is indeed not dependent on prior removal of the poly(A) tail. Specifically, we find that transcript shortening is still observed upon oxidative stress in cells expressing CNOT8* compared to control cells. We present these new results in Fig. 3 and Sup. Fig 3. 

      (3) The authors use a strategy of ligating an adapter to 5' phosphorylated RNA (presumably the breakdown fragments) to be able to distinguish true mRNA fragments from artifacts of abortive nanopore sequencing. This is a fantastic approach to curating a clean dataset. Unfortunately, the authors don't appear to go through with discarding fragments that are not adapter-ligated (presumably to increase the depth of analysis; they do offer Figure 1e that shows similar changes in transcript length for fragments with adapter, compared to Figure 1d). It would be good to know how many reads in total had the adapter. Furthermore, it would be good to know what percentage of reads without adapters are products of abortive sequencing. What percentage of reads had 5'OH ends (could be answered by ligating a different adapter to kinase-treated transcripts). More read curation would also be desirable when building the metagene analysis - why do the authors include every 3'end of sequenced reads (their RNA purification scheme requires a polyA tail, so non-polyadenylated fragments are recovered in a non-quantitative manner and should be discarded).

      We thank the reviewer for appreciating our approach. The reviewer is correct that we do not discard reads that are not adapter-ligated. As the reviewer correctly mentions, this is to increase the sequencing depth. We have found that the ligation efficiency is very low, ~1-2 % of total reads (now in Sup. Table 1), across all libraries, and so the percentage of REL5-ligated reads does not directly indicate the total amount of non-artifactual 5’ ends. Instead, we use these REL5-ligated reads as a subset of our data for which we have extremely high confidence in the true 5’ end. Our results show that non-ligated reads display the same length distribution as ligated ones, and that the results are reproducible regardless of read selection (e.g. Fig. 1c, e, Sup. Fig. 1k, l, Fig. 3b, c). This strong concordance between REL5-ligated and non-ligated reads suggests that our conclusions on 5’ end shortening are not substantially influenced by abortive sequencing or other artefactual creation of 5’ shortening. We have modified the text to clarify these points and have added plots using only ligated molecules for the relevant figures where this was not previously done (Sup. Fig. 1l, 3c).

      We agree with the reviewer that non-polyadenylated reads could be discarded from metagene analysis and we have performed this change in the revised version. Our conclusions following removal of non-polyadenylated reads remain unchanged (Sup. Fig. 1g).

      (4) The authors should come to a clear conclusion about what "transcript shortening" means. Is it exonucleolytic shortening from the 5'end? They cannot say much about the 3'ends anyway (see above). Or are we talking about endonucleolytic cuts leaving 5'P that then can be attached by XRN1 (again, what is the ratio of 5'P and 5'OH fragments; also, what is the ratio of shortened to full-length RNA)?

      We thank the reviewer for their suggestion. We have performed additional experiments to investigate the role of deadenylation and decapping by expressing dominant negative forms of the NOT8 deadenylase (NOT8*) and DCP2 decapping (DCP2*) enzyme in HeLa cells. Our results show that neither expression of NOT8* nor DCP2* can inhibit stress-induced transcript shortening following arsenite treatment (Fig. 3e-f). These new data suggest that neither deadenylation nor decapping is required for stress-induced RNA decay. Instead, our data are more compatible with endonucleolytic cleavage as the most likely mechanism for stress-induced RNA decay. We have incorporated these results in the text and present them in Fig. 3 and Sup. Fig. 3.

      (5) The authors should clearly explain how they think the transcript shortening comes about. They claim it does not need polyA shortening, but then do not explain where the XRN1 substrate comes from. Does their effect require decapping? Or endonucleolytic attacks?

      Please also refer to our answer to the previous comment (#4). Collectively, our results from a) the dominant negative expression of NOT8* and DCP2* that show no effect on stress-induced shortening and b) the rescue of transcript length upon translation initiation inhibition, indicate a potential endonucleolytic mechanism as a mediator of stress-induced RNA decay. However, we believe that extensive, further studies currently beyond the scope of this work, will be required to discover the nuclease and to dissect the exact molecular mechanisms that define the 5' ends of mRNAs upon stress-induced decay. We now discuss these points in the discussion.

      (6) XRN1 KD results in lengthened transcripts. That is not surprising as XRN1 is an exonuclease - and XRN1 does not merely rescue arsenite stress-mediated transcript shortening, but results in a dramatic transcript lengthening.

      The reviewer raises an intriguing point. Additional analysis of the data has shown that, in unstressed cells, XRN1 KD in fact leads to a modestly significant reduction in overall transcript length (Fig. 3b, c). This could possibly be the result of an accumulation of intermediate cleavage products normally expected to be degraded by XRN1, as previously described (Pelechano, Wei, and Steinmetz 2015; Ibrahim et al. 2018).

      Instead, we find that under stress, XRN1 KD shows an almost identical transcript length distribution to unstressed cells and significantly higher than siCTRL stressed cells (Fig. 3b, c). These results indicate that in the absence of XRN1, stress-induced decay is largely abolished. As the reviewer correctly points out, this seems to affect the majority of RNAs which we believe is evidence of the general lack of specificity in the mechanism. Nevertheless, we find that transcripts that are the primary substrates to stress-induced shortening are substantially more lengthened than all other transcripts (Fig. 3e). This indicates that transcripts primarily affected by stress-induced decay are also lengthened the most in the absence of XRN1 and at an even higher level than expected by general XRN1 KD effects.

      Reviewer #3 (Public Review):

      The work by Dar et al. examines RNA metabolism under cellular stress, focusing on stress-granule-dependent RNA decay. It employs direct RNA sequencing with a Nanopore-based method, revealing that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy but is independent of the shortening of the poly(A) tail. This decay, however, is dependent on XRN1 and enriched in the stress granule transcriptome. Notably, inhibiting stress granule formation in G3BP1/2-null cells restores the RNA length to the same level as wild-type. It suppresses stress-induced decay, identifying RNA decay as a critical determinant of RNA metabolism during cellular stress and highlighting its dependence on stress-granule formation.

      This is an exciting and novel discovery. I am not an expert in sequencing technologies or sequencing data analysis, so I will limit my comments purely to biology and not technical points. The PI is a leader in applying innovative sequencing methods to studying mRNA decay.

      One aspect that appeared overlooked is that poly(A) tail shortening per se does lead to decapping. It is shortening below a certain threshold of 8-10 As that triggers decapping. Therefore, I found the conclusion that poly(A) tail shortening is not required for stress-induced decay to be somewhat premature. For a robust test of this hypothesis, the authors should consider performing their analysis in conditions where CNOT7/8 is knocked down with siRNA.

      We agree with the reviewer. We have now performed experiments in cells expressing a well-characterized, catalytically inactive dominant negative NOT8 isoform (NOT8*) (Chang et al. 2019). Our new data show that stress-induced decay still occurs in cells expressing NOT8*. These results confirm our findings that stress-induced decay does not require deadenylation. We present these new results in Fig. 3 and Sup. Fig. 3.

      Similarly, as XRN1 requires decapping to take place, it necessitates the experiment where a dominant-negative DCP2 mutant is over-expressed.

      We agree with the reviewer and have performed this experiment as requested. Expression of a dominant negative DCP2 (DCP2*) isoform (Loh, Jonas, and Izaurralde 2013) in HeLa cells showed that decapping is also not required for stress-induced decay. We present these new results in Fig. 3 and Sup. Fig. 3.

      Are G3BP1/2 stress granules required for stress-induced decay or simply sites for storage? This part seems unclear. A very worthwhile test here would be to assess in XRN1-null background.

      We thank the reviewer for their comment. Our data show that stress-induced decay is not observed in ΔΔG3BP1/2 U2OS cells, which are unable to form stress granules (Fig. 6). This result suggests that G3BP1/2 SGs either a) are required for 5’ RNA shortening or b) preserve partially fragmented RNAs that would otherwise be rapidly degraded. We find the second option unlikely for two reasons. First, even if the fragments were rapidly degraded, we would still expect to find evidence of their presence in our data. However, Fig. 6f shows that the length distributions of ΔΔG3BP1/2 U2OS cells, with and without arsenite, are almost identical, thus arguing against the presence of such a pool of rapidly degrading RNAs. Second, if these RNAs were protected by SGs, then they would be expected to be downregulated in the absence of SGs in ΔΔG3BP1/2 U2OS cells treated with arsenite. Our results contradict this hypothesis, as no association is found between the level of downregulation in arsenite-treated ΔΔG3BP1/2 U2OS cells and the observed stress-induced fragmentation in WT. Collectively, our results point towards G3BP1/2 stress granules being required for stress-induced decay. We have expanded on these points in the manuscript to clarify.

      Finally, the authors speculate that the mechanism of stress-induced decay may have evolved to relieve translational load during stress. But why degrade the 5' end when removing the cap may be sufficient? This returns to the question of assessing the role of decapping in this mechanism.

      The reviewer raises a very interesting point. Our new results, following expression of dominant negative DCP2, show that stress-induced decay does not require decapping. It is therefore plausible that a stress-induced co-translational mechanism cleaves mRNAs endonucleolytically to reduce the translational load. Such a mechanism would have several functional benefits: it would acutely reduce the translational load, degrade non-essential RNAs, preserve energy, and release ribosomes for translation of the stress response program. We have expanded the discussion to mention these points.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      As you can see from the comments, although the reviewers appreciate the novelty of your findings, there was a consensus opinion from all reviewers that the authors overinterpreted their data, since they only have one assay and did not fully analyze it, as laid out in one of the reviewer's critiques. Some orthogonal validation of the "groundbreaking" claims is necessary. Examination of the effects of upstream events in 5'-to-3' decay, namely deadenylation, and decapping, would be necessary for a better understanding of the phenomena the authors describe. Many tools and approaches for studying this are described well in the literature (CNOT7-KD, dominant negative DCP2 E148Q, XRN1-null cell lines), so it is well within the authors' reach. Overall, while some of the evidence presented is novel and solid, for some of the claims there is only incomplete evidence.

      We thank the reviewers and the editor for their comments and suggestions. We have performed several additional experiments to further support our conclusions. We have notably investigated the role of deadenylation and decapping in the stress-induced decay by expressing dominant negative NOT8 and DCP2, respectively, as suggested. Our results show that neither deadenylation nor decapping is necessary for stress-induced transcript shortening, suggesting an endonucleolytic event. We believe that these additional experiments strengthen the main conclusions of our work. 

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The experiments were conducted in two unrelated cell lines, HeLa and U2OS. The authors should determine if the 5'end RNA decay in response to stress is also observed in normal human cells such as normal human diploid fibroblasts. Furthermore, it would be important to know if this mechanism is conserved between human and mouse cells. This can be tested in mouse embryonic fibroblasts.

      We thank the reviewer for their suggestion. We have now also performed experiments in the mouse embryonic fibroblast NIH 3T3 cell line. Our new results confirm that stress-induced 5’ end RNA decay is also observed in this primary cell line and is conserved between human and mouse (Sup. Fig. 1k, l).

      (2) The authors state that they monitored cell viability up to 24 hours after Arsenite treatment, but the data is shown up to 240 min (Suppl. 1a). Also, the Y-axis label of this Figure is "Active cells (%)". This should be changed to "Live cells (%)" if this is what they are referring to.

      We thank the reviewer for identifying this mistake. Cell viability was monitored up to 4 hours after arsenite treatment. We have corrected the text and modified the figure according to the reviewer’s suggestion.

      (3) Based on direct Nanopore-based RNA-seq the authors surprisingly found that RNAs in oxidative stress were globally shorter than unstressed cells. Since Nanopore-based RNA-seq will not detect RNAs that lack a poly A-tail, are they not missing out on RNAs that have already started getting degraded due to the loss of a poly A-tail? Also, I am not sure if they used a spike-in control which would be critical to claim global changes in RNA expression.

      We agree with the reviewer that our strategy does not capture RNA molecules without a poly(A) tail. Nevertheless, our data do identify shortening upon stress at the 5’ end of RNAs that include poly(A) tails. We considered this as direct evidence that decay at the 5’ end does not require prior removal of the poly(A) tail. Otherwise, these molecules would not have been captured and observed. Indeed, our newly added data from cells expressing a well characterized catalytically inactive dominant negative NOT8 isoform (Chang et al. 2019) show that stress-induced decay occurs even upon silencing of the CCR4-NOT deadenylation complex. We present these results in Fig. 3 and Sup. Fig 3.

      We would like to clarify that in our results we did not use a spike-in control and thus refrain from claiming global changes in RNA expression. Instead, we compare relative ratios of groups of molecules within libraries that are internally normalized, we perform correlative comparisons that are invariant to normalization and we perform differential gene expression using established normalization schemes such as DESeq2 (Love, Huber, and Anders 2014). 

      (4) Many graphs are confusing and inconsistent. For example, samples for Nanopore RNA-seq were prepared in triplicates. Biological or technical? The schematic in Figure 1a shows ISRIB but it appears from Figure 4 onwards. It is missing in the Figure 1 results and the Figure legend. The X-axis labels of many graphs are confusing. For example, Supplementary Figure 1d, 1e, 1g and 1h. It says transcript length but are these nucleotides? P-values are missing from many of these graphs. For some graphs, the authors compared Unstressed vs Arsenite (Figure 1), but in other panels they state No Ars vs 0.5 mM Ars (Fig. 3a) or Control vs Ars (Figure 5c). Likewise, in Figure 1b, Expression change (log2) is unstressed vs Arsenite or Arsenite vs unstressed?

      We thank the reviewer for identifying these inconsistencies in the presentation of our results. The replicates for nanopore RNA-seq experiments were biological. We have now clarified this point in the text. Furthermore, we have removed “ISRIB” from Fig. 1a to avoid any confusion. We have also made our labelling more consistent across all figures, using “unstressed” for no arsenite treatment and “arsenite” or “+ Ars” for arsenite treatment.

      (5) The authors transfected cells with siCTRL or siXRN1 using electroporation and treated the cells 72 hours after transfection. Since XRN1 is an essential gene, it would be important to determine the viability of cells 72 hours after transfection. Along these lines, in Figure 3b, it would be important to determine the effect of XRN1 knockdown in unstressed cells. Currently, there are only 3 comparisons in Figure 3b - unstressed, siCTRL + Ars and siXRN1 + Ars, and this is insufficient to conclude the effects of XRN1 knockdown in the presence of Arsenite.

      We thank the reviewer for their suggestion. We have updated Fig. 3b and the text to show the requested conditions: siCTRL and siXRN1 with and without arsenite. While XRN2 is an essential gene for many organisms, XRN1 is not essential in mammalian cells and no increased cell death has been reported for XRN1-KO or -KD cells (Brothers et al. 2023). We have also tested different concentrations of siRNA (up to 40 nM) and monitored the cells up to five days after transfection without observing any cell toxicity, as previously reported.

      (6) More broadly, the whole study is somewhat descriptive. The biological effect of 5'end mRNA shortening on gene expression is unclear. There is no data indicating how these changes in RNA lengths impact protein expression. Global quantitative proteomics would be critical to determine this.

      We thank the reviewer for their suggestion. To address this concern we have performed additional experiments using cells expressing catalytically inactive forms of NOT8 (Chang et al. 2019) and DCP2 (Loh, Jonas, and Izaurralde 2013) to inhibit deadenylation and decapping.

      These experiments provide additional mechanistic details for 5’ shortening and suggest endonucleolytic cleavage as a critical step (Fig. 3 and Sup. Fig. 3). We agree that it would be interesting to study the fate of these shortened transcripts notably regarding translation. However, given the complexity of the expected proteome changes also following global translation arrest under stress (Harding et al., 2003; Pakos-Zebrucka et al., 2016), we think that this work is beyond the scope of this manuscript and will be the subject of future studies. 

      Minor comments:

      (1) Some of the affected RNAs can be validated in HeLa and other cell lines.

      We thank the reviewer for their suggestion. We have performed RT-qPCR on 3 different mRNAs that present 5’ shortening upon oxidative stress, using primer sets located at different positions along each mRNA. We hypothesized that the closer a primer set is to the 5’ end, the less abundant the corresponding region would be in arsenite-treated compared to untreated cells. Our results indeed show that the measured level of these mRNAs depends on the location of the primer set used for the qPCR: the closer it is to the 5’ end, the less abundant the mRNA is upon oxidative stress compared to control cells. We present these data, as well as a schematic representing the positions of the primers, in Sup. Fig. 2d.

      (2) The authors should check whether XRN1 also co-localizes in SGs.

      We thank the reviewer for their suggestion. We have performed immunofluorescence on U2OS and HeLa cells upon oxidative stress and did not observe co-localization of XRN1 with TIA-1, a marker of stress granules (see below). These results are consistent with Kedersha et al. (2005), who showed that XRN1 mainly co-localizes to processing bodies and is only very weakly detectable in SGs in DU145 cells. We think that this result is beyond the scope of this study and thus decided to only include it for the reviewers.

      Author response image 1.

      Representative immunofluorescence merged image of HeLa (left panel) and U2OS (right panel) cells treated with sodium arsenite and labelled with anti-TIA1 (red), anti-XRN1 (green) antibodies and DAPI (blue). Scale bar 50 µm.

      (3) XRN1 should be knocked down with more than one siRNA.

      We thank the reviewer for this suggestion. Our results show that our XRN1 KD specifically rescues the length of the most shortened mRNAs (Fig. 3e). This is a highly specific effect that makes us confident it is not mediated by non-specific siRNA binding; thus, we do not consider it necessary to repeat the experiment.

      (4) There are typos in the text regarding Figure 6d, e, and f. Also, Supplementary Figure 4a.

      We thank the reviewer for identifying these mistakes. We have corrected the typos. 

      Reviewer #3 (Recommendations For The Authors):

      The authors should consider testing their hypotheses by arresting the decay pathway using the approaches I mentioned previously. As it stands, some conclusions are somewhat speculative.

      We have replied to the reviewer comments in the public review section. 

      References:

      • Brothers, William R., Farah Ali, Sam Kajjo, and Marc R. Fabian. 2023. “The EDC4-XRN1 Interaction Controls P-Body Dynamics to Link MRNA Decapping with Decay.” The EMBO Journal, August, e113933.

      • Chang, Chung-Te, Sowndarya Muthukumar, Ramona Weber, Yevgen Levdansky, Ying Chen, Dipankar Bhandari, Catia Igreja, Lara Wohlbold, Eugene Valkov, and Elisa Izaurralde. 2019. “A Low-Complexity Region in Human XRN1 Directly Recruits Deadenylation and Decapping Factors in 5’-3’ Messenger RNA Decay.” Nucleic Acids Research 47 (17): 9282–95.

      • Harding, Heather P., Yuhong Zhang, Huiquing Zeng, Isabel Novoa, Phoebe D. Lu, Marcella Calfon, Navid Sadri, et al. 2003. “An Integrated Stress Response Regulates Amino Acid Metabolism and Resistance to Oxidative Stress.” Molecular Cell 11 (3): 619–33.

      • Ibrahim, Fadia, Manolis Maragkakis, Panagiotis Alexiou, and Zissimos Mourelatos. 2018. “Ribothrypsis, a Novel Process of Canonical MRNA Decay, Mediates Ribosome-Phased MRNA Endonucleolysis.” Nature Structural & Molecular Biology 25 (4): 302–10.

      • Kedersha, Nancy, Georg Stoecklin, Maranatha Ayodele, Patrick Yacono, Jens Lykke-Andersen, Marvin J. Fritzler, Donalyn Scheuner, Randal J. Kaufman, David E. Golan, and Paul Anderson. 2005. “Stress Granules and Processing Bodies Are Dynamically Linked Sites of MRNP Remodeling.” The Journal of Cell Biology 169 (6): 871–84.

      • Krause, Maximilian, Adnan M. Niazi, Kornel Labun, Yamila N. Torres Cleuren, Florian S. Müller, and Eivind Valen. 2019. “Tailfindr: Alignment-Free Poly(A) Length Measurement for Oxford Nanopore RNA and DNA Sequencing.” RNA 25 (10): 1229–41.

      • Loh, Belinda, Stefanie Jonas, and Elisa Izaurralde. 2013. “The SMG5-SMG7 Heterodimer Directly Recruits the CCR4-NOT Deadenylase Complex to MRNAs Containing Nonsense Codons via Interaction with POP2.” Genes & Development 27 (19): 2125–38.

      • Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550.

      • Pakos-Zebrucka, Karolina, Izabela Koryga, Katarzyna Mnich, Mila Ljujic, Afshin Samali, and Adrienne M. Gorman. 2016. “The Integrated Stress Response.” EMBO Reports 17 (10): 1374–95.

      • Pelechano, Vicent, Wu Wei, and Lars M. Steinmetz. 2015. “Widespread Co-Translational RNA Decay Reveals Ribosome Dynamics.” Cell 161 (6): 1400–1412.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      We are grateful for the reviewer’s thoughtful comments.

      Weaknesses:

      I only have one potential worry. The analysis for gait tracking (1 Hz) in Experiment 2 (Figures 3a/b) starts by computing a congruency effect (A/V stimulation congruent (same frequency) versus A/V incongruent (V at 1 Hz, A at either 0.6 or 1.4 Hz), separately for the Upright and Inverted conditions. Then, this congruency effect is contrasted between Upright and Inverted, in essence computing an interaction score (Congruent/Incongruent X Upright/Inverted). Then, the channels in which this interaction score is significant (by cluster-based permutation test; Figure 3a) are subselected for further analysis. This further analysis is shown in Figure 3b and described in lines 195-202. Critically, the further analysis exactly mirrors the selection criteria, i.e. it is aimed at testing the effect of Congruent/Incongruent and Upright/Inverted. This is colloquially known as "double dipping", the same contrast is used for selection (of channels, in this case) as for later statistical testing. This should be avoided, since in this case even random noise might result in a significant effect. To strengthen the evidence, either the authors could use a selection contrast that is orthogonal to the subsequent statistical test, or they could skip either the preselection step or the subsequent test. (It could be argued that the test in Figure 3b and related text is not needed to make the point - that same point is already made by the cluster-based permutation test.)

      Thanks for the helpful suggestions. In Experiment 2, to investigate whether the multisensory integration effect was specialized for biological motion perception, we contrasted the congruency effect between the upright and inverted conditions to search for clusters showing a significant interaction effect. We performed further analyses based on neural responses from this cluster to examine whether the congruency effect was significant in the upright and the inverted conditions, respectively, following the logic of post hoc comparisons after identifying an interaction effect. However, we agree with the reviewer that comparing the congruency effects between the upright and inverted conditions again based on data from this cluster was redundant and resulted in double-dipping. Therefore, we have removed this comparison from the main text and optimized the presentation of our results in the revised Fig. 3.
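
      The circularity concern can be illustrated with a small simulation on pure noise. The sketch below is not the analysis pipeline of the study: it replaces the cluster-based permutation test with a simple "pick the k channels with the largest t value" rule, and the channel count, k, and number of simulations are arbitrary assumptions. It compares the false-positive rate when channels are selected on the same data that are later tested against the rate when selection uses independent data.

```python
import random

T_CRIT_DF23 = 2.069  # two-tailed 5% critical value of Student's t with 23 df


def t_stat(values):
    """One-sample t statistic against zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / (var / n) ** 0.5


def noise(rng, n_subj, n_chan):
    """Pure-noise 'interaction scores', indexed as data[subject][channel]."""
    return [[rng.gauss(0, 1) for _ in range(n_chan)] for _ in range(n_subj)]


def false_positive_rate(circular, n_sims=200, n_subj=24, n_chan=32, k=5, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        test = noise(rng, n_subj, n_chan)
        # Circular: select channels on the tested data itself.
        # Non-circular: select on an independent noise data set.
        select = test if circular else noise(rng, n_subj, n_chan)
        chan_t = [t_stat([row[c] for row in select]) for c in range(n_chan)]
        best = sorted(range(n_chan), key=lambda c: chan_t[c], reverse=True)[:k]
        # Average the selected channels per subject, then test against zero.
        roi = [sum(row[c] for c in best) / k for row in test]
        if abs(t_stat(roi)) > T_CRIT_DF23:
            hits += 1
    return hits / n_sims


print("circular selection:   ", false_positive_rate(circular=True))
print("independent selection:", false_positive_rate(circular=False))
```

      With no true effect in the data, the circular variant should report "significant" results far more often than the nominal 5%, while the independent variant should stay close to it.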

      Related to the above: the test for the three-way interaction (lines 211-216) is reported as "marginally significant", with a p-value of 0.087. This is not very strong evidence.

      As shown in Fig. 3b & e, the magnitude of the amplitude differs between the gait-cycle frequency (mean = 0.008, SD = 0.038) and the step-cycle frequency (mean = 0.052, SD = 0.056), which might influence the statistical results of the interaction effect. To reduce such influence, we converted the amplitude data into Z-scores separately for each frequency condition. The repeated-measures ANOVA on these normalized amplitude data revealed a significant three-way interaction (F(1,23) = 7.501, p = 0.012, partial η² = 0.246). We have updated the results in the revised manuscript (lines 218-225).
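
      The normalization step can be sketched as follows. This is a minimal illustration with made-up amplitude values, assuming that all values belonging to one frequency condition are pooled and standardized with the sample SD; whether the original analysis used the sample or the population SD is not stated, and the names `amplitudes` and `zscore` are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize values to mean 0 and SD 1 (sample SD, n - 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical amplitudes, pooled over participants and conditions
amplitudes = {
    "gait_cycle": [0.010, -0.030, 0.050, 0.002],
    "step_cycle": [0.060, 0.110, -0.004, 0.042],
}

# Z-score each frequency condition separately so that both are on the same scale
normalized = {freq: zscore(vals) for freq, vals in amplitudes.items()}
```

      After this step, both frequency conditions have the same mean and spread, so a difference in raw magnitude between them no longer contributes to the interaction term.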

      Reviewer #1 (Recommendations For The Authors):

      -  Which variable caused one data point to be classified as outlier? (line 221).

      The outlier is a participant whose audiovisual congruency effect (Upright – Inverted) in neural responses at the frequency of interest exceeds 3 SD from the group mean. It is marked by a red diamond in Author response 2. Before removing the data, the correlation between the AQ score and the congruency effect is r = -0.396, p = 0.055. For comparison, the results after removing the outlier are shown in Fig. 3c of the revised manuscript. We have added more information about the variable causing the outlier in the revised manuscript (lines 231-232).

      Author response image 1.

      The correlation between AQ score and congruency effect
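
      The 3 SD criterion and the correlation can be written out as below. This is a sketch on invented numbers, not the study data; `flag_outliers` and `pearson_r` are hypothetical helper names, and the sample SD (n - 1) is an assumption.

```python
def flag_outliers(values, n_sd=3.0):
    """Return indices of values more than n_sd sample SDs from the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sd * sd]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Invented congruency effects: 23 ordinary values and one extreme value
effects = [0.1 * i for i in range(-11, 12)] + [8.0]
outliers = flag_outliers(effects)
kept = [v for i, v in enumerate(effects) if i not in outliers]
```

      Note that with small samples a single value cannot exceed 3 SD at all (the maximum possible z-score is (n - 1)/√n), so the criterion is only meaningful for samples of roughly the size used here.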

      -  The authors cite Maris & Oostenveld (2007) in line 415 as the main reference for the FieldTrip toolbox, but the correct reference here is different, see https://www.fieldtriptoolbox.org/faq/how_should_i_refer_to_fieldtrip_in_my_publication/

      Thank you for pointing out this issue. Citation corrected.

      -  The authors could consider giving some more background on the additive vs superadditive distinction in the Introduction, which may increase the impact; as it stands the reader might not know why this is particularly interesting. Summarize some of the takeaways of the Stevenson et al. (2014) review in this respect.

      Thanks for the suggestion and we have added the following relevant information in the Introduction (lines 80-90):

      “Moreover, we adopted an additive model to classify multisensory integration based on the AV vs A+V comparison. This model assumes independence between inputs from each sensory modality and distinguishes among sub-additive (AV < A+V), additive (AV = A+V), and super-additive (AV > A+V) response modes (see a review by Stevenson et al., 2014). The additive mode represents a linear combination between two modalities. In contrast, the super-additive and sub-additive modes indicate non-linear interaction processing, either with potentiated neural activation to facilitate the perception or detection of near-threshold signals (super-additive) or a deactivation mechanism to minimize the processing of redundant information cross-modally (sub-additive) (Laurienti et al., 2005; Metzger et al., 2020; Stanford et al., 2005; Wright et al., 2003).”
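
      The three response modes of the additive model reduce to the sign of AV − (A + V). The sketch below makes that explicit; the function name and the `tol` parameter are hypothetical, and in a real analysis the decision would rest on a statistical test across participants rather than on a fixed tolerance.

```python
def integration_mode(av, a, v, tol=0.0):
    """Classify a multisensory response by comparing AV with the sum A + V.

    `tol` is an illustrative stand-in for a statistical criterion: differences
    within +/- tol are treated as additive.
    """
    diff = av - (a + v)
    if diff > tol:
        return "super-additive"
    if diff < -tol:
        return "sub-additive"
    return "additive"


# Made-up amplitudes for the audiovisual, auditory-only and visual-only conditions
print(integration_mode(av=0.9, a=0.3, v=0.4))            # AV exceeds A + V
print(integration_mode(av=0.5, a=0.3, v=0.4))            # AV falls short of A + V
print(integration_mode(av=0.72, a=0.3, v=0.4, tol=0.05))  # within tolerance
```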

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      We thank the reviewer for the valuable feedback.

      Weaknesses:

      The manuscript interprets the neural findings using mechanistic and cognitive claims that are not justified by the presented analyses and results.

      First, entrainment and cortical tracking are both invoked in this manuscript, sometimes interchangeably so, but it is becoming the standard of the field to recognize their separate evidential requirements. Namely, step and gait cycles are striking perceptual or cognitive events that are expected to produce event-related potentials (ERPs). The regular presentation of these events in the paradigm will naturally evoke a series of ERPs that leave a trace in the power spectrum at stimulation rates even if no oscillations are at play. Thus, the findings should not be interpreted from an entrainment framework except if it is contextualized as speculation, or if additional analyses or experiments are carried out to support the assumption that oscillations are present. Even if oscillations are shown to be present, it is then a further question whether the oscillations are causally relevant toward the integration of biological motion and for the orchestration of cognitive processes.

      Second, if only a cortical tracking account is adopted, it is not clear why the demonstration of supra-additivity in spectral amplitude is cognitively or behaviorally relevant. Namely, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated with the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion, and it is possible that such neural correlates relate to behaviorally relevant neural mechanisms, but based on the current task and associated analyses this has not been shown.

      Thanks for raising the important concerns regarding the interpretation of our results within the entrainment or the cortical tracking framework. A strict neural entrainment account emphasizes the alignment of endogenous neural oscillations with external rhythms, rather than a mere regular repetition of stimulus-evoked responses. However, it is challenging to fully dissociate these components, given that rhythmic stimulation can shape intrinsic neural oscillations, resulting in an intricate interplay between endogenous neural oscillations and stimulus-evoked responses (Duecker et al., 2024; Herrmann et al., 2016; Hosseinian et al., 2021). Therefore, some research, including the current study, uses the term “entrainment” to refer to the alignment of brain activity to rhythmic stimulation in a broader context, without isolating the intrinsic oscillations and evoked responses (e.g., Ding et al., 2016; Nozaradan et al., 2012; Obleser & Kayser, 2019). Nevertheless, we agree with the reviewer that since the current results did not examine or provide direct evidence for endogenous oscillations, it is better to contextualize the oscillation view as speculation. Hence, we have replaced most of the expressions about “entrainment” with the more general term “tracking” in the revised manuscript (as well as in the title of the manuscript). We only briefly mention the entrainment account in the Discussion to facilitate comparison with the literature (lines 307-312).

      Regarding the relevance between neural findings and cognition or behavioral performance, the first supporting evidence comes from the inversion effect in Experiment 2. For the neural responses at gait-cycle frequency, we observed a significantly enhanced audiovisual congruency effect in the upright condition compared with the inverted condition. Inversion disrupts the distinctive kinematic features of biological motion (e.g., gravity-compatible ballistic movements) and significantly impairs biological motion processing, but it does not change the basic visual properties of the stimuli, including the rhythmic signals generated by low-level motion cues. Therefore, the inversion effect has long been regarded as an indicator of the specificity of biological motion processing in numerous behavioral and neuroimaging studies (Bardi et al., 2014; Grossman & Blake, 2001; Shen, Lu, Yuan, et al., 2023; Simion et al., 2008; Troje & Westhoff, 2006; Vallortigara & Regolin, 2006; Wang et al., 2014; Wang & Jiang, 2012; Wang et al., 2022). Here, our finding of the cortical tracking of higher-order rhythmic structures (gait cycles) present in the upright but not in the inverted condition suggests that this cortical tracking effect cannot be explained by ERPs evoked by regular onsets of rhythmic events. Rather, it is closely linked with the specialized cognitive processing of biological motion. Furthermore, we found that the BM-specific cortical tracking effect at gait-cycle frequency (rather than the non-selective tracking effect at step-cycle frequency) correlates with observers’ autistic traits, indicating its functional relevance to social cognition. These findings convergingly suggest that the cortical tracking effect that we currently observed engages cognitively relevant neural mechanisms.
In addition, our recent behavioral study showed that listening to frequency-congruent footstep sounds, compared with incongruent sounds, enhanced the visual search for human walkers but not for non-biological motion stimuli containing the same rhythmic signals (Shen, Lu, Wang, et al., 2023). These results suggest that audiovisual correspondence specifically enhances the perceptual and attentional processing of biological motion. Future research could examine whether the cortical tracking of rhythmic structures plays a functional role in this process, which may shed more light on the behavioral relevance of the cortical tracking effect to biological motion perception. We have incorporated the above information into the Discussion (lines 268-293).

      Reviewer #2 (Recommendations For The Authors):

      In Figure 1c, it could be helpful to add the word "static" in the illustration for the auditory condition so that readers understand without reading the subtext that it is a static image without biological motion.

      Suggestion taken.

      In the Discussion, I believe it is important to justify an oscillation and entrainment account, or if it cannot be justified based on the current results and analyses (which is my opinion), it could be helpful to explicitly frame it as speculation.

      We agree with the reviewer. For more clarification, please refer to our response to the public review.

      L335, I did not understand this sentence - a reformulation would be helpful.

      The point-light stimuli were created by capturing the motion of a walking actor (Vanrie & Verfaillie, 2004). The global motion of the walking sequences was eliminated so that the point-light walker appears to walk on a treadmill without translational motion. We have reformulated the sentence as follows: “The point-light walker was presented at the center of the screen without translational motion.”

      The results in Figure 2a and 2d are derived by performing a t-test between the amplitude at the frequency of gait and step cycles and zero. Comparison against amplitude of zero is too liberal; the possibility for a Type-I error is inflated because even EEG data with only noise will not have amplitudes of zero at all frequencies. A better baseline (H0) is either the 1/frequency trend in the power spectrum derived using methods like FOOOF (https://fooof-tools.github.io/fooof/) or by performing non-parametric shuffling based methods (https://doi.org/10.1016/j.jneumeth.2007.03.024).

      In our data analysis, instead of performing the t-test between the raw amplitude and zero, we compared the normalized amplitude at each frequency bin (obtained by subtracting the average amplitude measured at the neighboring frequency bins from the original amplitude data) against zero. Such an analysis is equivalent to contrasting the raw amplitude with that of its neighboring frequency bins, allowing us to test whether the neural response in each frequency bin showed a significant enhancement compared with its neighbors. The multiple comparisons across frequency bins were controlled by false discovery rate (FDR) correction, reducing the Type-I error. Such analysis procedures help reduce (though not totally remove) the influence of the 1/f trend and have been widely used in this field (Cirelli et al., 2016; Henry & Obleser, 2012; Lenc et al., 2018; Nozaradan et al., 2012; Peter et al., 2023).
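
      The two steps described above, neighbor-bin subtraction and FDR control, can be sketched as follows. The number of neighboring bins on each side (`n_side`) is an assumption, since the response does not state it, and the Benjamini-Hochberg procedure is used here as one common form of FDR correction; the exact variant used in the manuscript is not specified.

```python
def subtract_neighbors(amps, n_side=2):
    """Subtract from each bin the mean amplitude of up to n_side bins on each side."""
    out = []
    for i, amp in enumerate(amps):
        neighbors = amps[max(0, i - n_side):i] + amps[i + 1:i + 1 + n_side]
        out.append(amp - sum(neighbors) / len(neighbors))
    return out


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank  # largest rank that satisfies the BH criterion
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Toy spectrum: flat background with a single peak at bin 5
spectrum = [1.0] * 11
spectrum[5] = 3.0
normalized = subtract_neighbors(spectrum)

# Toy p-values, one per tested frequency bin
significant = benjamini_hochberg([0.001, 0.02, 0.04, 0.5])
```

      In the toy spectrum, the peak bin keeps a positive normalized amplitude while its flat surroundings end up at or below zero, which is the property the subsequent test against zero relies on.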

      To further verify our findings, we adopted the reviewer’s suggestion and created a baseline by performing a non-parametric shuffling-based analysis. More specifically, to establish the statistical significance of amplitude peaks, we carried out a surrogate analysis on each condition. For each participant, a single control surrogate dataset was derived from their actual dataset by jittering the onset of each step cycle relative to the original onset by a randomly selected integer value between −490 and 490 ms. This procedure removed the consistent relationship between the EEG signal and the stimuli while preserving each epoch’s general timing within the exposure period. Then, epochs were extracted based on surrogate stimuli onset, and amplitude was computed across frequencies through FFT under a null model of non-entrainment (Moreau et al., 2022). This entire procedure was performed 100 times, producing a surrogate amplitude distribution of 100 group-averaged values for each condition. If the observed amplitude value at the frequency of interest exceeded the value corresponding to the 95th percentile of the surrogate distribution (p < .05) within a given condition (e.g., AV), the amplitude peak was considered significant (Batterink, 2020). As shown in Author response image 2, the statistical results from these analyses are similar to those reported in the manuscript, confirming the significant amplitude peaks at the frequencies of interest.

      Author response image 2.

      Non-parametric analysis for spectral peak. The dotted lines represent the random data based on shuffling analysis. The solid lines represent the observed data in measured EEG signals. All conditions induced significant peaks at step-cycle frequency and its harmonic, while only the AV condition induced a significant peak at gait-cycle frequency.
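
      The logic of the surrogate analysis can be sketched on a single synthetic recording. This is a simplified illustration, not the pipeline used for the figure: it uses one simulated channel instead of group-averaged EEG, a single-frequency DFT instead of a full FFT, and assumed values for the sampling rate, epoch length, and target frequency.

```python
import cmath
import math
import random
import statistics

FS = 100         # sampling rate in Hz (assumed)
TARGET_HZ = 1.0  # frequency of interest in Hz (assumed)
EPOCH_S = 4      # epoch length in seconds (assumed)


def amplitude_at(signal, freq, fs=FS):
    """Amplitude of a real-valued signal at one frequency (single-bin DFT)."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
                for i, x in enumerate(signal))
    return 2 * abs(total) / n


def epoch_average(signal, onsets, fs=FS, epoch_s=EPOCH_S):
    """Average fixed-length epochs starting at the given sample indices."""
    length = epoch_s * fs
    epochs = [signal[o:o + length] for o in onsets]
    return [sum(col) / len(epochs) for col in zip(*epochs)]


def surrogate_test(signal, onsets, n_surrogates=100, max_jitter_ms=490, seed=1):
    """Compare the observed amplitude with the 95th percentile of jittered surrogates."""
    rng = random.Random(seed)
    observed = amplitude_at(epoch_average(signal, onsets), TARGET_HZ)
    null = []
    for _ in range(n_surrogates):
        # Jitter every onset by a random integer number of milliseconds
        jittered = [o + round(rng.randint(-max_jitter_ms, max_jitter_ms) * FS / 1000)
                    for o in onsets]
        null.append(amplitude_at(epoch_average(signal, jittered), TARGET_HZ))
    threshold = statistics.quantiles(null, n=100)[94]  # 95th percentile
    return observed, threshold


# Synthetic 60 s recording: a 1 Hz response locked to the stimulus, plus noise
noise_rng = random.Random(0)
signal = [math.sin(2 * math.pi * TARGET_HZ * i / FS) + noise_rng.gauss(0, 1)
          for i in range(60 * FS)]
onsets = [s * FS for s in range(1, 56)]  # one event per second

observed, threshold = surrogate_test(signal, onsets)
print("peak is significant:", observed > threshold)
```

      Jittering breaks the fixed phase relationship between the onsets and the 1 Hz component, so the component largely cancels in the surrogate averages while it survives in the average built from the true onsets.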

      Reviewer #3 (Public Review):

      Strengths:

      The main strengths of the paper relate to the conceptualization of BM and the way it is operationalized in the experimental design and analyses. The use of entrainment, and the tracking of different, nested aspects of BM result in seemingly clean data that demonstrate the basic pattern. The first experiments essentially provide the basic utility of the methodological innovation and the second experiment further hones in on the relevant interpretation of the findings by the inclusion of better control stimuli sets.

      Another strength of the work is that it includes at a conceptual level two replications.

      We appreciate the reviewer for the comprehensive review and positive comments.

      Weaknesses:

      The statistical analysis is misleading and inadequate at times. The inclusion of the autism trait is not foreshadowed and adequately motivated and is likely underpowered. Finally, a broader discussion over other nested frequencies that might reside in the point-light walker stimuli would also be important to fully interpret the different peaks in the spectra.

      (1) Regarding the nested frequency peaks in the spectra, we did observe multiple significant amplitude peaks at 1f (1/0.83 Hz), 2f (2/1.67 Hz), and 4f (4/3.33 Hz) relative to the gait-cycle frequency (Fig. 2 a&d). To further test the functional roles of the neural activity at different frequencies, we analyzed the audiovisual integration modes at each frequency. Note that we collapsed the data from Experiments 1a & 1b in the analysis as they yielded similar results. Overall, results show a similar additive audiovisual integration mode at 2f and 4f and a super-additive integration mode only at 1f (Figure S1), suggesting that the cortical tracking effects at 2f and 4f may be functionally linked but independent of that at 1f. We have reported the detailed results in the Supplementary Information.

      (2) For the reviewer’s other concerns about statistical analysis and autism traits, please refer to our responses below to the Recommendations for the authors.

      Reviewer #3 (Recommendations For The Authors):

      The description of the analyses performed for experiment 2 comes across as double dipping. Congruency effects for BM and non-BM motion (inverted) were compared using cluster-based statistics. Then identified clusters informed an averaging of signals which then were subjected to a paired comparison. At this point, it is no surprise that these paired comparisons are highly significant seeing that the channels were selected based on a cluster analysis of the same exact contrast. This approach should be avoided.

      In the analysis of the repeated measures ANOVA reporting a trend as marginally significant is misleading. Reporting the statistical results whilst indicating that those do not reach significance is the appropriate way to communicate this finding. Other statistics can be used in order to provide the likelihood of those findings supporting H1 or H0 if the authors would like to state something more precise (Bayesian).

      Thanks for the comments. We have addressed these two points in our response to the public review of Reviewer #1.

      The authors perform a correlation along "autistic trait" scores in an individual differences approach. Individual differences are typically investigated in larger samples (>n=40). In addition, the range of AQ scores seems limited to mostly average or lower-than-average AQs (barring a couple). These points make the conclusions on the possible role of BM in the autistic phenotype very tentative. I would recommend acknowledging this.

      An alternative analysis approach that might better suit the smaller sample size is a comparison between high and low AQ participants, defined based on a median split.

      Many thanks for the suggestion. We agree with the reviewer that the sample size (n = 24) in the current study is not large for exploring the correlation between BM and autistic traits. The narrow range of AQ scores was due to the fact that all participants were non-clinical populations and we did not pre-select participants by AQ scores. To further confirm our findings, we adopted your suggestion to compare the BM-specific cortical tracking effect (i.e., audiovisual congruency effect (Upright - Inverted)) between high and low AQ participants split by the median AQ score (20) of this sample. Similar to correlation analysis, one outlier, whose audiovisual congruency effect (Upright – Inverted) in neural responses at 1 Hz exceeds 3 SD from the group mean, was removed from the following analysis. As shown in Figure S3, at 1 Hz, participants with low AQ showed a greater cortical tracking effect compared with high AQ participants (t (21) = 2.127, p = 0.045). At 2 Hz, low and high AQ participants showed comparable neural responses (t (22) = 0.946, p = 0.354). These results are in line with the correlation analysis, providing further support to the functional relevance between social cognition and cortical tracking of biological motion as well as its dissociation at the two temporal scales. We have added these results to the main text (lines 238-244) and the supplementary information.
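
      The median-split comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the records are invented numbers, the function names are hypothetical, ties at the median are assigned to the low-AQ group (the response does not say how ties were handled), and a pooled-variance Student's t-test is assumed because the reported degrees of freedom equal the number of participants minus two. The p-value, which would be read from a t distribution with the returned degrees of freedom, is not computed here.

```python
import math
import statistics


def exclude_outliers(records, n_sd=3.0):
    """Drop (aq, effect) records whose effect is more than n_sd SDs from the mean."""
    effects = [e for _, e in records]
    mean, sd = statistics.mean(effects), statistics.stdev(effects)
    return [(aq, e) for aq, e in records if abs(e - mean) <= n_sd * sd]


def median_split(records):
    """Split effects into low-AQ and high-AQ groups at the sample median AQ."""
    cut = statistics.median(aq for aq, _ in records)
    low = [e for aq, e in records if aq <= cut]   # tie handling is an assumption
    high = [e for aq, e in records if aq > cut]
    return low, high


def pooled_t(a, b):
    """Student's independent-samples t statistic and its degrees of freedom."""
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * statistics.variance(a)
                  + (len(b) - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


# Invented (AQ score, congruency effect) pairs, for illustration only.
records = [(12, 0.9), (15, 1.1), (17, 0.8), (19, 1.0),
           (21, 0.4), (24, 0.5), (26, 0.3), (30, 0.6)]

low, high = median_split(exclude_outliers(records))
t_stat, df = pooled_t(low, high)
print(f"t({df}) = {t_stat:.3f}")
```

      Excluding outliers before the split mirrors the order given in the response, so that the median is computed on the same participants who enter the comparison.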

      Writing

      The narrative could be better unfolded and studies better motivated. The transition from basic science research on BM to possibly delineating a mechanistic understanding of autism was a surprise at the end of the intro. Once the authors consider the suggestions and comments above it would be good to have this detail and motivation more obviously foreshadowed in the text.

      Thank you for the suggestion. We have added an introduction on how audiovisual BM processing is linked to social cognition and ASD in the first paragraph of the revised manuscript (lines 46-56). In particular, integrating multisensory BM cues is foundational for perceiving and attending to other people and developing further social interaction. However, such ability is usually compromised in people with social deficits, such as individuals with autism spectrum disorder (ASD) (Feldman et al., 2018), and even in non-clinical populations with high autistic traits (Ujiie et al., 2015). These behavioral findings underline the close relationship between multisensory BM processing and one’s social cognitive capability, motivating us to further explore this issue at the neural level in the current study. We have also modified the relevant content in the last paragraph of the Introduction (lines 100-108), briefly mentioning the methods that we used to investigate this issue.

      The use of terminology related to neural oscillations which are entraining to the BM seems to suggest that the rhythmic tracking inevitably stems from the shaping of existing intrinsic dynamics of the brain. I am not sure this is necessarily the case. I would therefore adopt a more concrete jargon for the description of the entrainment seen in this study. If a discussion over internal dynamics shaped by external stimuli should be invoked, it should be done explicitly with appropriate references (but in my opinion, it isn't quite required).

      Please refer to our response to a similar point raised in the public review of Reviewer #2.

      References

      Bardi, L., Regolin, L., & Simion, F. (2014). The First Time Ever I Saw Your Feet: Inversion Effect in Newborns’ Sensitivity to Biological Motion. Developmental Psychology, 50. https://doi.org/10.1037/a0034678

      Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., & Clubley, E. (2001). The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31(1), 5–17. https://doi.org/10.1023/a:1005653411471

      Batterink, L. (2020). Syllables in Sync Form a Link: Neural Phase-locking Reflects Word Knowledge during Language Learning. Journal of Cognitive Neuroscience, 32(9), 1735–1748. https://doi.org/10.1162/jocn_a_01581

      Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring Neural Entrainment to Beat and Meter in Infants: Effects of Music Background. Frontiers in Neuroscience, 10. https://doi.org/10.3389/fnins.2016.00229

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186

      Duecker, K., Doelling, K. B., Breska, A., Coffey, E. B. J., Sivarao, D. V., & Zoefel, B. (2024). Challenges and approaches in the study of neural entrainment. Journal of Neuroscience, 44(40). https://doi.org/10.1523/JNEUROSCI.1234-24.2024

      Falck-Ytter, T., Nyström, P., Gredebäck, G., Gliga, T., Bölte, S., & the EASE team. (2018). Reduced orienting to audiovisual synchrony in infancy predicts autism diagnosis at 3 years of age. Journal of Child Psychology and Psychiatry, 59(8), 872–880. https://doi.org/10.1111/jcpp.12863

      Feldman, J. I., Dunham, K., Cassidy, M., Wallace, M. T., Liu, Y., & Woynaroski, T. G. (2018). Audiovisual multisensory integration in individuals with autism spectrum disorder: A systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews, 95, 220–234. https://doi.org/10.1016/j.neubiorev.2018.09.020

      Grossman, E. D., & Blake, R. (2001). Brain activity evoked by inverted and imagined biological motion. Vision Research, 41(10), 1475–1482. https://doi.org/10.1016/S0042-6989(00)00317-5

      Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and optimizes human listening behavior. Proceedings of the National Academy of Sciences, 109(49), 20095–20100. https://doi.org/10.1073/pnas.1213390109

      Herrmann, C. S., Murray, M. M., Ionta, S., Hutt, A., & Lefebvre, J. (2016). Shaping Intrinsic Neural Oscillations with Periodic Stimulation. Journal of Neuroscience, 36(19), 5328–5337. https://doi.org/10.1523/JNEUROSCI.0236-16.2016

      Hosseinian, T., Yavari, F., Biagi, M. C., Kuo, M.-F., Ruffini, G., Nitsche, M. A., & Jamil, A. (2021). External induction and stabilization of brain oscillations in the human. Brain Stimulation, 14(3), 579–587. https://doi.org/10.1016/j.brs.2021.03.011

      Klin, A., Lin, D. J., Gorrindo, P., Ramsay, G., & Jones, W. (2009). Two-year-olds with autism orient to non-social contingencies rather than biological motion. Nature, 459(7244), 257–261. https://doi.org/10.1038/nature07868

      Laurienti, P. J., Perrault, T. J., Stanford, T. R., Wallace, M. T., & Stein, B. E. (2005). On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 166(3), 289–297. https://doi.org/10.1007/s00221-005-2370-2

      Lenc, T., Keller, P. E., Varlet, M., & Nozaradan, S. (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

      Metzger, B. A., Magnotti, J. F., Wang, Z., Nesbitt, E., Karas, P. J., Yoshor, D., & Beauchamp, M. S. (2020). Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 40(36), 6938–6948. https://doi.org/10.1523/JNEUROSCI.0279-20.2020

      Moreau, C. N., Joanisse, M. F., Mulgrew, J., & Batterink, L. J. (2022). No statistical learning advantage in children over adults: Evidence from behaviour and neural entrainment. Developmental Cognitive Neuroscience, 57, 101154. https://doi.org/10.1016/j.dcn.2022.101154

      Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective Neuronal Entrainment to the Beat and Meter Embedded in a Musical Rhythm. Journal of Neuroscience, 32(49), 17572–17581. https://doi.org/10.1523/JNEUROSCI.3203-12.2012

      Obleser, J., & Kayser, C. (2019). Neural Entrainment and Attentional Selection in the Listening Brain. Trends in Cognitive Sciences, 23(11), 913–926. https://doi.org/10.1016/j.tics.2019.08.004

      Peter, V., Goswami, U., Burnham, D., & Kalashnikova, M. (2023). Impaired neural entrainment to low frequency amplitude modulations in English-speaking children with dyslexia or dyslexia and DLD. Brain and Language, 236, 105217. https://doi.org/10.1016/j.bandl.2022.105217

      Shen, L., Lu, X., Wang, Y., & Jiang, Y. (2023). Audiovisual correspondence facilitates the visual search for biological motion. Psychonomic Bulletin & Review, 30(6), 2272–2281. https://doi.org/10.3758/s13423-023-02308-z

      Shen, L., Lu, X., Yuan, X., Hu, R., Wang, Y., & Jiang, Y. (2023). Cortical encoding of rhythmic kinematic structures in biological motion. NeuroImage, 268, 119893. https://doi.org/10.1016/j.neuroimage.2023.119893

      Simion, F., Regolin, L., & Bulf, H. (2008). A predisposition for biological motion in the newborn baby. Proceedings of the National Academy of Sciences, 105(2), 809–813. https://doi.org/10.1073/pnas.0707021105

      Stanford, T. R., Quessy, S., & Stein, B. E. (2005). Evaluating the Operations Underlying Multisensory Integration in the Cat Superior Colliculus. Journal of Neuroscience, 25(28), 6499–6508. https://doi.org/10.1523/JNEUROSCI.5095-04.2005

      Stevenson, R. A., Ghose, D., Fister, J. K., Sarko, D. K., Altieri, N. A., Nidiffer, A. R., Kurela, L. R., Siemann, J. K., James, T. W., & Wallace, M. T. (2014). Identifying and Quantifying Multisensory Integration: A Tutorial Review. Brain Topography, 27(6), 707–730. https://doi.org/10.1007/s10548-014-0365-7

      Troje, N. F., & Westhoff, C. (2006). The Inversion Effect in Biological Motion Perception: Evidence for a “Life Detector”? Current Biology, 16(8), 821–824. https://doi.org/10.1016/j.cub.2006.03.022

      Ujiie, Y., Asai, T., & Wakabayashi, A. (2015). The relationship between level of autistic traits and local bias in the context of the McGurk effect. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.00891

      Vallortigara, G., & Regolin, L. (2006). Gravity bias in the interpretation of biological motion by inexperienced chicks. Current Biology, 16(8), R279–R280. https://doi.org/10.1016/j.cub.2006.03.052

      Vanrie, J., & Verfaillie, K. (2004). Perception of biological motion: A stimulus set of human point-light actions. Behavior Research Methods, Instruments, & Computers, 36(4), 625–629. https://doi.org/10.3758/BF03206542

      Wang, L., & Jiang, Y. (2012). Life motion signals lengthen perceived temporal duration. Proceedings of the National Academy of Sciences of the United States of America, 109(11), E673-677. https://doi.org/10.1073/pnas.1115515109

      Wang, L., Yang, X., Shi, J., & Jiang, Y. (2014). The feet have it: Local biological motion cues trigger reflexive attentional orienting in the brain. NeuroImage, 84, 217–224. https://doi.org/10.1016/j.neuroimage.2013.08.041

      Wang, Y., Zhang, X., Wang, C., Huang, W., Xu, Q., Liu, D., Zhou, W., Chen, S., & Jiang, Y. (2022). Modulation of biological motion perception in humans by gravity. Nature Communications, 13(1), Article 1. https://doi.org/10.1038/s41467-022-30347-y

      Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysensory Interactions along Lateral Temporal Regions Evoked by Audiovisual Speech. Cerebral Cortex, 13(10), 1034–1043. https://doi.org/10.1093/cercor/13.10.1034

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of phosphorylated ubiquitin (pUb) by PINK1 kinase in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provided some interesting data, several important points should be addressed before being further considered.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.

      (2) The study covers not only a single disease model but also various neurodegenerative diseases such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Weaknesses:

      (1) PINK1 has been reported as a kinase capable of phosphorylating Ubiquitin, hence the expected outcome of increased p-Ub levels upon PINK1 overexpression. Figures 5E-F do not demonstrate a significant increase in Ub levels upon overexpression of PINK1 alone, whereas the evident increase in Ub expression upon overexpression of S65A is apparent. Therefore, the notion that increased Ub phosphorylation leads to protein aggregation in mouse hippocampal neurons is not yet convincingly supported.

      Indeed, overexpression of sPINK1 alone resulted in minimal changes in Ub levels in the soluble fraction (Figure 5E), which is expected given that the soluble Ub pool remains relatively stable and buffered. However, sPINK1* overexpression led to a marked increase in Ub levels in the insoluble fraction, indicative of increased protein aggregation (Figure 5F). The molecular weight distribution of Ub in the insoluble fraction was predominantly below 70 kDa, suggesting that phosphorylation inhibits Ub chain elongation.

      To further validate this mechanism, we utilized the Ub/S65A mutant to antagonize Ub phosphorylation and observed a significant reduction in the intensity of aggregated bands at low molecular weights, indicating restored proteasomal activity. The observed increase in Ub levels in the soluble fraction upon Ub/S65A overexpression is likely due to enhanced ubiquitination driven by elevated Ub-S65A, and notably, Ub/S65A was also detectable using an antibody against wild-type Ub.

      Consistent with these findings, overexpression of Ub/S65E resulted in a further increase in Ub levels in the insoluble fraction, with intensified low molecular weight bands. The effect was even more pronounced than that observed with sPINK1 transfection, likely resulting from the complete phosphorylation mimicry achieved by Ub/S65E, compared to the relatively low levels of phosphorylation by PINK1.

      These findings collectively support the conclusion that sPINK1 promotes protein aggregation via Ub phosphorylation. We have updated the Results and Discussion sections to more clearly present the data and explain the various controls.

      (2) The specificity of PINK1 and p-Ub antibodies requires further validation, as a series of literature indicate that the expression of the PINK1 protein is relatively low and difficult to detect under physiological conditions.

      We acknowledge the challenges in achieving high specificity with commercially available and custom-generated antibodies targeting PINK1 and pUb, particularly given their low endogenous expression under physiological conditions. However, in our study, we observed robust immunofluorescent staining for PINK1 (Figures 1A, 1C, and 1G) and pUb (Figures 1B, 1D, and 1G) in human brain samples from Alzheimer's disease (AD) patients, as well as in mouse models of AD and cerebral ischemia. The clear visualization can be partly attributed to the pathological upregulation of PINK1 and pUb under disease conditions. Importantly, the images from pink1<sup>-/-</sup> mice exhibit much weaker staining.

      Additionally, we detected a significant elevation in the pUb levels in aged mouse brains compared to younger ones (Figures 1E and 1F). In contrast, pink1<sup>-/-</sup> mice showed no change in pUb levels with aging, despite some background signals, demonstrating that pUb accumulation during aging is PINK1-dependent. Collectively, these results support the specificity of the antibodies used in detecting pathophysiological changes in PINK1 and pUb levels.

      For cultured cells, pink1<sup>-/-</sup> cells served as a negative control for both PINK1 (Figures 2B and 2C) and pUb (Figures 2D and 2E). While the pUb Western blot exhibited some nonspecific background, pUb levels in pink1<sup>-/-</sup> cells remained unchanged across all MG132 treatment conditions (Figures 2D and 2E), further attesting to the usability of the antibodies in conjunction with appropriate controls.

      We have updated the manuscript with higher-resolution images; individual image files have been uploaded separately.

      (3) In Figure 6, relying solely on Western blot staining and Golgi staining under high magnification is insufficient to prove the impact of PINK1 overexpression on neuronal integrity and cognitive function. The authors should supplement their findings with immunostaining results for MAP2 or NeuN to demonstrate whether neuronal cells are affected.

      We included NeuN immunofluorescent staining at 10, 30, and 70 days post transfection in Figure 5—figure supplement 2. The results clearly demonstrate a significant loss of NeuN-positive cells in the hippocampus following Ub/S65E overexpression, while no apparent reduction was observed with sPINK1 transfection alone.

      We have also quantified MAP2 protein levels via Western blotting and examined the morphology of neuronal dendrites and synaptic structures using Golgi staining. These analyses revealed a significant reduction in MAP2 levels and synaptic damage upon sPINK1 or Ub/S65E overexpression (Figures 6F and 6H), consistent with the proteomics analysis (Figure 5—figure supplement 5). Notably, these detrimental effects could be rescued by co-expression of Ub/S65A, reinforcing the role of pUb in mediating these structural changes.

      Together, our findings from NeuN immunostaining, MAP2 protein analysis, proteomics analysis, and Golgi staining provide strong evidence for the impact of PINK1 overexpression and pUb elevation on neuronal integrity and synaptic structure.

      (4) The authors should provide more detailed figure captions to facilitate the understanding of the results depicted in the figures.

      Figure captions have been updated with more details incorporated in the revised manuscript.

      (5) While the study proposes that pUb promotes neurodegeneration by affecting proteasomal function, the specific molecular mechanisms and signaling pathways remain to be elucidated.

      The molecular mechanisms and signaling pathways through which pUb promotes neurodegeneration are likely multifaceted and interconnected. Our findings suggest that mitochondrial dysfunction plays a central role following sPINK1* overexpression. This is supported by (1) an observed increase in full-length PINK1, indicative of impaired mitochondrial quality control, and (2) proteomic data showing enhanced mitophagy at 30 days post-transfection, followed by substantial mitochondrial injuries at 70 days post-transfection (Figure 5—figure supplement 5 and Supplementary Data). The progressive mitochondrial damage caused by protein aggregates would exacerbate neuronal injury and degeneration.

      Additionally, reduced proteasomal activity may lead to the accumulation of inhibitory proteins that are normally degraded by the ubiquitin-proteasome system. Our proteomics analysis identified a >50-fold increase in CamK2n1 (UniProt ID: Q6QWF9), an endogenous inhibitor of CaMKII activation, following sPINK1* overexpression. The accumulation of CamK2n1 suppresses CaMKII activation, thereby inhibiting the CREB signaling pathway (Figure 7), which is essential for synaptic plasticity and neuronal survival. This disruption can further contribute to neurodegenerative processes.

      Thus, our findings underscore the complexity of pUb-mediated neurodegeneration and call for further investigation into downstream consequences.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data or analyses.

      We have performed additional experiments to investigate how the impairment of ubiquitin-proteasomal activity contributes to neurodegeneration. Specifically, we investigated CamK2n1, an endogenous inhibitor of CaMKII, which is normally degraded by the proteasome to allow CaMKII activation. Our proteomics analysis revealed a significant (>50-fold) elevation of CamK2n1 following sPINK1 overexpression (Figure 5—figure supplement 5 and Supplementary Data).

      To validate this mechanism, we conducted immunofluorescence and Western blot analyses, demonstrating reduced levels of phosphorylated CaMKII (pCaMKII) and phosphorylated CREB (pCREB), as well as reduced levels of downstream proteins such as BDNF and ERK. These results have been incorporated into the revised manuscript (Figure 7).

      As the proteasome is crucial in maintaining proteostasis, its dysregulation would trigger neurodegeneration through multiple pathways, contributing to a broad cascade of pathological events.

      Reviewer #2 (Public review):

      Summary:

      The manuscript makes the claim that pUb is elevated in a number of degenerative conditions including Alzheimer's Disease and cerebral ischemia. Some of this is based on antibody staining which is poorly controlled and difficult to accept at this point. They confirm previous results that a cytosolic form of PINK1 accumulates following proteasome inhibition and that this can be active. Accumulation of pUb is proposed to interfere with proteostasis through inhibition of the proteasome. Much of the data relies on over-expression and there is little support for this reflecting physiological mechanisms.

      Weaknesses:

      The manuscript is poorly written. I appreciate this may be difficult in a non-native tongue, but felt that many of the problems are organizational. Less data of higher quality, better controls and incision would be preferable. Overall the referencing of past work is lamentable. Methods are also very poor and difficult to follow.

      Until technical issues are addressed I think this would represent an unreliable contribution to the field.

      (1) Antibody specificity and detection under pathological conditions

      We recognize the limitations of commercially available antibodies for detecting PINK1 and pUb. Nevertheless, our findings reveal a significant elevation in PINK1 and pUb levels under pathological conditions, such as Alzheimer's disease (AD) and ischemia. Additionally, we observed an increase in pUb level during brain aging, further demonstrating its relevance and a potentially causative role for this special pathological condition. Similarly, elevated pUb levels were observed for cultured cells following pharmacological treatment or oxygen-glucose deprivation (OGD).

      In contrast, in pink1<sup>-/-</sup> mice and HEK293 cells used as negative controls, PINK1 and pUb levels remained consistently low. Therefore, the observed elevation of PINK1 and pUb is associated with specific pathological conditions, rather than an antibody-detection anomaly.

      (2) Overexpression as a model for pathological conditions

      To investigate whether the inhibitory effects of sPINK1 on the ubiquitin-proteasome system (UPS) depend on its kinase activity, we employed a kinase-dead version of sPINK1* as a negative control. Given that PINK1 targets multiple substrates, we also investigated whether its effects on UPS inhibition were specifically mediated by ubiquitin phosphorylation. To this end, we used Ub/S65A (a phospho-null mutant) to block Ub phosphorylation by sPINK1, and Ub/S65E (a phospho-mimetic mutant) to mimic phosphorylated Ub. These well-defined controls ensured the robustness of our conclusions.

      Although overexpression does not perfectly replicate physiological conditions, it provides a valuable model for studying pathological scenarios such as neurodegeneration and brain aging, where pUb levels are elevated. For example, we observed a 30.4% increase in pUb levels in aged mouse brains compared to young brains (Figure 1F). Similarly, in our sPINK1 overexpression model, pUb levels increased by 43.8% and 59.9% at 30- and 70-days post-transfection, respectively, compared to controls (Figures 5A and 5C). Notably, co-expression of sPINK1* with Ub/S65A almost entirely prevented sPINK1* accumulation (Figure 5B), indicating that an active UPS can efficiently degrade this otherwise stable variant of sPINK1.

      Together, our findings demonstrate that sPINK1 accumulation inhibits UPS activity, an effect that can be reversed by the phospho-null Ub mutant. The overexpression model mimics pathological conditions and provides valuable insights into pUb-mediated proteasomal dysfunction.

      (3) Organization of the manuscript

      Following your suggestion, we have restructured the manuscript to present the key findings in a more logical and cohesive sequence:

      (a) Evidence for elevated PINK1 and pUb levels across a broad spectrum of pathological and neurodegenerative conditions;

      (b) The effects of pUb elevation in cultured cells, focusing on the proteasome;

      (c) Mechanistic insights into how pUb elevation inhibits proteasomal activity;

      (d) The absence of PINK1 and pUb alleviates protein aggregation;

      (e) Evidence for the causative relationship between elevated pUb levels and proteasomal inhibition;

      (f) Demonstration that pUb elevation directly contributes to neuronal degeneration;

      (g) Additional evidence for the mechanism of neuronal degeneration following sPINK1* overexpression: the downstream effects of elevated CamK2n1, an inhibitor of CaMKII, resulting from proteasomal inhibition.

      This reorganization should ensure a clear and progressive narrative, and enhance the overall coherence and impact of the revised manuscript.

      (4) Revisions to writing, referencing, and methodology

      We have made a great effort to enhance the clarity and flow of the manuscript, including the addition of references to appropriately acknowledge prior work. We have also expanded the Methods section with additional details to improve readability and ensure reproducibility. We believe these revisions effectively address the concerns raised and strengthen the overall quality of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Figure 1: PINK1 is a poorly expressed protein and difficult to detect by Western blot let alone by immunofluorescence. I have direct experience of the antibody used in this study and do not consider it reliable. There are much cleaner reagents out there, although they still have many challenges. The minimal requirement here is for the PINK1 antibody staining to be compared in wild-type and knockout mice. One would also expect to see a mitochondrial staining which would require higher magnification to be definitive, but it does not look like it to me. This is a key foundational figure and is unreliable. The pUb antibody also has a high background, see for example figure 2E.

      Under physiological conditions, PINK1 and pUb levels are indeed low, making their detection challenging. However, under pathological conditions, their expression is significantly elevated, correlating with disease severity. Given the limitations of available reagents, using appropriate controls is a standard approach in biological research.

      Nevertheless, we observed robust immunofluorescent staining for PINK1 (Figures 1A, 1C, and 1G) and pUb (Figures 1B, 1D, and 1G) in human brain samples from Alzheimer’s disease (AD) patients and mouse models of AD and cerebral ischemia. Compared to healthy controls, the significant elevation of PINK1 and pUb under these pathological conditions accounts for their clear visualization. To validate antibody specificity, we have included images from pink1<sup>-/-</sup> mice as negative controls (Figure 1C and 1D, third panel).

      Furthermore, we analyzed pUb levels in both young and aged mice, using pink1<sup>-/-</sup> mice as controls. Our results revealed a significant increase in pUb levels in aged wild-type mice (Figures 1E and 1F). In contrast, pink1<sup>-/-</sup> mice exhibited relatively low pUb levels, with no notable change between young and aged groups. These findings reinforce the conclusion that pUb accumulation during aging is dependent on PINK1.

      For HEK293 cells, pink1<sup>-/-</sup> cells were used as a negative control for assessing PINK1 (Figures 2B and 2C) and pUb levels (Figures 2D and 2E). While the pUb Western blot did show some nonspecific background, as you have noted, pUb levels significantly increased following MG132 treatment of the wildtype cells. In contrast, no such increase was observed in pink1<sup>-/-</sup> cells (Figure 2D and 2E). These results further validate the reliability of our findings.

      Regarding mitochondrial staining, we recognize that PINK1 localization can vary depending on the pathological context. For example, in Alzheimer’s disease, PINK1 exhibits relatively high nuclear staining, while in cerebral ischemia and brain aging, it is predominantly cytoplasmic and punctate. In contrast, in young, healthy mouse brains, PINK1 is more uniformly distributed. The observed elevation in pUb levels could arise from mitochondrial PINK1 or soluble sPINK1 in the cytoplasm, and it remains unclear whether nuclear PINK1 contributes to pUb accumulation. Investigating the role of PINK1 in different forms and subcellular localizations will be an important avenue for future research.

      To enhance clarity, we have updated our images and replaced them with higher-resolution versions in the revised manuscript.

      Please also confirm that the GAPDH loading controls represent the same gels, to my eye they do not match.

      We have reviewed all the bands, and confirmed that the GAPDH loading controls correspond to the same gels. For different gels, we use separate GAPDH loading controls. There are two experimental scenarios to consider:

      (1) When there is a large difference in molecular weight between target proteins, we cut the gel into sections and incubate each section with different antibodies separately.

      (2) When the molecular weight difference is small and cutting is not feasible, we first probe the membrane with one antibody, strip it, and then re-incubate the membrane with a second antibody.

      These approaches ensure accurate and reliable detection of target proteins with various molecular weights relative to GAPDH.

      1H. Ponceau.

      We have corrected the spelling.

      Figure 2 many elements are confirmation of work already reported and this must be made clearer in the text. 

      Indeed, the elevation of sPINK1 and pUb upon proteasomal inhibition has been previously reported, and these studies have been acknowledged (Gao, et al, 2016; Dantuma, et al, 2000). In the present study, we expand on these findings by conducting a detailed analysis of the time- and concentration-dependent effects of MG132 on sPINK1 and pUb levels, establishing a causative relationship between pUb accumulation and proteasomal inhibition. Furthermore, we demonstrate that sPINK1 overexpression and MG132-induced proteasomal inhibition exhibit no additive effect, indicating that both converge on the same pathway, resulting in the impairment of proteasomal activity.

      It has been established that ubiquitin phosphorylation inhibits Ub chain elongation (Wauer, et al, 2015). However, our study provides novel insights by identifying an additional mechanism: phosphorylated Ub also interferes with the noncovalent interactions between Ub chain and Ub receptors in the proteasome, which further contributes to the impairment of UPS function.

      The PINK1 kinase-dead mutant construction (Figure 2F) and the use of Ub-GFP as a proteasomal substrate were based on established methodologies, which have been appropriately cited in the manuscript (Beilina, et al, 2005 for KD sPINK1; Yamano, et al for endogenous PINK1; Samant, et al, 2018 and Dantuma, et al, 2000 for Ub-GFP probe). Similarly, our use of puromycin and BALA treatments follows previously reported protocols (Gao, et al, 2016), which allowed us to dissect the relative contributions of sPINK1* overexpression to proteasomal vs. autophagic dysfunction.

      As you have noted, our study has built upon prior findings while introducing new mechanistic insights into sPINK1 and pUb-mediated proteasomal dysfunction.

      2C 24h MG132 not recommended, most cells are dead by then.

      We used MG132 treatment for 24 hours to evaluate the time-course effects of proteasomal inhibition on PINK1 and pUb levels in HEK293 cells (Figures 2C and 2E). We did observe some decrease in both PINK1 and pUb levels at 24 hours compared to 12 hours, which may result from some extent of cell death at the longer treatment duration.

      In SH-SY5Y cells, we collected cells at 24 hours after MG132 administration (Figure 5—figure supplement 1). Though protein aggregation was evident in these cells, we did not observe pronounced cell death under these conditions, justifying our treatment.

      Our findings are consistent with previous studies demonstrating that MG132 at 5 µM for 24 hours effectively induces proteasomal inhibition without substantial cytotoxicity. For example, studies using human esophageal squamous cancer cells have reported that this treatment condition inhibits cell proliferation while maintaining cell viability, with cell viability >70% after 24-hour treatment with 5 µM MG132 (Int J Mol Med 33: 1083-1088, 2014). 

      MG132 has been commonly used at concentrations ranging from 5 to 50 µM for durations of 1 to 24 hours, as stated at the vendor’s website (https://www.cellsignal.com/products/activatorsinhibitors/mg-132/2194).

      2I what is BALA do they mean bafilomycin. This is a v-ATPase inhibitor, not just an autophagy inhibitor.

      We appreciate the reviewer’s comment regarding the use of BALA in Figure 2I. To clarify, BALA refers to bafilomycin A1, a well-established v-ATPase inhibitor that blocks lysosomal acidification. While bafilomycin A1 is commonly used as an autophagy inhibitor, its primary mechanism involves inhibiting lysosomal function, which is critical for autophagosome-lysosome fusion and subsequent degradation of autophagic cargo.

      In our study, we used bafilomycin A1 in conjunction with puromycin to dissect the relative effects of sPINK1 overexpression on proteasomal and autophagic activities. Puromycin induces protein misfolding and aggregation, causing stress on both degradation pathways. By inhibiting lysosomal function with bafilomycin A1 and blocking the protein degradation load at various stages, we can distinguish the relative contributions of the autophagy and UPS pathways.

      We acknowledge that bafilomycin A1’s effects extend beyond autophagy, as it also inhibits v-ATPase activity. However, its inhibition of lysosomal degradation is integral to distinguishing autophagy’s contribution under the experimental conditions, and BALA treatment has been used extensively in previous studies (Mauvezin and Neufeld, 2015).

      We have further clarified this treatment in the revised manuscript.

      Figure 3. Legend or text needs to be more explicit about how chains have been produced. From what I can gather from methods only a single E2 has been trialed. Authors should use at least one of the criteria used by Wauer et al. (2014) to confirm the stoichiometry of phosphorylation. The concept that pUb can interfere with E2 discharging is not new, but not universal across E2s.

      We have cited in the manuscript that PINK1-mediated ubiquitin phosphorylation can interfere with ubiquitin chain elongation for certain E2 enzymes (Wauer et al., 2015). 

      To clarify, the focus of our current work is on how elevation of Ub phosphorylation impacts UPS activity, rather than exploring the broader effects of Ub phosphorylation on Ub chain elongation. For this reason, we have used the standard E2 that is well-established for generating K48-linked polyUb chains (Pickart CM, 2005). Moreover, our findings go further by demonstrating that phosphorylated K48-linked polyubiquitin exhibits weaker non-covalent interactions with proteasomal ubiquitin receptors. This dual effect, on both covalent chain elongation and non-covalent interactions, contributes to the observed reduction in ubiquitin-proteasome activity, a novel aspect of our study.

      To address the reviewer’s concerns, we have added details in the Methods section and figure legends regarding the generation of ubiquitin chains. Specifically, we used ubiquitin-activating enzyme E1 (UniProt ID: P22314) and ubiquitin-conjugating enzyme E2-25K (UniProt ID: P61086) to generate K48-linked ubiquitin chains. 

      Our ESI-MS analysis showed that only 1–2 phosphoryl groups were incorporated into the K48-linked tetra-ubiquitin chains (Figure 3—figure supplement 2). This is consistent with our in vivo findings, where pUb levels increased by 30.4% in aged mouse brains compared to young brains (Figure 1F). Notably, even sub-stoichiometric phosphorylation onto the K48-linked ubiquitin chain significantly weakens the non-covalent interactions with the proteasome (Figures 3E and 3H).

      Figure 4. I could find no definition of the insoluble fraction, nor details on how it is prepared.

      The insoluble fraction primarily contains proteins that are aggregated or associated with hydrophobic interactions and cannot be solubilized by RIPA buffer. We have provided more details in the Methods of the revised manuscript about how the insoluble fraction was prepared. Our approach was based on established protocols for fractionating soluble and insoluble proteins from brain tissues (Wirths, 2017). Here is an outline of the procedure, which enables the separation and subsequent analysis of distinct protein populations:

      • Lysis and preparation of soluble fraction: Cells and brain tissues were lysed using RIPA buffer (Beyotime Biotechnology, cat# P0013B) containing protease (P1005) and phosphatase inhibitors (P1081) on ice for 30 minutes, with gentle vortexing every 10 minutes. Brain samples were homogenized using a precooled TissuePrep instrument (TP-24, Gering Instrument Company). Lysates were centrifuged at 12,000 rpm for 30 minutes at 4°C. The supernatant was collected as the soluble protein fraction.

      • Preparation of insoluble fraction: The pellet was resuspended in 20 µl of SDS buffer (2% SDS, 50 mM Tris-HCl, pH 7.5) and subjected to ultrasonic lysis (sonication) at 4°C for 8 cycles (10 seconds of ultrasound, 30-second intervals). The samples were then centrifuged at 12,000 rpm for 30 minutes at 4°C. The supernatant obtained after this step was designated as the insoluble protein fraction.

      • Protein quantification: Protein concentrations for both soluble and insoluble fractions were determined using the BCA Protein Assay Kit (Beyotime Biotechnology, cat# P0009).
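Because the centrifugation steps above are reported in rpm, the applied force depends on the rotor. As a minimal sketch, the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) can be used to estimate the relative centrifugal force; the 8 cm rotor radius below is a hypothetical value chosen for illustration and is not stated in the response.

```python
def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Convert rotor speed (rpm) to relative centrifugal force (x g).

    Uses the standard relation RCF = 1.118e-5 * r * rpm^2, with r in cm.
    """
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be non-negative and radius must be positive")
    return 1.118e-5 * radius_cm * rpm ** 2


# Hypothetical rotor radius (assumption; the actual rotor is not specified).
ASSUMED_RADIUS_CM = 8.0

rcf = rpm_to_rcf(12_000, ASSUMED_RADIUS_CM)
print(f"12,000 rpm at r = {ASSUMED_RADIUS_CM} cm is roughly {rcf:,.0f} x g")
```

Reporting the force in × g, or naming the rotor, would let readers reproduce the fractionation on different equipment.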

      Figure 5. What is the transfection efficiency? How many folds is sPINK1 over-expressed? Typically, a neuron will have only a few hundred copies of PINK1 at the basal state. How much mutant ubiquitin is expressed relative to wild type, seeing the free ubiquitin signals on the gels might be helpful here, but they seem to have been cut off. 

      We appreciate the reviewer's insightful comments regarding transfection efficiency, the extent of sPINK1 overexpression, and the expression levels of mutant ubiquitin relative to wild-type ubiquitin. Below, we provide detailed responses to each point:

      Transfection Efficiency: Our immunofluorescent staining for NeuN, a neuronal marker, demonstrated that over 90% of NeuN-positive cells were co-localized with GFP (Figure 5—figure supplement 2), indicating a high transfection efficiency in our neuronal cultures.

      Extent of sPINK1 Overexpression: Quantifying the exact fold increase of sPINK1 upon overexpression is inherently difficult due to its low basal expression under physiological conditions, making the relative increase difficult to measure (small denominator effect). However, our Western blot analysis shows that ischemic events can cause a substantial elevation of PINK1 levels, including both full-length and cleaved forms (Figure 1H). This suggests that our overexpression model recapitulates the pathological increase in PINK1, making it a relevant system for studying disease mechanisms.
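The small-denominator effect mentioned above can be illustrated with a short sketch. The band intensities are hypothetical arbitrary units, not measurements from the study; the point is only that when the basal signal sits near the detection limit, small shifts in the basal estimate swing the computed fold change widely.

```python
def fold_change(treated: float, basal: float) -> float:
    """Ratio of treated to basal signal (arbitrary densitometry units)."""
    if basal <= 0:
        raise ValueError("basal signal must be positive to compute a ratio")
    return treated / basal


# Hypothetical band intensities (assumptions for illustration only).
treated_signal = 50.0
for basal_signal in (0.5, 1.0, 2.0):
    ratio = fold_change(treated_signal, basal_signal)
    print(f"basal = {basal_signal:.1f} -> fold change = {ratio:.0f}x")
```

With the same overexpressed signal, basal estimates that differ by only 1.5 units give fold changes ranging from 25x to 100x, which is why an exact fold increase is hard to state.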

      From Figure 5B, it is evident that sPINK1 levels differ significantly between neurons overexpressing sPINK1 alone and those co-expressing sPINK1 + Ub/S65A (70 days post-transfection). Overexpression of sPINK1 alone results in multiple PINK1 bands, consistent with sPINK1, endogenous PINK1 (induced by mitochondrial damage), and ubiquitinated sPINK1. In comparison, co-expressing Ub/S65A leads to faint PINK1 bands, suggesting that in the presence of a functionally restored proteasome, overexpressed sPINK1 is rapidly degraded. Therefore, actual accumulation of sPINK1 depends on proteasomal activity, and the “over-expressed” PINK1 level can be comparable to levels observed under native, pathological conditions.

      Expression Levels of Mutant Ubiquitin Relative to Wild-Type: Assessing the expression levels of mutant versus wild-type ubiquitin is indeed valuable. In Figure 5E, we observed a 38.9% increase in high-molecular-weight ubiquitin conjugates in the soluble fraction when comparing the sPINK1+Ub/S65A group to the control. This increase suggests that mutant ubiquitin is actively incorporated into polyubiquitin chains.

      Regarding free monomeric ubiquitin, its low abundance and rapid incorporation into polyubiquitin chains make it difficult to visualize in Western blots. Additionally, its low molecular weight and lower antibody binding valency further reduce its visibility.

      General: a number of effects are shown following over-expression but no case is made that these levels of pUb are ever attained physiologically. I am very unconvinced by these findings and think the manuscript needs to be improved at multiple levels before being added to the record.

      We understand the reviewer’s concerns regarding the relevance of pUb levels observed in our overexpression model. To clarify, our study is not focused on physiological levels of pUb, but rather on pathologically elevated levels, which have been documented in various neurodegenerative conditions. While overexpression is not a perfect replication of pathological states, it provides a valuable tool to investigate mechanisms that become relevant under disease conditions. Moreover, we have taken steps to ensure the validity of our findings and to address potential limitations associated with overexpression models:

      Pathological Relevance: In addition to previously published reports, we observed significant increases in PINK1 and pUb levels in human brain samples from Alzheimer's disease (AD) patients, as well as in mouse models of AD, cerebral ischemia (including the mouse middle cerebral artery occlusion ischemic model and the oxygen-glucose deprivation cell model), and aging (e.g., Figures 1E, 1F, and 1H). All these data show that pUb levels are elevated under pathological conditions. Our overexpression model mimics these pathological scenarios by recreating the high levels of pUb, which lead to the impairment of proteasomal activity and subsequent disruption of proteostasis.

      Use of Robust Controls: To ensure the reliability of our results and interpretations, we employed multiple controls for our experiments. We have used pink1<sup>-/-</sup> mice and cells to confirm that pUb accumulation is PINK1-dependent (Figures 1C and 2C). We have also included the kinase-dead sPINK1 mutant and the Ub/S65A phospho-null mutant to negate or counteract the specific roles of PINK1 activity and pUb in proteasomal dysfunction. In addition, we have used Ub/S65E as a phosphomimetic mutant, corresponding to 100% Ub phosphorylation.

      Importantly, we have compared sPINK1 overexpression with both baseline and disease-mimicking conditions to ensure that the observed effects are consistent with pathological changes. Furthermore, our findings are supported by complementary evidence from human brain samples, animal models, cell cultures, and molecular assays. By integrating these controls and approaches, we provide mechanistic insights into how elevated pUb levels cause proteasomal impairment and contribute to neurodegeneration.

      Our findings elucidate how elevated pUb levels contribute to the disruption of proteostasis in neurodegenerative conditions. While overexpression may have limitations, it remains a powerful tool for dissecting pathological mechanisms and testing hypotheses. Our results align with and expand upon previous studies suggesting pUb as a biomarker of neurodegeneration (Hou, et al, 2018; Fiesel, et al, 2015), and provide mechanistic insights into how elevated pUb and sPINK1 drive a vicious feedforward cycle, ultimately leading to proteasomal dysfunction and neurodegeneration.

      We hope these clarifications highlight the relevance and rigor of our study, and welcome additional suggestions to improve the manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study aims to explore the role of phosphorylated ubiquitin (pUb) in proteostasis and its impact on neurodegeneration. By employing a combination of molecular, cellular, and in vivo approaches, the authors demonstrate that elevated pUb levels contribute to both protective and neurotoxic effects, depending on the context. The research integrates proteasomal inhibition, mitochondrial dysfunction, and protein aggregation, providing new insights into the pathology of neurodegenerative diseases.

      Strengths:

      - The integration of proteomics, molecular biology, and animal models provides comprehensive insights.

      - The use of phospho-null and phospho-mimetic ubiquitin mutants elegantly demonstrates the dual effects of pUb.

      - Data on behavioral changes and cognitive impairments establish a clear link between cellular mechanisms and functional outcomes.

      Weaknesses:

      - While the study discusses the reciprocal relationship between proteasomal inhibition and pUb elevation, causality remains partially inferred.

      It has been well-established that protein aggregates, particularly neurodegenerative fibrils, can impair proteasomal activity (McDade, et al., 2024; Kinger, et al., 2024; Tseng, et al., 2008). Other contributing factors, including ATP depletion, reduced proteasome component expression, and covalent modifications of proteasomal subunits, can also lead to declined proteasomal function. Additionally, mitochondrial injury serves as an important source of elevated PINK1 and pUb levels. Recent studies have demonstrated that efficient mitophagy is essential to prevent pUb accumulation, whereas partial mitophagy failure results in elevated PINK1 levels (Chin, et al, 2023; Pollock, et al. 2024).

      While pathological conditions can impair proteasomal function and slow sPINK1 degradation, leading to its accumulation, our results demonstrate that overexpression of sPINK1 or PINK1 can initiate this cycle as well. Once this cycle is initiated, it becomes self-perpetuating, as sPINK1 and pUb accumulation progressively impair proteasomal function, leading to more protein aggregates and mitochondrial damage.

      Importantly, we show that co-expression of Ub/S65A effectively rescues cells from this cycle, which further illustrates the pivotal role of pUb in driving proteasomal inhibition and the causality between pUb elevation and proteasomal inhibition. At the animal level, pink1 knockout prevents protein aggregation under aging and cerebral ischemia conditions (Figures 1E and 1G). 

      Together, with controls at the protein, cell, and animal levels, our findings support this self-reinforcing and self-amplifying cycle of pUb elevation, proteasomal inhibition, protein aggregation, mitochondrial damage, and ultimately, neurodegeneration.

      - The role of alternative pathways, such as autophagy, in compensating for proteasomal dysfunction is underexplored.

      Indeed, previous studies have shown that elevated sPINK1 can enhance autophagy (Gao, et al., 2016), potentially compensating for impaired UPS function. One mechanism involves PINK1-mediated phosphorylation of p62, which enhances autophagic activity.

      In our study, we observed increased autophagic activity upon sPINK1 overexpression, as shown in Figure 2I (middle panel, without BALA). This increase in autophagy may facilitate the degradation of ubiquitinated proteins induced by puromycin, partially mitigating proteasomal dysfunction. This compensation might also explain why protein aggregation, though statistically significant, increased only slightly at 70 days post-sPINK1 transfection (Figure 5F). Additionally, we detected a mild but statistically insignificant increase in LC3II levels in the hippocampus of mouse brains at 70 days post-sPINK1 transfection (Figure 5—figure supplement 6), further supporting the notion of autophagy activation.

      However, while autophagy may provide some compensation, its effect is likely limited. The UPS and autophagy serve distinct roles in protein degradation:

      • Autophagy is a bulk degradation pathway, primarily targeting damaged organelles, intracellular pathogens, and protein aggregates, often in a non-selective manner.

      • The UPS, in contrast, is highly selective, degrading short-lived regulatory proteins, misfolded proteins, and proteins tagged for degradation via ubiquitination.

      Thus, while sPINK1 overexpression enhances autophagy-mediated degradation, it simultaneously impairs UPS-mediated degradation. This suggests that autophagy partially compensates for proteasomal dysfunction but is insufficient to counterbalance the UPS's selective degradation function. We have incorporated additional discussion in the revised manuscript.

      - The immunofluorescence images in Figure 1A-D lack clarity and transparency. It is not clear whether the images represent human brain tissue, mouse brain tissue, or cultured cells. Additionally, the DAPI staining is not well-defined, making it difficult to discern cell nuclei or staging. To address these issues, lower-magnification images that clearly show the brain region should be provided, along with improved DAPI staining for better visualization. Furthermore, the Results section and Figure legends should explicitly indicate which brain region is being presented. These concerns raise questions about the reliability of the reported pUb levels in AD, which is a critical aspect of the study's findings.

      We have taken steps to address the concerns regarding clarity and transparency in Figure 1A-D. The source of each tissue is indicated to the left of each image. For example, “human brain with AD” is written to the left of Figure 1A, and “mouse brains with AD” to the left of Figure 1C.

      Briefly, the human brain samples in Figure 1 originate from the cingulate gyrus of Alzheimer’s disease (AD) patients. Our analysis revealed that PINK1 is primarily localized within cell bodies, whereas pUb is more abundant around Aβ plaques, likely in nerve terminals. For the mouse brain samples, we have now explicitly indicated in the figure legends and Results section that the images represent the neocortex of APP/PS1 mice, a mouse model relevant to AD pathology, as well as the corresponding regions in wild-type and pink1<sup>-/-</sup> mice. We have ensured that the brain regions and sources are clearly stated throughout the manuscript.

      Regarding image clarity, we have uploaded higher-resolution versions of the images in the revised manuscript to improve visualization of key features, including DAPI staining. We believe these revisions enhance the reliability and interpretability of our findings, particularly in relation to the reported pUb levels in AD. 

      - Figure 4B should also indicate which brain region is being presented.

      The images were taken for layer III-IV in the neocortex of mouse brains. We have included this information in the figure legend of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      - Expand on the potential compensatory role of autophagy in response to proteasomal dysfunction.

      Upon proteasomal inhibition, cells may activate autophagy as an alternative pathway of degradation to help clear damaged or misfolded proteins. Autophagy is a bulk degradation process that targets long-lived proteins, damaged organelles, and aggregated proteins for lysosomal degradation. While this pathway can provide some compensation, it is distinct from the ubiquitin-proteasome system (UPS), which specializes in the selective degradation of short-lived regulatory proteins and misfolded proteins.

      In our study, we observed increased autophagic activity following sPINK1 overexpression (Figure 2J, middle panel, without BALA) and a slight, though statistically insignificant, increase in LC3II levels in the hippocampus of mouse brains at 70 days post-sPINK1 transfection (Figure 5—figure supplement 6). These findings suggest that autophagy is indeed upregulated as a compensatory response to proteasomal dysfunction, potentially facilitating the degradation of aggregated ubiquitinated proteins. Additionally, gene set enrichment analysis (GSEA) revealed similar enrichment of autophagy pathways at 30 and 70 days post-sPINK1 overexpression (Figure 5—figure supplement 5).

      However, the compensatory capacity of autophagy is likely limited. While autophagy can reduce protein aggregation, it is an inherently non-selective process and cannot fully replace the targeted functions of the UPS. Moreover, as we illustrate in Figure 7 of the revised manuscript, UPS is essential for degrading specific regulatory and inhibitory proteins and plays a critical role in cellular proteostasis, particularly in signaling regulation, cell cycle control, and stress responses.

      Together, while autophagy activation provides some degree of compensation, it cannot fully restore cellular proteostasis. The interplay between these two degradation pathways is an important area for future investigation. For the present study, our focus is on how pUb elevations impact proteasomal activity and elicit downstream effects.

      We have incorporated this additional discussion in the revised manuscript.

      - Simplify the discussion of complex mechanisms to improve accessibility for readers.

      We have revised the Discussion to present the mechanisms in a more coherent and accessible manner, ensuring clarity for a broader readership. These revisions should make the discussion more intuitive while preserving the depth of our findings.

      - Statistical analyses could benefit from clarifying how technical replicates and biological replicates were accounted for across experiments.

      We have clarified our statistical analysis in the Methods section and figure legends, explicitly detailing how many biological replicates were accounted for across experiments. These revisions should enhance transparency and clarity, ensuring that our findings are robust and reproducible.

      - The image in Figure 3D is too small to distinguish any signals. A larger and clearer image should be presented.

      We have enlarged the images in Figure 3D. Additionally, we have replaced figures with higher-resolution versions throughout the manuscript.

      - NeuN expression in Figure 4B differs between wildtype and pink-/- mice. Additional validation is needed to determine whether pink-/- enhances NeuN expression.

      The difference in NeuN immunofluorescence intensity between wild-type and pink1<sup>-/-</sup> mice in Figure 4B may simply result from variations in image acquisition rather than an actual difference in NeuN expression.

      Our single-nucleus RNA-seq analyses of wild-type and pink1<sup>-/-</sup> mice at 3 and 18 months of age reveal no significant differences in NeuN expression at the transcript level (data provided below). This indicates that the observed variation in fluorescence intensity is unlikely to reflect an authentic upregulation of NeuN expression. Thus, factors such as antibody concentration, image exposure, and image processing may contribute to differences in staining intensity.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper examines the role of MLCK (myosin light chain kinase) and MLCP (myosin light chain phosphatase) in axon regeneration. Using loss-of-function approaches based on small molecule inhibitors and siRNA knockdown, the authors explore axon regeneration in cell culture and in animal models. Their evidence shows that MLCK activity facilitates axon extension/regeneration, while MLCP prevents it.

      Major concern:

      A global inconsistency in the conclusions of the authors is evident when trying to understand the role of NMII in axon growth and to understand the present results in light of previous reports by the authors and many others on the role of NMII in axon extension. The discussion of the matter fails to acknowledge a vast literature on how NMII activity is regulated. The authors study enzymes responsible for the phosphorylation and dephosphorylation of NMII, referring to something that is strongly proven elsewhere, that phosphorylation activates NMII and dephosphorylation deactivates it. The authors mention their own previous evidence using inhibitors of NMII ATPase activity (blebbistatin, Bleb for short) and inhibitors of a kinase that phosphorylates NMII (ROCK), highlighting that Bleb increases axon growth. Since Bleb inhibits the ATPase activity of NMII, it follows that NMII is in itself an inhibitor of axon growth, and hence when NMII is inhibited, the inhibition on axon growth is relieved, and axonal growth takes place (REF1). It is known that NMII exists in an inactive folded state, and ser19 phosphorylation (by MLCK or ROCK) extends the protein, allowing NMII filament formation, ATPase activity, and force generation on actin filaments (REF2). From this, it is derived that if MLCK is inhibited, then there is no NMII phosphorylation, and hence no NMII activity, and, according to their previous work, this should promote axon growth. On the contrary, the authors show the opposite effect: in the lack of phospho-MLC, authors show axon growth inhibition.

      We thank the Reviewer for taking time to review our manuscript, and we really appreciated the comments from the reviewer. We have tried our best to revise the manuscript to address all the comments raised by the Reviewer.

      Reporting evidence challenging previous conclusions is common business in scientific endeavors, but the problem with the current manuscript is that it fails to point to and appropriately discuss this contradiction. Instead, the authors refer to the fact that MLCK and Bleb inhibit NMII in different steps of the activation process. While this is true, this explanation does not solve the contradiction. There are many options to accommodate the information, but it is not the purpose of this revision to provide them. Since the manuscript is focused solely on phosphorylation states of MLC and axon extension, the claims are simply at odds with the current literature, and this important finding, if true, is not properly discussed.

      We thank the reviewer for these valuable comments. As suggested, we now discuss this point in more detail in the revised manuscript (lines 357-368 and 373-374).

      What follows is a discussion of the merits and limitations of different claims of the manuscript in light of the evidence presented.

      (1) Using western blot and immunohistochemical analyses, authors first show that MLCK expression is increased in DRG sensory neurons following peripheral axotomy, concomitant to an increase in MLC phosphorylation, suggesting a causal effect (Figure 1). The authors claim that it is common that axon growth-promoting genes are upregulated. It would have been interesting at this point to study in this scenario the regulation of MLCP, which is a main subject in this work, and expect its downregulation.

      We thank the Reviewer for taking time to review our manuscript, and we really appreciated the positive comments from the Reviewer.

      (2) Using DRG cultures and sciatic nerve crush in the context of MLCK inhibition and down-regulation, authors conclude that MLCK activity is required for mammalian peripheral axon regeneration both in vitro and in vivo (Figure 2).

      The in vitro evidence is of standard methods and convincing. However, here, as well as in all other experiments using siRNAs, it is not clear what the control is about (the identity of the plasmids and sequences, if any).

      We used the pCMV–EGFP–N3 plasmid (Clontech, Inc.) as the control (lines 114-115).

      Related to this, it is not helpful to show the same exact picture as a control example in Figures 2 and 3 (panels J and E, respectively). Either because they should not have received the same control treatment, or simply because it raises concern that there are no other control examples worth showing. In these images, it is also not clear where and how the crush site is determined in the GFP channel. This is of major importance since the axonal length is measured from the presumed crush site. Apart from providing further details in the text, the authors should include convincing images.

      Thank you for your comments. We changed the control example in Figure 3J. For sciatic nerve regeneration experiments, the sciatic nerve was exposed at the sciatic notch by a small incision 2 days after the in vivo electroporation. The nerve was then crushed, and the crush site was marked with an 11-0 nylon epineural suture. After surgery, the wound was closed, and the mice were allowed to recover. Three days after the sciatic nerve crush, the whole sciatic nerves from the perfused animals were dissected out and postfixed overnight in 4% PFA at 4°C. Before whole-mount flattening, we confirmed that the position of the epineural suture matched the injury site, and experiments were included in the analysis only when the crush site was clearly identifiable. Using whole-mounted tissue, all identifiable EGFP-labeled axons in the sciatic nerve were manually traced from the crush site to the distal growth cone to measure the length of axon regeneration (lines 159-164).

      (3) The authors then examined the role of the phosphatase MLCP in axon growth during regeneration. The authors first use a known MLCP blocker, phorbol 12,13-dibutyrate (PDBu), to show that it is able to increase the levels of p-MLC, with a concomitant increase in the extent of axon regrowth of DRG neurons, under both permissive and non-permissive conditions. The authors repeat the experiments using the knockdown of MYPT1, a key component of the MLC-phosphatase, and again can observe a growth-promoting effect (Figure 3).

      The authors further show evidence for the growth-enhancing effect in vivo, in nerve crush experiments. The evidence in vivo deserves more evidence and experimental details (see comment 2). Some key weaknesses of the data were mentioned previously (unclear RNAi controls and duplication of shown images), but in this case, it is also not clear if there is a change only in the extent of growth, or also in the number of axons that are able to regenerate.

      Thank you for your comments. We used the same control as in the in vitro experiments (the pCMV–EGFP–N3 plasmid from Clontech, Inc.), and we also changed the control image in Figure 3J. For in vivo axon regeneration experiments, we measured the lengths of all identifiable EGFP-labeled axons in the sciatic nerve from the crush site to the distal axonal ends. The number of EGFP-labeled regenerating axons is determined by the electroporation rate of EGFP, which is similar, but not identical, in different mice. Thus, our data can only show differences in axon length among the experimental conditions. This approach has been used in many of our previously published papers (e.g. Saijilafu et al. Nature Communications, 2011, Saijilafu et al. Nature Communications, 2013) (lines 152-153).

      (4) In the next set of experiments (presented in Figure 4) authors extend the previous observations in primary cultures from the CNS. For that, they use cortical and hippocampal cultures, and pharmacological and genetic loss-of-function using the above-mentioned strategies. The expected results were obtained in both CNS neurons: inhibition or knockdown of the kinase decreases axon growth, whereas inhibition or knockdown of the phosphatase increases growth. A main weakness in this set is that it is not indicated when (at what day in vitro, DIV) the treatments are performed. This is important to correctly interpret the results, since in the first days in vitro these neurons follow well-characterized stages of development, with characteristic cellular events with relevance to what is being evaluated. Importantly, this would be of value to understand whether the treatments affect axonal specification and/or axonal extension. Although these events are correlated, they imply a different set of molecular events.

      The treatments were started at the beginning of the cell culture period, and this procedure may affect axon specification, as the Reviewer points out. However, our experiments focused mainly on axon length. For quantification of axon length, neurons with processes longer than twice the diameter of the cell body were photographed, and the longest axon of each neuron was measured. We revised the manuscript as suggested by the Reviewer (lines 143-145).
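
      The inclusion rule described above (keep a neuron only if it has a process longer than twice the cell body diameter, then record its longest axon) can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the `Neuron` class, its field names, and the micrometer units are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neuron:
    # Hypothetical container; field names and units are illustrative only.
    soma_diameter_um: float
    process_lengths_um: List[float]


def longest_axon_length(neuron: Neuron) -> Optional[float]:
    """Return the longest process length if the neuron passes the
    'longer than twice the soma diameter' criterion, otherwise None."""
    if not neuron.process_lengths_um:
        return None
    longest = max(neuron.process_lengths_um)
    # Strict inequality: the process must exceed twice the soma diameter.
    if longest > 2 * neuron.soma_diameter_um:
        return longest
    return None


def mean_axon_length(neurons: List[Neuron]) -> Optional[float]:
    """Average the longest-axon length over all neurons that pass the criterion."""
    lengths = [length for n in neurons
               if (length := longest_axon_length(n)) is not None]
    return sum(lengths) / len(lengths) if lengths else None
```

      Neurons that fail the criterion are excluded rather than scored as zero, which is why such a measure reports axon extension and is not informative about axon specification.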

      The title of this section is misleading: line 241 "MLCK/MLCP activity regulated axon growth in the embryonic CNS"... the title (and the conclusion) implies that the experiments were performed in situ, looking at axons in the developing brain. The most accurate title and conclusion should mention that the evidence was collected in CNS primary cultures derived from embryos.

      We have revised the manuscript as suggested by the reviewer (line 251).

      (5) Performing nerve crush injury in CNS nerves (optic nerve and spinal cord), and the local application of PDBu, the author shows contrasting results (Figure 5). In the optic nerve, they can see axons extending beyond the lesion site due to PDBu. On the contrary, the authors fail to observe this in the corticospinal tract present in the spinal cord. The authors fail to discuss this matter in detail. Also, they accommodate the interpretation of the evidence in light of a process known as axon retraction, and its prevention by MLCP inhibition. Since the whole paper is on axon extension, and it is known that mechanistically axon retraction is not merely the opposite of axon extension, the claim needs far more evidence.

      Thank you for your comments. Compared to optic nerve axons, corticospinal tract axons have a lower intrinsic growth capacity. Consistent with this, we observed that PDBu stimulates optic nerve axon regeneration, but we did not detect any enhanced growth of corticospinal tract axons beyond the injury site after spinal cord injury (SCI) following inhibition of myosin light chain phosphatase (MLCP) with PDBu.

      In panel 5F and the supplementary data, the authors mention the occurrence of retraction bulbs, but the images are too small to support the claim, and it is not clear how these numbers were normalized to the number of axons labeled in each condition.

      Thank you for your comments. In this study, we used a method similar to that of Ertürk et al. (2007) to quantify retraction bulbs. Both the maximum width of the enlarged distal tip of the axon and the width of the immediately adjacent axon shaft were measured, and the ratio of these two widths was calculated. An axonal tip was considered a retraction bulb if its tip/shaft ratio exceeded 4. The average number of retraction bulbs was calculated from 3 sections per mouse for each group (n=5) (lines 187-191).

      [Ref] Ertürk A, Hellal F, Enes J, and Bradke F (2007). Disorganized microtubules underlie the formation of retraction bulbs and the failure of axonal regeneration. J. Neurosci 27, 9169–9180. [PubMed:17715353].
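
      The tip/shaft criterion described in this response can be written out as a short sketch. It is only an illustration of the stated rule (ratio of maximum tip width to adjacent shaft width, threshold of 4, counts averaged over sections); the function names, argument names, and example values are assumptions and do not come from the manuscript.

```python
from typing import List


def tip_shaft_ratio(tip_width_um: float, shaft_width_um: float) -> float:
    """Ratio of the maximum width of the distal tip to the width of the
    immediately adjacent axon shaft."""
    if shaft_width_um <= 0:
        raise ValueError("shaft width must be positive")
    return tip_width_um / shaft_width_um


def is_retraction_bulb(tip_width_um: float, shaft_width_um: float,
                       threshold: float = 4.0) -> bool:
    """An axonal tip counts as a retraction bulb only if the ratio
    strictly exceeds the threshold (4 in the response above)."""
    return tip_shaft_ratio(tip_width_um, shaft_width_um) > threshold


def mean_bulbs_per_animal(counts_per_section: List[int]) -> float:
    """Average the retraction bulb counts over the sections of one animal
    (the response uses 3 sections per mouse)."""
    if not counts_per_section:
        raise ValueError("at least one section is required")
    return sum(counts_per_section) / len(counts_per_section)
```

      Note that this count is not normalized to the number of labeled axons in each section; doing so would require an additional per-section axon count.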

      (6) The author combines MLCK and MLCP inhibitors with Bleb, trying to verify if both pairs of inhibitors act on the same target/pathway (Figure 6). The rationale is wrong for at least two reasons.

      a. Because both lines of evidence point to contrasting actions of NMII on axon growth, one approach could never "rescue" the other.

      If MLCK regulates axon growth through the activation of myosin, the inhibitory effect of ML-7 (an MLCK inhibitor) on axon growth should be influenced by Bleb, an NMII inhibitor. However, our findings reveal that the combination of Bleb and ML-7 does not alter the rate of axon outgrowth compared to ML-7 alone. This suggests that the roles of ML-7 and Bleb in axon growth are independent, which implies that MLCK may regulate axon growth independently of NMII activity.

      b. Because the approaches target different steps on NMII activation, one could never "prevent" or rescue the other. For example, for Bleb to provide a phenotype, it should find any p-MLC, because it is only that form of MLC that is capable of inhibiting its ATPase site. In light of this, it is not surprising that Bleb is unable to exert any action in a situation where there is no p-MLC (ML-7, which by inhibiting the kinase drives the levels of p-MLC to zero, Figure 4A). Hence, the results are not possible to validate in the current general interpretation of the authors. (See 'major concern').

      The reported mechanism of blebbistatin is not competition for the ATP binding site of myosin. Instead, it selectively binds to the ATPase intermediate state associated with ADP and inorganic phosphate, which slows phosphate release. Importantly, blebbistatin does not impede myosin's interaction with actin or the ATP-triggered dissociation of actomyosin. Rather, it inhibits the myosin head when it forms a product complex with a reduced affinity for actin. This indicates that blebbistatin functions by stabilizing a particular myosin intermediate state that is independent of the phosphorylation status of myosin light chain (MLC).

      [Ref] Kovács M, Tóth J et al. Mechanism of blebbistatin inhibition of myosin II. J Biol Chem. 2004 Aug 20;279(34):35557-63. doi: 10.1074/jbc.M405319200.

      (7) In Figure 7, the authors argue that the scheme of replating and using ML7 before or after replating is evidence for a local cytoskeletal action of the drug. However, an alternative simpler explanation is that the drug acts acutely on its target, and that, as such, does not "survive" the replating procedure. Hence, the conclusion raised by the evidence shown is not supported.

      In our study, we assessed neuronal survival rates across the experimental groups. The findings indicate no significant variation in survival rates among the groups. This suggests that the drug treatment has no discernible influence on cell viability but primarily modulates axonal elongation.

      Author response image 1.

      (8) In Figure 8, the authors show that the inhibitory treatments on MLCK and MLCP (ML7 and PDBu) alter the morphology of growth cones. However, it is not clear how this is correlated with axon growth. The authors also mention in various parts of the text that a local change in the growth cone is evidence for a local action/activity of the drug or enzyme. However, the local change<->local action is not a logical truth. It can well be that MLCK and MLCP activity trigger molecular events that ultimately have an effect elsewhere, and by looking at "elsewhere" one observes of course a local effect, but not because the direct action of MLCK or MLCP is localized. To prove true localized effects there are numerous efforts that can be made, starting from live imaging, fluorescent sensors, and compartmentalized cultures, just to mention a few.

      Regarding the relationship between growth cone size and growth rate, previously published work found that fast-growing axons tend to have small growth cones (Mason and Wang, 1997). A recent study on Aplysia further supports this by noting that growth cones enlarge significantly when axonal elongation halts (Miller and Suter, 2018). Consistent with these findings, our data indicate that inhibiting MLCP with PDBu treatment leads to a reduction in growth cone size, which in turn promotes axon regeneration.

      [Ref] Mason CA, Wang LC. Growth cone form is behavior-specific and, consequently, position-specific along the retinal axon pathway. J Neurosci. 1997; 13:1086–1100. [PubMed: 8994063]

      [Ref] Miller KE, Suter DM. An Integrated Cytoskeletal Model of Neurite Outgrowth. Front Cell Neurosci. 2018 Nov 26;12:447. doi: 10.3389/fncel.2018.00447. eCollection 2018.

      References:

      (1) Eun-Mi Hur, In Hong Yang, Deok-Ho Kim, Justin Byun, Saijilafu, Wen-Lin Xu, Philip R Nicovich, Raymond Cheong, Andre Levchenko, Nitish Thakor, Feng-Quan Zhou. 2011. Engineering neuronal growth cones to promote axon regeneration over inhibitory molecules. Proc Natl Acad Sci U S A. 2011 Mar 22;108(12):5057-62. doi: 10.1073/pnas.1011258108.

      (2) Garrido-Casado M, Asensio-Juárez G, Talayero VC, Vicente-Manzanares M. 2024. Engines of change: Nonmuscle myosin II in mechanobiology. Curr Opin Cell Biol. 2024 Apr;87:102344. doi: 10.1016/j.ceb.2024.102344.

      (3) Karen A Newell-Litwa, Rick Horwitz, Marcelo L Lamers. 2015. Non-muscle myosin II in disease: mechanisms and therapeutic opportunities. Dis Model Mech. 2015 Dec;8(12):1495-515. doi: 10.1242/dmm.022103.

      Reviewer #2 (Public review):

      Summary:

      Saijilafu et al. demonstrate that MLCK/MLCP proteins promote axonal regeneration in both the central nervous system (CNS) and peripheral nervous system (PNS) using primary cultures of adult DRG neurons, hippocampal and cortical neurons, as well as in vivo experiments involving sciatic nerve injury, spinal cord injury, and optic nerve crush. The authors show that axon regrowth is possible across different contexts through genetic and pharmacological manipulation of these proteins. Additionally, they propose that MLCK/MLCP may regulate F-actin reorganization in the growth cone, which is significant as it suggests a novel strategy for promoting axonal regeneration.

      Strengths:

      This manuscript presents a comprehensive array of experimental models, addressing the biological question in a broad manner. Particularly noteworthy is the use of multiple in vivo models, which significantly strengthens the overall validity of the study.

      We thank the Reviewer for taking time to review our manuscript, and we really appreciated the positive comments from the Reviewer.

      Weaknesses:

      The following aspects apply:

      (1) The manuscript initially references prior research by the authors suggesting that NMII inhibition enhances axonal growth and that MLCK activates NMII. However, the study introduces a contradiction by demonstrating that MLCK inhibition (via ML-7 or siMLCK) inhibits axonal growth. This inconsistency is not adequately addressed or discussed in the manuscript.

      We thank the Reviewer for these helpful comments. As suggested, we now discuss this point in more detail in the revised manuscript (lines 357-368; lines 373-374).

      (2) While the study proposes that MLCK/MLCP regulates F-actin redistribution in the growth cone, the mechanism is not explored in depth. The only figure showing how pharmacological manipulation affects the growth cone suggests that not only F-actin but also the microtubule cytoskeleton might be affected, indicating that the mechanism may not be specific. A deeper exploration of this relationship in DRG neurons, in addition to cortical neurons, as shown in the study, would be beneficial.

      Thank you for your insightful suggestion. However, our study primarily focuses on actin and myosin dynamics in the context of axonal elongation, as indicated by our direct observations in growing dorsal root ganglia (DRGs). Athamneh et al. (2017) elegantly demonstrated that the bulk movement of microtubules (MTs), rather than their assembly, predominantly drives MT advance during axonal elongation. Consequently, our manuscript concentrates on the actomyosin system, which is central to our findings. While the role of MTs in axonal growth is indeed significant and fascinating, the data we present is predominantly concerned with the actomyosin mechanism.

      [Ref] Athamneh, A. I. M. et al. Neurite elongation is highly correlated with bulk forward translocation of microtubules. Scientific Reports 7, (2017).

      (3) In the sciatic nerve injury experiments, it would be crucial to include additional controls that clearly demonstrate that siMYPT1 treatment increases MLCP in the L4-L5 ganglia. Additionally, although the manuscript mentions quantifying axons expressing EGFP, the Materials and Methods section only discusses siMYPT1 electroporation, which could lead to confusion.

      Thank you for your suggestion. However, due to the unavailability of a suitable commercial MLCP antibody, we were unable to directly detect MLCP expression. Instead, we assessed the phosphorylation level of myosin light chain (MLC) as a proxy to indicate that siMYPT1 transfection effectively downregulates MLCP activity in L4/5 dorsal root ganglia (DRG). This approach was taken to ensure the integrity of our findings despite the limitations in antibody availability.

      Regarding the electroporation methods section, we have now included detailed information about the control plasmid used in our experiments to make the experimental setup clear: "A 1 μl solution containing indicated siRNAs together with the plasmid encoding EGFP (pCMV–EGFP–N3) was then microinjected into the L4–L5 DRG…" (lines 152-153).

      (4) In some panels, it is difficult to differentiate the somas from the background (Figures 3, 4, 7). In conditions where images with shorter axonal lengths are represented, it is unclear whether this is due to fewer cells or reduced axonal growth (Figures 2, 4, 6).

      In the original submission, some image quality was lost when converting the TIFF files to PDF. We have improved the image quality in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a number of typos and language errors that should be thoroughly revised. For example, line 219: "It is well known that the opposite role of MLCK and MLCP to regulate the MLC phosphorylation status". The term "opposite role" is vague. Using "opposite roles" and specifying that they are in regulating MLC phosphorylation status clarifies the relationship between MLCK and MLCP. Also, the original phrase "to regulate" was not correctly integrated into the sentence. Rephrasing it to "in regulating" makes the role of MLCK and MLCP clearer.

      We have revised the manuscript as suggested by the reviewer (line 229).

      In the same line, there is a high number of panels that are not referred to in the text or references for panels that have another letter. Just to mention a few:

      - line 199: "(Figure 1F, G)", → BUT figure 1 contains no G panel.

      We have revised the manuscript as suggested by the reviewer (line 209).

      - line 203: "The results showed that ML-7 administration led to a significant reduction in MLC phosphorylation levels (Figure 2A, B) and impaired axonal growth in sensory neurons (Figure 2C, D). → BUT panel C is related to A and B, and only D and E show impaired axonal growth.

      We have revised the manuscript as suggested by the reviewer (lines 214, 215, 217, and 219).

      Reviewer #2 (Recommendations for the authors):

      (1) Improving the quality of the images would significantly strengthen the results presented.

      In the original submission, some image quality was lost when converting the TIFF files to PDF. We have improved the image quality in the revised manuscript.

      (2) The representative images of controls do not always show the same number of cells or axonal growth (e.g., Figure 4).

      We have changed some images as suggested by the reviewer.

      (3) The text has citation errors when referring to the figure labels.

      We have carefully re-examined the manuscript and corrected the figure citation errors. We believe that these revisions have improved the clarity of the manuscript.

      (4) What happens to MLCK levels when MLCP activity is inhibited in the optic nerve?

      Upon analyzing our experimental data, we observed no significant alterations in the protein levels of MLCK when the activity of MLCP was inhibited. This finding suggests that the regulatory mechanisms governing MLCK expression may not be directly influenced by short-term MLCP inhibition. It is plausible that the duration of the inhibition period was insufficient to elicit a detectable change in MLCK expression levels.

      (5) The text in line 266: "In contrast, local PBS administration at the injury site or intravitreal PDBu injection induced little axon regeneration beyond the injury site (Figure 5 A-C)." However, this is not reflected in the figure.

      In our revised manuscript, we have provided a more precise description of our findings: "In contrast, local PBS administration at the injury site or intravitreal PDBu injection did not significantly enhance axon regeneration beyond the injury site (Figure 5 A-C). This observation suggests that only PDBu applied at the injury site (that is, inhibition of MLCP activity within the growth cone) effectively promotes axonal growth." (lines 276-279).

      (6) Line 287: The phrase "Consistent with our previous study" requires a citation to support it.

      We added the reference: "Consistent with our previous study (1), the inhibition of myosin II activity with 25 μM blebbistatin markedly promoted axonal growth (Figure 6A, B)." (line 298)

      (7) Line 333: The paper cited by Yu P et al. (2012) does not mention MLCK or p-MLC, so it appears to be misquoted.

      Thank you for your comment. We rechecked the cited paper and confirmed that the authors provide western blot data in panel C of Supplementary Figure 1, showing that Bleb did not alter the phosphorylation status of MLC.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Dong et al. here have studied the impact of the small Ras-like GTPase Rab10 on the exocytosis of dense core vesicles (DCVs), which are important mediators of neuropeptide signaling in the brain. They use optical imaging to show that lentiviral depletion of Rab10 in cultured mouse hippocampal neurons hampers DCV exocytosis independently of the established defects in neurite outgrowth. They further demonstrate that such defects are paralleled by changes in ER morphology and defective ER-based calcium buffering as well as reduced ribosomal protein expression in Rab10-depleted neurons. Re-expression of Rab10 or supplementation of exogenous L-leucine to restore defective neuronal protein synthesis rescues impaired DCV secretion. Based on these results they propose that Rab10 regulates DCV release by maintaining ER calcium homeostasis and neuronal protein synthesis.

      Strengths:

      This work provides interesting and potentially important new insights into the connection between ER function and the regulated secretion of neuropeptides via DCVs. The authors combine advanced optical imaging with light and electron microscopy, biochemistry, and proteomics approaches to thoroughly assess the effects of Rab10 knockdown at the cellular level in primary neurons. The proteomic dataset provided may be valuable in facilitating future studies regarding Rab10 function. This work will thus be of interest to neuroscientists and cell biologists.

      We appreciate the positive evaluation of our manuscript.

      Weaknesses:

      While the main conclusions of this study are comparably well supported by the data, I see three major weaknesses:

      (1) For some of the data the statistical basis for analysis remains unclear. I.e. is the statistical assessment based on N= number of experiments or n = number of synapses, images, fields of view etc.? As the latter cannot be considered independent biological replicates, they should not form the basis of statistical testing.

      This is an important point, and we agree that multiple samples from the same biological replicate are not independent observations. We reanalyzed all nested data using a linear mixed model and indicated this in the Methods section and the relevant figure legends (Brunner et al., 2022). In brief, biological replicates (individual neuronal cultures) were used as a linear predictor. Outliers were identified and excluded using the ROUT method in GraphPad. A fixed linear regression model was then fitted to the data using the lm() function in R. A one-way ANOVA (analysis of variance) was used to assess whether including the experimental group as a second linear predictor (formula = y ~ Group + Culture) statistically improved the fit compared with a model without group information (formula = y ~ 1 + Culture). Post-hoc analysis was performed using the emmeans() function with Tukey’s adjustment when more than two experimental groups were present. Importantly, our conclusions remain unchanged.
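
      The nested model comparison described in this response (y ~ 1 + Culture versus y ~ Group + Culture) can be illustrated with a small sketch. The authors performed the analysis with lm() and ANOVA in R; the pure-Python version below is an assumption-laden re-expression of the same idea that fits both models by ordinary least squares and returns the F statistic for the added Group term. The row format (dicts with "y", "group", and "culture" keys) and all function names are hypothetical, and outlier removal, p-values, and Tukey-adjusted post-hoc tests are not reproduced.

```python
def design_matrix(rows, factors):
    """Intercept column plus treatment-coded dummy columns for each factor
    (the first level of each factor, in sorted order, is the reference)."""
    levels = {f: sorted({r[f] for r in rows}) for f in factors}
    X = []
    for r in rows:
        x = [1.0]
        for f in factors:
            x += [1.0 if r[f] == lev else 0.0 for lev in levels[f][1:]]
        X.append(x)
    return X


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rss_and_df(rows, factors):
    """Fit y on the given factors by least squares (normal equations);
    return the residual sum of squares and residual degrees of freedom."""
    X = design_matrix(rows, factors)
    y = [r["y"] for r in rows]
    k = len(X[0])
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(k)] for a in range(k)]
    Xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, x)) for x, yi in zip(X, y)]
    return sum(e * e for e in resid), len(y) - k


def group_f_statistic(rows):
    """F statistic for adding Group to a model that already contains Culture."""
    rss0, df0 = rss_and_df(rows, ["culture"])           # y ~ 1 + Culture
    rss1, df1 = rss_and_df(rows, ["group", "culture"])  # y ~ Group + Culture
    return ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
```

      Because Culture is present in both models, variation between cultures is absorbed before the Group term is tested, so repeated measurements from the same culture are not treated as independent evidence for a group difference.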

      (2) As it stands the paper reports on three partially independent phenotypic observations, the causal interrelationship of which remains unclear. Based on prior studies (e.g. Mercan et al 2013 Mol Cell Biol; Graves et al JBC 1997) it is conceivable that defective ER-based calcium signaling and the observed reduction in protein synthesis are causally related. For example, ER calcium release is known to promote pS6K1 phosphorylation, a major upstream regulator of protein synthesis and ribosome biogenesis. Conversely, L-leucine supplementation is known to trigger calcium release from ER stores via IP3Rs. Given the reported impact of Rab10 on axonal transport of autophagosomes and, possibly, lysosomes via JIP3/4 or other mediators (see e.g. Cason and Holzbaur JCB 2023) and the fact that mTORC1, the alleged target of leucine supplementation, is located on lysosomes, which in turn form membrane contacts with the ER, it seems worth analyzing whether the various phenotypes observed are linked at the level of mTORC1 signaling.

      This is a great suggestion that could indeed further clarify the potential interplay between ER-based Ca2+ signaling and protein synthesis. To address this, we assessed the phosphorylation level of S6K1 (pS6K1) in control and Rab10 knockdown (KD) neurons with or without leucine treatment. These data are included in the new Figure 8—figure supplement 1 in the revised manuscript. Our results indicate that pS6K1 levels were not upregulated in Rab10 KD neurons, suggesting that the level of mTORC1 signaling does not differ between wild-type and KD neurons. Furthermore, leucine treatment increased the pS6K1 level, as expected, but this effect was similar in both groups. Hence, we conclude that differences in mTORC1 signaling induced by Rab10 loss are not a major factor in the observed impairment in protein synthesis.

      Author response image 1.

      Rab10 depletion does not upregulate the mTORC1 pathway. (A) Typical immunoblot showing pS6K1 levels in each condition. (B) Quantification of relative pS6K1 levels in each condition. All data are plotted as mean±s.e.m. (C) Control, Control + Leu: N = 2, n = 2, Rab10 KD, Rab10 KD + Leu: N = 2, n = 4.

      (3) The claimed lack of effect of Rab10 depletion on SV exocytosis is solely based on very strong train stimulation with 200 APs, a condition not very well suited to analyze defects in SV fusion. The conclusion that Rab10 loss does not impact SV fusion thus seems premature.

      We agree that 200 APs stimulation might be too strong to detect specific effects on evoked synaptic vesicle release, although this stimulation pattern is an established pattern in hundreds of studies (Emperador-Melero et al., 2018; Granseth et al., 2006; Ivanova et al., 2021; Kwon and Chapman, 2011; Reshetniak et al., 2020). We have toned down our conclusions and clarified in the revised manuscript that Rab10 is dispensable for SV exocytosis evoked by intense stimulations. The corresponding statements in the text have been modified accordingly (p. 5, l. 98, 124) and in figure legend (p. 17, 490).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors assess the function of Rab10 in dense core vesicle (DCV) exocytosis using RNAi and cultured neurons. The authors provide evidence that their knockdown (KD) is effective and that DCV exocytosis is compromised. They also perform proteomic analysis to identify potential pathways that are affected upon KD of Rab10 that may be involved in DCV release. Upon focusing on ER morphology and protein synthesis, the authors conclude that defects in protein synthesis and ER Ca2+ homeostasis contribute to the DCV release defect upon Rab10 KD. The authors claim that Rab10 is not involved in synaptic vesicle (SV) release and membrane homeostasis in mature neurons.

      Strengths:

      The data related to Rab10's role in DCV release seems to be strong and carried out with rigor. While the paper lacks in vivo evidence that this gene is indeed involved in DCV release in a living mammalian organism, I feel the cellular studies have value. The identification of an ER defect upon Rab10 manipulation is not truly novel, but it is a good confirmation of studies performed in other systems. The finding that the DCV release defect and protein synthesis defect seen upon Rab10 KD can be significantly suppressed by leucine supplementation is also a strength of this work.

      We appreciate the positive evaluation of our manuscript.

      Weaknesses:

      The data showing Rab10 is NOT involved in SV exocytosis seems a bit weak to me. Since the proteomic analysis revealed so many proteins involved in SV exo/endocytosis to be affected upon Rab10 KD, it is a bit strange that they didn't see an obvious defect. Perhaps this could have been because of the protocol that the authors used to trigger SV release (I am not an E-phys expert but perhaps this could have been a 'sledge-hammer' manipulation that may mask any subtle defects)? Perhaps the authors can claim that DCV is more sensitive to Rab10 KD than SV, but I am not sure whether the authors should make a strong claim about Rab10 not being important for SV exocytosis.

      We agree that 200 APs stimulation might be too strong to see specific effects on evoked synaptic vesicle release, although this stimulation pattern is an established pattern in hundreds of studies. We have toned down our conclusions and clarified in the revised manuscript that Rab10 is dispensable for SV exocytosis evoked by intense stimulations. The corresponding statements in the text have been modified accordingly (p. 5, l. 98, 124) and in figure legend (p. 17, 490).

      Also, the authors mention "Rab10 does not regulate membrane homeostasis in mature neurons" but I feel this is an overstatement. Since the authors only performed KD experiments, not knock-out (KO) experiments, I believe they should not make any conclusion about it not being required, especially since there is some level of Rab10 present in their cells. If they want to make these claims, I believe the authors will need to perform conditional KO experiments, which are not performed in this study.

      This is a valid point. We have changed the statement to “membrane homeostasis in mature neurons was unaffected by Rab10 knockdown” (p. 13, l.376-377).

      Finally, the authors show that protein synthesis and ER Ca2+ defects seem to contribute to the defect but they do not discuss the relationship between the two defects. If the authors treat the Rab10 KD cells with both ionomycin and Leucine, do they get a full rescue? Or is one defect upstream of the other (e.g. can they see rescue of ER morphology upon Leucine treatment)? While this is not critical for the conclusions of the paper, several additional experiments could be performed to clarify their model, especially considering there is no clear model that explains how Rab10, protein synthesis, ER homeostasis, and Ca2+ are related to DCV (but not SV) exocytosis.

      This is an important point and a great suggestion. We have now tested the rescue effects of leucine treatment on ER morphology, as suggested. These data are included in the new Figure 8—figure supplement 2 in the revised manuscript. Our results indicate that the same dose of leucine that rescues DCV fusion and protein translation failed to rescue ER morphology. Hence, the defects in ER morphology appear to be independent of the impaired protein translation.

      Author response image 2.

      Leucine supplementation does not rescue ER morphological deficiency in Rab10 KD neurons. (A) Typical examples showing the KDEL signals in each condition. (B) Quantification of RTN4 intensity in MAP2-positive dendrites. (C) The ratio of neuritic to somatic RTN4 intensity (N/S). All data are plotted as mean±s.e.m. (B, C) Control: N = 3, n = 10; Rab10 KD: N = 3, n = 11; Rab10 KD + Leu: N = 3, n = 11. A one-way ANOVA tested the significance of adding experimental group as a predictor. **** = p<0.0001, ns = not significant.

      Reviewer #3 (Public Review):

      In the submitted manuscript, Dong and colleagues set out to dissect the role of the Rab10 small GTPase on the intracellular trafficking and exocytosis of dense core vesicles (DCVs). While the authors have already shown that Rab3 plays a central role in the exocytosis of DCVs in mammalian neurons, the roles of several other Rab-members have been identified genetically, but their precise mechanism of action in mammalian neurons remains unclear. In this study, the authors use a carefully designed and thoroughly executed series of experiments, including live-cell imaging, functional calcium-imaging, proteomics, and electron microscopy, to identify that the DCV secretion defect upon Rab10 depletion in adult neurons is primarily a result of dysregulated protein synthesis and, to a lesser extent, disrupted intracellular calcium buffering. Given that the full deletion of Rab10 has a deleterious effect on neurons and that Rab10 has a major role in axonal development, the authors cautiously employed the knock-down strategy from 7 DIV, to focus on the functional impact of Rab10 in mature neurons. The experiments in this study were meticulously conducted, incorporating essential controls and thoughtful considerations, ensuring rigorous and comprehensive results.

      We are grateful for the positive evaluation of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The work by Dong et al provides interesting and potentially important new insights into the connection between ER function and the regulated secretion of neuropeptides via DCVs. I suggest that the authors address the following points experimentally to increase the impact of this potentially important study.

      Major points:

      (1) As alluded to above, for some of the data the statistical basis for analysis remains unclear (examples are Figures 1C-F, J,K; Figure 2 1B-D,I-K; Figure 2 - Supplement 1D-F; Figure 2 - Supplement 2J,K, etc). I.e. is the statistical assessment based on N = number of experiments or n = number of synapses, images, fields of view etc.? As the latter cannot be considered independent biological replicates, they should not form the basis of statistical testing. The Ms also misses a dedicated paragraph on statistics in the methods section.

      See reply to reviewer 1 above. We fully agree and solved this point.

      (2) A main weakness of the paper is the missing connection between neuronal protein synthesis, and the observed structural and signaling defects at the level of the ER. I suggest that the authors analyze mTORC1 signaling in Rab10 depleted neurons and under rescue conditions (+Leu or re-expression of Rab10) as ribosome biogenesis is a major downstream target of mTORC1 and mTORC1 activity is related to lysosome position, which may be affected upon rab10 loss -either directly or via effects on the ER that forms tight contacts with lysosomes.

      See reply to reviewer 1 above. We agreed and followed up experimentally.

      (3) Related to the above: Does overexpression of SERCA2 restore normal DCV exocytosis in Rab10-depleted neurons? This would help to distinguish whether calcium storage and release at the level of the ER indeed contribute to the exocytosis defect.

      This is an important point and a great suggestion. We have now tested the rescue effects of overexpression of SERCA2 on DCV fusion. These data are included in the new Figure 8—figure supplement 3 in the revised manuscript. SERCA2 OE failed to rescue the DCV fusion defects in Rab10 KD neurons.

      Author response image 3.

      Overexpression of SERCA2 does not rescue DCV fusion deficits in Rab10 KD neurons. (A) Typical examples showing the SERCA2 signals in each condition. (B) Cumulative plot of DCV fusion events per cell. (C) Summary graph of DCV fusion events per cell. (D) Total number of DCVs (total pool) per neuron, measured as the number of NPY-pHluorin puncta upon NH4Cl perfusion. (E) Fraction of NPY-pHluorin-labeled DCVs fusing during stimulation. All data are plotted as mean±s.e.m. (C-E) Control: N = 2, n = 10; Rab10 KD: N = 2, n = 13; SERCA2 OE: N = 2, n = 15. A one-way ANOVA tested the significance of adding experimental group as a predictor. *** = p<0.001, ** = p<0.01, ns = not significant.

      (4) The claimed lack of effect of Rab10 depletion on SV exocytosis is solely based on very strong train stimulation with 200 APs, a condition not very well suited to analyze defects in SV fusion. The conclusion that Rab10 loss does not impact SV fusion thus seems premature. The authors should conduct additional experiments under conditions of single or few APs (e.g. 4 or 10 APs) to really assess whether or not Rab10 depletion alters SV exocytosis at the level of pHluorin analysis in cultured neurons.

      See reply to reviewer 2 above. We agreed and made textual adjustments to address this point.

      (5) Related to the above: I am puzzled by the data shown in Figure 1H-J: From the pHluorin traces shown I would estimate a tau value of about 20-30 s (e.g. decay to 1/e = 37% of the peak value). The bar graph in Figure 1K claims 3-4 s, clearly clashing with the data shown. Were these experiments conducted at RT (where expected tau values are in the range of 30 s) or at 37°C (one would expect taus of around 10 s in this case for Syp-pH)? I ask the authors to carefully check and possibly re-analyze their datasets.

      This is indeed a mistake. We thank the reviewer for flagging this miscalculation. Our original Matlab script used for calculating the tau value contained an error, and the datasets were normalized twice by mistake. We have now reanalyzed the data and updated the corresponding figures and text. Our conclusion that Rab10 KD does not affect SV endocytosis remains unchanged: both groups were affected by the same systematic error, and the tau values of control (28.5 s) and Rab10 KD (32.8 s) neurons were, and remain, not significantly different.
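
      For readers who want to check a decay constant against the 1/e criterion mentioned by the reviewer, the following is a minimal sketch of one common way to estimate tau from a pHluorin decay phase. It is not the authors' Matlab script: the function name `estimate_tau`, the log-linear fitting approach, and the synthetic trace are all assumptions made for illustration, and the trace is assumed to be baseline-subtracted and normalized exactly once.

```python
import math


def estimate_tau(times, values):
    """Estimate tau by a least-squares line fit to ln(F) = ln(A) - t / tau.

    Assumes a baseline-subtracted, single-exponential decay. Points with
    non-positive fluorescence are skipped because their log is undefined.
    """
    points = [(t, math.log(v)) for t, v in zip(times, values) if v > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx  # slope of ln(F) versus t equals -1 / tau
    return -1.0 / slope


# Synthetic, noise-free decay (hypothetical data, not the measured traces).
true_tau = 28.5                                   # seconds
times = [float(t) for t in range(60)]             # 1 s sampling, 0..59 s
trace = [math.exp(-t / true_tau) for t in times]  # normalized once, peak = 1

tau = estimate_tau(times, trace)
```

      A fit of this kind is sensitive to how the trace is normalized and baseline-corrected beforehand, so a quick visual check that the trace has dropped to roughly 37% of its peak at t = tau is a useful sanity test.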

      (6) How many times was the proteomics experiment shown in Figure 3 conducted? I noticed that the data in panel H missed statistical analysis and error bars. Given the typical variation in these experiments, I suggest to only include data for proteins identified in at least 3 out of 4 experimental replicates.

      We agree that this information has not been clear. We have now explained replication in the Methods section (p. 42, l. 879-885). In brief, the proteomics experiment presented in Fig 3 was conducted with two independent cultures (‘biological replicates’), hence, formally only two independent observations. For each biological replicate, we performed four technical replicates. For our analysis, we only included peptides that were consistently detected across all samples (not only three as this reviewer suggests). Proteins in Panel H are ER-related proteins that are significantly different from control neurons with an adjusted FDR ≤ 0.01 and Log2 fold change ≥ 0.56. The primary purpose of our proteomics experiments was to generate hypotheses and guide subsequent experiments and the main findings were corroborated by other experiments presented in the manuscript.
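
      The selection criteria described above can be sketched as a simple filter. This is an illustration only, not the authors' analysis pipeline: the record layout, the protein names, and the function `select_hits` are hypothetical, and applying the fold-change threshold to the absolute value (so that both up- and down-regulated proteins pass) is an assumption that the response does not state explicitly.

```python
def select_hits(proteins, fdr_max=0.01, log2fc_min=0.56):
    """Return names of proteins that pass all three criteria.

    Criteria: detected in every sample, adjusted FDR <= fdr_max, and
    |log2 fold change| >= log2fc_min (absolute value is an assumption).
    """
    hits = []
    for protein in proteins:
        detected_everywhere = all(protein["detected"])
        significant = protein["fdr"] <= fdr_max
        large_change = abs(protein["log2fc"]) >= log2fc_min
        if detected_everywhere and significant and large_change:
            hits.append(protein["name"])
    return hits


# Hypothetical records; "detected" holds one flag per sample.
example = [
    {"name": "protA", "fdr": 0.005, "log2fc": 0.80, "detected": [True] * 8},
    {"name": "protB", "fdr": 0.005, "log2fc": 0.30, "detected": [True] * 8},
    {"name": "protC", "fdr": 0.050, "log2fc": -1.20, "detected": [True] * 8},
    {"name": "protD", "fdr": 0.001, "log2fc": -0.90, "detected": [True] * 7 + [False]},
    {"name": "protE", "fdr": 0.001, "log2fc": -0.90, "detected": [True] * 8},
]

hits = select_hits(example)

# A log2 fold change of 0.56 corresponds to a linear fold change of 2 ** 0.56.
linear_fold_change = 2 ** 0.56
```

      In linear terms, the 0.56 threshold corresponds to a change of roughly 1.47-fold relative to control.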

      Minor:

      (7) Figure 2 - supplement 3 and Figure 4 - supplement 3 are only mentioned in the discussion. The authors should consider referring to these data in the results section.

      This is a valid point. We have now added a new statement “Moreover, only 10% of DCVs co-transport with Rab10” in the Results (p. 6-7, l. 162-164).

      (8) Was the pHluorin data shown in Figure 1 bleach-corrected? If so, this should be stated somewhere in the Ms. Moreover, the timing of the NH4Cl pulse should be indicated in the scheme in panel I.

      We thank the reviewer for pointing these omissions out. We have now included information about the timing of NH4Cl pulse in panel I. We did not do bleach-correction for the pHluorin data shown in Figure 1. It has been shown that pHluorin is very stable with a bleaching rate in the alkaline state of 0.06% per second and 0.0024% per second in the quenched state (Balaji and Ryan, 2007). Indeed, we did not observe obvious photobleaching in the first 30s during our imaging as indicated by the average trace of pHluorin intensity in panel I.
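
      The size of the expected bleaching over the first 30 s can be estimated from the quoted rates. The sketch below assumes simple first-order (exponential) bleaching at a constant rate, which is an assumption made here for illustration; the function name `fraction_remaining` is hypothetical.

```python
import math


def fraction_remaining(rate_percent_per_s, seconds):
    """Fraction of signal left after `seconds`, assuming first-order bleaching."""
    rate_per_s = rate_percent_per_s / 100.0
    return math.exp(-rate_per_s * seconds)


# Rates quoted from Balaji and Ryan (2007); the 30 s window is from the reply.
alkaline_left = fraction_remaining(0.06, 30.0)     # unquenched (alkaline) state
quenched_left = fraction_remaining(0.0024, 30.0)   # quenched state

alkaline_loss_percent = 100.0 * (1.0 - alkaline_left)
quenched_loss_percent = 100.0 * (1.0 - quenched_left)
```

      Under this assumption, the loss over 30 s is below 2% in the alkaline state and below 0.1% in the quenched state, which is consistent with the statement that no obvious photobleaching was observed.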

      (9) Page 3/ lines 59-60: "...strongest inhibition of neuropeptide accumulation...". What is probably meant is "...strongest inhibition of neuropeptide release".

      We agree this statement is unclear. Sasidharan et al used a coelomocyte uptake assay as an indirect readout for DCV release. The ‘strongest inhibition of neuropeptide accumulation’ in coelomocytes in Rab10 mutants indicates DCV fusion deficits. We have now replaced the text with “Rab10 deficiency produces the strongest inhibition of neuropeptide release in C. elegans” to make it clearer.

      Reviewer #3 (Recommendations For The Authors):

      I strongly recommend the publishing of this study as a VOR with minor comments directed to the authors.

      (1) In Figure 4, the authors should include examples of tubular ER at the synapse, especially as this is an interesting point discussed in ln 226-229. Are there noticeable changes in the ER-mitochondria contacts at the synaptic boutons?

      We agree that examples of tubular ER at the synapse would improve the manuscript. We have now replaced Figure 4A with such examples. We found it challenging to quantify ER-mitochondria contacts based on the electron microscopy (EM) images we currently have. The ER-mitochondria contact sites are quite rare in the cross-sections of our samples, making it difficult to perform a reliable quantitative analysis.

      (2) The limited impairment of calcium-ion homeostasis in Rab10 KD neurons is very interesting. Would the overexpression of Rab10T23N mimic the effect of a KD scenario? Is there a separation of function for Rab10 in calcium homeostasis vs. the regulation of protein synthesis?

      This is an interesting possibility. We tested this and expressed Rab10T23N in a new series of experiments. These data are presented as a new Figure 5 in the revised manuscript (p. 29). We observed that Ca2+ refilling after caffeine treatment was delayed to a similar extent in Rab10T23N-expressing and Rab10 KD neurons. While impaired Ca2+ homeostasis may affect protein synthesis through ER stress or mTORC1 activation, our findings indicate otherwise in Rab10 KD neurons. First, ATF4 levels, a marker of ER stress, were unaffected in Rab10 KD neurons. This indicates that any ER stress present is minimal or insufficient to significantly impact protein synthesis through this pathway. Second, we did not observe significant changes in mTORC1 activation in Rab10 KD neurons as indicated by a normal pS6K1 phosphorylation (see above). Based on these observations, we conclude that Rab10's roles in calcium homeostasis and protein synthesis are most likely separate.

      (3) The authors indicate that the internal release of calcium ions from the ER has no effect on DCV trafficking and fusion without showing the data. It is important to include this data as the major impact of the study is the dissecting of the calcium effects in mammalian neurons from the previous studies in invertebrates.

      We agree this is an important aspect in our reasoning. We are submitting the related manuscript on internal calcium stores to bioRxiv. The link will be added to the consolidated version of our manuscript.

      (4) The distinction between Rab3 and Rab10 co-trafficking on DCVs should be reported in the Results (currently, Figure 2 - supplement 3 is only mentioned in the Discussion) as it helps to understand the effects on DCV fusion.

      We agree. We now added a new statement “Moreover, only 10% of DCVs co-transport with Rab10” in the Results (p. 6, l. 162-163).

      Reference:

      Balaji, J., Ryan, T.A., 2007. Single-vesicle imaging reveals that synaptic vesicle exocytosis and endocytosis are coupled by a single stochastic mode. Proceedings of the National Academy of Sciences 104, 20576–20581. https://doi.org/10.1073/pnas.0707574105

      Brunner, J.W., Lammertse, H.C.A., Berkel, A.A. van, Koopmans, F., Li, K.W., Smit, A.B., Toonen, R.F., Verhage, M., Sluis, S. van der, 2022. Power and optimal study design in iPSC-based brain disease modelling. Molecular Psychiatry 28, 1545. https://doi.org/10.1038/s41380-022-01866-3

      Emperador-Melero, J., Huson, V., van Weering, J., Bollmann, C., Fischer von Mollard, G., Toonen, R.F., Verhage, M., 2018. Vti1a/b regulate synaptic vesicle and dense core vesicle secretion via protein sorting at the Golgi. Nat Commun 9, 3421. https://doi.org/10.1038/s41467-018-05699-z

      Granseth, B., Odermatt, B., Royle, S.J., Lagnado, L., 2006. Clathrin-Mediated Endocytosis Is the Dominant Mechanism of Vesicle Retrieval at Hippocampal Synapses. Neuron 51, 773–786. https://doi.org/10.1016/j.neuron.2006.08.029

      Ivanova, D., Dobson, K.L., Gajbhiye, A., Davenport, E.C., Hacker, D., Ultanir, S.K., Trost, M., Cousin, M.A., 2021. Control of synaptic vesicle release probability via VAMP4 targeting to endolysosomes. Science Advances 7, eabf3873. https://doi.org/10.1126/sciadv.abf3873

      Kwon, S.E., Chapman, E.R., 2011. Synaptophysin Regulates the Kinetics of Synaptic Vesicle Endocytosis in Central Neurons. Neuron 70, 847–854. https://doi.org/10.1016/j.neuron.2011.04.001

      Reshetniak, S., Fernández-Busnadiego, R., Müller, M., Rizzoli, S.O., Tetzlaff, C., 2020. Quantitative Synaptic Biology: A Perspective on Techniques, Numbers and Expectations. International Journal of Molecular Sciences 21, 7298. https://doi.org/10.3390/ijms21197298

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work analyzes how specialized cells in the auditory system, known as the octopus cells, can detect coincidences in their inputs at the submillisecond time scale. While previous work indicated that these cells receive no inhibitory inputs, the present study unambiguously demonstrates that these cells receive inhibitory glycinergic inputs. The physiologic impact of these inputs needs to be studied further. It remains incomplete at present but could be made solid by addressing caveats related to similar sizes of excitatory postsynaptic potentials and spikes in the octopus neurons.

      We apologize for not explicitly describing our experimental methods and analyses procedures that ensure the discrimination between action potentials and EPSPs. This has been addressed in responses to reviewer comments and amended in the manuscript.

      Reviewer #1 (Public Review):

      Kreeger and colleagues have explored the balance of excitation and inhibition in the cochlear nucleus octopus cells of mice using morphological, electrophysiological, and computational methods. On the surface, the conclusion, that synaptic inhibition is present, does not seem like a leap. However, the octopus cells have been in the past portrayed as devoid of inhibition. This view was supported by the seeming lack of glycinergic fibers in the octopus cell area and the lack of apparent IPSPs. Here, Kreeger et al. used beautiful immunohistochemical and mouse genetic methods to quantify the inhibitory and excitatory boutons over the complete surface of individual octopus cells and further analyzed the proportions of the different subtypes of spiral ganglion cell inputs. I think the analysis stands as one of the most complete descriptions of any neuron, leaving little doubt about the presence of glycinergic boutons.

      Kreeger et al then examined inhibition physiologically, but here I felt that the study was incomplete. Specifically, no attempt was made to assess the actual, biological values of synaptic conductance for AMPAR and GlyR. Thus, we don't really know how potent the GlyR could be in mediating inhibition. Here are some numbered comments:

      (1) "EPSPs" were evoked either optogenetically or with electrical stimulation. The resulting depolarizations are interpreted to be EPSPs. However previous studies from Oertel show that octopus cells have tiny spikes, and distinguishing them from EPSPs is tricky. No mention is made here about how or whether that was done. Thus, the analysis of EPSP amplitude is ambiguous.

      We agree that large EPSPs can be difficult to distinguish from an octopus cell’s short spikes during experiments. During analysis, we distinguished spikes from EPSPs by generating phase plots, which allow us to visualize the first derivative of the voltage trace on the y-axis and the value of the voltage on the x-axis at each moment in time. In the example shown below, four depolarizing events were electrically evoked in an octopus cell (panel A). The largest of these events (shown in orange in panels B-D) has an amplitude of ~9mV and could be a small spike. The first derivative of the voltage (panel C) reveals a bi-phasic response in the larger orange trace, where during the rising phase (mV/ms > 0) of the EPSP there is a second, sharper rising phase for the spike. Like more traditionally sized action potentials, phase plots for octopus cell spikes also reveal a sharp change in the rate of voltage change over time (Author response image 1 panel D, ✱) after the rising action of the EPSP begins to slow. EPSPs (shown in blue in panels B-D) lack the deflection in the phase plot. Not all cases were as unambiguous as this example. Therefore, our analysis only included subthreshold stimulation that unambiguously evoked EPSPs, not spikes. A brief description of this analysis has been added to the methods text (lines 625-627) and we have noted in the results section that both ChR2-evoked and electrically-evoked stimulation can produce small action potentials, which were excluded from analysis (lines 156-158).

      Author response image 1.
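
      The phase-plot idea described above (dV/dt on the y-axis against V on the x-axis) can be sketched as follows. This is an illustration on synthetic waveforms, not the authors' analysis code: the Gaussian-shaped events, the 1 mV/ms threshold, and the helper `count_rising_peaks` are assumptions, and counting peaks in dV/dt is only one hypothetical way to flag a second, sharper rising phase.

```python
import math


def derivative(values, dt):
    """dV/dt by central differences (one-sided at the two ends)."""
    n = len(values)
    d = [0.0] * n
    d[0] = (values[1] - values[0]) / dt
    d[-1] = (values[-1] - values[-2]) / dt
    for i in range(1, n - 1):
        d[i] = (values[i + 1] - values[i - 1]) / (2.0 * dt)
    return d


def phase_plot(values, dt):
    """Return (V, dV/dt) pairs: x is voltage, y is its first derivative."""
    return list(zip(values, derivative(values, dt)))


def count_rising_peaks(values, dt, min_rate=1.0):
    """Count local maxima of dV/dt that exceed min_rate (mV/ms).

    Hypothetical heuristic: a single peak suggests a plain EPSP, while a
    second peak suggests a sharper rising phase riding on the EPSP.
    """
    d = derivative(values, dt)
    return sum(
        1
        for i in range(1, len(d) - 1)
        if d[i] > min_rate and d[i] > d[i - 1] and d[i] >= d[i + 1]
    )


def bump(t, amplitude, center, width):
    """Gaussian-shaped voltage deflection in mV (synthetic waveform)."""
    return amplitude * math.exp(-((t - center) ** 2) / (2.0 * width ** 2))


dt = 0.01                                   # ms
times = [i * dt for i in range(401)]        # 0..4 ms
epsp = [bump(t, 5.0, 2.0, 0.5) for t in times]
spike_like = [v + bump(t, 4.0, 1.9, 0.08) for t, v in zip(times, epsp)]

epsp_peaks = count_rising_peaks(epsp, dt)
spike_peaks = count_rising_peaks(spike_like, dt)
epsp_phase = phase_plot(epsp, dt)           # ready to plot as x = V, y = dV/dt
```

      On real recordings, noise would need to be filtered before differentiating, and ambiguous events are better excluded than classified, as the authors describe.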

      (2) For this and later analysis, a voltage clamp of synaptic inputs would have been a simple alternative to avoid contaminating spikes or shunts by background or voltage-gated conductances. Yet only the current clamp was employed. I can understand that the authors might feel that the voltage clamp is 'flawed' because of the failure to clamp dendrites. But that may have been a good price to pay in this case. The authors should have at least justified their choice of method and detailed its caveats.

      We agree that data collected using voltage-clamp would have eliminated the confound of short action potentials and avoided the influence of voltage-gated conductances. The large-diameter and comparatively simple dendritic trees of octopus cells make them good morphological candidates for reliable voltage clamp. However, as suggested, we were concerned that the abundance of channels open at the neuron’s resting potential would make it difficult to sufficiently clamp dendrites. Ultimately, given the low input resistances of octopus cells and the fast kinetics of excitatory inputs, we determined that bad voltage clamp conditions were likely to result in unclamped synaptic events with unpredicted distortions in kinetics and attenuation (To et al. 2022; PMID: 34480986; DOI: 10.1016/j.neuroscience.2021.08.024). We therefore chose to focus our efforts on current-clamp.

      Beyond the limits of both current-clamp and voltage-clamp, we chose to leave all conductances that influence EPSP dendritic propagation intact because our model demonstrates that active Kv and leak conductances shape and attenuate synaptic inputs as they travel through the dendritic tree (Supp. Fig. 4F-G). The addition of voltage-clamp recordings would not impact the conclusions we make about EPSP summation at the soma. Future studies will need to focus on a dendrite-centric view of local excitatory and inhibitory summation. For dendrite-centric experiments, dendritic voltage-clamp recordings are well suited to answer that set of questions.

      (3) The modeling raised several concerns. First, there is little presentation of assumptions, and of course, a model is entirely about its assumptions. For example, what excitatory conductance amplitudes were used? The same for inhibitory conductance? How were these values arrived at? The authors note that EPSGs and IPSGs had peaks at 0.3 and 3 ms. On what basis were these numbers obtained? The model's conclusions entirely depend on these values, and no measurements were made here that could have provided them. Parenthetical reference is made to Figure S5 where a range of values are tested, but with little explanation or justification.

      We apologize for not providing this information. We used our octopus neuron model to fit both EPSP and IPSP parameters to match experimental data. We have expanded the methods to include final values for the conductances (lines 649-651), which were adjusted to match experimental values seen in current-clamp recordings. We have also expanded the results section to describe each of the parameters we tuned (lines 203-222). An example of these adjustments is illustrated in Fig. 4F where the magnitude of inhibitory potentials at different conductances (100nS and 1nS) was compared to experimental data over a range of octopus cell input resistance conditions. Kinetic parameters were determined by aligning modeled PSPs to the rise times and full width at half maximum (FWHM) measurements from experiments under control and Kv block conditions. The experimental data for EPSPs and IPSPs that was used to fit the model is shown in Author response image 2 below.

      Author response image 2.
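
      The two kinetic measurements used for fitting, rise time and full width at half maximum (FWHM), can be computed from a sampled PSP as sketched below. This is an illustrative sketch, not the authors' code: it assumes a positive-going, baseline-subtracted waveform (an IPSP would be inverted first), uses the 10-90% convention for rise time (the response does not state which convention was used), and all function names are hypothetical.

```python
def rising_cross(times, values, level, peak_idx):
    """Interpolated time at which the trace first rises through `level`."""
    for i in range(peak_idx):
        low, high = values[i], values[i + 1]
        if low < level <= high:
            frac = (level - low) / (high - low)
            return times[i] + frac * (times[i + 1] - times[i])
    raise ValueError("level not crossed on the rising phase")


def falling_cross(times, values, level, peak_idx):
    """Interpolated time at which the trace first falls below `level`."""
    for i in range(peak_idx, len(values) - 1):
        high, low = values[i], values[i + 1]
        if high >= level > low:
            frac = (high - level) / (high - low)
            return times[i] + frac * (times[i + 1] - times[i])
    raise ValueError("level not crossed on the falling phase")


def psp_kinetics(times, values):
    """Return (fwhm, rise_10_90) in the units of `times`."""
    peak_idx = max(range(len(values)), key=lambda i: values[i])
    peak = values[peak_idx]
    half = 0.5 * peak
    fwhm = (falling_cross(times, values, half, peak_idx)
            - rising_cross(times, values, half, peak_idx))
    rise = (rising_cross(times, values, 0.9 * peak, peak_idx)
            - rising_cross(times, values, 0.1 * peak, peak_idx))
    return fwhm, rise


# Synthetic triangular "PSP": rises 0 -> 10 mV in 1 ms, decays to 0 by 3 ms.
times = [i / 10 for i in range(31)]                      # ms
values = [10 * t if t <= 1.0 else 5 * (3.0 - t) for t in times]

fwhm, rise = psp_kinetics(times, values)
```

      Linear interpolation between samples keeps the estimates from being quantized to the sampling interval, which matters for events as brief as those in octopus cells.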

      (4) In experiments that combined E and I stimulation, what exactly were time courses of the conductance changes, and how 'synchronous' were they, given the different methods to evoke them? (had the authors done voltage clamp they would know the answers).

      We chose to focus data collection on voltage changes at the soma under physiological conditions to better understand how excitation and inhibition integrate at the somatic compartment. Our conclusions in the combined E and I stimulation experiments require the resting membrane properties of octopus cells to be intact to make physiologically-relevant conclusions. Our current-clamp data includes the critical impact of leak, Kv, and HCN conductances on this computation. Reliable voltage-clamp would necessitate the removal of the Kv and HCN conductances that shape PSP magnitude, shape, and speed. Because it was not necessary to measure the conductances and kinetics of specific channels, we chose to use current-clamp.

      Evoked IPSPs and EPSPs had cell-to-cell variability in their latencies to onset. Somatically-recorded optically-evoked inhibition under pharmacological conditions that changed cable properties had onset latencies between 2.5 and 4.3ms; electrically-evoked excitation under control conditions had latencies between 0.8 and 1.4ms. To overcome cell-to-cell timing variabilities, we presented a shuffled set of stimulation pairings that had a 3ms range of timings with 200µs intervals. As the evoked excitation and inhibition become more ‘synchronous’, the impact on EPSP magnitude and timing is greatest. Data presented in this paper were for the stimulation pairings that evoke a maximal shift in EPSP timing. On average, this occurred when the optical stimulation began ~1.2ms before electrical stimulation. Stimulation pairing times ranged between a 0ms offset and a 1.8ms offset at the extremes. An example of the shuffled stimulation pairings is shown in Author response image 3 below, and we have included information about the shuffled stimulus in the methods (lines 627-630).

      Author response image 3.
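
      A shuffled set of stimulus pairings spanning a 3 ms range in 200 µs steps can be generated as sketched below. This is an illustration only: the lower bound of the range, the sign convention for the offset, the fixed random seed, and the function names are assumptions, and the measurement values in the usage example are placeholders rather than data.

```python
import random


def shuffled_offsets_us(first_us=-1500, span_us=3000, step_us=200, seed=0):
    """Return all pairing offsets (in microseconds) in shuffled order.

    Integer microseconds avoid floating-point drift in the 200 us steps.
    `first_us` is a hypothetical lower bound; only the 3 ms span and the
    200 us step come from the text.
    """
    offsets = list(range(first_us, first_us + span_us + 1, step_us))
    rng = random.Random(seed)   # seeded so the presentation order is reproducible
    rng.shuffle(offsets)
    return offsets


def offset_with_max_shift(shift_by_offset):
    """Pick the offset whose measured EPSP timing shift is largest in size."""
    return max(shift_by_offset, key=lambda offset: abs(shift_by_offset[offset]))


offsets = shuffled_offsets_us()

# Placeholder measurements (ms of EPSP timing shift per offset), not real data.
placeholder_shifts = {0: 0.1, 1200: -0.4, 1800: 0.2}
best = offset_with_max_shift(placeholder_shifts)
```

      Presenting the offsets in shuffled order helps keep slow drifts in the recording from being confounded with the timing variable.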

      (5) Figure 4G is confusing to me. Its point, according to the text, is to show that changes in membrane properties induced by a block of Kv and HCN channels would not be expected to alter the amplitudes of EPSCs and IPSCs across the dendritic expanse. Now we are talking about currents (not shunting effects), and the presumption is that the blockers would alter the resting potential and thus the driving force for the currents. But what was the measured membrane potential change in the blockers? Surely that was documented. To me, the bigger concern (stated in the text) is whether the blockers altered exocytosis, and thus the increase in IPSP amplitude in blockers is due BOTH to loss of shunting and increase in presynaptic spike width. Added to this is that 4AP will reduce the spike threshold, thus allowing more ChR2-expressing axons to reach the threshold. Figure 4G does not address this point.

      These are valuable points that motivated us to improve the clarity of this figure and the corresponding text. We discussed two separate points in this paragraph and were not clear. Our intention with Figure 4G was to address concerns that using pharmacological blockers changes driving forces and may confound the measured change in magnitude of postsynaptic potentials. Membrane potentials hyperpolarized by approximately 8-10 mV after application of blockers. We corrected for this effect by adding a holding current to depolarize the neuron to its baseline resting potential. Text in the results (lines 187-190) and figure legends have been changed to clarify these points.

      We also removed any discussion of presynaptic effects from this portion of the text because our description was incomplete and we did not directly collect data related to these claims. We originally wrote, “While blocking Kv and HCN allowed us to reveal IPSPs at the soma, 4-AP increases the duration of the already unphysiological ChR2-evoked presynaptic action potential (Jackman et al., 2014; DOI: 10.1523/jneurosci.4694-13.2014), resulting in altered release probabilities and synaptic properties, amongst other caveats (Mathie et al., 1998; DOI: 10.1016/S0306-3623(97)00034-7)”. Ultimately, effects on exocytosis, presynaptic excitability, or release probability are only relevant for the experiments presented in Figure 4. Figure 4 serves as evidence that synaptic release of glycine elicits strychnine-sensitive inhibitory postsynaptic potentials in octopus cells. Concerns of presynaptic effects do not carry over to the data presented in Figure 5, as Kv and HCN were not blocked in these experiments. Therefore, we have removed this portion of the text.

      (6) Figure 5F is striking as the key piece of biological data that shows that inhibition does reduce the amplitude of "EPSPs" in octopus cells. Given the other uncertainties mentioned, I wondered if it makes sense as an example of shunting inhibition. Specifically, what are the relative synaptic conductances, and would you predict a 25% reduction given the actual (not modeled) values?

      We agree that both shunting and hyperpolarizing inhibition could play a role in the measured EPSP changes. Because we focused data collection on voltage changes at the soma under physiological conditions, we cannot calculate the relative synaptic conductances. Together, our experimental current-clamp results paired with estimates from the model provide compelling evidence for the change we observe in EPSPs. The relative weights of the synaptic conductances are a very interesting question, but this information is not necessary to answer the questions posed in this study, namely the impact of dendritic inhibition on the arrival of EPSPs in the soma.

      (7) Some of the supplemental figures, like 4 and 5, are hardly mentioned. Few will glean anything from them unless the authors direct attention to them and explain them better. In general, the readers would benefit from more complete explanations of what was done.

      We apologize for not fully discussing these figures in the results text. We have fully expanded the results section to detail the experiments and results presented in the supplement (lines 203-238).

      Reviewer #2 (Public Review):

      Summary:

      Kreeger et al. provided mechanistic evidence for flexible coincidence detection of auditory nerve synaptic inputs by octopus cells in the mouse cochlear nucleus. The octopus cells are specialized neurons that can fire repetitively at very high rates (> 800 Hz in vivo), yield responses dominated by the onset of sound for simple stimuli, and integrate auditory nerve inputs over a wide frequency span. Previously, it was thought that octopus cells received little inhibitory input, and their integration of auditory input depended principally on temporally precise coincidence detection of excitatory auditory nerve inputs, coupled with a low input resistance established by high levels of expression of certain potassium channels and hyperpolarization-activated channels.

      In this study, the authors used a combination of numerous genetic mouse models to characterize synaptic inputs and enable optogenetic stimulation of subsets of afferents, fluorescent microscopy, detailed reconstructions of the location of inhibitory synapses on the soma and dendrites of octopus cells, and computational modeling, to explore the importance of inhibitory inputs to the cells. They determined through assessment of excitatory and inhibitory synaptic densities that spiral ganglion neuron synapses are densest on the soma and proximal dendrite, while glycinergic inhibitory synaptic density is greater on the dendrites compared to the soma of octopus cells. Using different genetic lines, the authors further elucidated that the majority of excitatory synapses on the octopus cells are from type 1a spiral ganglion neurons, which have low response thresholds and high rates of spontaneous activity. In the second half of the paper, the authors employed electrophysiology to uncover the physiological response of octopus cells to excitatory and inhibitory inputs. Using a combination of pharmacological blockers, in vitro cellular recordings, and computational modeling, the authors conclude that glycine in fact evokes IPSPs in octopus cells; these IPSPs are largely shunted by the high membrane conductance of the cells under normal conditions and thus were not clearly evident in prior studies. Pharmacological experiments point towards a specific glycine receptor subunit composition. Lastly, Kreeger et al. demonstrated with in vitro recordings and computational modeling that octopus cell inhibition modulates the amplitude and timing of dendritic spiral ganglion inputs to octopus cells, allowing for flexible coincidence detection.

      Strengths:

      The work combines a number of approaches and complementary observations to characterize the spatial patterns of excitatory and inhibitory synaptic input, and the type of auditory nerve input to the octopus cells. The combination of multiple mouse lines enables a better understanding of, and helps to define, the pattern of synaptic convergence onto these cells. The electrophysiology provides excellent functional evidence for the presence of the inhibitory inputs, and the modeling helps to interpret the likely functional role of inhibition. The work is technically well done and adds an interesting dimension related to the processing of sound by these neurons. The paper is overall well written, and the experimental tests are well motivated and easy to follow. The discussion is reasonable and touches on both the potential implications of the work as well as some caveats.

      Weaknesses:

      While the conclusions presented by the authors are solid, a prominent question remains regarding the source of the glycinergic input onto octopus cells. In the discussion, the authors claim that there is no evidence for D-stellate, L-stellate, and tuberculoventral cell (all local inhibitory neurons of the ventral and dorsal cochlear nucleus) connections to octopus cells, and cite the relevant literature. An experimental approach will be necessary to properly rule out (or rule in) these cell types and others that may arise from other auditory brainstem nuclei. Understanding which cells provide the inhibitory input will be an essential step in clarifying its roles in the processing of sound by octopus cells.

      We are glad that the reviewer agrees with the conclusions we have made and is interested in learning more about how these findings impact sound processing. We agree that defining the source of inhibition will dramatically shape our understanding of the computation octopus cells are making. However, this is not an easy task, given the small size of the octopus cell area, and will involve considerable additional work. Since the overall findings do not depend on knowing the source of inhibition, we have instead re-written the discussion to clarify the lack of evidence for intrinsic inhibitory inputs to octopus cells, in addition to presenting likely candidates. As genetic profiles of cochlear nucleus and other auditory brainstem neurons become available, we intend to make and utilize genetic mouse models to answer questions like this.

      The authors showed that type 1a SGNs are the most abundant inputs to octopus cells via microscopy. However, in Figure 3 they compare optical stimulation of all classes of ANFs, then compare this against stimulation of type 1b/c ANFs. While a difference in the paired-pulse ratio (and therefore, likely release probability) can be inferred by the difference between Foxg1-ChR2 and Ntng1-ChR2, it would have been preferable to have specific data with selective stimulation of type 1a neurons.

      We agree that complete genetic access to only the Ia population would have been the preferable approach, but we did not have an appropriate line when beginning these experiments. Because our results did not suggest a meaningful difference between the populations, we did not pursue further investigation once a line was available.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Besides the points mentioned in the main review:

      Minor

      (1) I really like the graphics and the immunohistological presentation.

      (2) Lines 316-319 say that octopus cells lack things like back-propagating spikes and dendritic Ca spikes. How do you know this?

      This statement was intended to be a summary of suggestions from the literature and lacked references and context as written. We have rewritten this section and clarified that our hypothesis was formed from data found in the literature (lines 334-337).

      (3) Spectrograms of Figure 6A...where were these data obtained?

      We recorded and visualized human-generated rhythmic tapping and high-frequency squeaking sounds using Audacity. The visualizations of rhythmic tapping and imitated vocalizations are meant to show two different types of multi-frequency stimuli we hypothesize would result in somatic summation within an octopus cell’s spike integration window, despite differences in timing. We rewrote the figure legend to explain more clearly what is shown and how it relates to the model in Figure 6.

      (4) 'on-path' and 'off-path' seem like jargon that may not be clear to the average reader.

      Thank you for pointing out our use of unapproachable jargon. We have replaced the terms in the figure with “proximal” and “distal” inhibition. In the main text, we now describe on-path and off-path together as the effect of the location of dendritic inhibition on somatically recorded EPSPs.

      (5) The paper could benefit from a table of modeled values.

      We have added specific details about the modelling in the text and clarified which modeled values were referenced from previous computational models and which were tuned to fit experimental data. Since most values were taken from a referenced publication, we did not add a table and instead point readers towards that source.

      (6) Figure S4A-C what currents were delivered to the modeled cells?

      The model cells were injected with a -0.8 nA DC current for 300 ms in current clamp mode. This information has been added to the figure legend.

      (7) In that figure "scaling factors" scale exactly which channels?

      The scaling factor is applied to the low-voltage-activated K<sup>+</sup> (ḡ<sub>KLT</sub>), high-threshold K<sup>+</sup> (ḡ<sub>KHT</sub>), fast transient K<sup>+</sup> (ḡ<sub>KA</sub>), and hyperpolarization-activated cyclic nucleotide-gated (HCN; ḡ<sub>h</sub>) conductances, but not to the fast Na<sup>+</sup> (ḡ<sub>Na</sub>) or leak K<sup>+</sup> (ḡ<sub>leak</sub>) conductances. This information has been added to the text (lines 205-208 and 646-653).

      (8) In performing and modeling Kv/HCN block, do you know how complete the level of the block is?

      Since we cannot assess how complete the level of block is, we have changed the language in the text to clarify that we are reducing Kv and HCN channel conductance to the degree needed to increase resistance of the neuron (line 185).

      (9) More on this Figure S4. It is hardly referred to in the text except to say that it supports that blocking the Kv/HCN channels will enhance the IPSP. Given how large the figure is, can you offer more of a conclusion than that? Also, in the synaptic model in that figure, the IPSCs are presumably happening in current-clamp conditions, and the reduction in amplitude of the IPSC (as opposed to the increase in IPSP) is due to hyperpolarization. Can you simply state that so readers can track what this figure is showing? Other similar things: what is a transfer impedance? How is it measured? What do we take from the analysis?

      We have elaborated on our description of both Supp. Fig. 4 and Supp. Fig. 5 in the results section of the text (lines 203-238).

      (10) Figure S5 also needs a better explanation. E.g., in C-D, what does 'average' mean? The gray is an SD of this average? You modeled a range of values...but which ones are physiological? To me, this is a key point.

      We have elaborated on our description of both Supp. Fig. 4 and Supp. Fig. 5 in the results section of the text (lines 203-238).

      Reviewer #2 (Recommendations For The Authors):

      General:

      The images and 3-D reconstructions are visually stunning, but they are not colorblind-friendly and in some cases, hard to distinguish. This shows up particularly in the green and blue colors used in Figure 1. Also, better representative images could be used for Figure 1B.

      Thank you for pointing out that blue and green were difficult to distinguish in Figure 1H. We have outlined the green inhibitory puncta in this image to make them more distinguishable. We have also increased the resolution of the image in Figure 1B for better clarity. All other colors are selected from Wong, 2011 (PMID: 21850730; DOI: https://doi.org/10.1038/nmeth.1618).

      Supplemental Figure 1D: The low-power view is good to have, but the CN is too small and the image appears a bit noisy. An inset showing the CN on a larger scale (higher resolution image?) would be more convincing. In this image, I see what appear to be cells in the DCN labeled, which calls into question the purity of the source of optogenetic synaptic activation. It is also difficult to tell whether there are other cells labeled in the VCN. Such inputs would still be minor, but it would be good to be very clear about the expression pattern.

      To offer more information about the activity of the Ntng1<sup>Cre</sup> line in other regions of the auditory system, we increased the resolution of the image included in Supp. Fig. 1D and have also included an additional image (Supp. Fig. 1E) of a coronal section of the cochlear nucleus complex with Ntng1-tdT labelling. This image provides additional context for the cells labeled in the DCN. The text in the figure legend has been changed to clarify that some cells in the DCN were labeled (lines 118-120).

      We agree that in the Ntng1<sup>Cre</sup> experiments, there is the possibility of minor contamination from excitatory cells that express ChR2 outside of the spiral ganglion. This is also true for our Foxg1<sup>Cre</sup> and Foxg1<sup>Flp</sup> experiments, because these lines label cortical cells in addition to cochlear cells. However, we do not observe direct descending inputs from the cortex into the PVCN, making contamination from other Foxg1<sup>Cre</sup>-positive neurons unlikely. While non-cochlear inputs from the Ntng1<sup>Cre</sup> line are possible, evidence from both lines gives us confidence that we are not capturing inputs to octopus cells outside the cochlea. Central axons from Type I spiral ganglion neurons have VGLUT1+ synaptic terminals. When comparing the overlap between VGLUT1+ terminals and Foxg1-tdT labelling, we see full coverage. That is, all VGLUT+ terminals on octopus cells are co-labelled by Foxg1<sup>Cre</sup>-mediated expression of tdTomato. An example image is shown below. Here, an octopus cell soma is labeled with blue fluorescent Nissl stain and inputs to the cochlear nucleus complex are labeled with Foxg1<sup>Cre</sup>-dependent tdTomato (Foxg1-tdT; magenta). We have also immunolabeled for VGLUT1 puncta in green. This eliminates the possibility that VGLUT+ cells from outside the cochlea and cortex are sources of excitation to octopus cells.

      Author response image 4.

      Further, we have looked at expression of Ntng1-tdT and Foxg1-EYFP together in the octopus cell area. An example image is shown below. All Ntng1-tdT+ fibers (magenta) are also Foxg1-EYFP+ (green), suggesting that all Ntng1<sup>Cre</sup>-targeted inputs to octopus cells are a part of the Foxg1<sup>Cre</sup>-targeted input population, which are very likely to only be from the cochlea. We have expanded the results section to include information about the overlap in expression driven by the Ntng1<sup>Cre</sup> and Foxg1<sup>Flp</sup> lines.

      Author response image 5.

      Supplemental Figure 2 G: These are a bit hard to read. Perhaps use a different image, or provide a reference outline drawing telling us what is what.

      We have used a different image with a Thy1-YFP labeled octopus cell for clarity.

      In some places, the term "SGN" is used when referencing the axons and terminals within the CN, and without some context, this was occasionally confusing (SGN would seem to refer to the cell bodies). In some places in the text, it may be preferable to separate SGN, auditory nerve fibers (ANFs), and terminals, as entities for clarity.

      In order to make the study accessible to a broad neuroscience audience, we refer to the neurons of the spiral ganglion and their central axon projections using one name. We understand why, for those well acquainted with the auditory periphery, condensing terminology may feel awkward. However, for those readers unfamiliar with the anatomy of the cochlea and auditory nerve, we feel that the use of “SGN central axon” makes it clear that the “auditory nerve fibers” come from neurons in the spiral ganglion. This is clarified in the first paragraph of the introduction (lines 29-31) and in the methods (line 533).

      Specific: Numbers refer to the line numbers on the manuscript.

      L29-31: Cochlear nucleus neurons are more general in their responses than this sentence indicates. While we can all agree that they are specialized to carry (or improve upon) the representation of these specific features of sound, they also respond more generally to sounds that might not have specific information in any of these domains. They are not silos of neural computation, and their outputs become mixed and "re-represented" well before they reach the auditory cortex. Octopus cells are no exception to this. I suggest striking most of the first paragraph, and instead using the first sentence to lead into the second paragraph, and putting the last sentence (of the current first paragraph) at the end of the second (now first) paragraph.

      We agree with this assessment and have made major changes to the introduction in line with these suggestions.

      L33-46: A number of points in this paragraph need references (exp. line 41).

      We agree and have added references accordingly.

      L43: Not sure what is meant by "fire at the onset of the sound, breaking it up into its frequency components"?

      We changed this text as part of a major reworking of the introduction.

      L47-66: Again more citations are needed (at the end of sentence at line 55, probably moving some of the citations from the next sentence up).

      We agree and have added references accordingly.

      L51: The consistent orientation of octopus cell dendrites across the ANFs has been claimed in the literature (as mentioned here), but there are some (perhaps problematic - plane of sectioning?) counterexamples from the older Golgi-stained images, and even amongst intracellularly stained cells (for example see Recio-Spinoso and Rhode, 2020). This is important with regards to the broader hypothesis regarding traveling-wave compensation (e.g., McGinley et al; but also many others); if the cells are not all in the appropriate orientation then such compensation may be problematic. Likewise, the data from Lu et al., 2022, points towards a range of sensitivity to frequency-swept stimuli, some of which work in opposition to the traveling wave compensation hypothesis. It would seem that with the Thy1 mice, you have an opportunity to clarify the orientation. Figures 1A and 2A show a consistent dendritic orientation, assuming that these drawings are reconstructions of the cells as they were actually oriented in the tissue. Can you either comment on this or provide clearer evidence?

      We are happy to offer more information about the appearances of octopus cells in our preparations. In our hands, sparsely labeled octopus cells in Thy1-YFP-H mice show consistent dendritic orientation when visualized in a 15-degree parasagittal plane, with the most diversity apparent in cells with somas located more dorsally in the octopus cell area. We hypothesize that this is due to the limited area through which the central projections of spiral ganglion neurons (i.e. ANFs) must pass before they enter the dorsal cochlear nucleus and continue their tonotopic organization in that area.

      A caveat to studies without physiological or genetic identification of octopus cells is the assumption that all neurons in the octopus cell area are octopus cells. We find, especially along the borders of the octopus cell area, that stellate cells can be seen amongst octopus cells. Because stellate cell dendrites are not oriented like octopus cell dendrites, any stellate cells misidentified as octopus cells would appear to have poorly-oriented dendrites. This may explain why some studies report this finding. In addition, it can be difficult to assess tonotopic organization because of the 3D trajectory of tightly bundled axons, which cannot be captured in a single section plane. Although a parasagittal plane of sectioning captures the tonotopic axis in one part of the octopus cell area, that same plane may be perpendicular at the opposing end.

      L67: canonical -> exceptional.

      Thank you for the suggestion. We have made this change in the introduction.

      L127: This paragraph was confusing on first reading. I don't think Supplemental Figure 1D shows the restricted pattern of expression very clearly. The "restricted to SGNs" might be better as "restricted to auditory nerve fibers" (except in the DCN, where there seem to be some scattered small cells?). A higher magnification image of the CN, but lower magnification than in panel E, would be helpful here.

      To avoid confusion, we have re-written this paragraph (lines 117-127) and included a higher magnification image of the CN in a revised Supp. Fig. 1.

      L168: Here, perhaps say ANFs instead of SGNs.

      As above, we have decided to describe ANFs as SGN central axons to make the anatomy more accessible to people unfamiliar with cochlear anatomy.

      L201-204: The IPSPs are surprisingly slow (Figures 5B, C), especially given the speed of the EPSPs/EPSCs in these cells. This is reminiscent of the asymmetry between EPSC and IPSC kinetics in bushy cells (Xie and Manis, 2014). The kinetics used in the model (3 ms; mentioned on line 624) however seem a bit arbitrary and no data is provided for the selection of that value. Were there any direct measurements of the IPSC kinetics (all of the traces in the paper are in the current clamp) that were used to justify this value?

      The kinetics of the somatically-recorded IPSPs are subject to the effects of our pharmacological manipulations. EPSPs measured at the soma under control conditions are small in amplitude and rapid. With pharmacological reduction of HCN and Kv channels, EPSPs are larger and slower (please see figure in response to a similar question posed by Reviewer #1). We expect that this change also occurs with the IPSP kinetics under pharmacological conditions. Our justification of the kinetics has been expanded in the methods section (lines 641-661).

      L594: Technically, this is a -11 mV junction potential, but thanks for including the information.

      We have corrected this in the text (line 618). Thank you for the close reading of all experimental and methodological details.

      L595: The estimated power of the LED illumination at the focal plane should be measured and indicated here.

      We measured the power of the LED illumination at the focal plane using a PM100D Compact Power and Energy Meter Console (Thorlabs), a S120C Photodiode Power Sensor (Thorlabs), and a 1000 µm diameter Circular Precision Pinhole (Thorlabs). Light intensity at the focal plane ranged between 1.9 and 4.1 mW/mm<sup>2</sup>, corresponding to 6% and 10% intensity on the Colibri5 system. We have reported these measurements in the results section (lines 621-622).

      L609: One concern about the model is that the integration time of 25 microseconds is rather close to the relative shifts in latency. While I doubt it will make a difference (except in the number), it may be worth verifying (spot checks, at least) that running the model with a 5 or 10-microsecond step yields a similar pattern of latency shifts (e.g., Supplementary Figure 5, Figure 5).

      Also, it is not clear what temperature the model was executed at (I would presume 35C); this needs to be given, and channel Q10's listed.

      We realize that additional information is needed to fully understand the model and have added this to the results and the methods. The synaptic mechanism (.mod) files were obtained from Manis and Campagnola (2018) (PMID: 29331233; DOI: https://doi.org/10.1016/j.heares.2017.12.017). Q10 (3) and temperature (22°C) were also matched to parameters from Manis and Campagnola (2018). Because temperature is a critical factor for channel kinetics, we verified that our primary results remain consistent under conditions using a temperature of 35°C and a time step of 5 µs, depicted below. Panel A illustrates the increase in IPSP as a function of glycine conductance under Kv+HCN block conditions at 35°C. As at 22°C, an increase in IPSP magnitude is absent in the control condition at 35°C. Panels B and C provide a direct comparison between the initial (i.e. 22°C) and suggested (i.e. 35°C) simulation conditions. Again, we found that temperature does not have a major impact on the amplitude of IPSPs. Thus, results at 35°C do not change the conclusions we make from the model.

      Author response image 6.
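
      For readers unfamiliar with Q10 scaling, the size of the temperature change discussed above can be illustrated with a minimal sketch. It assumes the conventional relationship in which a rate constant is multiplied by Q10 raised to the power (T - T_ref)/10; the function name is hypothetical, and the authors' model code may apply the correction differently.

```python
def q10_rate_factor(q10: float, temp_c: float, ref_temp_c: float) -> float:
    """Multiplicative change in a rate constant between ref_temp_c and temp_c.

    Assumes the conventional Q10 relationship: factor = q10 ** ((T - T_ref) / 10).
    """
    return q10 ** ((temp_c - ref_temp_c) / 10.0)


# Values quoted in the response: Q10 = 3, reference 22 °C, comparison 35 °C.
factor = q10_rate_factor(q10=3.0, temp_c=35.0, ref_temp_c=22.0)
print(f"Rate factor from 22 °C to 35 °C with Q10 = 3: {factor:.2f}")
```

      Under this assumption, channel kinetics at 35°C would be roughly four times faster than at 22°C, which is why a spot check at the higher temperature is a meaningful test of the model's conclusions.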

      The nominal conductance densities should at least be provided in a table (supplemental, in addition to including them in the deposited code). The method for "optimization" of the conductance densities to match the experimental recordings needs to be described; the parameter space can be quite large in a model such as this. The McGinley reference needs a number.

      We added a more thorough description of modeling parameters and justification of choices in the methods section of the text (lines 641-661). We have also added a reference number to the McGinley 2012 reference in the text.

      I think this is required by the journal:

      The model code, test results, and simulation results should be deposited in a public resource (GitHub would be preferable, but Dryad, Zenodo, or Figshare could work), and the URL/DOI for the resource provided in the manuscript. This includes the morphology swc/hoc file. The code should be in a form, and with a description, that readily allows an interested party with appropriate skills to download it and run it to generate the figures.

      We will upload the code and all associated simulation files to the ModelDB repository upon publication.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1:

      Thank you for the careful reading and the positive evaluation of our manuscript. As you mentioned, the present study tried to address the question of how the lost genomic functions could be compensated by evolutionary adaptation, indicating the potential mechanism of "constructive" rather than "destructive" evolution. Thank you for the instructive comments that helped us to improve the manuscript. We sincerely hope the revised manuscript and the following point-by-point response address your concerns.

      • Line 80 "Growth Fitness" is this growth rate?

      Yes. The sentence was revised as follows.

      (L87-88) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper).”

      • Line 94 a more nuanced understanding of r/K selection theory, allows for trade-ups between R and K, as well as trade-offs. This may explain why you did not see a trade-off between growth and carrying capacity in this study. See this paper https://doi.org/10.1038/s41396-023-01543-5. Overall, your evos lineages evolved higher growth rates and lower carrying capacity (Figures 1B, C, E). If selection was driving the evolution of higher growth rates, it may have been that there was no selective pressure to maintain high carrying capacity. This means that the evolutionary change you observed in carrying capacity may have been neutral "drift" of the carrying capacity trait, during selection for growth rate, not because of a trade-off between R and K. This is especially likely since carrying capacity declined during evolution. Unless the authors have convincing evidence for a tradeoff, I suggest they remove this claim.

      • Line 96 the authors introduce a previous result where they use colony size to measure growth rate, this finding needs to be properly introduced and explained so that we can understand the context of the conclusion.

      • Line 97 This sentence "the collapse of the trade-off law likely resulted from genome reduction." I am not sure how the authors can draw this conclusion, what is the evidence supporting that the genome size reduction causes the breakdown of the tradeoff between R and K (if there was a tradeoff)?

      Thank you for the reference information and the thoughtful comments. The recommended paper was newly cited, and the description of the trade-off collapse was deleted. Accordingly, the corresponding paragraph was rewritten as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somehow consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection30,31 was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32 33,34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      • Line 103 Genome mutations. The authors claim that there are no mutations in parallel but I see that there is a 1199 base pair deletion in eight of the nine evo strains (Table S3). I would like the author to mention this and I'm actually curious about why the authors don't consider this parallel evolution.

      Thank you for your careful reading. According to your comment, we added a brief description of the 1199-bp deletion detected in the Evos as follows.

      (L119-122) “The number of mutations largely varied among the nine Evos, from two to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      • Line 297 Please describe the media in full here - this is an important detail for the evolution experiment. Very frustrating to go to reference 13 and find another reference, but no details of the method. Looked online for the M63 growth media and the carbon source is not specified. This is critical for working out what selection pressures might have driven the genetic and transcriptional changes that you have measured. For example, the parallel genetic change in 8/9 populations is a deletion of insH and tdcD (according to Table S3). This is acetate kinase, essential for the final step in the overflow metabolism of glucose into acetate. If you have a very low glucose concentration, then it could be that there was selection to avoid fermentation and devote all the pyruvate that results from glycolysis into the TCA cycle (which is more efficient than fermentation in terms of ATP produced per pyruvate).

      We apologize for the missing information on the medium composition, which is now described in the Materials and Methods. The glucose concentration in M63 was 22 mM, which should be sufficient for bacterial growth. Thank you for the intriguing suggestion linking the medium composition to mutation-mediated metabolic changes. Because the present study includes no experimental results on the biological function of the mutated genes, we would like to address this issue in future work.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      • Line 115. I do not understand this argument "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Is this a significant enrichment compared to the expectation, i.e. the number of essential genes in the genome? This enrichment needs to be tested with a Hypergeometric test or something similar.

      • Also, "As the essential genes were known to be more conserved than nonessential ones, the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." I do not think that there is enough evidence to support this claim, and it should be removed.

      We apologize for the unclear description. Yes, the mutations were significantly enriched in the essential genes (11 out of 45 genes) compared to the essential genes in the whole genome (286 out of 3290 genes). The unsupported statement linking mutations in essential genes to the fitness increase was removed, and an explanation of the ratio of essential genes was added as follows.

      (L139-143) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable.”
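
      The hypergeometric test suggested by the reviewer can be sketched as follows, using the counts quoted in the response (11 essential genes among 45 mutated genes; 286 essential genes among 3290 genes overall). This is only an illustration under those assumptions, with a hypothetical function name; it is not the authors' analysis, which reports a Chi-square test.

```python
from math import comb


def hypergeom_upper_tail(k: int, draws: int, successes: int, population: int) -> float:
    """P(X >= k) when `draws` genes are sampled without replacement from a
    population of `population` genes, `successes` of which are essential."""
    total = comb(population, draws)
    upper = min(draws, successes)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Counts as quoted in the response above.
p_enrichment = hypergeom_upper_tail(k=11, draws=45, successes=286, population=3290)
print(f"One-sided hypergeometric p value: {p_enrichment:.4g}")
```

      A one-sided upper tail is used because the question is whether essential genes are over-represented among the mutated genes; the exact value will differ from the Chi-square p value reported in the manuscript because the two tests make different approximations.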

      • Line 124 Regarding the mutation simulations, I do not understand how the observed data were compared to the simulated data, and how conclusions were drawn. Can the authors please explain the motivation for carrying out this analysis, and clearly explain the conclusions?

      The random simulation is now explained in the Materials and Methods, and its conclusion was revised in the Results, as follows.

      (L392-401) “The mutation simulation was performed with Python in the following steps. A total of 65 mutations were randomly generated on the reduced genome, and the distances from the mutated genomic locations to the nearest genomic scars caused by genome reduction were calculated. Subsequently, Welch's t-test was performed to evaluate whether the distances calculated from the random mutations were significantly longer or shorter than those calculated from the mutations that occurred in Evos. The random simulation, distance calculation, and statistic test were performed 1,000 times, which resulted in 1,000 p values. Finally, the mean of p values (μp) was calculated, and a 95% reliable region was applied. It was used to evaluate whether the 65 mutations in the Evos were significantly close to the genomic scars, i.e., the locational bias.”

      (L148-157) “Random simulation was performed to verify whether there was any bias or hotspot in the genomic location for mutation accumulation due to the genome reduction. A total of 65 mutations were randomly generated on the reduced genome (Fig. 2B), and the genomic distances from the mutations to the nearest genome reduction-mediated scars were calculated. Welch's t-test was performed to evaluate whether the genomic distances calculated from random mutations significantly differed from those from the mutations accumulated in the Evos. As the mean of p values (1,000 times of random simulations) was insignificant (Fig. 2C, μp > 0.05), the mutations fixed on the reduced genome were either closer or farther to the genomic scars, indicating there was no locational bias for mutation accumulation caused by genome reduction.”
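
      The simulation procedure quoted above can be sketched in Python as follows. This is a minimal illustration, not the authors' script. Several details are assumptions: distances are measured around a circular chromosome, the p value uses a normal approximation to the t distribution to avoid external dependencies (`scipy.stats.ttest_ind(..., equal_var=False)` would give the exact Welch p value), and the genome length, scar positions, and mutation positions in the demo are randomly generated placeholders rather than data from the study.

```python
import random
from statistics import NormalDist, mean, variance


def circular_distance(a: int, b: int, genome_len: int) -> int:
    """Shortest distance between two positions on a circular chromosome (assumption)."""
    d = abs(a - b) % genome_len
    return min(d, genome_len - d)


def nearest_scar_distance(pos: int, scars: list[int], genome_len: int) -> int:
    """Distance from a mutation to the nearest genome-reduction scar."""
    return min(circular_distance(pos, s, genome_len) for s in scars)


def welch_p_value(x: list[int], y: list[int]) -> float:
    """Two-sided p value for Welch's unequal-variance test statistic,
    using a normal approximation to the t distribution."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    t = (mean(x) - mean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def mean_p_value(observed_positions, scars, genome_len, n_sim=1000, seed=0):
    """Compare observed mutation-to-scar distances with n_sim random mutation
    sets of the same size and return the mean p value."""
    rng = random.Random(seed)
    observed = [nearest_scar_distance(p, scars, genome_len) for p in observed_positions]
    p_values = []
    for _ in range(n_sim):
        simulated = [
            nearest_scar_distance(rng.randrange(genome_len), scars, genome_len)
            for _ in range(len(observed_positions))
        ]
        p_values.append(welch_p_value(simulated, observed))
    return mean(p_values)


# Placeholder inputs for demonstration only (not data from the study).
demo_rng = random.Random(1)
genome_len = 4_000_000  # hypothetical genome length in bp
scars = [demo_rng.randrange(genome_len) for _ in range(20)]
observed_positions = [demo_rng.randrange(genome_len) for _ in range(65)]
mu_p = mean_p_value(observed_positions, scars, genome_len, n_sim=200)
print(f"Mean p value over simulations: {mu_p:.3f}")
```

      A mean p value above 0.05, as reported in the response, would indicate that the observed mutations are neither systematically closer to nor farther from the scars than randomly placed mutations.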

      • Line 140 The authors should give some background here - explain the idea underlying chromosomal periodicity of the transcriptome, to help the reader understand this analysis.

      • Line 142 Here and elsewhere, when referring to a method, do not just give the citation, but also refer to the methods section or relevant supplementary material.

      The analytical process (references and methods) was described in the Materials and Methods, and the reason we performed the chromosomal periodicity was added in the Results as follows.

      (L165-172) “As the E. coli chromosome was structured, whether the genome reduction caused the changes in its architecture, which led to the differentiated transcriptome reorganization in the Evos, was investigated. The chromosomal periodicity of gene expression was analyzed to determine the structural feature of genome-wide pattern, as previously described 28,38. The analytical results showed that the transcriptomes of all Evos presented a common six-period with statistical significance, equivalent to those of the wild-type and ancestral reduced genomes (Fig. 3A, Table S4).”

      • Line 151 "The expression levels of the mutated genes were higher than those of the remaining genes (Figure 3B)"- did this depend on the type of mutation? There were quite a few early stops in genes, were these also more likely to be expressed? And how about the transcriptional regulators, can you see evidence of their downstream impact?

      Sorry, we didn't investigate the detailed regulatory mechanisms of the 49 mutated genes, which we considered to be out of the scope of the present study. Fig. 3B shows the statistical comparison between the 3225 and 49 genes. It doesn't mean that all mutated genes were expressed at higher levels than the others. The following sentences were added to address your concern.

      (L181-185) “As the regulatory mechanisms or the gene functions were supposed to be disturbed by the mutations, the expression levels of individual genes might have been either up- or down-regulated. Nevertheless, the overall expression levels of all mutated genes tended to be increased. One of the reasons was assumed to be the mutation essentiality, which remained to be experimentally verified.”

      • Line 199 onward. The authors used WGCNA to analyze the gene expression data of evolved organisms. They identified distinct gene modules in the reduced genome, and through further analysis, they found that specific modules were strongly associated with key biological traits like growth fitness, gene expression changes, and mutation rates. Did the authors expect that there was variation in mutation rate across their populations? Is variation from 3-16 mutations that they observed beyond the expectation for the wt mutation rate? The genetic causes of mutation rate variation are well understood, but I could not see any dinB, mutT,Y, rad, or pol genes among the discovered mutations. I would like the authors to justify the claim that there was mutation rate variation in the evolved populations.

      Thank you for the intriguing thinking. We don't think the mutation rates varied significantly across the nine populations, as no mutation occurred in the MMR genes, as you noticed. Our previous study showed that the spontaneous mutation rate of the reduced genome was higher than that of the wild-type genome (Nishimura et al., 2017, mBio). As nonsynonymous mutations were not detected in all nine Evos, the spontaneous mutation rate couldn't be calculated (because it should be evaluated according to the ratio of nonsynonymous and synonymous single-nucleotide substitutions in molecular evolution). Therefore, the mutation rate could not be discussed in the present study. The following sentence was added for a better understanding of the gene modules.

      (L242-245) “These modules M2, M10 and M16 might be considered as the hotspots for the genes responsible for growth fitness, transcriptional reorganization, and mutation accumulation of the reduced genome in evolution, respectively.”

      • Line 254 I get the idea of all roads leading to Rome, which is very fitting. However, describing the various evolutionary strategies and homeostatic and variable consequence does not sound correct - although I am not sure exactly what is meant here. Looking at Figure 7, I will call strategy I "parallel evolution", that is following the same or similar genetic pathways to adaptation and strategy ii I would call divergent evolution. I am not sure what strategy iii is. I don't want the authors to use the terms parallel and divergent if that's not what they mean. My request here would be that the authors clearly describe these strategies, but then show how their results fit in with the results, and if possible, fit with the naming conventions, of evolutionary biology.

      Thank you for your kind consideration and excellent suggestion. It's our pleasure to adopt your idea in our study. The evolutionary strategies were renamed according to your recommendation. Both the main text and Fig. 7 were revised as follows.

      (L285-293) “Common mutations22,44 or identical genetic functions45 were reported in the experimental evolution with different reduced genomes, commonly known as parallel evolution (Fig. 7, i). In addition, as not all mutations contribute to the evolved fitness 22,45, another strategy for varied phenotypes was known as divergent evolution (Fig. 7, ii). The present study accentuated the variety of mutations fixed during evolution. Considering the high essentiality of the mutated genes (Table S3), most or all mutations were assumed to benefit the fitness increase, partially demonstrated previously 20. Nevertheless, the evolved transcriptomes presented a homeostatic architecture, revealing the divergent to convergent evolutionary strategy (Fig. 7, iii).”

      Author response image 1.

      • Line 327 Growth rates/fitness. I don't think this should be called growth fitness- a rate is being calculated. I would like the authors to explain how the times were chosen - do the three points have to be during the log phase? Can you also explain what you mean by choosing three ri that have the largest mean and minor variance?

      Sorry for the confusing term usage. The fitness assay was changed to the growth assay. Choosing three ri that have the largest mean and minor variance was to avoid the occasional large values (blue circle), as shown in the following figure. In addition, the details of the growth analysis can be found at https://doi.org/10.3791/56197 (ref. 59), where the video of experimental manipulation, protocol, and data analysis is deposited. The following sentence was added in accordance.

      Author response image 2.

      (L369-371) “The growth rate was determined as the average of three consecutive ri, showing the largest mean and minor variance to avoid the unreliable calculation caused by the occasionally occurring values. The details of the experimental and analytical processes can be found at https://doi.org/10.3791/56197.”
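
      The selection rule quoted above can be sketched as follows. This is only an illustration under stated assumptions: r_i is taken to be the rate between two consecutive OD600 reads, ln(OD_{i+1}/OD_i)/(t_{i+1} - t_i), and "largest mean and minor variance" is implemented as the window of three consecutive r_i with the largest mean among windows whose coefficient of variation stays below a hypothetical cutoff (`max_cv`). The exact criterion used by the authors is given in the cited protocol, not here.

```python
import math
from statistics import mean, pstdev


def interval_rates(times_h, od):
    """Rate between consecutive reads: ln(OD[i+1] / OD[i]) / (t[i+1] - t[i])."""
    return [
        math.log(od[i + 1] / od[i]) / (times_h[i + 1] - times_h[i])
        for i in range(len(od) - 1)
    ]


def growth_rate(times_h, od, max_cv=0.2):
    """Mean of the three consecutive rates with the largest mean and low spread.

    max_cv is a hypothetical cutoff on the coefficient of variation, used to
    discard windows inflated by an occasional outlying value.
    """
    rates = interval_rates(times_h, od)
    windows = [rates[i:i + 3] for i in range(len(rates) - 2)]
    stable = [w for w in windows if mean(w) > 0 and pstdev(w) / mean(w) <= max_cv]
    # Fall back to all windows if none passes the spread cutoff
    candidates = stable or windows
    return mean(max(candidates, key=mean))
```

      For a culture that doubles every hour before slowing down, the function returns a value close to ln 2 per hour, taken from the steady exponential stretch.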

      • Line 403 Chromosomal periodicity analysis. The windows chosen for smoothing (100kb) seem big. Large windows make sense for some things - for example looking at how transcription relates to DNA replication timing, which is a whole-genome scale trend. However, here the authors are looking for the differences after evolution, which will be local trends dependent on specific genes and transcription factors. 100kb of the genome would carry on the order of one hundred genes and might be too coarse-grained to see differences between evos lineages.

      Thank you for the advice. We agree that the present analysis focused on the global trend of gene expression. Varying the sizes may lead to different patterns. Additional analysis was performed according to your comment. The results showed that changes in window size (1, 10, 50, 100, and 200 kb) didn't alter the periodicity of the reduced genome, which agreed with the previous study on a different reduced genome MDS42 of a conserved periodicity (Ying et al., 2013, BMC Genomics). The following sentence was added in the Materials and Methods.

      (L460-461) “Note that altering the moving average did not change the max peak.”
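
      The robustness check described above can be illustrated with synthetic data. The sketch below is not the published periodicity analysis: the 1-kb binning, the synthetic six-cycle signal with Gaussian noise, the circular moving average, and the plain discrete Fourier transform used to find the strongest cycle number are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def synthetic_expression(n_bins=4000, cycles=6, noise_sd=1.0, seed=0):
    """Hypothetical expression profile: one value per 1-kb bin along the chromosome."""
    rng = random.Random(seed)
    return [
        math.cos(2 * math.pi * cycles * i / n_bins) + rng.gauss(0, noise_sd)
        for i in range(n_bins)
    ]


def circular_moving_average(values, window):
    """Moving average on a circular chromosome (effective width 2 * (window // 2) + 1)."""
    n = len(values)
    half = window // 2
    width = 2 * half + 1
    return [
        sum(values[(i + k) % n] for k in range(-half, half + 1)) / width
        for i in range(n)
    ]


def dominant_cycles(values, max_cycles=20):
    """Number of cycles per chromosome with the largest Fourier power."""
    n = len(values)
    avg = sum(values) / n
    centered = [v - avg for v in values]

    def power(k):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(centered))
        return abs(coeff) ** 2

    return max(range(1, max_cycles + 1), key=power)


profile = synthetic_expression()
# Window sizes in kb, matching the sizes listed in the response (bins are 1 kb)
peaks = {w: dominant_cycles(circular_moving_average(profile, w)) for w in (1, 10, 50, 100, 200)}
```

      In this toy setting the strongest peak stays at the same cycle number for every window size, which is the kind of behavior the added sentence reports for the real transcriptomes.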

      • Figures - the figures look great. Figure 7 needs a legend.

      Thank you. The following legend was added.

      (L774-777) “Three evolutionary strategies are proposed. Pink and blue arrowed lines indicate experimental evolution and genome reduction, respectively. The size of the open cycles represents the genome size. Black and grey indicate the ancestor and evolved genomes, respectively.”

      Response to Reviewer #2:

      Thank you for reviewing our manuscript and for your fruitful comments. We agree that our study leaned towards elaborating observed findings rather than explaining the detailed biological mechanisms. We focused on the genome-wide biological features rather than the specific biological functions. The underlying mechanisms indeed remained unknown, leaving the questions as you commented. We didn't perform the fitness assay on reconstituted (single and combinatorial) mutants because the research purpose was not to clarify the regulatory or metabolic mechanisms. It's why the RNA-Seq analysis provided the findings on genome-wide patterns and chromosomal view, which were supposed to be biologically valuable. We did understand your comments and complaints that the conclusions were biologically meaningless, as ALE studies that found the specific gene regulation or improved pathway was the preferred story in common, which was not the flow of the present study.

      For this reason, our revision may not address all these concerns. Considering your comments, we tried our best to revise the manuscript. The changes made were highlighted. We sincerely hope the revision and the following point-to-point response are acceptable.

      Major remarks:

      (1) The authors outlined the significance of ALE in genome-reduced organisms and important findings from published literature throughout the Introduction section. The description in L65-69, which I believe pertains to the motivation of this study, seems vague and insufficient to convey the novelty or necessity of this study i.e. it is difficult to grasp what aspects of genome-reduced biology that this manuscript intends to focus/find/address.

      Sorry for the unclear writing. The sentences were rewritten for clarity as follows.

      (L64-70) “Although the reduced growth rate caused by genome reduction could be recovered by experimental evolution, it remains unclear whether such an evolutionary improvement in growth fitness was a general feature of the reduced genome and how the genome-wide changes occurred to match the growth fitness increase. In the present study, we performed the experimental evolution with a reduced genome in multiple lineages and analyzed the evolutionary changes of the genome and transcriptome.”

      (2) What is the rationale behind the lineage selection described in Figure S1 legend "Only one of the four overnight cultures in the exponential growth phase (OD600 = 0.01~0.1) was chosen for the following serial transfer, highlighted in red."?

      The four wells (cultures of different initial cell concentrations) were measured every day, and only the well that showed OD600=0.01~0.1 (red) was transferred with four different dilution rates (e.g., 10, 100, 1000, and 10000 dilution rates). It resulted in four wells of different initial cell concentrations. Multiple dilutions promised that at least one of the wells would show the OD600 within the range of 0.01 to 0.1 after the overnight culture. They were then used for the next serial transfer. Fig. S1 provides the details of the experimental records. The experimental evolution was strictly controlled within the exponential phase, quite different from the commonly conducted ALE that transferred a single culture in a fixed dilution rate. Serial transfer with multiple dilution rates was previously applied in our evolution experiments and well described in Nishimura et al., 2017, mBio; Lu et al., 2022, Comm Biol; Kurokawa et al., 2022, Front Microbiol, etc. The following sentence was added in the Materials and Methods.

      (L344-345) “Multiple dilutions changing in order promised at least one of the wells within the exponential growth phase after the overnight culture.”
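
      The daily transfer rule described above can be sketched as follows. The OD600 window and the four dilution factors are taken from the response; the function names and the error handling are hypothetical, and the sketch ignores practical details such as blank subtraction.

```python
DILUTIONS = (10, 100, 1000, 10000)  # example dilution factors from the response
OD_MIN, OD_MAX = 0.01, 0.1          # exponential-phase window for OD600


def pick_well(overnight_ods):
    """Index of the first well whose overnight OD600 lies within the window."""
    for idx, od in enumerate(overnight_ods):
        if OD_MIN <= od <= OD_MAX:
            return idx
    return None


def next_transfer(overnight_ods):
    """Initial OD600 values of the four wells seeded from the chosen well."""
    idx = pick_well(overnight_ods)
    if idx is None:
        raise ValueError("no well within the exponential-phase window")
    return [overnight_ods[idx] / d for d in DILUTIONS]
```

      For example, with overnight readings of 1.2, 0.5, 0.05, and 0.004, the third well is chosen and diluted again at the four rates to seed the next day's cultures.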

      (3) The measured growth rate of the end-point 'F2 lineage' shown in Figure S2 seemed comparable to the rest of the lineages (A1 to H2), but the growth rate of 'F2' illustrated in Figure 1B indicates otherwise (L83-84). What is the reason for the incongruence between the two datasets?

      Sorry for the unclear description. The growth rates shown in Fig. S2 were obtained during the evolution experiment using the daily transfer's initial and final OD600 values. The growth rates shown in Fig. 1B were obtained from the final population (Evos) growth assay and calculated from the growth curves (biological replication, N=4). Fig. 1B shows the precisely evaluated growth rates, and Fig. S2 shows the evolutionary changes in growth rates. Accordingly, the following sentence was added to the Results.

      (L84-87) “As the growth increases were calculated according to the initial and final records, the exponential growth rates of the ancestor and evolved populations were obtained according to the growth curves for a precise evaluation of the evolutionary changes in growth.”

      (4) Are the differences in growth rate statistically significant in Figure 1B?

      Eight out of nine Evos were significant, except F2. The sentences were rewritten and associated with the revised Fig. 1B, indicating significance.

      (L87-90) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper). However, the magnitudes of growth improvement were considerably varied, and the evolutionary dynamics of the nine lineages were somehow divergent (Fig. S2).”

      (5) The evolved lineages showed a decrease in their maximal optical densities (OD600) compared to the ancestral strain (L85-86). ALE could accompany changes in cell size and morphologies, (doi: 10.1038/s41586-023-06288-x; 10.1128/AEM.01120-17), which may render OD600 relatively inaccurate for cell density comparison. I suggest using CFU/mL metrics for the sake of a fair comparison between Anc and Evo.

      The methods evaluating the carrying capacity (i.e., cell density, population size, etc.) do not change the results. Even CFU is unfair to living cells that cannot form colonies, and unfair if the cell size changes. Optical density (OD600) provides us with the temporal changes of cell growth at 15-minute intervals, which results in an exact evaluation of the growth rate in the exponential phase. CFU is poor at recording temporal changes in population size, which tends to result in an inappropriate growth rate. Taken together, we believe that our method was reasonable and reliable. We hope you can accept this different way of study.

      (6) Please provide evidence in support of the statement in L115-119. i.e. statistical analysis supporting that the observed ratio of essential genes in the mutant pool is not random.

      The statistical test was performed, and the following sentence was added.

      (L139-141) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008).”

      (7) The assumption that "mutation abundance would correlate to fitness improvement" described in L120-122: "The large variety in genome mutations and no correlation of mutation abundance to fitness improvement strongly suggested that no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome" is not easy to digest, in the sense that (i) the effect of multiple beneficial mutations are not necessarily summative, but are riddled with various epistatic interactions (doi: 10.1016/j.mec.2023.e00227); (ii) neutral hitchhikers are of common presence (you could easily find reference on this one); (iii) hypermutators that accumulate greater number of mutations in a given time are not always the eventual winners in competition games (doi: 10.1126/science.1056421). In this sense, the notion that "mutation abundance correlates to fitness improvement" in L120-122 seems flawed (for your perusal, doi: 10.1186/gb-2009-10-10-r118).

      Sorry for the improper description and confusing writing, and thank you for the fruitful knowledge on molecular evolution. The sentence was deleted, and the following one was added.

      (L145-146) “Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (8) Could it be possible that the large variation in genome mutations in independent lineages results from a highly rugged fitness landscape characterized by multiple fitness optima (doi: 10.1073/pnas.1507916112)? If this is the case, I disagree with the notion in L121-122 "that no mutations were specifically responsible or crucially essential" It does seem to me that, for example, the mutations in evo A2 are specifically responsible and essential for the fitness improvement of evo A2 in the evolutionary condition (M63 medium). Fitness assessment of individual (or combinatorial) mutants reconstituted in the Ancestral background would be a bonus.

      Thank you for the intriguing thinking. The sentence was deleted. Please allow us to adapt your comment to the manuscript as follows.

      (L143-145) “The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38.”

      (9) L121-122: "...no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome". Strictly speaking, the authors should provide a reference case of wild-type E. coli ALE in order to reach definitive conclusions that the observed mutation events are exclusive to the genome-reduced strain. It is strongly recommended that the authors perform comparative analysis with an ALEed non-genome-reduced control for a more definitive characterization of the evolutionary biology in a genome-reduced organism, as it was done for "JCVI-syn3.0B vs non-minimal M. mycoides" (doi: 10.1038/s41586-023-06288-x) and "E. coli eMS57 vs MG1655" (doi: 10.1038/s41467-019-08888-6).

      The improper description was deleted in response to comments 7 and 8. The mentioned references were cited in the manuscript (refs 21 and 23). Thank you for the experimental advice. We are sorry that the comparison of wild-type and reduced genomes was not in the scope of the present study and will probably be reported soon in our future work.

      (10) L146-148: "The homeostatic periodicity was consistent with our previous findings that the chromosomal periodicity of the transcriptome was independent of genomic or environmental variation" A Previous study also suggested that the amplitudes of the periodic transcriptomes were significantly correlated with the growth rates (doi: 10.1093/dnares/dsaa018). Growth rates of 8/9 Evos were higher compared to Anc, while that of Evo F2 remained similar. Please comment on the changes in amplitudes of the periodic transcriptomes between Anc and each Evo.

      Thank you for the suggestion. The correlation between the growth rates and the amplitudes of chromosomal periodicity was statistically insignificant (p>0.05). It might be a result of the limited data points. Compared with the only nine data points in the present study, the previous study analyzed hundreds of transcriptomes associated with the corresponding growth rates, which are suitable for statistical evaluation. In addition, the changes in growth rates were more significant in the previous study than in the present study, which might influence the significance. It's why we did not discuss the periodic amplitude.

      (11) Please elaborate on L159-161: "It strongly suggested the essentiality mutation for homeostatic transcriptome architecture happened in the reduced genome.".

      Sorry for the improper description. The sentence was rewritten as follows.

      (L191-193) “The essentiality of the mutations might have participated in maintaining the homeostatic transcriptome architecture of the reduced genome.”

      (12) Is FPKM a valid metric for between-sample comparison? The growing consensus in the community adopts Transcripts Per Kilobase Million (TPM) for comparing gene expression levels between different samples (Figure 3B; L372-379).

      Sorry for the unclear description. The FPKM indicated here was globally normalized, statistically equivalent to TPM. The following sentence was added to the Materials and Methods.

      (L421-422) “The resulting normalized FPKM values were statistically equivalent to TPM.”
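
      For readers less familiar with the two units, the standard relation between them is that TPM rescales the FPKM values of a sample so that they sum to one million. The sketch below shows only this textbook rescaling; it does not reproduce the global normalization applied in the manuscript, and the function name is hypothetical.

```python
def fpkm_to_tpm(fpkm):
    """Rescale FPKM values of one sample so that they sum to one million (TPM)."""
    total = sum(fpkm)
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return [value / total * 1e6 for value in fpkm]
```

      Because the conversion is a per-sample multiplication by a constant, it preserves the rank order of genes within a sample and changes only the scale on which samples are compared.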

      (13) Please provide % mapped frequency of mutations in Table S3.

      They were all 100%. The partially fixed mutations were excluded in the present study. The following sentence was added to the caption of Table S3.

      (Supplementary file, p 9) “Note that the entire population held the mutations, i.e., 100% frequency in DNA sequencing.”

      (14) To my knowledge, M63 medium contains glucose and glycerol as carbon sources. The manuscript would benefit from discussing the elements that impose selection pressure in the M63 culture condition.

      Sorry for the missing information on M63, which contains 22 mM glucose as the only carbon source. The medium composition was added in the Materials and Methods, as follows.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      (15) The RNA-Seq datasets for Evo strains seemed equally heterogenous, just as their mutation profiles. However, the missing element in their analysis is the directionality of gene expression changes. I wonder what sort of biological significance can be derived from grouping expression changes based solely on DEGs, without considering the magnitude and the direction (up- and down-regulation) of changes? RNA-seq analysis in its current form seems superficial to derive biologically meaningful interpretations.

      We agree that most studies often discuss the direction of transcriptional changes. The present study aimed to capture a global view of the magnitude of transcriptome reorganization. Thus, the analyses focused on the overall features, such as the abundance of DEGs, instead of the details of the changes, e.g., the up- and down-regulation of DEGs. The biological meaning of the DEGs' overview was how significantly the genome-wide gene expression fluctuated, which might be short of an in-depth view of individual gene expression. The following sentence was added to indicate the limitation of the present analysis.

      (L199-202) “Instead of an in-depth survey on the directional changes of the DEGs, the abundance and functional enrichment of DEGs were investigated to achieve an overview of how significant the genome-wide fluctuation in gene expression was, which ignored the details of individual genes.”

      Minor remarks

      (1) L41: brackets italicized "(E. coli)".

      It was fixed as follows.

      (L40) “… Escherichia coli (E. coli) cells …”

      (2) Figure S1. It is suggested that the x-axis of ALE monitor be set to 'generations' or 'cumulative generations', rather than 'days'.

      Thank you for the suggestion. Fig. S1 describes the experimental procedure, so "day" was used. Fig. S2 presents the evolutionary process, so "generation" was used, as you recommended here.

      (3) I found it difficult to digest through L61-64. Although it is not within the job scope of reviewers to comment on the language style, I must point out that the manuscript would benefit from professional language editing services.

      Sorry for the unclear writing. The sentences were revised as follows.

      (L60-64) “Previous studies have identified conserved features in transcriptome reorganization, despite significant disruption to gene expression patterns resulting from either genome reduction or experimental evolution 27-29. The findings indicated that experimental evolution might reinstate growth rates that have been disrupted by genome reduction to maintain homeostasis in growing cells.”

      (4) Duplicate references (No. 21, 42).

      Sorry for the mistake. It was fixed (leaving ref. 21).

      (5) Inconsistency in L105-106: "from two to 13".

      "From two to 13" was adopted from the language editing. It was changed as follows.

      (L119) “… from 2 to 13, …”

      Response to Reviewer #3:

      Thank you for reviewing our manuscript and for the helpful comments, which improved the strength of the manuscript. The recommended statistical analyses that essentially supported the statements in the manuscript were performed, and those that would constitute new results within the scope of further studies were not conducted. The changes made in the revision were highlighted. We sincerely hope the revised manuscript and the following point-to-point response meet your concerns. You will find all your suggested statistical tests in our future work, which will report an extensive study on the experimental evolution of an assortment of reduced genomes.

      (1) Line 106 - "As 36 out of 45 SNPs were nonsynonymous, the mutated genes might benefit the fitness increase." This argument can be strengthened. For example, the null expectation of nonsynonymous SNPs should be discussed. Is the number of observed nonsynonymous SNPs significantly higher than the expected one?

      (2) Line 107 - "In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase." Instead of just listing examples, a regression analysis can be added.

      Yes, it's significant. Random mutations lead to ~33% of nonsynonymous SNP in a rough estimation. Additionally, the regression is unreliable because there's no statistical significance between the number of mutations and the magnitude of fitness increase. Accordingly, the corresponding sentences were revised with additional statistical tests.

      (L123-129) “As 36 out of 45 SNPs were nonsynonymous, which was highly significant compared to random mutations (p < 0.01), the mutated genes might benefit fitness increase. In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase. There was no significant correlation between the number of mutations and the growth rate in a statistical view (p > 0.1). Even from an individual close-up viewpoint, the abundance of mutations poorly explained the fitness increase.”
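
      One way to make the comparison with random mutations concrete is an exact one-sided binomial test. The sketch below assumes the rough 33% expectation for nonsynonymous SNPs quoted in the response and asks how likely 36 or more nonsynonymous SNPs out of 45 would be under that expectation. The choice of test and the function name are assumptions; the response does not state which test was used.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 36 nonsynonymous SNPs out of 45, against a ~33% random expectation
p_value = binomial_upper_tail(36, 45, 0.33)
```

      Under these assumptions the tail probability falls well below 0.01, in line with the significance level stated in the revised text.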

      (3) Line 114 - "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Here, the information mentioned in line 153 ("the ratio of essential to all genes (302 out of 3,290) in the reduced genome.") can be used. Then a statistical test for a contingency table can be used.

      (4) Line 117 - "the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." What is the expected number of fixed mutations in essential genes vs non-essential genes? Is the observed number statistically significantly higher?

      Sorry for the improper and insufficient information on the essential genes. Yes, it's significant. The statistical test was additionally performed. The corresponding part was revised as follows.

      (L134-146) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome. The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable. The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38. Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (5) The authors mentioned no overlapping in the single mutation level. Is that statistically significant? The authors can bring up what the no-overlap probability is given that there are in total x number of fixed mutations observed (either theory or simulation is good).

      Sorry, we feel confused about this comment. It's unclear to us why it needs to be statistically simulated. Firstly, the mutations were experimentally observed. The result that no overlapped mutated genes were detected was an experimental fact, not a computational prediction. We are afraid that our finding may have been over-interpreted as an evolutionary rule, which would always require its reliability to be tested statistically. We didn't conclude that the evolution had no overlapped mutations. Secondly, considering that 65 random mutations occurred in a ~3.9 Mb sequence, the statistical test would be meaningful only if the experimental results had found overlapped mutations. How often random mutations cause overlapped mutations in parallel evolutionary lineages as the number of lineages increases is an interesting question, but it seems to be out of the scope of the present study. We are happy to include the analysis in our ongoing study on the experimental evolution of reduced genomes.

      (6) The authors mentioned no overlapping in the single mutation level. How about at the genetic level? Some fixed mutations occur in the same coding gene. Is there any gene with a significantly enriched number of mutations?

      No mutations were fixed in the same gene of biological function, as shown in Table S3. If we say the coding region, the only exception is the IS sequences, well known as the transposable sequences without genetic function. The following description was added.

      (L119-122) “The number of mutations largely varied among the nine Evos, from 2 to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      (7) Line 151-156- It seems like the authors argue that the expression level differences can be just explained by the percentage of essential genes that get fixed mutations. One further step for the argument could be to compare the expression level of essential genes with vs without fixed mutations. Also, the authors can compare the expression level of non-essential genes with vs without fixed mutations. And the authors can report whether the differences in expression level became insignificant after the control of the essentiality.

      It's our pleasure that the essentiality intrigued you. Thank you for the analytical suggestion, which is exciting and valuable for our studies. As only 11 essential genes were detected here and "Mutation in essentiality" was an indication but not the conclusion of the present study, we would like to apply the recommended analysis to the datasets of our ongoing study to demonstrate this statement. Thank you again for your fruitful analytical advice.

      (8) Line 169- "The number of DEGs partially overlapped among the Evos declined significantly along with the increased lineages of Evos (Figure 4B). " There is a lack of statistical significance here while the word "significantly" is used. One statistical test that can be done is to use re-sampling/simulation to generate a null expectation of the overlapping numbers given the DEGs for each Evo line and the total number of genes in the genome. The observed number can then be compared to the distribution of the simulated numbers.

      Sorry for the inappropriate usage of the term. Statistical significance was not the point of this sentence. The word "significantly" was deleted as follows.

      (L205-206) “The number of DEGs partially overlapped among the Evos declined along with the increased lineages of Evos (Fig. 4B).”
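
      For readers who wish to try the resampling test suggested in comment (8), the following is a minimal sketch and not an analysis performed in the study. It assumes that DEG sets are drawn uniformly at random from the genome; the gene total, the per-lineage DEG counts, and the observed value are placeholders, and the function names are hypothetical.

```python
import random
from collections import Counter

def shared_gene_counts(gene_sets, n_lineages):
    """counts[k-1] = number of genes present in at least k of the sets."""
    membership = Counter(g for s in gene_sets for g in s)
    return [sum(1 for c in membership.values() if c >= k)
            for k in range(1, n_lineages + 1)]

def null_shared_counts(total_genes, deg_sizes, n_sim=1000, seed=0):
    """Simulate random DEG sets of the given sizes and tabulate their overlaps."""
    rng = random.Random(seed)
    genes = range(total_genes)
    sims = []
    for _ in range(n_sim):
        sets = [rng.sample(genes, size) for size in deg_sizes]
        sims.append(shared_gene_counts(sets, len(deg_sizes)))
    return sims

def empirical_p(observed, sims, k):
    """One-sided p-value: share of simulations with >= observed genes in >= k sets."""
    hits = sum(1 for s in sims if s[k - 1] >= observed)
    return (hits + 1) / (len(sims) + 1)

# Placeholder inputs (assumptions, not values from the manuscript).
TOTAL_GENES = 3500
DEG_SIZES = [300, 450, 500, 350, 600, 400, 550, 480, 520]  # nine lineages
OBSERVED_SHARED_BY_3 = 250

sims = null_shared_counts(TOTAL_GENES, DEG_SIZES, n_sim=200, seed=1)
print("empirical p (genes shared by >= 3 lineages):",
      empirical_p(OBSERVED_SHARED_BY_3, sims, k=3))
```

      The observed number of DEGs shared by at least k lineages is compared with the simulated distribution at the same k, which is the comparison the reviewer describes.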

      (9) Line 177-179- "In comparison, 1,226 DEGs were induced by genome reduction. The common DEGs of genome reduction and evolution varied from 168 to 540, fewer than half of the DEGs responsible for genome reduction in all Evos" Is the overlapping number significantly lower than the expectation? The hypergeometric test can be used for testing the overlap between two gene sets.

      There is no prior expectation of how many overlapping DEGs would be reasonable. Not every experimentally obtained number needs to be tested for statistical significance, although such testing is commonly essential in computational and data science.
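
      For reference, the hypergeometric test mentioned in comment (9) can be sketched as follows. This is an illustration under stated assumptions, not an analysis from the study: only the 1,226 genome-reduction DEGs and the upper overlap of 540 come from the text above, whereas the total gene count and the number of DEGs in the evolved lineage are placeholders, and `overlap_test` is a hypothetical helper.

```python
from math import comb

def overlap_test(overlap, total_genes, n_set_a, n_set_b):
    """Hypergeometric test for the overlap between two gene sets.

    Returns (expected overlap, P(X <= overlap), P(X >= overlap)) when
    n_set_b genes are drawn at random from total_genes, n_set_a of which
    belong to the first set.
    """
    lo = max(0, n_set_a + n_set_b - total_genes)
    hi = min(n_set_a, n_set_b)
    if not lo <= overlap <= hi:
        raise ValueError("overlap is impossible for these set sizes")
    denom = comb(total_genes, n_set_b)
    lower = upper = 0.0
    for k in range(lo, hi + 1):
        p = comb(n_set_a, k) * comb(total_genes - n_set_a, n_set_b - k) / denom
        if k <= overlap:
            lower += p
        if k >= overlap:
            upper += p
    expected = n_set_a * n_set_b / total_genes
    return expected, lower, upper

TOTAL_GENES = 3500      # placeholder: genes annotated in the reduced genome
REDUCTION_DEGS = 1226   # from the quoted manuscript text
EVO_DEGS = 900          # placeholder: DEGs of one evolved lineage
OBSERVED_OVERLAP = 540  # upper end of the reported range

expected, p_lower, p_upper = overlap_test(
    OBSERVED_OVERLAP, TOTAL_GENES, REDUCTION_DEGS, EVO_DEGS)
print(f"expected overlap: {expected:.1f}")
print(f"P(X <= observed): {p_lower:.3g}, P(X >= observed): {p_upper:.3g}")
```

      The lower tail addresses the reviewer's question of whether the overlap is smaller than expected by chance; the upper tail tests for enrichment.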

      (10) The authors should give more information about the ancestral line used at the beginning of experimental evolution. I guess it is one of the KHK collection lines, but I can not find more details. There are many genome-reduced lines. Why is this certain one picked?

      Sorry for the insufficient information on the reduced genome used for the experimental evolution. The following descriptions were added in the Results and the Materials and Methods, respectively.

      (L75-79) “The E. coli strain carrying a reduced genome, derived from the wild-type genome W3110, showed a significant decline in its growth rate in the minimal medium compared to the wild-type strain 13. To improve the genome reduction-mediated decreased growth rate, the serial transfer of the genome-reduced strain was performed with multiple dilution rates to keep the bacterial growth within the exponential phase (Fig. S1), as described 17,20.”

      (L331-334) “The reduced genome has been constructed by multiple deletions of large genomic fragments 58, which led to an approximately 21% smaller size than its parent wild-type genome W3110.”

      (11) How was the saturated density in Figure 1 actually determined? In particular, the fitness assay of growth curves is 48h. But it seems like the experimental evolution is done for ~24 h cycles. If the Evos never experienced a situation like a stationary phase between 24-48h, and if the author reported the saturated density 48 h in Figure 1, the explanation of the lower saturated density can be just relaxation from selection and may have nothing to do with the increase of growth rate.

      Sorry for the unclear description. Yes, you are right. The evolution was performed within the exponential growth phase (keeping cell division constant), which means the Evos never experienced the stationary phase (saturation). The final evolved populations were subjected to the growth assay to obtain the entire growth curves for calculating the growth rate and the saturated density. Whether the decreased saturated density and the increased growth rate were in a trade-off relationship remained unclear. The corresponding paragraph was revised as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somehow consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection30,31 was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32 33,34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      (12) What annotation of essentiality was used in this paper? In particular, the essentiality can be different in the reduced genome background compared to the WT background.

      Sorry for the unclear definition of the essential genes. They are strictly limited to the 302 essential genes experimentally determined in the wild-type E. coli strain. Detailed information can be found at the following website: https://shigen.nig.ac.jp/ecoli/pec/genes.jsp. We agree that the essentiality could differ between the WT and reduced genomes. Identifying the essential genes in the reduced genome would be an exhaustingly large undertaking. The information on the essential genes defined in the present study was added as follows.

      (L134-139) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome.”

      (13) The fixed mutations in essential genes are probably not rarely observed in experimental evolution. For example, fixed mutations related to RNA polymerase can be frequently seen when evolving to stressful environments. I think the author can discuss this more and elaborate more on whether they think these mutations in essential genes are important in adaptation or not.

      Thank you for your careful reading and the suggestion. As you mentioned, we noticed that mutations in RNA polymerases (rpoA, rpoB, and rpoD) were identified in three Evos. As they were not shared across all Evos, we didn't discuss the contribution of these mutations to evolution. Rather than the individual functions of the mutated essential genes, we focused on the enriched gene functions related to the transcriptome reorganization because they were the common feature observed across all Evos and were linked to whole metabolic or regulatory pathways, which are supposed to be more biologically reasonable and interpretable. The following sentence was added to clarify our thinking.

      (L268-273) “In particular, mutations in the essential genes, such as RNA polymerases (rpoA, rpoB, rpoD) identified in three Evos (Table S3), were supposed to participate in the global regulation for improved growth. Nevertheless, the considerable variation in the fixed mutations without overlaps among the nine Evos (Table 1) implied no common mutagenetic strategy for the evolutionary improvement of growth fitness.”

      (14) In experimental evolution to new environments, several previous studies also show that the long-term transcriptome changes during experimental evolution are not consistent with, or even revert, the short-term response; short-term responses were rather considered an emergency plan. They seem to echo what the authors found in this manuscript. I think the authors can refer to some of those studies and provide a more thorough discussion of short-term vs long-term responses in evolution.

      Thank you for the advice. It's unclear to us what the short-term and long-term responses mentioned in this comment refer to. The term "Response" usually refers to phenotypic or transcriptional changes within a few hours after an environmental fluctuation, which are generally non-genetic (no mutation). In comparison, long-term or short-term experimental "Evolution" is associated with genetic changes (mutations). Concerning the Evolution (not the Response), to our knowledge, long-term experimental evolution (>10,000 generations) has been performed only with the wild-type genome, whereas short-term experimental evolution (500~2,000 generations) has more often been conducted with both wild-type and reduced genomes. Previous landmark studies have intensively discussed the comparison of the wild-type and reduced genomes. Our study was restricted to the reduced genome, which was constructed differently from the reduced genomes used in the reported studies. In addition, the reported experimental evolution of reduced genomes was performed in the presence of additives, e.g., antibiotics, alternative carbon sources, etc. That is, neither the genomic backgrounds nor the evolutionary conditions were comparable. Comparing studies that have nothing in common seems unproductive. We sincerely hope the recommended topics can be addressed in our future work.

      Some minor suggestions

      • Figures S3 & Table S2 need an explanation of the abbreviations of gene categories.

      Sorry for the missing information. Figure S3 and Table S3 were revised to include the names of gene categories. The figure is pasted below for quick reference.

      Author response image 3.

      • I hope the authors can re-consider the title; "Diversity for commonality" does not make much sense to me. For example, it can be simply just "Diversity and commonality."

      Thank you for the suggestion. The title was simplified as follows.

      (L1) “Experimental evolution for the recovery of growth loss due to genome reduction.”

      • It is not easy for me to locate and distinguish the RNA-seq vs DNA-seq files in DRA013662 at DDBJ. Could you make some notes on what RNA-seq actually are, vs what DNA-seq files actually are?

      Sorry for the mistakes in the DRA number of DNA-seq. DNA-seq and RNA-seq were deposited separately with the accession IDs of DRA013661 and DRA013662, respectively. The following correction was made in the revision.

      (L382-383) “The raw datasets of DNA-seq were deposited in the DDBJ Sequence Read Archive under the accession number DRA013661.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Yu et al. describe the chemotactic gradient formation for CCL5 bound to - i.e. released from - glycosaminoglycans. The authors provide evidence for phase separation as the driving mechanism behind chemotactic gradient formation. A conclusion towards a general principle behind the finding cannot be drawn since the work focuses on one chemokine only, which is particularly prone to glycan-induced oligomerisation.

      Strengths:

      The principle of phase separation as a driving force behind and thus as an analytical tool for investigating protein interactions with strongly charged biomolecules was originally introduced for protein-nucleic acid interactions. Yu et al. have applied this in their work for the first time for chemokine-heparan sulfate interactions. This opens a novel way to investigate chemokine-glycosaminoglycan interactions in general.

      Response: We thank the reviewer for the encouragement.

      Weaknesses:

      As mentioned above, one of the weaknesses of the current work is the exemplification of the phase separation principle by applying it only to CCL5-heparan sulfate interactions. CCL5 is known to form higher oligomers/aggregates in the presence of glycosaminoglycans, much more than other chemokines. It would therefore have been very interesting to see, if similar results in vitro, in situ, and in vivo could have been obtained by other chemokines of the same class (e.g. CCL2) or another class (like CXCL8).

      Response: We share the reviewer’s opinion that it would be interesting to investigate more molecules/cytokines that interact with heparan sulfate in this system. We expect that researchers in the field will adapt the concept to continue the studies on additional molecules. Nevertheless, our earlier study demonstrated that bFGF was enriched at its receptor and triggered signal transduction through phase separation with heparan sulfate (PMID: 35236856; doi: 10.1038/s41467-022-28765-z), which supports the concept that phase separation with heparan sulfate on the cell surface may be a common mechanism for heparan sulfate-binding proteins. The reviewer’s comment that phase separation is related to oligomerization is addressed in Figure 1—figure supplement 2C and D, which show that the more easily aggregated mutant, A22K-CCL5, does not undergo phase separation.

      In addition, the authors have used variously labelled CCL5 (like with the organic dye Cy3 or with EGFP) for various reasons (detection and immobilisation). In the view of this reviewer, it would have been necessary to show that all the labelled chemokines yield identical/similar molecular characteristics as the unlabelled wildtype chemokine (such as heparan sulfate binding and chemotaxis). It is well known that labelling proteins either by chemical tags or by fusion to GFPs can lead to manifestly different molecular and functional characteristics.

      Response: We agree with the reviewer that labeling may alter the properties of a protein; thus, we have compared the chemotactic activity of CCL5 and CCL5-EGFP (Figure 2—figure supplement 1). To further verify this, we performed an additional experiment to compare the chemotactic activity of CCL5 and Cy3-CCL5 (see Author response image 1). For the convenience of readers, we have combined the original Figure 2—figure supplement 1 with the new data (Author response image 1), which replaced the original Figure 2—figure supplement 1.

      Author response image 1.

      Chemotactic function of CCL5-EGFP and CCL5-Cy3. Cy3-labeled CCL5 has activity similar to that of CCL5. 50 nM CCL5 or CCL5-Cy3 was added to the lower chamber of the Transwell. THP-1 cells were added to the upper chambers. Data are mean ± s.d. n=3. P values were determined by unpaired two-tailed t-tests. NS, Not Significant.

      Reviewer #2 (Public Review):

      Although the study by Xiaolin Yu et al is largely limited to in vitro data, the results of this study convincingly improve our current understanding of leukocyte migration.

      (1) The conclusions of the paper are mostly supported by the data although some clarification is warranted concerning the exact CCL5 forms (without or with a fluorescent label or His-tag) and amounts/concentrations that were used in the individual experiments. This is important since it is known that modification of CCL5 at the N-terminus affects the interactions of CCL5 with the GPCRs CCR1, CCR3, and CCR5 and random labeling using monosuccinimidyl esters (as done by the authors with Cy-3) is targeting lysines. Since lysines are important for the GAG-binding properties of CCL5, knowledge of the number and location of the Cy-3 labels on CCL5 is important information for the interpretation of the experimental results with the fluorescently labeled CCL5. Was the His-tag attached to the N- or C-terminus of CCL5? Indicate this for each individual experiment and consider/discuss also potential effects of the modifications on CCL5 in the results and discussion sections.

      Response: We agree with the reviewer that labeling may alter the properties of a protein; thus, we have compared the chemotactic activity of CCL5 and CCL5-EGFP (Figure 2—figure supplement 1). To further verify this, we performed an additional experiment to compare the chemotactic activity of CCL5 and Cy3-CCL5 (see Author response image 1). For the convenience of readers, we have combined the original Figure 2—figure supplement 1 with the new data (Author response image 1), which replaced the original Figure 2—figure supplement 1.

      The His-tag is attached to the C-terminus of CCL5, in consideration of the potential impact on the N-terminus.

      (2) In general, the authors appear to use high concentrations of CCL5 in their experiments. The reason for this is not clear. Is it because of the effects of the labels on the activity of the protein? In most biological tests (e.g. chemotaxis assays), unmodified CCL5 is active already at low nM concentrations.

      Response: We agree with the reviewer that the CCL5 concentrations used in our experiments were higher than those in reported chemotaxis assays and also higher than physiological levels in normal human plasma. In fact, we have performed experiments with lower concentrations of CCL5, in which the effect of LLPS was not seen although the chemotactic activity of the cytokine was detected. Thus, LLPS-associated chemotactic activity may represent a scenario of an acute inflammatory condition, in which inflammatory cytokine levels can increase significantly.

      (3) For the statistical analyses of the results, the authors use t-tests. Was it confirmed that data follow a normal distribution prior to using the t-test? If not a non-parametric test should be used and it may affect the conclusions of some experiments.

      Response: We thank the reviewer for pointing out this issue. As shown in Author response table 1, the Shapiro-Wilk normality test showed that only two control groups (CCL5 and 44AANA47-CCL5+CHO K1) in Figure 3 did not conform to the normal distribution. The error was caused by using a microculture sample for counting and calculation when it contained very few cells. For these two groups, we re-counted 100 μL of culture medium to calculate the number of cells. The results were consistent with a normal distribution and significantly different from the experimental group (Author response image 3). The original data for the number of cells chemoattracted by 500 nM CCL5 were revised from 0, 247, 247 to 247, 123, 370, and those for 500 nM 44AANA47 +CHO-K1 were revised from 1111, 1111, 98 to 740, 494, 617. The revised data do not affect the conclusion.

      Author response table 1.

      Table R1 Shapiro-Wilk test results of statistical data in the manuscript

      Author response image 3.

      Quantification of THP-1 cells collected from the lower chamber. Data are mean ± s.d. n=3. P values were determined by unpaired two-tailed t-tests.

      Recommendations for the authors:

      Reviewer #1:

      See the weaknesses section of the Public Review. In addition, the authors should discuss the X-ray structure of CCL5 in complex with a heparin disaccharide in comparison with their docked structure of CCL5 and a heparin tetrasaccharide.

      Response: Our study, in fact, is strongly influenced by the report (Shaw, Johnson et al., 2004) of the heparin disaccharide interaction with CCL5, which is highlighted in the text (page 5, lines 100-102).

      Reviewer #2:

      (1) Clearly indicate in the results section and figure legends (also for the supplementary figures) which form and concentration of CCL5 is used.

      Response: The relevant missing information is indicated across the manuscript.

      (2) Clearly indicate which GAG was used. Was it heparin or heparan sulfate and what was the length (e.g. average molecular mass if known) or source (company?)?

      Response: Relevant information is added in the section “Materials and Methods”.

      (3) Line 181: What do you mean exactly with "tiny amounts"?

      Response: “tiny amounts” means 400 transfected cells. This is described in the section of Materials and Methods. It is now also indicated in the text and legend to the figure.

      (4) Lines 216-217: This is a very general statement without a link to the presented data. No combination of chemokines is used, in vivo testing is limited (and I agree very difficult). You may consider deleting this sentence (certainly as an opening sentence for the Discussion).

      Response: We very much appreciate the reviewer’s thoughtful suggestion. This sentence is deleted in the revised manuscript.

      (5) Why was 5h used for the in vitro chemotaxis assay? This is extremely long for an assay with THP-1 cells.

      Response: We apologize for the unclear description. The 5 hr includes a 1 hr pre-incubation of CCL5 with the cells to allow phase separation to form. After transferring the cells into the upper chamber, the actual chemotactic assay was 4 hr. This is clarified in the Materials and Methods section and the legend to each figure.

      (6) Define "Sec" in Sec-CCL5-EGFP and "Dil" in the legend of Figure 4.

      Response: The Sec-CCL5-EGFP should be “CCL5-EGFP”, which has now been corrected. Dil is a cell membrane red fluorescent probe, which is now defined.

      (7) Why are different cell concentrations used in the experiment described in Figure 5?

      Response: The samples were from three volunteers who exhibited substantially different concentrations of cells in the blood. The experiment was designed using the same amount of blood, so we did not normalize the number of cells used for the experiment. Regardless of the difference in cell numbers, all three samples showed the same trend.

      (8) Check the text for some typos: examples are on line 83 "ratio of CCL5"; line 142 "established cell lines"; line 196 "peripheral blood mononuclear cells"; line 224 "to mediate"; line 226 "bind"; line 247 "to form a gradient"; line 248 "of the glycocalyx"; line 343 and 346 "tetrasaccharide"; line 409-410 "wild-type"; line 543 "on the surface of CHO-K1 and CHO-677"; line 568 "white".

      Response: Thanks for the careful reading. The typos have been corrected, and the manuscript was carefully proofread by colleagues.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      1. The name of the new method "inter-haplotype distance" is more confusing than helpful, as the haplotype information is not critical for implementing this method. First, the mutation spectrum is aggregated genome-wide regardless of the haplotypes where the mutations are found. Second, the only critical haplotype information is that at the focal site (i.e., the locus that is tested for association): individuals are aggregated together when they belong to the same "haplotype group" at the focal site. However, for the classification step, haplotype information is not really necessary: individuals can be grouped based on their genotypes at the given locus (e.g., AA vs AB). As the authors mentioned, this method can be potentially applied to other mutation datasets, where haplotype information may well be unavailable. I hope the authors can reconsider the name and remove the term "haplotype" (perhaps something like "inter-genotype distance"?) to avoid giving the wrong impression that haplotype information is critical for applying this method.

      We appreciate the reviewer's concern about the name of our method. The reviewer is correct that haplotype information is not critical for our method to work, and as a result we've decided to simply rename the approach to "aggregate mutation spectrum distance" (abbreviated AMSD). For simplicity, we refer to the method as IHD throughout our responses to reviewers, but the revised manuscript now refers to AMSD.

      1. The biggest advantage of the IHD method over QTL mapping is alleviation of the multiple testing burden, as one comparison tests for any changes in the mutation spectrum, including simultaneous, small changes in the relative abundance of multiple mutation types. Based on this, the authors claim that IHD is more powerful to detect a mutator allele that affects multiple mutation types. Although logically plausible, it is unclear under what quantitative conditions IHD can actually have greater power over QTL. It will be helpful to support this claim by providing some simulation results.

      This comment prompted us to do a more detailed comparison of IHD vs. QTL power under conditions that are more similar to those observed in the BXD cohort. While preparing the original manuscript, we assumed that IHD might have greater power than QTL mapping in a population like the BXDs because some recombinant inbred lines have accumulated many more germline mutations than others (see Figure 1 in Sasani et al. 2022, Nature). In a quantitative trait locus scan (say, for the fraction of C>A mutations in each line) each BXD's mutation data would be weighted equally, even if a variable number of mutations was used to generate the phenotype point estimate in each line.

      To address this, we performed a new series of simulations in which the average number of mutations per haplotype was allowed to vary. At the low end, some BXDs accumulated as few as 100 total germline mutations, while others have accumulated as many as 2,000. Thus, instead of simulating a mean number of mutations on each simulated haplotype, we allowed the mean number of mutations per haplotype to vary from N to 20N. By simulating a variable count of mutations on each haplotype, we could more easily test the benefits of comparing aggregate, rather than individual, mutation spectra between BXDs.

      In these updated simulations, we find that IHD routinely outperforms QTL mapping under a range of parameter choices (see Author response image 1). Since IHD aggregates the mutation spectra of all haplotypes with either B or D alleles at each locus in the genome, the method is much less sensitive to individual haplotypes with low mutation counts. We include a mention of these updated simulations on lines 135-138 and describe the updated simulations in greater detail in the Materials and Methods (lines 705-715).

      Author response image 1.

      Power of IHD and QTL mapping on simulated haplotypes with variable counts of mutations. We simulated germline mutations on the specified number of haplotypes (as described in the manuscript) but allowed the total number of mutations per haplotype to vary by a factor of 20.

      1. The flip side of this advantage of IHD is that, when a significant association is detected, it is not immediately clear which mutation type is driving the signal. Related to this, it is unclear how the authors reached the point that "...the C>A mutator phenotype associated with the locus on chromosome 6", when they only detected significant IHD signal at rs46276051 (on Chr6), when conditioning on D genotypes at the rs27509845 (on Chr4) and no significant signal for any 1-mer mutation type by traditional mapping. The authors need to explain how they deduced that C>A mutation is the major source of the signal. In addition, beyond C>A mutations, can mutation types other than C>A contribute to the IHD signal at rs46276051? More generally, I hope the authors can provide some guidelines on how to narrow a significant IHD signal to specific candidate mutation type(s) affected, which will make the method more useful to other researchers.

      We thank the reviewer for pointing out this gap in our logic. We omitted specific instructions for narrowing down an IHD signal to specific mutation type(s) for a few reasons. First, this can be addressed using mutational signature analysis methods that are in widespread use. For example, upon identifying one or more candidate mutator loci, we can enter the mutation spectra of samples with each possible mutator genotype into a program (e.g., SigProfilerExtractor) to determine which combinations of mutation types occur proportionally more often in the genomes that harbor mutators (see Figure 3c in our manuscript). A second approach for narrowing down an IHD signal, highlighted in Figure 3a (and now described in the text of the Results section at lines 256-261), is to simply test which mutation type proportion(s) differ significantly between groups of samples with and without a candidate mutator (for example, with a Chi-square test of independence for each mutation type).

      Although this second approach incurs a multiple testing burden, the burden is offset somewhat by using IHD to identify mutator loci, rather than performing association tests for every possible mutation type to begin with. Although Figure 3a only shows the significant difference in C>A fraction among BXDs with different mutator locus genotypes, Figure 3-figure supplement 1 shows the complete set of 1-mer spectrum comparisons. It is possible that this second approach would not prove very useful in the case of a mutator with a “flat” signature (i.e., a mutator that slightly perturbs the rates of many different mutation types), but in our case it clearly shows which mutation type is affected.
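
      The second approach described above (a per-mutation-type test of independence with a multiple-testing correction) can be sketched as follows. This is a minimal illustration, assuming a 2x2 Chi-square test without continuity correction and a Bonferroni adjustment; the mutation counts are made-up placeholders, and the function names are hypothetical rather than taken from the authors' code.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Chi-square test of independence on the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 d.f., P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

def per_type_tests(with_mutator, without_mutator):
    """Test each mutation type against all other types pooled.

    Both arguments map mutation type -> count. Returns a dict of
    type -> (statistic, Bonferroni-adjusted p-value).
    """
    total_w = sum(with_mutator.values())
    total_wo = sum(without_mutator.values())
    n_tests = len(with_mutator)
    results = {}
    for mut_type, count_w in with_mutator.items():
        count_wo = without_mutator[mut_type]
        chi2, p = chi2_2x2(count_w, total_w - count_w,
                           count_wo, total_wo - count_wo)
        results[mut_type] = (chi2, min(1.0, p * n_tests))
    return results

# Placeholder counts (assumptions for illustration only).
with_mutator = {"C>A": 900, "C>T": 2000, "C>G": 400,
                "A>T": 350, "A>C": 300, "A>G": 1050}
without_mutator = {"C>A": 500, "C>T": 2100, "C>G": 420,
                   "A>T": 360, "A>C": 310, "A>G": 1100}

for mut_type, (chi2, p_adj) in per_type_tests(with_mutator, without_mutator).items():
    print(f"{mut_type}: chi2 = {chi2:.2f}, adjusted p = {p_adj:.3g}")
```

      Testing each type against the pooled remainder keeps every comparison at one degree of freedom, at the cost of the multiple testing burden noted above.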

      1. To account for differential relatedness between the inbred lines, the authors regressed the cosine distance between the two aggregate mutation spectra on the genome-wide genetic similarity and took the residual as the adjusted test metric. What is the value of the slope from this regression? If significantly non-zero, this would support a polygenic architecture of the mutation spectrum phenotype, which could be interesting. If not, is this adjustment really necessary? In addition, is the intercept assumed to be zero for this regression, and does such an assumption matter? I would appreciate seeing a supplemental figure on this regression.

      The reviewer raises a good question. We find that the slope of the "distance vs. genetic similarity" regression is significantly non-zero, though the slope estimate itself is small. A plot of cosine distance vs. genome-wide genetic similarity (using all BXDs) is shown below in Author response image 2:

      Author response image 2.

      Relationship between cosine distance and genetic similarity in the BXDs. As described in the Materials and Methods, we computed two values at each marker in the BXDs: 1) the cosine distance between the aggregate mutation spectra of BXDs with either B or D genotypes at the marker, and 2) the correlation between genome-wide D allele frequencies in BXDs with either B or D genotypes at the marker. We then regressed these two values across all genome-wide markers.

      This result indicates that if two groups of BXDs (one with D genotypes and one with B genotypes at a given locus) are more genetically similar, their mutation spectra are also more similar. Since the regression slope estimate is significantly non-zero (p < 2.2e-16), we believe that it's still worth using residuals as opposed to raw cosine distance values. This result also suggests that there may be a polygenic effect on the mutation spectrum in the BXDs.

      We have also generated a plot showing the cosine distance between the mutation spectra of every possible pair of BXDs, regressed against the genetic similarity between each of those pairs (Author response image 3). Here, the potential polygenic effects on mutation spectra similarity are perhaps more obvious.

      Author response image 3.

      Pairwise cosine distance between BXD mutation spectra as a function of genetic similarity. We computed two values for every possible pair of n = 117 BXDs: 1) the cosine distance between the samples' individual 1-mer mutation spectra and 2) the correlation coefficient between the samples' genome-wide counts of D alleles.
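
      The adjustment discussed in this response (the cosine distance between aggregate mutation spectra, regressed on genetic similarity, with the residual used as the test metric) can be sketched as follows. This is a minimal sketch, assuming an ordinary least-squares fit with an intercept; whether the authors' regression includes an intercept is not stated here. All input values are placeholders, and the function names are hypothetical.

```python
import math

def aggregate_spectrum(spectra):
    """Sum per-sample mutation-count vectors into one aggregate spectrum."""
    return [sum(col) for col in zip(*spectra)]

def cosine_distance(u, v):
    """1 - cosine similarity between two mutation spectra."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def ols_residuals(x, y):
    """Fit y = intercept + slope * x; return (slope, intercept, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals

# Placeholder data for one marker: spectra of samples carrying B or D alleles.
b_samples = [[10, 40, 5, 6, 4, 20], [12, 35, 6, 5, 5, 22]]
d_samples = [[25, 38, 5, 6, 4, 21], [30, 41, 4, 7, 5, 19]]
dist = cosine_distance(aggregate_spectrum(b_samples),
                       aggregate_spectrum(d_samples))
print(f"cosine distance at this marker: {dist:.4f}")

# Placeholder per-marker values: genetic similarity and cosine distance.
similarity = [0.10, 0.25, 0.40, 0.55, 0.70]
distance = [0.020, 0.018, 0.015, 0.016, 0.011]
slope, intercept, adjusted = ols_residuals(similarity, distance)
print(f"slope = {slope:.4f}; adjusted distances = "
      + ", ".join(f"{r:.4f}" for r in adjusted))
```

      A negative slope in such a fit corresponds to the pattern described above: groups that are more genetically similar have more similar mutation spectra.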

      Private Comments

      1. It will also be useful to see how the power of IHD and QTL mapping depend on the allele frequency of the mutator allele and the sample size, as mutator alleles are likely rare or semi-rare in natural populations (such as the human de novo mutation dataset that the authors mentioned).

      This is another good suggestion. In general, we'd expect the power of both IHD and QTL mapping to decrease as a function of mutator allele frequency. At the same time, we note that the power of these scans should mostly depend on the absolute number of carriers of the mutator allele and less on its frequency. In the BXD mouse study design, we observe high frequency mutators but also a relatively small sample size of just over 100 individuals. In natural human populations, mutator frequencies might be orders of magnitude smaller, but sample sizes may be orders of magnitude larger, especially as new cohorts of human genomes are routinely being sequenced. So, we expect to have similar power to detect a mutator segregating at, say, 0.5% frequency in a cohort of 20,000 individuals, as we would to detect a mutator segregating at 50% frequency in a dataset of 200 individuals.
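
      The arithmetic behind this comparison is simple enough to check directly. The sketch below assumes, as in an inbred line, that each individual carries a single genotype at the locus, so the expected number of carriers is the allele frequency times the sample size; `expected_carriers` is a hypothetical helper name.

```python
def expected_carriers(allele_frequency, n_individuals):
    """Expected number of carriers when each individual has one genotype."""
    return allele_frequency * n_individuals


rare_large_cohort = expected_carriers(0.005, 20_000)   # 0.5% of 20,000
common_small_cohort = expected_carriers(0.5, 200)      # 50% of 200
```

      Both designs yield about 100 expected carriers, which is why similar power is anticipated despite the hundredfold difference in allele frequency.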

      To more formally address the reviewer's concern, we performed a series of simulations in which we simulated a population of 100 haplotypes. We assigned the same average number of mutations to each haplotype but allowed the allele frequency of the mutator allele to vary between 0.1, 0.25, and 0.5. The results of these simulations are shown in Author response image 4 and reveal that AMSD tends to have greater power than QTL mapping at lower mutator allele frequencies. We now mention these simulations in the text at lines 135-138 and include the simulation results in Figure 1-figure supplement 4.

      Author response image 4.

      Power of AMSD and QTL mapping on simulated haplotypes with variable marker allele frequencies. We simulated germline mutations on the specified number of haplotypes (as described in the manuscript), but simulated genotypes at the mutator allele such that "A" alleles were at the specified allele frequency.

      1. In the Methods section of "testing for epistasis between the two mutator loci", it will be helpful to explicitly lay out the model and assumptions in mathematical formulae, in addition to the R scripts. For example, are the two loci considered independent when their effects on mutation rate are multiplicative or additive? Given the R scripts provided, it seems that the two loci are assumed to have multiplicative effects on the mutation rate, and that the mutation count follows a Poisson distribution with mean being the mutation rate times ADJ_AGE (i.e., the mutation opportunity times the number of generations of an inbred line). However, this is not easily understandable for readers who are not familiar with the R language. In addition, I hope the authors can be more specific when discussing the epistatic interaction between the two loci by explicitly saying "synergistic effects beyond multiplicative effects on the C>A mutation rate".

      The reviewer raises a good point about the clarity of our descriptions of tests for epistasis. We have now added a more detailed description of these tests in the section of the Materials and Methods beginning at line 875. We have also added a statement to the text at lines 289-291: “the combined effects of D genotypes at both loci exceed the sum of marginal effects of D genotypes at either locus alone.” We hope that this will help clarify the results of our tests for statistical epistasis.
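
      The model as the reviewer reads it (a Poisson mean equal to the mutation rate times ADJ_AGE, with multiplicative locus effects) can be sketched as below. This is an illustration of that reading only, not the authors' R code; the function name, the effect sizes, and the `interaction` parameter are hypothetical.

```python
def expected_count(base_rate, adj_age, g4, g6, effect4, effect6, interaction=1.0):
    """Expected mutation count (the Poisson mean) for one line.

    g4, g6      : 0 for a B genotype, 1 for a D genotype at each locus
    effect4/6   : multiplicative effect of a D genotype at each locus
    interaction : extra multiplicative factor applied only when both loci
                  carry D; 1.0 means no epistasis beyond multiplicativity
    """
    rate = base_rate * effect4 ** g4 * effect6 ** g6 * interaction ** (g4 * g6)
    return rate * adj_age


# Hypothetical effect sizes, chosen only to show the contrast.
no_epistasis = expected_count(0.01, 100, 1, 1, 1.5, 1.1)
with_epistasis = expected_count(0.01, 100, 1, 1, 1.5, 1.1, interaction=1.2)
```

      Under this parameterization, a test for epistasis asks whether `interaction` differs from 1, that is, whether lines with D genotypes at both loci carry more mutations than the product of the two marginal effects predicts.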

      Reviewer 2 (Public Review):

      1. The main limitation of the approach is that it is difficult to see how it might be applied beyond the context of mutation accumulation experiments using recombinant inbred lines. This is because the signal it detects, and hence its power, is based on the number of extra accumulated mutations linked to (i.e. on the same chromosome as) the mutator allele. In germline mutation studies of wild populations the number of generations involved (and hence the total number of mutations) is typically small, or else the mutator allele becomes unlinked from the mutations it has caused (due to recombination), or is lost from the population altogether (due to chance or perhaps selection against its deleterious consequences).

      The reviewer is correct that as it currently exists, IHD is mostly limited to applications in recombinant inbred lines (RILs) like the BXDs. This is due to the fact that IHD assumes that each diploid sample harbors one of two possible genotypes at a particular locus and ignores the possibility of heterozygous genotypes for simplicity. In natural, outbreeding populations, this assumption will obviously not hold. However, as we plan to further iterate on and improve the IHD method, we hope that it will be applicable to a wider variety of experimental systems in the future. We have added additional caveats about the applicability of our method to other systems in the text at lines 545-550.

      Private Comments

      1. On p. 8, perhaps I've misunderstood but it's not clear in what way the SVs identified were relevant to the samples used in this dataset - were the founder strains assembled? Is there any chance that additional SVs were present, e.g. de novo early in the accumulation line?

      Our description of this structural variation resource could have been clearer. The referenced SVs were identified in Ferraj et al. (2023) by generating high-quality long read assemblies of inbred laboratory mice. Both DBA/2J and C57BL/6J (the founder strains for the BXD resource) were included in the Ferraj et al. SV callset. We have clarified our description of the callset at lines 247-248.

      It is certainly possible that individual BXD lines have accumulated de novo structural variants during inbreeding. However, these "private" SVs are unlikely to produce a strong IHD association signal (via linkage to one of the ~7,000 markers) at either the chromosome 4 or chromosome 6 locus, since we only tested markers that were at approximately 50% D allele frequency among the BXDs.

      1. On p. 13, comparing the IHD and QTL approaches, regarding the advantage of the former in that it detects the combined effect of multiple k-mer mutation types, would it not be straightforward to aggregate counts for different types in a QTL setting as well?

      The mutation spectrum is a multi-dimensional phenotype (6-dimensional if using the 1-mer spectrum, 96-dimensional if using the 3-mer spectrum, etc.). Most QTL mapping methods use linear models to test for associations between genotypes and a 1-dimensional phenotype (e.g., body weight, litter size). In the past, we used QTL mapping to test for associations between genotypes and a single element of the mutation spectrum (e.g., the rate of C>A mutations), but there isn't a straightforward way to aggregate or collapse the mutation spectrum into a 1-dimensional phenotype that retains the information contained within the full 1-mer or 3-mer spectrum. For that reason, we developed the "aggregate mutation spectrum" approach, as it preserves information about the complete mutation spectrum in each group of strains.
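
      The dimensionalities quoted above follow from a simple count: six strand-collapsed base substitution classes, each combined with every possible flanking context. A minimal sketch, with `spectrum_dimensions` as a hypothetical helper name:

```python
def spectrum_dimensions(k):
    """Number of categories in a k-mer mutation spectrum (k odd).

    There are 6 strand-collapsed substitution classes, and each of the
    k - 1 flanking positions can be any of 4 nucleotides.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    return 6 * 4 ** (k - 1)


one_mer = spectrum_dimensions(1)    # 6
three_mer = spectrum_dimensions(3)  # 96
```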

      The reviewer is correct that we could also aggregate counts of different mutation types to, say, perform a QTL scan for the load of a specific mutational signature. For example, we could first perform standard mutational signature analysis on our dataset and then test for QTLs associated with each signature that is discovered. However, this approach would not solve the second problem that our method is designed to solve: the appropriate weighting of samples based on how many mutations they contain.

      1. pp. 15-16: In the discussion of how you account for relatedness between strains, I found the second explanation (on p. 16) much clearer. It would be interesting to know how much variance was typically accounted for by this regression?

      As shown in the response to Reviewer 1, genotype similarity between genotype groups (i.e., those with either D or B genotypes at a marker) generally explains a small amount of variance in the cosine distance between those groups (R2 ~= 0.007). However, since the slope term in that regression is significantly non-zero, correcting for this relationship should still improve our power relative to using raw cosine distance values that are slightly confounded by this relationship.

      1. Similarly, in the section on Applying the IHD method to the BXDs (pp. 18-19), I think this description was very useful, and some or all of this description of the experiment (and how the DNMs in it arise) could profitably be moved to the introduction.

      We appreciate the reviewer’s feedback about the details of the BXD cohort. Overall, we feel the description of the BXDs in the Introduction (at lines 65-73) is sufficient to introduce the cohort, though we now add some additional detail about variability in BXD inbreeding duration (at lines 89-93) to the Introduction as well, since it is quite relevant to some of the new simulation results presented in the manuscript.

      1. A really minor one, not sure if this is for the journal or the authors, but it would be much better to include both page and line numbers in any version of an article for review. My pdf had neither!

      We apologize for the lack of page/line numbers in the submitted PDF. We have now added line numbers to the revised version of the manuscript.

      Reviewer 3 (Public Review):

      1. Under simulated scenarios, the authors' new IHD method is not appreciably more powerful than conventional QTL mapping methods. While this does not diminish the rigor or novelty of the authors' findings, it does temper enthusiasm for the IHD method's potential to uncover new mutators in other populations or datasets. Further, adaptation of this methodology to other datasets, including human trios or multigenerational families, will require some modification, which could present a barrier to broader community uptake. Notably, BXD mice are (mostly) inbred, justifying the authors' consideration of just two genotype states at each locus, but this decision prevents out-of-the-box application to outbred populations and human genomic datasets. Lastly, some details of the IHD method are not clearly spelled out in the paper. In particular, it is unclear whether differences in BXD strain relatedness due to the breeding epoch structure are fully accounted for in permutations. The method's name - inter-haplotype distance - is also somewhat misleading, as it seems to imply that de novo mutations are aggregated at the scale of sub-chromosomal haplotype blocks, rather than across the whole genome.

      The reviewer raises very fair concerns. As mentioned in response to a question from Reviewer 1, we performed additional simulation experiments that demonstrate the improved power of IHD (as compared to QTL mapping) in situations where mutation counts are variable across haplotypes or when mutator alleles are present at allele frequencies <50% (see Author response image 2 and 3, as well as new supplements to Figure 1 in the manuscript). However, the reviewer is correct that the IHD method is not applicable to collections of outbred individuals (that is, individuals with both heterozygous and homozygous genotypes), which will limit its current applications to datasets other than recombinant inbred lines. We have added a mention of these limitations to the Results at lines 138-141 and the Discussion at lines 545-550, but plan to iterate on the IHD method and introduce new features that enable its application to other datasets. We have also explicitly stated that we account for breeding epochs in our permutation tests in the Materials and Methods at lines 670-671. Both Reviewer 1 and Reviewer 3 raised concerns about the name of our method, and we have therefore changed “inter-haplotype distance” to “aggregate mutation spectrum distance” throughout the manuscript.

      1. Nominating candidates within the chr6 mutator locus requires an approach for defining a credible interval and excluding/including specific genes within that interval as candidates. Sasani et al. delimit their focal window to 5Mb on either side of the SNP with the most extreme P-value in their IHD scan. This strategy suffers from several weaknesses. First, no justification for using a 10 Mb window, as opposed to, e.g., a 5 Mb window or a window size delimited by a specific threshold of P-value drop, is given, rendering the approach rather ad hoc. Second, within their focal 10Mb window, the authors prioritize genes with annotated functions in DNA repair that harbor protein coding variants between the B6 and D2 founder strains. While the logic for focusing on known DNA repair genes is sensible, this locus also houses an appreciable number of genes that are not functionally annotated, but could, conceivably, perform relevant biological roles. These genes should not be excluded outright, especially if they are expressed in the germline. Further, the vast majority of functional SNPs are non-coding (including the likely causal variant at the chr4 mutator previously identified in the BXD population). Thus, the authors' decision to focus most heavily on coding variants is not well-justified. Sasani et al. dedicate considerable speculation in the manuscript to the likely identity of the causal variant, ultimately favoring the conclusion that the causal variant is a predicted deleterious missense variant in Mbd4. However, using a 5Mb window centered on the peak IHD scan SNP, rather than a 10Mb window, Mbd4 would be excluded. Further, SNP functional prediction accuracy is modest [e.g., PMID 28511696], and exclusion of the missense variant in Ogg1 due to its benign prediction is potentially premature, especially given the wealth of functional data implicating Ogg1 in C>A mutations in house mice. Finally, the DNA repair gene closest to the peak IHD SNP is Rad18, which the authors largely exclude as a candidate.

      We agree that the use of a 10 Mb window, rather than an empirically derived confidence interval, is a bit arbitrary and ad hoc. To address this concern, we have implemented a bootstrap resampling approach (Visscher et al. 1996, Genetics) to define confidence intervals surrounding IHD peaks. We have added a description of the approach to the Materials and Methods at lines 609-622, but a brief description follows. In each of N trials (here, N = 10,000), we take a bootstrap sample of the BXD phenotype and genotype data with replacement. We then perform an IHD scan on the chromosome of interest using the bootstrap sample and record the position of the marker with the largest cosine distance value (i.e., the "peak" marker). After N trials, we calculate the 90% confidence interval of bootstrapped peak marker locations; in other words, we identify the locations of two genotyped markers, between which 90% of all bootstrap trials produced an IHD peak. We note that bootstrap confidence intervals can exhibit poor "coverage" (a measure of how often the confidence intervals include the "true" QTL location) in QTL mapping studies (see Manichaikul et al. 2006, Genetics), but feel that the bootstrap is more reasonable than simply defining an ad hoc interval around an IHD peak.
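
      The bootstrap procedure described above can be sketched roughly as follows. This is a simplified illustration under several assumptions: the scan uses raw cosine distance between aggregate spectra (no residual adjustment), the interval is taken from percentiles of the bootstrapped peak positions, and every sample has at least one mutation. All function names and the input layout are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 minus the cosine similarity of two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def aggregate(spectra):
    """Sum per-sample mutation counts into one aggregate spectrum."""
    return [sum(column) for column in zip(*spectra)]


def peak_marker(spectra, genotypes):
    """Index of the marker with the largest B-versus-D cosine distance."""
    best_index, best_distance = None, -1.0
    for m in range(len(genotypes[0])):
        b_group = [s for s, g in zip(spectra, genotypes) if g[m] == "B"]
        d_group = [s for s, g in zip(spectra, genotypes) if g[m] == "D"]
        if not b_group or not d_group:
            continue  # a resample can lose one genotype class entirely
        distance = cosine_distance(aggregate(b_group), aggregate(d_group))
        if distance > best_distance:
            best_index, best_distance = m, distance
    return best_index


def bootstrap_peak_interval(spectra, genotypes, positions,
                            n_trials=1000, level=0.90, seed=0):
    """Percentile interval of peak positions over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(spectra)
    peaks = []
    for _ in range(n_trials):
        # Resample samples with replacement, keeping each sample's
        # spectrum and genotypes paired.
        idx = [rng.randrange(n) for _ in range(n)]
        peak = peak_marker([spectra[i] for i in idx],
                           [genotypes[i] for i in idx])
        if peak is not None:
            peaks.append(positions[peak])
    peaks.sort()
    tail = (1.0 - level) / 2.0
    lower = peaks[int(tail * (len(peaks) - 1))]
    upper = peaks[int(math.ceil((1.0 - tail) * (len(peaks) - 1)))]
    return lower, upper
```

      Here `spectra` is a list of per-sample count vectors, `genotypes` is a list of per-sample sequences of "B"/"D" calls (one per marker), and `positions` gives each marker's coordinate. The returned pair brackets the central 90% of bootstrapped peak locations by default.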

      The new 90% confidence interval surrounding the IHD peak on chromosome 6 is larger than the original (ad hoc) 10 Mbp window, now extending from around 95 Mbp to 114 Mbp. Notably, the new empirical confidence interval excludes Mbd4. We have accordingly updated our Results and Discussion sections to acknowledge the fact that Mbd4 no longer resides within the confidence interval surrounding the IHD peak on chromosome 6 and have added additional descriptions of genes that are now implicated by the 90% confidence interval. Given the uncertainties associated with using bootstrap confidence intervals, we have retained a brief discussion of the evidence supporting Mbd4 in the Discussion but focus primarily on Ogg1 as the most plausible candidate.

      The reviewer raises a valid concern about our treatment of non-DNA repair genes within the interval surrounding the peak on chromosome 6. We have added more careful language to the text at lines 219-223 to acknowledge the fact that non-annotated genes in the confidence interval surrounding the chromosome 6 peak may play a role in the epistatic interaction we observed.

      The reviewer also raises a reasonable concern about our discussions of both Mbd4 and Ogg1 as candidate genes in the Discussion. Since Mbd4 does not reside within the new empirical bootstrap confidence interval on chromosome 6 and given the strong prior evidence that Ogg1 is involved in C>A mutator phenotypes (and is in the same gene network as Mutyh), we have reframed the Discussion to focus on Ogg1 as the most plausible candidate gene (see lines 357-360).

      Using the GeneNetwork resource, we also more carefully explored the potential effects of noncoding variants on the C>A mutator phenotype we observed on chromosome 6. We have updated the Results at lines 240-246 and the Discussion at lines 439-447 to provide more evidence for regulatory variants that may contribute to the C>A mutator phenotype. Specifically, we discovered several strong-effect cis-eQTLs for Ogg1 across multiple tissues, at which D genotypes are associated with decreased Ogg1 expression. Given new evidence that the original mutator locus we discovered on chromosome 4 harbors an intronic mobile element insertion that significantly affects Mutyh expression (see Ferraj et al. 2023, Cell Genomics), it is certainly possible that the mutator phenotype associated with genotypes on chromosome 6 may also be mediated by regulatory, rather than coding, variation.

      1. Additionally, some claims in the paper are not well-supported by the authors' data. For example, in the Discussion, the authors assert that "multiple mutator alleles have spontaneously arisen during the evolutionary history of inbred laboratory mice" and that "... mutational pressure can cause mutation rates to rise in just a few generations of relaxed selection in captivity". However, these statements are undercut by data in this paper and the authors' prior publication demonstrating that a number of candidate variants are segregating in natural mouse populations. These variants almost certainly did not emerge de novo in laboratory colonies, but were inherited from their wild mouse ancestors. Further, the wild mouse population genomic dataset used by the authors falls far short of comprehensively sampling wild mouse diversity; variants in laboratory populations could derive from unsampled wild populations.

      The reviewer raises a good point. In our previous publication (Sasani et al. 2022, Nature), we hypothesized that Mutyh mutator alleles had arisen in wild, outbreeding populations of Mus musculus, and later became fixed in inbred strains like DBA/2J and C57BL/6J. However, in the current manuscript, we included a statement about mutator alleles "spontaneously arising during the evolutionary history of inbred laboratory mice" to reflect new evidence (from Ferraj et al. 2023, Cell Genomics) that the mutator allele we originally identified in Mutyh may not be wild derived after all. Instead, Ferraj et al. suggest that the C>A mutator phenotype we originally identified is caused by an intronic mobile element insertion (MEI) that is present in DBA/2J and a handful of other inbred laboratory strains. Although this MEI may have originally occurred in a wild population of mice, we wanted to acknowledge the possibility that both the original Mutyh mutator allele, as well as the new mutator allele(s) we discovered in this manuscript, could have arisen during the production and inbreeding of inbred laboratory lines. We have also added language to the Discussion at lines 325-327 to acknowledge that the 67 wild mice we analyzed do not comprise a comprehensive picture of the genetic diversity present in wild-derived samples.

      We have added additional language to the Discussion at lines 349-357 in which we acknowledge that the chromosome 6 mutator allele might have originated in either laboratory or wild mice and elaborate on the possibility that mutator alleles with deleterious fitness consequences may be more likely to persist in inbred laboratory colonies.

      1. Finally, the implications of discovering a mutator whose expression is potentially conditional on the genotype at a second locus are not raised in the Discussion. While not a weakness per se, this omission is perceived to be a missed opportunity to emphasize what, to this reviewer, is one of the most exciting impacts of this work. The potential background dependence of mutator expression could partially shelter it from the action of selection, allowing the allele to persist in populations. This finding bears on theoretical models of mutation rate evolution and may have important implications for efforts to map additional mutator loci. It seems unfortunate to not elevate these points.

      We agree and have added additional discussion of the possibility that the C>A mutator phenotypes in the BXDs are a result of interactions between the expression of two DNA repair genes in the same base-excision network to the Discussion section at lines 447-449.

      Private comments

      1. The criteria used to determine or specify haplotype size are not specified in the manuscript. I mention this above but reiterate here as this was a big point of confusion for me when reading the paper. Haplotype length is an important consideration for overall power and for proper extension of this method to other systems/populations.

      We may not have been clear enough in our description of our method, and as suggested by Reviewer 1, the name "inter-haplotype distance" may also have been a source of confusion. At a given marker, we compute the aggregate mutation spectrum in BXDs with either B or D genotypes using all genome-wide de novo mutations observed in those BXDs. Since the BXDs were inbred for many generations, we expect that almost all de novo germline mutations observed in an RIL are in near-perfect linkage with the informative genotypes used for distance scans. Thus, the "haplotypes" used in the inter-haplotype distance scans are essentially the lengths of entire genomes.

      1. Results, first paragraph, final sentence. I found the language here confusing. I don't understand how one can compute the cosine distance at single markers, as stated. I'm assuming cosine distance is computed from variants residing on haplotypes delimited by some defined window surrounding the focal marker?

      As discussed above, we aggregate all genome-wide de novo mutations in each group of BXDs at a given marker, rather than only considering DNMs within a particular window surrounding the marker. The approach is discussed in greater detail in the caption of Figure 1.

      1. Nominating candidates for the chr6 locus, Table 1. It would be worth confirming that the three prioritized candidates (Setmar, Ogg1, and Mbd4) all show germline expression.

      Using the Mouse Genome Informatics online resource, we confirmed that all prioritized candidate genes (now including Setmar and Ogg1, but not Mbd4) are expressed in the male and female gonads, and mention this in the Results at lines 228 and 233-234.

      1. Does the chr6 peak on the C>A LOD plot (Figure 2- figure supplement 1) overlap the same peak identified in the IHD scan? And, does this peak rise to significance when using alpha = 0.05? Given that the goal of these QTL scans is to identify loci that interact with the C>A mutator on chr4, it is reasonable to hypothesize that the mutation impact of epistatic loci will also be restricted to C>A mutations. Therefore, I am not fully convinced that the conservative alpha = 0.05/7 threshold is necessary.

      The chromosome 6 peak in Figure 2-figure supplement 1 does, in fact, overlap the peak marker we identified on chromosome 6 using IHD. One reason we decided to use a more conservative alpha of (0.05 / 7) is that we wanted these results to be analogous to the ones we performed in a previous paper (Sasani et al. 2022, Nature), in which we first identified the mutator locus on chromosome 4. However, the C>A peak does not rise to genome-wide significance if we use a less conservative alpha value of 0.05 (see Author response image 5). As discussed in our response to Reviewer 1, we find that QTL mapping is not as powerful as IHD when haplotypes have accumulated variable numbers of germline mutations (as in the BXDs), which likely explains the fact that the peak on chromosome 6 is not genome-wide significant using QTL mapping.

      Author response image 5.

      QTL scan for the fraction of C>A mutations in BXDs harboring D alleles at the locus near Mutyh. The QTL scan was performed at a genome-wide significance alpha of 0.05, rather than 0.05/7.

      1. Is there significant LD between the IHD peaks on chr6 and chr4 across the BXD? If so, it could suggest that the signal is driven by cryptic population structure that is not fully accounted for in the authors' regression-based approach. If not, this point may merit an explicit mention in the text as an additional validation for the authenticity of the chr6 mutator finding.

      This is a good question. We used the scikit-allel Python package to calculate linkage disequilibrium (LD) between all pairs of genotyped markers in the BXD cohort, and found that the two peak loci (on chromosomes 4 and 6) exhibit weak LD (r2 = 4e-5). We have added a mention of this to the main text of the Results at lines 212-213. That being said, we do not think the chromosome 6 mutator association (or the apparent epistasis between the alleles on chromosomes 4 and 6) could be driven by cryptic population structure. Unlike in human GWAS and other association studies in natural populations, there is no heterogeneity in the environmental exposures experienced by different BXD subpopulations. In humans, population structure can create spurious associations (e.g., between height and variants that are in LD and are most common in Northern Europe), but this requires the existence of a phenotypic gradient caused by genetic or environmental heterogeneity that is not likely to exist in the context of inbred laboratory mice that are all the progeny of the same two founder strains.
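
      For readers unfamiliar with the statistic, the r2 between two markers can be illustrated with a few lines of plain Python. This is a stand-in for illustration, not the scikit-allel routine used in the analysis; it assumes homozygous genotypes coded 0 (B) or 1 (D), and the example genotype vectors are hypothetical.

```python
def ld_r_squared(marker_a, marker_b):
    """Squared Pearson correlation between two markers coded 0 (B) / 1 (D)."""
    n = len(marker_a)
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b))
    var_a = sum((a - mean_a) ** 2 for a in marker_a)
    var_b = sum((b - mean_b) ** 2 for b in marker_b)
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes across eight lines.
chr4_marker = [0, 0, 1, 1, 0, 1, 0, 1]
chr6_marker = [0, 1, 0, 1, 0, 1, 1, 0]
unlinked = ld_r_squared(chr4_marker, chr6_marker)   # markers vary independently
identical = ld_r_squared(chr4_marker, chr4_marker)  # perfect LD
```

      A value near zero, as reported for the chromosome 4 and chromosome 6 peaks, means that genotypes at one locus carry essentially no information about genotypes at the other.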

      1. Discussion, last sentence of the "Possible causal alleles..." section: I don't understand how the absence of the Mariner-family domain leads the authors to this conclusion. Setmar is involved in NHEJ, which to my knowledge is not a repair process that is expected to have a specific C>A mutation bias. I think this is grounds enough for ruling out its potential contributions, in favor of focusing on other candidates, (e.g., Mbd4 and Ogg1).

      The reviewer raises a good point. Our main reason for mentioning the absence of the Mariner-family domain is that even if NHEJ were responsible for the C>A mutator phenotype, it likely wouldn't be possible for Setmar to participate in NHEJ without the domain. However, the reviewer is correct that NHEJ is not expected to cause a C>A mutation bias, and we have added a mention of this to the text as well at lines 379-382.

      1. Discussion, second to last paragraph of section "Mbd4 may buffer...": The authors speculate that reduced activity of Mbd4 could modulate rates of apoptosis in response to DNA damage. This leads to the prediction that mice with mutator alleles at both Mutyh and Mbd4 should exhibit higher overall mutation rates compared to mice with other genotypes. This possibility could be tested with the authors' data.

      The reviewer raises a good question. As mentioned above, however, we implemented a new approach to calculate confidence intervals surrounding distance peaks and found that this empirical approach (rather than the ad hoc 10-Mbp window approach we used previously) excluded Mbd4 from the credible interval. Although we still mention Mbd4 as a possible candidate (since it still resides within the 10 Mbp window), we have refactored the Discussion section to focus primarily on the evidence for Ogg1 as a candidate gene on chromosome 6.

      In any case, we do not observe that mice with mutator alleles at both the chromosome 4 and chromosome 6 loci have higher overall mutation rates compared to mice with other genotype combinations. This may not be terribly surprising, however, since C>A mutations only comprise about 10% of all possible mutations. Thus, given the variance in other 1-mer mutation counts, even a substantial increase in the C>A mutation rate might not have a detectable effect on the overall mutation rate. Indeed, in our original paper describing the Mutyh mutator allele (Sasani et al. 2022, Nature), we did not identify any QTL for the overall mutation rate in the BXDs and found that mice with the chromosome 4 mutator allele only exhibited a 1.11X increase in their overall mutation rates relative to mice without the mutator allele.

      1. Methods, "Accounting for BXD population structure": An "epoch-aware" permutation strategy is described here, but it is not clear when (and whether) this strategy is used to determine significance of IHD P-values.

      We have added a more explicit mention of this to the Methods section at lines 670-671, as we do, in fact, use the epoch-aware permutation strategy when calculating empirical distance thresholds.
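
      The core idea of an epoch-aware permutation, shuffling only among lines that share a breeding epoch, can be sketched as below. Exactly which quantity is shuffled in the actual method is not stated in this response, so the sketch assumes a generic per-sample value; `permute_within_epochs` and the example labels are hypothetical.

```python
import random


def permute_within_epochs(values, epochs, rng):
    """Shuffle values only among samples that share an epoch label."""
    permuted = list(values)
    indices_by_epoch = {}
    for i, epoch in enumerate(epochs):
        indices_by_epoch.setdefault(epoch, []).append(i)
    for indices in indices_by_epoch.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for i, value in zip(indices, shuffled):
            permuted[i] = value
    return permuted


# Hypothetical sample labels and epoch assignments.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
epochs = [1, 1, 1, 2, 2, 2]
shuffled_samples = permute_within_epochs(samples, epochs, random.Random(0))
```

      Because each value stays within its own epoch, the null distribution built from such permutations preserves the relatedness structure that epochs introduce, rather than breaking it as an unrestricted shuffle would.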

      1. The simulation scheme employed for power calculations is highly specific to the BXD population. This is not a weakness, and perfectly appropriate to the study population used here. However, it does limit the transferability of the power analyses presented in this manuscript to other populations. This limitation may merit an explicit cautionary mention to readers who may aspire to port the IHD method over to their study system.

      This is true. Our simulation strategy is relatively simple and makes a number of assumptions about the simulated population of haplotypes (allele frequencies normally distributed around 0.5, expected rates of each mutation type, etc.). In response to concerns from Reviewer 1, we performed an updated series of simulations in which we varied some of these parameters (mutator allele frequencies, mean numbers of mutations on haplotypes, etc.). However, we have added a mention of the simulation approach's limitations and specificity to the BXDs to the text at lines 545-550.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Yun et al. examined the molecular and neuronal underpinnings of changes in Drosophila female reproductive behaviors in response to social cues. Specifically, the authors measure the ejaculate-holding period, which is the amount of time females retain male ejaculate after mating (typically 90 min in flies). They find that female fruit flies, Drosophila melanogaster, display shorter holding periods in the presence of a native male or male-associated cues, including 2-Methyltetracosane (2MC) and 7-Tricosene (7-T). They further show that 2MC functions through Or47b olfactory receptor neurons (ORNs) and the Or47b channel, while 7-T functions through ppk23 expressing neurons. Interestingly, their data also indicates that two other olfactory ligands for Or47b (methyl laurate and palmitoleic acid) do not have the same effects on the ejaculate-holding period. By performing a series of behavioral and imaging experiments, the authors reveal that an increase in cAMP activity in pC1 neurons is required for this shortening of the ejaculate-holding period and may be involved in the likelihood of remating. This work lays the foundation for future studies on sexual plasticity in female Drosophila.

      The conclusions of this paper are mostly supported by the data, but aspects of the lines used for individual pC1 subtypes and visual contributions as well as the statistical analysis need to be clarified.

      (1) The pC1 subtypes (a - e) are delineated based on their morphology and connectivity. While the morphology of these neurons is distinct, they do share a resemblance that can be difficult to discern depending on the imaging performed. Additionally, genetic lines attempting to label individual neurons can easily be contaminated by low-level expression in off-target neurons in the brain or ventral nerve cord (VNC), which could contribute to behavioral changes following optogenetic manipulations. In Figures 5C - D, the authors generated and used new lines for labeling pC1a and pC1b+c. The line for pC1b+c was imaged as part of another recent study (https://doi.org/10.1073/pnas.2310841121). However, similar additional images of the pC1a line (i.e. 40x magnification and VNC expression) would be helpful in order to validate its specificity.

      We have included the high-resolution images of the expression of the pC1a-split-Gal4 driver in the brain and the VNC in the new figures S6A and S6B.

      (2) The authors' experiments examining olfactory and gustatory contributions to the holding period were well controlled and described. However, the experiments in Figure 1D examining visual contributions were not sufficiently convincing as the line used (w1118) has previously been shown to be visually impaired (Wehner et al., 1969; Kalmus 1948). Using another wild-type line would have improved the authors' claims.

      It is evident that w1118 flies are visually impaired and are able to receive a limited amount of visual information in dim red light. Nevertheless, they are able to exhibit MIES phenotypes, which further supports the dispensability of visual information in MIES. In a 2024 study, Doubovetzky et al. (1) found that MIES in ninaB mutant females, which have defects in visual sensation, was not altered. This further corroborates our assertion that vision is likely to be of lesser importance than olfaction in MIES.

      (3) When comparisons between more than 2 groups are shown as in Figures 1E, 3D, and 5E, the comparisons being made were not clear. Adding in the results of a nonparametric multiple comparisons test would help for the interpretation of these results.

      We have revised figures 1E, 3D, 5E and the accompanying legends as suggested.

      Reviewer #2 (Public Review):

      The work by Yun et al. explores an important question related to post-copulatory sexual selection and sperm competition: Can females actively influence the outcome of insemination by a particular male by modulating the storage and ejection of transferred sperm in response to contextual sensory stimuli? The present work is exemplary for how the Drosophila model can give detailed insight into the basic mechanism of sexual plasticity, addressing the underlying neuronal circuits on a genetic, molecular, and cellular level.

      Using the Drosophila model, the authors show that the presence of other males or mated females after mating shortens the ejaculate-holding period (EHP) of a female, i.e. the time she takes until she ejects the mating plug and unstored sperm. Through a series of thorough and systematic experiments involving the manipulation of olfactory and chemo-gustatory neurons and genes in combination with exposure to defined pheromones, they uncover two pheromones and their sensory cells for this behavior. Exposure to the male-specific pheromone 2MC shortens EHP via female Or47b olfactory neurons, and the contact pheromone 7-T, present in males and on mated females, does so via ppk23 expressing gustatory foreleg neurons. Both compounds increase cAMP levels in a specific subset of central brain receptivity circuit neurons, the pC1b,c neurons. By employing an optogenetically controlled adenyl cyclase, the authors show that increased cAMP levels in pC1b and c neurons increase their excitability upon male pheromone exposure, decrease female EHP, and increase the remating rate. This provides convincing evidence for the role of pC1b,c neurons in integrating information about the social environment and mediating not only virgin but also mated female post-copulatory mate choice.

      Understanding context and state-dependent sexual behavior is of fundamental interest. Mate behavior is highly context-dependent. In animals subjected to sperm competition, the complexities of optimal mate choice have attracted a long history of sophisticated modelling in the framework of game theory. These models are in stark contrast to how little we understand so far about the biological and neurophysiological mechanisms of how females implement post-copulatory or so-called "cryptic" mate choice and bias sperm usage when mating multiple times.

      The strength of the paper is decrypting "cryptic" mate choice, i.e. the clear identification of physiological mechanisms and proximal causes for female post-copulatory mate choice. The discovery of peripheral chemosensory nodes and neurophysiological mechanisms in central circuit nodes will provide a fruitful starting point to fully map the circuits for female receptivity and mate choice during the whole gamut of female life history.

      We appreciate the positive response to our work.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      While appreciating the quality of the work the reviewers had a few key concerns that would greatly improve the manuscript. These are:

      (1) In some cases the specific statistical analyses are not clear. Could the authors please clarify what comparisons were made and the specific tests used?

      We have clarified the comparisons made in the multiple comparison analysis and specified the tests used in figures 1E, 3D, 5E.

      (2) Could the authors please include data that verify the expression patterns of their new reagent for pC1a, which will be useful for the community?

      Figure S6 was revised to include the expression of the pC1a-split-Gal4 gene in the brain (Fig. S6A) and the VNC (Fig. S6B).

      (3) A figure summarising their findings in the context of known circuitry will be useful.

      A new Figure 7 has been prepared, which provides a summary of our findings.

      (4) The SAG data are interesting. Do the authors wish to consider moving it to the main text or removing it if too preliminary?

      The supplementary figure 10 and related discussions in the discussion section have been removed.

      In the revised version of this manuscript, we present new evidence that the Or47b gene is required for 2MC-induced cAMP elevation in pC1 neurons, but not for 7T-induced one (see Fig. 5F). This observation supports that Or47b is a receptor for 2MC.

      The following paragraph was inserted at line 248 to provide a detailed description of the new findings: “To further test the role of Or47b in 2MC detection, we generated Or47b-deficient females with pC1 neurons expressing the CRE-luciferase reporter. Females with one copy of the wild-type Or47b allele, which served as the control group, showed robust CRE-luciferase reporter activity in response to either 2MC or 7-T. In contrast, Or47b-deficient females showed robust CRE-luciferase activity in response to 7-T, but little activity in response to 2MC. This observation suggests that the odorant receptor Or47b plays an essential role in the selective detection of 2MC (Fig. 5F).”

      In addition, the following sentence was inserted at line 308 in the discussion section: “In this study, we provide compelling evidence that 2MC induces cAMP elevation in pC1 neurons and EHP shortening via both the Or47b receptor and Or47b ORNs, suggesting that 2MC functions as an odorant ligand for Or47b.”

      Relative CRE-luciferase reporter activity of pC1 neurons in females of the indicated genotypes, incubated with a piece of filter paper perfumed with solvent vehicle control or the indicated pheromones immediately after mating. The CRE-luciferase reporter activity of pC1 neurons of Or47b-deficient females (Or47b2/2 or Or47b3/3) was observed to increase in response to 7-T but not to 2MC. To calculate the relative luciferase activity, the average luminescence unit values of the female incubated with the vehicle are set to 100%. Mann-Whitney Test (n.s. p > 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001). Gray circles indicate the relative luciferase activity (%) of individual females, and the mean ± SEM of data is presented.
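      The normalization described in this legend can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names and the luminescence values are hypothetical, and the Mann-Whitney test itself is left out.

```python
from math import sqrt
from statistics import mean, stdev


def relative_activity(readings, vehicle_readings):
    """Express each luminescence reading as a percentage of the vehicle mean."""
    baseline = mean(vehicle_readings)  # the vehicle average is defined as 100%
    return [100.0 * value / baseline for value in readings]


def mean_sem(values):
    """Return the mean and the standard error of the mean (SEM)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical luminescence units, for illustration only.
vehicle = [200.0, 220.0, 180.0]
pheromone = [400.0, 440.0, 360.0]

vehicle_pct = relative_activity(vehicle, vehicle)
pheromone_pct = relative_activity(pheromone, vehicle)
pheromone_mean, pheromone_sem = mean_sem(pheromone_pct)
```

      Each value in `pheromone_pct` corresponds to one gray circle in the plot (one female), and `mean_sem` gives the summary shown alongside the individual points.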

      Reviewer #1 (Recommendations For The Authors):

      (1) There was a discrepancy between the text and the figures. Based on the asterisks above the data in Figure S5A, the data supports only 150 ng of 7-T shortening the ejaculation holding period. However, the text states that (line 190) "150 or 375 ng of 7-T significantly shortened EHP." It would be helpful if the authors clarified this discrepancy.

      The sentence has been revised and now reads as follows: ‘150 ng of 7-T significantly shortened EHP’.

      (2) Based on the current organization of the text, it was not clear how 2MC was identified and its concentrations were known to be physiologically relevant. It would be helpful if the authors could expand on this in lines 178 - 179.

      The following sentences were inserted into the revised version of the manuscript at line 178: The EHP was therefore measured in females incubated in a small mating chamber containing a piece of filter paper perfumed with male CHCs, including 2-methylhexacosane, 2-methyldocosane, 5-methyltricosane, 7-methyltricosane, 10Z-heneicosene, 9Z-heneicosene, and 2MC at various concentrations (not shown). Among these, 2MC at 750 ng was the only one that significantly reduced EHP (Fig. 3A; Fig. S4). 2MC was mainly found in males, but not in virgin females (30). Notably, it is present in D. melanogaster, D. simulans, D. sechellia, and D. erecta, but not in D. yakuba (30, 60).

      (3) The inset pie chart image illustrating MIES in Figure 1A was difficult to interpret. It would be helpful if the authors used a different method for representing this (i.e. a timeline).

      Figure 1A was revised as suggested.

      (4) In lines 121 - 122, the authors state that the females are exposed to "actively courting naive wild type Canton S males." This was difficult to understand and might be improved by removing "actively courting."

      Revised as suggested.

      Reviewer #2 (Recommendations For The Authors):

      (1) Summary figure

      The story is quite comprehensive and contains a lot of detail regarding the interaction of signaling pathways, internal state, and sensory stimuli. I believe a schematic summary figure bringing together all findings could be very helpful and would make it much easier to understand the discussion!

      Figure 7 has been prepared, which provides a summary of the findings and an explanation of the current working model.

      (2) Figure S10/effect on SAG activation of EHP

      At the moment, the quite interesting and relevant result that SAG activation shortens EHP shown in Figure S10 is only referred to in the discussion. Maybe move this to the results and give it a bit more attention? Actually, I believe this is a very exciting finding that could also be the basis for some more interesting speculations about physiological relevance. Since SAG is silenced upon seminal fluid/sex peptide exposure after mating, a mating with failed SAG silencing (i.e. unusually high post-mating SAG activity) could indicate to the female that there was low or failed sex peptide/seminal fluid transfer. In such a case it would be probably advantageous for the female to decrease EHP and quickly remate, as females need the "beneficial" effects of seminal fluid on ovulation and physiology adaptation. SAG could therefore represent another arm of sensing male quality- here not via external pheromones, but internally, via sensing male sex peptide levels.

      If this is a bit preliminary and rather suited to start a new study, Figure S10 could also be removed from the current manuscript.

      Figure S10 and associated text were removed in the revised version of the manuscript.

      (3) PhotoAC experiments in pC1b,c: the authors find that raising cAMP levels in pC1b,c leads to a decrease in EHP. They argue that increased cAMP levels lead to higher excitability of pC1b,c. This implies that the activity of pC1b,c promotes mating plug ejection. I assume the authors have also tried activating pC1b,c directly by optogenetic cation channels? What is the outcome of this? If different from elevating cAMP levels: why so?

      We employed CsChrimson, a red light-sensitive channelrhodopsin, to investigate the effect of optogenetic activation of each pC1 subset on EHP. Optogenetic activation of pC1a, pC1d, or pC1e had little effect on EHP; however, optogenetic activation of pC1b, c significantly increased EHP. This observation was puzzling because optogenetic silencing of the same neurons also increased EHP. In this experiment, females expressing CsChrimson were exposed to red light for the entire period of EHP measurement. Therefore, we suspect that prolonged activation of pC1b and pC1c neurons depleted their neurotransmitter pool, resulting in a silencing effect, but this requires further testing.

      Author response image 1.

      The prolonged optogenetic activation of pC1b, c neurons increases EHP, mimicking silencing of pC1b, c neurons. Females of the indicated genotypes were cultured on food with or without all-trans-retinal (ATR). The ΔEHP is calculated by subtracting the mean of the reference EHP of females cultured on control ATR- food from the EHP of individual females in comparison. The female genotypes are as follows: (A) 71G01-GAL4/UAS-CsChrimson, (B) pC1a-split-Gal4/UAS-CsChrimson, (C) pC1b,c-split-Gal4/UAS-CsChrimson, (D) pC1d-split-Gal4/UAS-CsChrimson, and (E) pC1e-split-Gal4/UAS-CsChrimson. Gray circles indicate the ΔEHP of individual females, and the mean ± SEM of data is presented. Mann-Whitney Test (n.s. p > 0.05; *p < 0.05; ****p < 0.0001). Numbers below the horizontal bar represent the mean of the EHP differences between the indicated treatments.
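      The ΔEHP calculation in this legend amounts to a simple subtraction, sketched below. The function name and the EHP values (in minutes) are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def delta_ehp(ehp_values, reference_ehp):
    """Subtract the mean reference EHP from each individual EHP."""
    reference_mean = mean(reference_ehp)
    return [value - reference_mean for value in ehp_values]


# Hypothetical EHP values in minutes, for illustration only.
atr_minus = [88.0, 92.0, 90.0]    # reference group: control food without ATR
atr_plus = [120.0, 130.0, 125.0]  # comparison group: food with ATR

deltas = delta_ehp(atr_plus, atr_minus)  # one value per female
mean_difference = mean(deltas)           # the number reported below the bar
```

      A positive `mean_difference` corresponds to a longer EHP in the comparison group than in the reference group.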

      (4) Text edits

      In general, the manuscript is very well-written, clear, and easy to follow. I recommend small edits of the text and correction of typos in some places:

      l.92: "Drosophila females seem to signal the social sexual context through sperm ejection." This sentence could give the impression that the main function of sperm ejection was to signal to conspecifics. I recommend reformulating to leave it open if ejected sperm is a signal or rather a simple cue, e.g.: "There is evidence that Drosophila females detect the social sexual context through sperm ejected by other females."

      Thanks for the good suggestion. It has been revised as suggested. In addition, we have also made additional changes to the text to correct typos.

      l.97: "transcriptional factor" > "transcription factor"

      Revised as suggested. See lines 77, 98, and 201.

      l.101: "There are Dsx positive 14 pC1 neurons in each brain hemisphere of the brain," > "There are 14 Dsx positive pC1 neurons in each brain hemisphere,"

      Revised as suggested; it now reads "There are 14 Dsx-positive pC1 neurons in each hemisphere of the brain, ...".

      l.160: ", even up to 1440 ng" > ", even when applied at concentrations as high as 1440 ng"

      Revised as suggested.

      l.168: "females with male oenocytes significantly shortens EHP" >"females with male oenocytes significantly shorten EHP"

      Revised as suggested.

      l.181: "it was restored when Orco expression is reinstated" >"it was restored when Orco expression was reinstated"

      Revised as suggested. See line 186.

      l.196: "MIES is almost completely abolished" >"MIES was almost completely abolished"

      Revised as suggested. See line 201.

      l.202: "a sexually dimorphic transcriptional factor gene" >"the sex determination transcription factor gene" or "the sex-specifically spliced transcription factor gene". The gene itself is not dimorphic!

      Revised as suggested, lines 208-210 now read "The same study found that Dh44 receptor neurons involved in EHP regulation also express doublesex (dsx), which encodes sexually dimorphic transcription factors."

      l.211: "to silenced" > "to silence"

      Revised as suggested. See line 216.

      l.229: "females that selectively produce the CRE-Luciferase reporter gene" >"females that selectively express CRE-Luciferase reporter"

      Revised as suggested. See line 234.

      l.271: "neurons. expedite" > delete dot

      Revised as suggested. See line 284.

      l.287: "Furthermore, our study has uncovered the conserved neural circuitry that processes male courtship cues and governs mating decisions play an important role in regulating this behavior." > grammar: "our study has uncovered that the conserved neural circuitry that processes male courtship cues and governs mating decisions plays an important role in regulating this behavior." Also: the meaning of "conserved" is not fully clear to me here: conserved in regards to other Drosophila species? Or do the authors mean: general functional similarity with mouse sexual circuitry?

      The sentence (lines 299-301) has been revised for clarity to read "In addition, our study has revealed that the neural circuit that processes male courtship cues and controls mating decisions plays an important role in regulating this behavior. This fly circuit has recently been proposed to be homologous to VMHvl in the mouse brain (45, 46).”

      l.311: "lipid drolet" > "lipid droplets"

      Revised as suggested. See line 325.

      l.316 and in several instances in the following, including Figure 5 caption (l.723) : "cAMP activity" > "cAMP levels" or "increased cAMP levels"

      Revised as suggested.

      l.323: "in hemibrain" > ", as seen in the hemibrain connectome dataset"

      Revised as suggested. See line 337.

      l.326: "increased cAMP levels causes pC1b,c neurons" > "increased cAMP levels cause pC1b,c neurons"

      Revised as suggested. See line 340.

      l.329: "removement" > "removal" or "ejection"

      Revised as suggested, it now reads "the removal of the mating plug". See line 343.

      l. 330: "This observation well aligns" > "The observation aligns well"

      Revised as suggested. See line 345.

      l. 398: Behavior assays: It would be good to describe how mating plug ejection was identified- by eye? Under the microscope/UV light?

      The following sentence has been added to the behavioral assays section at lines 425-426: The sperm ejection scene, in which the female expels a white sac containing sperm and the mating plug through the vulva, has been directly observed by eye in recorded video footage.

      l.685, Figure legend 2: "thermal activation" > "thermogenetic activation"

      Revised as suggested. See line 430.

      Reference:

      (1) Doubovetzky, N., Kohlmeier, P., Bal, S., & Billeter, J. C. (2023). Cryptic female choice in response to male pheromones in Drosophila melanogaster. bioRxiv, 2023-12.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study uses a novel experimental design to elegantly demonstrate how we exploit stimulus structure to overcome working memory capacity limits. While the behavioural evidence is convincing, the neural evidence is incomplete, as it only provides partial support for the proposed information compression mechanism. This study will be of interest to cognitive neuroscientists studying structure learning and memory.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Huang and Luo investigated whether regularities between stimulus features can be exploited to facilitate the encoding of each set of stimuli in visual working memory, improving performance. They recorded both behavioural and neural (EEG) data from human participants during a sequential delayed response task involving three items with two properties: location and colour. In the key condition ('aligned trajectory'), the distance between locations of successively presented stimuli was identical to their 'distance' in colour space, permitting a compression strategy of encoding only the location and colour of the first stimulus and the relative distance of the second and third stimulus (as opposed to remembering 3 locations and 3 colours, this would only require remembering 1 location, 1 colour, and 2 distances). Participants recalled the location and colour of each item after a delay.

      Consistent with the compression account, participants' location and colour recall errors were correlated and were overall lower compared to a non-compressible condition ('misaligned trajectory'). Multivariate analysis of the neural data permitted decoding of the locations and colours during encoding. Crucially, the relative distance could also be decoded - a necessary ingredient for the compression strategy.
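      The compression strategy described in this summary can be sketched in a few lines. This is only an illustration under stated assumptions: both features are treated as integer angles in degrees on a 360° ring, and the function names and example values are hypothetical rather than taken from the study.

```python
RING = 360  # assumed size of both the location ring and the colour wheel, in degrees


def compress(locations, colours):
    """Encode an aligned 3-item sequence as 4 values instead of 6.

    Returns (first location, first colour, 1st-2nd step, 2nd-3rd step).
    Raises ValueError when the two trajectories differ (misaligned condition).
    """
    location_steps = [(locations[i + 1] - locations[i]) % RING for i in range(2)]
    colour_steps = [(colours[i + 1] - colours[i]) % RING for i in range(2)]
    if location_steps != colour_steps:
        raise ValueError("misaligned trajectories: no shared code exists")
    return locations[0], colours[0], location_steps[0], location_steps[1]


def decompress(code):
    """Rebuild both 3-item sequences from the 4-value code."""
    location, colour, step_12, step_23 = code
    offsets = [0, step_12, step_12 + step_23]
    locations = [(location + offset) % RING for offset in offsets]
    colours = [(colour + offset) % RING for offset in offsets]
    return locations, colours


# Hypothetical aligned trial: both features move +60 degrees, then -30 degrees.
aligned_locations = [30, 90, 60]
aligned_colours = [200, 260, 230]
code = compress(aligned_locations, aligned_colours)
restored = decompress(code)
```

      In the misaligned condition the location and colour steps differ, so `compress` raises an error and all six values would have to be stored separately.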

      Strengths:

      The main strength of this study is a novel experimental design that elegantly demonstrates how we exploit stimulus structure to overcome working memory capacity limits. The behavioural results are robust and support the main hypothesis of compressed encoding across a number of analyses. The simple and well-controlled design is suited to neuroimaging studies and paves the way for investigating the neural basis of how environmental structure is detected and represented in memory. Prior studies on this topic have primarily studied behaviour only (e.g., Brady & Tenenbaum, 2013).

      Thanks for the positive comments and excellent summary.

      Weaknesses:

      The main weakness of the study is that the EEG results do not make a clear case for compression or demonstrate its neural basis. If the main aim of this strategy is to improve memory maintenance, it seems that it should be employed during the encoding phase. From then on, the neural representation in memory should be in the compressed format. The only positive evidence for this occurs in the late encoding phase (the re-activation of decoding of the distance between items 1 and 2, Fig. 5A), but the link to behaviour seems fairly weak (p=0.068).

      Thanks for raising this important concern. The reviewer is correct that in principle subjects should employ the compression strategy during the encoding phase when sequence stimuli are presented, yet our results show that the 1-2 trajectory could only be decoded during the late encoding phase.

      Meanwhile, subjects could not obtain enough information to form the compression strategy for the location and color sequences until the 3rd item appeared. Specifically, from the 1st and 2nd items alone, they can only learn whether the 1st-2nd trajectories are congruent between the location and color features; they cannot predict whether this congruence will also hold for the upcoming 2nd-3rd trajectory. This is exactly what we found in the neural decoding results. The 1st-2nd trajectory could be decoded after the 2nd item was presented, and the 2nd-3rd trajectory emerged after the 3rd item onset. Most critically, the 1st-2nd trajectory was reactivated after the 3rd item, but only in the aligned condition, implicating the formation of a full-sequence compression strategy wherein the previously formed 1st-2nd trajectory is reactivated to be connected to the 2nd-3rd trajectory.

      Regarding the difference between higher- and lower-correlation groups, we previously used a time window based on the overall 2nd-3rd neural reactivations, which might not be sensitive to reactivation strength. We have now re-selected the time window based on the higher-correlation group (bootstrap test, p = 0.037, two-sided).

      Results have been updated (Figure 5; Results, Page 16). Interpretations about the formation of compression strategy during encoding phase have been added to Results (Page 15-16) and Discussion (Page 18).

      Stronger evidence would be showing decoding of the compressed code during memory maintenance or recall, but this is not presented. On the contrary, during location recall (after the majority of memory maintenance is already over), colour decoding re-emerges, but in the un-compressed item-by-item code (Fig. 4B). The authors suggest that compression is consolidated at this point, but its utility at this late stage is not obvious.

      Thank you for raising this important question about neural evidence for the compressive account, which we apologize for omitting previously.

      The reason we did not perform neural decoding during maintenance is that previous EEG/MEG studies including our own failed to reveal robust and sustained time-resolved memory decoding during this period. This is posited to arise from “activity-silent” WM states, wherein memories are not necessarily retained in sustained firing but silently stored within connection weights of WM networks (Stokes, Trends Cogn. Sci., 2015; Rose, Curr Dir Psychol Sci, 2020). Our previous work showed that by transiently perturbing the 'activity-silent' WM using a retrocue or neutral impulse, memories could be reactivated and robustly decoded from neural activities (Huang et al., eLife, 2021). However, due to the lack of transient events during retention in the current design, we do not expect robust decoding results during maintenance. As shown below (AB), this is indeed what we have observed, i.e., no robust neural decoding of trajectories during retention.

      We further used alpha-band (8-11 Hz) neural activities, which have been shown to carry WM information (de Vries et al., Trends Cogn. Sci, 2020; Foster et al., Curr. Biol, 2016; Fukuda et al., J. Neurophysiol, 2016; Sutterer et al., PLOS Biol., 2019) to perform decoding analysis of compression trajectories during maintenance. As shown below, the alpha-band decoding results are indeed stronger than raw activities. Importantly, as shown below (CD), the aligned condition indeed showed significant and long-lasting decoding of compression trajectories (1st-2nd, 2nd-3rd) during retention, while the misaligned condition only showed decoding at the beginning (GH), which might be due to the non-specific offset response of the 3rd item. The results, although not as clear as those during encoding and recalling periods, support the reviewer’s hypothesis that the compressive strategy, if exploited, would be demonstrated during both encoding and maintenance periods. New results and related discussion have been added (Page 16, Supplementary Figure 4).

      With regards to the observed item-by-item color replay during location recall, the reviewer was concerned that this was not consistent with the compressive account, given the lack of trajectory decoding.

      First, item sequences stored in compressive formats need to be converted to sequences during serial recall. In other words, even though color and location sequences are retained in a compressive format (i.e., common 1st-2nd, 2nd-3rd trajectories) throughout the encoding and retention phases, they should be transferred to two sequences as outputs. This is exactly why we performed decoding analysis on individual color and location items rather than trajectories.

      Second and most importantly, we observed serial replay of color sequences when recalling locations. In our view, these results constitute strong evidence for common structure, since the spontaneous color replay during location recall in the aligned condition highlights the close binding between color and location sequences stored in WM. In fact, item-by-item serial replay has been well acknowledged as a critical neural index of cognitive maps, not only for spatial navigation but also for higher-order tasks (e.g., Liu et al., Cell, 2019; Liu et al., Science, 2021). Therefore, spontaneous color sequence replay during location sequence recall supports their shared underlying cognitive map.

      Finally, spontaneous serial replay is also correlated with the reactivation of compressive trajectories during encoding (Supplementary Figure 3). This further indicates that serial replay during recalling is associated with memory reorganization formed during encoding.

      Taken together, we posit that memories need to be converted to sequences as outputs, which leads to serial reactivations during recalling. Importantly, the observed spontaneous replay of color sequences for the aligned condition provides strong evidence supporting the associations between color and location sequences in WM.

      We have now added relevant interpretations and discussions (Page 11&13).

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors wanted to test if using a shared relational structure by a sequence of colors in locations can be leveraged to reorganize and compress information.

      Strength:

      They applied machine learning to EEG data to decode the neural mechanism of reinstatement of visual stimuli at recall. They were able to show that when the location of colors is congruent with the semantically expected location (for example, green is closer to blue-green than purple) the related color information is reinstated at the probed location. This reinstatement was not present when the location and color were not semantically congruent (meaning that x displacement in color ring location did not displace colors in the color space to the same extent) and semantic knowledge of color relationship could not be used for reducing the working memory load or to benefit encoding and retrieval in short term memory.

      Weakness:

      The experiment and results did not address any reorganization of information or neural mechanism of working memory (that would be during the gap between encoding and retrieval).

      We apologize for not presenting clear neural evidence for memory reorganization, particularly neural decoding during WM maintenance and retrieval, in the previous version. Below, we explain why the findings provide converging neural evidence for WM reorganization based on a shared cognitive map.

      First, during the encoding phase when location and color sequences are serially presented, our results reveal reactivation of the 1st-2nd trajectories upon the onset of the 3rd item when location and color sequences are aligned with each other. The reactivation of 1st-2nd trajectory right after the emergence of 2nd-3rd trajectory for aligned but not for misaligned sequences strongly supports WM reorganization, since only stimulus sequences that could be compressed based on shared trajectories (aligned condition) show the co-occurrence of 1st-2nd and 2nd-3rd trajectories. Moreover, the relevance of 1st-2nd reactivation to behavioral measurements of color-location reorganization (i.e., behavioral trajectory correlation, Figure 5D) further indicates its link to WM reorganization.

      Second, the reason we originally did not perform neural decoding during maintenance is that previous EEG/MEG studies including our own failed to reveal robust and sustained time-resolved memory decoding during this period. This is posited to arise from “activity-silent” WM states, wherein memories are not necessarily retained in sustained firing but silently stored within connection weights of WM networks (Stokes, Trends Cogn. Sci., 2015; Wolff et al., Nat. Neurosci, 2017; Rose et al., Curr Dir Psychol Sci, 2020). Our previous work showed that by transiently perturbing the 'activity-silent' WM using a retrocue or neutral impulse, memories could be reactivated and robustly decoded from neural activities (Huang et al., eLife, 2021). However, due to the lack of transient events during retention in the current design, we do not expect robust decoding results during maintenance. As shown in Supplementary Figure 4(AB), this is indeed what we have observed, i.e., no robust neural decoding of trajectories during retention.

      We then used alpha-band (8-11 Hz) neural activities, which have been found to carry WM information (de Vries et al., Trends Cogn. Sci, 2020; Foster et al., Curr. Biol, 2016; Fukuda et al., J. Neurophysiol, 2016; Sutterer et al., PLOS Biol., 2019) to perform decoding analysis of compression trajectories during maintenance. As shown below, the alpha-band decoding results are indeed stronger than raw activities. Importantly, as shown in Supplementary Figure 4(CD), the aligned condition indeed showed significant and long-lasting decoding of compression trajectories (1st-2nd, 2nd-3rd) during retention, while the misaligned condition only showed decoding at the beginning (GH), which might be due to the non-specific offset response of the 3rd item. The results, although not as clear as those during encoding and recalling periods, thus also support WM reorganization.

      Finally, during the recalling period, we observed automatic serial replay of color sequences when recalling locations. In our view, these results constitute strong evidence for common structure, since the spontaneous color replay during location recall for the aligned condition highlights the close binding between color and location sequences stored in WM. In fact, item-by-item serial replay has been well acknowledged as a critical neural index of cognitive maps, not only for spatial navigation but also for higher-order tasks (e.g., Liu et al., Cell, 2019; Liu et al., Science, 2021). Therefore, spontaneous replay of the color sequence during location recall supports their shared underlying cognitive map. Moreover, the spontaneous serial replay is correlated with the reactivation of compressive trajectories during encoding (Supplementary Figure 3). This further indicates that serial replay during recalling is associated with memory reorganization formed during encoding.

      Taken together, we have added updated results about the maintenance period (Page 16, Supplementary Figure 4) and included clarifications and interpretations about why the findings during the encoding and retrieval periods support the WM reorganization view (Page 15-16).

      There was also a lack of evidence to rule out that the current observation can be addressed by schematic abstraction instead of the utilization of a cognitive map.

      The likely impact of the initial submission of the study would be in the utility of the methods that would be helpful for studying a sequence of stimuli at recall. The paper was discussed in a narrow and focused context, referring to limited studies on cognitive maps and replay. The bigger picture and long history of studying encoding and retrieval of schema-congruent and schema-incongruent events is not discussed.

      We agree with the reviewer that the cognitive map referred to here could be understood as a schematic abstraction. A cognitive map refers to the internal representation of spatial relations in a specific environment (Tolman 1948). Schematic abstraction denotes a broader range of circumstances, whereby the gist or structure of multiple environments or episodes can be integrated (Bartlett, 1932; Farzanfar et al., Nat. Rev. Neurosci, 2023).

      In other words, a schema refers to a highly abstract framework of prior knowledge that captures common patterns across related experiences, which does not necessarily occur in a spatial framework as cognitive maps do. Meanwhile, in the current design, we specifically manipulate the consistency of spatial trajectory distance between color and location sequences. Therefore, we would argue that "cognitive map" is a more conservative and appropriate term to frame our findings.

      Relevant discussions have been added (Page 3&19).

      We apologize for the lack of a more general discussion and have added schema-related literature. Thanks for the suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Do time-frequency-domain data (e.g., alpha-band power) in the delay provide evidence for delay-period decoding of trajectory lengths? This might strengthen the case for compression.

      Thanks for the suggestion. We have now performed a decoding analysis of the delay period based on alpha-band power. As shown in Supplementary Figure 4, both the 1st-2nd and 2nd-3rd trajectories could be decoded for the aligned condition.

      Added in supplementary figure 4 and Page 16.  

      (2) Do participants erroneously apply the compression strategy in the misaligned condition? This would not show up in the trajectory error correlation analysis, but might be visible when examining correlations between raw trajectory lengths.

      Thanks for raising this interesting suggestion. To test the hypothesis, we chose a typical misaligned condition where the 1st-2nd trajectory distances are the same between location and color sequences, while the 2nd-3rd trajectory distances differ between the two features.

      In this case, participants might exploit the compression strategy for the first two items and erroneously apply the strategy to the 3rd item. If so, we would expect better memory performance for the first two items but worse memory for the 3rd item, compared to the rest of the misaligned trials. As shown below, the 1st-2nd aligned trials showed marginally higher performance than the other misaligned trials for the first two items (t(32) = 1.907, p = 0.066, Cohen’s d = 0.332). However, we did not find significantly worse performance for the 3rd item between the two conditions (t(32) = -0.4847, p = 0.631, Cohen’s d = -0.084). We observed a significant interaction between the last two items and the alignment effect (t(32) = 2.082, p = 0.045, Cohen’s d = 0.362), indicating a trend toward applying the wrong compression strategy to the 3rd item.

      Author response image 1.
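      For readers who want to check the effect sizes reported above, the following sketch recomputes Cohen’s d from each t value. It assumes a within-subject (paired) comparison with n = df + 1 = 33 participants and the convention d = t / √n; the response does not state which formula was used, so this is an assumption that happens to reproduce the reported values.

```python
import math


def cohens_d_from_paired_t(t_value: float, df: int) -> float:
    """Cohen's d for a paired t-test under the convention d = t / sqrt(n).

    Assumption: n = df + 1 participants, one difference score per participant.
    """
    n = df + 1
    return t_value / math.sqrt(n)


# (t value, reported d) pairs copied from the response text above.
reported = [(1.907, 0.332), (-0.4847, -0.084), (2.082, 0.362)]

for t_value, d_reported in reported:
    d = cohens_d_from_paired_t(t_value, df=32)
    print(f"t(32) = {t_value:+.4f} -> d = {d:+.3f} (reported {d_reported:+.3f})")
```

      Under this convention all three recomputed values agree with the reported ones to three decimal places.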

      (3a) Some more detail on some of the methods might help readers. For instance, did trajectories always move in a clockwise direction? Could the direction reverse on the third item? If not, did this induce a response bias? Could such a bias possibly account for the trajectory error correlations?

      Sorry for the unclear statement. For each individual trial, both the color and location features of the three items are randomly selected from nine possible values without any constraint on direction. That is to say, the trajectories can move in a clockwise or anticlockwise direction, and the direction can also reverse on the third item in some trials. Thus, we think the current design actually helps to reduce the influence of response bias. Taking a step back, if the trajectory error correlations were due to response bias, we would expect consistently significant correlations in all conditions, rather than significant correlations only for the 1st-2nd and 2nd-3rd trajectories but not the 1st-3rd trajectory, and only in the aligned but not the misaligned condition. Therefore, we think the trajectory error correlations cannot simply be explained by response bias.

      Details have been added (Page 23).
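      The point about direction can be made concrete with a small sketch of signed steps on a nine-position ring. The function names, the 0-8 indexing, and the convention that positive means clockwise are all illustrative assumptions, not details taken from the manuscript.

```python
N_POSITIONS = 9  # nine possible values per feature, as stated in the response


def signed_step(a: int, b: int, n: int = N_POSITIONS) -> int:
    """Shortest signed step from position a to b on a ring of n positions.

    Assumed convention: positive = clockwise, negative = anticlockwise.
    With n = 9 (odd) the result lies in -4..4 and is never ambiguous.
    """
    half = n // 2
    return (b - a + half) % n - half


def trajectory(sequence):
    """Signed steps between consecutive items (1st-2nd, 2nd-3rd)."""
    return [signed_step(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]


def reverses_direction(sequence) -> bool:
    """True if the 2nd-3rd step goes the opposite way to the 1st-2nd step."""
    first, second = trajectory(sequence)
    return first * second < 0


print(trajectory([0, 2, 4]), reverses_direction([0, 2, 4]))
print(trajectory([0, 2, 1]), reverses_direction([0, 2, 1]))
print(trajectory([8, 0, 7]), reverses_direction([8, 0, 7]))
```

      Because items are drawn without a direction constraint, both reversing and non-reversing sequences occur, which is why a fixed directional response bias would not be expected to produce condition-specific correlations.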

      (3b) Is the colour wheel always oriented the same way for a participant? If so, given there are only nine colors, it seems possible that colors are mapped to locations and remembered in a location code instead. This does not seem to be a problem in principle for the behavioural findings, but might change the interpretation of what is being decoded from the EEG. If this is a possibility then this might be acknowledged.

      The color wheel is always oriented the same way for each participant. We agree with the reviewer that participants may map colors to locations and remember them in a location code. We don’t have sufficient evidence to rule out this possibility. One possible test would be to run another experiment in which the orientation of the color wheel varies during the response period. Meanwhile, we would like to point out that the underlying logic of the current design rests on the fact that thinking spatially is intuitive and that spatial metaphors like “location” and “distance” are commonly used to describe the world, e.g., the well-known mental number line (Dehaene et al., JEP: General, 1993). Therefore, we expected participants to associate or integrate location and color maps based on trajectory distance.

      The reviewer is correct that the color decoding would reflect spatial location rather than the genuine color feature. This is actually the point of the experimental design, whereby two irrelevant features could be combined within a common cognitive map. Without the realignment of the two feature maps defined in space, subjects could not form the strategy to compress the two sequences at all. In other words, decoding of color sequences could be understood as the neural representation of a series of corresponding locations along the ring that are independent of the physical locations of the items.

      Interpretations and clarifications have been added (Page 23&26).

      (4) Does the discretisation of the stimulus distribution (to only 9 possible locations) make the compression strategy easier to use? If the features had been continuously distributed across the location/colour circle, would participants still pick up on and use the shared trajectory structure?

      Thanks for the question. Without further data, it’s hard to say whether the discretization of the stimulus distribution would make the compression strategy easier to use or not, compared to continuous distribution. Both outcomes seem possible. On the one hand, discrete stimulus distribution would result in discrete trajectory distribution, which helps participants to realize the common trajectory strategy. On the other hand, discrete stimulus distribution would result in category or label representation, which may weaken the effectiveness of structure compression strategy. We postulate that our findings could be generalized to continuous trajectories in a cognitive map within certain resolution.

      (5a) Minor point: I disagree that avoiding the same points for location and colour for a given item allows them to be independently decoded. I would argue the contrary - this kind of constraint should create a small anti-correlation that in principle could lead to spurious decoding of one variable (although this seems unlikely here).

      We appreciate the concern. As mentioned above, with a discrete stimulus distribution (9 possible values for both the color and location domains), it is quite possible that a fraction of trials would share the same values in location and color. Therefore, the neural decoding for one domain might be confounded by the other domain. To dissociate their neural representations, we imposed the constraint that color and location could not occupy the same value for a given item.

      We agree that this kind of constraint might create a small anti-correlation, even though it is not observed here. Future studies using continuous stimulus distribution would reduce the correlation or anti-correlation between stimuli.

      (5b) Very minor point: 1,000 permutations for significance testing seems on the low side. Since some of the p-values are close to 0.05 it may be worth running more permutations.

      Thanks for this suggestion. We got similar results using 1000 or 10000 permutations.
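      The reviewer’s concern about permutation count can be illustrated with a minimal sketch. The sign-flip scheme below is a generic one-sample permutation test chosen for illustration; the manuscript’s actual permutation procedure is not described in this response, so the function and its parameters are assumptions. The second function gives the standard binomial Monte Carlo error of an estimated p-value, which shows why p-values near 0.05 are less stable with 1,000 than with 10,000 permutations.

```python
import math
import random
from statistics import mean


def sign_flip_p_value(values, n_permutations=1000, seed=0):
    """Two-sided one-sample permutation p-value via random sign flips.

    Illustrative scheme only: under the null, each participant's value is
    equally likely to be positive or negative.
    """
    rng = random.Random(seed)
    observed = abs(mean(values))
    exceed = 0
    for _ in range(n_permutations):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if abs(mean(flipped)) >= observed:
            exceed += 1
    # Add-one correction: the smallest attainable p is 1 / (n_permutations + 1).
    return (exceed + 1) / (n_permutations + 1)


def monte_carlo_se(p, n_permutations):
    """Binomial standard error of a permutation p-value estimate."""
    return math.sqrt(p * (1 - p) / n_permutations)


for n in (1000, 10000):
    print(f"{n} permutations: SE of p near 0.05 is about {monte_carlo_se(0.05, n):.4f}")
```

      With 1,000 permutations the standard error around p = 0.05 is roughly 0.007, so a borderline result can fall on either side of the threshold by chance; with 10,000 it shrinks to roughly 0.002.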

      (6) Missing reference: H. H. Li et al., 2021 (line 213) seems not to be on the list of references.

      Sorry for the mistake. Added.

      Reviewer #2 (Recommendations For The Authors):

      The study aimed to discuss the working memory mechanism, instead, it seems to be focused on the encoding and recall strategies after a short while, I recommend updating the manuscript to refer to the relevant cognitive mechanism.

      There was a strong voice on the effect of using the cognitive map in working memory, without any tests on if indeed a cognitive map was used (for example the novel link between stimuli and how a cognitive map can be used to infer shortcuts). Was the participant required to have any mental map beyond the schema of the shown color ring?

      In the current experiment, to discuss if the effect is driven by utilizing a cognitive map or schematic abstraction of color-relatedness, further analysis is required to possibly assess the effects of schema on neural activity and behavior. Namely,

      (1) Was there any reinstatement of schematically congruent (expected) colors that were probed by location 1, at locations 2 and 3 in the MAT condition?

      Thanks for pointing out this possibility. However, we don’t think there will be stable color expectations given location information under the MAT condition. First, as the trajectory distance varied on a trial-by-trial basis, no prior common trajectory knowledge could be used to make inferences about the current stimuli in an individual trial. Second, the starting points for color and location (1st item) were randomly and independently selected, such that the color sequence could not be predicted from the location sequence in either the aligned or misaligned condition.

      (2) Given that response time can be a behavioral marker of schematic conflict, was the response time faster for congruent than incongruent conditions?

      Thanks for this question. Unfortunately, due to the experimental design, response time could not be used as a behavioral marker to infer mental conflicts, since participants were not required to respond as fast as possible. Instead, they reproduced the sequences at their own pace without a time limit. They could even take a short break before submitting their response to initiate the next trial.

      (3) In case you cannot rule out that utilizing schema is the cognitive mechanism that supports working memory performance (the behavior), please add the classical literature (on the memory of schematically congruent and incongruent events) to the discussion.

      Thanks for this suggestion. We have now added the relevant literature (Page 3&19).

      (4) On page 6, 'common structure in the cognitive map' is the schema, isn't it?

      Correct. Based on our understanding, ‘common structure in the cognitive map’ is a spatial schema.

      (5) In Figure 2 EFG, would you please use a mixed effect model or show evidence that all participants demonstrated a correlation between the location trajectory error and color trajectory error?

      Thanks for the suggestion. We have added the mixed effect model results, which are consistent with Figure 2EFG (AT: 1st-2nd trajectory, β = 0.071, t = 4.215, p < 0.001; 2nd-3rd trajectory, β = 0.077, t = 3.570, p < 0.001; 1st-3rd trajectory, β = 0.019, t = 1.118, p = 0.264; MAT: 1st-2nd trajectory, β = 0.031, t = 1.572, p = 0.116; 2nd-3rd trajectory, β = 0.002, t = 0.128, p = 0.898; 1st-3rd trajectory, β = -0.017, t = -1.024, p = 0.306).

      In general, doesn't such correlation just show that good participants/trials were good (some did well in the study and some did poorly throughout?)

      We don’t think the trajectory error correlation results merely reveal that some participants did well and some did poorly. If that were the case, we would not observe a significant correlation in Figure 2D, where we first ran the correlation for each participant and then tested its significance at the group level. Indeed, the trajectory error correlation between the color and location domains characterizes consistent changes between the two domains.

      It is worth noting that the correlation was estimated with signed trajectory errors in the color and location domains, which means that we tested whether the errors in the two domains varied consistently in the same direction, i.e., whether remembering a trajectory as longer than the actual trajectory in the location domain would predict remembering a longer trajectory in the color domain.
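      The two-level logic described here (a within-participant correlation on signed errors, then a group-level test) can be sketched as follows. The Fisher z-transform and the one-sample t-statistic are assumptions chosen for illustration; the response says only that correlations were computed per participant and then tested at the group level. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def signed_error(reported_length: float, true_length: float) -> float:
    """Signed trajectory error: positive if the remembered trajectory is too long."""
    return reported_length - true_length


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_level_t(per_participant_r):
    """One-sample t-statistic on Fisher z-transformed correlations (assumed test)."""
    z = [math.atanh(r) for r in per_participant_r]
    n = len(z)
    return mean(z) / (stdev(z) / math.sqrt(n)), n - 1


# Toy per-participant correlations, for illustration only (not real data).
toy_r = [0.12, 0.05, 0.20, -0.03, 0.15]
t_stat, df = group_level_t(toy_r)
print(f"group-level t({df}) = {t_stat:.2f}")
```

      Because each correlation is computed within one participant, a participant who is uniformly accurate or uniformly inaccurate does not by itself produce a positive r; only trial-by-trial covariation of the signed errors does.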

      Moreover, as shown in Figure 2EFG, by dividing trials into 4 bins according to the location trajectory error for each participant and pooling the data across participants, we observed 4 clusters along x-axis (location trajectory error). This suggests that participants’ memory performance is rather consistent instead of being extremely good or bad. Besides, if trajectory error correlation is due to different overall memory performance between participants, we should observe significant trajectory error correlations both in AT and MAT conditions, instead of only under AT condition and for 1st-2nd and 2nd-3rd trajectories but not for 1st-3rd trajectory.

      In Figure 2 G, is the marginal error just too big to be sensitive? I am not sure what we are learning here, please clarify.

      Sorry for the confusion. To examine this possibility, we excluded errors which are beyond 2.5 * σ, and still observed non-significant 1st-3rd trajectory error correlation between color and location domains (r = 0.119, p = 0.167).
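      A minimal sketch of the exclusion step described above is given below. Whether the 2.5 σ criterion was applied around the mean or around zero, and per participant or on pooled data, is not stated in the response; the version here centers on the sample mean of whatever list it is given, which is an assumption.

```python
from statistics import mean, stdev


def exclude_outliers(errors, n_sigma=2.5):
    """Keep only errors within n_sigma sample standard deviations of the mean."""
    mu = mean(errors)
    sigma = stdev(errors)
    return [e for e in errors if abs(e - mu) <= n_sigma * sigma]


# Toy signed errors with one extreme value, for illustration only.
toy_errors = [1.0, -1.0] * 10 + [50.0]
kept = exclude_outliers(toy_errors)
print(len(toy_errors), "->", len(kept))
```

      The correlation would then be recomputed on the retained trials only.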

      The 1st-3rd trajectory showed nonsignificant behavioral correlation and neural representation, which suggests that the current sequential memory task encourages participants to organize information by relying more on adjacent items and their distances. Thus, we think the 1st-3rd trajectory serves as a control trajectory, which helps us not only to exclude other possible explanations (e.g., systematic response bias), but also to validate the current findings at both the behavioral and neural levels.

      Results and statements (Page 10-11) added now.

      Author response image 2.

      (6) Regarding the first lines on page 11, did you do qualitative research to know if less information was encoded in congruent conditions?

      The current experimental design is inspired by the studies of mental compression of spatial sequences from Dehaene’s lab (Amalric et al., 2017; Roumi et al., 2021), which propose that the human brain compresses spatial sequences using an abstract language and formalize the minimal description length of a sequence as the “language-of-thought complexity.” Based on this evidence, we think less information is required to describe the congruent condition than the incongruent condition. This idea is supported by better memory performance for the congruent condition. Unfortunately, we could not quantify how much less information was encoded in the congruent condition.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors examine the mechanism of action of MOTS-c and its impact on monocyte-derived macrophages. In the first part of the study, they show that MOTS-c acts as a host defense peptide with direct antibacterial activity. In the second part of the study, the authors aim to demonstrate that MOTS-c influences monocyte differentiation into macrophages via transcriptional regulation.

      Major strengths.

      Methods used to study the bactericidal activity of MOTS-c are appropriate and the results are convincing.

      Major weaknesses.

      Methods used to study the impact on monocyte differentiation are inappropriate and the conclusions are not supported by the data shown. A major issue is the use of the THP-1 cell line, a transformed monocytic line which does not mimic physiological monocyte biology. In particular, THP-1 differentiation is induced by PMA, which is a completely artificial system and conclusions from this approach cannot be generalized to monocyte differentiation. The authors would need to perform this series of experiments using freshly isolated monocytes, either from mouse or human. The read-out used for macrophage differentiation (adherence to plastic) is also not very robust, and the authors would need to analyze other parameters such as cell surface markers. It is also not clear whether MOTS-c could act in a cell-intrinsic fashion, as the authors have exposed cells to exogenous MOTS-c in all their experiments. The authors did not perform complementary experiments using MOTS-c deficient monocytes. The authors have also analyzed the transcriptomic changes induced by MOTS-c exposure in macrophages derived from young or old mice. While the results are potentially interesting, the differences observed seem independent from MOTS-c and mainly related to age, therefore the conclusions from this figure are not clear. Another concern is the reproducibility of the experiments, as the authors do not indicate the number of biological replicates analyzed nor the number of independent experiments performed.

      In this study, we employed the THP-1 cell line as a proof-of-principle to elucidate the existence of a first-in-class mitochondrial-encoded host defense peptide. This peptide is expressed in monocytes and serves dual functions: i) direct targeting of bacteria, and ii) regulation of monocyte differentiation. It is noteworthy that THP-1 cells differentiated by PMA have been widely utilized as a model for monocyte differentiation by numerous research groups. While we acknowledge the significance of utilizing primary monocytes to fully comprehend the translational implications of our findings, conducting a complete replication of our experiments in primary monocytes falls beyond the scope of this study. However, we have conducted several pivotal experiments in primary monocytes, including:

      i) Demonstration of the induction of endogenous MOTS-c in primary human monocytes during differentiation by M-CSF (Fig 3A).

      ii) Observation of an increased number of adhered monocytes during monocyte differentiation following MOTS-c treatment (Fig 5A).

      iii) Examination of the transcriptional regulation in mouse primary bone marrow-derived macrophages (BMDMs) by MOTS-c, seven days after a single treatment at the onset of differentiation (Fig 6).

      In addition to assessing adherence to plastic, we performed RNA-seq of THP-1 cells during early differentiation with MOTS-c as a measure of accelerated differentiation (Fig 4). The positive correlation between the effects of PMA and PMA+MOTS-c suggests that MOTS-c accelerates the transcriptional changes that occur during differentiation (Fig 4G). We consider this method a more comprehensive evaluation of differentiation as it encompasses the expression of thousands of genes rather than relying on a limited selection of cell surface markers. Future investigations should explore additional indicators of differentiation, including potential epigenetic effects of MOTS-c.

      Our findings indicate that endogenous MOTS-c is induced during monocyte stimulation and translocates into the nucleus (Figs 3-4), implying a cell-intrinsic role for MOTS-c during monocyte differentiation. Although examining MOTS-c deficient monocytes would offer valuable insights, technical limitations currently hinder the production of such monocytes due to the mitochondrial genomic encoding of MOTS-c within the 12S rRNA.

      Furthermore, our study reveals that MOTS-c alters gene expression in macrophages similarly across age and sex groups. This observation, illustrated in Fig 6E where the fold changes in clusters 5 and 6 in response to MOTS-c were consistent across all groups, suggests that MOTS-c modulates macrophage gene expression in an age-related manner. We postulate this to be an adaptive response to age-related alterations in the monocyte and macrophage microenvironment.

      The number of biological replicates performed for each experiment is indicated.

      The different parts of the manuscript do not appear well connected and it is not clear what the main message from the manuscript would be. The physiological relevance of this study is also unclear.

      The main message of our manuscript is that the mitochondrial genome encodes for a previously unknown host defense peptide that has physiological roles in modulating immune responses during infection and during aging. We have edited the ‘introduction’ to clarify this.

      Reviewer #2 (Public Review):

      The research study presented by Rice et al. set out to further profile the host defense properties of the mitochondrial protein MOTS-c. To do this they studied i. the potential antimicrobial effects of MOTS-c on common bacterial pathogens E.coli and MRSA, ii. the effects of MOTS-c on the stimulation and differentiation of monocytes into macrophages. This is a well performed study that utilizes relevant methods and cell types to base their conclusions on. However, there appear to be a few weaknesses to the current study that hold it back from more broad application.

      Comment 1: From reading the manuscript methods and results, it is unclear exactly what the synthetic MOTS-c source is. Therefore it is hard to determine whether there may be any impurities in the production of this synthetic protein that may interfere with the results presented throughout the manuscript. Though, the data presented in Supplemental Figure 4F, where E.coli expressing intracellular MOTS-c inhibited bacterial growth certainly support MOTS-c specific effects. Similarly with the experiments showing endogenous MOTS-c levels rising in stimulation and differentiated macrophages (Figure 3).

      We have edited our manuscript to include the source and purity of our synthetic MOTS-c peptide. The MOTS-c peptide used was synthesized by New England Peptides (now Biosynth) with a purity >95% by mass spectrometry.

      Comment 2: It is interesting that the mice receiving bacteria coupled with MOTS-c lost about 10% of their body weight. It would have been interesting to demonstrate the cause of this weight loss since the effect appears to be separate from mere PAMPs as shown by using heat-killed MRSA in Supplemental Figure 5. Was inflammation changed? Is this due to changes in systemic metabolism? Would have been interesting to have seen CRP levels or circulating liver enzymes.

      As suggested, we repeated this experiment to include both the heat-killed and MOTS-c-MRSA groups in the same controlled experiment for comparison (Fig 2; see below). Blood was collected from these mice for evaluation of cytokine levels and markers of organ damage. While only 1/6 controls survived, all MOTS-c- and heat-killed MRSA-treated mice survived. However, compared to the heat-killed group, the MOTS-c-MRSA group lost more weight and had a higher inflammatory profile, but still significantly less than in the control group. We hypothesize that this is due to only partial killing of MRSA by MOTS-c, as suggested by the CFU plated after overnight incubation, leading to a non-lethal infection in these mice. Others have shown that in this peritonitis model, α-hemolysin production by live MRSA is a key factor in toxicity, rather than PAMP-induced shock (PMID: 8975909; 22802349), which is consistent with the absence of death following heat-killed MRSA inoculation.

      Despite these concerns, the data are well suited to answering their research question, and they open up the door to studying how mitochondrial peptides like MOTS-c could have roles outside of the mitochondria.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improvement

      (1) The authors need to indicate in each legend the number of biological replicates analyzed and the number of independent experiments performed. This is essential.

      We have included the number of biological replicates analyzed.

      (2) The authors need to repeat the key experiments using freshly isolated monocytes, either human or mouse. THP-1 cells are abnormal cells and findings from these cells cannot be generalized to monocytes. For instance, in Figures 3A and B, it is clear that the kinetics of MOTS-c expression are different between THP-1 cells and human blood monocytes.

      The kinetics of THP-1 cells compared to human monocytes are slightly different, as expected by using different cells and different differentiation cues (M-CSF vs PMA). However, our findings collectively demonstrate the same effect, that each stimulus transiently induces the expression of MOTS-c within 24 hours in monocytes.

      In Figure 3A, the authors should show what happens in the absence of MCSF. Is MOTS-c expression upregulated by culture alone?

      There is some degree of baseline expression of MOTS-c in a resting state, and MOTS-c expression is significantly increased upon stimulation. This expression may be higher in primary monocytes than THP-1 cells, given that these monocytes are inevitably stressed by being removed from the native environment and put through the purification process.

      (3) In Figure 4A, a control for cytoplasmic contamination in the nuclear fraction is missing.

      We now include GAPDH detection in the nuclear fraction.  

      Author response image 1.

      (4) The RNA-seq analysis shown in Figure 4 is not very informative. What genes are differentially expressed? The authors should provide a list of these genes as supplementary information and highlight some key genes in the figure and text.

      The complete list of these genes is provided in Tables S1 and S2. We chose not to highlight specific genes in this paper due to the lack of sufficient evidence identifying any particular genes as key factors at this time.

      (5) In Figure 5A, a control is missing: the authors should treat the monocytes with the same volume of 'vehicle' (presumably it is water).

      In all experiments with MOTS-c treatment, the controls were treated with the same volume of vehicle (water). We have edited legends to state this.

      (6) In Figure 6, the differences observed seem independent on MOTS-c. The conclusions from this figure are overstated and need to be rephrased and clarified.

      MOTS-c shifted gene expression in macrophages in a similar manner regardless of age and sex, as shown in Fig 6E where the fold changes in clusters 5 and 6 in response to MOTS-c were similar in all groups. Independently, aging alone increases the expression of these same genes related to antigen presentation and interferon signaling, suggesting that MOTS-c shifts macrophage gene expression in an age-related manner – the expression of antigen presentation and interferon-related genes have been shown to be highly age-related (PMID: 36040389, 32669714, 36622281, 31754020). We hypothesize this to be an adaptive response to age-related changes in the monocyte and macrophage microenvironment.

      (7) Adherence to plastic is not a robust read-out for monocyte differentiation into macrophages. The authors need to examine other parameters, for instance characteristic cell surface markers for macrophages.

      As a read-out of accelerated differentiation, in addition to adherence to plastic we performed RNA-seq of THP-1 cells during early differentiation with MOTS-c (Fig 4). The positive correlation between the effects of PMA and the effects of PMA+MOTS-c suggests that MOTS-c accelerates the transcriptional changes that occur during differentiation (Fig 4G). We believe this to be a more robust assessment of differentiation as it relies on the expression of thousands of genes rather than a limited selection of cell surface markers. Further studies are needed to assess other read-outs of differentiation, including possible epigenetic effects of MOTS-c.

      (8) It is not clear whether MOTS-c could have a cell-intrinsic effect in monocytes. The results should be strengthened by examining the differentiation of monocytes deficient for MOTS-c (without addition of exogenous MOTS-c).

      We have shown that endogenous MOTS-c is induced during monocyte stimulation and translocates into the nucleus (Figs 3-4), suggesting that MOTS-c does have a cell-intrinsic role during monocyte differentiation.

      While having MOTS-c deficient monocytes would certainly be insightful, because MOTS-c is encoded within the mitochondrial genome in the 12S rRNA there are currently technical limitations in producing these monocytes.

      Other points

      (1) The paper would benefit from a more extended discussion to understand the physiological relevance of these findings. What cells would release MOTS-c in vivo, and how would that affect monocytes? Is there a cell-intrinsic role of MOTS-c in monocytes, and if so what would be the signals inducing its expression during differentiation? These aspects should be discussed by the authors so that the readers can understand their views.

      We thank the reviewer for their suggestion and have edited the discussion in our revised manuscript.  

      MOTS-c has been detected in various tissues and cell types, including the liver, muscle, T cells, monocytes/macrophages, and epithelial cells. This aligns with MOTS-c being referred to in the literature as a cytokine; cytokines are typically expressed by a broad range of cell types. Consistent with this, we also propose that MOTS-c would be expressed in cells known to express HDPs.

      We hypothesize that MOTS-c acts in both a cell-intrinsic and extrinsic manner in vivo, consistent with known HDPs, to both target bacteria directly and modulate immune cell responses. In vitro, M-CSF, PMA, LPS, and IFNγ each induced MOTS-c expression. In vivo, monocytes respond to a range of stimuli that influence their differentiation, and these stimuli may induce MOTS-c as well. We have previously published that MOTS-c acts primarily under conditions of cell stress, such as nutrient deprivation and oxidative stress, to help restore homeostasis. While MOTS-c did regulate macrophage gene expression in resting “M0-like” macrophages, we hypothesize that the physiological role of MOTS-c is to regulate cell adaptation to stress, therefore the context under which monocytes differentiate will be an important factor determining the functional effects of MOTS-c. In future studies, we plan to test whether the immuno-modulatory effects of MOTS-c are dependent on the environment during differentiation.

      (2) Scale bar appears to be missing from Figure 1G.

      We apologize for the poor resolution of the scale bar. We have made it easily recognizable in the revised figure.  

      (3) It is not very clear what is shown in Figure S2. The authors should better explain what the images represent.

      Figure S2 is related to Figure 1D and Figure S1. In this experiment, E. coli, S. typhimurium, and P. aeruginosa cultures were treated with MOTS-c (100 μM). We observed that only E. coli aggregated immediately, while S. typhimurium and P. aeruginosa did not show aggregation. This suggests that MOTS-c exhibits specificity in targeting certain types of bacteria, although the underlying basis of this specificity is currently unknown.

      We have revised the legend as follows: 'MOTS-c exhibits specificity in bacterial targeting. MOTS-c (100 μM) treatment causes immediate aggregation of E. coli but not S. typhimurium or P. aeruginosa (n=6). Representative image shown. See Figure 1D'.

      Reviewer #2 (Recommendations For The Authors):

      This is a beautifully executed study and a well written manuscript. I generally don't have much critical feedback to give based on my reading. The only recommendation I have to improve the completeness of the data would be in relation to Figure 5E and F. The metabolic phenotype of LPS stimulated monocytes/macrophages is more typically the Warburg effect where oxidative phosphorylation is reduced (as you show with a lowered OCR), but with a concomitant elevation in lactate production. It would have been nice to see either i. the ECAR levels from your seahorse data, or ii. separate lactate measurements on your supernatants. This would go a long way to further explaining the phenotype described in the figure.

      We greatly appreciate the reviewer's positive feedback. The data provided below are ECAR measurements obtained from the Seahorse assay. However, it's important to note that the assays were originally designed for OCR measurement (e.g. buffered media unsuitable for ECAR measurements, use of mitochondrial complex inhibitors, etc.), thus rendering the ECAR data unreliable for accurately assessing glycolysis. Consequently, while we share this data with the reviewer, we believe it is inappropriate to include it in the manuscript (hence omitted in the original submission).

      Author response image 2.

      Furthermore, we are currently engaged in a separate manuscript focusing on elucidating the immunometabolic mechanisms of MOTS-c in macrophages. We intend for this manuscript to stand alone, providing a comprehensive exploration of metabolic pathways, including a detailed untargeted metabolomics map spanning multiple time-points.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors sought to test whether anterior insular cortex neurons that increase or decrease firing during fear behavior and freezing bi-directionally control fear via separate, anatomically defined outputs. Using a fairly simple behavior where mice were exposed to tone-shock pairings, they found roughly equal populations that do indeed either increase or decrease firing during freezing. Next, they sought to test whether these distinct populations may also have distinct outputs. Using retrograde tracers they found that the anterior insular cortex contains non-overlapping neurons which project to the mediodorsal thalamus or amygdala. Mediodorsal thalamus-projecting neurons tended to cluster in deep cortical layers while amygdala-projecting neurons were primarily in more superficial layers. Stimulation of insula-thalamus projection decreased freezing behavior, and stimulation of insula-amygdala projections increased fear behavior. Given that the neurons that increased firing were located in deep layers, that thalamus projections occurred in deep layers, and that stimulation of insula-thalamus neurons decreased freezing, the authors concluded that the increased firing neurons may be thalamus projections. Similarly, given that decreased-firing neurons tended to occur in more superficial layers, that insula-amygdala projections were primarily superficial, and that insula-amygdala stimulation increased freezing behavior, authors concluded that the decreased firing cells may be amygdala projections. The study has several strengths though also some caveats.

      Strengths:

      The potential link between physiological activity, anatomy, and behavior is well laid out and is an interesting question. The activity contrast between the units that increase/decrease firing during freezing is clear.

      It is nice to see the recording of extracellular spiking activity, which provides a clear measure of neural output, whereas similar studies often use bulk calcium imaging, a signal that rarely matches real neural activity even when anatomy suggests it might (see London et al 2018 J Neuro - there are increased/decreased spiking striatal populations, but both D1 and D2 striatal neurons increase bulk calcium).

      Weaknesses:

      The link between spiking, anatomy, and behavior requires assumptions/inferences: the anatomically/genetically defined neurons which had distinct outputs and opposite behavioral effects can only be assumed to be the increased/decreased spiking neurons, based on the rough area of the cortical layer where they were recorded.

      Yes, we are aware that we could not provide a direct link between spiking, anatomy and behavior. We have specifically noted this in the discussion section and added a possible experiment that could be carried out to provide a more direct link in a future study.

      [Lines 371-375] We would like to provide more direct evidence linking the neuronal response types and projection patterns in future studies by electrophysiologically identifying freezing-excited and freezing-inhibited aIC neurons and testing whether those neurons respond to optogenetic activation of amygdala- or medial thalamus-projecting aIC neurons.

      The behavior would require more control to fully support claims about the associative nature of the fear response (see Trott et al 2022 eLife) - freezing, in this case, could just as well be nonassociative. In a similar vein, fixed intertrial intervals, though common practice in the fear literature, pose a problem for neurophysiological studies. The first is that animals learn the timing of events, and the second is that neural activity is dynamic and changes over time. Thus it is very difficult to determine whether changes in neural activity are due to learning about the tone-shock contingency, timing of the task, simply occur because of time and independently of external events, or some combination of the above.

      Trott et al. (2022) stated that "...freezing was the purest reflection of associative learning." The nonassociative processes mentioned in the study were related to running and darting behaviors, which the authors argue are suppressed by associative learning. Moreover, considerable evidence from immediate postshock freezing and immediate postshock context shift studies all indicate that the freezing response is an associative (and not nonassociative) response (Fanselow, 1980 and 1986; and Landeira-Fernandez et al., 2006). Thus, our animals' freezing response to the tone CS presentation in a novel context, following three tone CS-footshock US pairings, most likely reflects associative learning. 

      Concerning the issue of fixed inter-trial intervals (ITIs), which are standard in fear conditioning studies, particularly those with few CS-US paired trials, we acknowledge the challenge in interpreting the neural correlates of behavior. However, the ITIs in our extinction study were variable and we still found neural activity that correlated significantly with freezing. The results of our extinction study, carried out with variable ITIs, suggest that the aIC neural activity changes measured in this study are likely due to freezing behavior associated with fear learning, not to learning the contingencies of fixed ITIs.

      Reviewer #2 (Public Review):

      In this study, the authors aim to understand how neurons in the anterior insular cortex (insula) modulate fear behaviors. They report that the activity of a subpopulation of insula neurons is positively correlated with freezing behaviors, while the activity of another subpopulation of neurons is negatively correlated to the same freezing episodes. They then used optogenetics and showed that activation of anterior insula excitatory neurons during tones predicting a footshock increases the amount of freezing outside the tone presentation, while optogenetic inhibition had no effect. Finally, they found that two neuronal projections of the anterior insula, one to the amygdala and another to the medial thalamus, are increasing and decreasing freezing behaviors respectively. While the study contains interesting and timely findings for our understanding of the mechanisms underlying fear, some points remain to be addressed.

      We are thankful for the detailed and constructive comments by the reviewer and have addressed the points raised. Specifically, we included possible limitations of using only male mice in the study, included two more studies about the insula as references, specified the L-ratio and isolation distance used in our study, added the ratio of putative-excitatory and putative-inhibitory neurons obtained from our study, changed the terms used to describe neuronal activity changes (freezing-excited and freezing-inhibited cells), added new analysis (Figure 2H), rearranged Figure 2 for clarity, added new histology images, and added atlas maps with viral expressions (three figure supplements).

      Reviewer #1 (Recommendations For The Authors):

      - I would suggest keeping the same y-axis for all figures that display the same data type - Figure 5D, for example.

      Thank you for the detailed suggestion. We now use the same y-axis for all figures that display the same data type.

      - In the methods, it says 30s bins were used for neural analysis (line 435). I cannot imagine doing this, and looking at the other figures, it does not look like this is the case so could you please clarify what bins, averages, etc were used for neural and behavioral analysis?

      Bin size for neural analysis varied; 30s, 5s, 1s bins were used depending on the analysis. We corrected this and specified what time bin was used for which figure in the methods.

      Bin size for neural and freezing behavior was 30s and we also added this to the methods.

      - I would not make any claims about the fear response here being associative/conditional. This would require a control group that received an equal number of tone and shock exposures, whether explicitly unpaired or random.

      The unpaired fear conditioning paradigm (unpaired tone and shock) suggested by the reviewer is well established not to induce CS-evoked fear behavior (Moita et al., 2003 and Kochli et al., 2015). In addition, considerable evidence from immediate post-shock freezing and immediate post-shock context shift studies all indicate that the freezing response is an associative (and not nonassociative) response (Fanselow, 1980 and 1986; and Landeira-Fernandez et al., 2006). Thus, our animals' freezing response to the tone CS presentation in a novel context, following three tone CS-footshock US pairings, most likely reflects associative learning.

      - I appreciate the discussion about requiring some inference to conclude that anatomically defined neurons are the physiologically defined ones. This is a caveat that is fully disclosed, however, I might suggest adding to the discussion that future experiments could address this by tagging insula-thalamus or insula-amygdala neurons with antidromic (opto or even plain old electric!) stimulation. These experiments are tricky to perform, of course, but this would be required to fully close all the links between behavior, physiology, and anatomy.

      As suggested, we have included that, in a future study, we would like to elucidate a more direct link between physiology, anatomy and behaviors by optogenetically tagging the insula-thalamus/insula-amygdala neurons and identifying whether it may be a positive or a negative cell (now named the freezing-excited and freezing-inhibited cells, respectively) in the discussion.

      [Lines 371-375] We would like to provide more direct evidence linking the neuronal response types and projection patterns in future studies by electrophysiologically identifying freezing-excited and freezing-inhibited aIC neurons and testing whether those neurons respond to optogenetic activation of amygdala- or medial thalamus-projecting aIC neurons.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) As all experiments have been performed only in male mice, the authors need to clearly state this limit in the introduction, abstract, and title of the manuscript.

      With an increasing number of readers becoming interested in the biological sex used in preclinical studies, we also feel that it should be mentioned at the beginning of the manuscript. As suggested, we explicitly stated in the title, abstract, and introduction that we used only male mice. In addition, we discussed possible limitations of using only male mice in the discussion section as follows:

      [Lines 381-386] Another factor to consider is that we have only used male mice in this study. Although many studies report that there is no biological sex difference in cued fear conditioning (42), the main experimental paradigm used in this study, it does not mean that the underlying brain circuit mechanism would also be similar. The bidirectional fear modulation by aIC→medial thalamus or the aIC→amygdala projections may be different in female mice, as some studies report reduced cued fear extinction in females (42).

      (2) The authors are missing important publications reporting findings on the insular cortex in fear and anxiety. For example, the authors should cite studies showing that anterior insula VIP+ interneuron inhibition reduces fear memory retrieval (Ramos-Prats et al., 2022) and that posterior insula neurons are a state-dependent regulator of fear (Klein et al., 2021). Also, regarding the anterior insula to basolateral amygdala projection (aIC-BLA), the authors should include recent work showing that this population encodes both negative valence and anxiogenic spaces (Nicolas et al., 2023).

      We appreciate the detailed suggestions and have added the appropriate publications to the discussion section. The anterior insula VIP+ interneuron study (Ramos-Prats et al., 2022) is interesting, but based on the evidence provided in the paper, we felt that the role of aIC VIP+ interneurons in fear conditioning is limited. VIP+ interneurons in the aIC seem to be important in coding sensory stimuli; however, their relevance to conditioned stimuli seems to be low: overall, VIP intracellular calcium activity in response to the CS was low and did not differ between acquisition and retrieval. Also, inhibition of VIP+ interneurons did not influence fear acquisition. VIP inhibition during fear acquisition did reduce fear retrieval (CS only, no light stimulation), but this does not necessarily mean that VIP activity is involved in fear memory storage or retrieval, especially because intracellular calcium activity of VIP+ neurons was low during fear conditioning and retrieval.

      Studies by Klein et al. (2021) and Nicolas et al. (2023) are integrated in the discussion section as follows.

      [Lines 297-301] Group activity of neurons in the pIC measured with fiber photometry, interestingly, exhibited fear-state-dependent activity changes, namely decreased activity with high fear behavior and increased activity with lower fear behavior (29), suggesting that group activity of the pIC may be involved in maintaining an appropriate level of fear behavior.

      [Lines 316-319] Another distinction between the aIC and pIC may be related to anxiety, as a recent study showed that group activity of aIC neurons, but not that of the pIC, increased when mice explored anxiogenic space (open arms in an elevated plus maze, center of an open field box) (32).

      (3) The authors should specify how many neurons they excluded after controlling the L-ratio and isolation distance. It is also important to specify the percentage of putative excitatory and inhibitory interneurons recorded among the 11 mice based on their classification (the number of putative inhibitory interneurons in Figure 1D seems too low to be accurate).

      We use manual cluster cutting and only cut clusters that are visually well isolated, so hardly any neurons are excluded after controlling for L-ratio and isolation distance. The criteria we used were L-ratio<0.3 and isolation distance>15, and we specified this in the methods as follows.

      [Lines 454-458] We only used well-isolated units (L-ratio<0.3, isolation distance>15) that were confirmed to be recorded in the aIC (conditioned group: n = 116 neurons, 11 mice; control group: n = 14 neurons, 3 mice) for the analysis (46). The mean of units used in our analysis are as follows: L-ratio = 0.09 ± 0.012, isolation distance = 44.97 ± 5.26 (expressed as mean ± standard deviation).
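      The inclusion criteria quoted above can be illustrated with a minimal sketch. This is not the authors' analysis code: the `Unit` record, the function name, and the example values are hypothetical, and only the two thresholds (L-ratio < 0.3, isolation distance > 15) come from the response.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical record holding cluster-quality metrics for one sorted unit."""
    unit_id: str
    l_ratio: float
    isolation_distance: float


# Thresholds quoted in the response; both comparisons are strict.
L_RATIO_MAX = 0.3
ISOLATION_DISTANCE_MIN = 15.0


def is_well_isolated(unit: Unit) -> bool:
    """Return True when the unit passes both cluster-quality criteria."""
    return (unit.l_ratio < L_RATIO_MAX
            and unit.isolation_distance > ISOLATION_DISTANCE_MIN)


# Made-up example units, for illustration only.
units = [
    Unit("u1", l_ratio=0.09, isolation_distance=44.97),  # passes both criteria
    Unit("u2", l_ratio=0.35, isolation_distance=40.0),   # fails the L-ratio criterion
    Unit("u3", l_ratio=0.10, isolation_distance=12.0),   # fails the isolation-distance criterion
]

kept = [u.unit_id for u in units if is_well_isolated(u)]
print(kept)
```

      Because clusters were cut manually before these thresholds were applied, such a filter would be expected to remove very few units in practice.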

      As suggested, we also specified the percentages of putative excitatory neurons and putative inhibitory interneurons recorded in our study in the results and methods sections. The relative percentages were similar for the conditioned and control groups (conditioned putative-excitatory: 93.1%, putative-inhibitory: 6.9%; control putative-excitatory: 92.9%, putative-inhibitory: 7.1%). Although the number of putative interneurons isolated from our recordings is low, that is what we obtained. Putative inhibitory neurons, probably because of their relatively smaller size, tend to be underrepresented relative to putative excitatory cells.

      [Lines 83-87] Of the recorded neurons, we analyzed the activity of 108 putative pyramidal neurons (93% of total isolated neurons) from 11 mice, which were distinguished from putative interneurons (n = 8 cells, 7% of total isolated neurons) based on the characteristics of their recorded action potentials (Figure 1D; see methods for details).

      [Lines 464-467] The percentage of putative excitatory neurons and putative inhibitory interneurons obtained from both groups were similar (conditioned putative-excitatory: 93.1%, putative-inhibitory: 6.9%; control putative-excitatory: 92.9%, putative-inhibitory: 7.1%).
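      The reported percentages can be checked with simple arithmetic. In the sketch below, the conditioned-group counts (108 and 8 of 116 neurons) are taken from the response; the control-group split (13 and 1 of 14 neurons) is not stated explicitly and is inferred here from the reported total and percentages, so it should be read as an assumption.

```python
def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


def composition(counts: dict) -> dict:
    """Convert per-class neuron counts into percentages of the group total."""
    total = sum(counts.values())
    return {label: percent(n, total) for label, n in counts.items()}


# Conditioned group: counts stated in the response (116 neurons, 11 mice).
conditioned = {"putative_excitatory": 108, "putative_inhibitory": 8}

# Control group: total of 14 neurons is stated; the 13/1 split is INFERRED
# from the reported 92.9% / 7.1% and is an assumption of this sketch.
control = {"putative_excitatory": 13, "putative_inhibitory": 1}

print(composition(conditioned))
print(composition(control))
```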

      (4) While the use of correlation of single-unit firing frequency with freezing is interesting, classically, studies analyze the firing in comparison to the auditory cues. If the authors want to keep the correlation analysis with freezing, rather than correlations to the cues, they should rename the cells as "freezing excited" and "freezing inhibited" cells instead of positive and negative cells.

      As suggested, we used the terms “freezing-excited” and “freezing-inhibited” cells instead of positive and negative cells.
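      As an illustration of how such labels could follow from a correlation analysis, the sketch below computes Pearson's r between a unit's binned firing rate and binned freezing, then labels the unit by the sign of r. It is a simplified assumption-laden example, not the authors' pipeline: the function names and data are hypothetical, and the significance test (P < 0.05) that accompanies the reported correlations is omitted here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def label_unit(r):
    """Label a unit by the sign of its firing-freezing correlation.

    Hypothetical rule for illustration; a real analysis would also require
    the correlation to be statistically significant.
    """
    if r > 0:
        return "freezing-excited"
    if r < 0:
        return "freezing-inhibited"
    return "uncorrelated"


# Made-up data: percent freezing and firing rate (Hz) per time bin.
freezing = [0, 10, 40, 80, 60, 20]
firing_up = [1.0, 1.5, 3.0, 5.0, 4.0, 2.0]    # rises with freezing
firing_down = [5.0, 4.5, 3.0, 1.0, 2.0, 4.0]  # falls with freezing

print(label_unit(pearson_r(firing_up, freezing)))
print(label_unit(pearson_r(firing_down, freezing)))
```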

      (5) To improve clarity, Figure 2 should be reorganized to start with the representative examples before including the average of population data. Thus Panel D should be the first one. The authors should also consider including the trace of the firing rate of these representative units over time, on top of the freezing trace, as well as Pearson's r and p values for both of them. Then, the next panels should be ordered as follows: F, G, H, C, A, B, I, and finally E.

      We have rearranged Figure 2 based on the suggestions.

      (6) It is unclear why the freezing response in Figure 2 is different in current panels F, G, and H. Please clarify this point.

      This was because the freezing behaviors of slightly different populations of animals were averaged. Some animals did not have positive or negative cells (or both), and only the behavior of animals with the specified cell type was used for calculating the mean freezing response. With the rearrangement of Figure 2, we no longer have plots with juxtaposed mean neuronal response types and behavior.

      (7) Even though the peak of tone-induced firing rate change between negative and positive cells is 10s later for positive cells, the conclusion that this 'difference suggests differential circuits may regulate the activities of different neuron types in response to fear' is overstating the observation. This statement should be rephrased. Indeed, it could be the same circuits that are regulated by different inputs (glutamatergic, GABA, or neuromodulatory inputs).

      We agree and have deleted the statement from the manuscript.

      (8) The authors mention they did not find tone onset nor tone offset-induced responses of anterior insula neurons. It would be helpful to represent this finding in a Figure, especially, which were the criteria for a cell to be tone onset or tone offset responding.

      We added how tone-onset and tone-offset were analyzed in the methods section and added a plot of the analysis in Figure 2H.

      (9) Based on the spread of the viral expression shown in Figure 3B, it appears that the authors are activating/inhibiting insula neurons in the GI layer, whereas single-unit recordings report the electrodes were located in DI, AID, and AIV layers. The authors should provide histology maps of the viral spread for ChR2, NpHR3, and eYFP expression.

      Thank you for the excellent suggestion. Now the histological sample in Figure 3B is a sample with expression in the GI/DI/AID layers and it also has an image taken at higher resolution (x40) to show that viral vectors are expressed inside neurons. We also added histological maps with overlay of viral expression patterns of the ChR2, eYFP, and NpHR3 groups in Figure 3—figure supplement 1.

      (10) In Figure 5B, the distribution of terminals expressing ChR2 appears much denser in CM than in MD. This should be quantified across mice and if consistent with the representative image, the authors should refer to aIC-CM rather than aIC-MD terminals.

      Overall, we referred to the connection as aIC-medial thalamus, which collectively includes both the CM and the MD. Our microscopes cannot resolve whether terminals end in the CM or the MD, but the aIC projections seem to pass through the CM to reach the MD. The Allen Brain Institute’s Mouse Brain Connectivity map (https://connectivity.brain-map.org/projection/experiment/272737914) of a B6 mouse, the mouse strain we used in our study, with tracers injected at a location similar to ours, also supports this interpretation and shows that aIC neuronal projections terminate more in the MD than in the CM. In addition, the power of light delivered for optogenetic manipulation is greatly reduced over distance; therefore, the MD-projecting terminals, which are closer to the optic fiber, are more likely to be activated than the CM-projecting terminals. However, since we could not determine whether the aIC projections terminate in the CM or the MD, we collectively referred to the connection as the aIC-medial thalamus throughout the manuscript.

      Author response image 1.

      (11) Histological verifications for each in vivo electrophysiology, optogenetic, and tracing experiments need to include a representative image of the implantation/injection site, as well as a 40x zoom-in image focusing on the cell bodies or terminals right below the optic fiber (for optogenetic experiments). Moreover, an atlas map including all injection locations with the spread of the virus and fiber placement should be added in the Supplement Figures for each experiment (see Figure S1 Klein et al., 2021). Similarly, the authors need to add a representation of the spread of the retrograde tracers for each mouse used for this tracing experiment.

      As suggested, we added a histology sample showing electrode recording location for in-vivo electrophysiology in Figure 1 and added atlas maps for the optogenetic and tracing experiments in supplementary figures. We also provide a 40x zoom-in image of the expression pattern for the optogenetic experiments (Figure 3B).

      (12) To target anterior insula neurons, authors mention coordinates that do not reach the insula on the Paxinos atlas (AP: +1.2 mm, ML: -3.4 mm, DV: -1.8 mm). If the DV was taken from the brain surface, this has to be specified, and if the other coordinates are from Bregma, this also needs to be specified. Finally, the authors cite a review from Maren & Fanselow (1996), for the anterior insula coordinates, but it remains unclear why.

      AP and ML coordinates were measured relative to bregma. DV was measured from the brain surface. We specified these in the Methods. We did not cite a review from Maren & Fanselow for the aIC coordinates.

      Minor comments:

      (1) A schematic of the microdrive and tetrodes, including the distance of each tetrode would also be helpful.

      We used handcrafted microdrives with four tetrodes. Since they were handcrafted, the relative orientation of the tetrodes varies and tetrode recording locations have to be verified histologically. We did, however, make sure that the tetrodes were more than 200 μm apart so that distinct single units would be obtained from different tetrodes. We added this to the methods as follows.

      [Lines 430-431] The distance between the tetrodes was greater than 200 μm to ensure that distinct single-units will be obtained from different tetrodes.

      (2) Figure 2E: representation of the baseline firing (3-min period before the tone presentation) is missing.

      Figure 2E shows the 3-min period before tone presentation.

      (3) Figure 2: Averages Pearson's correlation r and p values should be stated on panels F, G, and H (positive cell r = 0.81, P < 0.05; negative cell r = -0.68, P < 0.05).

      They were all originally stated in the figures. But with reorganization of Figure 2, we now have a plot of the Pearson’s Correlation with r and p values in Figure 2F.

      (4) Figure 2I: Representation of the absolute value of the normalized firing is highly confusing. Indeed, as the 'negative cells' are inhibited to freezing, firing should be represented as normalized, and negative for the inhibited cells.

      To avoid confusion, we did not take an absolute value of the “negative cells”, which are now called the “freezing-inhibited cells”.

      (5) Figure 4E (retrograde tracing): representation of individual values is missing.

      Figure 4E now has individual values.

      References:

      London, T. D., Licholai, J. A., Szczot, I., Ali, M. A., LeBlanc, K. H., Fobbs, W. C., & Kravitz, A. V. (2018). Coordinated ramping of dorsal striatal pathways preceding food approach and consumption. Journal of Neuroscience, 38(14), 3547-3558.

      Trott, J. M., Hoffman, A. N., Zhuravka, I., & Fanselow, M. S. (2022). Conditional and unconditional components of aversively motivated freezing, flight and darting in mice. eLife, 11, e75663.

      Fanselow, M. S. (1980). Conditional and unconditional components of post-shock freezing. The Pavlovian journal of biological science: Official Journal of the Pavlovian, 15(4), 177-182.

      Fanselow, M. S. (1986). Associative vs topographical accounts of the immediate shock-freezing deficit in rats: implications for the response selection rules governing species-specific defensive reactions. Learning and Motivation, 17(1), 16-39.

      Landeira-Fernandez, J., DeCola, J. P., Kim, J. J., & Fanselow, M. S. (2006). Immediate shock deficit in fear conditioning: effects of shock manipulations. Behavioral neuroscience, 120(4), 873.

      Moita, M. A., Rosis, S., Zhou, Y., LeDoux, J. E., & Blair, H. T. (2003). Hippocampal place cells acquire location-specific responses to the conditioned stimulus during auditory fear conditioning. Neuron, 37(3), 485-497.

      Kochli, D. E., Thompson, E. C., Fricke, E. A., Postle, A. F., & Quinn, J. J. (2015). The amygdala is critical for trace, delay, and contextual fear conditioning. Learning & memory, 22(2), 92-100.

      Ramos-Prats, A., Paradiso, E., Castaldi, F., Sadeghi, M., Mir, M. Y., Hörtnagl, H., ... & Ferraguti, F. (2022). VIP-expressing interneurons in the anterior insular cortex contribute to sensory processing to regulate adaptive behavior. Cell Reports, 39(9).

      Klein, A. S., Dolensek, N., Weiand, C., & Gogolla, N. (2021). Fear balance is maintained by bodily feedback to the insular cortex in mice. Science, 374(6570), 1010-1015.

      Nicolas, C., Ju, A., Wu, Y., Eldirdiri, H., Delcasso, S., Couderc, Y., ... & Beyeler, A. (2023). Linking emotional valence and anxiety in a mouse insula-amygdala circuit. Nature Communications, 14(1), 5073.

      Maren, S., & Fanselow, M. S. (1996). The amygdala and fear conditioning : Has the nut been cracked? Neuron, 16(2), 237‑240. https://doi.org/10.1016/s0896-6273(00)80041-0

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Ding et al uses agent-based simulations to explore the role of the structure of molecular motor myosin filaments in force generation in cytoskeletal structures. The focus of the study is on disordered actin bundles which can occur in the cell cytoskeleton and have also been investigated with in vitro purified protein experiments.

      Strengths:

      The key finding is that cooperative effects between multiple myosin filaments can enhance both total force and the efficiency of force generation (force per myosin). These trends were possible to obtain only because the detailed structure of the motor filaments with multiple heads is represented in the model.

      We appreciate your comments about the strength of our study. 

      Weaknesses:

      It is not clearly described what scientific/biological questions about cellular force production the work answers. There should be more discussion of how their simulation results compare with existing experiments or can be tested in future experiments.

      Please see our response to the comment (1) below.

      The model assumptions and scientific context need to be described better.

      We apologize for the insufficient descriptions about the model and the scientific context. We revised the manuscript to better explain model assumptions and scientific context as described in our responses below.

      The network contractility seems to be a mere appendix to the bundle contractility which is presented in much more detail.

      Please see our response to the comment (6) below.

      Reviewer #1 (Recommendations for the authors):

      (1) It is not clearly described what scientific/biological questions about cellular force production the work answers. There should be more discussion of how their simulation results compare with existing experiments, or can be tested in future experiments. The authors do briefly mention Reference 4 where different myosin isoforms were used, but it is not clear that these experiments support the scalings predicted in this work in Figures 3-6. Also, the experiments in Ref. 4 apparently did not involve passive crosslinkers (ACPs) which are key in this study.

      Thank you for the comment. In the 5th paragraph of the discussion section of the original manuscript, we applied our findings to understand how structural differences between ventral stress fibers and actin arcs could affect force generation. In addition, at the end of the discussion section, we mentioned that experiments with artificially-made myosin thick filaments could be used for verifying our results. 

      The experiments in Ref. 4 were the only ones with which we could directly compare our results. In a previous study, actomyosin bundles were experimentally created with ACPs (K.L. Weirich et al., Biophys J, 2021, 120: 1957-1970), but the motions of myosin thick filaments were the only quantities measured in those experiments. In general, measuring forces generated by in vitro actomyosin bundles is very challenging. This is why the predictions from our model are particularly valuable for understanding the force generation of actomyosin structures.

      (2) The architecture of the bundles seems to be prescribed by hand in these simulations. Several well-known stochastic aspects of the dynamics of actin and actin-binding proteins are not included in the model. For example, there is no remodeling of the actin structures through actin polymerization and depolymerization, or crosslink (ACP) binding and unbinding. Can the authors comment on why these effects could be neglected for the questions they want to address?

      Thank you for the comment. We previously showed that the force generation process in actomyosin networks and bundles is affected by actin dynamics (Q. Yu et al., Biophys J, 2018, 115: 2003-2013) and the unbinding of ACPs (T. Kim, Biomech Model Mechanobiol, 2015, 14(2): 345-355 and W. Jung et al., Comput Part Mech, 2015, 2(4): 317-327). 

      However, we did not include the actin dynamics and the ACP unbinding in the current study to clearly understand the effects of the structural properties of thick filaments on the force generation process. We have learned that the stochastic behaviors of cytoskeletal components lead to noisier results, which requires us to run a much larger number of simulations to obtain statistically convincing data. We added the following paragraph in the discussion section of the revised manuscript:

      “Although this study focused mainly on parameters related to motor structures, we expect that other parameters would affect the force generation process. For example, as we showed before, a decrease in ACP density would reduce forces by deteriorating connectivity between filaments. With very low ACP density, some of neighboring motors may not have ACPs between them, thus adding up their forces as shown in Fig. 2. However, such low ACP density may not maintain the structure of bundles or cross-linked networks well. In addition, the force-dependent unbinding of ACPs could change the spatial distribution of ACPs during force generation. If they behave as a slip bond which unbinds more frequently with higher forces, ACPs may not stay between two motors for long time due to high tension. Then, forces generated by two motors may have a higher chance to add up. By contrast, if they behave as a catch bond which unbinds less frequently with larger forces, more ACPs will be recruited between two motors, reducing a chance to add up forces. The length of actin filaments is unlikely to affect the force generation process significantly unless filaments are very short. Additionally, as we showed before, actin turnover would reduce forces by competing with motor activities, change connectivity between filaments over time, and prevent motors from being stalled for long time, all of which could affect force generation.”
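
The slip-bond and catch-bond behaviors mentioned in the quoted paragraph can be illustrated with a minimal sketch. It assumes a textbook Bell-type exponential dependence of the unbinding rate on force; this form, the function name `unbinding_rate`, and the parameter values `k0_per_s` and `force_scale_pN` are illustrative assumptions and are not taken from the authors' model.

```python
import math


def unbinding_rate(force_pN, bond="slip", k0_per_s=1.0, force_scale_pN=10.0):
    """Force-dependent unbinding rate of an idealized bond.

    Assumed Bell-type form with hypothetical parameter values:
    slip bond:  k = k0 * exp(+F / F_scale), unbinds faster under load
    catch bond: k = k0 * exp(-F / F_scale), unbinds slower under load
    """
    if bond == "slip":
        sign = 1.0
    elif bond == "catch":
        sign = -1.0
    else:
        raise ValueError("bond must be 'slip' or 'catch'")
    return k0_per_s * math.exp(sign * force_pN / force_scale_pN)


forces_pN = [0.0, 5.0, 10.0, 20.0]
slip_rates = [unbinding_rate(f, "slip") for f in forces_pN]
catch_rates = [unbinding_rate(f, "catch") for f in forces_pN]

for f, ks, kc in zip(forces_pN, slip_rates, catch_rates):
    print(f"F = {f:5.1f} pN  slip: {ks:6.3f} /s  catch: {kc:6.3f} /s")
```

Under these assumptions, a slip-bond ACP under high tension leaves the region between two motors sooner, whereas a catch-bond ACP stays longer, which is the qualitative contrast drawn in the quoted paragraph.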

      (3) The present study is confined to the fixed density of motors and ACPs. However, these can be easily varied in in vitro experiments. Works such as Reference 4 show an optimum in contractility vs myosin concentration. Myosins act not only to slide actin filaments but also crosslink them.

      Can the authors vary myosin concentration to demonstrate such effects in their model?

      As the reviewer pointed out, there is a belief that myosin thick filaments can serve as crosslinkers as well. However, unless there is a fraction of dead myosins (which remain bound to filaments without walking) or myosins dwell at the barbed ends of filaments for a very long time, it seems very hard for bundles or networks to generate large forces. A former experiment showed that active myosins increase the viscosity of actin networks, not their elasticity (D. Humphrey et al., Nature, 2002, 416: 413-416). Computer simulations with reasonable assumptions did not show significant force generation without cross-linkers. We have tested systems with a large number of motors and a few cross-linkers in previous studies (T. Kim, Biomech Model Mechanobiol, 2015, 14(2): 345-355 and W. Jung et al., Comput Part Mech, 2015, 2(4): 317-327). We observed that a large force/stress was generated momentarily, but it relaxed very quickly. We expect similar outcomes if we try such conditions in the current study.

      (4) Why is there a (factor of 1.5-2) discrepancy in the measured (Ftot) and estimated (Fest) force values in Figure 4-6? How can the authors improve their scaling arguments to capture this? What about the estimated efficiency?

      Thank you for the comment. Indeed, there was a discrepancy between the actual and estimated forces. When the estimated force was calculated, we used the z positions of motors without consideration of the actual bundle geometry with multiple filaments. For example, if two motors are located on the opposite sides of the bundle (i.e., if they are located far from each other in x or y direction), forces generated by them may not counterbalance each other. Then, the estimated force can be smaller than the actual force because counterbalance between motors can be overcounted. The original manuscript had the following sentences to clarify this point: “F<sub>est</sub> was generally smaller than F<sub>tot</sub> because this analysis does not account for actual bundle geometry consisting of multiple F-actins; if two motors are located far from each other in x or y direction, they may not counterbalance or add up forces. Nevertheless, we found that F<sub>est</sub> captures the overall dependence of F<sub>tot</sub> on parameters well.”

      (5) Several choices of parameter values used in the simulations are not clear:

      a) Why consider F actin of 140 nm specifically? Actin can come in a range of lengths. How do their results depend upon the length scale of actin?

      It seems that there is a misunderstanding. 140 nm is the equilibrium length of one actin segment in our model. The actual F-actin consists of multiple actin segments. The length of F-actin was 9 μm in bundle simulations and 10 μm (average) in network simulations. We expect that the general tendency of our results would not change with different filament lengths. However, if filaments become too short, the force generation process would be impaired due to a lack of connectivity between filaments.

      b) Similarly, very specific values of myosin backbone length (42 nm), number of myosin heads (8), number of arms (24), and Actin Cross-linking Proteins (ACPs). What informs these values and how will the results change if they are different? It is not especially clear how an "Arm" differs from "heads" and what kind of coarse-graining is involved.

      In the “model overview” section of the original manuscript, we mentioned the following to clarify the definitions of motor arms and motor heads: 

      “To mimic the structure of bipolar filaments, each motor has a backbone, consisting of serially linked segments, and two arms on each endpoint of the backbone segments that represent 8 myosin heads (N<sub>h</sub> = 8).”

      We devised this coarse-graining scheme of myosin thick filaments in our previous work (T. Kim, Biomech Model Mechanobiol, 2015, 14(5): 1143-1155). Through extensive tests, we showed that force generation and motor behaviors are largely independent of the coarse-graining level. In other words, a motor with the same value of N<sub>h</sub>N<sub>a</sub> leads to similar outcomes regardless of the value of N<sub>a</sub>. However, in a bundle with multiple filaments, each motor needs a sufficient number of arms to ensure simultaneous interactions with those filaments. This is why we decided to use N<sub>h</sub> = 8 and N<sub>a</sub> = 24.

      To match the length of thick filaments and the total number of heads (N<sub>h</sub>N<sub>a</sub>) in the model with real myosin thick filaments, we have used 42 nm for each backbone length. Varying this length is equivalent to a variation in L<sub>sp</sub> that we did for Fig. 6.

      We used a high ACP density to ensure connections between all neighboring pairs of actin filaments. We already showed how the presence of ACPs affects the force generation process in Fig. 2 using two actin filaments. We expect that a variation in ACP density would affect our results to some extent. Since the main focus of the current study is the structural properties of motors, we did not explore the effects of ACP density. We hope that the reviewer will understand our intention.

      (6) The manuscript focuses on disordered bundles with only one figure on networks. However, actin fibers also ubiquitously exist as disordered networks, and it is important to explore in more detail the contractile forces in such network arrangements.

      We appreciate the comment. Because we plan to delve into the effects of motor structures on force generation in networks in a follow-up study, we showed only minimal results in the current study to demonstrate the generality of our findings. We hope that the reviewer will understand our intention and plan.

      It is not described very clearly how these networks were generated.

      We apologize for the lack of explanation about how the networks were generated. We added the following section to the Supplementary Text of the revised manuscript:

      “Network assembly

      Unlike F-actin in bundle simulations, F-actin in network simulations is formed by stochastic processes as in our previous studies. The formation of F-actin is initiated from a nucleation event with a constant rate constant, k<sub>n,A</sub>, with the appearance of one cylindrical segment in a random position with a random orientation perpendicular to the z direction. The polymerization of F-actin is simulated by adding cylindrical segments at the barbed end of existing filaments with a rate constant, k<sub>p,A</sub>. The ratio of k<sub>n,A</sub> to k<sub>p,A</sub> is adjusted to result in the average filament length of ~10 μm. The rest of the assembly process is identical to that described in the main text.”

      Crosslinked biopolymers like actin typically form disordered elastic networks with their coordination number below the rigidity percolation threshold (z=4 in 2D); see, for example, the review by Broedersz and MacKintosh, Rev. Mod. Phys. 2013. Such networks should exist in the bending-dominated regime, where bending forces play a vital role in force propagation. Was that observed in the simulations? Why or why not?

      We appreciate the comment. We are aware of the bending-dominated regime and indeed showed the importance of the bending stiffness of actin filaments at low shear strain levels in our previous work (T. Kim et al., PLOS Comput Biol, 2009, 5(7): e1000439). In the case of active networks with motors, such a bending-dominated regime has not been observed without external shear strain. Instead, buckling of actin filaments was found to be essential for breaking the symmetry between tensile and compressive forces developed by motor activities. We have shown that the free contraction of networks is inhibited if filament bending stiffness is increased substantially (J. Li et al., Soft Matter, 2017, 13: 3213-3220 and T. Bidone et al., PLOS Comput Biol, 2017, 13(1): e1005277). We expect that contractile forces generated by bundles or networks will be reduced significantly if we substantially increase the bending stiffness. However, considering that the focus of the current study is on the structural properties of motors, we did not perform such simulations.

      (7) It would be interesting to see the simulated predictions of the bundle or network contraction dynamics. This can be done by changing to free boundary conditions so that the bundle can contract.

      Thank you for the suggestion. We have previously investigated the free contraction of actomyosin networks with different motor and ACP densities (J. Li et al., Soft Matter, 2017, 13: 3213-3220). We observed that the rate of network contraction was higher with more motors and ACPs. However, we did not test the effects of the structural properties of thick filaments in the previous study. We plan to investigate these effects in future studies because the focus of the current study is the force generation process. Please note that in the discussion section of the original manuscript, we mentioned the following:

      “Although we focused on force generation, the contractile behaviors of actomyosin structures (i.e., a decrease in length) have also been of great interest. Our model can be used to study such contractile behaviors by deactivating the periodic boundary condition and removing connection between one end of bundle/network and a domain boundary as done previously [20]. To achieve higher contractile speed with the same total number of myosin heads, the existence of multiple contractile units would be better as suggested in a previous work [4]. This means that there is a trade-off between force generation and contractile speed. Previous studies also showed that the contractile speed of networks is proportional to motor density [18, 43, 51]. We may be able to use our model to systematically investigate how the contractile speed is regulated by parameters that we tested in this study, including the number, distribution, length, and structure of motors.”

      Minor suggestions for improvement:

      (1) What are the vertical markers in Figures 1E and F? They should be labelled. If they are crosslinkers, it is not clear why the color is different from Figure 1A and B.

      We believe that the reviewer meant Figs. 2E, F. Those vertical lines are indeed ACPs (crosslinkers). We changed the color of ACPs in Fig. 1A and Fig. 2B-D to purple to be consistent. In addition, we changed the colors of two filaments in Figs. 2B-D slightly to be consistent with Fig. 2E.

      (2) To help understanding, please include a figure showing how forces are measured.

      We added Fig. S1 in the revised manuscript to explain how the bundle force is calculated.

      (3) It should be possible to extend the scaling arguments to predict what is the crossover myosin density (N_M) in Figure 4a at which the efficiency changes from going as 1/N_M to saturating. 

      As the reviewer might have observed, the slope of the efficiency in Fig. 4A gradually changes, rather than showing a sharp transition. Thus, it is hard to define one crossover myosin density. 

      Similarly, what are the slopes in Figure 6a-b?

      We drew the reference lines in those two plots. Unfortunately, we do not have explanations about the origin of these slopes.

      (4) Some more explanation for the observed values should be added. Figure 4: Why does efficiency plateau at a value close to 0.8 in (A)? 

      We assume that the reviewer meant the plateau of η close to 0.08, not 0.8. Our speculation for the origin of this plateau value is related to L<sub>M</sub> (= 462 nm under the reference condition). Ideally, ~43 motors are required to cover the entire length of the bundle (= 20 μm). Under this condition, η is ~0.023. Although this is not 0.08, we believe that these two values are related to each other. For example, if we increase L<sub>M</sub>, this plateau level would increase. We added the following sentences in the result section of the revised manuscript:

      “The plateau level of η at ~0.08 is related to the minimum number of motors required for saturating an entire bundle, implying that the plateau level would be higher if each motor is longer.”
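
The arithmetic behind the ~43 motors and η of ~0.023 quoted in the response can be checked with a short sketch. The values of L<sub>M</sub> and the bundle length come from the response; reading ~0.023 as the reciprocal of the motor count is our interpretation, and the function name `coverage_estimate` is hypothetical.

```python
def coverage_estimate(motor_length_nm, bundle_length_nm):
    """Return (motors needed to span the bundle end to end, reciprocal of that count).

    Assumption: the ~0.023 quoted in the response equals 1 / (number of motors).
    """
    n_cover = bundle_length_nm / motor_length_nm
    return n_cover, 1.0 / n_cover


# Reference condition from the response: L_M = 462 nm, bundle length = 20 um.
n_ref, eta_ref = coverage_estimate(462.0, 20_000.0)
print(f"motors to cover the bundle: {n_ref:.1f}")
print(f"reciprocal of that count:   {eta_ref:.3f}")

# Doubling the motor length halves the count and doubles the reciprocal.
n_long, eta_long = coverage_estimate(2 * 462.0, 20_000.0)
print(f"with 2 x L_M: {n_long:.1f} motors, reciprocal {eta_long:.3f}")
```

Under this reading, a longer motor raises the estimate, in line with the statement that the plateau level would be higher if each motor is longer. The sketch does not explain the gap between ~0.023 and the observed ~0.08.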

      Figure 5: Overlapping between motors seems to increase the total force applied by them because of cooperative effects. However, it is not abundantly clear why that should peak at a value of f = 0.06.

      As shown in Fig. 5B, smaller f always results in higher F<sub>tot</sub> due to higher level of cooperative overlap. The minimum value of f we tested in this study was 0.06, so F<sub>tot</sub> was maximal at f = 0.06.

      (5) Why is the network force expected to scale approximately as sqrt(N_M)? Is it because of the 2D geometry where the number of motors along the x or y-direction scale as sqrt(N_M)?

      We initially thought that the weaker dependence of the total force on N<sub>M</sub> was related to the random orientations of motors. However, if the network is fully saturated with motors, the inclusion of more motors will increase forces in both x and y directions almost linearly, resulting in the direct proportionality of F<sub>tot</sub> to N<sub>M</sub>. Our new hypothesis for weaker dependence is consistent with the reviewer’s speculation; the network is not fully saturated even with 1000 motors, so the entire regime shown in Fig. 7B corresponds to that with N<sub>M</sub> < 100 in Fig. 4A where similar weaker dependence on N<sub>M</sub> was observed. We added the following sentence in the result section of the revised manuscript to clarify this point:

      “the average number of motors in each direction which can experience the cooperative overlap would be ~√N<sub>M</sub>. Maximal N<sub>M</sub> tested with the network was ~2,500, so the dependence of F<sub>tot</sub> on N<sub>M</sub> with the network is similar to that with N<sub>M</sub> < ~50 with the bundle (Fig. 4A).”
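
The reviewer's suggestion that the number of motors along each direction of a 2D network scales as sqrt(N_M) can be made concrete with a small sketch. The square-root relation is the reviewer's speculation, not a confirmed result, and the function name `motors_per_direction` is hypothetical.

```python
import math


def motors_per_direction(n_motors):
    """Motors along one axis of a 2D network, assuming sqrt(N_M) scaling."""
    return math.sqrt(n_motors)


# N_M values chosen for illustration; ~2,500 is the maximum quoted for the network.
for n_motors in (100, 1000, 2500):
    print(f"N_M = {n_motors:5d} -> ~{motors_per_direction(n_motors):.1f} per direction")
```

With N_M of 2,500, the estimate is 50 motors per direction, which matches the bundle regime of N<sub>M</sub> < ~50 cited in the quoted sentence.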

      (6) Figures 6 D and A: Figure 6D suggests that there is a more full overlap in the cases where there was a longer bare zone or larger spacing between motor arms. However, the quantification of the total force in A shows that the force is highest for the case where LM was increased by increasing the number of arms. Why do the authors think that is? I would expect from the explanation in Fig 6D that the Lsp and Lbz would be higher than Na in Fig 6A.

      Fig. 6D shows a difference in the level of the cooperative overlap () between two motors. As the reviewer pointed out, the case with more arms shows the lowest , resulting in the lowest as we showed in Fig. S2B. However, as shown in Eq. 7, the total force is a function of both N<sub>a</sub> and . Thus, due to higher N<sub>a</sub> and lower , the force in the case with different N<sub>a</sub> can be similar to that in the case with different L<sub>bz</sub>. In the original manuscript, we had the following sentence to explain how the force can be similar between the two cases:

      “Thus, was higher (Fig. S2B, blue), resulting in higher F<sub>tot</sub> and η despite smaller N<sub>a</sub>.”

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors use a mechanical model to investigate how the geometry and deformations of myosin II filaments influence their force generation. They introduce a force generation efficiency that is defined as the ratio of the total generated force and the maximal force that the motors can generate. By changing the architecture of the myosin II filaments, they study the force generation efficiency in different systems: two filaments, a disorganized bundle, and a 2D network. In the simple two-filament systems, they found that in the presence of actin crosslinking proteins motors cannot add up their force because of steric hindrances. In the disorganized bundle, the authors identified a critical overlap of motors for cooperative force generation. This overlap is also influenced by the arrangement of the motor on the filaments and influenced by the length of the bare zone between the motor heads.

      Strengths:

      The strength of the study is the identification of organizational principles in myosin II filaments that influence force generation. It provides a complementary mechanistic perspective on the operation of these motor filaments. The force generation efficiency and the cooperative overlap number are quantitative ways to characterize the force generation of molecular motors in clusters and between filaments. These quantities and their conceptual implications are most likely also applicable in other systems.

      Thank you for the comments about the strength of our study. 

      Weaknesses:

      The detailed model that the authors present relies on over 20 numerical parameters that are listed in the supplement. Because of this vast amount of parameters, it is not clear how general the findings are. On the other hand, it was not obvious how specific the model is to myosin II, meaning how well it can describe experimental findings or make measurable predictions. The model seems to be quantitative, but the interpretation and connection to real experiments are rather qualitative in my point of view.

      As the reviewer mentioned, all agent-based computational models for simulating the actin cytoskeleton inevitably involve such a large number of parameters. Some of the parameter values are not well known, so we have tuned our parameter values carefully by comparing our results with experimental observations in our previous studies since 2009. We were aware of the importance of a rigorous representation of the unbinding and walking rates of myosin motors, so we implemented in our model the parallel cluster model, which can predict those rates with consideration of the mechanochemical rates of myosin II. Thus, we are confident that our motors represent myosin II.

      In our manuscript, our results were compared with prior observations in Ref. 4 (Thoresen et al., Biophys J, 2013) several times. In particular, larger force generation with more myosin heads per thick filament was consistent between the experiment and our simulations. 

      Our study can make various predictions. First, our study explains why non-muscle myosin II in stress fibers shows focal distributions rather than uniform distributions; if the motors stay close to each other, they can generate much larger forces in the stress fibers via the cooperative overlap. Our study also predicts a difference between bipolar structures (found in skeletal muscle myosins and non-muscle myosins) and side polar structures (found in smooth muscle myosins) in terms of the likelihood of the cooperative overlap. As shown below, myosin filaments with the bipolar structure can add up their forces better than those with the side polar structure when their overlap level is the same.

      Author response image 1.

       

      It was often difficult for me to follow what parameters were changed and what parameters were set to what numerical values when inspecting the curves shown in the figures. The manuscript could be more specific by explicitly giving numbers. For example, in the caption for Figure 6, instead of saying "is varied by changing the number of motor arms, the bare zone length, the spacing between motor arms", the authors could be more specific and give the ranges: "is varied by changing the number of motor arms from ... to ..., the bare zone length from ... to ..., and the spacing between motor arms from ... to ...".

      This unspecificity is also reflected in the text: "We ran simulations with a variation in either L<sub>sp</sub> or L<sub>bz</sub>" What is the range of this variation? "When L<sub>M</sub> was similar" similar to what? "despite different N<sub>M</sub>." What are the different values for N<sub>M</sub>? These are only a few examples that show that the text could be way more specific and quantitative instead of qualitative descriptions.

      We appreciate the comment. In the revised manuscript, we specified the range of the variation in each parameter.

      In the text, after equation (2) the authors discuss assumptions about the binding of the motor to the actin filament. I think these model-related assumptions and explanations should be discussed not in the results section but rather in the "model overview" section.

      Thank you for pointing this out. In the original manuscript, we described all the details of the model in the Supplementary Material. We feel that the assumptions about interactions between motors and actin filaments are too detailed to be included in the model overview section.

      The lines with different colors in Figure 2A are not explained. What systems and parameters do they represent?

      The different colors in Fig. 2A distinguish the 20 cases. We added an explanation of the colors to the figure caption in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      To guarantee the reproducibility of the results, I recommend that the authors publish their simulation code on GitHub.

      We appreciate the reviewer’s suggestion. Following the suggestion, we prepared and posted the code on GitHub as mentioned in the Data Availability section of the revised manuscript: “The source code of our model is available on GitHub: https://github.com/ktyman2/ThickFilament”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public review):

      Weaknesses: The interpretation is somewhat model-dependent, and it is unclear if the interpretation is unique. For example, it is unclear if the heterogeneous release probability among sites, silent sites, can explain the results. N estimates out of variance-mean analysis for example may be limited by the availability of postsynaptic receptors.

      To address this criticism, we have added a paragraph in the Discussion outlining the main assumptions underlying our work and how possible deviations from these assumptions may have affected our conclusions. This new paragraph is titled 'Assumptions behind our analysis, and possible limitations of our conclusions'.

      Reviewer 1, Recommendations to Authors:

      Without molecular evidence or anatomical evidence, the model and conclusions may remain as a postulate at this stage. This can be discussed carefully. Also, the study looks a bit narrow regarding the scope, only dealing with RS-DS model vs TS-LS model. Maybe, the authors pick up a bit more qualitative findings that directly support RS-DS model.

      To address these issues, another paragraph has been added to the Discussion titled 'Functional evidence in favor of the RS/DS model at PF-MLI synapses, and remaining uncertainties on the molecular composition and morphological arrangement of docking sites'.

      Minor: Fukaya et al. did not study cerebellar mossy fiber synapses.

      We apologize for this error, which has now been rectified.

      Reviewer 2 (Public review):

      It remains unclear how generalizable the findings are to other types of synapses.

      We agree with the Reviewer: this is a limitation of our study. In the Discussion we have a paragraph titled 'Maximum RRP size for other synaptic types' where we discuss this point. As we say in this paragraph, central synapses are clearly diverse, and the level of applicability of our results across preparations will depend on our ability to extend SV counting to various types of brain synapses. For the moment SV counting has been applied to only two types of synapses: PF-MLI synapses and hMF-IN synapses. We are encouraged by the fact that the simple synapse study by Tanaka et al. (2021), carried out at hMF-IN synapses, offers another example where the ratio between RRP size and N is larger than 1.

      Recommendations to Authors,

      Minor comments:

      The manuscript is at times difficult to read or reads like a review. The introduction could be shortened to concisely outline the motivation and premises for the study. The results and methods sections should not contain excessive interpretation and discussion. Although very informative, it distracts from the simple principal message.

      To address these criticisms, we have shortened the Introduction and parts of the Results section. These changes have resulted in a presentation of Results that is shorter and more focused on data and simulations than in the previous version. Nevertheless, readers need to be informed of ongoing research on docking sites and the principles of sequential models to understand the usefulness of our work. For this reason, we have maintained a theoretical section at the beginning of Results.

      The rationale for the choice of synapse and experimental conditions remains unclear until the discussion. This needs to be clearly addressed at the beginning, in the introduction, or in the results. In particular, the extracellular calcium concentration and the addition of 4-AP to the recording solution should be addressed in the results.

      The reason for choosing the PF-MLI synapse is now indicated at the end of the Introduction. The rationale underlying our choice of experimental conditions, including the extracellular calcium concentration and the addition of 4-AP, is now briefly explained at the beginning of the second section of Results (titled 'Maximizing RRP size and its release during AP trains'), and more extensively in the Methods section (as in the previous version of the manuscript).

      Potential confounds of the approach should be discussed (e.g., could a broadened AP in 4-AP alter the synchronicity of release, i.e., desynchronize release, especially during trains?). That could be complemented with information on the EPSC kinetics (rise, decay) under different experimental conditions, as well as during train stimulation. How could presynaptic calcium concentration and time course in 4-AP impact the conclusions?

      To study the effects of 4-AP on AP broadening we have performed a new analysis of EPSC latencies in control and in 4-AP. In both cases the first latencies were independent of i. In 4-AP, first latencies displayed a small right shift of 0.2 ms (see additional figure below). This indicates that 4-AP does broaden the AP waveform, but that the extent of this broadening is limited. This new information has been added in the Methods of the revised manuscript.

      As suspected by the Reviewer, the latency distribution changes as a function of i and in the presence of 4-AP. Consistent with earlier findings (Miki et al., 2018), the proportion of 2-step release (with longer latencies) increases as a function of i both in control and in 4-AP. We also find that the value of the fast time constant of the latency distribution, τf, is larger in 4-AP than in control. This last result probably indicates a longer presynaptic calcium entry in 4-AP.

      In the revised version, we describe these results in the Methods section, in a new paragraph titled 'Changes in latency distributions as a function of i and of experimental conditions'.

      While the latency distributions change as a function of i and as a function of experimental conditions, this does not affect our conclusions, because these conclusions are based on the summed number of release events after each AP (or in other words, on the integral of the latency distributions).
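
      The point that a summed event count is insensitive to the shape of the latency distribution can be illustrated with a minimal sketch. It assumes, purely for illustration, a single delayed exponential latency distribution (the measured distributions also contain a slower 2-step component), a hypothetical 10 ms counting window, and a hypothetical total of 100 events; only the 0.2 ms extra delay and the τf values of 0.47 ms and 0.70 ms come from the text.

```python
import math


def summed_events(n_events, delay_ms, tau_ms, window_ms=10.0, bin_ms=0.01):
    """Sum a delayed single-exponential latency histogram over a counting window.

    n_events, window_ms and bin_ms are illustrative assumptions, not values
    taken from the study.
    """
    total = 0.0
    n_bins = int(round(window_ms / bin_ms))
    for k in range(n_bins):
        t0 = k * bin_ms
        t1 = t0 + bin_ms
        # Time elapsed since the delay, clipped at zero before release can start.
        a = max(t0 - delay_ms, 0.0)
        b = max(t1 - delay_ms, 0.0)
        # Exact probability mass of the delayed exponential inside [t0, t1).
        total += n_events * (math.exp(-a / tau_ms) - math.exp(-b / tau_ms))
    return total


# Control: no extra delay, tau_f = 0.47 ms. 4-AP: 0.2 ms extra delay, tau_f = 0.70 ms.
control = summed_events(100, delay_ms=0.0, tau_ms=0.47)
four_ap = summed_events(100, delay_ms=0.2, tau_ms=0.70)
```

      Under these assumptions the two sums agree to within a small fraction of one event, because the counting window is much longer than either time constant; the histogram shape changes but its integral does not.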

      The kinetics of mEPSCs (risetime and decay time) are unchanged by 4-AP or by PTP. Consequently, in a given experiment, we used the same template to perform our deconvolution analysis for all conditions that were examined (starting with 3 mM Cao up to 200 Hz). This information has now been added in Methods.

      Following an AP stimulation, the amount of calcium entry in the presence of 4-AP is presumably much larger than in control. TEA, a weaker K channel blocker than 4-AP at PF-MLI synapses, elicits a marked increase in calcium entry (Malagon et al., 2020). This suggests an even larger increase with 4-AP, even though this has not been directly confirmed in the present work. The enhanced calcium entry translates into an increase in the parameters pr, r and s of our model. The important thing for our study is to increase pr and r as much as possible to promote the emptying of the RRP during trains. Knowing the exact amount of calcium entry and its relation to the increase in pr and r is not essential for this purpose. Likewise, whether r (and/or s) increases as a function of i is of little practical importance, since much of the RRP is already emptied after the second stimulation, at least in the most extreme case (200 Hz stimulation).

      The applicability of this model to other synapses needs to be addressed more thoroughly. This synapse, under physiological conditions, has a very low Pr, and the experimental conditions have to be adjusted dramatically to achieve a high-Pr. How applicable are the conclusions to high-Pr synapses and/or synapses that operate in a multivesicular release regime? Although that might be difficult to test experimentally it should be addressed in the discussion.

      The applicability issue to other synapses has been addressed above, in response to the public comments of the same Reviewer.

      As the Reviewer points out, the PF-MLI synapse has a small P value under physiological conditions. One can speculate that synapses that exhibit a higher P value may have a higher docking site occupancy than PF-MLI synapses. This feature would increase their chance of having a ratio of RRP size over N larger than 1, as it occurs in PF-MLI synapses in high docking occupancy conditions. A sentence making this point has been added to the paragraph titled 'Maximum RRP size for other synaptic types' in the revised manuscript.

      Author response image 1.

      Latency histograms for s1 in control and in the presence of 4-AP. After normalization, the averaged latency histogram in 4-AP displays an additional delay of 0.2 ms, and a slowing of the time constant τf from 0.47 ms to 0.70 ms.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Su et al propose the existence of two mechanisms repressing SBF activity during entry into meiosis in budding yeast. First, a decrease in Swi4 protein levels by a LUTI-dependent mechanism where Ime1 would act closing a negative feedback loop. Second, the sustained presence of Whi5 would contribute to maintaining SBF inhibited under sporulation conditions. The article is clearly written and the experimental approaches used are adequate to the aims of this work. The results obtained are in line with the conclusions reached by the authors but, in my view, they could also be explained by the existing literature and, hence, would not represent a major advance in the field of meiosis regulation.

      We respectfully disagree with the reviewer's comment that this work can be explained by the existing literature. First, while SWI4LUTI has been previously identified in meiotic cells along with ~380 LUTIs, the biological purpose of these alternative mRNA isoforms and their effect on cellular physiology still remain largely unknown. Our manuscript fills this gap in understanding for SWI4LUTI. Loss of SWI4LUTI contributes to dysregulation of meiotic entry and does so by failing to properly repress the known inhibitors of meiotic entry, the CLNs. Furthermore, even though Cln1 and Cln2 have been previously shown to antagonize meiosis, the mechanisms that restrict their activity were unclear prior to our study.

      We recognize work done by others demonstrating Whi5-dependent repression of SBF during mitotic G1/S transition (De Bruin et al., 2004; Costanzo et al., 2004). We further examined Whi5’s involvement during meiotic entry and found that it acts in conjunction with the LUTI-based mechanism to restrict SBF activity. Combined loss of both mechanisms results in the increased expression of G1 cyclins, decreased expression of early meiotic genes, and a delay in meiotic entry (Figure 6). Neither mechanism was previously known to regulate meiotic entry. Our study not only adds to our broader understanding of gene regulation during meiosis but also raises additional questions regarding how LUTIs regulate gene expression and function.

      Regarding the first mechanism, Fig 1 shows that Swi4 decreases very little after 1-2h in sporulation medium, whereas G1-cyclin expression is strongly repressed very rapidly under these conditions (panel D and work by others). This fact dampens the functional relevance of Swi4 downregulation as a causal agent of G1 cyclin repression.

      Reviewer 1 expresses concern for the observation that by 2 h in sporulation media there is a 32% decrease in Swi4-3V5 protein abundance compared to 0 h in SPO. This is consistent with the range of protein level decrease typically accomplished by LUTI-based gene regulation (Chen et al., 2017; Chia et al., 2017; Tresenrider et al., 2021), and while it is a modest reduction, it is consistent across replicates. Furthermore, we don’t make the argument that reduction in Swi4 levels alone is the sole regulator of G1 cyclin levels. In fact, we report that in addition to Swi4 downregulation, Whi5 also functions to restrict SBF activity during meiotic entry, thereby ensuring G1 cyclin repression.

      In addition, the LUTI-deficient SWI4 mutant does not cause any noticeable relief in CLN2 repression, arguing against the relevance of this mechanism in the repression of G1-cyclin transcription during entry into meiosis. The authors propose a second mechanism where Whi5 would maintain SBF inactive under sporulation conditions. The role of Whi5 as a negative regulator of the SBF regulon is well known. On the other hand, the double WHI5-AA SWI4-dLUTI mutant does not upregulate CLN2, the G1 cyclin with the strongest negative effect on sporulation, raising serious doubts on the functional relevance of this backup mechanism during entry into meiosis.

      Due to replicate variance, CLN2 did not make the cut by our mRNA-seq data analysis as a significant hit. To address reviewer 1’s final point we opted for the “gold standard” of reverse transcription coupled with qPCR to measure CLN2 transcript levels in the double mutant ∆LUTI; WHI5-AA and the wild-type control. This revealed that CLN2 levels were significantly increased in the double mutant compared to wild type at 2 h in SPO (Author Response Image 1, *, p = 0.0288, two-tailed t-test).

      Author response image 1.

      Wild type (UB22199) and ∆LUTI;WHI5-AA (UB25428) cells were collected to perform RT-qPCR for CLN2 transcript abundance. Transcript abundance was quantified using primer sets specific for each respective gene from three technical replicates for each biological replicate. Quantification was performed in reference to PFY1 and then normalized to wild-type control. FC = fold change. Experiments were performed twice using biological replicates; mean value plotted with range. Differences in wild type versus ∆LUTI;WHI5-AA transcript levels were compared with a two-tailed t-test (*, p = 0.0288).
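
      As an illustration of what "quantified in reference to PFY1 and normalized to wild-type control" can mean in practice, the following is a minimal sketch of the widely used 2^-ΔΔCt calculation. The response does not state which quantification method was used, so the method itself, the assumption of 100% primer efficiency, the function name, and all Ct values are hypothetical.

```python
from statistics import mean


def fold_change(target_ct_sample, ref_ct_sample, target_ct_control, ref_ct_control):
    """Relative transcript abundance by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values. Assumes 100%
    amplification efficiency for both primer sets (an assumption, not a
    detail taken from the response).
    """
    # dCt: target gene relative to the reference gene, within each strain.
    d_ct_sample = mean(target_ct_sample) - mean(ref_ct_sample)
    d_ct_control = mean(target_ct_control) - mean(ref_ct_control)
    # ddCt: mutant relative to the wild-type control.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target gene (CLN2) and reference gene (PFY1).
fc = fold_change(
    target_ct_sample=[24.0, 24.0, 24.0],   # mutant, CLN2
    ref_ct_sample=[20.0, 20.0, 20.0],      # mutant, PFY1
    target_ct_control=[25.0, 25.0, 25.0],  # wild type, CLN2
    ref_ct_control=[20.0, 20.0, 20.0],     # wild type, PFY1
)
```

      With these made-up numbers the mutant target crosses threshold one cycle earlier than in wild type after reference correction, which corresponds to a two-fold increase; by construction the wild-type control has a fold change of 1.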

      Reviewer #2 (Public Review):

      Summary:

      The manuscript highlights a mechanistic insight into meiotic initiation in budding yeast. In this study, the authors addressed a genetic link between the mitotic cell cycle regulator SBF (the Swi4-Swi6 complex) and the meiosis-inducing regulator Ime1 in the context of meiotic initiation. The authors' comprehensive analyses with cytology, imaging, and RNA-seq using mutant strains lead the authors to conclude that Swi4 levels regulate Ime1-Ume6 interaction to activate expression of early meiosis genes for meiotic initiation. The major findings in this paper are that (1) the higher level of Swi4, a subunit of the SBF transcription factor for mitotic cell cycle regulation, is the limiting factor for the mitosis-to-meiosis transition; (2) G1 cyclins (Cln1, Cln2), which are expressed under SBF, inhibit Ime1-Ume6 interaction under overexpression of SWI4, which consequently leads to downregulation of early meiosis genes; (3) expression of SWI4 is regulated by LUTI-based transcription in the SWI4 locus that impedes expression of canonical SWI4 transcripts; (4) expression of SWI4 LUTI is likely negatively regulated by Ime1; (5) the action of Swi4 is negatively regulated by Whi5 (homologous to Rb)-mediated inhibition of SBF, which is required for meiotic initiation. Thus, the authors proposed that meiotic initiation is regulated by the balance between the mitotic cell cycle regulator SBF and the meiosis-specific transcription factor Ime1.

      Strengths:

      The most significant implication of their paper is that meiotic initiation is regulated by the balance between a mitotic cell cycle regulator and a meiosis-specific transcription factor. This finding will provide mechanistic insight into the initiation of meiosis not only in budding yeast but also in mammals. The manuscript is overall well written, logically presented, and raises several insights into meiotic initiation in budding yeast. Therefore, the manuscript should be open for the field. I would like to raise the following concerns, though they are not mandatory to address. However, it would strengthen their claims if the authors could technically address them and revise the manuscript with a more comprehensive discussion.

      Weaknesses:

      The authors showed increased expression of the SBF targets and a reciprocal decrease in expression of meiotic genes upon SWI4 overexpression at 2 h in SPO (Figure 2F). However, IME1 was not found as a DEG in Supplemental Table 1. Meanwhile, IME1 transcript level was decreased at 2 h in SPO in pATG8-CLN2 cells in Fig S4C.

      This reviewer is still unsure whether expression of IME1 transcripts per se is directly or indirectly suppressed under the SBF-activated gene expression program at 2 h in SPO in pATG8-SWI4 and pATG8-CLN2 cells. This reviewer wonders how the Fig S4C data can be reconciled with the model summarized in Fig 6F.

      One interpretation could be that persistent overexpression of G1 cyclin kept the mitotic cell cycle active and consequently delayed exit from the mitotic cell cycle, which may have given rise to an apparent reduction in the cell population that was expressing IME1. To help readers understand, it would be better to explain this issue comprehensively in the main text.

      We believe there was an oversight here. In supplemental table 1, IME1 expression is reported as significantly decreased. The volcano plot shown below also highlights this change (Author response image 2).

      Author response image 2.

      Volcano plot of DESeq2 analysis for ∆LUTI;WHI5-AA versus wild type. Dashed line indicates padj (adjusted p value) = 0.05. Analysis was performed using mRNA-seq from two biological replicates. Wild type (UB22199) and ∆LUTI;WHI5-AA (UB25428) cells were collected at 2 h in SPO. SBF targets (pink) are as defined by Iyer et al. (2001) and early meiotic genes (blue) as defined by Brar et al. (2012). Darker pink or darker blue labeled dots are well-studied targets in either gene set.

      The % of cells with nuclear Ime1 was much lower in pATG8-CLN2 cells (Fig 2B) than in pATG8-SWI4 cells (Fig 4C). Is the Ime1 protein level comparable or different between the pATG8-CLN2 strain and the pATG8-SWI4 strain? Since it is difficult to compare the quantifications of Ime1 levels in Fig S1D and Fig S4B, it would be better to show the Ime1 protein levels in pATG8-CLN2 and pATG8-SWI4 strains side by side.

      Further, it is uncertain how closely pATG8-CLN2 cells mimic the phenotype of pATG8-SWI4 cells in terms of meiotic entry. It would be nice if the authors could show RNA-seq of pATG8-CLN2/WT and/or quantification of the % of cells that enter meiosis in pATG8-CLN2.

      Analyzing bulk Ime1 protein levels across a population of cells (Author response image 3) reveals that overexpression of CLN2 causes a more severe decrease in Ime1 levels than overexpression of SWI4. This is consistent with our observation that pATG8-CLN2 has a more severe impact on meiotic entry than pATG8-SWI4. The higher CLN2 levels (Author response image 4) likely account for the observed difference in severity of phenotype between the two mutants.

      Author response image 3.

      Samples from wild type (UB22199), pATG8-SWI4 (UB2226), and pATG8-CLN2 (UB25959) strains were collected between 0-4 hours (h) in sporulation medium (SPO) and immunoblots were performed using α-GFP. Hxk2 was used as a loading control.

      Author response image 4.

      Wild type (UB22199), pATG8-SWI4 (UB2226), and pATG8-CLN2 (UB25959) cells were collected to perform RT-qPCR for CLN2 transcript abundance. Quantification was performed in reference to PFY1 and then normalized to wild-type control. FC = fold change.

      The authors stated that reduced Ime1-Ume6 interaction is a primary cause of meiotic entry defect by CLN2 overexpression (Line 320-322, Fig 4J-L). This data is convincing. However, the authors also showed that GFP-Ime1 protein level was decreased compared to WT in pATG8-CLN2 cells by WB (Fig S4A).

      Compared to wild type, pATG8-CLN2 cells have lower levels of Ime1. Consequently, reviewer 2 suggests that this reduction may be responsible for the observed meiotic defect. However, we tested this possibility and found it not to be the primary cause of the meiotic defect in pATG8-CLN2 cells. As shown in Figure S4A, when IME1 was overexpressed from the pCUP1 promoter, Ime1 protein levels were similar between wild-type and pATG8-CLN2 cells. Despite this similarity, we still observed a decrease in nuclear Ime1 (Figure 4F) and no rescue in sporulation (Figure 4A). Therefore, the reduction in Ime1 protein levels alone cannot explain the meiotic defect caused by CLN2 overexpression.

      Further, GFP-Ime1 signals were overall undetectable in both nuclei and cytosol in pATG8-CLN2 cells (Fig 4B), and accordingly cells with nuclear Ime1 were reduced (Fig 4C). Although the authors raised the possibility that the meiotic entry defect in the pATG8-CLN2 mutant arises from downregulation of IME1 expression (Line 282-283), the causal relationship between the meiotic entry defect and CLN2 overexpression is still not clear.

      As reviewer 2 comments, we initially considered the possibility that the meiotic entry defect induced by CLN2 overexpression could be attributed to decreased IME1 expression. However, in the following paragraph in the manuscript, we demonstrate that equalizing IME1 transcript levels using the pCUP1-IME1 allele does not rescue the meiotic defect caused by CLN2 overexpression. Consequently, we conclude that the decrease in IME1 transcript levels alone cannot explain the meiotic defect caused by increased CLN2 levels.

      Is the Ime1 protein level reduced in the pATG8-CLN2;UME6-⍺GFP strain compared to WT? It would be better to comparably show the Ime1 protein levels in the pATG8-CLN2 strain and the pATG8-CLN2;UME6-⍺GFP strain by WB. Also, it would be nice if the authors could show quantification of the % of cells that enter meiosis in the pATG8-CLN2;UME6-⍺GFP strain to see how and whether artificial tethering of Ime1 to Ume6 rescued normal meiosis program rather than simply showing % sporulation in Fig4A.

      We do not agree with the suggestion to compare pATG8-CLN2;UME6-⍺GFP with wild type, as the kinetics of meiosis are rather different. The more appropriate comparison is between UME6-⍺GFP and pATG8-CLN2;UME6-⍺GFP, which shows that GFP-Ime1 bulk protein levels are slightly lower (Author response image 5). However, when we use a more sensitive measurement of meiotic entry through the nuclear accumulation of Ime1 in single cells, as illustrated in Figure 4L, it becomes evident that the Ume6-Ime1 tether is capable of restoring nuclear Ime1 levels, even in the presence of CLN2 overexpression. Given that these cells exhibited wild-type levels of nuclear Ime1 and underwent sporulation after 24 hours, we make the fair assumption that they have successfully initiated the meiotic program.

      Author response image 5.

      Wild type (UB22199), pATG8-SWI4 (UB35106), UME6-⍺GFP (UB35300), and UME6-⍺GFP; pATG8-CLN2 (UB35177) cells were collected between 0-3 hours (h) in sporulation medium (SPO) and immunoblots were performed using α-GFP. Hxk2 was used as a loading control.

      The authors showed Ume6 binding at the SWI4LUTI promoter (Figure 5K). However, since Ume6 forms a repressive complex with Rpd3 and Sin3a and binds to target genes independently of Ime1, Ume6 binding at the SWI4LUTI promoter does not necessarily represent Ime1-Ume6 binding there. Instead, it would be better to show Ime1 ChIP-seq at the SWI4LUTI promoter.

      We agree with reviewer 2 that Ime1 ChIP would be the ideal measurement. Unfortunately, this has proved to be technically challenging. To address this limitation, we utilized a published Ume6 ChIP-seq dataset along with a published UME6-T99N RNA-seq dataset. Cells carrying the UME6-T99N allele are unable to induce the expression of early meiotic transcripts due to lack of Ime1 binding to Ume6 (Bowdish et al., 1995). Accordingly, RNA-seq analysis should reveal whether or not the LUTIs identified by Ume6 ChIP are indeed regulated by Ime1-Ume6 during meiosis. For SWI4LUTI, this is exactly what we observe. Not only is there Ume6 binding at the SWI4LUTI promoter (Figure 5K), but there is also a significant decrease in SWI4LUTI expression in UME6-T99N cells under meiotic conditions (Figure S5). Based on these data, we conclude that the Ime1-Ume6 complex is responsible for regulating SWI4LUTI expression during meiosis.

      The authors showed that the ∆LUTI mutant and the WHI5-AA mutant did not significantly change the expression of SBF targets or early meiotic genes relative to wild type (Figure 6A, C). Accordingly, they concluded that LUTI- or Whi5-based repression of SBF alone was not sufficient to cause a delay in meiotic entry (Line 451-452), and that perturbation of both pathways led to a significant delay in meiotic entry (Figure 6E). This reviewer wonders whether Ime1 expression level and nuclear localization of Ime1 were normal in the ∆LUTI mutant and the WHI5-AA mutant.

      Based on our observations in Figure 4, Ime1 protein and expression levels were not reliable indicators of meiotic entry. Consequently, we opted for a more downstream and functionally relevant measure of meiotic entry, which involved time-lapse fluorescence imaging of Rec8, an Ime1 target.

      Reviewer #1 (Recommendations For The Authors):

      The authors would like to mention previous work showing that G1-cyclin overexpression decreases the expression and nuclear accumulation of Ime1 (Colomina et al 1999 EMBO J 18:320). In this work, the interaction between Ime1 and Ume6 had been found to be resistant to G1-cyclin expression, arguing against a direct effect on the recruitment of Ime1 at meiotic promoters. Alternatively, differences in the experimental approaches used could be discussed to explain this apparent discrepancy.

      To clarify, in the paper that reviewer 1 is referring to (Colomina et al., 1999), the authors determine that the interaction between Ime1 and Ume6 is regulated by the presence of a non-fermentable carbon source. Additional work by others reveals that Ime1 undergoes phosphorylation by the protein kinases Rim11 and Rim15, promoting its nuclear localization and enabling interaction with Ume6 (Vidan and Mitchell, 1997; Pnueli et al., 2004; Malathi et al., 1999, 1997). Furthermore, both Rim11 and Rim15 kinase activities are inhibited by the presence of glucose via the PKA pathway (Pedruzzi et al., 2003; Rubin-Bejerano et al., 2004; Vidan and Mitchell, 1997). Accordingly, the elimination of cyclins in the presence of a fermentable carbon source (glucose) in (Colomina et al., 1999) is unlikely to result in an interaction between Ime1 and Ume6, as Rim11 and Rim15 remain repressed. Removal of cyclins in acetate does not further increase Ime1-Ume6 interaction, leading the authors to conclude that G1 cyclins do not block Ime1 function through its interaction with Ume6. This work, however, uses loss of function (removal of G1 cyclins) to study the G1 cyclins’ effect on Ime1-Ume6 interaction while using timepoints that are well beyond meiotic entry. Additionally, Ime1-Ume6 interaction is tested using yeast two-hybrid analysis with just the proposed interaction domain of Ime1 (amino acids 270-360). Therefore, the interpretation that G1 cyclins are dispensable for regulating the interaction between Ime1 and Ume6 is unclear from this work alone.

      There are many differences that can explain the discrepancy between our work and (Colomina et al., 1999). Our work uses increased expression of cyclins during meiotic entry. Additionally, in our study, we collected timepoints to measure meiotic entry (2 h in SPO) and sporulation (gamete formation) efficiency (24 h in SPO). Finally, we are using the endogenous, full length Ime1. These differences could very well explain the discrepancy with previous work. Lastly, in our discussion we acknowledge the lack of CDK consensus phosphorylation sites on Ime1. Therefore, it is most likely that G1 cyclins are not directly phosphorylating Ime1 and that other factors like Rim11 and Rim15 could be direct targets of the G1 cyclins, considering their involvement in the phosphorylation of Ime1-Ume6, as well as their role in regulating Ime1 localization and its interaction with Ume6. We have included these points in the revised manuscript (lines 547-551).

      Reviewer #2 (Recommendations For The Authors):

      This reviewer thinks that the findings in this paper are of general interest to the meiosis field and help in understanding the mechanism of meiotic initiation in mammals. The current manuscript seems to be written for a limited audience of budding yeast scientists, and its interest should not be limited to that audience. Thus, it would be better to discuss more about what is known about the mechanism of initiation of meiosis not only in budding yeast but also in other species, so as to share these findings with a broader range of scientists working on other organisms.

      We appreciate reviewer 2’s comment and have added more discussion about the parallels between yeast and mammalian systems in meiotic initiation (lines 613-624).

      Reviewer #3 (Recommendations For The Authors):

      The effect of overexpression of Swi4 is tested for MI and MII (Fig1F): this is a very indirect readout of meiotic entry. The authors could present Rec8 localization (Fig2I) at this stage. However, this is still a superficial description of the meiotic phenotype: is the phenotype only a delay or is the meiotic prophase altered? It is specifically important to analyse this in more detail to answer whether the overexpression of Swi4 leads to a phenotype identical to that of CLN2. Also, the comparison between overexpression of Swi4 and Cln2 is difficult to evaluate: what is the level of CLN2 when Swi4 is overexpressed compared to CLN2 overexpression? The percentage of nuclear Ime1 is 50% vs 5% when Swi4 or Cln2 are overexpressed. What is the interpretation? What are the levels of Ime1? (Y axis of quantifications not comparable, see also comment for Fig5F,H)

      CLN2 is expressed at a much higher level in pATG8-CLN2 cells relative to pATG8-SWI4 (Author Response Image 4). Therefore, we don’t expect identical phenotypes, but rather a more severe deficiency in meiotic entry upon CLN2 overexpression. The key experiment that establishes causality between SWI4 and CLNs is reported in Figure 3, where deletion of either CLN1 or CLN2 rescues the meiotic entry delay exerted by SWI4 overexpression.

      Fig3EF: What is the phenotype of Cln1 and Cln2 without overexpression of Swi4?

      Meiotic entry is not faster in cln1∆ or cln2∆ cells compared to wild-type. We included these data in Supplemental Figure 3 and made the relevant changes in the manuscript (lines 257-261).

      Fig4F: Need a control with CLN2 overexpression only.

      A control with only CLN2 overexpression (pATG8-CLN2) is not appropriate since these meiotic time course experiments are synchronized using the pCUP1-IME1 allele. It would be a misleading comparison since the two meioses would have different kinetics. Figure 4F reports that despite similar IME1 transcript levels and Ime1 protein levels, CLN2 overexpressing cells still have reduced nuclear Ime1. Since side-by-side comparison of pATG8-CLN2 and pCUP1-IME1 is not possible, we chose to measure sporulation efficiency at 24 h in Figure 4A. These data together suggest that elevated IME1 transcript and protein levels cannot rescue the defects associated with increased CLN2 expression.

      Fig5E: in wild type, by Northern blot, the Swi4canon level is increasing during meiosis, not decreasing, whereas the protein level is decreasing. What is the interpretation?

      Northern blot data are less quantitative than smFISH data, which show that SWI4canon transcript levels are significantly lower in meiosis compared to vegetative cells (Figure 5D). We also note that the Northern blot data were acquired from unsynchronized meiotic cells and could have additional limitations based on the population-based nature of the assay. Finally, additional analysis of a transcript leader sequencing (TL-seq) dataset from synchronized cells (Tresenrider et al., 2021) further confirms the decrease in SWI4canon transcript levels upon meiotic entry (Author response image 6).

      Author response image 6.

      TL-seq data from (Tresenrider et al. 2021) visualized on IGV at the SWI4 locus. Two timepoints are plotted including premeiotic before IME1 induction (pink) and meiotic prophase or after IME1 induction (blue).

      Fig5F, H. This quantification needs duplicates for validation.

      Replicates for every blot in this paper have been submitted to eLife. They can be found in the Dropbox folder shared with the editors (named Raw-blots-for-eLIFE).

      Fig5F, H. Why are the wild type values so different?

      The immunoblots in Figure 5F and Figure 5H are separate blots and therefore should not be compared. Additionally, these values are not absolute measurements of wild-type levels of Swi4-3V5 and therefore we should not expect them to be the same. Any comparisons of relative amounts of Swi4-3V5 are always made on the same blot and normalized to a loading control, hexokinase.

      FigS5: What is the effect of the Ume6-T99N on Swi4 protein level and on meiotic entry? Is the backup mechanism proposed active?

      We haven’t measured Swi4 protein levels in the UME6-T99N background but given that this mutation is known to disrupt the interaction between Ime1 and Ume6, we expect a similar trend to that reported in Figure 5I (pCUP1-IME1 uninduced).

      What is the evidence that Swi4/6 is a E2F homolog? What is the homology at the protein level?

      While there is no sequence homology between SBF and E2F, there is remarkable similarity between metazoans and yeast in terms of the regulation of the G1/S transition (reviewed in Bertoli et al., 2013). E2F and SBF are both repressed before the G1/S transition by the inhibitors Rb and Whi5, respectively (Costanzo et al., 2004; De Bruin et al., 2004; Hasan et al., 2014). During the G1/S transition, a cyclin-dependent kinase phosphorylates and inactivates these inhibitors. We have carefully edited our language in the manuscript to “functional homology” instead of just “homology”.

      FigS3 is missing

      Each supplemental figure was matched to its corresponding main figure. In the original submission, we didn’t have Figure S3. However, the revised manuscript now contains FigS3.

      Bertoli, C., J.M. Skotheim, and R.A.M. De Bruin. 2013. Control of cell cycle transcription during G1 and S phases. Nat. Rev. Mol. Cell Biol. 14:518–528. doi:10.1038/nrm3629.

      Bowdish, K.S., H.E. Yuan, and A.P. Mitchell. 1995. Positive control of yeast meiotic genes by the negative regulator UME6. Mol. Cell. Biol. 15:2955–2961. doi:10.1128/mcb.15.6.2955.

      Brar, G.A., M. Yassour, N. Friedman, A. Regev, N.T. Ingolia, and J.S. Weissman. 2012. High-Resolution View of the Yeast Meiotic Program Revealed by Ribosome Profiling. Science. 335:552–558. doi:10.1126/science.1215110.

      De Bruin, R.A.M., W.H. McDonald, T.I. Kalashnikova, J. Yates, and C. Wittenberg. 2004. Cln3 activates G1-specific transcription via phosphorylation of the SBF bound repressor Whi5. Cell. 117:887–898. doi:10.1016/j.cell.2004.05.025.

      Chen, J., A. Tresenrider, M. Chia, D.T. McSwiggen, G. Spedale, V. Jorgensen, H. Liao, F.J. Van Werven, and E. Ünal. 2017. Kinetochore inactivation by expression of a repressive mRNA. Elife. 6:1–31. doi:10.7554/eLife.27417.

      Chia, M., A. Tresenrider, J. Chen, G. Spedale, V. Jorgensen, E. Ünal, and F.J. van Werven. 2017. Transcription of a 5’ extended mRNA isoform directs dynamic chromatin changes and interference of a downstream promoter. Elife. 6:1–23. doi:10.7554/eLife.27420.

      Colomina, N., E. Garí, C. Gallego, E. Herrero, and M. Aldea. 1999. G1 cyclins block the Ime1 pathway to make mitosis and meiosis incompatible in budding yeast. EMBO J. 18:320–329. doi:10.1093/emboj/18.2.320.

      Costanzo, M., J.L. Nishikawa, X. Tang, J.S. Millman, O. Schub, K. Breitkreuz, D. Dewar, I. Rupes, B. Andrews, and M. Tyers. 2004. CDK activity antagonizes Whi5, an inhibitor of G1/S transcription in yeast. Cell. 117:899–913. doi:10.1016/j.cell.2004.05.024.

      Hasan, M., S. Brocca, E. Sacco, M. Spinelli, P. Elena, L. Matteo, A. Lilia, and M. Vanoni. 2014. A comparative study of Whi5 and retinoblastoma proteins: from sequence and structure analysis to intracellular networks. Front. Physiol. 4:1–24. doi:10.3389/fphys.2013.00315.

      Iyer, V.R., C.E. Horak, P.O. Brown, D. Botstein, V.R. Iyer, M. Snyder, and C.S. Scafe. 2001. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature. 409:533–538. doi:10.1038/35054095.

      Malathi, K., Y. Xiao, and A.P. Mitchell. 1997. Interaction of yeast repressor-activator protein Ume6p with glycogen synthase kinase 3 homolog Rim11p. Mol. Cell. Biol. 17:7230–7236. doi:10.1128/mcb.17.12.7230.

      Malathi, K., Y. Xiao, and A.P. Mitchell. 1999. Catalytic roles of yeast GSK3β/shaggy homolog Rim11p in meiotic activation. Genetics. 153:1145–1152. doi:10.1093/genetics/153.3.1145.

      Pedruzzi, I., F. Dubouloz, E. Cameroni, V. Wanke, J. Roosen, J. Winderickx, and C. De Virgilio. 2003. TOR and PKA Signaling Pathways Converge on the Protein Kinase Rim15 to Control Entry into G0. Mol. Cell. 12:1607–1613. doi:10.1016/S1097-2765(03)00485-4.

      Pnueli, L., I. Edry, M. Cohen, and Y. Kassir. 2004. Glucose and Nitrogen Regulate the Switch from Histone Deacetylation to Acetylation for Expression of Early Meiosis-Specific Genes in Budding Yeast. Mol. Cell. Biol. 24:5197–5208. doi:10.1128/mcb.24.12.5197-5208.2004.

      Rubin-Bejerano, I., S. Sagee, O. Friedman, L. Pnueli, and Y. Kassir. 2004. The In Vivo Activity of Ime1, the Key Transcriptional Activator of Meiosis-Specific Genes in Saccharomyces cerevisiae, Is Inhibited by the Cyclic AMP/Protein Kinase A Signal Pathway through the Glycogen Synthase Kinase 3-β Homolog Rim11. Mol. Cell. Biol. 24:6967–6979. doi:10.1128/mcb.24.16.6967-6979.2004.

      Tresenrider, A., K. Morse, V. Jorgensen, M. Chia, H. Liao, F.J. van Werven, and E. Ünal. 2021. Integrated genomic analysis reveals key features of long undecoded transcript isoform-based gene repression. Mol. Cell. 81:2231-2245.e11. doi:10.1016/j.molcel.2021.03.013.

      Vidan, S., and A.P. Mitchell. 1997. Stimulation of yeast meiotic gene expression by the glucose-repressible protein kinase Rim15p. Mol. Cell. Biol. 17:2688–2697. doi:10.1128/mcb.17.5.2688.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study by Wang et al. identifies a new type of deacetylase, CobQ, in Aeromonas hydrophila. Notably, the identification of this deacetylase reveals a lack of homology with eukaryotic counterparts, thus underscoring its unique evolutionary trajectory within the bacterial domain.

      Strengths:

      The manuscript convincingly illustrates CobQ's deacetylase activity through robust in vitro experiments, establishing its distinctiveness from known prokaryotic deacetylases. Additionally, the authors elucidate CobQ's potential cooperation with other deacetylases in vivo to regulate bacterial cellular processes. Furthermore, the study highlights CobQ's significance in the regulation of acetylation within prokaryotic cells.

      Weaknesses:

      While the manuscript is generally well-structured, some clarification and some minor corrections are needed.

      Reviewer #2 (Public Review):

      In recent years, many researchers have sought new acetyltransferases and deacetylases using specific antibody enrichment technologies and high-resolution mass spectrometry. This study adds to this effort. The authors studied a novel Zn2+- and NAD+-independent KDAC protein, AhCobQ, in Aeromonas hydrophila. They examined the biological function of AhCobQ using biochemical methods and confirmed it by MS identification. The results extend our understanding of the regulatory mechanism of bacterial lysine acetylation modifications. However, I find their conclusion to be a little speculative, and unfortunately the data do not fully support it. In addition, regarding the figure arrangement, many of the supplementary figures are not mentioned, and not all tables are placed in context.

      Major concerns:

      - In the opinion of this reviewer, it is a little arbitrary to arrive at the title "Aeromonas hydrophila CobQ is a new type of NAD+- and Zn2+-independent protein lysine deacetylase in prokaryotes." The title should be modified to delete "in prokaryotes", unless the authors obtain new or additional evidence for the existence of AhCobQ in other prokaryotes.

      Thank you for your suggestion. The phrase "in prokaryotes" has been deleted from the title in the revised manuscript.

      - I was confused about the arrangement of the supplementary results. There are no citations for Figures S9-S19.

      Thank you very much for your suggestion. We have made revisions and highlighted them in the updated manuscript.

      - No data are included for Tables S1-S6.

      Dear reviewer, we apologize for the confusion. We have included the Supplementary Tables in the updated manuscript.

      - The load control is not all integrated. All of the load controls with whole PAGE gel or whole membrane western blot results should be provided. Without these whole results, it is not convincing to come to the conclusion that the authors have.

      Dear reviewer, thanks for your suggestion. We have meticulously incorporated the complete PVDF membranes from our Western blot experiments into Supplementary Material 1. Furthermore, we have included the Coomassie Blue R-350 staining outcomes of these PVDF membranes, post-Western blot detection, as a loading control in accordance with the protocol outlined in the reference by Charlotte et al. (Journal of Proteome Research, 2011, 10:1416–1419).

      - The materials & methods section should be thoroughly reviewed. It is unclear to me what exactly the authors are describing in the method. All the experimental designs and protocols should be described in detail, including growth conditions, assay conditions, purification conditions, etc.

      Dear reviewer, thanks for your valuable comments. We have carefully reviewed the entire manuscript and made revisions, highlighted in red.

      - Relevant information should be included about the experiments performed in the figure legends, such as experimental conditions, replicates, etc. Often it is not clear what was done based on the figure legend description.

      Thank you very much for your suggestion. We have made revisions and highlighted them in red.

      Reviewer #3 (Public Review):

      Summary:

      This study reports on a novel NAD+ and Zn2+-independent protein lysine deacetylase (KDAC) in Aeromonas hydrophila, termed AhCobQ (AHA_1389). This protein is annotated as a CobQ/CobB/MinD/ParA family protein and does not show similarity with known NAD+-dependent or Zn2+-dependent KDACs. The authors show that AhCobQ has NAD+ and Zn2+-independent deacetylase activity with acetylated BSA by western blot and MS analyses. They also provide evidence that the 195-245 aa region of AhCobQ is responsible for the deacetylase activity, which is conserved in some marine prokaryotes and has no similarity with eukaryotic proteins. They identified target proteins of AhCobQ deacetylase by proteomic analysis and verified the deacetylase activity using site-specific acetyllysine-incorporated target proteins. Finally, they show that AhCobQ activates isocitrate dehydrogenase by deacetylation at K388.

      Strengths:

      The finding of a new type of KDAC has a valuable impact on the field of protein acetylation. The characteristics shown in this study (NAD+- and Zn2+-independent deacetylase activity in an unknown domain) are very unexpected.

      Weaknesses:

      (1) As the characteristics of AhCobQ are very unexpected, MSMS data would be needed to convince readers by detecting deacetylation at the exact target site in the deacetylase activity assays. The authors show MSMS data for the assays with acetylated BSA, but the other assays rely only on western blot.

      (2) They prepared site-specific Kac proteins and used them in deacetylase activity assays. The incorporation of acetyllysine at the target site needs to be confirmed by MSMS and shown as supplementary data.

      (3) The authors imply that the 195-245 aa region of AhCobQ may represent a new domain responsible for deacetylase activity. The feature of the region would be of interest but is not sufficiently described in Figure 5. The amino acid sequence alignments with representative proteins with conserved residues would be informative. It would be also informative if the modeled structure predicted by AlphaFold is shown and the structural similarity with known deacetylases is discussed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The protein molecules of AhCobB and AhCobQ are greater than 45 kDa. But the gene sequences don't seem to match. Please explain.

      We apologize for the confusion. The vector used to purify CobB and CobQ in the manuscript is pET-32a, which carries the TrxA fusion protein, approximately 20 kDa in size. Therefore, the final molecular weights of recombinant AhCobB and AhCobQ are 48.3 kDa (28.3 + ~20 kDa) and 49.8 kDa (29.8 + ~20 kDa), respectively.
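
      The arithmetic behind this answer can be written out as a minimal sketch. It assumes only the figures quoted in the response (a tag contribution of roughly 20 kDa and native masses of 28.3 and 29.8 kDa); the function and variable names are hypothetical and are not taken from the manuscript.

```python
# Approximate mass contributed by the TrxA-containing fusion region of pET-32a,
# as quoted in the response (about 20 kDa). This is an estimate, not a measured value.
TAG_KDA = 20.0


def fusion_mass_kda(native_kda: float, tag_kda: float = TAG_KDA) -> float:
    """Return the expected mass (kDa) of a tagged recombinant protein."""
    return round(native_kda + tag_kda, 1)


# Native masses quoted in the response for the untagged proteins.
native_masses = {"AhCobB": 28.3, "AhCobQ": 29.8}

for name, mass in native_masses.items():
    # Each line shows native mass + tag = expected band position on the gel.
    print(f"{name}: {mass} + ~{TAG_KDA:g} = ~{fusion_mass_kda(mass)} kDa")
```

      The same helper applies to the truncated fragment discussed for Figure S8, where a native mass of 5.5 kDa gives an expected band near 25.5 kDa.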

      (2) Figure 7: The gels look very smeary. Please explain.

      Dear reviewer, in our study we constructed recombinant site-specific Kac proteins using a two-plasmid system based on the work published in Nature Chemical Biology (2017, 13(12): 1253-1260), which introduced the genetic encoding of Nᵋ-acetyllysine into recombinant proteins. However, we encountered a common problem: protein truncation due to premature translation termination at the reassigned codon. This not only lowers protein yields, as highlighted in ChemBioChem (2017, 18(20): 1973-1983), but also produces a high background in the Western blot (WB) results of many recombinant proteins.

      Despite our rigorous approach, involving at least two independent repetitions for WB analysis of site-specific Kac proteins, yielding consistent results, we acknowledge that the overall quality of these WB assays remains suboptimal. This variability is inherently tied to the intrinsic properties of the target proteins themselves. Illustratively, the WB outcomes for proteins such as ENO and ICD exhibit notable differences in quality across biological replicates, emphasizing the complexity and nuances involved in this process.

      Thus, while our methodology remains robust and reproducible, we are mindful of the limitations imposed by the nature of the proteins under investigation and strive to continually refine our approaches to mitigate these challenges.

      (3) To ensure that the phenotype shown in Figure 1 is not due to polar effects, results of supplementing complementary strains should be provided.

      Thank you for your suggestion. We have constructed a complement strain and tested the bacterial migration ability. As shown in Figure S1, the complement strain does not affect the physiological phenotype mentioned above.

      (4) The caption to Figure 8 includes * and *** to indicate significance levels, but only *** appears in the picture.

      Thank you for your suggestion. It has been modified and highlighted in red.

      (5) Has the mechanistic role of lysine 388 in ICD been characterized?

      Thank you for your invaluable professional insights. Indeed, the acetylation sites of ICD have been established to exert a significant influence on its enzymatic activity. Sumana Venkat et al., in their seminal work published in the Journal of Molecular Biology (2018, 430(13): 1901-1911), convincingly demonstrated that the acetylation of specific lysine residues—K100, K230, K55, and K350—in ICD proteins from E. coli serves as a negative regulatory mechanism for enzyme activity. Intriguingly, the functional implications of the Kac modification on K387 (corresponding to the K388 site in ICD from A. hydrophila ATCC 7966, as featured in this manuscript) remain an uncharted territory.

      Our experimental endeavors have illuminated that the K388 site of ICD in A. hydrophila holds the potential to modulate enzymatic activity and is under the regulatory influence of AhCobQ.

      (6) The format of the references is not uniform enough, for example, some journal names are abbreviated, and some are not, please check and correct.

      Thank you for your suggestion. It has been modified and highlighted in red.

      (7) Page 23, line 13, gene not expressed in italics, please correct.

      Thank you for your suggestion. It has been modified and highlighted in red.

      (8) Figure S8 does not appear to match the gene size.

      We apologize for the confusion. The vector used to purify the recombinant protein in the manuscript is pET-32a, which carries the TrxA fusion protein, approximately 20 kDa in size. Therefore, the final molecular weight of the recombinant protein is 25.5 kDa (5.5 + ~20 kDa).

      (9) The format of the two figures in Figure S10 is not uniform.

      Thank you for your suggestion. It has been modified and highlighted in red.

      Reviewer #2 (Recommendations For The Authors):

      Minor concerns:

      L147, L177 - Please arrange the results as they are shown in the content sequentially. For example, rename Figure S2 with Figure S1.

      Thank you for your suggestion. It has been modified and highlighted in red.

      L174 Figure 2D - Figure 2D shows no large change in acetylation between the wild type and the ahcobQ mutant, whereas the ahcobB mutant does show one.

      I am extremely grateful for your insightful comment. As clearly depicted in the right panel of Figure 2D, the overall Kac protein levels in both the ahcobQ and ahcobB knockout strains exhibit a marked elevation compared to the wild-type strain, despite equivalent loading of total cellular proteins (the left panel of Figure 2D). Notably, this increase is particularly pronounced among proteins with a molecular weight below 35 kDa. We wholeheartedly concur with your perspective that the deletion of ahcobB leads to a more substantial enhancement in Kac protein levels, suggesting CobB may play a pivotal role in regulating a broader spectrum of acetylated proteins or Kac sites. This hypothesis is further strengthened by subsequent mass spectrometry analyses, which lend additional credence to our shared understanding.

      L174-187, L795 - Please show the whole membrane (or PAGE gel) of the loading control of CobB, and CobQ, except for the Kac-BSA.

      Dear esteemed reviewer, we have thoroughly revised our submission to include the full western blot (WB) membrane for all figures and supplementary figures within the updated Supplementary Material 1. Additionally, we would like to clarify a few crucial points to ensure transparency and accuracy.

      Firstly, in Figure 2D, we present WB results solely pertaining to whole-cell samples from cobB or cobQ mutant strains. Consequently, these findings do not directly correlate with recombinant CobB or CobQ proteins.

      Secondly, the objective of Figure 2 is to validate the lysine deacetylase activity of AhCobQ protein through a qualitative, rather than quantitative, experimental approach. Hence, the crucial loading control lies in the amount of Kac-BSA, rather than CobB or CobQ. Prior to conducting the in vitro deacetylase assay, we ensured equal protein concentrations of purified CobB or CobQ using BCA assay, adhering to the protocol's specified deacetylase-to-Kac-BSA loading ratio of 1:5. However, this ratio renders the deacetylase (CobB or CobQ) undetectable on Coomassie Blue R-350-stained blots or WB membranes (as detailed in the whole WB membrane in Supplementary Material 1).

      To reinforce our observations, we repeated the analysis by running the protein samples on SDS-PAGE again, with the same loading quantity as in the western blotting experiment shown in Figure 2E. As Author response image 1 clearly illustrates, the CobB/CobQ bands are indeed discernible, although they are significantly fainter than the Kac-BSA bands. Notably, in the full stained PVDF membrane presented in Supplementary Material 1, the CobB/CobQ bands are not readily visible. This can be attributed to the potential loss of proteins during transfer from the SDS-PAGE gel to the PVDF membrane.

      Author response image 1.

      The SDS-PAGE gel displayed the loading amounts of Kac-BSA and CobB/CobQ.

      Furthermore, recognizing the potential for confusion given the similar molecular weights of CobB (257aa) and CobQ (264aa, excluding fusion tags), we conducted a comparative analysis of deacetylase activity between His-tagged and GST-fused recombinant CobQ proteins. Encouragingly, both variants exhibited deacetylase activity (as presented in Figure S5 of the revised manuscript), thereby excluding any influence from nonspecific proteins that might have contaminated the purification process.

      We hope these clarifications and additions to our submission address your concerns and enhance the overall quality of our work. Thank you for your valuable time and consideration.

      - Could you provide the raw data of these anti-acetylation western blot results?

      Thank you very much for your suggestion. The raw results have been uploaded in the supplementary materials.

      - According to the loading control, the protein quantity of BSA is very big, however, why is the acetylation of Kac-BSA relatively low? Is it consistent between the western blot and loading control?

      Thank you very much for your suggestion. First, every western blot and its loading control in the manuscript come from the same membrane; the specific method is described under "Western blot". Therefore, the western blot and the loading control cannot fail to correspond. Second, not every site of BSA carries an acetylation modification, and the extent of modification differs between sites, so a large amount of protein can give a comparatively weak acetylation signal.

      Figure 2C - Could the Dot blot experiment be described in detail in the Methods part?

      Thank you for your suggestion. It has been added and highlighted in red.

      Figure 2C&2D - Please provide the anti-acetylation antibody information.

      Thank you for your suggestion. It has been added and highlighted in red.

      Figure 2E - It is confusing why the acetylation of Kac-BSA is higher than adding NAD+ with CobB? But only CobB can deacetylate the Kac-BSA without NAD+?

      We apologize for the confusion. The information in the figure was incorrect: we inadvertently provided the uncorrected version, and we have revised it in the updated manuscript.

      Figure 2F - The control of this experiment should include the NAM, CobB, and NAM+CobB. Similar to 2E, it also should include NAD, CobB, and NAD+CobB, respectively. Same with 2H.

      We apologize for the confusion. The intent of Figure 2F is to further confirm that AhCobQ differs from AhCobB and can remove the acetylation modification of BSA without relying on NAD+, so NAD+ was added to this group of experiments. We have revised the manuscript to add details about the experiments.

      L178 Figure S1C - One question about the protein AhAcuC. From the PCR results, it is larger than ahcobB and ahcobQ, however, why is the protein AhAcuC smaller than them?

      We apologize for the confusion. The images in the original manuscript may have had some errors in protein size because different PAGE gels were used. We have re-run the gels and replaced the images; they now appear as Supplementary Figure S3 in the revised manuscript.

      - All the proteins are expressed and purified from E. coli BL21(DE3). How did you avoid contamination by deacetylases from E. coli? There is no control for this in your experiment. Without this control, it is not easy to conclude that the deacetylation comes from AhCobQ rather than from contaminants carried over during protein purification.

      In response to your inquiry, we have conducted a meticulous comparative analysis of the deacetylase activity exhibited by both His-tagged and GST-fused recombinant AhCobQ proteins. Reassuringly, our findings reveal that both variants possess robust deacetylase activity, as clearly demonstrated in Figure S5 of the revised manuscript. Furthermore, to ensure the rigor of our experiments, we employed GST protein purified from E.coli strains as a negative control in Figure S8. The Western blot (WB) results conclusively demonstrate that GST protein alone lacks deacetylase activity, thereby reinforcing the authenticity of our findings and effectively mitigating any concerns regarding potential interference from nonspecific proteins during the purification process.

      L190 - Could you provide the raw data for Table S1?

      Thank you very much for your suggestion. The raw MS data were deposited in the public ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD038735, or IPX0005366000 in the iProx database. We also uploaded the analysis results in Table S1 and Supplementary Material 2.

      - I am not an expert on MS. I have one question about the MS results. Why is there no peak for CobB or CobQ, given that they were added to the reaction system?

      Thank you for your insightful question. To clarify, the Kac peptides identified from Kac-BSA and presented in Table S1 were selected to simplify their display and interpretation. The complete raw mass spectrometry (MS) data, along with the detailed analytical results, have been deposited in the ProteomeXchange Consortium through the PRIDE partner repository under the dataset identifier PXD038735, and are alternatively accessible via the iProx database under IPX0005366000. The analysis results are also included in Table S1 and Supplementary Material 2.

      Furthermore, it is crucial to note that in this study, we utilized Bovine serum albumin (BSA) as the foundational database for our MS searches. Consequently, the absence of CobB or CobQ proteins in our MS results stems from the inherent focus on BSA and the specific experimental design, which did not encompass the detection of these particular proteins.

      We appreciate your attention to these details and hope this clarification addresses your query.

      L189-L206 - Based on the results here, the function of CobB and CobQ overlaps on the same STDKac peptides.

      Dear esteemed reviewer, our mass spectrometry (MS) analysis has revealed an intriguing finding: CobB and CobQ indeed function on the same STDKac peptide, suggesting a potential collaboration among distinct deacetylases in regulating protein function. This observation is further corroborated by our subsequent quantitative Kac proteomics results, which were obtained from three deacetylase mutants. These results underscore the possibility that CobB, CobQ, and AcuC possess both unique and overlapping protein substrates, reinforcing our hypothesis that multiple deacetylases work in concert to modulate protein activity.

      - Did you assay the Km and Kcat of CobQ using Kac-BSA as the substrate, in comparison with AhCobB?

      Dear reviewer, thanks for your professional suggestion. In accordance with your guidance, we diligently attempted to analyze the Km or Kcat values of CobQ during its incubation with the substrate Kac-BSA using LC-MS/MS, repeating the process twice. However, to our disappointment, our current experimental platform has been unable to detect any discernible metabolites. We suspect that this may stem from operational proficiency challenges, as even our positive control experiment involving CobB incubation has failed to yield satisfactory results.

      Given our uncertainty regarding the root cause of these issues, together with the suggestion from experts that the LC column, in addition to operator skill, might be a contributing factor, we have decided against repeating the experiments at this juncture. Nonetheless, we would like to assure you that we have rigorously validated the deacetylase activity of CobQ proteins through mass spectrometry, as detailed in our manuscript.

      Furthermore, I am delighted to share that our preliminary findings have sparked interest among other research teams. In fact, one such group, upon reading our preprint, has independently tested the activity of CobQ and uncovered an additional intriguing function. We are actively exploring the possibility of collaborating with this team to delve deeper into the research and, hopefully, in the future, conduct a more refined analysis of the Km and Kcat of CobQ.

      L214- Same question with Figures 2E-2H. Could you provide the whole page gel about the loading control? I want to know the quantity of the AhCobQ in this experiment except for the Kac-BSA. To tell the truth, the quantity of BSA is too much in the deacetylation reaction system to be able to tell its deacetylation activity in vitro.

      Thank you very much for your suggestion. The raw data have been uploaded in the supplementary materials, and the clarification is similar to that given above.

      L217 - There might be a wrong citation of Figure S2 here.

      Thank you for your suggestion. It has been corrected.

      L244-250, Figure 6A - Are there 47, not 46 Kac proteins?

      Thank you for your suggestion. It has been corrected.

      - Are there nineteen, not nine increased Kac peptides common between the ΔahcobQ and ΔahacuC strains?

      Thank you for your suggestion. It has been corrected.

      - Are there ten, not six increased Kac peptides common between the ΔahcobQ and ΔahcobB strains?

      Thank you for your suggestion. It has been corrected.

      - Are there 69, not 65 increased Kac peptides common between the ΔahcobB and ΔahacuC strains?

      Thank you for your suggestion. It has been corrected.
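
      Overlap counts of the kind queried in the four points above can be recomputed directly from the peptide lists rather than read off a Venn diagram. The following is a minimal sketch of that check; the peptide identifiers are hypothetical placeholders, not entries from Table S2, and the dictionary keys simply name the three deletion strains.

```python
from itertools import combinations

# Hypothetical placeholders for the Kac peptides that increased in each
# deletion strain. These identifiers are NOT taken from Table S2.
increased = {
    "ahcobQ": {"pep01", "pep02", "pep03", "pep04"},
    "ahcobB": {"pep03", "pep04", "pep05", "pep06", "pep07"},
    "ahacuC": {"pep02", "pep04", "pep07", "pep08"},
}


def pairwise_overlaps(sets):
    """Count the peptides shared by each pair of strains."""
    return {(a, b): len(sets[a] & sets[b]) for a, b in combinations(sets, 2)}


overlaps = pairwise_overlaps(increased)

# Peptides that increased in all three deletion strains.
shared_by_all = set.intersection(*increased.values())

for (a, b), count in overlaps.items():
    print(f"{a} and {b}: {count} shared peptides")
print(f"shared by all three: {len(shared_by_all)}")
```

      Note that a pairwise count computed this way includes peptides shared by all three strains, whereas the two-set region of a three-set Venn diagram excludes them. A discrepancy of the kind raised here can arise from mixing the two conventions, although the source does not say whether that was the cause.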

      - Where is the raw data for Table S2?

      Thank you very much for your suggestion. The raw data has been uploaded in the supplementary materials.

      Figure 6B - Are there 52, not 51 Kac peptides?

      Thank you for your suggestion. It has been corrected.

      L272 - Why do you choose these 11 target proteins? There is no description of this background in the context.

      We have opted to prioritize these proteins for subsequent validation, as their Kac levels exhibit a notable upregulation in the ΔahcobQ strain, potentially indicating their role as protein substrates for AhCobQ. We will incorporate this clarification into the revised manuscript to ensure clarity and comprehensiveness.

      L277 - Figure S6 - Please show the whole PAGE gel about the loading control.

      Dear esteemed reviewer, we sincerely apologize for any confusion our previous presentation may have caused. We would like to clarify that the bottom panel of Figure S6 depicts a Coomassie Blue R-350 stained whole PVDF membrane, rather than a PAGE gel, as may have been mistakenly inferred. To facilitate a comprehensive understanding, we have included the entire stained PVDF membranes in Supplementary Material 1.

      As we have previously elaborated, the recombinant His-tagged or GST-fused AhCobQ proteins were not as discernible on the PVDF membrane due to a relatively lower loading amount compared to that of Kac-BSA.

      - There might be a wrong citation in Figure S6. As you mentioned in the context, you expressed and purified 11 proteins and then tested their acetylation background.

      Thank you for your suggestion. It has been corrected.

      L280 - Figure S7 -The label of the Figure should be modified for the ATP.

      Thank you for your suggestion. It has been modified.

      - How did you do the experiment for 0h of ATP? There is no description of it in the Methods.

      Thank you for your suggestion. It has been added.

      - Please show the whole PAGE gel about the loading control.

      Thank you very much for your suggestion. The whole PAGE gel has been uploaded in the supplementary materials.

      L282 - Figure 7 - Please show the whole PAGE gel about the loading control.

      Dear esteemed reviewer, we sincerely apologize for any confusion our previous presentation may have caused. We would like to clarify that the bottom panel of Figure S6 depicts a Coomassie Blue R-350 stained whole PVDF membrane, rather than a PAGE gel, as may have been mistakenly inferred. To facilitate a comprehensive understanding, we have included the entire stained PVDF membranes in Supplementary Material 1.

      - Please adjust the font size of "A" and "B".

      Thank you for your suggestion. It has been adjusted.

      Figure 7A - The anti-acetylation Western blot here does not look good. All the western blots here should be re-done.

      Dear reviewer, the recombinant site-specific Kac proteins in this study were constructed with a two-plasmid system that genetically encodes Nᵋ-acetyllysine in recombinant proteins (Nature Chemical Biology, 2017, 13(12): 1253-1260). However, a common problem is protein truncation arising from translation termination at the reassigned codon, which lowers protein yields (ChemBioChem, 2017, 18(20): 1973-1983) and leads to a dirty background in the WB results of many recombinant proteins. Although we performed at least two independent repeats of the site-specific Kac protein WB and obtained similar results, the WB quality for site-specific Kac proteins is generally poor and depends on the properties of the target proteins. For example, the WB results for ENO and ICD differ considerably in quality between biological repeats.

      - Why did you choose the PAGE gel but not the anti-His Western blot as the loading control?

      Thank you very much for your suggestion. A tag-specific antibody is a very effective loading control. However, to ensure the accuracy of the data, both the experimental data and the loading control in this manuscript must come from the same membrane. If an anti-His antibody were used, the membrane would have to be washed repeatedly for a second round of color development. Because acetylation signals are already difficult to develop, this would greatly affect the quality of the results. Moreover, provided that protein levels are consistent, we believe that changes in acetylation can still be interpreted. Therefore, we chose the PAGE gel rather than the anti-His Western blot as the loading control.

      L278 - Where are the results of the site-specific lysine acetylation of the target protein obtained with the two-plasmid-based system of genetically encoded Nε-acetyllysine? Usually, there is a shift when the protein is fully acetylated compared with the wild-type protein.

      We apologize for the confusion. As the size of the acetyl group is only about 40.6 Da, which is on the order of a thousand times smaller than the size of the protein, the change in protein size before and after modification cannot be seen with the naked eye.
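
      The scale argument in this answer can be made concrete with a short sketch. It takes the roughly 40.6 Da figure quoted in the response at face value and, for comparison, the commonly tabulated mass shift for lysine acetylation of about 42 Da. The 48.3 kDa protein mass is borrowed from the recombinant AhCobB value quoted earlier purely for illustration; the masses of the actual target proteins are not given in the response.

```python
def relative_shift(protein_kda: float, modification_da: float) -> float:
    """Fractional mass change caused by one modification on a protein."""
    return modification_da / (protein_kda * 1000.0)


STATED_SHIFT_DA = 40.6      # value quoted in the author response
TABULATED_SHIFT_DA = 42.01  # commonly tabulated mass shift for lysine acetylation
EXAMPLE_PROTEIN_KDA = 48.3  # illustrative only: recombinant AhCobB mass quoted earlier

for label, shift in (("stated", STATED_SHIFT_DA), ("tabulated", TABULATED_SHIFT_DA)):
    fraction = relative_shift(EXAMPLE_PROTEIN_KDA, shift)
    # A change well below 1% of total mass is far smaller than the band
    # spacing that an SDS-PAGE gel can resolve by eye.
    print(f"{label}: {shift} Da is {fraction:.4%} of a {EXAMPLE_PROTEIN_KDA} kDa protein")
```

      With either value, a single acetyl group changes the mass of a protein of this size by less than 0.1%, which is consistent with the authors' point that no visible shift is expected.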

      L287 - Where is Figure 7C?

      We are sorry to confuse you. It has been corrected.

      - Here the citation might be Figure 7A but not Figure 7B.

      Thank you for your suggestion. It has been corrected.

      L290 - It is difficult to read here, please rearrange this Figure S8. There is no useful label.

      Thank you for your suggestion. It has been corrected.

      - The citation of Figure S8 is wrong.

      Thank you for your suggestion. It has been corrected.

      - For Figure S8, please add the label on the figure, and add an anti-GST western blot as well. Because GST is about 26 kDa, why are the purified recombinant truncated proteins (GST fusions) so small?

      We apologize for the inconvenience. The truncated fragment used for recombinant purification in Figure S8 is very small; when translated into protein, it is approximately 1-5 kDa. Therefore, the resulting protein is also very small.

      - Why are there two Figure S8 in the supplemental materials?

      We are sorry to confuse you. It has been corrected.

      L293 - Where is Figure 7D?

      We are sorry to confuse you. It has been corrected.

      L297-313 - Please provide the MS result of the ICDK388?

      Author response image 2.

      The mass spectrum of Kac modification on ICD protein at K388 site.

      Dear reviewer, we are pleased to present the mass spectrum of the Kac modification at the K388 site of the ICD protein in the ΔahcobQ strain in Author response image 2. It is important to clarify that, although we have not directly validated by mass spectrometry (MS) the site-specific lysine acetylation at K388 of the recombinant ICD in this study, we have strong reasons to believe in its specificity.

      Firstly, our confidence stems from the well-established and rigorously validated two-plasmid system methodology for site-directed acetylation modification. This approach has been successfully employed in modifying diverse and specific sites across various proteins, as evidenced by the pioneering work of David et al. in Nature Chemical Biology (2017, 13(12), 1253-1260).

      Secondly, we have taken meticulous measures to ensure the accuracy and reliability of our findings. This includes double-checking our PCR primers and DNA sequencing for the genetic code expansion technology employed. Furthermore, we have included control experiments utilizing proteins that were not subjected to site-directed acetylation (ICD), as detailed in Figure 8A in revised manuscript, thereby providing an additional layer of validation and reinforcing the robustness of our results.

      We believe that these two lines of evidence, combined with our rigorous experimental design and execution, provide a solid foundation for our conclusion regarding the specific acetylation of the K388 site in ICD.

      - Please provide the whole PAGE gel of loading control. Or other anti-His results?

      Dear esteemed reviewer, we sincerely apologize for any confusion our previous presentation may have caused. We would like to clarify that the bottom panel of Figure S6 depicts a Coomassie Blue R-350 stained whole PVDF membrane, rather than a PAGE gel, as may have been mistakenly inferred. To facilitate a comprehensive understanding, we have included the entire stained PVDF membranes in Supplementary Material 1.

      - Do you have site-specific antibody of ICDK388? It should be better to identify the ICDK388 with site-specific anti-acetylation antibody.

      Thank you for your insightful suggestion. We fully concur that a site-specific antibody targeting ICDK388 would be an optimal tool to elucidate the impact of CobQ on the acetylation status (Kac) of this protein. Unfortunately, we are currently without such an antibody due to the intricate and time-consuming process of its production, which also requires rigorous validation to ensure specificity. Furthermore, the cost associated with its development is considerable.

      To address this limitation, in the present manuscript, we have innovatively employed a two-plasmid system for site-directed acetylation modification of ICDK388. This method, which has been extensively validated and utilized in modifying diverse specific sites (David et al., Nature Chemical Biology, 2017, 13(12), 1253-1260), allowed us to precisely manipulate the acetylation status of our target protein. Additionally, we incorporated control experiments using proteins that were not subjected to site-directed acetylation, as depicted in Figure 8A in revised manuscript, thereby reinforcing the robustness and reliability of our findings.

      - Please give some background information about K388 site of ICD in the context.

      Thank you for your suggestion. It has been added.

      L484 - Could you provide the reference for this assay method "Protein deacetylation assay in vitro"?

      Thank you for your suggestion. The method follows the work published in Science 327, 1004 (2010) and Nat. Protoc. 5, 1583-1595.

      L490 - There is no detailed information about the growth conditions for the quantitative acetylome analysis. Without this information, the proportion of the Kac peptides doesn't make any sense.

      Thank you for your suggestion. It has been added.

      L531 - Insert one line before the paragraph of Western blot.

      Thank you for your suggestion. It has been inserted.

      Reviewer #3 (Recommendations For The Authors):

      Tables S1 and S2 are missing. I could not fully understand the manuscript without them.

      We are sorry for the confusion. The data have been uploaded in the supplementary materials.

      Line 130. The gene IDs of AhCobB and AhAcuC should be presented.

      Thank you for your suggestion. It has been presented.

      Line 285. What is different between ArcA and ArcA-2? Please clarify.

      Thank you for your suggestion. ArcA is the aerobic respiration control protein ArcA, gene name AHA_3026 (https://www.uniprot.org/uniprotkb/A0KMM9/entry). ArcA-2 is arginine deiminase, gene name AHA_4093 (https://www.uniprot.org/uniprotkb/A0KQG6/entry). Therefore, they are different proteins according to the UniProt annotation.

      Line 303. 8further, a bug?

      We are sorry to confuse you. It has been corrected.

      Line 412-416. The related papers on ICD acetylation in E. coli should be cited.

      We are sorry to confuse you. It has been added.

      Line 478. Not in vivo but in vitro?

      Sorry to confuse you. It should be in vitro. We have revised in the updated manuscript.

      Figure 3C and 3D. The image resolution is poor. The figures should be improved so that readers can easily see that Kac is incorporated exactly at the target site.

      Thank you for your suggestion. It has been corrected.

      Figure 4B. The amino acid residues of the whole AhCobB should be 1-264 aa.

      Thank you for your suggestion. It has been corrected.

      Figure 8. It would be better to use the same colors in panels C and D. The significance between ICD-Kac388 and ICD-Kac388+AhCobB should be shown to support the authors' conclusion that AhCobQ activates ICD by deacetylation at K388.

      Thank you for your suggestion. It has been adjusted.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      “The exact levels of inhibition, excitation, and neuromodulatory inputs to neural networks are unknown. Therefore, the work is based on fine-tuned measures that are indirectly based on experimental results. However, obtaining such physiological information is challenging and currently impossible. From a computational perspective it is a challenge that in theory can be solved. Thus, although we have no ground-truth evidence, this framework can provide compelling evidence for all hypothesis testing research and potentially solve this physiological problem with the use of computers.”

      Response: We agree with the reviewer. This work was intended to determine the feasibility of reverse engineering motor unit firing patterns, using neuron models with a high degree of realism. Given that the results support this feasibility, our model and technique will serve both to construct new hypotheses and to test them.

      • Common input structure lines 115

      I agree with the following concepts, but I would specify that there is not only one dominant common input. It has been shown that there are multiple common inputs to the same motor nuclei (e.g., the two inputs are orthogonal and are shared with a subset of the active motoneurons), particularly for agonist motoneuron pools of synergistic muscles. For the hand muscles the authors are correct that there is only one dominant common input. Moreover, there is also some animal work suggesting that common input is just an epiphenomenon. This is completely in contradiction to what we observe in vivo in the firing patterns of motor units, but perhaps worth mentioning and discussing.

      Response: Thanks for emphasizing this point. We have cited a recent reference discussing the important issue of common drive and the possibility of more than one source. Our simulations assume the net form of the excitatory input to all motoneurons in the pool is the same, except for noise. This net form (which produces the linear CST output in each case) essentially represents the sum of all inputs, both descending and sensory. Our results show the same overall pattern as human data, i.e. that all motor unit firing patterns have similar trajectories (again allowing for the impact of noise). Future studies will consider separating excitatory inputs into different sources.

      It is interesting that the authors mention suprathreshold rate modulation. Could the authors just discuss more on how the model would respond to a simulated suprathreshold current for all simulated motoneurons (i.e., like the ones generated during a suprathreshold-injected current or voluntary maximal feedforward movement?)

      Response: Thank you for this point. Our use of the term “suprathreshold” was not correct. We meant “suprathreshold” to refer to the amount of input above the recruitment threshold. We have decided to remove this term, so the sentence now reads “…so less is available for rate modulation…”.

      194 a full point is missing.

      Response: We addressed the error.

      204-231 and 232-259, these two paragraphs have been copied twice.

      Response: We addressed the error.

      Line 475 typo

      Response: We addressed the error.

      591 It would be interesting to add the time it takes a standard computer with known specs and a supercomputer to run one batch of simulations (i.e., how long one of the 6,300,000 simulations takes).

      Response: Each simulation took about 20 minutes of real time. Assuming a standard computer with 16 processor cores using a similar microarchitecture to Bebop (Intel Broadwell architecture), the standard computer could run 16 simulations at a time (one simulation assigned per core). It would take the standard computer about 15 years to complete all 6.3M simulations.
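
      The estimate above can be checked with a short calculation. This is only a sketch: the simulation count, the time per simulation, and the core count come from the response, while perfect scaling across cores and a 365-day year are simplifying assumptions, and the function name is hypothetical.

```python
# Back-of-the-envelope wall-clock estimate for the full simulation batch.
TOTAL_SIMULATIONS = 6_300_000   # total number of simulations stated in the response
MINUTES_PER_SIMULATION = 20     # approximate real time per simulation
CORES = 16                      # one simulation per core (assumes perfect scaling)


def wall_clock_years(n_sims, minutes_each, cores):
    """Return the wall-clock time in years when `cores` simulations run in parallel."""
    total_minutes = n_sims * minutes_each / cores
    minutes_per_year = 60 * 24 * 365  # assumption: 365-day year
    return total_minutes / minutes_per_year


years = wall_clock_years(TOTAL_SIMULATIONS, MINUTES_PER_SIMULATION, CORES)
print(f"{years:.1f} years")
```

      Under these assumptions the result is close to the 15 years quoted; any scheduling overhead would only lengthen it.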

      594 I don't understand why there are 6M simulations, could the authors provide more info on the combinations and why there are 6M simulations.

      Response: The 6M simulations are the total number of simulations that were performed for this work. A detailed explanation can be found in the section “Machine learning inference of motor pool characteristics” at line 591. Briefly, there were 315,000 simulations of a pool of 20 motoneurons (20 x 315,000 = 6.3 million). The 315,000 simulations were required to run all possible combinations of 15 patterns of inhibition, 5 of neuromodulation, 7 of distribution of excitatory inputs and 30 different repeats of synaptic noise with different seeds. In addition, there were 20 iterations for each of these combinations to generate a linear CST output (as illustrated in Fig. 3). 15 x 5 x 7 x 30 x 20 = 315,000.
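
      The counting above can be reproduced by enumerating the grid. The sketch below is illustrative only: the factor sizes are those given in the response, but the dictionary keys and the `iter_grid` helper are hypothetical names, not the authors' code.

```python
from itertools import product
from math import prod

# Grid dimensions as stated in the response; the key names are illustrative.
GRID = {
    "inhibition_pattern": 15,
    "neuromodulation_level": 5,
    "excitatory_distribution": 7,
    "noise_seed": 30,
    "cst_iteration": 20,
}
MOTONEURONS_PER_POOL = 20

# Number of pool-level simulations and of single-motoneuron simulations.
pool_simulations = prod(GRID.values())
motoneuron_simulations = pool_simulations * MOTONEURONS_PER_POOL


def iter_grid(grid):
    """Yield one dict of zero-based indices per grid combination."""
    names = list(grid)
    for combo in product(*(range(grid[name]) for name in names)):
        yield dict(zip(names, combo))


print(pool_simulations, motoneuron_simulations)
```

      Iterating over `iter_grid(GRID)` visits each of the 315,000 combinations exactly once, which is the sense in which the search is an exhaustive grid search.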

      In several simulations it seems that there was a lot of fine-tuning of inputs to match the measured motor unit firing pattern. Have the authors ever considered a fully black-box AI approach? If they think it is interesting, maybe it could spice up the discussion.

      Response: We agree that AI has potential for reverse engineering the whole system and we are looking into adding it to a future version of this algorithm as an alternative. We started with a simple but powerful grid search to enhance our understanding of the interaction between inputs, neuron properties and outputs.

      Reviewer 2

      Comment 1:

      “First, I believe that the relation between individual motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties can be illustrated more clearly. Although this is explained in the text, I believe that this is not optimally supported by figures. Figure 6 to some extent shows this, but figures 8 and 9 as well as Table 1 shows primarily the goodness of fit rather than the actual fit.”

      Response: We agree with the reviewer that showing the relationship between the motor neuron behavioral characteristics (delta F, brace height etc.) and the motor neuron input properties would be a great addition to the manuscript. Because the regression models have multiple dimensions (7 inputs and 3 outputs) it is difficult to show the relationship in a static image. We thought it best to show the goodness of fit even though it is more abstract and less intuitive. We added a supplemental diagram to Figure 8 to show the structure of the reverse engineered model that was fit (see Figure 8D).

      Author response image 1.

      Figure 8. Residual plots showing the goodness of fit of the different predicted values: (A) Inhibition, (B) Neuromodulation and (C) excitatory Weight Ratio. The summary plots are for the models showing the highest R² results in Table 1. The predicted values are calculated using the features extracted from the firing rates (see Figure 7, section Machine learning inference of motor pool characteristics and Regression using motoneuron outputs to predict input organization). Diagram (D) shows the multidimensionality of the RE models (see Model fits), which have 7 feature inputs (see Feature Extraction) predicting 3 outputs (Inhibition, Neuromodulation and Weight Ratio).

      Comment 2:

      “Second, I would have expected the discussion to have addressed specifically the question of which of the two primary schemes (push-pull, balanced) is the most prevalent. This is the main research question of the study, but it is to some degree left unanswered. Now that the authors have identified the relation between the characteristics of motor neuron behaviors (which has been reported in many previous studies), why not exploit this finding by summarizing the results of previous studies (at least a few representative ones) and discuss the most likely underlying input scheme? Is there a consistent trend towards one of the schemes, or are both strategies commonly used?”

      Response: We agree with the reviewer that our discussion should have addressed which of the two primary schemes – push-pull or balanced – is the most prevalent. At first glance, the upper right of Figure 6 looks the most realistic when compared to real data. We would thus expect the push-pull scheme to dominate for the given task.

      We added a brief section (Push-Pull vs Balance Motor Command) in the discussion to address the reviewer’s comments. This section is not exhaustive but frames the debate using relevant literature. We are also now preparing to deploy these techniques on real data.

      Comment 3:

      In addition, it seems striking to me that highly non-linear excitation profiles are necessary to obtain a linear CST ramp in many model configurations. Although somewhat speculative, one may expect that an approximately linear relation is desired for robust and intuitive motor control. It seems to me that humans generally have a good ability to accurately grade the magnitude of the motor output, which implies that either a non-linear relation has been learnt (complex task), or that the central nervous system can generally rely on a somewhat linear relation between the neural drive to the muscle and the output (simpler task).

      Response: We agree with the reviewer, and we were surprised by these results. Our motoneuron pool is equipped with persistent inward currents (PICs) which are nonlinear. Therefore, for the motoneuron to produce a linear output the central nervous system would have to incorporate these nonlinearities into its commands.

      Following this reasoning, it could be interesting to report also for which input scheme, the excitation profile is most linear. I understand that this is not the primary aim of the study, but it may be an interesting way to elaborate on the finding that in many cases non-linear excitation profiles were needed to produce the linear ramp.

      This is a very interesting point. The most realistic firing patterns – with respect to human data – are found in the parameter regions in the upper right in Figure 6, which in fact produce the most nonlinear input (see push-pull pattern in Figure 4C). However, in future studies we hope to separate the total motor command illustrated here into descending and feedback commands. This may result in a more linear descending drive.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study provides valuable insights into allosteric regulation of BTK, a non-receptor protein kinase, challenging previous models. Using a variety of biophysical and functional techniques, the paper presents evidence that the N-terminal PH-TH domain of BTK exists in a conformational ensemble surrounding a compact SH3-SH2-kinase core, that the BTK kinase domain can form partially active dimers, and that the PH domain can form a novel inhibitory interface after SH2/SH3 disengagement. Overall the presented evidence is solid, but the EM results may be over-interpreted and the work would benefit from additional functional validation.

      We made every effort in our descriptions of the cryoEM data presented for full-length BTK to not overinterpret the results. In essence this is not an ideal EM target but given the failure by us and others to capture the full-length multi-domain protein crystallographically, we decided that the albeit low resolution cryoEM data are useful to the field.

      Reviewer #1 (Public Review):

      The manuscript by Lin et al describes a wide biophysical survey of the molecular mechanisms underlying full-length BTK regulation. This is a continuation of this lab's excellent work on deciphering the myriad levels of regulation of BTKs downstream of their activation by plasma membrane localised receptors.

      The manuscript uses a synergy of cryo EM, HDX-MS and mutational analysis to delve into the role of how the accessory domains modify the activity of the kinase domain. The manuscript essentially has three main novel insights into BTK regulation.

      1) Cryo EM and SAXS show that the PHTH region is dynamic compared to the conserved Src module.

      2) A 2nd generation tethered PH-kinase construct crystal of BTK reveals a unique orientation of the PH domain relative to the kinase domain, that is different from previous structures.

      3) A new structure of the kinase domain dimer shows how trans-phosphorylation can be achieved.

      Excitingly these structural works allow for the generation of a model of how BTK can act as a strict coincidence sensor for both activated BCR complex as well as PIP3 before it obtains full activity. To my eye the most exciting result of this work is describing how the PH domain can inhibit activity once the SH3/SH2 domain is disengaged, allowing for an additional level of regulatory control.

      I have very few experimental concerns as the methods and figures are well-described and clear. As the authors are potentially saying that the previously solved PH domain-kinase interface is artefactual, additional evidence strengthening their model would be helpful to resolve any possible controversies.

      We do not argue that the previously solved PH domain-kinase interface is artefactual. Instead we point out that the PH/kinase interface identified in the prior structure is incompatible with the contacts between the SH3 and kinase domains in autoinhibited BTK. This then leads us to the suggestion that a PH/kinase inhibitory interaction may instead occur upon dissociation of the SH3-SH2 cassette from the kinase domain. Our data support that model. Moreover, our data suggest the PHTH domain is dynamic, likely not settling into one particular autoinhibitory state. Thus, it is possible the previously solved PH/kinase structure exists within the conformational ensemble of a range of PH/kinase domain interactions. In an effort to clarify our thinking we added two sentences to the Discussion (pg. 19).

      Reviewer #2 (Public Review):

      In this study, multiple biophysical techniques were employed to investigate the activation mechanism of BTK, a multi-domain non-receptor protein kinase. Previous studies have elucidated the inhibitory effects of the SH3 and SH2 domains on the kinase and the potential activation mechanism involving the membrane-bound PIP3 inducing transient dimerization of the PH-TH domain, which binds to lipids.

      The primary focus of the present study was on three new constructs: a full-length BTK construct, a construct where the PH-TH domain is connected to the kinase domain, and a construct featuring a kinase domain with a phosphomimetic at the autophosphorylation site Y551. The authors aimed to provide new insights into the autoinhibition and allosteric control of BTK.

      The study reports that SAXS analysis of the full-length BTK protein construct, along with cryoEM visualization of the PH-TH domain, supports a model in which the N-terminal PH-TH domain exists in a conformational ensemble surrounding a compact/autoinhibited SH3-SH2-kinase core. This finding is interesting because it contradicts previous models proposing that each globular domain is tightly packed within the core.

      Furthermore, the authors present a model for an inhibitory interaction between the N-lobe of the kinase and the PH-TH domain. This model is based on a study using a tethered complex with a longer tether than a previously reported construct where the PH-TH domain was tightly attached to the kinase domain (ref 5). The authors argue that the new structure is relevant. However, this assertion requires further explanation and discussion, particularly considering that the functional assays used to assess the impact of mutating residues within the PH-TH/kinase domain contradict the results of the previous study (ref 5).

      In our hands BTK activity is not significantly affected by mutation of just two residues, R133 and Y134. It is somewhat difficult to compare the previously reported activity assay for the same BTK mutant (Wang et al. ref 5, Figure 4D) with the data we report here. For unexplained reasons, the time scale for the quantitative assay in the previous work is truncated to 50 minutes for the R133/Y134 mutant data compared to 120 minutes for all of the other activity data reported in that figure. In our data, if we qualitatively examine the differences in a representative progress curve at 50 minutes between WT and the double R133/Y134 mutant (see Figure 6a, dark blue and pink traces), one might conclude that the R133/Y134 mutation is activating BTK. However, when we calculate the average kinase activity rate ± standard error for three independent experiments, we find that the difference between WT and the double R133/Y134 mutant is not significant (see Figure 6b and c). Thus, instead of making any assertions about the previously published data, we are trying to be as rigorous as possible in the presentation and interpretation of our own data.
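
      The difference between reading a single progress curve and comparing mean rates with their standard errors can be illustrated with a small sketch. The rate values below are placeholders invented for illustration, not data from the manuscript, and the use of a Welch t statistic is an assumption, since the response does not name the test that was applied.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate the standard error")
    return mean(values), stdev(values) / sqrt(len(values))


def welch_t(a, b):
    """Welch t statistic for two independent groups with unequal variances."""
    mean_a, sem_a = mean_and_sem(a)
    mean_b, sem_b = mean_and_sem(b)
    return (mean_a - mean_b) / sqrt(sem_a ** 2 + sem_b ** 2)


# Placeholder rates (arbitrary units) for three independent experiments each.
wt_rates = [1.0, 1.2, 0.9]
mutant_rates = [1.3, 1.0, 1.4]

for label, rates in (("WT", wt_rates), ("R133/Y134 mutant", mutant_rates)):
    m, sem = mean_and_sem(rates)
    print(f"{label}: {m:.2f} ± {sem:.2f}")
print(f"Welch t = {welch_t(wt_rates, mutant_rates):.2f}")
```

      With only three replicates per group the standard errors are wide, so an apparent gap between two representative traces can easily fall within the experiment-to-experiment variation.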

      In addition, throughout the manuscript we tried to be very careful in our discussion of our data and that published previously, to avoid conclusive statements about the previously described interface. After all, one of our overriding conclusions is that the N-terminal region of BTK is highly dynamic. See response to reviewer 1 above.

      Additionally, the study presents the structure of the kinase domain with swapped activation loops in a dimeric form, representing a previously unseen structure along the trans-phosphorylation pathway. This structure holds potential relevance. To better understand its significance, employing a structure/function approach like the one described for the PH-TH/kinase domain interface would be beneficial.

      We completely agree with this comment and are pursuing such studies now.

      Overall, this study contributes to our understanding of the activation mechanism of BTK and sheds light on the autoinhibition and allosteric control of this protein kinase. It presents new structural insights and proposes novel models that challenge previous understandings. However, further investigation and discussion would significantly strengthen the study.

      As indicated we are pursuing further investigation and felt that the body of work presented here is sufficient for a single manuscript.

      Reviewer #3 (Public Review):

      Yin-wei Lin et al set out to visualize the inactive conformation of full-length Bruton's Tyrosine Kinase (BTK), a molecule that has evaded high-resolution structural studies in its full-length form to this date. An open question in the field is how the Pleckstrin Homology-Tec Homology (PHTH) domain inhibits BTK activity, with multiple competing models in the field. The authors used a complementary set of biophysical techniques combined with well-thought-out stabilizing mutations to obtain structural insights into BTK regulation in its full-length form. They were able to crystallize the full-length construct of BTK but unfortunately, the PHTH was not resolved, yielding a structure similar to that previously obtained in the field. The investigation of the same construct by SAXS yielded an elongated structural model, consistent with previous SAXS studies. Using cryo-EM the authors obtained a low-resolution model for the FL BTK with a loosely connected density assigned to the dynamic PHTH around the compact SH2-SH3-Kinase Domain (KD) core. To gain further molecular insights into PHTH-KD interactions the authors followed a previously reported strategy and generated a fusion of PHTH-KD with a longer linker, yielding a crystal structure with a novel PHTH-KD interface which they tested in biochemical assays. Lastly, Yin-wei Lin et al crystallized the BTK KD in a novel partially active state in a "face-to-face" dimer with kinases exchanging the activation loops, which, although partially disordered, are theoretically perfectly positioned for transphosphorylation. Overall this presents a valiant effort to gain molecular insights into what clearly is a dynamic regulatory motif on BTK and is a valuable addition to the field.

      However, this work can be improved by considering these points:

      1) The cryo-EM reconstructions are potentially over-interpreted. The reported resolution for all of the analyzed reconstructions is better than 8Å, at which point helices should be recognized as well-resolved structural elements. In the current view/depiction of the cryo-EM maps/models it is hard to see such structural features and it would be great if the authors could include a panel showing maps at higher thresholds to show correspondence between the helices in the kinase C lobe and the cryo-EM maps. Otherwise, the overall positioning of the models within the cryo-EM maps is hard to evaluate and may very well be wrong. (Fig 4, S2).

      First, we fully recognize the model is low-resolution and we are careful in our discussion of the cryo-EM data to use language that acknowledges the limitations of the model. Nevertheless, this is the model we have (specific data processing points are discussed below).

      The resolution numbers are from the Fourier Shell Correlation (FSC) curve given by cryoSPARC at the end of refinement. We acknowledge the reviewer’s comment that the resolution could be overestimated in that calculation, but our main focus is to show that the overall domain arrangement of the autoinhibited BTK core (Src-module) fits into the reconstructions.

      We tested visualizing the maps at a higher threshold, but the secondary structures of the reconstructions were still not well resolved. We realize that with the current reconstructions, we do not have the structural details to correctly orient and fit individual domains; this is why we chose to simply fit the available crystal structure of the autoinhibited BTK SH3-SH2-kinase core into the maps.

      2) With the above in mind, if the maps are not at the point where helices are well resolved, it may be beneficial to low-pass filter the maps to a more conservative resolution for fitting, analysis, and representation. (Fig 4, S2).

      Using low-pass filtered maps at 10 Å or unsharpened maps, the fitting of the BTK model to the map does not change significantly.

      3) It would be valuable to get a quantitative metric on the model/map fitting for the cryo-EM work. One good package for this is Situs which provides cross-correlation values for the top orthogonal fits, without user input for initial fitting. This would again increase confidence in the correctness of model positioning on the map. (Fig 4, S2).

      Thank you for this suggestion. We tested the colores feature (Exhaustive One-At-A-Time 6D Search) in Situs to perform model-to-map fitting without user input as the reviewer suggested. The highest ranked fitting is identical to what we presented in the manuscript. Following are the cross-correlation numbers calculated from the “Fit-in-map” tool in Chimera and from the “collage” function in Situs. We now indicate this step in the caption to Figure 4.

      Author response table 1.

      4) It would be great to see 2D class averages from the particles contributing to each of the 3D classes. Theoretically, a clear bright "blob" (hypothesized to be the PHTH domain) should be observable in the 2D class averages. In the current 2D class averages that region is unconvincingly weak. (Fig 4, S2).

      We attempted to improve both 2D and 3D reconstructions by feeding the particles from each 3D class through many cycles of 2D classification and selection to exclude ‘bad’ particles, but neither the 2D class averages nor 3D reconstructions could be improved.

      We agree the feature that appears in the 2D class averages is weak. The BTK protein is only 77 kDa in size and is highly dynamic and flexible. Thus, in reality this is not an ideal system for cryo-EM. As well, the PHTH domain itself is quite small and NMR data, acquired in the context of a different project, provide evidence that the isolated PHTH domain is dynamic in solution (NMR linewidths vary throughout the protein, suggesting intermediate exchange). Nevertheless, given the inability to capture the PHTH domain in crystal structures of full-length BTK we reasoned that cryo-EM could provide some insight. In the future we anticipate building on these data to include inhibitory binding partners of BTK; however, such an effort is beyond the scope of the current work.

      5) It seems like there was quite a large circular mask applied during 2D classification. Are authors confident that the weak density attributed to the PHTH domain is not neighboring particles making their way into the extraction box? It would be great if the authors would trim their particle stack with a very stringent interparticle distance cutoff (or report the cutoff in the manuscript if already done so) to minimize this possibility.

      We initially picked particles using a small radius (100 Å), and stringently selected 2D classes with particles that contained only density aligning to the core SH3-SH2-kinase domains. We found, however, that 3D ab initio reconstruction always resulted in an additional density located at different positions around the larger core density. The structure of a single BTK PHTH domain fits into that additional remote density. Given the additional density that consistently appeared in 3D reconstructions, we went back and picked particles using a larger circular mask (200 Å). Subsequent 2D classification and 3D reconstruction from this analysis gave similar results and are presented in the manuscript.

      Regardless of the mask radius, we used stringent conditions for particle picking and checked for the presence of duplicates. An interparticle distance cutoff of 0.1 to 0.5 times the particle diameter was used and resulted in fewer particles, but the extended density remained. We also made use of template picking (2D class averages) to repick the particles and found no significant difference in the number of particles or quality of 2D classifications.
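
      The effect of an interparticle distance cutoff can be illustrated with a generic sketch. This is not the procedure of any particular cryo-EM package: the greedy, score-ordered filter, the function name, the coordinates, and the 100 Å particle diameter are all assumptions made for illustration.

```python
from math import hypot


def filter_by_min_distance(picks, min_distance):
    """Greedily keep picks so that no two kept picks are closer than min_distance.

    picks: list of (x, y, score) tuples in the same length units as min_distance.
    Higher-scoring picks are considered first, so they win over close neighbours.
    """
    kept = []
    for x, y, score in sorted(picks, key=lambda p: p[2], reverse=True):
        if all(hypot(x - kx, y - ky) >= min_distance for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept


PARTICLE_DIAMETER = 100.0              # hypothetical diameter in Å
cutoff = 0.5 * PARTICLE_DIAMETER       # upper end of the 0.1-0.5 range mentioned above

picks = [(0.0, 0.0, 0.9), (30.0, 0.0, 0.8), (200.0, 0.0, 0.7)]
print(filter_by_min_distance(picks, cutoff))
```

      The pairwise check is quadratic in the number of picks, which is acceptable for a sketch; a spatial index would be the usual choice for millions of picks.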

      6) The cryo-EM processing may benefit from more stringent particle picking. The authors picked over 2M particles from 750 micrographs which likely represents very heavy overpicking. I would encourage the authors to re-pick the micrographs with 2D class averages and use more stringent metrics to reduce the overpicking. This may result in higher-resolution reconstructions. (Fig 4, S2).

      This was an effort to maximize the number of particles extracted. After multiple rounds of 2D classification and selection to exclude empty and junk particles, the final number of particles selected for 3D ab initio reconstructions was only 68,788, with only ~20K particles for each 3D reconstruction. Thus, we are not concerned that we overpicked particles. This approach is described in Supp Figure S2.

      7) The Dmax from SAXS for the Full Length BTK is at 190Å. It would be great if the authors could make a cartoon of what domain arrangement may satisfy this distance, as it is quite extended for such a small particle. Can the authors rule out dimerization at SAXS concentrations? (Fig 1).

      SAXS data for full-length, wild-type BTK have been previously published (Márquez et al., EMBO J. (2003) 22:4616-4624). Our data for WT BTK are consistent with that published previously (and we have cited this previous work). In that work, the authors attribute the ~200 Å Dmax value to an elongated BTK conformation where the domains of BTK are arranged in a linear fashion (a figure showing this domain arrangement is provided by Márquez et al., precluding the need for such a cartoon here).

      In the present work we take advantage of targeted mutations to stabilize the autoinhibited SH3-SH2-kinase core, and the Dmax value that we report for this more autoinhibited version of full-length BTK (FL 4P1F) is ~150 Å. Notwithstanding low resolution in both SAXS and cryoEM, it is notable that superposition of the cryoEM models in Figure 4c & d gives a distance of ~150 Å between the PHTH domains from the two models.

      Finally, we cannot completely rule out that a small fraction of full-length BTK is forming dimers. However, in our experience purifying and working with this protein, we find that purified and concentrated monomeric full-length BTK proteins (as high as 15 mg/ml) are quite stable and remain monomeric and free of aggregation even after sitting at 4°C for more than a week. Here the BTK SAXS data were collected within 24 hours after the samples were thawed.

      8) In Figure S1 (C) it seems that the curves are just scattering curves with Guinier plots in the inserts, but are labeled as Guinier plots in the legend. The Guinier plots for some samples (FL 4P1F) show signs of aggregation, which may complicate the analysis, it could be beneficial to redo.

      We thank the reviewer for pointing out our mistake in the presentation of the SAXS data. We have now replaced the plots in Figure S1c with the correct scattering profiles for each construct, with the Guinier plots shown as insets. We revised the label of this panel to “Scattering profile and Guinier plots (insets)”.

      In addition, we re-processed the FL 4P1F data by performing buffer subtraction using a different buffer-alone scattering dataset (also collected during the original data acquisition). The data quality after reprocessing was significantly improved (see new scattering profiles and Guinier plots for full-length BTK in Supplementary Figure S1). Protein stability (see above) and the current data quality therefore suggest that aggregation is not complicating the SAXS analysis.
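
      For readers less familiar with Guinier analysis, the sketch below shows the underlying fit. It relies on the standard Guinier approximation, $\ln I(q) = \ln I(0) - (R_g^2/3)\,q^2$, which holds only at low angles (commonly taken as $qR_g \lesssim 1.3$). The function name and the synthetic $R_g$ and $I(0)$ values are assumptions chosen for illustration; they are not values from the manuscript, and this is not the software used for the published analysis.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares line through ln I versus q^2; returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; Rg is undefined")
    # Guinier approximation: slope = -Rg^2 / 3, intercept = ln I(0).
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free profile with hypothetical Rg = 40 A and I(0) = 100.
true_rg, true_i0 = 40.0, 100.0
q = [0.005 + 0.001 * k for k in range(25)]  # q * Rg stays below about 1.2
intensity = [true_i0 * math.exp(-((true_rg * qi) ** 2) / 3.0) for qi in q]
rg, i0 = guinier_fit(q, intensity)
print(round(rg, 3), round(i0, 3))
```

      On real data, a systematic upturn of $\ln I$ above the fitted line at the lowest $q$ values is the usual sign of aggregation, which is why the linearity of the Guinier region is inspected after buffer subtraction.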

      9) Have the authors verified that the activation loop mutations that they introduce do not disrupt the PHTH binding as they previously reported an activation loop on BTK to interact with PHTH, an interaction they do not see here? If so, a citation would be helpful in the text. If not, testing this would strengthen the paper.

      The same activation loop mutations were included in the constructs used in the previous solution studies of the PHTH/kinase domain interaction by NMR and HDX (see ref [11]). We clarify this point in the methods section. As well, all but one of the sequence changes introduced into the activation loop are at positions at the ‘base’ of the activation loop and therefore are not surface exposed. Only one amino acid change is on the exposed part of the activation loop (V555T).

      10) Can the authors comment on the surfaces which are accessible and inaccessible to the PHTH in the crystal (Fig 3E)? The fact that PHTH doesn't adopt a stable conformation in the solvent channel to some degree indicates that the accessible interaction surfaces are not suitable for PHTH interactions, as the "effective concentration" of the PHTH would be quite high. Are these surfaces consistent with the cryo-EM analysis?

      This is an excellent point and we did state the following in describing the crystallization results:

      “the crystallography results are consistent with a flexible N-terminal PHTH domain with the caveat that the domain swapped dimer organization might limit native autoinhibitory contacts between the PHTH and SH3-SH2-kinase regions.”

      In the domain swapped dimer seen in the crystal, a symmetry related molecule does partially block the G-helix region of the kinase domain while the activation loop and C-helix in the N-lobe remain accessible. Our previous solution studies (ref [11]) pointed to the G-helix as part of the interaction interface in addition to the activation loop and part of the N-lobe. We have now modified the sentence above to more clearly describe which parts of the kinase domain are inaccessible in the crystal and the possible ramifications of the steric environment on PHTH domain mobility in the crystal (see pg. 10). That said, all of our previous HDX data show little protection in the PHTH domain in full-length BTK (mapping of the PHTH/kinase interaction was only possible in trans using excess PHTH domain) and so our data can be best summarized by concluding that the PHTH domain visits a number of conformational states and makes transient contacts with various regions of the kinase domain (dependent upon whether the SH3-SH2 region is engaged or not). This is similar to the ‘fuzzy’ intramolecular contacts described for the N-terminal region of the SRC family. Like the SRC family, BTK (and other TEC kinases) contain a long disordered linker between the N-terminal region and the compact SH3-SH2-kinase core.

      11) For the novel active state dimer of the Kinase Domain it would be great to see some functional validation of the dimerization interface. It is structurally certainly quite suggestive, but without such experiments the functional significance is unclear. If appropriate mutations have been published previously a citation would be helpful.

      We completely agree. We scoured the literature and our own functional assay results over many years, but the appropriate mutations to test the functional significance of the kinase domain dimer have not been reported or previously studied in our lab. We are therefore actively pursuing this line of investigation now.

      Reviewer #1 (Recommendations For The Authors):

      I have the following proposed experiments/analysis that should help.

      1) To better validate the putative PH-kinase interface seen, the authors should try some AlphaFold-Multimer / RoseTTAFold modelling of just the PHTH module with the kinase domain. The advantage of this is that it will test how conserved over evolution the potential interface is, and will help to decipher discrepancies between the two structures. This may end up being similar to what is seen in Akt (in this case the AlphaFold prediction does not match the allosteric inhibitor structure, or the nanobody bound structure), but this could help provide additional insight into how the PH domain interacts.

      We have applied AlphaFold to this system. The PHTH-kinase fusion sequence was fed to AlphaFold and the separate PHTH and kinase domains to AlphaFold-Multimer. The results provide a range of ‘complexes’, none of which recapitulate the PHTH/kinase interface reported here or that reported by Wang et al. in previous work. Three of five results from AlphaFold-Multimer place the PHTH domain on the activation loop face of the kinase domain, consistent with the previous solution data pointing to a similar regulatory interface. This is interesting, but our experience in applying AlphaFold to dynamic, conformationally heterogeneous systems is that the results need to be considered with caution. For that reason we did not include any of the AlphaFold predictions in the manuscript.

      Evolutionary conservation is discussed further in the next section:

      2) Could the authors provide a detailed evolutionary analysis of the binding surface between the PHTH and kinase domains and include this in Fig 5? This would also help interpret the likelihood of this interface.

      This is an excellent question and we have in fact previously published a detailed evolutionary analysis of the BTK kinase domain in collaboration with Kannan Natarajan (see Amatya et al., PNAS, 2019, [ref 11]). In that work we found that evolutionarily conserved residues on the kinase domain map to the activation loop face, supporting the solution data that the PHTH interacts with the kinase domain across the activation loop face. That work predated AlphaFold but it is interesting that, to the extent that AlphaFold predicts anything, it seems to converge on the PHTH domain contacting the activation loop face.

      In the context of our current work, and this question from the reviewer, we re-examined the evolutionary analysis carried out previously and find that BTK (or TEC family) specific residues on the kinase domain do not appear at the newly identified PHTH/kinase interface we report here. We could speculate that since the ‘back’ of the kinase domain N-lobe interacts with multiple binding partners (SH3, SH2-linker and PHTH), evolutionary pressures may have resulted in a certain degree of plasticity to allow recognition of multiple binding partners.

      Evolutionary analysis of the BTK PH domain was also carried out previously and shows that the conserved sites map to the phospholipid binding pocket of the PH domain. The analysis did not include TH domain residues. Since we find the TH domain contributes to the PHTH/kinase interface in our crystal structure, we do not have the data at this time to do a thorough analysis, but we appreciate this comment and can address this in future work with collaborators.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful read of our paper, and appreciate the thoughtful comments.

      Both reviewers agreed that our work had several major strengths: the large dataset collected in collaboration across ten labs, the streamlined processing pipelines, the release of code repositories, the multi-task neural network, and that we definitively determined that electrode placement is an important source of variability between datasets.

      However, a number of key potential improvements were noted: the reviewers felt that a more standard model-based characterization of single neuron responses would benefit our reproducibility analysis, that more detail was needed about the number of cells, sessions, and animals, and that more information was needed to allow users to deploy the RIGOR standards and to understand their relationship to other metrics in the field.

      We agree with these suggestions and have implemented many major updates in our revised manuscript. Some highlights include:

      (1) A new regression analysis that specifies the response profile of each neuron, allowing a comparison of how similar these are across labs and areas (See Figure 7 in the new section, “Single neuron coefficients from a regression-based analysis are reproducible across labs”);

      (2) A new decoding analysis (See Figure 9 in the section, “Decodability of task variables is consistent across labs, but varies by brain region”);

      (3) A new RIGOR notebook to ease useability;

      (4) A wealth of additional information about the cells, animals and sessions in each figure;

      (5) Many new additional figure panels in the main text and supplementary material to clarify the specific points raised by the reviewers.

      Again, we are grateful to the reviewers and editors for their helpful comments, which have significantly improved the work. We are hopeful that the many revisions we have implemented will be sufficient to change the “incomplete” designation that was originally assigned to the manuscript.

      Reviewer #1 (Public review):

      Summary:

      The authors explore a large-scale electrophysiological dataset collected in 10 labs while mice performed the same behavioral task, and aim to establish guidelines to aid reproducibility of results collected across labs. They introduce a series of metrics for quality control of electrophysiological data and show that histological verification of recording sites is important for interpreting findings across labs and should be reported in addition to planned coordinates. Furthermore, the authors suggest that although basic electrophysiology features were comparable across labs, task modulation of single neurons can be variable, particularly for some brain regions. The authors then use a multi-task neural network model to examine how neural dynamics relate to multiple interacting task- and experimenter-related variables, and find that lab-specific differences contribute little to the variance observed. Therefore, analysis approaches that account for correlated behavioral variables are important for establishing reproducible results when working with electrophysiological data from animals performing decision-making tasks. This paper is very well-motivated and needed. However, what is missing is a direct comparison of task modulation of neurons across labs using standard analysis practice in the fields, such as generalized linear model (GLM). This can potentially clarify how much behavioral variance contributes to the neural variance across labs; and more accurately estimate the scale of the issues of reproducibility in behavioral systems neuroscience, where conclusions often depend on these standard analysis methods.

      We fully agree that a comparison of task-modulation across labs is essential. To address this, we have performed two new analyses and added new corresponding figures to the main text (Figures 7 and 9). As the reviewer hoped, this analysis did indeed clarify how much behavioral variance contributes to the variance across labs. Critically, these analyses suggested that our results were more reproducible than the more traditional analyses would indicate.

      Additional details are provided below (See detailed response to R1P1b).

      Strengths:

      (1) This is a well-motivated paper that addresses the critical question of reproducibility in behavioural systems neuroscience. The authors should be commended for their efforts.

      (2) A key strength of this study comes from the large dataset collected in collaboration across ten labs. This allows the authors to assess lab-to-lab reproducibility of electrophysiological data in mice performing the same decision-making task.

      (3) The authors' attempt to streamline preprocessing pipelines and quality metrics is highly relevant in a field that is collecting increasingly large-scale datasets where automation of these steps is increasingly needed.

      (4) Another major strength is the release of code repositories to streamline preprocessing pipelines across labs collecting electrophysiological data.

      (5) Finally, the application of MTNN for characterizing functional modulation of neurons, although not yet widely used in systems neuroscience, seems to have several advantages over traditional methods.

      Thanks very much for noting these strengths of our work.

      Weaknesses:

      (1) In several places the assumptions about standard practices in the field, including preprocessing and analyses of electrophysiology data, seem to be inaccurately presented:

      a) The estimation of how much the histologically verified recording location differs from the intended recording location is valuable information. Importantly, this paper provides citable evidence for why that is important. However, histological verification of recording sites is standard practice in the field, even if not all studies report them. Although we appreciate the authors' effort to further motivate this practice, the current description in the paper may give readers outside the field a false impression of the level of rigor in the field.

      We agree that labs typically do perform histological verification. Still, our methods offer a substantial improvement over standard practice, and this was critical in allowing us to identify errors in targeting. For instance, we used new software, LASAGNA, which is an innovation over the traditional, more informal approach to localizing recording sites. Second, the requirement that two independent reviewers concur on each proposed location for a recording site is also an improvement over standard practice. Importantly, these reviewers use electrophysiological features to more precisely localize electrodes, when needed, which is an improvement over many labs. Finally, most labs use standard 2D atlases to identify recording location (a traditional approach); our use of a 3D atlas and a modern image registration pipeline has improved the accuracy of identifying the true placement of probes in 3D space.

      Importantly, we don’t necessarily advocate that all labs adopt our pipeline; indeed, this would be infeasible for many labs. Instead, our hope is that the variability in probe trajectory that we uncovered will be taken into account in future studies. Here are 3 example ways in which that could happen. First, groups hoping to target a small area for an experiment might elect to use a larger cohort than previously planned, knowing that some insertions will miss their target. Second, our observation that some targeting error arose because experimenters had to move probes due to blood vessels will impact future surgeries: when an experimenter realizes that a blood vessel is in the way, they might still re-position the probe, but they can also adjust its trajectory (e.g., changing the angle) knowing that even little nudges to avoid blood vessels can have a large impact on the resulting insertion trajectory. Third, our observation of a 7 degree deviation between stereotaxic coordinates and Allen Institute coordinates can be used for future trajectory planning steps to improve accuracy of placement. Uncovering this deviation required many insertions and our standardized pipeline, but now that it is known, it can be easily corrected without needing such a pipeline.

      We thank the reviewer for bringing up this issue and have added new text (and modified existing text) in the Discussion to highlight the innovations we introduced that allowed us to carefully quantify probe trajectory across labs (lines 500 - 515):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset. … Detecting this offset relied on a large cohort size and an automated histological pipeline, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Minimizing variance in probe targeting is another important element in increasing reproducibility, as slight deviations in probe entry position and angle can lead to samples from different populations of neurons. Collecting structural MRI data in advance of implantation could reduce targeting error, although this is infeasible for most labs. A more feasible solution is to rely on stereotaxic coordinates but account for the inevitable off-target measurements by increasing cohort sizes and adjusting probe angles when blood vessels obscure the desired location.”

      b) When identifying which and how neurons encode particular aspects of stimuli or behaviour in behaving animals (when variables are correlated by the nature of the animals' behaviour), it has become the standard in behavioral systems neuroscience to use GLMs - indeed many labs participating in the IBL also have a long history of doing this (e.g., Steinmetz et al., 2019; Musall et al., 2023; Orsolic et al., 2021; Park et al., 2014). The reproducibility of results when using GLMs is never explicitly shown, but the supplementary figures to Figure 7 indicate that results may be reproducible across labs when using GLMs (as it has similar prediction performance to the MTNN). This should be introduced as the first analysis method used in a new dedicated figure (i.e., following Figure 3 and showing results of analyses similar to what was shown for the MTNN in Figure 7). This will help put into perspective the degree of reproducibility issues the field is facing when analyzing with appropriate and common methods. The authors can then go on to show how simpler approaches (currently in Figures 4 and 5) - not accounting for a lot of uncontrolled variabilities when working with behaving animals - may cause reproducibility issues.

      We fully agree with the reviewer's suggestion. We have addressed their concern by implementing a Reduced-Rank Regression (RRR) model, which builds upon and extends the principles of Generalized Linear Models (GLMs). The RRR model retains the core regression framework of GLMs while introducing shared, trainable temporal bases across neurons, enhancing the model’s capacity to capture the structure in neural activity (Posani, Wang, et al., bioRxiv, 2024). Importantly, Posani, Wang, et al. compared the predictive performance of GLMs vs the RRR model, and found that the RRR model provided (slightly) improved performance, so we chose the RRR approach here.

      We highlight this analysis in a new section (lines 350-377) titled, “Single neuron coefficients from a regression-based analysis are reproducible across labs”. This section includes an entirely new Figure (Fig. 7), where this new analysis felt most appropriate, since it is closer in spirit to the MTNN analysis that follows (rather than as a new Figure 3, as the reviewer suggested). As the reviewer hoped, this analysis provides some reassurance that including many variables when characterizing neural activity furnishes results with improved reproducibility. We now state this in the Results and the Discussion (line 456-457), highlighting that these analyses complement the more traditional selectivity analyses, and that using both methods together can be informative.

      When the authors introduce a neural network approach (i.e. MTNN) as an alternative to the analyses in Figures 4 and 5, they suggest: 'generalized linear models (GLMs) are likely too inflexible to capture the nonlinear contributions that many of these variables, including lab identity and spatial positions of neurons, might make to neural activity'. This is despite the comparison between MTNN and GLM prediction performance (Supplement 1 to Figure 7) showing that the MTNN is only slightly better at predicting neural activity compared to standard GLMs. The introduction of new models to capture neural variability is always welcome, but the conclusion that standard analyses in the field are not reproducible can be unfair unless directly compared to GLMs.

      In essence, it is really useful to demonstrate how different analysis methods and preprocessing approaches affect reproducibility. But the authors should highlight what is actually standard in the field, and then provide suggestions to improve from there.

      Thanks again for these comments. We have also edited the MTNN section slightly to accommodate the addition of the previous new RRR section (line 401-402).

      (2) The authors attempt to establish a series of new quality control metrics for the inclusion of recordings and single units. This is much needed, with the goal to standardize unit inclusion across labs that bypasses the manual process while keeping the nuances from manual curation. However, the authors should benchmark these metrics to other automated metrics and to manual curation, which is still a gold standard in the field. The authors did this for whole-session assessment but not for individual clusters. If the authors can find metrics that capture agreed-upon manual cluster labels, without the need for manual intervention, that would be extremely helpful for the field.

      We thank the reviewer for their insightful suggestions regarding benchmarking our quality control metrics against manual curation and other automated methods at the level of individual clusters. We are indeed, as the reviewer notes, publishing results from spike sorting outputs that have been automatically but not manually verified on a neuron-by-neuron basis. To get to the point where we trust these results to be of publishable quality, we manually reviewed hundreds of recordings and thousands of neurons, refining both the preprocessing pipeline and the single-unit quality metrics along the way. All clusters, both those passing QCs and those not passing QCs, are available to review with detailed plots and quantifications at https://viz.internationalbrainlab.org/app (turn on “show advanced metrics” in the upper right, and navigate to the plots furthest down the page, which are at the individual unit level). We would emphasize that these metrics are definitely imperfect (and fully-automated spike sorting remains a work in progress), but so is manual clustering. Our fully automated approach has the advantage of being fully reproducible, which is absolutely critical for the analyses in the present paper. Indeed, if we had actually done manual clustering or curation, one would wonder whether our results were actually reproducible independently. Nevertheless, it is not part of the present manuscript’s objectives to validate or defend these specific choices for automated metrics, which have been described in detail elsewhere (see our Spike Sorting whitepaper, https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080). It would be a valuable exercise to thoroughly compare these metrics against a careful, large, manually-curated set, but doing this properly would be a paper in itself and is beyond the scope of the current paper.

      We also acknowledge that our analyses studying reproducibility across labs could, in principle, result in more or less reproducibility under a different choice of metrics, which we now describe in the Discussion (line 469-470):

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      (3) With the goal of improving reproducibility and providing new guidelines for standard practice for data analysis, the authors should report the n of cells, sessions, and animals used in plots and analyses throughout the paper to aid understanding of the variability in the plots, but also to set a good example.

      We wholeheartedly agree and have added the number of cells, mice and sessions for each figure. This information is included as new tabs in our quality control spreadsheet (https://docs.google.com/spreadsheets/d/1_bJLDG0HNLFx3SOb4GxLxL52H4R2uPRcpUlIw6n4n-E/). This is referred to in line 158-159 (as well as its original location on line 554 in the section, “Quality control and data inclusion”).

      Other general comments:

      (1) In the discussion (line 383) the authors conclude: 'This is reassuring, but points to the need for large sample sizes of neurons to overcome the inherent variability of single neuron recording'. - Based on what is presented in this paper we would rather say that their results suggest that appropriate analytical choices are needed to ensure reproducibility, rather than large datasets - and they need to show whether using standard GLMs actually allows for reproducible results.

      Thanks. The new GLM-style RRR analysis in Figure 7, following the reviewer’s suggestion, does indeed indicate improved reproducibility across labs. As described above, we see this new analysis as complementary to more traditional analyses of neural selectivity and argue that the two can be used together. The new text (line 461) states:

      “This is reassuring, and points to the need for appropriate analytical choices to ensure reproducibility.”

      (2) A general assumption in the across-lab reproducibility questions in the paper relies on intralab variability vs across-lab variability. An alternative measure that may better reflect experimental noise is across-researcher variability, as well as the amount of experimenter experience (if the latter is a factor, it could suggest researchers may need more training before collecting data for publication). The authors state in the discussion that this is not possible. But maybe certain measures can be used to assess this (e.g. years of conducting surgeries/ephys recordings etc)?

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (line 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) Figure 3b and c: Are these plots before or after the probe depth has been adjusted based on physiological features such as the LFP power? In other words, is the IBL electrophysiological alignment toolbox used here and is the reliability of location before using physiological criteria or after? Beyond clarification, showing both before and after would help the readers to understand how much the additional alignment based on electrophysiological features adjusts probe location. It would also be informative if they sorted these penetrations by which penetrations were closest to the planned trajectory after histological verification.

      The plots in Figure 3b and 3c reflect data after the probe depth has been adjusted based on electrophysiological features. This adjustment incorporates criteria such as LFP power and spiking activity to refine the trajectory and ensure precise alignment with anatomical landmarks. The trajectories have also been reviewed and confirmed by two independent reviewers. We have clarified this in line 180 and in the caption of Figure 3.

      To address this concern, we have added a new panel c in Figure 3 supplementary 1 (also shown below) that shows the LFP features along the probes prior to using the IBL alignment toolbox. We hope the reviewer agrees that a comparison of panels (a) and (c) below makes clear the improvement afforded by our alignment tools.

      In Figure 3 and Figure 3 supplementary 1, as suggested, we have also now sorted the probes by those that were closest to the planned trajectory. This way of visualizing the data makes it clear that as the distance from the planned trajectory increases, the power spectral density in the hippocampal regions becomes less pronounced and the number of probes that have a large portion of the channels localized to VISa/am, LP and PO decreases. We have added text to the caption to describe this. We thank the reviewer for this suggestion and agree that it will help readers to understand how much the additional alignment (based on electrophysiological features) adjusts probe location.

      (4) In Figures 4 and 6: If the authors use a 0.05 threshold (alpha) and a cell simply has to be significant on 1/6 tests to be considered task modulated, that means that they have a false positive rate of ~30% (0.05*6=0.3). We ran a simple simulation looking for significant units (from random null distribution) from these criteria which shows that out of 100,000 units, 26,500 units would come out significant (false error rate: 26.5%). That is very high (and unlikely to be accepted in most papers), and therefore not surprising that the fraction of task-modulated units across labs is highly variable. This high false error rate may also have implications for the investigation of the spatial position of task-modulated units (as effects of the spatial position may drown in falsely labelled 'task-modulated' cells).

      Thank you for this concern. The different tests were kept separate, so we did not consider a neuron modulated if it was significant in only one out of six tests, but instead we asked whether a neuron was modulated according to test one, whether it was modulated according to test two, etc., and performed further analyses separately for each test. Thus, we are only vulnerable to the ‘typical’ false positive rate of 0.05 for any given test. We made this clearer in the text (lines 232-236) and hope that the 5% false positive rate seems more acceptable.
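
      The two ways of counting can be illustrated with a minimal sketch, assuming six independent tests per neuron and uniformly distributed null p-values. Calling a neuron modulated when any of the six tests is significant gives a false positive rate of 1 - 0.95^6, about 26.5%, in line with the reviewer’s simulation, whereas scoring each test separately keeps the rate near 5% per test. The function and variable names are illustrative and are not taken from our analysis code.

```python
import random

ALPHA = 0.05        # per-test significance threshold
N_TESTS = 6         # tests per neuron (assumed independent for this sketch)
N_UNITS = 100_000   # number of simulated null neurons


def simulate_null(n_units=N_UNITS, n_tests=N_TESTS, alpha=ALPHA, seed=0):
    """Return (rate of 'any test significant', list of per-test rates)."""
    rng = random.Random(seed)
    any_hits = 0
    per_test_hits = [0] * n_tests
    for _ in range(n_units):
        # Under the null hypothesis, p-values are uniform on [0, 1).
        significant = [rng.random() < alpha for _ in range(n_tests)]
        any_hits += any(significant)
        for k, is_sig in enumerate(significant):
            per_test_hits[k] += is_sig
    return any_hits / n_units, [h / n_units for h in per_test_hits]


# Closed-form probability that at least one of the tests is significant.
analytic_any = 1 - (1 - ALPHA) ** N_TESTS

any_rate, per_test_rates = simulate_null()
print(f"analytic P(at least one of {N_TESTS} significant) = {analytic_any:.4f}")
print(f"simulated 'any test' false positive rate = {any_rate:.4f}")
print("simulated per-test false positive rates =",
      [round(r, 4) for r in per_test_rates])
```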

      (5) The authors state from Figure 5b that the majority of cells could be well described by 2 PCs. The distribution of R2 across neurons is almost uniform, so depending on what R2 value one considers a 'good' description, that is the fraction of 'good' cells. Furthermore, movement onset has now been well-established to be affecting cells widely and in large fractions, so while this analysis may work for something with global influence - like movement - more sparsely encoded variables (as many are in the brain) may not be well approximated with this suggestion. The authors could expand this analysis into other epochs like activity around stimulus presentation, to better understand how this type of analysis reproduces across labs for features that have a less global influence.

      We thank the reviewer for the suggestion and fully agree that the window used in our original analysis would tend to favor movement-driven neurons. To address this, we repeated the analysis, this time using a window centered around stimulus onset (from 0.5 s before stimulus onset until 0.1 s after stimulus onset). As the reviewer suspected, far fewer neurons were active in this window and consequently far fewer were modelled well by the first two PCs, as shown in Author response image 1b (below). Similar to our original analysis using the post-movement window, we found mixed results for the stimulus-centered window across labs. Interestingly, regional differences were weaker in this new analysis compared to the original analysis of the post-movement window. We have added a sentence to the results describing this. Because the results are similar to the post-movement window main figure, we would prefer to restrict the new analysis only to this point-by-point response, in the hopes of streamlining the paper.

      Author response image 1.

      PCA analysis applied to a stimulus-aligned window ([-0.5, 0.1] sec relative to stim onset). Figure conventions as in main text Fig 5. Results are comparable to the post-movement window analysis; however, regional differences are weaker here, possibly because fewer cells were active in the pre-movement window. We added panel j here and in the main figure, showing cell-number-controlled results. That is, for each test, every class being compared (for example, each lab within a region) was subsampled to the neuron count of the smallest class; this sampling was repeated 1000 times and the p-values were combined via Fisher’s method, resulting in far fewer significant differences across laboratories and, independently, across regions.

      (6) Additionally, in Figure 5i: could the finding that one can only distinguish labs when taking cells from all regions, simply be a result of a different number of cells recorded in each region for each lab? It makes more sense to focus on the lab/area pairing as the authors also do, but not to make their main conclusion from it. If the authors wish to do the comparison across regions, they will need to correct for the number of cells recorded in each region for each lab. In general, it was a struggle to fully understand the purpose of Figure 5. While population analysis and dimensionality reduction are commonplace, this seems to be a very unusual use of it.

      We agree that controlling for varying cell numbers is a valuable addition to this analysis. We added panel j in Fig. 5, showing cell-number-controlled test results for panel i. That is, for a given statistical comparison, we subsample each of the other classes to the cell count of the smallest class being compared, run the test, and repeat this sampling 1000 times before combining the p-values using Fisher’s method. This cell-number-controlled version of the tests resulted in clearly fewer significant differences across distributions, as also seen for the pre-movement window shown in panel j of Author response image 1. We hope this clarifies our aim: to illustrate that a low-dimensional embedding of cells’ trial-averaged activity can show how regional differences compare with laboratory differences.
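
      The cell-number control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our analysis code: the function names are hypothetical, the data are simulated, and a hand-rolled two-sample KS test with an asymptotic p-value approximation stands in for the KS tests used in the paper.

```python
import math
import random


def ks_2samp_pvalue(x, y):
    """Two-sample KS p-value using an asymptotic Kolmogorov approximation."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0  # largest gap between the two empirical CDFs
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d  # small-sample correction (approximate)
    if lam < 0.3:  # the series converges slowly here and the p-value is ~1
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                 for k in range(1, 101))
    return max(0.0, min(1.0, 2.0 * series))


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k dof."""
    # Note: Fisher's method assumes independent p-values; not checked here.
    k = len(p_values)
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    if half_stat == 0.0:
        return 1.0
    # For even dof 2k the chi-square survival function has the closed form
    # exp(-h) * sum_{i<k} h**i / i!, evaluated in log space for stability.
    log_terms = [i * math.log(half_stat) - math.lgamma(i + 1) - half_stat
                 for i in range(k)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def subsampled_pvalue(groups, test_fn, n_rep=1000, seed=0):
    """Subsample every group to the smallest group size, test, repeat, combine."""
    rng = random.Random(seed)
    n_min = min(len(g) for g in groups)
    p_values = []
    for _ in range(n_rep):
        subsamples = [rng.sample(g, n_min) for g in groups]
        p_values.append(test_fn(*subsamples))
    return fisher_combine(p_values)


# Hypothetical PC scores for two classes with unequal cell counts.
rng = random.Random(1)
class_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
class_b = [rng.gauss(2.0, 1.0) for _ in range(60)]  # shifted distribution
combined_shifted = subsampled_pvalue([class_a, class_b], ks_2samp_pvalue, n_rep=200)
print(f"combined p-value for the shifted classes = {combined_shifted:.3g}")
```

      Subsampling to the smallest class keeps the statistical power of each comparison matched across classes, so that a class is not flagged as different simply because more cells were recorded from it.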

      As a complementary statistical analysis to the shown KS tests, we fitted a linear mixed-effects model (statsmodels.formula.api mixedlm) to the first and second PC for both activity windows (“Move”: [-0.5,1] first movement aligned; “Stim”: [-0.5,0.1] stimulus onset aligned), independently. Author response image 2 (in this rebuttal only) is broadly in line with the KS results, showing more regional than lab influences on the distributions of first PCs for the post-movement window.

      Author response image 2:

      Linear mixed effects model results for two PCs and two activity windows. For the post-movement window (“Move”), regional influences are significant (red color in plots) for all but one region while only one lab has a significant model coefficient for PC1. For PC2 more labs and three regions have significant coefficients. For the pre-movement window (“Stim”) one region for PC1 or PC2 has significant coefficients. The variance due to session id was smaller than all other effects (“eids Var”). “Intercept” shows the expected value of the response variable (PC1, PC2) before accounting for any fixed or random effects. All p-values were grouped as one hypothesis family and corrected for multiple comparisons via Benjamini-Hochberg.
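
      For readers unfamiliar with the Benjamini-Hochberg procedure mentioned in the caption, the following minimal sketch shows the step-up adjustment applied to a single family of p-values. The p-values are hypothetical and the function is an illustrative stand-in, not the routine used to produce Author response image 2.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags), both in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted, [q <= alpha for q in adjusted]


# Hypothetical family of model-coefficient p-values treated as one family.
family = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adjusted, rejected = benjamini_hochberg(family)
for p, q, r in zip(family, adjusted, rejected):
    print(f"p = {p:.3f}  adjusted = {q:.4f}  significant = {r}")
```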

      (7) In the discussion the authors state: " Indeed this approach is a more effective and streamlined way of doing it, but it is questionable whether it 'exceeds' what is done in many labs.

      Classically, scientists trace each probe manually with light microscopy and designate each area based on anatomical landmarks identified with Nissl or DAPI stains together with gross landmarks. When not automated with 2-PI serial tomography and anatomically aligned to a standard atlas, this is a less effective process, but it is not clear that it is less precise, especially in studies before Neuropixels where active electrodes were located in a much smaller area. While more effective, transforming into a common atlas does make additional assumptions about warping the brain into the standard atlas - especially in cases where the brain has been damaged/lesioned. Readers can appreciate the effectiveness and streamlining provided by these new tools without the need to invalidate previous approaches.

      We thank the reviewer for highlighting the effectiveness of manual tracing methods used traditionally. Our intention in the statement was not to invalidate the precision or value of these classical methods but rather to emphasize the scalability and streamlining offered by our pipeline. We have revised the language to more accurately reflect this (lines 500-504):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset.”

      (8) What about across-lab population-level representation of task variables, such as in the coding direction for stimulus or choice? Is the general decodability of task variables from the population comparable across labs?

      Excellent question, thanks! We have added the new section “Decodability of task variables is consistent across labs, but varies by brain region” (lines 423-448) and Figure 9 in the revised manuscript to address this question. In short, yes, the general decodability of task variables from the population is comparable across labs, providing additional reassurance of reproducibility.

      Reviewer #2 (Public review):

      Summary:

      The authors sought to evaluate whether observations made in separate individual laboratories are reproducible when they use standardized procedures and quality control measures. This is a key question for the field. If ten systems neuroscience labs try very hard to do the exact same experiment and analyses, do they get the same core results? If the answer is no, this is very bad news for everyone else! Fortunately, they were able to reproduce most of their experimental findings across all labs. Despite attempting to target the same brain areas in each recording, variability in electrode targeting was a source of some differences between datasets.

      Major Comments:

      The paper had two principal goals:

      (1) to assess reproducibility between labs on a carefully coordinated experiment

      (2) distill the knowledge learned into a set of standards that can be applied across the field.

      The manuscript made progress towards both of these goals but leaves room for improvement.

      (1) The first goal of the study was to perform exactly the same experiment and analyses across 10 different labs and see if you got the same results. The rationale for doing this was to test how reproducible large-scale rodent systems neuroscience experiments really are. In this, the study did a great job showing that when a consortium of labs went to great lengths to do everything the same, even decoding algorithms could not clearly discern laboratory identity from the raw data. However, the amount of coordination between the labs was so great that these findings are hard to generalize to the situation where similar (or conflicting!) results are generated by two labs working independently.

      Importantly, the study found that electrode placement (and thus likely also errors inherent to the electrode placement reconstruction pipeline) was a key source of variability between datasets. To remedy this, they implemented a very sophisticated electrode reconstruction pipeline (involving two-photon tomography and multiple blinded data validators) in just one lab, and all brains were sliced and reconstructed in this one location. This is a fantastic approach for ensuring similar results within the IBL collaboration, but makes it unclear how much variance would have been observed if each lab had attempted to reconstruct their probe trajectories themselves using a mix of histology techniques from conventional brain slicing, to light sheet microscopy, to MRI imaging.

      This approach also raises a few questions. The use of standard procedures, pipelines, etc. is a great goal, but most labs are trying to do something unique with their setup. Bigger picture, shouldn't highly "significant" biological findings akin to the discovery of place cells or grid cells, be so clear and robust that they can be identified with different recording modalities and analysis pipelines?

      We agree, and hope that this work may help readers understand what effect sizes may be considered “clear and robust” from datasets like these. We certainly support the reviewer’s point that multiple approaches and modalities can help to confirm any biological findings, but we would contend that a clear understanding of the capabilities and limitations of each approach is valuable, and we hope that our paper helps to achieve this.

      Related to this, how many labs outside of the IBL collaboration have implemented the IBL pipeline for their own purposes? In what aspects do these other labs find it challenging to reproduce the approaches presented in the paper? If labs were supposed to perform this same experiment, but without coordinating directly, how much more variance between labs would have been seen? Obviously investigating these topics is beyond the scope of this paper. The current manuscript is well-written and clear as is, and I think it is a valuable contribution to the field. However, some additional discussion of these issues would be helpful.

      We thank the reviewer for raising this important issue. We know of at least 13 labs that have implemented the behavioral task software and hardware that we published in eLife in 2021, and we expect that over the next several years labs will also implement these analysis pipelines (note that it is considerably cheaper and faster to implement software pipelines than hardware). In particular, a major goal of the staff in the coming years is to continue and improve the support for pipeline deployment and use. However, our goal in this work, which we have aimed to state more clearly in the revised manuscript, was not so much to advocate that others adopt our pipeline, but instead to use our standardized approach as a means of assessing reproducibility under the best of circumstances (see lines 48-52): “A high level of reproducibility of results across laboratories when procedures are carefully matched is a prerequisite to reproducibility in the more common scenario in which two investigators approach the same high-level question with slightly different experimental protocols.”

      Further, a number of our findings are relevant to other labs regardless of whether they implement our exact pipeline, a modified version of our pipeline, or something else entirely. For example, we found probe targeting to be a large source of variability. Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Relatedly, we found that slight deviations in probe entry position can lead to samples from different populations of neurons. Although this took large cohort sizes to discover, knowledge of this discovery means that future experiments can plan for larger cohort sizes to allow for off-target trajectories, and can re-compute probe angle when the presence of blood vessels necessitates moving probes slightly. These points are now highlighted in the Discussion (lines 500-515).

      Second, the proportion of responsive neurons (a quantity often used to determine that a particular area subserves a particular function) sometimes failed to reproduce across labs. For example, for movement-driven activity in PO, UCLA reported an average change of 0 spikes/s, while CCU reported a large and consistent change (Figure 4d, rightmost panel, compare orange vs. yellow traces). This argues that neuron-to-neuron variability means that comparisons across labs require large cohort sizes. A small number of outlier neurons in a session can heavily bias responses. We anticipate that this problem will be remedied as tools for large scale neural recordings become more widely used. Indeed, the use of 4-shank instead of single-shank Neuropixels (as we used here) would have greatly enhanced the number of PO neurons we measured in each session. We have added new text to Results explaining this (lines 264-268):

      “We anticipate that the feasibility of even larger scale recordings will make lab-to-lab comparisons easier in future experiments; multi-shank probes could be especially beneficial for cortical recordings, which tend to be the most vulnerable to low cell counts since the cortex is thin and is the most superficial structure in the brain and thus the most vulnerable to damage. Analyses that characterize responses to multiple parameters are another possible solution (See Figure 7).”

      (2) The second goal of the study was to present a set of data curation standards (RIGOR) that could be applied widely across the field. This is a great idea, but its implementation needs to be improved if adoption outside of the IBL is to be expected. Here are three issues:

      (a) The GitHub repo for this project (https://github.com/int-brain-lab/paper-reproducible-ephys/) is nicely documented if the reader's goal is to reproduce the figures in the manuscript. Consequently, the code for producing the RIGOR statistics seems mostly designed for re-computing statistics on the existing IBL-formatted datasets. There doesn't appear to be any clear documentation about how to run it on arbitrary outputs from a spike sorter (i.e. the inputs to Phy).

      We agree that clear documentation is key for others to adopt our standards. To address this, we have added a section at the end of the README of the repository that links to a jupyter notebook (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb) that runs the RIGOR metrics on a user’s own spike sorted dataset. The notebook also contains a tutorial that walks through how to visually assess the quality of the raw and spike sorted data, and computes the noise level metrics on the raw data as well as the single cell metrics on the spike sorted data.

      (b) Other sets of spike sorting metrics that are more easily computed for labs that are not using the IBL pipeline already exist (e.g. "quality_metrics" from the Allen Institute ecephys pipeline [https://github.com/AllenInstitute/ecephys_spike_sorting/blob/main/ecephys_spike_sorting/modules/quality_metrics/README.md] and the similar module in the Spike Interface package [https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics.html]). The manuscript does not compare these approaches to those proposed here, but some of the same statistics already exist (amplitude cutoff, median spike amplitude, refractory period violation).

      There is a long history of researchers providing analysis algorithms and code for spike sorting quality metrics, and we agree that the Allen Institute’s ecephys code and the Spike Interface package are the current options most widely used (but see also, for example, Fabre et al. https://github.com/Julie-Fabre/bombcell). Our primary goal in the present work is not to advocate for a particular implementation of any quality metrics (or any spike sorting algorithm, for that matter), but instead to assess reproducibility of results, given one specific choice of spike sorting algorithm and quality metrics. That is why, in our comparison of yield across datasets (Fig 1F), we downloaded the raw data from those comparison datasets and re-ran them under our single fixed pipeline, to establish a fair standard of comparison. A full comparison of the analyses presented here under different choices of quality metrics and spike sorting algorithms would undoubtedly be interesting and useful for the field - however, we consider it to be beyond the scope of the present work. It is therefore an important assumption of our work that the result would not differ materially under a different choice of sorting algorithm and quality metrics. We have added text to the Discussion to clarify this limitation:

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      That said, we still intend for external users to be able to easily run our pipelines and quality metrics.

      (c) Some of the RIGOR criteria are qualitative and must be visually assessed manually. Conceptually, these features make sense to include as metrics to examine, but would ideally be applied in a standardized way across the field. The manuscript doesn't appear to contain a detailed protocol for how to assess these features. A procedure for how to apply these criteria for curating non-IBL data (or for implementing an automated classifier) would be helpful.

      We agree. To address this, we have provided a notebook that runs the RIGOR metrics on a user’s own dataset, and contains a tutorial on how to interpret the resulting plots and metrics (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb).

      Within this notebook there is a section focused on visually assessing the quality of both the raw data and the spike sorted data. The code in this section can be used to generate plots, such as raw data snippets or the raster map of the spiking activity, which are typically used to visually assess the quality of the data. In Figure 1 Supplement 2 we have provided examples of such plots that show different types of artifactual activity that should be inspected.

      Other Comments:

      (1) How did the authors select the metrics they would use to evaluate reproducibility? Was this selection made before doing the study?

      Our metrics were selected on the basis of our experience and expertise with extracellular electrophysiology. For example: some of us previously published on epileptiform activity and its characteristics in some mice (Steinmetz et al. 2017), so we included detection of that type of artifact here; and, some of us previously published detailed investigations of instability in extracellular electrophysiological recordings and methods for correcting them (Steinmetz et al. 2021, Windolf et al. 2024), so we included assessment of that property here. These metrics therefore represent our best expert knowledge about the kinds of quality issues that can affect this type of dataset, but it is certainly possible that future investigators will discover and characterize other quality issues.

      The selection of metrics was primarily performed before the study (we used these assessments internally before embarking on the extensive quantifications reported here), and in cases where we refined them further during the course of preparing this work, it was done without reference to statistical results on reproducibility but instead on the basis of manual inspection of data quality and metric performance.

      (2) Was reproducibility within-lab dependent on experimenter identity?

      We thank the reviewer for this question. We have addressed it in our response to R1 General comment 2, as follows:

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (lines 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) They note that UCLA and UW datasets tended to miss deeper brain region targets (lines 185-188) - they do not speculate why these labs show systematic differences. Were they not following standardized procedures?

      Thank you for raising this point. All researchers across labs were indeed following standardised procedures. We note that our statistical analysis of probe targeting coordinates and angles did not reveal a significant effect of lab identity on targeting error, even though we noted the large number of mis-targeted recordings in UCLA and UW to help draw attention to the appropriate feature in the figure. Given that these differences were not statistically significant, we can see how it was misleading to call out these two labs specifically. While the overall probe placement surface error and angle error both show no such systematic difference, the magnitude of surface error showed a non-significant tendency to be higher for samples in UCLA & UW, which, compounded with the direction of probe angle error, caused these probe insertions to land in a final location outside LP & PO.

      This shows how subtle differences in probe placement & angle accuracy can lead to compounded inaccuracies at the probe tip, especially when targeting deep brain regions, even when following standard procedures. We believe this is driven partly by the accuracy limit or resolution of the stereotaxic system, along with slight deviations in probe angle, occurring during the setup of the stereotaxic coordinate system during these recordings.

      We have updated the relevant text in lines 187-190 as follows, to clarify:

      “Several trajectories missed their targets in deeper brain regions (LP, PO), as indicated by gray blocks, despite the lack of significant lab-dependent effects in targeting as reported above. These off-target trajectories tended to have both a large displacement from the target insertion coordinates and a probe angle that unfavorably drew the insertions away from thalamic nuclei (Figure 2f).”

      (4) The authors suggest that geometrical variance (difference between planned and final identified probe position acquired from reconstructed histology) in probe placement at the brain surface is driven by inaccuracies in defining the stereotaxic coordinate system, including discrepancies between skull landmarks and the underlying brain structures. In this case, the use of skull landmarks (e.g. bregma) to determine locations of brain structures might be unreliable and provide an error of ~360 microns. While it is known that there is indeed variance in the position between skull landmarks and brain areas in different animals, the quantification of this error is a useful value for the field.

      We thank the reviewer for their thoughtful comment and are glad that they found the quantification of variance useful for the field.

      (5) Why are the thalamic recording results particularly hard to reproduce? Does the anatomy of the thalamus simply make it more sensitive to small errors in probe positioning relative to the other recorded areas?

      We thank the reviewer for raising this interesting question. We believe that they are referring to Figure 4: indeed when we analyzed the distribution of firing rate modulations, we saw some failures of reproducibility in area PO (bottom panel, Figure 4h). However, the thalamic nuclei were not, in other analyses, more vulnerable to failures in reproducibility. For example, in the top panel of Figure 4h, VisAM shows failures of reproducibility for modulation by the visual stimulus. In Fig. 5i, area CA1 showed a failure of reproducibility. We fear that the figure legend title in the previous version (which referred to the thalamus specifically) was misleading, and we have revised this. The new title is, “Neural activity is modulated during decision-making in five neural structures and is variable between laboratories.” This new text more accurately reflects that there were a number of small, idiosyncratic failures of reproducibility, but that these were not restricted to a specific structure. The new analysis requested by R1 (now in Figure 7) provides further reassurance of overall reproducibility, including in the thalamus (see Fig. 7a, right panels; lab identity could not be decoded from single neuron metrics, even in the thalamus).

      Reviewer #1 (Recommendations for the authors):

      (1) Figure font sizes and formatting are variable across panels and figures. Please streamline the presentation of results.

      Thank you for your feedback. We have remade all figures with the same standardized font sizes and formatting.

      (2) Please correct the noncontinuous color scales in Figures 3b and 3d.

      Thank you for pointing this out; we have fixed the color bar.

      (3) In Figures 5d and g, the error bars are described as: 'Error bands are standard deviation across cells normalised by the square root of the number of sessions in the region'. How does one interpret this error? It seems to be related to the standard error of the mean (std/sqrt(n)) but instead of using the n from which the standard deviation is calculated (in this case across cells), the authors use the number of sessions as n. If they took the standard deviation across sessions this would be the sem across sessions, and interpretable (as sem*1.96 is the 95% parametric confidence interval of the mean). Please justify why these error bands are used here and how they can be interpreted - it also seems like it is the only time these types of error bands are used.

      We agree and for clarity use standard error across cells now, as the error bars do not change dramatically either way.
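
      A minimal sketch of the distinction at issue, using hypothetical per-cell values and a hypothetical session count rather than data from the study: the standard error across cells divides the across-cell standard deviation by the square root of the number of cells, whereas the earlier band divided the same standard deviation by the square root of the number of sessions, which is why it was hard to interpret.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: sample std divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def ci95(values):
    """Parametric 95% confidence interval of the mean (mean +/- 1.96 * SEM)."""
    centre = statistics.mean(values)
    half_width = 1.96 * sem(values)
    return centre - half_width, centre + half_width

# Hypothetical per-cell values (illustrative only, not from the paper).
per_cell = [0.8, 1.1, 0.9, 1.4, 1.0, 0.7, 1.2, 1.3]
n_sessions = 4  # hypothetical number of sessions these cells came from

# Interpretable band: std across cells, n = number of cells.
sem_across_cells = sem(per_cell)

# Earlier, mixed band: std across cells, but n = number of sessions.
mixed_band = statistics.stdev(per_cell) / math.sqrt(n_sessions)

low, high = ci95(per_cell)
print(f"SEM across cells: {sem_across_cells:.3f}")
print(f"Mixed band:       {mixed_band:.3f}")
print(f"95% CI of mean:   [{low:.3f}, {high:.3f}]")
```

      With fewer sessions than cells, the mixed band is wider than the standard error across cells, but it does not correspond to a confidence interval for either the cell-level or the session-level mean.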

      (4) It is difficult to understand what is plotted in Figures 5e,h, please unpack this further and clarify.

      Thank you for pointing this out. We have added additional explanation in the figure caption (See caption for Figure 5c) to explain the KS test.

      (5) In lines 198-201 the authors state that they were worried that Bonferroni correction with 5 criteria would be too lenient, and therefore used 0.01 as alpha. I am unsure whether the authors mean that they are correcting for multiple comparisons across features or areas. Either way, 0.01 alpha is exactly what a Bonferroni corrected alpha would be when correcting for either 5 features or 5 areas: 0.05/5=0.01. Or do they mean they apply the Bonferroni correction to the new 0.01 alpha: i.e., 0.01/5=0.002? Please clarify.

      Thank you, that was indeed written confusingly. We considered all tests and regions as a whole, so 7 tests * 5 regions = 35 tests, which would result in a very strong Bonferroni correction. If one instead considers the different tests individually, the correction we apply from 0.05 to 0.01 can be regarded as correcting for the number of regions, which we now highlight better. We apply no further corrections of any kind to our alpha=0.01. We clarified this in the manuscript in all relevant places (lines 205-208, 246, 297-298, and 726-727).
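
      The arithmetic behind the two readings can be written out in a short sketch. The helper name is hypothetical; the counts (7 tests, 5 regions) and the family-wise alpha of 0.05 are those given in the response.

```python
def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni correction."""
    return family_alpha / n_comparisons

n_tests, n_regions = 7, 5

# Correcting across regions only, for each test considered separately.
alpha_per_region = bonferroni_alpha(0.05, n_regions)

# Correcting across every test-by-region combination (35 comparisons).
alpha_all = bonferroni_alpha(0.05, n_tests * n_regions)

print(f"Across {n_regions} regions: alpha = {alpha_per_region:.4f}")
print(f"Across {n_tests * n_regions} comparisons: alpha = {alpha_all:.5f}")
```

      The first value matches the alpha of 0.01 used in the manuscript; the second shows how much stricter the threshold would be if all 35 comparisons were treated as one family.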

      (6) Did the authors take into account how many times a probe was used/how clean the probe was before each recording. Was this streamlined between labs? This can have an effect on yield and quality of recording.

      We appreciate the reviewer highlighting the potential impact of probe use and cleanliness on recording quality and yield. While we did not track the number of times each probe was used, we ensured that all probes were cleaned thoroughly after each use using a standardized cleaning protocol (Section 16: Cleaning the electrode after data acquisition in Appendix 2: IBL protocol for electrophysiology recording using Neuropixels probe). We acknowledge that tracking the specific usage history of each probe could provide additional insights, but unfortunately we did not track this information for this project. In prior work the re-usability of probes has been quantified, showing insignificant degradation with use (e.g. Extended Data Fig 7d from Jun et al. 2017).

      (7) Figure 3, Supplement1: DY_013 missed DG entirely? Was this included in the analysis?

      Thank you for this question. We believe the reviewer is referring to the lack of a prominent high-amplitude LFP band in this mouse, and lack of high-quality sorted units in that region. Despite this, our histology did localize the recording trajectory to DG. This recording did pass our quality control criteria overall, as indicated by the green label, and was used in relevant analyses.

      The lack of normal LFP features and neuron yield might reflect the range of biological variability (several other sessions also have relatively weak DG LFP and yield, though DY_013 is the weakest), or could reflect some damage to the tissue, for example as caused by local bleeding. Because we could not conclusively identify the source of this observation, we did not exclude it.

      (8) Given that the authors argue for using the MTNN over GLMs, it would be useful to know exactly how much better the MTNN is at predicting activity in the held-out dataset (shown in Figure 7, Supplement 1). It looks like a very small increase in prediction performance between MTNN and GLMs, is it significantly different?

      The average variance explained on the held-out dataset, as shown in Figure 8–Figure Supplement 1 Panel B, is 0.065 for the GLMs and 0.071 for the MTNN. As the reviewer correctly noted, this difference is not significant. However, one of the key advantages of the MTNN over GLMs lies in its flexibility to easily incorporate covariates, such as electrophysiological characteristics or session/lab IDs, directly into the analysis. This feature is particularly valuable for assessing effect sizes and understanding the contributions of various factors.

      (9) In line 723: why is the threshold for mean firing rate for a unit to be included in the MTNN results so high (>5Hz), and how does it perform on units with lower firing rates?      

      We thank the reviewer for pointing this out. The threshold for including units with a mean firing rate above 5 Hz was set because most units with firing rates below this threshold were silent in many trials, and reducing the number of units helped keep the MTNN training time reasonable. Based on this comment, we ran the MTNN experiments including all units with firing rates above 1 Hz, and the results remained consistent with our previous conclusions (Figure 8). Crucially, the leave-one-out analysis consistently showed that lab and session IDs had effect sizes close to zero, indicating that both within-lab and between-lab random effects are small and comparable.

      Reviewer #2 (Recommendations for the authors):

      (1) Most of the more major issues were already listed in the above comments. The strongest recommendation for additional work would be to improve the description and implementation of the RIGOR statistics such that non-IBL labs that might use Neuropixels probes but not use the entire IBL pipeline might be able to apply the RIGOR framework to their own data.

      We thank the reviewer for highlighting the importance of making the RIGOR statistics more accessible to a broader audience. We agree that improving the description and implementation of the RIGOR framework is essential for enabling non-IBL labs that use Neuropixels probes to apply it. To address this, we created a Jupyter notebook with step-by-step guidance that is not dependent on the IBL pipeline. This tool (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/develop/RIGOR_script.ipynb) is publicly available through the repository, accompanied by example datasets and usage tutorials.

      (2) Table 1: How are qualitative features like "drift" defined? Some quantitative statistics like "presence ratio" (the fraction of the dataset where spikes are present) already exist in packages like ecephys_spike_sorting. Who measured these qualitative features? What are the best practices for doing these qualitative analyses?

      At the probe level, we compute the estimate of the relative motion of the electrodes to the brain tissue at multiple depths along the electrode. We overlay the drift estimation over a raster plot to detect sharp displacements as a function of time. Quantitatively, the drift is the cumulative absolute electrode motion estimated during spike sorting (µm). We clarified the corresponding text in Table 1.

      The qualitative assessments were carried out by IBL staff and experimentalists. We have now provided code to run the RIGOR metrics along with an embedded tutorial, to complement the supplemental figures we have shown about qualitative metric interpretation.
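
      To make the two quantitative definitions mentioned in this exchange concrete, here is a minimal sketch under stated assumptions. It is not the IBL implementation and not the `ecephys_spike_sorting` API; the function names, the input format (a list of displacement estimates in µm and a list of spike times in seconds), and the number of bins are all hypothetical choices for illustration.

```python
def cumulative_drift_um(displacement_um):
    """Cumulative absolute motion: sum of |change| between successive
    displacement estimates (in micrometres)."""
    return sum(abs(b - a) for a, b in zip(displacement_um, displacement_um[1:]))

def presence_ratio(spike_times_s, t_start_s, t_stop_s, n_bins=100):
    """Fraction of equal-width time bins that contain at least one spike."""
    bin_width = (t_stop_s - t_start_s) / n_bins
    occupied = set()
    for t in spike_times_s:
        if t_start_s <= t < t_stop_s:
            # Clamp to guard against floating-point rounding at the upper edge.
            index = min(int((t - t_start_s) / bin_width), n_bins - 1)
            occupied.add(index)
    return len(occupied) / n_bins

# Hypothetical displacement estimates over time (micrometres).
drift_trace = [0.0, 2.0, 1.0, 4.0]
total_drift = cumulative_drift_um(drift_trace)

# Hypothetical spike times (seconds) in a 10 s recording split into 10 bins.
ratio = presence_ratio([0.5, 1.5, 1.7, 9.9], 0.0, 10.0, n_bins=10)

print(f"Cumulative drift: {total_drift} um")
print(f"Presence ratio:   {ratio}")
```

      A trace that moves back and forth accumulates drift even when its net displacement is small, which is the behavior a cumulative absolute measure is meant to capture.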

      (3) Table 1: What are the units for the LFP derivative?

      We thank the reviewer for noting that the unit was missing. The unit (decibel per unit of space) is now in the table.

      (4) Table 1: For "amplitude cutoff", the table says that "each neuron must pass a metric". What is the metric?

      We have revised the table to include this information. This metric was designed to detect potential issues in amplitude distributions caused by thresholding during deconvolution, which could result in missed spikes. There are quantitative thresholds on the distribution of the low tail of the amplitude histogram relative to the high tail, and on the relative magnitude of the bins in the low tail. We now reference the methods text from the table, which includes a more extended description and gives the specific threshold numbers. Also, the metric and thresholds are more easily understood with graphical assistance; see the IBL Spike Sorting Whitepaper for this (Fig. 17 in that document and nearby text; https://doi.org/10.6084/m9.figshare.19705522.v4). This reference is now also cited in the text.

      (5) Figure 2: In panel A, the brain images look corrupted.

      Thanks; in the revised version we have changed the filetype to improve the quality of the panel image.

      (6) Figure 7: In panel D, make R2 into R^2 (with a superscript)

      Panel D y-axis label has been revised to include superscript (note that this figure is now Figure 8).

      Works Cited

      Julie M.J. Fabre, Enny H. van Beest, Andrew J. Peters, Matteo Carandini, and Kenneth D. Harris. Bombcell: automated curation and cell classification of spike-sorted electrophysiology data, July 2023. URL https://doi.org/10.5281/zenodo.8172822.

      James J. Jun, Nicholas A. Steinmetz, Joshua H. Siegle, Daniel J. Denman, Marius Bauza, Brian Barbarits, Albert K. Lee, Costas A. Anastassiou, Alexandru Andrei, Çağatay Aydın, Mladen Barbic, Timothy J. Blanche, Vincent Bonin, João Couto, Barundeb Dutta, Sergey L. Gratiy, Diego A. Gutnisky, Michael Häusser, Bill Karsh, Peter Ledochowitsch, Carolina Mora Lopez, Catalin Mitelut, Silke Musa, Michael Okun, Marius Pachitariu, Jan Putzeys, P. Dylan Rich, Cyrille Rossant, Wei-lung Sun, Karel Svoboda, Matteo Carandini, Kenneth D. Harris, Christof Koch, John O’Keefe, and Timothy D. Harris. Fully integrated silicon probes for high-density recording of neural activity. Nature, 551(7679):232–236, Nov 2017. ISSN 1476-4687. doi: 10.1038/nature24636. URL https://doi.org/10.1038/nature24636.

      Simon Musall, Xiaonan R. Sun, Hemanth Mohan, Xu An, Steven Gluf, Shu-Jing Li, Rhonda Drewes, Emma Cravo, Irene Lenzi, Chaoqun Yin, Björn M. Kampa, and Anne K. Churchland. Pyramidal cell types drive functionally distinct cortical activity patterns during decision-making. Nature Neuroscience, 26(3):495–505, Mar 2023. ISSN 1546-1726. doi: 10.1038/s41593-022-01245-9. URL https://doi.org/10.1038/s41593-022-01245-9.

      Ivana Orsolic, Maxime Rio, Thomas D Mrsic-Flogel, and Petr Znamenskiy. Mesoscale cortical dynamics reflect the interaction of sensory evidence and temporal expectation during perceptual decision-making. Neuron, 109(11):1861–1875.e10, April 2021.

      Hyeong-Dong Park, Stéphanie Correia, Antoine Ducorps, and Catherine Tallon-Baudry. Spontaneous fluctuations in neural responses to heartbeats predict visual detection. Nature Neuroscience, 17(4):612–618, Apr 2014. ISSN 1546-1726. doi: 10.1038/nn.3671. URL https://doi.org/10.1038/nn.3671.

      Lorenzo Posani, Shuqi Wang, Samuel Muscinelli, Liam Paninski, and Stefano Fusi. Rarely categorical, always high-dimensional: how the neural code changes along the cortical hierarchy. bioRxiv, 2024. doi: 10.1101/2024.11.15.623878. URL https://www.biorxiv.org/content/early/2024/12/09/2024.11.15.623878.

      Nicholas A. Steinmetz, Christina Buetfering, Jerome Lecoq, Christian R. Lee, Andrew J. Peters, Elina A. K. Jacobs, Philip Coen, Douglas R. Ollerenshaw, Matthew T. Valley, Saskia E. J. de Vries, Marina Garrett, Jun Zhuang, Peter A. Groblewski, Sahar Manavi, Jesse Miles, Casey White, Eric Lee, Fiona Griffin, Joshua D. Larkin, Kate Roll, Sissy Cross, Thuyanh V. Nguyen, Rachael Larsen, Julie Pendergraft, Tanya Daigle, Bosiljka Tasic, Carol L. Thompson, Jack Waters, Shawn Olsen, David J. Margolis, Hongkui Zeng, Michael Hausser, Matteo Carandini, and Kenneth D. Harris. Aberrant cortical activity in multiple gcamp6-expressing transgenic mouse lines. eNeuro, 4(5), 2017. doi: 10.1523/ENEURO.0207-17.2017. URL https://www.eneuro.org/content/4/5/ENEURO.0207-17.2017.

      Nicholas A. Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth D. Harris. Distributed coding of choice, action and engagement across the mouse brain. Nature, 576(7786):266–273, Dec 2019. ISSN 1476-4687. doi: 10.1038/s41586-019-1787-x. URL https://doi.org/10.1038/s41586-019-1787-x.

      Nicholas A. Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, Susu Chen, Jennifer Colonell, Richard J. Gardner, Bill Karsh, Fabian Kloosterman, Dimitar Kostadinov, Carolina Mora-Lopez, John O’Callaghan, Junchol Park, Jan Putzeys, Britton Sauerbrei, Rik J. J. van Daal, Abraham Z. Vollan, Shiwei Wang, Marleen Welkenhuysen, Zhiwen Ye, Joshua T. Dudman, Barundeb Dutta, Adam W. Hantman, Kenneth D. Harris, Albert K. Lee, Edvard I. Moser, John O’Keefe, Alfonso Renart, Karel Svoboda, Michael Häusser, Sebastian Haesler, Matteo Carandini, and Timothy D. Harris. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science, 372(6539):eabf4588, 2021. doi: 10.1126/science.abf4588. URL https://www.science.org/doi/abs/10.1126/science.abf4588.

      Charlie Windolf, Han Yu, Angelique C. Paulk, Domokos Meszéna, William Muñoz, Julien Boussard, Richard Hardstone, Irene Caprara, Mohsen Jamali, Yoav Kfir, Duo Xu, Jason E. Chung, Kristin K. Sellers, Zhiwen Ye, Jordan Shaker, Anna Lebedeva, Manu Raghavan, Eric Trautmann, Max Melin, João Couto, Samuel Garcia, Brian Coughlin, Csaba Horváth, Richárd Fiáth, István Ulbert, J. Anthony Movshon, Michael N. Shadlen, Mark M. Churchland, Anne K. Churchland, Nicholas A. Steinmetz, Edward F. Chang, Jeffrey S. Schweitzer, Ziv M. Williams, Sydney S. Cash, Liam Paninski, and Erdem Varol. Dredge: robust motion correction for high-density extracellular recordings across species. bioRxiv, 2023. doi: 10.1101/2023.10.24.563768. URL https://www.biorxiv.org/content/early/2023/10/29/2023.10.24.563768.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them, is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones. 

      Response 1.1. Thank you for your comments, which we address in detail below and in our response to Recommendations to the authors (see pp. 15-19 of this letter). We would first like to clarify the motivation behind our use of a stroke population to understand the interactions between the control of reaching and holding. We agree that this idea can be laid out in a more compelling way.

      The fact that stroke patients usually display issues with their control of both reaching and holding, allows for within-individual comparisons of those two modes of control. Further, the magnitude of abnormalities is relatively large, making it easier to measure, compare and investigate effects. And, importantly, these two modes of control can be differentially affected after stroke (also pointed out by Reviewer 2, point 4 in Comments to the Authors). Finally, this kind of work – examining interactions between positive signs of stroke (such as abnormal posture or synergy) vs. negative signs (such as loss of motor control) – needs to be done in humans, as positive signs are relatively absent even in primates (Tower, 1940).

      We have changed our abstract (changes shown below in red), and our intro (expanding the second paragraph, lines 75-76), to lay out our motivation more clearly.

      From the abstract:

      “The paretic arm after stroke exhibits different abnormalities during rest vs. movement, providing an opportunity to ask whether control of these behaviors is independently affected in stroke.”

      On the other hand, the relation between force bias and the well-recognized flexor synergy seems rather self-evident, and I don't see that these results add much to that story.

      Response 1.2. While it seems natural that these biases would be the resting expression of abnormal flexor synergies (given their directionality towards the body, as shown in Figures 2-3, and the other similarities we demonstrate in Figure 8), we do not believe it is self-evident. These biases are measured at rest, with the patient passively moved and held still, whereas abnormal synergies emerge when the patient actively tries to move. The lack of relationship we find between these resting force biases and active movement underlines that the relation between force bias and flexor synergy should not be taken as self-evident, making it worthwhile to examine it (as we motivate in lines 589-596 and show in Figure 8).

      The paradox here is that, in spite of a relationship between force bias and flexor synergy (itself manifesting during attempted movement), there seems to be no relationship between force bias and direct measures of active movement (Figures 5,6). This is the paradox that inspired our conceptual model (Figure 9) and inspires us to further investigate the factors under which these two systems are intermingled or kept separate. We thus find it to be a helpful element in the story.

      I am also struck by what seems to be a contradiction between the conclusions of the current and former studies: "These findings in stroke suggest that moving and holding still are functionally separable modes of control" and "the commands that hold the arm and finger at a target location depend on the mathematical integration of the commands that moved the limb to that location." The former study is mentioned here only in passing, in a single phrase in the discussion, with no consideration of the relation between the two studies. This is odd and should be addressed. 

      Response 1.3. While these two sets of findings are not contradictory, we understand how they can appear as such without providing context. We now discuss the relationship between our present study and the previous one more directly (lines 66-70 and 663-669 of the revised manuscript).

      The previous study examined how the control of movement informs the control of holding after the movement was over; the current study examines whether abnormalities in holding, measured at rest with the movement leading to the rest position being passive, affect active movement. There are thus two important distinctions:

      First, directionality of potential effects: here we examine the effect of (abnormalities in) holding control upon movement, but the 2020 study (Albert et al., 2020) examines the effects of movement upon holding control. Stroke patient data in the 2020 study showed that, under CST damage, while the reach controller is disrupted, the hold controller can continue to integrate the malformed reach commands faithfully. In line with this, we proposed a model where the postural controller system sits downstream of the moving controller (Figure 7G in the 2020 paper). We thus did not claim, in 2020, that integration of movement commands is the only way to determine posture control, as we stated explicitly back then, e.g. (emphasis ours):

      “Equations (1) and (2) describe how the integration of move activity may relate to changes in hold commands, but does not specify the hold command at the target.”

      In short, finding no effect of holding abnormalities upon movement (present finding) does not mean there is no potential effect of movement upon holding (2020 finding). This is something we had alluded to in the Discussion but not clarified, which we do now (see edits at the end of our response to this point).

      Second, active vs. passive movement: here, we measure holding control at rest (Experiment 1). The 2020 study shows that endpoint forces reflect the integration of learned dynamics exerted during active movement that led to the endpoint position. However, in Experiment 1, there is no active reaching to integrate, as the robot passively moves the arm to the held position. Thus, resting postural forces measured in Experiment 1 could not reflect the integration of reach commands that led to each rest position.  

      Thus, the two sets of findings are not contradictory. Taking our current and 2020 findings together suggests that active holding control would reflect both the integration of the movement control that led to assuming the held position and the force biases measured at rest.

      Hence our decision to describe these two systems as functionally separable: while these systems can interact, the effects of post-stroke malfunctions in each can be independent depending on the function and conditions at hand. This does not make this a limited finding: being able to dissociate post-stroke impairment based on each of these two modes of control may inform rehabilitation, and also importantly, understanding the conditions in which these two modes of control become separable can substantially advance our understanding of both how different stroke signs interact with each other and how motor control is assembled in the healthy motor system. Figure 9 illustrates our conceptual model behind this and may serve as a blueprint to further dissect these circuits in the future.

      We discuss these issues briefly in lines 663-669 in our Discussion section, reproduced below for convenience:

      “It should be noted, however, that having distinct neural circuits for reaching and holding does not rule out interactions between them. For example, we recently demonstrated how arm holding control reflects the integration of motor commands driving the preceding active movement that led to the hold position, in both healthy participants and patients with hemiparesis (Albert et al., 2020). However, in that paper, we did not claim that this integration is the only source of holding control. Indeed, in Experiment 1 of the current study, we used passive movement to bring the arm to each probed position, which means that the postural biases could not be the result of integration of motor commands.” 

      And, we have adjusted our Introduction to provide pertinent context regarding our 2020 work (first paragraph, lines 66-70 of the updated manuscript).

      A minor wording concern I had is that the term "holding still" is frequently hard to parse. A couple of examples: "These findings in stroke suggest that moving and holding still are functionally separable modes of control." This example is easily read, "moving and holding [continue to be] functionally separable". Another: "...active reaching and holding still in the same workspace, " could be "...active reaching and holding [are] still in the same workspace." Simply "holding", "posture" or "posture maintenance" would all be better options.

      Response 1.4. Thank you for your suggestion. Following your comment, we have abbreviated this term to simply “holding”, both in the title and throughout the text.

      Reviewer #2 (Public Review):

      Summary: 

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Strengths: 

      The strength of the paper is that they clearly demonstrate with the posture task (i.e., active holding against a load) that the resting postural forces influence subsequent control (i.e., the path to stabilize, time to stabilize, max. deviation) following a sudden perturbation (i.e., sudden removal of the load). Further, they can explain their findings with a conceptual model, which is depicted in Figure 9.

      Weaknesses: 

      Current weaknesses and potential concerns relate to i) not displaying or reporting the results of healthy controls and non-paretic arm in Experiment 2 and ii) large differences in force perturbation waveforms between movement (sudden onset) and posture (sudden release), which could potentially influence the results and or interpretation. 

      Response 2.0. Thank you for your assessment, and for pointing out ways to improve our paper. We address the weakness and potential concerns in detail below.

      Larger concerns

      (1) Additional analyses to further support the interpretation. In Experiment 1 the authors present the results for the paretic arm, non-paretic arm, and controls. However, in Experiment 2 for several key analyses, they only report summary statistics for the paretic arm (Figure 5D-I; Figure 6D-E; Figure 7F). It is understood that the controls have much smaller resting postural force biases, but they are still present (Figure 3B). It would strengthen the position of the paper to show that controls and the non-paretic arm are not influenced by resting postural force biases during movement and particularly during posture, while acknowledging the caveat that the resting positional forces are smaller in these groups. It is recommended that the authors report and display the results shown in Figure 5D-I; Figure 6D-E; Figure 7F for the controls and non-paretic arm. If these results are all null, the authors could alternatively place these results in an additional supplementary. 

      Response 2.1a. Thank you for your recommendations. We agree both on the value of these analyses and the caveat associated with them: these resting postural force biases are substantially smaller for the non-paretic and control data (for example, the magnitude of resting biases in the supported condition is 2.8±0.4N for the paretic data, but only 1.8±0.4N and 1.3±0.2N for the non-paretic and control data, respectively; the difference is even greater in the unsupported condition, though this is not the one being compared to Experiment 2).

      We now conduct a comprehensive series of supplementary analyses, including the examination of non-paretic and control data for all three components of Experiment 2 (unperturbed reaches; pulse perturbations; and active holding control). These are mentioned in the Results (lines 422-424, 512-513, and 574-574 of the revised manuscript) and illustrated in the supplementary materials: Supplementary Figures S5-1, S6-1, and S7-1 contain the main analyses (comparisons of instances with the most extreme resting biases for each individual) for the unperturbed reach analysis, pulse perturbation analysis, and active holding control analysis, respectively.

      We find that non-paretic and control data do not display effects of resting biases upon unperturbed reaching control (Figure S5-1) or control against a pulse perturbation early during movement (Figure S6-1) – as is the case with the paretic data. Non-paretic and control data do not display evidence of influence of their resting force biases upon active holding control either (Figure S7-1), unlike the paretic data. For the non-paretic data, however, these influences are nominally towards the same direction as in the paretic data. Given that resting biases are substantially weaker for the non-paretic case, it is possible a similar relationship exists but requires increased statistical power to discern. Moreover, it is possible that the effect of resting biases is non-linear, with small biases effectively kept under check so that their impact upon active holding control is even less than a linearly scaled version of the impact of the stronger, paretic-side biases. This can be the subject of future work.

      Please also note that, following your recommendation (Recommendations to the Authors, point 2.1), we have conducted secondary analyses which estimate sensitivity to resting bias using all datapoints, validating our main analyses; these analyses were also performed for control and non-paretic data, with similar results (Response 2.A.1).

      Further, the results could be further boosted by reporting/displaying additional analyses. In Figure 6D the authors performed a correlation analysis. Can they also display the same analysis for initial deviation and endpoint deviation for the data shown in Figure 5D-F & 5G-I, as well for 7F for the path to stabilization, time to stabilization, and max deviation? This will also create consistency in the analyses performed for each dependent variable across the paper.

      Response 2.1b. Here, we set out to test whether resting biases affect movement. It is best to do this using a within-individual comparison design, rather than using across-individual correlations: while correlation analyses can in general be informative, they obscure within-individual effects which are the main comparisons of interest in our study. Consider a participant with strong resting bias towards one direction, tested on opposing perturbations; averaging these responses for each individual would mostly cancel out any effects of resting biases. Even if we were to align responses to the direction of the perturbation before averaging, the power of correlation analyses may be diluted by inter-individual differences in other factors, such as overall stiffness.

      Thus, our analysis design was instead focused on examining the differential effects of resting posture biases within each individual’s data. We compared the most extreme opposing/aligned or clockwise/counter-clockwise instances within each individual, specifically to assess these differential effects. In our revised version, we have further reinforced these analyses to include all data rather than the most extreme instances (see response 2.A.1.a to the Reviewer’s recommendation to the authors) where we performed correlations of within-individual resting posture vs. the corresponding dependent variables and compared the resulting slopes. 
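
      The reasoning for preferring within-individual slopes can be illustrated with a minimal sketch. The participants, bias values, and responses below are entirely hypothetical and constructed for illustration; the least-squares helper is a generic implementation, not the analysis code used in the study.

```python
def ls_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical data: resting bias projected onto the perturbation axis (N)
# and the corresponding response measure (cm) for each probed condition.
participants = {
    "P1": {"bias": [-3.0, -1.0, 1.0, 3.0], "resp": [4.4, 4.8, 5.2, 5.6]},
    "P2": {"bias": [-2.0, -1.0, 1.0, 2.0], "resp": [1.6, 1.8, 2.2, 2.4]},
}

# Within-individual sensitivity to resting bias: one slope per participant.
within_slopes = {
    pid: ls_slope(d["bias"], d["resp"]) for pid, d in participants.items()
}

# Per-participant averages, as an across-individual analysis would use.
mean_bias = {pid: sum(d["bias"]) / len(d["bias"]) for pid, d in participants.items()}
mean_resp = {pid: sum(d["resp"]) / len(d["resp"]) for pid, d in participants.items()}

for pid in participants:
    print(pid, f"slope={within_slopes[pid]:.2f}",
          f"mean bias={mean_bias[pid]:.2f}", f"mean resp={mean_resp[pid]:.2f}")
```

      In this constructed example both participants show the same sensitivity to resting bias, yet their averaged biases are zero and their averaged responses differ only through an unrelated offset (standing in for a factor such as overall stiffness), so an analysis of per-participant averages would carry no information about the bias effect.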

      The across-individual correlation analyses add little to that for the reasons we outlined above. At the same time, it is possible they can be helpful in e.g. illustrating across-individual variability. We thus now include across-individual correlation analyses for all dependent variables, but, given their limited value, only in the supplementary material. This also means that, for consistency, we moved the correlation analysis in Figure 6 to the corresponding supplementary figure as well (Figure S6-3).

      In addition, following the Reviewer’s comment about consistency in the analyses performed for each dependent variable across the paper, we added within-individual comparisons for settling time following the pulse perturbations (Figure 6D, right).

      (2) Inconsistency in perturbations that would differentially impact muscle and limb states during movement and posture. It is well known that differences in muscle state (activation / preloaded, muscle fiber length and velocity) and limb state (position and velocity) impact sensorimotor control (Pruszynski, J. A., & Scott, S. H. (2012). Experimental brain research, 218, 341-359.). Of course, it is appreciated that it is not possible to completely control all states when comparing movement and posture (i.e., muscle and limb velocity). However, using different perturbations differentially impacts muscle and limb states. Within this paper, the authors used very different force waveforms for movement perturbations (i.e., 12 N peak, bell-shaped, 0.7ms duration -> sudden force onset to push the limb; Figure 6A) and posture perturbations (i.e., 6N, 2s ramp up -> 3s hold -> sudden force release that resulted in limb movement; Figure 4) that would differentially impact muscle (and limb) states. Preloaded muscle (as in the posture perturbation) has a very different response compared to muscle that has little preload (as in the movement perturbations, where muscles that would resist a sudden lateral perturbation would likely be less activated since they are not contributing to the forward movement). Would the results hold if the same perturbation had been used for both posture and movement (e.g., 12 N pulse for both experiments)? It is recommended that the authors comment and discuss in the paper why they chose different perturbations and how that might impact the results. 

      Response 2.2a. We agree that it can be impossible to completely control all states when comparing movement and posture. We would also like to stress that these perturbations were not designed so that responses are directly compared to each other (though of course there is an indirect comparison in the sense that we show influence of biases in one type of perturbation but not the other). Instead, Experiment 2 tried to implement a probe optimized for each motor control modality (moving vs. holding). However, the Reviewer has a point that the potential impact of differences between the perturbations is important to discuss in the paper.

      The Reviewer points out two potentially interesting differences between the two perturbations. First, the magnitude (6N for the posture perturbation vs. 12N for the pulse perturbation); second, the presence of background load in the posture perturbation, in contrast to the pulse perturbation.

      For the movement perturbation, we used a 12-N, 70ms pulse. This perturbation and scaled versions have been tested before in both control and patient populations (Smith et al., 2000; Fine and Thoroughman, 2006). For the holding perturbation, we used a background load to ensure that active holding control is engaged, and the duration of the probe (holding for about 5s) made using a stronger perturbation impractical: maintaining a background load at, say, 12N for that long could lead to increased fatigue.

      The question raised by the Reviewer, whether the findings would be the same if the same, 12-N pulse were used to probe both moving and holding control, is interesting to investigate. We would expect the same qualitative findings (i.e. there would still be a connection between resting posture and active holding control when the latter were probed with a 12N pulse). Recent work provides more specific insight into what to expect. Our posture perturbation task is similar to the Unload Task in (Lowrey et al., 2019), whereby a background torque is released, whereas our pulse perturbation is more similar to their Load Task, whereby a torque is imposed against no background load (though it is a step perturbation rather than a pulse). Lowrey et al., 2019 find that their Unload task is harder than the Load task, with 2x the fraction of patient trials classified as failed (with failure defined as task performance being outside of the 95% confidence interval for controls), though there are still clear effects for the Load task. 

      This suggests that the potential effects of using a pulse-like perturbation to probe posture control would likely be weaker in magnitude, all other things being equal. At the same time, however, the Load and Unload tasks in Lowrey et al., 2019 were perturbations of the same magnitude; it is thus also likely that the reduction in effect would be mitigated, or reversed, by the fact that we would be using a 12N instead of a 6N perturbation.

      A relevant consequence of the Lowrey et al., 2019 findings is that the Unload paradigm is superior in its ability to detect impairment in static, posture perturbations, and thus provides a better signal to detect potential relationships with resting posture biases. This is not surprising, as a background load further engages the control of active holding, which is what we were trying to probe in the first place.

      But then why not use the same paradigm (preloading and release) for movement? There are two main reasons. First, requiring a background load throughout the experiment is unfeasible due to fatigue. Second, for the holding perturbation, we wanted to ensure that the postural control system is meaningfully engaged when the perturbation hits, hence we picked the background load. Were we to impose the same during moving (i.e. impose a lateral background load on the movement), we could be engaging posture control on top of movement control. This preloading would reduce the degree to which the pulse probe isolates movement control, and lead to intrusion of the posture control system in the movement task by design. This relates to what the Reviewer proposes in the comment below: preloading may result in postural biases, i.e. engage posture control; see below, where we argue that this interpretation is within the scope of our conceptual model rather than a counter to it.

      We now explain the rationale behind our perturbation design in the Methods section (lines 211-220).

      Relatedly, an alternative interpretation of the results is that preloading muscle for stroke patients, whether by supporting the weight of one's arm (experiment 1) or statically resisting a load prior to force release (experiment 2), leads to a greater postural force bias that can subsequently influence control. It is recommended that the authors comment on this. 

      Response 2.2b. We find this interpretation valid, but we do not see how it meaningfully differs from the framework we propose. We already state that the RST may be tailored for both posture/holding control and the production of large forces (which would include muscle preloading):

      “Thus, the accumulated evidence suggests that the RST could control posture and large force production in the upper limb.” (lines 698-699 in the current version)

      “the RST, in contrast, is weighted more towards slower postural control and generation of large isometric forces” (lines 724-726 in the current version)

      And, we discuss other conditions where the RST is involved in large force production, such as power grip, and how these interact with the role of the RST in posture/holding control (lines 758-768 in the current version).

      To better explain our model, we now provide the two examples mentioned by the reviewer along with our description of the proposed role for the RST (lines 726-727):

      “…the RST, in contrast, is weighted more towards slower postural control and generation of large isometric forces (such as vertical forces for arm support, or horizontal forces for holding the arm still against a background load like in our posture/release perturbation trials).”

      We note, however, that we find resting posture abnormalities even in the presence of arm support, suggesting the involvement of the RST in holding control even when the forces involved (and the need to preload the muscle) are small.

      Reviewer #3 (Public Review): 

      The authors attempt to dissociate differences in resting vs active vs perturbed movement biases in people with motor deficits resulting from stroke. The analysis of movement utilizes techniques that are similar to previous motor control in both humans and non-human primates, to assess impairments related to sensorimotor injuries. In this regard, the authors provide additional support to the extensive literature describing movement abnormalities in patients with hemiparesis both at rest and during active movement. The authors describe their intention to separate out the contribution of holding still at a position vs active movement as a demonstration that these two aspects of motor control are controlled by two separate control regimes.

      Strengths: 

      (1) The authors utilize a device that is the same or similar to devices previously used to investigate motor control of movement in normal and impaired conditions in humans and non-human primates. This allows comparisons to existing motor control studies. 

      (2) Experiment 1 demonstrates resting flexion biases both in supported and unsupported forelimb conditions. These biases show a correlated relationship with FM-UE scores, suggesting that the degree of motor impairment and the degree of resting bias are related.

      (3) The stroke patient participant population had a wide range of both levels of impairment and time since stroke, including both sub-acute and chronic cases allowing the results to be compared across impairment levels.

      The authors describe several results from their study: 1. Postural biases were systematically toward the body (flexion) and increased with distance from the body (when the arm was more extended) and were stronger when the arm was unsupported. 2. These postural biases were correlated with FM-UE score. 3. They found no evidence of postural biases impacting movement, even when that movement was perturbed. 4. When holding a position at the end of a movement, if the position was perturbed opposite of the direction of bias, movement back to the target was improved compared to the perturbation in the direction of bias. Taken together, the authors suggest that there are at least two separate motor controls for tasks at rest versus with motion. Further, the authors propose that these results indicate that there is an imbalance between cortical control of movement (through the corticospinal tracts) and postural control (through the reticulospinal tract).

      Response 3.1. Thank you for pointing out some of the strengths of our work and summarizing our findings. A minor clarification we would like to make, related to (3), is that, while our study did enroll two patients towards the end of the subacute stage (2-3 months), the rest of the population were at the chronic stage, at one year and beyond. We thus find it very unlikely that time after stroke was the primary driver of differences in impairment in the population we studied.

      There are several weaknesses related to the interpretation of the results:

      In Experiment 1, the participants are instructed to keep their limbs in a passive position after being moved. The authors show that, in the impaired limb, these resting biases are significantly higher when the limb is unsupported and increase when the arm is moved to a more extended position.

      When supported by the air sled, the arm is in a purely passive position, not requiring the same antigravity response so will have less RST but also less CST involvement. While the unsupported task invokes more involvement of the reticulospinal tract (RST), it likely also has significantly higher CST involvement due to the increased difficulty and novelty of the task.

      If there were an imbalance in CST regulating RST as proposed by the authors, the bias should be higher in the supported condition as there should be relatively less CST activation/involvement/ modulation leading to less moderating input onto the RST and introducing postural biases. In the unsupported condition, there is likely more CST involvement, potentially leading to an increased modulatory effect on RST. If the proportion of CST involvement significantly outweighs the RST activation in the unsupported task, then it isn't obvious that there is a clear differentiation of motor control. As the degree of resting force bias and FM-UE score are correlated, an argument could be made that they are both measuring the impairment of the CST unrelated to any RST output. If it is purely the balance of CST integrity compared to RST, then the degree of bias should have been the same in both conditions. In this idea of controller vs modulator, it is unclear when this switch occurs or how to weigh individual contributions of CST vs. extrapyramidal tracts. Further, it isn't clear why less modulation on the RST would lead only to abnormal flexion.

      Response 3.2. Our model posits two mechanisms by which CST impairment would lead to increased RST involvement. The first – which is the one discussed by the Reviewer here - is a direct one, whereby weaker modulation of the RST by the CST leads to increased RST involvement. The second is an indirect one, whereby the incapacity of CST to drive sufficient motor output to deal with tasks eventually leads to increased RST drive.

      The reviewer suggests it is likely that the unsupported task demands increased activation through both the CST and the RST. If that were the case, however, it would exaggerate the effects of CST/RST imbalance after stroke compared to healthy motor control: if task conditions (lack of support) required higher CST involvement, then CST damage would have an even larger effect. In turn, this would lead to even higher RST involvement and further diminish the ability of the CST to moderate the RST. Thus, RST-driven biases would be higher in the unsupported condition.

      And, given that the CST itself is damaged and has to deal with an even greater RST activation, we would not expect that the proportion of CST involvement would outweigh RST activation, but the opposite. In fact, a series of relatively recent findings suggest just this. For example,

      • Zaaimi et al., 2012 showed that unilateral CST lesions in monkeys lead to significant increases in the excitability of the contralesional RST. Interestingly, this effect was present in flexors but not extensors, potentially explaining why less modulation and/or overactivation of the RST would primarily lead to abnormal flexion.

      • McPherson et al., 2018 (further discussed in point 2.A.23, by Reviewer 2 – Recommendations to the Authors) showed that, after stroke, contralesional activity (which would include the ipsilateral RST) increases relative to ipsilesional activity (which would include the contralateral CST). The same study also provides evidence that FM-UE may primarily reflect RST-driven impairment. The ipsilateral(RST)/contralateral(CST) balance, expressed as a laterality index, correlated with FM-UE, with lower FM-UE for indices indicating higher RST involvement. (Interestingly, the slope of this relationship was steeper when the laterality of brain activation patterns was examined under tasks with less arm support, mirroring the steeper FM-UE vs resting bias slope when arm support is absent, as shown in our Figure 8).

      • Wilkins et al., 2020 found that providing less support (i.e. requiring increased shoulder abduction) increases ipsilateral activation (representing RST) relative to contralateral activation (representing CST).

      This resting bias could be explained by an imbalance in the activation of flexors vs extensors which follows the results that this bias is larger as the arm is extended further, and/or in a disconnect in sensory integration that is overcome during active movement. Neither would necessitate separate motor control for holding vs active movement. 

      Response 3.3. We do not think that either of these points necessarily argues against our model. First, the resting biases we observe clearly point towards increased flexion, and can thus be seen as the outcome of an imbalance in the activation of flexors vs. extensors at rest. This imbalance between flexors/extensors can also be explained by the CST/RST imbalance posited by our conceptual model: in their study of CST lesions in the monkey, Zaaimi et al., 2012 found increased RST activation for flexors but not extensors, suggesting that RST over-involvement may specifically lead to flexor abnormalities. Second, overcoming a disconnect in sensory integration may be one way the motor system switches between separate controllers; how this switch happens is not examined by our conceptual model.

      In Experiment 2, the participants are actively moving to and holding at targets for all trials while being supported by the air sled. Even with the support, the paretic participants all showed start- and endpoint force biases around the movement despite not showing systematic deviations in force direction during active movement start or stop. There could be several factors that limit systematic deviations in force direction. The most obvious is that the measured biases are significantly higher when the limb is unsupported and by testing with a supported limb the authors are artificially limiting any effect of the bias.

      Response 3.4. We do expect, in line with what the reviewer suggests, that any potential effects would be stronger in the unsupported condition. We chose to test active motor control with arm support because running Experiment 2 without arm support would pose challenges, particularly for our most impaired patients, given the duration of Experiment 2 (~2 hours, about 1 hour with each arm) and the expected fatigue that would ensue.

      However, a key characteristic of our comparisons is that we are comparing Experiment 2 active control data under arm support, against Experiment 1 resting bias data also under arm support. While Experiment 1 measured biases without arm support as well, these are not used for this comparison. And, while resting biases are weaker with arm support, they are still clear and significant; yet they do not lead to detectable changes in active movement.

      At the same time, we do not rule out that, if we were to repeat Experiment 2 without arm support, we could find some systematic deviation in the direction of resting bias in movement control. Our conceptual model, in fact, suggests that this may be the case, as we described in lines 618-620 of our original manuscript. The idea here is that, when arm support is not provided, the increased strength requirements lead to increased drive through the RST, to the point that posture control (and its abnormalities) spills into movement control (Figure 9). We now better clarify this position in our Discussion (lines 744-750):

      “The interesting implication of this conceptual model is that synergies are in fact postural abnormalities that spill over into active movement when the CST can no longer modulate the increased RST activation that occurs when weight support is removed (i.e. resting biases may influence active reaching in absence of weight support). Supporting this idea, a study found increased ipsilateral activity (which primarily represents activation via the descending ipsilateral RST (Zaaimi et al., 2012)) when the paretic arm had reduced support compared to full support (McPherson et al., 2018).”

      It is also possible that significant adaptation or plasticity with the CST or rubrospinal tracts could give rise to motor output that already accounts for any intrinsic resting bias.  

      Response 3.5. This kind of adaptation – regardless of the tracts potentially involved – is an issue we examined in our experiment. As we talk about in our Results (lines 458-460 in the updated manuscript), with most of our patient population in the chronic stage, it could be likely that their motor system adapted to those biases to the point that movement planning took them into account, thereby limiting their effect. This motivated us to examine responses to unpredictable perturbations during movement (Figure 6) where we still find lack of an obvious effect of resting biases upon reaching control. We thus believe that our findings are not explained by this kind of adaptation, though we agree it would be of great interest for future work to compare resting biases and reaching control in acute vs. chronic stroke populations to examine the degree to which stroke patients adapt to these biases as they recover.

      In any case, the results from the reaching phase of Experiment 2 do not definitively show that directional biases are not present during active reaching, just that the authors were unable to detect them with their design. The authors do acknowledge the limitations in this design (a 2D constrained task) in explaining motor impairment in 3D unconstrained tasks. 

      Response 3.6. It is, of course, an inherent limitation of a negative finding that it cannot be proven. What we show here is that there is no hint of intrusion of resting posture abnormalities upon active movement, in spite of these resting posture abnormalities being substantial and clearly demonstrated even under arm support. To allow for the maximum bandwidth to detect any such effects, we specifically chose to compare the most extreme instances (resting bias-wise) for each individual, and yet we did not find any relationship between biases and active reaching.

      This suggests that, even if these biases could be in some form present during active movement, their effect would be minimal and thus limited in meaningfully explaining post-stroke impairment in active movement under arm support.

      Note that, as we already discuss, our conceptual model (Figure 9) suggests that the degree to which directional biases would be present in active reaching may be influenced by arm support (or the specific movements examined – hence our limitation in not examining 3D movement). Thus we do not claim that this independence is absolute. Examples include the last line of the passage quoted right above, and the summary statement of our Discussion quoted below (lines 639-641):

      “…which raises the possibility that the observed dissociation of movement and posture control for planar weight-supported movements may break down for unsupported 3D arm movements.”

      Finally, we now more explicitly acknowledge that abnormal resting biases may influence active movement in the absence of arm support (see Response 3.4).

      It would have been useful, in Experiment 2, to use FM-UE scores (and time from injury) as a factor to determine the relationship between movement and rest biases. Using a GLMM would have allowed a similar comparison to Experiment 1 of how impairment level is related to static perturbation responses. While not a surrogate for imaging tractography data showing a degree of CST involvement in stroke, FM-UE may serve as an appropriate proxy so that this perturbation at hold responses may be put into context relative to impairment.

      Response 3.7. Here the Reviewer suggests we use FM-UE scores as a proxy for CST integrity. We do not think this analysis would be particularly helpful in our case for a number of reasons:

      First, while FM-UE is a general measure of post-stroke impairment, it was designed to track - among other things - the emergence and resolution of abnormal synergies, a sign assumed to result from abnormally high RST outflow (McPherson et al., 2018; McPherson and Dewald, 2022). In line with this, the FM-UE scales with EMG-based measures of synergy abnormality (Bourbonnais et al., 1989). Impairments in dexterity, a sign associated with damage to the CST (Lawrence and Kuypers, 1968; Porter and Lemon, 1995; Duque et al., 2003), dissociate from synergy abnormalities when compared under arm support as we do here (Levin, 1996; Hadjiosif et al., 2022). This means that FM-UE would be a stronger proxy for RST activity and thus not a direct proxy for CST integrity, particularly when one wants to dissociate RST-specific vs. CST-specific abnormalities. In fact, as we discuss in Response 3.2 above, there are a number of studies supporting this idea: for example, Zaaimi et al., 2012 show that relative RST activation (the balance between ipsilateral excitability, primarily reflecting the RST, and contralateral excitability, primarily reflecting the CST) scales with FM-UE.

      Second, this kind of analysis would obscure within-individual effects, since FM-UE scores are, of course, assigned to each individual. This is the same issue as doing across-individual correlation analyses in general (see response 2.1b). A strong resting force bias would have opposite effects on opposing perturbations, so averaging across subjects would occlude these effects.

      Third, while FM-UE is a good measure of synergy abnormality, weakness alone could also give an abnormal FM-UE (Avni et al., 2024).

      The Reviewer also suggests we use time from injury for this analysis. Time from injury can indeed potentially be an important factor. However, this analysis would not be appropriate for our dataset, since the effective variation in recovery stage within our population is limited: our sample is essentially chronic (only two patients were examined within the subacute stage, at 2 and 3 months after stroke, with everybody else examined more than a year after stroke) with the “positive” elements of their phenotype (and FM-UE itself) essentially plateaued (Twitchell, 1951; Cortes et al., 2017). We thus would not expect to see any meaningful effects of time from injury within our population. It would be an excellent question for future work to investigate both resting biases and their relationship to reaching in acute/subacute patients, and examine whether the trajectory of resting biases (both emergence and abatement due to recovery) follows the one for abnormal synergies.

      It is not clear that even in the static perturbation trials that the hold (and subsequent move from perturbation) is being driven by reticulospinal projections. Given a task where ~20% of the trials are going to be perturbed, there is likely a significant amount of anticipatory or preparatory signaling from the CST. How does this balance with any proposed contribution that the RST may have with increased grip?

      Response 3.8. We included our response to this as part of Response 3.2. In brief, while we cannot rule out that these tasks may recruit increased CST signaling, this would tend to increase, rather than reduce, the effects of post-stroke impairment: the requirement for increased signaling from a CST that is damaged would magnify the effects of this damage, in turn leading to increased recruitment of other tracts, such as the RST.

      In general, the weakness of the interpretation of the results with respect to the CST/RST framework is that it is necessary to ascribe relative contributions of different tracts to different phases of movement and hold using limited or indirect measures. Barring any quantification of this data during these tasks, different investigators are likely to assess these contributions in different ways and proportions limiting the framework's utility.

      Response 3.9. We believe that our Responses 3.2-3.6 put our findings in fair perspective, and the edits undertaken based on the Reviewer’s comments have clarified our position as to how the dissociation between holding and moving control may break down. We do agree, however, that our framework would be strengthened by the use of direct measures of CST/RST connectivity in future research. We present our conceptual model as a comprehensive explanation of our findings and how they blend with current hypotheses regarding the role of these two tracts in motor control after stroke. As such, it provides a blueprint towards future research that more directly measures or modulates CST and RST involvement, using tools such as tractography or non-invasive brain stimulation.

      Recommendations for the authors:   

      Reviewer #1 (Recommendations For The Authors):

      L226 “…of this issue, we repeated the analysis of Figure 7F (a) by excluding these four patients…”.  Should this be three, based on the previous sentence? 

      Response 1.A.1. Thank you for pointing out this typo, which is now corrected. The analysis in question (Figure S1 in the original submission, now re-numbered as Figure S7-4) excluded the three patients mentioned in the previous sentence.

      L254 “…the hand was held in a more distal position. The postural force biases were strongest when…”  Could this be "extended" rather than distal? See my later comment about the inadequate description of targets.

      Response 1.A.2. The reviewer is correct that the arm will tend to be more extended in the distal targets. However, since these positions were defined in extrinsic coordinates, we think the terms distal/proximal are also appropriate. In either case, we now clarify these definitions in the text (see Response 1.A.3 below).

      L263 “…contained both distal and proximal targets, and, importantly, they were also the movement…”.  Distal/proximal targets were never described as part of the task. 

      Response 1.A.3. We improved our description by (i) changing the wording above to “represented positions both distal and proximal to the body,”, (ii) doing the same in our Methods (line 175) and (iii) indicating distal/proximal targets in Figure 3A (bottom right of panel A).

      L378 “…the pulse perturbation. We hypothesized that, should resting postural forces play a role, they…”  L379 “…would tend to reduce the effect of the pulse if they were in the opposite direction, and…”  Not really obvious why. A reduction in the displacement caused by a force pulse might be caused by different stiffness or viscosity, but not by a linear, time-invariant force bias. This situation is different from that of "moving the arm through a high-postural bias area vs. a low-postural bias area" where it would encounter time- (actually spatially) varying forces and varying amounts of displacement. Clarify the logic if this is a critical point.

      Response 1.A.4. We thank the Reviewer for highlighting this point of potential confusion. We now clarify that these postural bias forces are neuromuscular in origin (Kanade-Mehta et al., 2023), and likely result from an expression of abnormal synergy, at least under static conditions. In this case, we hypothesized that force pulses acting against the gradient of the postural bias field would act to stretch the already active muscles, which would lead to a further increase in postural resistance due to inherent length-tension properties of active muscle. By contrast, force pulses acting along the gradient of the postural bias field would act to shorten the same active muscles, which would lead to a reduction in postural resistance. The data did not support this in the case of force pulses imposed during movement. We note, however, that similar effects would affect responses to static perturbations as well, wherein we do find an effect of resting biases. We now better explain this reasoning (lines 479-482).

      L466 “resting postural force). In short, our perturbations revealed that resting flexor biases switched  467 on after movement was over, providing evidence for separate control between moving” and 

      L468 “holding still.”

      I do not think the authors have presented clear evidence that forces, "switch on", implying the switch to a different controller which they posit. This could as easily be a nonlinear or time-varying property of a single controller (admittedly, the latter possibility overlaps broadly with their idea of distinct, interacting controllers). An example that the authors are certainly aware of is that of muscle "thixotropy" a purely peripheral mechanism due to the dynamics of crossbridge cycling that causes resting muscle to be stiffer than moving muscle, changing with a time constant of ~1-2 seconds. Neither this particular example nor changing levels of contraction (more likely during the unpredictable force perturbations) would be in the direction to explain the main observation here -- a point perhaps worth making, together with the stretch reflex comments. 

      Response 1.A.5. Thank you for this perspective. Indeed, it might be that “switching on” represents a shift along a nonlinear property of the same controller: in the extreme, if this nonlinearity is a step (on/off) function, this single controller would be functionally identical to two separate controllers. We thus cannot tell if these controllers are distinct in the strict sense. What we argue here is that, no matter the underlying controller architecture (two distinct controllers or two distinct modes of the same controller), the control of reaching vs. holding can be functionally separable even after stroke. In line with this idea, we used a more nuanced phrasing (e.g. “separable functional modes for moving vs. holding”) throughout our manuscript, and we have now edited out a mention of “separate controllers” to be consistent with this.

      Moreover, thank you for pointing out the example of thixotropy, showing how peripheral mechanisms could interact with central control. As you point out, this effect would not explain the main observation here: in fact, if stiffness were substantially higher during rest or holding (instead of moving) that would reduce the impact of the static perturbation, making it harder to detect any effects of resting biases compared to the moving perturbation case.

      L480 “…during movement (Sukal et al., 2007). Yet, Experiment 2 found no relationship between resting…” L481 “…postural force biases and active movement control. To further investigate this apparent…” The methods of the two studies seem fairly similar, but this question warrants a more careful comparison. How did the size of the two workspaces compare? What about the magnitude of the exerted forces? The movement condition in this study was done with the limb entirely supported. Under that condition, the Sukal study also found fairly small effects of the range of motion.

      Response 1.A.6. Sukal et al., 2007 did not directly measure exerted forces, but instead compared the active range of motion under different loading conditions. They used the extent of reach area to quantify the effect of abnormal synergies, with a more extended active range of motion signifying reduced effect of abnormal synergies. As the Reviewer points out, Sukal et al. found fairly small effects of synergies upon the range of motion when arm support was provided (the reach area for the paretic side was found to be about 85% of the nonparetic side under full arm support, though they were statistically significantly different, Figure 5 of their paper). They found increasing effect of synergies as arm support was reduced: on average, the reach area when participants had to fully support the arm was less than 50% the reach area when full arm support was given (comparing the 0% vs. 100% active support conditions [i.e. 100% vs. 0% external support] in their Figure 5). As we discuss in our paper, this effect of arm support upon synergy mirrors the one we found for resting postures.

      To compare our workspace with the one in Sukal et al., we overlaid our workspace (the array of positions for which the posture biases were measured, for a typical participant from Experiment 1) on the one they used as shown in their Figure 4. Note that their figure only shows an example participant, and thus our ability to compare is limited by the fact that each participant can vary widely in terms of their impairment, and assumptions had to be made to prepare this overlay (e.g. that (0,0) represents the position of the right acromion point). 

      For this example, and our assumptions, our workspace was smaller, with the main points of interest (red dots, the movement start/end points used for Experiment 2) within the Sukal et al. workspace. That our workspace is smaller is not surprising, given that the area in Sukal et al. represents the limit of what can be reached, and thus motor control *has* to be examined in a subset of that area.

      Author response image 1.

      Comparing the two study methodologies, however, suggests an advantage of measuring resting biases in terms of sensitivity and granularity: first, resting biases can be clearly detected even under arm support (something we point out in our Discussion, lines 715-717); second, they can measure abnormalities at any point in the workspace, rather than a binary within/without the reach area. The resting bias approach may thus be a more potent tool to probe the shared bias/synergy mechanisms we propose here.

      Figure 2 

      Needs color code. 

      The red dots could be bigger.

      Response 1.A.7. We have increased the size of the red dots and added a color code to explain the levels illustrated by the contours. We also expanded our caption to better explain this illustration.

      Figure 3

      Labeling is confusing. Drop the colored words (from both A and B), and stick to the color legend. Consider using open and filled symbols (and bars) to represent arm support or lack thereof. The different colored ovals are very hard to distinguish.

      Response 1.A.8. We find these recommendations improve the readability of Figure 3 and we have thus adopted them - see updated Figure 3.

      Figure 4

      Not terribly necessary.  

      Response 1.A.9. While this figure is indeed redundant based on our descriptions in the text, we kept it as we believe it can be useful in clarifying the different stages of movement we examine.

      Figure 5 

      Tiny blue and green arrows are impossible to distinguish. 

      Although the general idea is clear, E and H are not terribly intuitive.  Add distance scale bars for D-I. 

      Response 1.A.10. For improved contrast, we now use red and blue (also in line with comment below regarding Figure 7), and switched to brighter colors in general. To make E and H more intuitive and easier to follow, we expanded the on-panel legend. Thank you for pointing out that distance scale bars are missing; we have now added them (panels EFHI).

      Figure 6 

      Panel E inset is too small. 

      Response 1.A.11. We have now moved the inset to the right and enlarged it.

      Figure 7 

      Green and blue colors are not good. 

      Response 1.A.12. For improved contrast, we now use red and blue.

      Figure 8 

      Delete or move to supplement? 

      Response 1.A.13. We respectfully disagree. While the relationships in these data are also captured by the ANOVA, we believe these scatter plots offer a better overview of the relationships between force biases and FM-UE across different conditions.

      Really minor

      L113 “…participants' lower arm was supported using a custom-made air-sled (Figure 1C). Above the  participant's…” 

      Response 1.A.14. We put the apostrophe after the s so as to refer to participants in general (plural).

      L117 ”…subject-produced forces on the handle were recorder using a 6-axis force transducer.”  recorded 

      Response 1.A.14. Thank you for pointing out this error which we have now corrected.

      L136 “…2013), Experiment 1 assessed resting postural forces by passively moving participants to…”  The experiment did not move the participant.

      Response 1.A.15. We now fix this issue: “by having the robot passively move…”

      L248 “…experiment blocks: two with each arm, with or without arm weight support (provided by an air experimental…”

      Response 1.A.16. We have now corrected this.

      L364 “…responses to mid-movement perturbations. In 1/3 of randomly selected reaching movements…”  Obviously, you mean 1/3 of all movements: "One-third of the reaching movements were chosen randomly"  

      Response 1.A.17. We now clarify: “In 1/3 of reaching movements in Experiment 2, chosen randomly”. Also please note our response to Reviewer 2, point 10: we now report the exact number of trials for which each kind of perturbation was present.

      L609 “Damage to the CST after stroke reduces its moderating influence upon the RST (Figure 9,…”  "its" refers to the subject, "Damage", not "CST".

      Response 1.A.18. We have changed this to “Post-stroke damage to the CST reduces the moderating influence the CST has upon the RST”.

      Reviewer #2 (Recommendations For The Authors):

      (1) Throughout, the authors cleverly selected the most opposed and most aligned resting postural force biases to perform a within-subject analysis. However, this approach excludes a lot of data. The authors could perform an additional within-subject analysis. For each participant they could correlate lateral resting posture force bias to each dependent variable, utilizing all the trials of a participant. 

      Response 2.A.1a. Thank you for appreciating our analysis design and for suggesting additional analyses. We focused our within-subject analysis design on the most extreme instances, as we believe that this approach would offer the best opportunity to detect any potential effects of resting biases. We reasoned that, since resting biases tend to be relatively small for most locations in the workspace, taking all biases into account would inject a disproportionate amount of noise in our analysis, which would in turn diminish our ability to detect any potential relationships. This could be because small biases lead to small effects, but also because small biases may themselves be more likely to reflect measurement noise in the first place. Note that our study talks about separability of active reaching from resting abnormalities based on lack of relationships between the two. While one cannot definitively prove a negative, it is also important to take the approach that maximizes the ability to detect any such relationship if there were one. We believe taking the most extreme instances fulfills that role.

      However, as the Reviewer points out, this approach also excludes a substantial amount of data. We agree that our findings could be further strengthened by exploring additional within-subject analyses that utilize all trials. Thus, following the reviewer’s suggestion, we estimated the sensitivity of each dependent variable to lateral resting posture force bias. Specifically, we estimated the slope of this relationship for each individual (separately for paretic and non-paretic data) using linear regression, and assessed whether the average slope is significant for each group (paretic data, non-paretic data, and control data).
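As a rough illustration of the two-stage procedure described above (one regression slope per individual, then a test of whether the average slope differs from zero), a minimal Python sketch follows. The data values, the function names `ols_slope` and `one_sample_t`, and the choice of a one-sample t statistic for the group-level test are assumptions made for illustration; they are not taken from the analysis code used in the paper.

```python
import math

def ols_slope(x, y):
    """Least-squares slope of y regressed on x for one participant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def one_sample_t(values):
    """t statistic and degrees of freedom for testing mean(values) == 0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

# Hypothetical data: (lateral resting bias, dependent variable) per participant.
participants = [
    ([-1.0, 0.0, 1.0, 2.0], [0.1, 0.9, 2.1, 2.9]),
    ([-2.0, -1.0, 0.0, 1.0], [-1.8, -1.1, 0.2, 0.7]),
    ([0.0, 1.0, 2.0, 3.0], [0.5, 1.0, 2.5, 3.0]),
]

# Stage 1: one slope per participant, using all of that participant's trials.
slopes = [ols_slope(bias, outcome) for bias, outcome in participants]

# Stage 2: is the average slope across participants different from zero?
t_stat, dof = one_sample_t(slopes)
```

The resulting t statistic would then be compared against a t distribution with `dof` degrees of freedom, separately for each group (paretic, non-paretic, control).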

      This secondary analysis replicated our main findings: lack of relationship between posture biases and active reaching control (both for unperturbed and perturbed movement), and a significant relationship between posture biases and active holding control. In addition, in line with main point 2.1 by the reviewer, we performed the same analyses for non-paretic and control data. While there are no definitive conclusions to be made for these cases (as was likely, given that the resting force biases are smaller, as also pointed out by the Reviewer in 2.1) these data are worthy of discussion, with potentially interesting insights (for example, there are hints that the connection between resting biases and active holding control is present in the non-paretic arm as well, and may be explored in future research).

      We have included these analyses in the supplementary materials, and we point to them in the main text. Specifically:

      First, in line with our main analyses in Figure 5, we find no effect (the average slope is insignificant) for start and endpoint biases upon the corresponding reaching angles. This is now mentioned in lines 425-434 of the Results, and illustrated in Figure S5-2. There was a lack of effect for the non-paretic and control data as well.

      Second, in line with our main analyses in Figure 6, we find no effect of start biases upon responses to the pulse (Figure S6-2, mentioned in lines 513-517 of the Results). As above, there was no effect of non-paretic or control data either.

      And, finally, in line with our main analysis in Figure 7, we find an effect of resting biases upon performance for the static perturbation (Figure S7-2, mentioned in lines 578-586 of the Results). Interestingly, there is a suggestion that resting biases may affect static perturbation responses in the non-paretic data as well based on the relationship between posture bias and maximum deviation, but not the other two metrics. Given the lack of consistency of resting bias effects across the three dependent variables examined, however, our current data are unable to give a definite answer as to whether the connection between resting biases and active holding control is also present in the non-paretic side. Our hypothesis is that, since resting abnormalities and their effects are the pathological over-manifestations of mechanisms inherent in the motor system in general, such a relationship would exist. Answering this question, however, would require an experiment design better tailored to detect relationships in the non-paretic arm, where resting biases are weaker.

      We thank the Reviewer for their suggestions and believe that these additional analyses provide a more complete picture of the data, and their consistency with our main results reinforces the message of the paper.

      Then, they can report the percentage of participants that display significant correlations separately for the paretic, nonparetic, and control arms. 

      Response 2.A.1b. We note that, even in cases where the average slope (across individuals) is significant, the individual slopes themselves are usually not significant, likely due to the large amount of noise for datapoints corresponding to weak resting biases. To further examine this, we performed additional analyses in which we examined slopes after (a) pooling all participant data together (centered separately for each individual), and (b) additionally normalizing each participant’s data by each individual’s variability along each axis (i.e. assessing the slope between z-scores of resting bias vs. z-scores of each dependent variable). These two analyses confirmed our finding that resting biases interacted with active motor control, with significant slopes between resting biases and outcome variables. (a) Pooling all data together: path to stabilization: p = 0.032; time to stabilization: p = 1.4×10⁻⁵; maximum deviation: p = 0.021. (b) Pooling and normalizing: path to stabilization: p = 0.0013; time to stabilization: p = 8.6×10⁻⁶; maximum deviation: p = 0.00056. The latter analysis showed an even stronger connection between resting bias and active holding control, probably due to better accounting for differences in the range of resting biases across participants. For simplicity, however, we only provide the across-individual slope comparisons in the paper.
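The two pooling variants can be sketched in Python as follows. This is only an illustration under assumptions: the helper names (`center`, `zscore`, `pooled_slope`) and the toy data are hypothetical, and the significance testing of the pooled slope is omitted.

```python
import math

def center(values):
    """Subtract the participant's own mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def zscore(values):
    """Center, then divide by the participant's own sample SD."""
    c = center(values)
    sd = math.sqrt(sum(v * v for v in c) / (len(c) - 1))
    return [v / sd for v in c]

def pooled_slope(participants, transform):
    """Slope of outcome on bias after per-participant normalization and pooling."""
    xs, ys = [], []
    for bias, outcome in participants:
        xs.extend(transform(bias))
        ys.extend(transform(outcome))
    # Each participant's data are zero-mean after the transform, so the pooled
    # least-squares slope reduces to sum(x * y) / sum(x * x).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical data: (resting bias, dependent variable) per participant.
data = [
    ([0.0, 1.0, 2.0], [10.0, 12.0, 14.0]),
    ([5.0, 6.0, 7.0], [0.0, 2.0, 4.0]),
]
slope_centered = pooled_slope(data, center)   # variant (a)
slope_zscored = pooled_slope(data, zscore)    # variant (b)
```

Centering removes between-participant offsets, so the pooled slope reflects only within-participant covariation; z-scoring additionally equalizes the range of biases and outcomes across participants.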

      (2) An important aspect of all the analyses is that they rely heavily on estimates of the resting postural force bias. How stable are these resting postural force biases at the individual level? The authors could assess this by reporting within-subject variance for both the magnitude and direction of the resting postural force bias.

      Response 2.A.2. Thank you for your suggestion. We now assess the individual-level variance in error across measurements for patients’ paretic data using an ANOVA: the variance that remains after all other factors (same probe location; same arm support condition; same participant) are taken into account. We found that individual level measurement variance explained a mere 9.0% of total variance for resting bias magnitude. (We note that the same figure was 20.2% for the non-paretic data, in line with the weaker average biases which would be more susceptible to noise). We now note this in the Methods, as part of the new subsection “Stability of resting posture bias measurements in Experiment 1” (lines 266-273).
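One way to picture this quantity is as the within-cell (residual) sum of squares divided by the total sum of squares, where a cell groups the repeated measurements that share participant, arm support condition, and probe location. The Python sketch below makes that assumption explicit; the fully crossed cell structure, the function name `residual_variance_fraction`, and the example records are hypothetical and may differ from the ANOVA model actually fitted.

```python
from collections import defaultdict

def residual_variance_fraction(records):
    """records: iterable of (participant, support, location, magnitude).

    Returns residual SS / total SS, where the residual is the deviation of
    each measurement from the mean of its (participant, support, location) cell.
    """
    cells = defaultdict(list)
    for participant, support, location, magnitude in records:
        cells[(participant, support, location)].append(magnitude)

    all_values = [v for vals in cells.values() for v in vals]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)

    ss_resid = 0.0
    for vals in cells.values():
        cell_mean = sum(vals) / len(vals)
        ss_resid += sum((v - cell_mean) ** 2 for v in vals)
    return ss_resid / ss_total

# Hypothetical repeated measurements of resting bias magnitude (N).
example = [
    ("p1", "support", "A", 1.0),
    ("p1", "support", "A", 3.0),
    ("p1", "support", "B", 5.0),
    ("p1", "support", "B", 7.0),
]
fraction = residual_variance_fraction(example)
```

A small fraction indicates that repeated measurements at the same location and condition agree closely relative to the overall spread of biases.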

      (3) Does resting postural force bias influence hand movement immediately following force release from the postural perturbation? This could be assessed before any volitional responses by examining the velocity of the hand during the first 50 ms following the postural perturbation.

      Response 2.A.3. The influence seems fairly rapid, within the first 100ms as shown to the right. Here we plot hand deviation in the direction of the perturbation for the most-opposed (red) vs. most-aligned (blue) instances to examine when these curves become different. The bottom plots show the difference between these two, whereas shading indicates SEM (note that these curves are referenced to the average deviation in the last 0.5 s before force release). The rightmost plots zoom in to make it easier to see how responses to the most opposed vs. most aligned instances diverge.

      To detect the earliest post-perturbation timepoint for which this effect was significant, we performed paired t-tests at each timestep, and found that the two responses were systematically statistically different 95ms after perturbation onset onwards. For reference, the same method detected a response at 25ms for the most aligned instances and 40ms for the most opposed instances.
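The onset-detection logic can be sketched as below: compute a paired t statistic at every timestep and report the first timestep from which the difference stays significant for all later timesteps. This is a minimal sketch under assumptions; the function names, the toy traces, and the use of a caller-supplied critical value `t_crit` (instead of computing p-values) are illustrative choices, not the code used for Figure S7-4.

```python
import math

def paired_t(a, b):
    """Paired t statistic across participants at a single timestep."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0.0:
        # Degenerate case: identical differences across participants.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

def earliest_sustained_divergence(opposed, aligned, t_crit):
    """opposed[i] and aligned[i] hold per-participant deviations at timestep i.

    Returns the first index from which |t| > t_crit at every later timestep,
    or None if the final timestep is not significant.
    """
    onset = None
    for i, (a, b) in enumerate(zip(opposed, aligned)):
        if abs(paired_t(a, b)) > t_crit:
            if onset is None:
                onset = i
        else:
            onset = None  # significance was not sustained; start over
    return onset

# Hypothetical traces for 3 participants over 5 timesteps.
opposed = [[0.0, 0.1, 0.2], [1.0, 1.1, 1.3], [0.0, 0.1, -0.1],
           [2.0, 2.1, 2.3], [3.0, 3.2, 3.1]]
aligned = [[0.0, 0.1, 0.2], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# 4.303 is the two-sided 5% critical value of t with 2 degrees of freedom.
onset_index = earliest_sustained_divergence(opposed, aligned, t_crit=4.303)
```

Multiplying the returned index by the sampling interval gives the onset time; requiring sustained significance guards against isolated timesteps that cross the threshold by chance.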

      We have now added Supplementary Figure S7-4 with short commentary in the Supplementary Materials.

      (4) Abstract. lines 7-9. At a glance (and when reading the manuscript linearly) this sentence is unclear. If the paretic arm is compromised across rest and movement, how does that afford the opportunity to address the relationship between reaching, stopping, and stabilizing when all could be impacted? It might be useful to specify that these factors may be impacted differently relative to one another with stroke, providing an opportunity to better understand the differences between movement and postural control.

      Response 2.A.4. Thank you for pointing out this issue (also related to Reviewer 1’s point – Response 1.1). We have changed this to more clearly reflect our reasoning and highlight that the issue is that stroke can differentially impact reaching vs. holding, copied below:

      “The paretic arm after stroke exhibits different abnormalities during rest vs. movement, providing an opportunity to ask whether control of these behaviors is independently affected in stroke.”

      (5) Line 27. It is perhaps more appropriate to say conceptual model than simply 'model'.  

      Response 2.A.5. Thank you for your suggestion, which we have adopted throughout the manuscript.

      (6) Line 122-125. Figure 1A caption. The authors should specify that resting posture force biases occur when the limb or hand is physically constrained in a specific position. 

      Response 2.A.6. Thank you for pointing this out – we have clarified the caption:

      “If one were to physically constrain the hand in a position away from the resting posture, the torques involved in each component of the abnormal resting posture translate to a force on the hand (blue arrow);”

      (7) Line 147. Why was the order not randomized or counterbalanced? 

      Response 2.A.7. We prioritized paretic data, as the primary analyses and comparisons in our paper involved resting posture biases and active movement with the paretic arm. We note that our primary analyses, which rely on paretic-paretic comparisons, would not be affected by paretic vs. non-paretic ordering effects. However, ordering effects could potentially affect comparisons between paretic and non-paretic data. We now note the reasoning behind the absence of counterbalancing, and mention the potential limitation in interpreting paretic to non-paretic comparisons in lines 124-129 of the Methods.

      (8) Line 172. 12N is the peak force of the pulse?

      Response 2.A.8. The reviewer is correct; we have clarified our description (line 463 in the updated manuscript):

      “a 70 ms bell-shaped force pulse which was 12N at its peak”

      (9) Line 175. What is a clockwise pulse? Was the force vector rotating in direction over time so that it was always acting orthogonally to the movement, or did it always act leftwards or rightwards?

      Response 2.A.9. The force vector was not rotating in direction over time. Here, we used clockwise/counterclockwise to indicate rightwards/leftwards with respect to the ideal movement direction – the line from start position to target (which is what we understand the Reviewer means by “always act rightwards or leftwards”). We have clarified the text to indicate this (lines 193-195):

      …was applied by the robot lateral to the ideal movement direction (i.e. the direction formed between the center of the start position and the center of the target) after participants reached 2cm away from the starting position (Smith and Shadmehr, 2005; Fine and Thoroughman, 2006).

      (10) Lines 177-182. It might be useful to explicitly mention the frequency of each of the perturbations, just for ease of the reader. 

      Response 2.A.10. We have added this information to our Methods (lines 206-210):

      Thus, in summary, each 96-movement block consisted of 64 unperturbed movements and 32 movements perturbed with a force pulse (16 clockwise, and 16 counter-clockwise). For 20 out of the 96 movements in each block, the hold period was extended to test the hold perturbation (4 trials for each of the 5 target locations, each one of the 4 trials testing one perturbation direction as shown in Figure 7C).

      (11) Line 191. Lines 188-190. It would be useful to see a sample of several of these force traces over time (0-5s) that were used to make the average for a position. That would give insight into the stability of the forces of a participant for one of the postures. These traces could be shown in Figure 2.

      Response 2.A.11. Thank you for your suggestion. We have added these panels to Figure 1 (as Figure 2 was already large). Each panel illustrates the three measurements taken at similar positions (closest to midline, distal from the body) and the same condition (paretic arm, with arm support given) for one participant (same participants as in Figure 2). Solid lines indicate the force on the x-axis (positive values indicate forces towards the left), whereas dashed lines indicate the force on the y-axis (positive values indicate forces towards the body). The shaded area indicates the part averaged in order to estimate the resting bias, illustrating how resting biases were relatively stable by the 2s mark. Note that these examples include one trial (blue traces in the third panel) which was rejected following visual inspection as described in Materials and Methods – Data Exclusion Criteria (“trials where forces appeared unstable and/or there was movement during the robot hold period”). We find this helpful as it illustrates (and motivates) one component of our methodology.

      (12) Line 196. Figure 1D (not 1E).  

      Response 2.A.12. Thank you for catching this error, which we have now corrected.

      (13) Line 215: The authors mentioned similar results. Were there any different results that impacted interpretation? Some evidence of this, similar to and in addition to Supplementary 1, would be helpful. 

      Response 2.A.13. We repeated our analyses without these exclusion criteria, with no impact to the interpretation. We now include versions of the main outcome panels from Figures 5, 6, and 7 in the supplementary materials calculated without this outlier exclusion (Figures S5-E, S6-E, and S7-E, respectively). 

      (14) Line 231: Perhaps better to explicitly state the furthest three positions are being across as the distal targets for the ANOVA. 

      Response 2.A.14. Thank you for your suggestion. We now explicitly clarify this in line 276:

      “distal targets [furthest three positions] vs. proximal targets [closest two positions]”

      (15) Figure 3B, lines 265. Clearly, these are different, but the authors should report statistics. 

      Response 2.A.15. We now report these numbers (lines 339-346 of the revised manuscript, which also include statistics related to bias direction as described in 2.A.17 below).

      (16) Figure 2 should have a heat map scale.  

      Response 2.A.16. We have now added this (also Response 1.A.7), including an explanation of what the heat map represents in the caption.

      (17) Figure 3C: It would be useful to quantify and plot the direction of the resting force bias vector. 

      Response 2.A.17. Thank you for your suggestion. We have expanded Figure 3 to include the average direction of the resting force bias vector (note the readjustment of colors following Reviewer 1’s comment: striped bars indicate No Support data, and full bars indicate Support data, with the colors being the same). The direction of the force bias vector, however, may not be very informative in cases where the magnitude is small (and the signal-to-noise ratio is small), whereas averaging the direction of the force bias vector across different positions for one participant may average out systematic variations in this direction across different locations. Nevertheless, the average direction appears generally towards the body (around -90°, or 6 o’clock) even in the non-paretic and control data (though the noise – as suggested by the size of the errorbars – is much higher in the latter cases, especially when the arm is supported). This is a (weak) suggestion that these resting biases may be present, though much subdued, in the nonparetic limb and healthy individuals; further work will be needed to elucidate this.

      (18) Line 428. It is not significantly longer compared to controls. Can the authors slightly revise this sentence?

      Response 2.A.18. We have revised this sentence (lines 529-532):

      Patients showed impaired capacity to resist and recover from this perturbation (the abrupt release of the imposed force). The time to stabilization for the paretic side (0.94±0.05s) was longer compared to the non-paretic side (0.79±0.03s, p = 0.024) and controls (0.78±0.06s, though this was statistically marginal, p = 0.061) as shown in Figure 7E, left.

      (19) Line 541. It is unclear how these data support the idea of three distinct controllers. Can the authors please clarify? 

      Response 2.A.19. Here, we compare our findings to previous ideas about distinct controllers, and discuss a potential fusion of these ideas with ours. Specifically, we find that holding is distinct from both initial reaching and coming to a stop. Previous work argues that initial reaching and coming to a stop are themselves distinct (Ghez et al., 2007; Jayasinghe et al., 2022). Combining these two sets of arguments, we arrive at the possibility of three distinct controllers.

      (20) It would be useful if the authors provided a definition of synergy, as well as distinguishing between muscle and movement synergies. 

      Response 2.A.20. We now provide this in lines 591-594:

      Here, “synergies” refer to abnormal co-activation patterns across joints that manifest as the patient tries to move – for example, the elbow involuntarily flexing as the patient tries to abduct their shoulder (Twitchell, 1951; Brunnstrom, 1966). 

      (21) Line 592-593. The wording of this sentence could be improved. 

      Response 2.A.21. We have switched this sentence to active voice for more clarity:

      Thus, while full weight support reduces both resting flexor biases and movement-related flexor synergies, this reduction seems more complete for synergies rather than resting biases.

      (22) Figure 9. In the left column, it should read normal synergies and normal resting posture.  

      Response 2.A.22. We intentionally used the same terminology, as the idea behind our conceptual model is that these patterns, which manifest as well-recognized abnormal synergies and abnormal resting postures in stroke, may be present in the healthy motor system as well, but kept in check by CST moderating the RST. At the same time, we recognize that, by definition, synergies and posture in controls are the “normal” reference point against which “abnormal” synergies and posture are defined after stroke. To clarify this issue, we thus decided to forgo the use of the term “abnormal” in the figure, and instead refer to “synergistic movement” and “synergistic resting posture”.

      (23) Figure 9. With stroke, is RST upregulated, a decreased influence of CST, or both? All seem plausible.

      Response 2.A.23a. We believe both can be happening. From previous work (e.g. McPherson et al., 2018) it seems safe to say that RST upregulation is the case, whereas one would also expect a decreased CST influence given the damage the CST sustains in the stroke. The relative weight of these influences would be interesting to elucidate in future work.

      I have not read the paper, but did McPherson et al., 2018 test these different hypotheses?  

      Response 2.A.23b. The main point of McPherson et al., 2018 is that increased synergy expression is due to increased RST involvement, rather than reduced CST influence. However, McPherson et al. do not show separate increases/reductions in RST/CST activity; they show that contralesional activity relative to ipsilesional activity is increased (using a laterality index). While it does seem that RST is upregulated in this case, this does not exclude the possibility that CST influence is reduced as well.

      We also noticed that the citation itself, while mentioned in the text, was missing from the bibliography. This is now fixed.

      For Figure 9, McPherson is cited as they provide evidence for the idea that RST involvement increases when arm support is decreased. This evidence is both direct (e.g. in their Figure 3 where they show that “Stroke participants exhibited increased activity in the contralesional (R) hemisphere as SABD loading increased” [i.e. arm support was reduced]) and indirect: they connect synergies to RST involvement, and also show increased synergies with reduced arm support (also shown multiple times previously). Both these arguments suggest that arm support reduces RST involvement. We have clarified the relevant sentence:

      The interesting implication of this conceptual model is that synergies are in fact postural abnormalities that spill over into active movement when the CST can no longer modulate the increased RST activation that occurs when weight support is removed. Supporting this idea, McPherson et al. found increased ipsilateral activity (which primarily represents activation via the descending RST (Zaaimi et al., 2012)) when the paretic arm had reduced support compared to full support (McPherson et al., 2018).

      Reviewer #3 (Recommendations For The Authors):

      For Experiment 2, it is not immediately clear how the within-subject values are being pooled and compared across the different conditions. For instance, in the static perturbation trials, there are four blocks with 20 perturbation trials per block per arm (80 total per arm) with each location and direction once per block. For each participant, the comparison is between the location/direction that was most opposed (although this doesn't look accurately represented in Fig 7F). Therefore, the within-subject comparison is 4 trials per participant? Were these values averaged or pooled? It is a little odd that the SD for all the within-subjects trials are identical or nearly identical across conditions especially when looking at the example patient data in 7B and 7F.  

      Response 3.A.1. For static perturbation trials, the within-subject comparison involves 8 trials per participant: 4 trials corresponding to the perturbation direction/position combination with resting bias most opposed to the perturbation, and 4 trials corresponding to the perturbation direction/position combination with resting bias most aligned with the perturbation. These values were averaged for each individual. We have expanded our methods to make this part of our data analysis clear (lines 284-296) for all types of comparisons (unperturbed movement, pulse perturbation, static perturbations – now referred to as “release perturbation”).

      The across-subject SDs for the average resting forces for each one of these two conditions, shown in Figure 7F, are indeed identical. This is due to how these two instances (most aligned vs. most resistive) were selected: because the perturbation directions come in pairs that exactly oppose each other (Figure 7B), if one were to select the position with the most opposing resting bias, the combination of the same position and the oppositely-directed perturbation would be the one with the most assistive resting bias. Hence the resting biases selected for the most opposing/assistive instances would be equal in magnitude and opposite to each other for each participant, as illustrated in Figure 7F, whereby the most-opposed bias for each individual is exactly opposite to the corresponding most-aligned bias for the same individual. We have added a brief commentary about this to the caption (lines 551-554), reproduced below:

      Note how the most-opposed resting bias for each patient is equal and opposite to their most-aligned resting bias. This is because the same resting bias, when projected along the direction of two oppositely-directed perturbations (illustrated in C), would oppose one with the same magnitude with which it would align with the other.
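      The projection argument above can be checked numerically. The following is a minimal sketch, not the authors' analysis code: the direction set, the resting-bias values, the sign convention (positive means aligned with the perturbation), and the names `project` and `extreme_projections` are all assumptions made for illustration.

```python
import math
import statistics


def project(bias, direction):
    """Signed component of a 2-D force `bias` along `direction`."""
    norm = math.hypot(direction[0], direction[1])
    return (bias[0] * direction[0] + bias[1] * direction[1]) / norm


def extreme_projections(biases, directions):
    """Return (most_aligned, most_opposed) over all position/direction pairs.

    Convention assumed here: a positive projection means the resting bias
    points along the perturbation (aligned), a negative one against it.
    """
    values = [project(b, d) for b in biases for d in directions]
    return max(values), min(values)


# Hypothetical perturbation directions that come in exactly opposing pairs.
DIRECTIONS = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

# Hypothetical resting biases (N) at a few hand positions, per participant.
PARTICIPANTS = {
    "P1": [(1.5, -0.8), (0.3, 0.4)],
    "P2": [(-0.6, 2.1), (0.9, -0.2)],
    "P3": [(0.2, 0.1), (-1.1, -0.7)],
}

aligned, opposed = zip(
    *(extreme_projections(b, DIRECTIONS) for b in PARTICIPANTS.values())
)

# Each participant's most-opposed value mirrors the most-aligned one,
# so the two across-participant SDs coincide.
sd_aligned = statistics.stdev(aligned)
sd_opposed = statistics.stdev(opposed)
```

      Because every direction has an exact opposite in the set, the smallest projection for a participant is always the negative of the largest, which is why the two standard deviations match.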

      Importantly, following suggestions by Reviewer 2 (see point 2.A.1), we now provide supplementary analyses that use the entirety of the relevant data, rather than the most extreme instances, which provide evidence supporting our main findings (Figures S5-2, S6-2, and S7-2).

      The printed colors in Figure 3 are very muddled and hard to read/interpret, especially in panel A. 

      Response 3.A.2. Thank you for pointing out this issue, also raised by Reviewer 1. We have adjusted the colors to be more distinct from each other and look clear both in print and on-screen, making use of dashed lines and stripes rather than different shades.

      I think it would improve readability and interpretation if Figure 8 and the results related to FM-UE were contained within the description of results for Experiment 1.

      Response 3.A.3. Thank you for this suggestion. This is actually a debate we had among ourselves earlier, and we can see merits to either ordering. It is arguable that moving Figure 8 and the FM-UE results within the rest of Experiment 1 may improve readability somewhat. However, we believe that presenting these results at the end better serves to illustrate the apparent paradox between the lack of direct connection between resting biases and active movement on one hand, and the relationship between resting biases and abnormal synergies on the other. We believe that this better sets the stage to present our conceptual model, which explains this paradox based on the role arm support plays in modulating the expression of both resting biases and abnormal synergies.

      Additional changes/corrections not outlined above

      Figure 1D displayed a right arm, but showed a target array (red dots) for a left arm paradigm. We now flip the target array shown for consistency.

      We corrected Figure 6C, which accidentally used an earlier definition of settling time that was based on lateral stabilization throughout the entire movement, rather than focusing on the period immediately following the pulse. The intended definition of settling time (as we had described in the Methods, lines 204-206 of original submission) focuses on lateral corrections specific to the pulse (rather than corrections when the participant approaches the endpoint) and better matches the one for settling time for the release (static) perturbation trials. Note that this change did not affect the (lack of) relationship between settling time and resting force bias, both across individuals (correlation plots now in Figure S6-1) and within individuals (now shown in the right part of panel 6D). Also in panel C, an error in the scaling for the maximum lateral deviation in the pulse direction (right side of the panel) is now corrected.

      In addition, we made minor edits throughout the text to improve readability.

      References

      Albert ST, Hadjiosif AM, Jang J, Zimnik AJ, Soteropoulos DS, Baker SN, Churchland MM, Krakauer JW, Shadmehr R (2020) Postural control of arm and fingers through integration of movement commands. Elife 9:e52507.

      Avni I, Arac A, Binyamin-Netser R, Kramer S, Krakauer JW, Shmuelof L (2024) The Kinematics of 3D Arm Movements in Sub-Acute Stroke: Impaired Inter-Joint Coordination is Attributable to Both Weakness and Flexor Synergy Intrusion. Neurorehabil Neural Repair 38:646–658.

      Bourbonnais D, Vanden Noven S, Carey KM, Rymer WZ (1989) Abnormal spatial patterns of elbow muscle activation in hemiparetic human subjects. Brain 112:85–102.

      Brunnstrom S (1966) Motor testing procedures in hemiplegia: based on sequential recovery stages. Phys Ther 46:357–375.

      Cortes JC, Goldsmith J, Harran MD, Xu J, Kim N, Schambra HM, Luft AR, Celnik P, Krakauer JW, Kitago T (2017) A Short and Distinct Time Window for Recovery of Arm Motor Control Early After Stroke Revealed With a Global Measure of Trajectory Kinematics. Neurorehabil Neural Repair 31:552–560.

      Duque J, Thonnard J, Vandermeeren Y, Sébire G, Cosnard G, Olivier E (2003) Correlation between impaired dexterity and corticospinal tract dysgenesis in congenital hemiplegia. Brain 126:732–747.

      Fine MS, Thoroughman KA (2006) Motor Adaptation to Single Force Pulses: Sensitive to Direction but Insensitive to Within-Movement Pulse Placement and Magnitude. J Neurophysiol 96:710–720.

      Ghez C, Scheidt R, Heijink H (2007) Different Learned Coordinate Frames for Planning Trajectories and Final Positions in Reaching. J Neurophysiol 98:3614–3626.

      Hadjiosif AM, Branscheidt M, Anaya MA, Runnalls KD, Keller J, Bastian AJ, Celnik PA, Krakauer JW (2022) Dissociation between abnormal motor synergies and impaired reaching dexterity after stroke. J Neurophysiol 127:856–868.

      Jayasinghe SA, Scheidt RA, Sainburg RL (2022) Neural Control of Stopping and Stabilizing the Arm. Front Integr Neurosci 16.

      Kanade-Mehta P, Bengtson M, Stoeckmann T, McGuire J, Ghez C, Scheidt RA (2023) Spatial mapping of posture-dependent resistance to passive displacement of the hypertonic arm post-stroke. J NeuroEngineering Rehabil 20:163.

      Lawrence DG, Kuypers HG (1968) The functional organization of the motor system in the monkey: II. The effects of lesions of the descending brain-stem pathways. Brain 91:15–36.

      Levin MF (1996) Interjoint coordination during pointing movements is disrupted in spastic hemiparesis. Brain 119:281–293.

      Lowrey CR, Bourke TC, Bagg SD, Dukelow SP, Scott SH (2019) A postural unloading task to assess fast corrective responses in the upper limb following stroke. J NeuroEngineering Rehabil 16:1–17.

      McPherson JG, Chen A, Ellis MD, Yao J, Heckman C, Dewald JP (2018) Progressive recruitment of contralesional cortico-reticulospinal pathways drives motor impairment post stroke. J Physiol 596:1211–1225.

      McPherson LM, Dewald JP (2022) Abnormal synergies and associated reactions post-hemiparetic stroke reflect muscle activation patterns of brainstem motor pathways. Front Neurol 13:934670.

      Porter R, Lemon R (1995) Corticospinal function and voluntary movement. Oxford University Press.

      Smith MA, Brandt J, Shadmehr R (2000) Motor disorder in Huntington’s disease begins as a dysfunction in error feedback control. Nature 403:544.

      Smith MA, Shadmehr R (2005) Intact ability to learn internal models of arm dynamics in Huntington’s disease but not cerebellar degeneration. J Neurophysiol 93:2809–2821.

      Tower SS (1940) Pyramidal lesion in the monkey. Brain 63:36–90.

      Twitchell TE (1951) The restoration of motor function following hemiplegia in man. Brain 74:443–480.

      Wilkins KB, Yao J, Owen M, Karbasforoushan H, Carmona C, Dewald JP (2020) Limited capacity for ipsilateral secondary motor areas to support hand function post-stroke. J Physiol 598:2153–2167.

      Zaaimi B, Edgley SA, Soteropoulos DS, Baker SN (2012) Changes in descending motor pathway connectivity after corticospinal tract lesion in macaque monkey. Brain 135:2277–2289.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In their manuscript, Gerlevik et al. performed an integrative analysis of clinical, genetic and transcriptomic data to identify MDS subgroups with distinct outcomes. The study was based on the building of an "immunoscore" and then combined with genotype and clinical data to analyze patient outcomes using multi-omics factor analysis. 

      Strengths: Integrative analysis of RNA-seq, genotyping and clinical data 

      Weaknesses: Validation of the bioinformatic pipeline is incomplete 

      Major comments: 

      (1) This study considered two RNA-seq data sets publicly available and generated in two distinct laboratories. Are they comparable in terms of RNA-seq technique: polyA versus rRNA depletion, paired-end sequencing, fragment length? 

      We want to reemphasize that the main point of this study is not to compare the BMMNC with the HSPC cohort. These datasets are not comparable because they were collected from different cell types, and we should not expect them to be matched. We just analysed them in parallel to check how much HSPCs contribute to the molecular signatures we see in BMMNC samples. However, we agree with the reviewer that similar RNA-seq experimental techniques should be employed to control for confounding factors. Here is the information that we found for HSPC and BMMNC RNA-seq studies:

      HSPC RNA-seq cohort: Total RNA was extracted using TRIzol (Thermo Scientific), and sequencing was performed on an Illumina HiSeq4000 with 100-bp paired-end reads.

      BMMNC RNA-seq cohort: The RNA was extracted with TRIzol reagent (Thermo Scientific). RNA-sequencing libraries were prepared from poly(A)-selected RNA and were sequenced using Illumina HiSeq 2000 or 2500 platform with 100-bp paired-end reads. 

      The only difference between the two cohorts is that one cohort includes total RNAs, whereas the other has polyA-selected RNAs. Since the gene set signatures use the expression of protein-coding genes, which all have polyA tails and are included in total RNA libraries, the analysis will not be affected by total vs. polyA-selected RNA-seq techniques.

      (2) Data quality control (figure 1): the authors must show in a graph whether the features (dimensions) of factor 1 were available for each BMMNC and CD34+ samples.  

      By features of Factor 1, we think the reviewer means the features with high weights for Factor 1 in BMMNC and CD34+ samples. Figure 2c-d clearly illustrates the important features and their associations with Factor 1 for all samples in both cohorts. The samples are the columns of the two heatmaps.

      (3) How to validate the importance of "immunoscore"? If GSEA of RNA-seq data was performed in the entire cohort, in the SF3B1-mutated samples or SRSF2-mutated samples (instead of patients having a high versus low level of factor 1 shown in Sup Fig. 4), what would be the ranking of Hallmarks or Reactome inflammatory terms among the others? 

      Our GSEA analysis was an attempt to validate the importance of our identified factors. As described in the paper, Factor 1 represents a combination of immunology scores (or “immunoscores”) in the CD34+ cohort. Applying GSEA, we identified upregulation of inflammation-related pathways, chemokines, and neutrophils in patients having high (4th quartile) versus low (1st quartile) levels of Factor 1. Interestingly, sorting patients by Factor 1 resulted in a similar pattern based on gene signature scores (Figure 2d).
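      The high-versus-low comparison described above relies on splitting samples by quartiles of a factor. A minimal sketch of such a split follows; the sample identifiers, the factor values, and the function name `split_extreme_quartiles` are hypothetical, and the quartile method (Python's default "exclusive" interpolation) is an assumption that may differ from the one used in the study.

```python
import statistics


def split_extreme_quartiles(factor_values):
    """Return (low_ids, high_ids) for samples in the 1st and 4th quartiles.

    `factor_values` maps a sample identifier to its value for one latent factor.
    """
    q1, _median, q3 = statistics.quantiles(list(factor_values.values()), n=4)
    low_ids = sorted(s for s, v in factor_values.items() if v <= q1)
    high_ids = sorted(s for s, v in factor_values.items() if v >= q3)
    return low_ids, high_ids


# Hypothetical Factor 1 values for eight samples (illustrative numbers only).
factor1 = {
    "P1": -2.0, "P2": -1.5, "P3": -0.5, "P4": -0.1,
    "P5": 0.2, "P6": 0.9, "P7": 1.4, "P8": 2.3,
}

low_group, high_group = split_extreme_quartiles(factor1)
```

      The two resulting groups would then be the inputs to a differential expression and enrichment step; samples in the middle two quartiles are left out of the contrast.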

      To show that Factor 1 generated by MOFA is important and different from known MDS categories such as SF3B1 and SRSF2 mutants, we performed GSEA in SF3B1-mutated vs. SF3B1-WT samples and SRSF2-mutated vs. SRSF2-WT samples in the CD34+ cohort. As shown in Author response image 1, we did not see the upregulation of inflammation and interferon pathways in SF3B1 and SRSF2 mutant MDS.

      Author response image 1.

      GSEA showed no upregulation of inflammation and interferon pathways for SF3B1 and SRSF2 mutant in CD34+ cohort.  

      (4) To decipher cell-type composition of BMMNC and CD34+ samples, the authors used van Galen's data (2019; supplementary table 3). Cell composition is expressed as the proportion of each cell population among the others. Surprisingly, the authors found that the promonocyte-like score was increased in SF3B1-mutated samples and not in SRSF2-mutated samples, which are frequently co-mutated with TET2 and associated with a CMML-like phenotype. Is there a risk of bias if bone marrow subpopulations such as megakaryocytic-erythroid progenitors or early erythroid precursors are not considered?

      We thank the reviewer for their insightful comment about CMML and the high prevalence of SRSF2 mutation (> 45%) in CMML cases. Using single-cell RNA sequencing and high-parameter flow cytometry, Ferrall-Fairbanks et al. (DOI: 10.1158/2643-3230.BCD-21-0217) recently showed that CMML can be classified into three differentiation trajectories: monocytic, megakaryocyte-erythroid progenitor (MEP), and normal-like. One hallmark of monocytic-biased trajectory was the enrichment of inflammatory granulocyte–macrophage progenitor (GMP)-like cells, which we observed through our analysis for SRSF2 mutants (Figure 6a).

      Unfortunately, van Galen's data does not provide any gene set for MEP, and there is no single-cell RNA-seq atlas for MDS to employ to calculate the MEP score. Also, we compared the Promono-like and GMP-like gene sets from van Galen's data, and we could not find any overlap, meaning that Promono-like is not specific enough to capture the signatures coming from the more differentiated progenitors such as GMPs. Therefore, as described in the paper, we focused on GMP-like rather than Promono-like.

      (5) Figures 2a and 2b indicated that the nature of retrotransposons identified in BMMNC and CD34+ was different. ERVs were not detected in CD34+ cells. Are ERVs not reactivated in CD34+ cells? Is there a bias in the sequencing or bioinformatic method?

      As described above, the two cohorts' sequencing methods, read length, etc., are identical. CD34+ RNA-seq is total RNA-seq that includes both polyA and non-polyA RTE transcripts. Therefore, the chance of bias and missing RTE signatures in the CD34+ cohort is very low. L1 and Alu, which are shared between the two cohorts, are the two RTE families that are still active and make new insertions in humans. Our interpretation is that ERV activation in BM is associated with immune cells. As shown by Au et al. (DOI: 10.1016/j.ccell.2021.10.001), several ERV loci had expression in purified immune cell subsets in renal cell carcinoma samples, potentially explaining ERV upregulation in tumours responding to treatment as those biopsies had increased tumour infiltration.

      (6) What is the impact of factor 1 on survival? Is it different between BMMNC and CD34+ cells considering the distinct composition of factor 1 in CD34+ and BMMNC?

      As shown in Table 1, Factor 1 in the BMMNC cohort is associated with overall survival (P-val < 0.05) in multivariate analysis but not in univariate analysis. We did not observe any association between Factor 1 and event-free survival in the BMMNC cohort. Also, the 10 factors identified by MOFA in the BM CD34+ cohort did not show any significant association with MDS overall survival (Supplementary Table 5).

      (7) In Figure 1e, genotype contributed to the variance in the CD34+ cell analyses more importantly than in the BMMNC. Because the patients are different in the two cohorts, differences in the variance could be explained either by a greater variability of the type of mutations in CD34 or an increased frequency of poor prognosis mutations in CD34+ compared to BMMNC. The genotyping data must be shown.

      The genotype has already been reported in Supplementary Table 2. In fact, the number of inspected genes was much higher in the BMMNC cohort (17 genes) compared to the CD34+ cohort (3 genes). Therefore, we have more significant variability of the type of mutations in the BMMNC cohort compared to the CD34+ cohort. For the CD34+ cohort, we only had mutations for three spliceosome genes, where most cases (n=28) were SF3B1 mutants with good prognosis. We think that the result makes sense because the less genetic variability, the more homogenous groups and the more chance that one factor or a group of factors can explain the genetic variance.   

      (8) Fig. 2a-b: Features with high weight are shown for each factor. For factor 9, features seemed to have a low weight (Fig. 1b and 1c). However, factor 9 was predictive of EFS and OS in the BMMNC cohort. What are the features driving the prognostic value of factor 9? 

      As shown in Figure 3b, the main features are RTE expression from the LTR:ERV1, SINE:MIR, and SINE:Alu families.

      (9) The authors also provided microarray analyses of CD34+ cell. It could be interesting to test more broadly the correlation between features identified by RNA-seq or microarrays. 

      The microarray data did not come with any genetic information or clinical data except survival information. Therefore, we could not apply MOFA to the microarray data. However, we did generate gene signature scores from the microarray data and investigated the relationship of inflammatory chemokine and cytokine scores and IFN-I signature scores with MDS survival (Figure 3c and 4c).

      (10) The authors should discuss the relevance of immunosenescence features in the context of SRSF2 mutation and extend the discussion to the interest of their pipeline for patient diagnosis and follow up under treatments. 

      We have added the below text to the discussion:

      Recent studies have shown that the expression of programmed death-ligand 1 (PD-L1) protein is significantly elevated in senescent cells (DOIs: 10.1128/mcb.00171-22, 10.1172/JCI156250, 10.1038/s41586-022-05388-4). Increased PD-L1 protein levels protect senescent cells from being cleared by cytotoxic immune cells that express the PD-1 checkpoint receptor. In fact, activation of the PD-1 receptor inhibits the cytotoxic capabilities of CD8+ T and NK cells, increasing immunosenescence.

      Notably, patients with MDS who possess particular somatic mutations, such as those in the TP53, ASXL1, SETBP1, TET2, SRSF2, and RUNX1 genes, have an increased propensity to react favourably to PD-1/PD-L1 inhibitors (DOIs: 10.1111/bjh.17689, https://doi.org/10.1182/blood2020-141100) confirming that many cellular and molecular mechanisms, known to promote cellular senescence, including alteration of splicing machinery, are crucial stimulators of the expression of PD-L1 protein. Interestingly, in our analysis, we also observed a correlation between the senescence gene signature score and the expression of the PD-L1 gene in CD34+ cells (Supplementary Figure 7), supporting the previous findings linking PD-L1 gene expression to cellular senescence.

      The immunology and ageing features extracted from the MDS transcriptomic data used in our analysis pipeline can enhance the conventional risk-scoring systems for MDS by providing new insights into this disease, particularly in the context of inflammation and ageing. For some patients, the clinical and genetic features may remain relatively the same until follow-up. Still, the transcriptomic features might differ considerably from the baseline diagnosis, affecting the course of treatment.    

      Reviewer #2 (Public Review): 

      The authors performed a Multi-Omics Factor Analysis (MOFA) of two published MDS patient cohorts, one from bone marrow mononuclear cells (BMMNCs) and CD34 cells (ref 17) and another from CD34+ cells (ref 15), with three data modalities (clinical, genotype, and transcriptomics). Seven different views, including immune profile, inflammation/aging, Retrotransposon (RTE) expression, and cell-type composition, were derived from these modalities to attempt to identify the latent factors with significant impact on MDS prognosis.

      SF3B1 was found to be the only mutation among 13 mutations in the BMMNC cohort that indicated a significant association with high inflammation. This trend was also observed to a lesser extent in the CD34+ cohort. The MOFA factor representing inflammation showed a good prognosis for MDS patients with high inflammation. In contrast, SRSF2 mutant cases showed a granulocyte-monocyte progenitor (GMP) pattern and high levels of senescence, immunosenescence, and malignant myeloid cells, consistent with their poor prognosis. Also, MOFA identified RTE expression as a risk factor for MDS. They proposed that this work showed the efficacy of their integrative approach to assess MDS prognostic risk that 'goes beyond all the scoring systems described thus far for MDS'. 

      Several issues need clarification and response: 

      (1) The authors do not provide adequate known clinical and molecular information which demonstrates prognostic risk of their sample cohorts in order to determine whether their data and approach 'goes beyond all the scoring systems described thus far for MDS'. For example, what data have the authors that their features provide prognostic data independent of the prior known factors related to prognosis (eg, marrow blasts, mutational, cytogenetic features, ring sideroblasts, IPSS-R, IPSS-M, MDA-SS)?

      We agree with the reviewer that we did not generate a new cumulative risk score and compare it with the conventional risk scores for MDS. However, we identified individual MOFA factors, which are risk or protective factors for MDS, based on survival analysis in the BMMNC cohort. One reason that we did not generate our independent, cumulative score and compare it with other scores was that we did not receive any conventional risk score for the BMMNC cohort. However, we had access to all the clinical and genetic variables from the BMMNC cohort (except for three patients) that were required to calculate IPSS-R; hence, we calculated the IPSS-R in our resubmission for the BMMNC cohort. We made three IPSS-R risk categories by combining low and very low as low risk, and high and very high as high risk, and keeping intermediate as intermediate risk. Our survival analysis of these three categories showed a clear match between IPSS-R score and MDS survival (Author response image 2a).
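      The three-level grouping described above (very low and low merged, high and very high merged, intermediate kept) can be sketched as a simple lookup. This is only an illustration of the collapsing step, not of the IPSS-R calculation itself; the label spellings and the names `IPSSR_TO_RISK_GROUP` and `collapse_ipssr` are assumptions.

```python
# Mapping from the five IPSS-R categories to the three groups used in the response.
IPSSR_TO_RISK_GROUP = {
    "very low": "low",
    "low": "low",
    "intermediate": "intermediate",
    "high": "high",
    "very high": "high",
}


def collapse_ipssr(category):
    """Map one IPSS-R category label to its three-level risk group."""
    key = category.strip().lower()
    if key not in IPSSR_TO_RISK_GROUP:
        raise ValueError(f"Unknown IPSS-R category: {category!r}")
    return IPSSR_TO_RISK_GROUP[key]


# Hypothetical per-patient labels (not taken from the cohort).
example_labels = ["Very Low", "Low", "Intermediate", "High", "Very High"]
example_groups = [collapse_ipssr(label) for label in example_labels]
```

      Raising on unknown labels, rather than silently assigning a default, keeps patients with missing clinical variables out of the survival comparison until they are handled explicitly.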

      We then investigated the relationship between factors 2, 4, and 9 from MOFA and the three IPSS-R risk groups. Integration of IPSS-R risk groups with factor values confirmed the finding in the manuscript that Factors 4 and 9 generally exert a protective influence over the MDS risk, whilst higher levels of Factor 2 predict a high-risk MDS (Author response image 2b). However, we see many outliers in all three factors, indicating that some patients were assigned to the wrong IPSS-R categories because IPSS-R calculation is based on clinical and genetic variables and does not include the transcriptomics data for coding and non-coding genomic regions.

      Author response image 2.

      Comparison of IPSS-R risk categories and MOFA risk and protective factors.

      (2) A major issue in analyzing this paper relates to the specific patient composition from whom the samples and data were obtained. The cells from the Shiozawa paper (ref 17) is comprised of a substantial number of CMML patients. Thus, what evidence have the authors that much of the data from the BMMNCs from these patients and mutant SRSF2 related predominantly to their monocytic differentiation state?

      We thank the reviewer for the insightful comment about the monocytic differentiation state of CMML and SRSF2 mutant cases. The BMMNC cohort has 11 CMML and 17 SRSF2 mutant cases, of which six are shared between the two groups. We have divided the patients into four groups: CMML only, SRSF2 mutant only, CMML and SRSF2 mutant, and others. We have generated boxplots for all cellular composition gene signature scores for these groups and compared the scores between these groups. As explained above, Ferrall-Fairbanks et al. (DOI: 10.1158/2643-3230.BCD-21-0217) recently showed that CMML can be classified into three differentiation trajectories: monocytic, megakaryocyte-erythroid progenitor (MEP), and normal-like. One hallmark of monocytic-biased trajectory was the enrichment of inflammatory granulocyte–macrophage progenitor (GMP)-like cells, which we observed through our analysis for the CMML cases with SRSF2 mutation (Author response image 3).

      Author response image 3.

      Cellular composition gene signature scores for CMML and SRSF2 mutant versus other cases. CMML cases with SRSF2 mutation show significantly higher GMP and GMP-like scores compared to other MDS cases.
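      The four-way grouping described in this response, and the group sizes implied by the stated counts (11 CMML, 17 SRSF2 mutant, 6 shared), can be sketched as follows. The function name `assign_group` and the placeholder records are assumptions for illustration; they are not the real patient annotations.

```python
from collections import Counter


def assign_group(is_cmml, has_srsf2_mutation):
    """Assign a case to one of the four comparison groups."""
    if is_cmml and has_srsf2_mutation:
        return "CMML and SRSF2 mutant"
    if is_cmml:
        return "CMML only"
    if has_srsf2_mutation:
        return "SRSF2 mutant only"
    return "others"


# Counts stated in the response: 11 CMML cases, 17 SRSF2 mutant cases, 6 shared.
N_CMML, N_SRSF2, N_BOTH = 11, 17, 6

# Placeholder (is_cmml, has_srsf2_mutation) records consistent with those counts.
records = (
    [(True, True)] * N_BOTH
    + [(True, False)] * (N_CMML - N_BOTH)
    + [(False, True)] * (N_SRSF2 - N_BOTH)
)

group_sizes = Counter(assign_group(c, s) for c, s in records)
```

      Under these counts the overlap leaves 5 CMML-only and 11 SRSF2-mutant-only cases, so any comparison involving the CMML-only group rests on a small sample.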

      (3) In addition, as the majority of patients in the Shiozawa paper have ring sideroblasts (n=59), thus potentially skewing the data toward consideration mainly of these patients, for whom better outcomes are well known.  

      We disagree with the reviewer. We used 94 BMMNC samples from Shiozawa’s paper, of which 19 cases had Refractory Anemia with Ring Sideroblasts (RARS), 4 cases had Refractory Anemia with Ring Sideroblasts and thrombocytosis (RARS-T), and 5 cases had Refractory cytopenia with multilineage dysplasia and ring sideroblasts (RCMD-RS). In total, we had 28 cases (~30%) with Ring Sideroblasts (RS), which are not large enough to skew the data.

      (4) Further, regarding this patient subset, what evidence have the authors that the importance of the SF3B1 mutation was merely related to the preponderance of sideroblastic patients from whom the samples were analyzed? 

      We had 34 SF3B1 mutant cases, of which 25 had Ring Sideroblasts (RS). The total number of cases with RS in the BMMNC cohort was 28. Therefore, the BMMNC cohort is not an RS-dominant cohort, and not all SF3B1 mutants were RS cases. Furthermore, it was recently shown by Ochi et al. (DOI: 10.1038/s41598-022-18921-2) that RS is a consequence of the SF3B1 K700E mutation, rather than a cause that could account for the importance of SF3B1.

      (5) An Erratum was reported for the Shiozawa paper (Shiozawa Y, Malcovati L, Gallì A, et al. Gene expression and risk of leukemic transformation in myelodysplasia. Blood. 2018 Aug 23;132(8):869-875. doi: 10.1182/blood-2018-07-863134) that resulted from a coding error in the construction of the logistic regression model for subgroup prediction based on the gene expression profiles of BMMNCs. This coding error was identified after the publication of the article. The authors should indicate the effect this error may have had on the data they now report.

      Thank you for bringing this important issue to our attention. The error resulted from a mistake in the construction of the logistic regression model for subgroup prediction based on the gene expression profiles of BMMNCs. However, this issue does not affect our result because we analysed the expression data from scratch and generated our own gene signature scores. Also, the error has no impact on the genetics and clinical information that we received from the authors.

      (6) What information have the authors as to whether the differing RTE findings were not predominantly related to the differentiation state of the cell population analyzed (ie higher in BM MNCs vs CD34, Fig 1)? What control data have the authors regarding these values from normal (non-malignant) cell populations?

      As described above, L1 and Alu, the two RTE families shared between the two cohorts, are still active and make new insertions in humans (Figure 2.a-b). Our interpretation is that ERV activation in BM is associated with immune cells. This interpretation is further supported by the findings of Au et al. (DOI: 10.1016/j.ccell.2021.10.001), where several ERV loci had expression in purified immune cell subsets in renal cell carcinoma samples. 

      Unfortunately, neither of these two cohorts had normal (non-malignant) cell populations. We think that the MOFA unbiased way of modelling the heterogeneity is sufficient to capture the RTE derepressed phenotype of a subset of MDS cases compared to others, and we do not need normal cases to further support the finding.

      (7) The statement in the Discussion regarding the effects of SRSF2 mutation is speculative and should be avoided. Many other somatic gene mutations have known stronger effects on prognosis for MDS.

      One aim of this study is to identify specific immune signatures associated with SRSF2 and SF3B1 mutations, which are highly prevalent in MDS. Although other mutations, such as TP53, may have a stronger correlation with poor survival, numerous studies have demonstrated a clear link between SRSF2 mutations and poor prognosis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      1) Line 99-100 The authors claimed that IQCH is a novel IQ motif-containing protein, which is essential for spermiogenesis and fertilization. However, it is not clear if the currently published paper named an ancient testis-specific IQ motif containing H gene that regulates specific transcript isoform expression during spermatogenesis.

      Response: We thank the reviewer for the comment. Yes, IQCH is the ancient testis-specific IQ motif containing H gene. Following the reviewer’s suggestion, we have revised the statement to “Here, we revealed a testis-specific IQ motif containing H gene, IQCH, which is essential for spermiogenesis and fertilization” in the Introduction of the revised manuscript.

      2) Line 154-159 Immunofluorescence staining for the marker of the acrosome (peanut agglutinin: PNA) as well as the mitochondrial marker (Transcription Factor A, Mitochondrial: TFAM) was performed to confirm the deficiency of the acrosomes and mitochondria in the proband's spermatozoa. It seems that the spermatozoa acrosomes and mitochondria were severely defective in the proband. The authors should indicate IQCH's role in mitochondrial and acrosome function by explaining how IQCH is related to mitochondrial and acrosome deficiency. In addition to staining, other functional analyses should be performed to strengthen the claim of acrosome and mitochondrial defects.

      Response: We appreciate the reviewer's valuable suggestion. Indeed, in our study, the multi-omics analyses of WT and Iqch KO testes, including LC-MS/MS analysis, proteomic analysis, and RNA-seq analysis, pointed to a potential role of IQCH in mitochondrial and acrosome function. GO analysis of these data indicated a significant enrichment in mitochondrial and acrosomal functions, including acrosomal vesicle, acrosome assembly, vesicle fusion with Golgi apparatus, mitochondrion organization, mitochondrial matrix, and so on. Among the enriched molecules, in particular, HNRNPK is mainly expressed at the Golgi phase and Cap phase (Biggiogera et al. 1993). ANXA7 is a calcium-dependent phospholipid-binding protein that is a negative regulator of mitochondrial apoptosis (Du et al. 2015). Loss of SLC25A4 results in mitochondrial energy metabolism defects in mice (Graham et al. 1997). Furthermore, we confirmed by Co-IP that IQCH interacted with HNRNPK, ANXA7, and SLC25A4, and showed by immunofluorescence and western blotting that these proteins were downregulated in the sperm of the Iqch KO mice. Moreover, IQCH can bind to HNRPAB, which could influence the mRNA levels of Catsper-family genes, such as Catsper1, Catsper2, and Catsper3, which are crucial for acrosome development (Jin ZR et al). In addition, we also detected HNRPAB binding to Dnhd1, which affects mitochondria development (Tan C et al). Therefore, in addition to staining, these other functional analyses provide evidence of the acrosome and mitochondrial defects caused by IQCH absence.

      3) Line 180-182 IQCH knockout mice were generated. It is not clear why Mut-IQCH mice were not generated to be consistent with the human sequencing data.

      Response: We thank the reviewer for the comment. To understand IQCH's impact on fecundity in mice, we used CRISPR-Cas9 to try to generate mice carrying the orthologous variant of IQCH c.387+1_387+10del detected in humans. Regrettably, due to sequence complexity, the specificity and efficiency of the designed sgRNA were low, which prevented successful construction of Iqch knock-in mice. Because IQCH c.387+1_387+10del results in absent expression, we instead generated Iqch knockout mice to explore IQCH's role in spermatogenesis.

      4) Line 241.Figure 5A Gene Ontology (GO) analysis of the IQCH-bound proteins revealed a particular enrichment in fertilization, sperm axoneme assembly, mitochondrial organization, calcium channel, and RNA processing. But these GO functions are not shown in Figure 5A. The entire Figure 5 should be revised to enhance readability.

      Response: We sincerely apologize for the oversight. These GO functions were indeed identified during the analysis of IQCH-bound proteins, but we unintentionally omitted them when creating the plots. We have revised the plots in Figure 5 in the revised manuscript to enhance readability.

      5) Line 242 "33 ribosomal proteins were identified (Fig. 5B), indicating that IQCH might be involved in protein synthesis". The authors should perform an analysis to support the claim of protein synthesis defects.

      Response: We thank the reviewer for the suggestion. First, we have added Co-IP experiments confirming the interaction between IQCH and three ribosomal proteins (RPL4, RPS3, and RPS7), chosen from the pool of 33 ribosomal proteins based on their different protein scores (Figure R1). In addition, the proteomic analysis revealed 807 upregulated proteins and 1,186 downregulated proteins in KO mice compared to WT mice. We confirmed the key downregulated proteins by western blotting and immunofluorescence staining in the previous manuscript. These results indicate that IQCH might interact with ribosomal proteins to regulate protein expression. The regulation of protein synthesis by IQCH will require further experiments for confirmation in future studies.

      Author response image 1.

      The interaction between IQCH and ribosomal proteins. Co-IP assays confirmed that IQCH interacted with RPL4, RPS3, and RPS7 in WT mouse sperm.

      6) Line 244 The authors mentioned too many GO functions without focus.

      Response: Following the reviewer's suggestion, we have simplified the IQCH-associated GO functions in the revised manuscript.

      7) Figure 6, there are no negative controls in all co-IP experiments. Band sizes are not marked. Thus, all data can't be evaluated. This also raises concern about whether the LC-MS/MS experiment to identify IQCH interacting protein was well-controlled? All co-IP experiments were poorly designed to draw any conclusion.

      Response: We thank the reviewer for the comment. We have added negative controls to all Co-IP experiments and provided band sizes in Figure 6 of the revised manuscript.

      8) The authors mentioned that IQCH can bind to CaM. But they didn't detect CaM protein in Figure 5. Did the LC-MS/MS experiment really work?

      Response: We thank the reviewer for the comment. We detected the interaction of CaM with IQCH in the LC-MS/MS analysis, which has been submitted as new Data S1 in the revised manuscript. We also confirmed their binding in mouse sperm by Co-IP and immunofluorescence staining; these results were shown in Figure 6 and Figure S10 of the previous version of the manuscript.

      9) Figure 6D. Because IQCH is lost in Iqch KO sperm, what is the point of showing in the Co-IP assay that CaM does not bind to IQCH in Iqch KO sperm?

      Response: Following the reviewer's suggestion, we have deleted the Co-IP results showing that CaM could not bind to IQCH in Iqch KO sperm.

      10) Figure 6E. The Co-IP assay does not support the authors' claim that the decreased expression of HNRPAB was due to the reduced binding of IQCH and CaM by the knockout of IQCH or CaM.

      Response: We thank the reviewer for the expert comment. The results in Figure 6E confirmed the interaction of IQCH and CaM in K562 cells and also showed that the expression of HNRPAB was reduced when IQCH or CaM was knocked down, suggesting that IQCH or CaM might regulate HNRPAB expression. In Figure 6F, the downregulation of HNRPAB caused by knocking down IQCH (or CaM) could not be rescued by overexpressing CaM (or IQCH), indicating that CaM (or IQCH) cannot mediate HNRPAB expression alone. Therefore, the reduced expression of HNRPAB in Figure 6E might result from the weakened interaction between IQCH and CaM rather than simply from the downregulation of IQCH or CaM expression. To avoid confusion, we have modified the relevant description in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      1) Lines 117 and 129: Please provide the reference number (NM_xxx.x) for the IQCH isoform that was used to interpret this variant. This is key information. Also, please provide the predicted truncation consequence caused by this splicing variant to IQCH protein.

      Response: We thank the reviewer for the suggestion. We have added the reference number (NM_0010317152) of IQCH to the manuscript. We used splice site prediction tools, such as SpliceAI, RDDC, and varSEAK, to assess the expression consequences of this IQCH splicing variant, but these tools could not predict its outcome. However, the minigene splicing assay showed that IQCH c.387+1_387+10del resulted in degradation of IQCH.

      2) Figure 1A: The deleted sequence indicated by the red box does not match IQCH c.387+1_387+10del. Please show a plot of the exon-intron boundary under the Sanger sequencing results of the WT allele.

      Response: We thank the reviewer for the suggestion. We apologize for the non-standard description of the Sanger sequencing results. Following the HGVS nomenclature (Figure R2), we have modified the red box to match IQCH c.387+1_387+10del and have added the exon-intron boundary to Figure 1A.

      Author response image 2.

      HGVS nomenclature description of the IQCH variant. The picture showed a detailed HGVS nomenclature description of IQCH c.387+1_387+10del.

      Minor comments:

      a) Manuscript title: It is suggested to change the title to "IQCH regulates spermatogenesis by interacting with CaM to promote the expression of RNA-binding proteins".

      Response: Following the reviewer's suggestion, we have changed the title to “IQCH regulates spermatogenesis by interacting with CaM to promote the expression of RNA-binding proteins”.

      b) Line 116: Please introduce the abbreviation WES. Also, please introduce the other abbreviations (such as WT, SEM, TEM, etc.) the first time they appear.

      Response: We thank the reviewer for the suggestion. We now define all abbreviations at their first appearance.

      c) Line 140, "Nonfunctional IQCH": Due to "the lack of IQCH expression" in Line 137, should "Nonfunctional IQCH" be changed into "IQCH deficiency"?

      Response: We thank the reviewer for the detailed review. We have modified this title in the Results section of the revised manuscript as follows: “IQCH deficiency leads to sperm with cracked axoneme structures accompanied by defects in the acrosome and mitochondria”

      d) The information on the following references is incomplete: Sechi et al., Tian et al., Wang et al., and Xu et al. Please provide issue/page/article numbers.

      Response: We are sorry for our oversight. We have provided the missing issue/page/article numbers for the references.

      e) The title of Figure 1: Please emphasize that the male infertile-associated variant is "homozygous".

      Response: We thank the reviewer for the suggestion. We have revised the title of Figure 1 to emphasize the homozygous variant as follows: “Identification of a homozygous splicing mutation in IQCH in a consanguineous family with male infertility”.

      f) Table 1: Please provide the reference paper for the normal values.

      Response: We appreciate the reviewer's detailed checks. We have provided the reference paper for the normal values in Table 1.

      g) Figure 5F is distorted. Please make sure that it is a perfect circle.

      Response: We thank the reviewer for the suggestion. We have revised both the graphical representation and the layout of Figure 5 in the revised manuscript to improve readability.

      Reviewer #3 (Recommendations For The Authors):

      While the writing is generally clear, there are multiple examples of where the writing could be improved for clarity.

      1) While some terms are defined throughout the manuscript, many abbreviations are not defined upon their first mention, such as WES, RT-PCR, TYH, HTF, KSOM, KEGG, RIPA, PMSE, SDS-PAGE, H&L, and HRP.

      Response: We thank the reviewer for the suggestion. We now define all abbreviations at their first appearance.

      2) On line 44, the claim that spermatogenesis is the "most complex biological process" is rather subjective and hard to support with concrete data.

      Response: We thank the reviewer for the suggestion. We have modified this description in the Introduction section as follows: “Spermatogenesis is one of the most complex biological process in male organisms and functions to produce mature spermatozoa from spermatogonia in three phases: (i) spermatocytogenesis (mitosis), (ii) meiosis, and (iii) spermiogenesis.”

      3) On line 54, I think the authors meant "heterogeneous," not "heterologous."

      Response: We thank the reviewer for the comment. We have changed “heterologous” to “heterogeneous”.

      4) On line 156, I think the authors meant "deficiency," not "deficient."

      Response: We thank the reviewer for the comment and apologize for this mistake. We have made the correction in the revised version of the manuscript.

      5) On line 300, K562 cells are mentioned, but neither in the Methods nor the Results are any details about the biological origin of these cells (or rationale for their use other than co-expression of IQCH and CaM) provided.

      Response: We thank the reviewer for the suggestion. K562 is a human leukemia cell line with high expression of IQCH and CaM; we therefore chose this cell line to facilitate knockdown of IQCH and CaM. We have added details on the biological origin of these cells to the Methods section of the revised manuscript.

      6) For the Results section describing Figure 6H, it would be nice to provide some explanation of the results of IQCH overexpression alone relative to control situations and not just relative to the delta-IQ version or relative to simultaneous CaM manipulation.

      Response: Following the reviewer's suggestion, we have added a co-transfection of control and CaM plasmids in HEK293T cells. The expression of HNRPAB in cells co-transfected with control and CaM plasmids was similar to that in cells co-transfected with IQCH (△IQ)/CaM plasmids but lower than that in cells overexpressing the WT-IQCH and CaM plasmids, confirming that the IQCH (△IQ) plasmid is nonfunctional. These results are shown in Figure 6H of the revised manuscript.

      7) The sentence on lines 352-354 is confusing.

      Response: We apologize for any confusion caused by the sentence in question. We have revisited the sentence and made appropriate revisions to enhance its clarity as follows: “Our findings suggest that the fertilization function is the main action of IQ motif-containing proteins, while each specific IQ motif-containing protein also has its own distinct role in spermatogenesis.”

      8) The use of "employee" on line 371 is awkward and not very scientific.

      Response: We thank the reviewer for the comment. We have changed “employee” to “downstream effector protein” on line 376.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides an important cell atlas of the gill of the mussel Gigantidas platifrons using a single nucleus RNA-seq dataset, a resource for the community of scientists studying deep sea physiology and metabolism and intracellular host-symbiont relationships. The work, which offers solid insights into cellular responses to starvation stress and molecular mechanisms behind deep-sea chemosymbiosis, is of relevance to scientists interested in host-symbiont relationships across ecosystems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Wang et al have constructed a comprehensive single nucleus atlas for the gills of the deep sea Bathymodioline mussels, which possess intracellular symbionts that provide a key source of carbon and allow them to live in these extreme environments. They provide annotations of the different cell states within the gills, shedding light on how multiple cell types cooperate to give rise to the emergent functions of the composite tissues and the gills as a whole. They pay special attention to characterizing the bacteriocyte cell populations and identifying sets of genes that may play a role in their interaction with the symbiotes.

      Wang et al sample mussels from 3 different environments: animals from their native methane-rich environment, animals transplanted to a methane-poor environment to induce starvation, and animals that have been starved in the methane-poor environment and then moved back to the methane-rich environment. They demonstrated that starvation had the biggest impact on bacteriocyte transcriptomes. They hypothesize that the upregulation of genes associated with lysosomal digestion leads to the digestion of the intracellular symbiont during starvation, while the non-starved and reacclimated groups more readily harvest the nutrients from symbiotes without destroying them.

      Strengths:

      This paper makes available a high-quality dataset that is of interest to many disciplines of biology. The unique qualities of this non-model organism and the collection of conditions sampled make it of special interest to those studying deep sea adaptation, the impact of environmental perturbation on Bathymodioline mussels populations, and intracellular symbiotes. The authors do an excellent job of making all their data and analysis available, making this not only an important dataset but a readily accessible and understandable one.

      The authors also use a diverse array of tools to explore their data. For example, the quality of the data is augmented by the use of in situ hybridizations to validate cluster identity and KEGG analysis provides key insights into how the transcriptomes of bacteriocytes change.

      The authors also do a great job of providing diagrams and schematics to help orient non-mussel experts, thereby widening the audience of the paper.

      We thank the reviewer for the valuable feedback on our study. We are grateful that the reviewer found our work interesting, and we appreciate the thorough evaluation of our research. These constructive comments will be considered as we continue to develop and improve our study.

      Weaknesses:

      One of the main weaknesses of this paper is the lack of coherence between the images and the text, with some parts of the figures never being referenced in the body of the text. This makes it difficult for the reader to interpret how they fit in with the author's discussion and assess confidence in their analysis and interpretation of data. This is especially apparent in the cluster annotation section of the paper.

      We appreciate the feedback and suggestions provided by the reviewer, and we have revised our manuscript to make it more accessible to general audiences.

      Another concern is the linking of the transcriptomic shifts associated with starvation with changes in interactions with the symbiotes. Without examining and comparing the symbiote population between the different samples, it cannot be concluded that the transcriptomic shifts correlate with a shift to the 'milking' pathway and not other environmental factors. Without comparing the symbiote abundance between samples, it is difficult to disentangle changes in cell state that are due to their changing interactions with the symbiotes from other environmental factors.

      We are grateful for the valuable feedback and suggestions provided by the reviewer. Our keen interest lies in understanding symbiont responses, particularly at the single-cell level. However, it's worth noting that existing commercial single-cell RNA-seq technologies rely on oligo dT priming for reverse transcription and barcoding, thus omitting bacterial gene expression information from our dataset. We hope that advancements in technology will soon enable us to perform an integrated analysis encompassing both host and symbiont gene expression.

      Additionally, conclusions in this area are further complicated by using only snRNA-seq to study intracellular processes. This is limiting since cytoplasmic mRNA is excluded and only nuclear reads are sequenced after the organisms have had several days to acclimate to their environment and major transcriptomic shifts have occurred.

      We appreciate the comments shared by the reviewer and agree that scRNA-seq provides more comprehensive transcriptional information by targeting the entire mRNA of the cell. However, we would like to highlight that snRNA-seq has some unique advantages over scRNA-seq. Notably, snRNA-seq allows for simple snap-freezing of collected samples, facilitating easier storage, particularly for samples obtained during field trips involving deep-sea animals and other ecologically significant non-model animal samples. Additionally, unlike scRNA-seq, snRNA-seq eliminates the need for tissue dissociation, which often involves prolonged enzymatic treatment of deep-sea animal tissue/cells under atmospheric pressure. This process can potentially lead to the loss of sensitive cells or alterations in gene expression. Moreover, snRNA-seq procedures are largely insensitive to the size and shape of animal cells, which makes the technology well suited to constructing cell atlases of animal tissues. Consequently, we consider that snRNA-seq offers flexibility and is a suitable choice for the objectives of our current research.

      Reviewer #2 (Public Review):

      Wang, He et al. shed insight into the molecular mechanisms of deep-sea chemosymbiosis at the single-cell level. They do so by producing a comprehensive cell atlas of the gill of Gigantidas platifrons, a chemosymbiotic mussel that dominates the deep-sea ecosystem. They uncover novel cell types and find that the gene expression of bacteriocytes, the symbiont-hosting cells, supports two hypotheses of host-symbiont interactions: the "farming" pathway, where symbionts are directly digested, and the "milking" pathway, where nutrients released by the symbionts are used by the host. They perform an in situ transplantation experiment in the deep sea and reveal transitional changes in gene expression that support a model where starvation stress induces bacteriocytes to "farm" their symbionts, while recovery leads to the restoration of the "farming" and "milking" pathways.

      A major strength of this study includes the successful application of advanced single-nucleus techniques to a non-model, deep-sea organism that remains challenging to sample. I also applaud the authors for performing an in situ transplantation experiment in a deep-sea environment. From gene expression profiles, the authors deftly provide a rich functional description of G. platifrons cell types that is well-contextualized within the unique biology of chemosymbiosis. These findings offer significant insight into the molecular mechanisms of deep-sea host-symbiont ecology, and will serve as a valuable resource for future studies into the striking biology of G. platifrons.

      The authors' conclusions are generally well-supported by their results. However, I recognize that the difficulty of obtaining deep-sea specimens may have impacted experimental design. In this area, I would appreciate more in-depth discussion of these impacts when interpreting the data.

      We thank the reviewer for their valuable feedback on our study. We're grateful that the reviewer found our work interesting, and we appreciate their thorough evaluation of our research. We'll consider their constructive comments as we continue to develop and improve our study.

      Because cells from multiple individuals were combined before sequencing, the in situ transplantation experiment lacks clear biological replicates. This may potentially result in technical variation (ie. batch effects) confounding biological variation, directly impacting the interpretation of observed changes between the Fanmao, Reconstitution, and Starvation conditions. It is notable that Fanmao cells were much more sparsely sampled. It appears that fewer cells were sequenced, resulting in the Starvation and Reconstitution conditions having 2-3x more cells after doublet filtering. It is not clear whether this is due to a technical factor impacting sequencing or whether these numbers are the result of the unique biology of Fanmao cells. Furthermore, from Table S19 it appears that while 98% of Fanmao cells survived doublet filtering, only ~40% and ~70% survived for the Starvation and Reconstitution conditions respectively, suggesting some kind of distinction in quality or approach.

      There is a pronounced divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation (Fig. S11). This is potentially a very interesting finding, but it is difficult to know if these differences are the expected biological outcome of the experiment or the fact that Fanmao cells are much more sparsely sampled. The study also finds notable differences in gene expression between Fanmao and the other two conditions- a key finding is that bacteriocytes had the largest Fanmao-vs-starvation distance (Fig. 6B). But it is also notable that for every cell type, one or both comparisons against Fanmao produced greater distances than comparisons between Starvation and Reconstitution (Fig. 6B). Again, it is difficult to interpret whether Fanmao's distinctiveness from the other two conditions is underlain by fascinating biology or technical batch effects. Without biological replicates, it remains challenging to disentangle the two.

      As highlighted by the reviewer, our experimental design involves pooling multiple biological samples within a single treatment state before sequencing. We acknowledge the concern regarding the absence of distinct biological replicates and the potential impact of batch effects on result interpretation. While we recognize the merit of conducting multiple sequencing runs for a single treatment to provide genuine biological replicates, we contend that batch effects may not exert a strong influence on the observed patterns.

      In addition, we applied a bootstrap sampling algorithm to assess whether the gene expression patterns within a cluster are more similar than those between clusters. This algorithm involves selecting a portion of cells per cluster and examining whether this subset remains distinguishable from other clusters. Our assumption was that if different samples exhibited distinct expression patterns due to batch effect, the co-assignment probabilities of a cluster would be very low. This expectation was not met in our data, as illustrated in Fig. S2. The lack of significantly low co-assignment probabilities within clusters suggests that batch effects may not exert a strong influence on our results.
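
      The subsampling check described above can be illustrated with a minimal sketch. It is not the algorithm used in the study, whose implementation is not given here. It assumes a cell-by-feature matrix `X`, cluster labels `labels`, and a simple nearest-centroid rule for re-assigning held-out cells; the function name `coassignment_matrix` and all parameters are hypothetical.

```python
import numpy as np

def coassignment_matrix(X, labels, n_boot=100, frac=0.8, seed=0):
    """Fraction of held-out cells from cluster i that are re-assigned to cluster j.

    Assumption: cells are re-assigned to the nearest subsample centroid.
    A diagonal close to 1 means clusters stay distinguishable under resampling.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)          # sorted unique cluster labels
    k = len(clusters)
    counts = np.zeros((k, k))
    for _ in range(n_boot):
        # Draw a random portion of the cells from every cluster.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(labels == c),
                       size=max(1, int(frac * np.sum(labels == c))),
                       replace=False)
            for c in clusters
        ])
        # Centroids are computed from the subsample only.
        centroids = np.stack([X[idx][labels[idx] == c].mean(axis=0)
                              for c in clusters])
        # Cells left out of the subsample are re-assigned to the nearest centroid.
        held = np.setdiff1d(np.arange(len(labels)), idx)
        if held.size == 0:
            continue
        dist = np.linalg.norm(X[held][:, None, :] - centroids[None, :, :], axis=2)
        assigned = dist.argmin(axis=1)
        true = np.searchsorted(clusters, labels[held])
        np.add.at(counts, (true, assigned), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums == 0, 1, row_sums)

# Toy data: two well-separated synthetic "clusters" of 50 cells each.
demo_rng = np.random.default_rng(1)
X = np.vstack([demo_rng.normal(0.0, 1.0, size=(50, 5)),
               demo_rng.normal(10.0, 1.0, size=(50, 5))])
labels = np.array([0] * 50 + [1] * 50)
P = coassignment_matrix(X, labels)
```

      In this sketch, a strong batch effect that split a cluster by sample would appear as low diagonal entries in `P`, which is the pattern the bootstrap analysis was designed to detect.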

      Indeed, we acknowledge a noticeable shift in the expression patterns of certain cell types, such as the bacteriocyte. However, this is not universally applicable across all cell types. For instance, the UMAP figure in Fig. 6A illustrates a substantial overlap among basal membrane cell 2 from Fanmao, Starvation, and Reconstitution treatments, and the centroid distances between the three treatments are subtle, as depicted in Fig. 6B. This consistent pattern is also observed in DEPC, smooth muscle cells, and the food groove ciliary cells.
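
      As an illustration of the centroid-distance comparison mentioned above, the following sketch computes pairwise distances between treatment centroids for one cell type. The embedding space behind Fig. 6B is not specified here, so the two-dimensional coordinates, the function name `centroid_distances`, and the toy values are all assumptions.

```python
from itertools import combinations

import numpy as np

def centroid_distances(embedding, treatment):
    """Euclidean distance between the mean coordinates of each pair of treatments."""
    embedding = np.asarray(embedding, dtype=float)
    treatment = np.asarray(treatment)
    # One centroid per treatment label.
    centroids = {str(t): embedding[treatment == t].mean(axis=0)
                 for t in np.unique(treatment)}
    return {(a, b): float(np.linalg.norm(centroids[a] - centroids[b]))
            for a, b in combinations(sorted(centroids), 2)}

# Toy coordinates for cells of a single cell type (hypothetical values).
coords = [[0, 0], [2, 0], [3, 4], [5, 4], [0, 0], [2, 0]]
groups = ["Fanmao", "Fanmao", "Starvation", "Starvation",
          "Reconstitution", "Reconstitution"]
distances = centroid_distances(coords, groups)
```

      Small distances for all three pairs would correspond to the overlapping pattern described for basal membrane cell 2, while one large distance would correspond to a treatment-specific shift such as that reported for bacteriocytes.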

      The reviewer also noted variations in the number of cells per treatment. Specifically, Fanmao sequencing yielded fewer than 10 thousand cells, whereas the other two treatments produced 2-3 times more cells after quality control (QC). It is highly probable that the technician loaded different quantities of cells into the machine for single-nucleus sequencing—a not uncommon occurrence in this methodology. While loading more cells may increase the likelihood of doublets, it is crucial to emphasize that this should not significantly impact the expression patterns post-QC. It's worth noting that overloading samples has been employed as a strategic approach to capture rare cell types, as discussed in a previous study (reference: 10.1126/science.aay0267).

      The reviewer highlighted the discrepancy in cell survival rates during the 'doublet filtering' process, with 98% of Fanmao cells surviving compared to approximately 40% and 70% for the Starvation and Reconstitution conditions, respectively. It's important to clarify that the reported percentages reflect the survival of cells through a multi-step QC process employing various filtering strategies.

      Post-doublet removal, we filtered out cells with <100 or >2500 genes and <100 or >6000 unique molecular identifiers (UMIs). Additionally, genes with <10 UMIs in each data matrix were excluded. The observed differences in survival rates for Starvation and Reconstitution cells can be attributed to the total volume of data generated in Illumina sequencing. Specifically, we sequenced approximately 91 GB of data for Fanmao, ~196 GB for Starvation, and ~249 GB for Reconstitution. As a result, the qualified data obtained for Starvation and Reconstitution conditions was only about twice that of Fanmao due to the limited data volume.
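
      The thresholds stated above can be written as a short filter. This is a sketch under assumptions rather than the pipeline used in the study: `counts` is taken to be a dense cells-by-genes UMI matrix for one sample, per-gene totals are computed on the unfiltered matrix, and the function name `qc_filter` is hypothetical.

```python
import numpy as np

def qc_filter(counts, min_genes=100, max_genes=2500,
              min_umis=100, max_umis=6000, min_gene_umis=10):
    """Apply the per-cell and per-gene thresholds quoted in the response."""
    counts = np.asarray(counts)
    genes_per_cell = (counts > 0).sum(axis=1)   # number of detected genes per cell
    umis_per_cell = counts.sum(axis=1)          # total UMIs per cell
    keep_cells = ((genes_per_cell >= min_genes) & (genes_per_cell <= max_genes)
                  & (umis_per_cell >= min_umis) & (umis_per_cell <= max_umis))
    # Assumption: gene totals are taken over all cells of the input matrix.
    keep_genes = counts.sum(axis=0) >= min_gene_umis
    return counts[keep_cells][:, keep_genes], keep_cells, keep_genes

# Toy matrix (4 cells x 5 genes) with scaled-down thresholds for illustration only.
toy = np.array([[1, 1, 1, 0, 0],
                [5, 0, 0, 0, 0],
                [9, 9, 9, 1, 9],
                [0, 2, 2, 0, 1]])
filtered, keep_cells, keep_genes = qc_filter(
    toy, min_genes=2, max_genes=4, min_umis=3, max_umis=20, min_gene_umis=2)
```

      With the default arguments, the function applies the cut-offs given in the text (100 to 2500 genes and 100 to 6000 UMIs per cell, at least 10 UMIs per gene); the fraction of cells retained is then `keep_cells.mean()`.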

      The reviewer also observed a divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation, as depicted in Fig. S1. This discrepancy may hold true biological significance, presenting a potentially intriguing finding. However, our discussion on this pattern was rather brief, as we acknowledge that the observed differences could be influenced by the sample preparation process for dissection and digestion. It is crucial to consider that cutting a slightly different area during dissection may result in variations in the proportion of cells obtained. While we recognize the potential impact of this factor, we do not think that the sparsity of sampling alone could significantly affect the relative proportions of cells per cell type.

      In conclusion, we acknowledge the reviewer's suggestion that sequencing multiple individual samples per treatment condition would have been ideal, rather than pooling them together. However, the homogeneous distribution observed in UMAP and the consistent results obtained from bootstrap sampling suggest that the impact of batch effects on our analyses is likely not substantial. Additionally, based on our understanding, the smaller number of cells in the Fanmao sample should not significantly affect the proportions of cells or the expression patterns of each cluster.

      Reviewer #3 (Public Review):

      Wang et al. explored the unique biology of the deep-sea mussel Gigantidas platifrons to understand the fundamental principles of animal-symbiont relationships. They used single-nucleus RNA sequencing and validation and visualization of many of the important cellular and molecular players that allow these organisms to survive in the deep sea. They demonstrate that a diversity of cell types that support the structure and function of the gill including bacteriocytes, specialized epithelial cells that host sulfur-oxidizing or methane-oxidizing symbionts as well as a suite of other cell types including supportive cells, ciliary, and smooth muscle cells. By performing experiments of transplanting mussels from one habitat which is rich in methane to methane-limited environments, the authors showed that starved mussels may consume endosymbionts versus in methane-rich environments upregulated genes involved in glutamate synthesis. These data add to the growing body of literature that organisms control their endosymbionts in response to environmental change.

      The conclusions of the data are well supported. The authors adapted a technique that would have been technically impossible in their field environment by preserving the tissue and then performing nuclear isolation after the fact. The use of single-nucleus sequencing opens the possibility of new cellular and molecular biology that is not possible to study in the field. Additionally, the in-situ data (both WISH and FISH) are high-quality and easy to interpret. The use of cell-type-specific markers along with a symbiont-specific probe was effective. Finally, the SEM and TEM were used convincingly for specific purposes in the case of showing the cilia that may support water movement.

      We appreciate the valuable feedback provided by the reviewer on our study. It is encouraging to know that our work was found to be interesting and that they conducted a thorough evaluation of our research. We will take their constructive comments into account as we strive to develop and enhance our study. We thank the reviewer for all the input.

      The one particular area for clarification and improvement surrounds the concept of a proliferative progenitor population within the gill. The authors imply that three types of proliferative cells within gills have long been known, but their study may be the first to recover molecular markers for these putative populations. The markers the authors present for gill posterior end budding zone cells (PEBZCs) and dorsal end proliferation cells (DEPCs) are not intuitively associated with cell proliferation and some additional exploration of the data could be performed to strengthen the argument that these are indeed proliferative cells. The authors do utilize a trajectory analysis tool called Slingshot which they claim may suggest that PEBZCs could be the origin of all gill epithelial cells, however, one of the assumptions of this analysis is that differentiated cells are developed from the same precursor PEBZC population.

      However, these conclusions do not detract from the overall significance of the work of identifying the relationship between symbionts and bacteriocytes and how these host bacteriocytes modulate their gene expression in response to environmental change. It will be interesting to see how similar or different these data are across animal phyla. For instance, the work of symbiosis in cnidarians may converge on similar principles or there may be independent ways in which organisms have been able to solve these problems.

      We are grateful for the valuable comments and suggestions provided by the reviewer. All suggestions have been carefully considered, and the manuscript has been revised accordingly. We particularly value the reviewer's insights regarding the characterization of the G. platifrons gill proliferative cell populations. In a separate research endeavor, we have conducted experiments utilizing both cell division and cell proliferation markers on these proliferative cell populations. While these results are not incorporated into the current manuscript, we would be delighted to share our preliminary findings with the reviewer. Our preliminary results indicate that the proliferative cell populations exhibit positivity for cell proliferation markers and contain a significant number of mitotic cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Further experiments are needed to link the changes in transcriptomes of Bathymodioline mussels in the different environmental conditions to changes in their interactions with symbiotes. For example, quantifying the abundance and comparing the morphology of symbiotes between the environmental conditions would lend much support for shifting between milking and farming strategies. Without analyzing the symbiotes and comparing them across populations, it is difficult to comment on the mechanisms of interactions between symbiotes and the hosts. Without this analysis, this data is better suited towards comments about the general effect of environmental perturbation and stress on gene expression in these mussels.

      We appreciate the reviewer’s comments. We are also very curious about the symbiont responses, especially at the single-cell level. However, all the current commercial single-cell RNA-seq technologies are based on oligo dT priming for reverse transcription and barcoding. Therefore, the bacterial gene expression information is omitted from our dataset. Hopefully, with the development of technology, we could conduct an integrated analysis of both host and symbiont gene expression soon.

      Additionally, clarification is needed on which types of symbiotes are being looked at. Are they MOX or SOX populations? Are they homogenous? What are the concentrations of sulfur at the sampled sites?

      We thank you for your valuable comments and suggestions. Gigantidas platifrons harbors a MOX endosymbiont population characterized by a single 16S rRNA phylotype. We apologize for any confusion resulting from our previous wording. To clarify, we have revised lines 57-59 of our introduction.

      In the text and images, consider using standardized gene names and leaving out the genome coordinates. This would greatly help with readability. Also, be careful to properly follow gene naming and formatting conventions (ie italicizing gene names and symbols).

      We appreciate the reviewer’s insightful comments. In model animals, gene nomenclature often stems from forward genetic approaches, such as the identification of loss-of-function mutants. These gene names, along with their protein products, typically correspond to unique genome coordinates. Conversely, in non-model invertebrates (e.g., Gigantidas platifrons in the present study), gene prediction relies on a combination of bioinformatics methods, including de novo prediction, homolog-based prediction, and transcriptomics mapping. Subsequently, the genes are annotated by identifying their best homologs in well-characterized databases. Given that different genes may encode proteins with similar annotated functions, we chose to include both the gene ID (genome coordinates) and the gene name in our manuscript. This dual labeling approach ensures that our audience receives accurate and comprehensive information regarding gene identification and annotation.

      Additionally, extending KEGG analysis to the atlas annotation section could help strengthen the confidence of annotations. For example, when identifying bacteriocyte populations, the functional categories of individual marker genes (lysosomal proteases, lysosomal traffic regulators, etc) are used to justify the annotation. Presenting KEGG support that these functional categories are upregulated in this population relative to others would help further support how you characterize this cluster by showing it's not just a few specific genes that are enriched in this cell group, but rather an overall functionality.

      We appreciate the valuable suggestion provided by the reviewer. Indeed, incorporating KEGG analysis into the atlas annotation section could further enhance the confidence in our annotations. However, in our study, we encountered some limitations that impeded us from conducting a comprehensive KEGG enrichment analysis.

      Firstly, the number of differentially expressed genes (DEGs) that we identified for certain cell populations was relatively small, making it challenging to meet the threshold required for meaningful KEGG enrichment analysis. For instance, among the 97 marker genes identified for the Bacteriocyte cluster, only two genes, Bpl_scaf_59648-4.5 (lysosomal alpha-glucosidase-like) and Bpl_scaf_52809-1.6 (lysosomal-trafficking regulator-like isoform X1), were identified as lysosomal genes. To generate reliable KEGG enrichments, a larger number of genes is typically required.

      Secondly, single-nucleus sequencing, as employed in our study, tends to yield a relatively smaller number of genes per cell compared to bulk RNA sequencing. This limited gene yield can make it challenging to achieve sufficient gene representation for rigorous KEGG enrichment analysis.

      Furthermore, many genes in the genome still lack comprehensive annotation, both in terms of KEGG and GO annotations. In our dataset, out of the 33,584 genes obtained through single-nuclei sequencing, 26,514 genes have NO KEGG annotation, and 25,087 genes have NO GO annotation. This lack of annotations further restricts the comprehensive application of KEGG analysis in our study.
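
      To put these counts in proportion, the short Python sketch below recomputes the annotated fractions from the three gene counts quoted above. The variable and function names are illustrative assumptions; only the three counts come from this response.

```python
# Gene counts quoted in the response above; all names here are illustrative.
total_genes = 33_584
without_kegg = 26_514
without_go = 25_087


def annotated_share(total, unannotated):
    """Return (annotated gene count, annotated percentage of all genes)."""
    annotated = total - unannotated
    return annotated, 100 * annotated / total


kegg_annotated, kegg_pct = annotated_share(total_genes, without_kegg)
go_annotated, go_pct = annotated_share(total_genes, without_go)

print(f"KEGG-annotated genes: {kegg_annotated} ({kegg_pct:.1f}%)")
print(f"GO-annotated genes:   {go_annotated} ({go_pct:.1f}%)")
```

      Roughly one gene in five carries a KEGG annotation and about one in four carries a GO annotation, which is consistent with the limitation described above.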

      The claim that VEPCs are symbiote free is not demonstrated. Additional double in situs are needed to show that markers of this cell type localize in regions free of symbiotes.

      We appreciate your comments and suggestions. In Figure 5B, our results demonstrate that the bacteriocytes (green fluorescent signal) are distant from the VEPCs, which are located around the tip of the gill filaments (close to the food groove). We have revised our Figure 5B to make it clear.

      Additionally, it does not seem like trajectory analysis is appropriate for these sampling conditions. Generally, to create trajectories confidently, more closely sampled time points are needed to sufficiently parse out the changes in expression. More justification is needed for the use of this type of analysis here and a discussion of the limitations should be mentioned, especially when discussing the hypotheses relating to PEBZCs, VEPCs, and DEPCs.

      We greatly appreciate your thoughtful commentary. It is important to acknowledge that in the context of a developmental study, incorporating more closely spaced time points indeed holds great value. In our ongoing project investigating mouse development, for instance, we have implemented time points at 24-hour intervals. However, in the case of deep-sea adult animals, we hypothesized a slower transcriptional shift in such an extreme environment, which led us to opt for a time interval of 3-7 days. Examining the differential expression profiles among the three treatments, we observed that most cell types exhibited minimal changes in their expression profiles. For the cell types strongly impacted by in situ transplantation, the expression profiles per cell type still overlapped strongly in the UMAP analysis (Figure 6a), thus enabling meaningful comparisons. Nevertheless, we recognize that our sampling strategy may not be flawless. Additionally, the challenging nature of conducting in situ transplantation at 1000-meter depths limited the number of sampling occasions available to us. We sincerely appreciate your input and understanding.

      Finally, more detail should be added on the computational methods used in this paper. For example, the single-cell genomics analysis protocol should be expanded on so that readers unfamiliar with BD single-cell genomics handbooks could replicate the analysis. More detail is also needed on what criteria and cutoffs were used to calculate marker genes. Also, please be careful to cite the algorithms and software packages mentioned in the text.

      Acknowledged, thank you for highlighting this. In essence, the workflow closely resembles the 10x Genomics workflow (although 10x Genomics uses different software, i.e., Cell Ranger). We explain the workflow in more detail below, while noting that this information may no longer be relevant for newer users of BD or individuals who are not acquainted with BD, given that the workflow underwent a complete overhaul in the summer of 2023.

      References to lines

      Line 32: typo "..uncovered unknown tissue heterogeny" should read "uncovering" or "and uncovered"

      Overall abstract could include more detail of findings (ex: what are the "shifts in cell state" in line 36 that were observed)

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 60: missing comma "...gill filament structure, but also"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 62-63: further discussion here, or in the relevant sections of the specific genes identified in the referenced bulk RNA-seq project could help strengthen confidence in annotation

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 112: what bootstrapping strategy? Applied to what?

      This is a bootstrap sampling algorithm, developed in a recent bioRxiv preprint, that assesses the robustness of each cell cluster (Singh, P. & Zhai, Y. Deciphering Hematopoiesis at single cell level through the lens of reduced dimensions. bioRxiv, 2022.06.07.495099 (2022). https://doi.org/10.1101/2022.06.07.495099).

      Lines 127-129: What figures demonstrate the location of the inter lamina cells? Are there in situs that show this?

      We apologize for any errors; the referencing of figures in the manuscript has been revised for clarity.

      Lines 185-190: does literature support these as markers of SMCs? Are they known smooth muscle markers in other systems?

      We characterized the SMCs by the expression of LDL-associated protein, angiotensin-converting enzyme-like protein, and the "molecular spring" titin-like protein, all of which are commonly found in human vascular smooth muscle cells. Based on this analysis, we hypothesize that these cells belong to the smooth muscle cell category.

      Line 201: What is meant by "regulatory roles"?

      In this context, we are discussing the expression of genes encoding regulatory proteins, such as SOX transcription factors and secreted-frizzled proteins.

      Line 211: which markers disappeared? What in situs show this?

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 211: typo, "role" → "roll"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 214: what are these "hallmark genes"

      We apologize for the mistake; here we are referring to the genes listed in Figure 4B. We have revised the manuscript accordingly.

      Line 220: are there meristem-like cells in metazoans? If so, this would be preferable to a comparison with plants.

      In this context, we are discussing the morphological characteristics of gill proliferative cell populations found in filibranch bivalves. These populations, namely PEBZC, VEPC, and DEPC, consist of cells exhibiting morphological traits akin to those of plant cambial-zone meristem cells. These cells are typically small and round, with a high nucleus-to-cytoplasm ratio. We acknowledge that while these terms are utilized in bivalve studies (citations below), they lack the robust molecular evidence available in model systems. The present snRNA-seq data, however, may offer valuable cell markers for future comprehensive investigations.

      Leibson, N. L. & Movchan, O. T. Cambial zones in gills of Bivalvia. Mar. Biol. 31, 175-180 (1975). https://doi.org/10.1007/BF00391629

      Wentrup, C., Wendeberg, A., Schimak, M., Borowski, C. & Dubilier, N. Forever competent: deep-sea bivalves are colonized by their chemosynthetic symbionts throughout their lifetime. Environ. Microbiol. 16, 3699-3713 (2014). https://doi.org/10.1111/1462-2920.12597

      Cannuel, R., Beninger, P. G., McCombie, H. & Boudry, P. Gill Development and its functional and evolutionary implications in the blue mussel Mytilus edulis (Bivalvia: Mytilidae). Biol. Bull. 217, 173-188 (2009). https://doi.org/10.1086/BBLv217n2p173

      Line 335: what is slingshot trajectory analysis? Does this differ from the pseudotime analysis?

      Slingshot is an algorithm that uses the principal graph of the cells to infer trajectories. It models trajectories as curves on the principal graph, capturing the progression and transitions between different cellular states.

      Both Slingshot and pseudotime aim to infer cellular trajectories. Slingshot focuses on capturing branching patterns which is fully compatible with the graph generated using dimensionality reduction such as UMAP and PHATE, while pseudotime analysis aims to order cells along a continuous trajectory. It does not rely on dimensionality reduction graphs. We used both in the MS for different purposes.

      Line 241: introduce FISH methodology earlier in the paper, when in situ images are first referenced

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 246-249: can you quantify the decrease in signal or calculate the concentration of symbiotes in the cells? Was 5C imaged whole? This can impact the fluorescent intensity in tissues of different thicknesses.

      We appreciate your comment. In Figure 5C, most of the typical gill filament region is visible (the ventral tip of the gill filament, and the mid part of the gill filament) except for the dorsal end. The gill filament of bathymodioline mussels exhibits a simple structure: a single layer of bacteriocytes grow on the basal membrane. Consequently, the gill slices have a fairly uniform thickness (with two layers of bacteriocytes and one layer of interlamina cells in between), minimizing any potential impact on fluorescent intensity. As of now, detailed quantification of intracellular symbionts may necessitate continuous TEM or ultra-resolution confocal sections to 3D reconstruct the bacteriocytes, which may exceed the scope of the current study. Therefore, fluorescent intensity remains the only method available to us for estimating bacterial density/distribution across the gill filament.

      Line 249: What is meant by 'environmental gradient?'

      Here we are referring to the gases needed for the symbionts’ chemosynthesis. We have revised the manuscript to make this clear.

      Lines 255-256: Were the results shown in the TEM images previously known? Not clear what novel information is conveyed in images Fig 5 C and D

      In Fig. 5C and D, we present high-quality SEM and TEM images of a typical bacteriocyte, clearly showing its morphology and subcellular machinery. These electron microscopy images offer the audience a comprehensive introduction to the cellular function of bacteriocytes. Additionally, they serve as supportive evidence for the bacteriocytes' snRNA-seq data.

      Line 295-296: Can you elaborate on what types of solute carrier genes have been shown to be involved with symbioses?

      We appreciate the comment, and have revised the manuscript accordingly. The putative functions of the solute carriers can be found in Figure 5I.

      Line 297-301: Which genes from the bulk RNA-seq study? Adding more detail and references in cluster annotation would help readers better understand the justifications.

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 316 -322: Can you provide the values of the distances?

      We now provide the values in the main text, in addition to Fig. 6B, and in a supplementary table (Supplementary Table S19).

      Line 328: What are the gene expression patterns?

      We observed genes that are up- and down-regulated during starvation and reconstitution.

      Line 334-337: A visualization of the different expression levels of the specific genes in clusters between sites might be helpful to demonstrate the degree of difference between sites.

      We have prepared a new supplementary file showing the different expression levels.

      Line 337: Citation needed

      We appreciate the comment. Here, we hypothesize the cellular responses based on the genes’ functions and their expression patterns.

      Line 402-403: Cannot determine lineages from data presented. Need lineage tracing over time to determine this

      We acknowledge the necessity of conducting lineage tracing over time to validate this hypothesis. In practice, however, it is difficult to obtain deep-sea samples for such experiments. Shallow-water relatives might be more tractable for testing this hypothesis, although this too would be very challenging.

      413-414: What are the "cell-type specific responses to environmental change"? It could be interesting to present these results in the "results and discussion" section

      These results are shown in Supplementary Figure S8.

      Line 419-424: Sampling details might go better earlier on in the paper, when the sampling scheme is introduced.

      We appreciate the comments. Here, we are discussing the limitations of our current study, not sampling details.

      Line 552: What type of sequencing? Paired end? How long?

      We conducted 150bp paired-end sequencing.

      556-563: More detail here would be useful to readers not familiar with the BD guide. Also be careful to cite the software used in analysis!

      The provided guide and handbook elucidate the intricacies of gene name preparation, data alignment to the genome, and the generation of an expression matrix. It is worth mentioning that we relied upon outdated versions of the aforementioned resources during our data analysis phase, as they were the only ones accessible to us at the time. However, we have since become aware of a newer pipeline available this year, rendering the information presented here of limited significance to other researchers utilizing BD.

      Many thanks for the kind reminder. We have now included a reference for STAR. All other software was cited accordingly. To our knowledge, there is no scholarly publication describing the BD pipeline that we can cite.

      Line 577-578: How was the number of clusters determined? What is meant by "manually combine the clusters?" If cells were clustered by hand, more detail on the method is needed, as well as direct discussion and justification in the body of the paper.

      It would be more appropriate to emphasize the determination of cell types rather than clusters. The clusters were identified using a clustering function, as mentioned in the manuscript. It's important to note that the clustering function (in our case, the FindClusters function of Seurat) provides a general overview based on diffuse gene expression. Technically speaking, there is no guarantee that one cluster corresponds to a single cell type. Therefore, it is crucial to manually inspect the clustering results to assign clusters to the appropriate cell types. In some cases, multiple clusters may be assigned to the same cell type, while in other cases, a single cluster may need to be further subdivided into two or more cell types or sub-cell types, depending on the specific circumstances.

      For studies conducted on model species such as humans or mice, highly and specifically expressed genes within each cluster can be compared to known marker genes of cell types mentioned in previous publications, which generally suffices for annotation purposes. However, in the case of non-model species like Bathymodioline mussels, there is often limited information available about marker genes, making it challenging to confidently assign clusters to specific cell types. In such situations, in situ hybridisation proves to be incredibly valuable. In our study, WISH was employed to visualise the expression and morphology of marker genes within clusters. When WISH revealed the expression of marker genes from a cluster in a specific type of cell, we classified that cluster as a genuine cell type. Moreover, if WISH demonstrated uniform expression of marker genes from different clusters in the same cell, we assigned both clusters to the same cell type.

      We expanded the description of the strategy in the Method section.

      Line 690-692: When slices were used, what part of the gill were they taken from?

      We sectioned the gill around the mid part, which is representative of mature bacteriocytes.

      References to figures:

      General

      Please split the fluorescent images into different channels with an additional composite. It is difficult to see some of the expression patterns. It would also make it accessible to colorblind readers.

      We appreciate the comments and suggestions from the reviewer. We have converted our figures to CMYK colour, which should help colorblind readers.

      Please provide the number of replicates for each in situ and what proportion of those displayed the presented pattern.

      We appreciate the reviewer’s comments. We have explained this in the Materials and Methods section of the manuscript.

      Figure 2.C' is a fantastic summary and really helps the non-mussel audience understand the results. Adding schematics like this to Figures 3-5 would be helpful as well.

      We value the reviewer's comments. We propose that Figures 3K, 4C, and 5A-D could offer similar schematic explanations to assist the audience.

      Figure 2:

      Figures 2.C-F, 2.C', 2.H-J are not referenced in the text. Adding in discussions of them would help strengthen your discussions on the cluster annotation

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 2.B. 6 genes are highlighted in red and said to be shown in in situs, but only 5 are shown.

      We apologize for the mistake. We did not include the WISH result for 20639-0.0 in the present study. We have changed the label to black.

      Figure 3:

      Fig 2C-E not mentioned.

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 3.B 8 genes are highlighted in red and said to be shown in in situs. Only 6 are.

      The WISH results are provided in Supplementary Figures S4 and S5.

      Figure 3.K is not referenced in the legend.

      We appreciate the comment, and have revised the manuscript accordingly.

      Figure 4:

      In Figure 4D, it might be helpful to indicate the growth direction.

      We appreciate the comment, and have revised the manuscript accordingly by adding an arrow in panel D to indicate growth direction.

      4F: A double in situ with the symbiote marker is needed to demonstrate the nucleolin-like positive cells are symbiote free.

      We appreciate the comment. The symbiont-free region can be found in Figure 5A.

      Figure 5:

      In 5.A, quantification of symbiote concentration would help support your conclusion that they are denser around the edges.

      We appreciate the comment. As mentioned above, detailed quantification of intracellular symbionts may necessitate continuous TEM or ultra-resolution confocal sections to 3D reconstruct the bacteriocytes, which may exceed the scope of the current study. Therefore, fluorescent intensity remains the only method available to us for estimating bacterial density/distribution across the gill filament.

      In 5.D, the annotation is not clear. Adding arrows like in 5.C would be helpful.

      We appreciate the comment, and have revised the manuscript accordingly.

      A few genes in 5.F are not mentioned in the paper body when listing other genes. Mentioning them would help provide more support for your clustering.

      We appreciate the comment, and have revised the manuscript accordingly.

      Is 5.I meant to be color coded with the gene groups from 5.F? Color Coding the gene names, rather than organelles or cellular structures might portray this better and help visually strengthen the link between the diagram and your dot plot.

      We appreciate the suggestions. We've experimented with color-coding the gene names, but some colors are less discernible against a white background.

      Figure 6:

      6.B Is there a better way to visualize this data? The color coding is confusing given the pairwise distances. Maybe heatmaps?

      We attempted a heatmap, as shown in the figure below. However, all co-authors agree that a bar plot provides clearer visualization than the heatmap. We agree that the color scheme may be confusing because it reuses the colors assigned to the individual treatments, so we have changed the colors.

      Author response image 1.

      Figure 6.D: Why is the fanmao sample divided in the middle?

      Fig. 6C shows that single-cell trajectories include branches. The branches occur because cells execute alternative gene expression programs. Thus, in Fig. 6D, we show changes for genes that are significantly branch dependent in both lineages at the same time. Specifically, in cluster 2, the genes are upregulated during starvation but downregulated during reconstitution. Conversely, genes in cluster 1 are downregulated during starvation but upregulated during reconstitution. It is of note that Fig. 6D displays only a small subset of significantly branch-dependent genes.

      Figure 6.D: Can you visualize the expression in the same format as in figures 2-5?

      We appreciate the comments from the reviewer. As far as we know, this heatmap is the best format to demonstrate this type of gene expression profile.

      Supplementary Figure S2:

      Please provide a key for the cell type abbreviations

      We appreciate the comment, and have added the abbreviations of cell types accordingly.

      Supplementary Figures S4 and S5:

      What part of the larger images are the subsetted image taken from?

      We appreciate the comment. These images were taken from the ventral tip and mid part of the gill slices, respectively. We have revised the figure legends to make this clear.

      Supplemental Figure S7:

      If clusters 1 and 2 show genes up and downregulated during starvation, what do clusters 4 and 3 represent?

      Cluster 1: genes that are obviously upregulated during starvation and downregulated during reconstitution. Cluster 4: genes that are downregulated during reconstitution but not obviously upregulated during starvation.

      Cluster 2 shows genes upregulated during reconstitution, and cluster 3 shows genes obviously downregulated during starvation.

      Author response table 1.

      Supplemental Figure S8:

      This is a really interesting figure that I think shows some of the results really well! Maybe consider moving it to the main figures of the paper?

      We appreciate the comments and suggestions. We concur with the reviewer on the significance of the results presented. However, considering the length of this manuscript, we have prioritized the inclusion of the most pertinent information in the main figures. Supplementary materials containing additional figures and details on the genes involved in these pathways are provided for interested readers.

      Supplemental Figure S11:

      Switching the axes might make this image easier for the reader to interpret. Additionally, calculating the normalized contribution of each sample to each cluster could help quantify the extent to which bacteriocytes are reduced when starving.

      Thank you for the insightful suggestion, which we have implemented as detailed below. We acknowledge the importance of understanding the changes in bacteriocyte proportions across different treatments. However, it's crucial to note that the percentage of cells per treatment is highly influenced by factors such as the location of digestion and sequencing, as previously mentioned.

      Author response image 2.
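
      For readers who wish to reproduce this kind of normalization, a minimal Python sketch follows. It is one common way to compute a normalized contribution, not necessarily the calculation behind the image above: each sample's cell counts are first converted to within-sample fractions, and the fractions for a given cluster are then rescaled to sum to one across samples. The counts, cluster names, and function name are hypothetical and chosen only for illustration; they are not data from the study.

```python
# Hypothetical cell counts per cluster for each treatment.
# These numbers are illustrative only and are NOT the study's data.
counts = {
    "Fanmao": {"bacteriocyte": 400, "other": 600},
    "Starvation": {"bacteriocyte": 100, "other": 900},
    "Reconstitution": {"bacteriocyte": 300, "other": 1200},
}


def normalized_contribution(counts):
    """Normalize for sample size, then rescale each cluster across samples."""
    # Step 1: fraction of each sample's cells that fall in each cluster.
    fractions = {}
    for sample, clusters in counts.items():
        sample_total = sum(clusters.values())
        fractions[sample] = {c: n / sample_total for c, n in clusters.items()}

    # Step 2: for each cluster, rescale the fractions so they sum to 1.
    cluster_names = sorted({c for clusters in counts.values() for c in clusters})
    result = {}
    for cluster in cluster_names:
        total = sum(fractions[s].get(cluster, 0.0) for s in counts)
        result[cluster] = {
            s: fractions[s].get(cluster, 0.0) / total for s in counts
        }
    return result


contribution = normalized_contribution(counts)
for sample, share in contribution["bacteriocyte"].items():
    print(f"{sample}: {share:.3f}")
```

      Because step 1 divides by each sample's total, a sample with more sequenced cells does not automatically dominate a cluster. This does not remove the dissection and digestion effects noted above, which act on the within-sample fractions themselves.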

      Reviewer #2 (Recommendations For The Authors):

      The following are minor recommendations for the text and figures that may help with clarity:

      Fig. 3K: This figure describes water flow induced by different ciliary cells. It is not clear what the color of the arrows corresponds to, as they do not match the UMAP (i.e. the red arrow) and this is not indicated in the legend. Are these colours meant to indicate the different ciliary cell types? If so it would be helpful to include this in the legend.

      We appreciate the reviewer's comments and suggestions. The arrows indicate the water flow that might be driven by certain types of cilia. We have revised our figure and figure legends to make this clear.

      Line 369: The incorrect gene identifier is given for the mitochondrial trifunctional enzyme. This gene identifier is identical to the one given in line 366, which describes long-chain-fatty-acid-ligase ACSBG2-like (Bpl_scaf_28862-1.5).

      We appreciate the reviewer's comments and suggestions. We have revised our manuscript accordingly.

      Line 554: The Bioproject accession number (PRJNA779258) does not appear to lead to an existing page in any database.

      We appreciate the reviewer's comments and suggestions. We have released this Bioproject to the public.

      Line 597-598: it would be helpful to know the specific number of cells that the three sample types were downsampled to, and the number of cells remaining in each cluster, as this can affect the statistical interpretation of differential expression analyses.

      The number of cells per cluster in our analysis ranged from 766 to 14633. To mitigate potential bias introduced by varying cell numbers, we implemented downsampling, restricting the number of cells per cluster to no more than 3500. This ensured that cluster sizes differed by less than five-fold. We experimented with several downsampling strategies, exploring cell limits of 4500 and 2500, and consistently observed similar patterns across these variations.
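
      The capping step can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name, the random seed, and the toy labels are assumptions, and only the cap of 3500 and the extreme cluster sizes (766 and 14633) are taken from the numbers quoted above.

```python
import random
from collections import defaultdict


def downsample_per_cluster(labels, cap=3500, seed=0):
    """Return sorted cell indices, keeping at most `cap` cells per cluster."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_cluster = defaultdict(list)
    for index, label in enumerate(labels):
        by_cluster[label].append(index)

    kept = []
    for members in by_cluster.values():
        if len(members) > cap:
            # Sample without replacement from oversized clusters only.
            members = rng.sample(members, cap)
        kept.extend(members)
    return sorted(kept)


# Toy labels reproducing the smallest and largest cluster sizes quoted above.
labels = ["small"] * 766 + ["large"] * 14633
kept = downsample_per_cluster(labels, cap=3500)

sizes = {name: sum(1 for i in kept if labels[i] == name) for name in ("small", "large")}
ratio = max(sizes.values()) / min(sizes.values())
print(sizes, f"largest/smallest = {ratio:.2f}")
```

      With a cap of 3500 the smallest cluster is left untouched, and the ratio between the largest and smallest clusters falls below five, in line with the statement above.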

      Data and code availability:

      The supplementary tables and supplementary data S1 appear to be the final output of the differential expression analyses. Including the raw data (e.g. reads) and/or intermediate data objects (e.g. count matrices, R objects), in addition to the code used to perform the analyses, may be very helpful for replication and downstream use of this dataset. As mentioned above, the Bioproject accession number appears to be incorrect.

      We appreciate the reviewer's comments and suggestions. Regarding our sequencing data, we have deposited all relevant information with the National Center for Biotechnology Information (NCBI) under Bioproject PRJNA779258. Additionally, we have requested the release of the Bioproject. Furthermore, as part of this round of revision, we have included the count matrices for reference.

      Reviewer #3 (Recommendations For The Authors):

      As noted in the public review, my only major concerns are around the treatment of progenitor cell populations. I am sympathetic to the challenges of these experiments but suggest a few possible avenues to the authors.

      First, there could be some demonstration that these cells in G. platifrons are indeed proliferative, using EdU incorporation labeling or a conserved epitope such as the phosphorylation of serine 10 in histone 3. It appears in Mytilus galloprovincialis that proliferating cell nuclear antigen (PCNA) and phospho-histone H3 have previously been used as good markers for proliferative cells (Maiorova and Odintsova 2016). The use of any of these markers along with the cell type markers the authors recover for PEBZCs for example would greatly strengthen the argument that these are proliferative cells.

      If performing these experiments would not be currently possible, the authors could use some computational approaches to strengthen their arguments. Based on conserved cell cycle markers and the use of Cell-Cycle feature analysis in Seurat, could the authors provide evidence that these progenitors occupy the G2/M phase at a greater percentage than other cells? Other than the physical position of the cells, is there much that suggests that these are proliferative? While I am more convinced by markers in VEPCs, the markers for PEBZCs and DEPCs are not particularly compelling.

      While I do not think the major findings of the paper hinge on this, comments such as "the PBEZCs gave rise to new bacteriocytes that allowed symbiont colonization" should be taken with care. It is not clear that the PBEZCs are proliferative and there does not seem to be any direct evidence that PBEZCs (or DEPCs or VEPCS for that manner) are the progenitor cells through any sort of labeling or co-expression studies.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly. We especially appreciate the reviewer’s suggestions about the characterisation of the G. platifrons gill proliferative cell populations. In a separate research project, we have tested both cell division and cell proliferation markers on the proliferative cell populations. Though we are not able to include these results in the current manuscript, we are happy to share our preliminary results with the reviewer. Our results demonstrate that the proliferative cell populations, particularly the VEPCs, are positive for cell proliferation markers and contain a high number of mitotic cells.

      Author response image 3.

      Finally, there is a body of literature that has examined cell proliferation and zones of proliferation in mussels (such as Piquet, B., Lallier, F.H., André, C. et al. Regionalized cell proliferation in the symbiont-bearing gill of the hydrothermal vent mussel Bathymodiolus azoricus. Symbiosis 2020) or other organisms (such as Bird, A. M., von Dassow, G., & Maslakova, S. A. How the pilidium larva grows. EvoDevo. 2014) that could be discussed.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly (lines 226-229).

      Minor comments also include:

      Consider changing the orientation of diagrams in Figure 2C' in relationship to Figure 2C and 2D-K.

      We appreciate the comments and suggestions from the reviewer. Figure 2 has been reorganized.

      For the diagram in Figure 3K, please clarify if the arrows drawn for the direction of inter lamina water flow is based on gene expression, SEM, or some previous study.

      We are grateful for the reviewer's valuable feedback and suggestions. The arrows in the figure indicate the direction of water flow that could be affected by specific types of cilium. Our prediction is based on both gene expression and SEM results. To further clarify this point, we have revised the figure legend of Fig. 3.

      Please include a label for the clusters in Figure 5E for consistency.

      We have revised our Figure 5E to keep our figures consistent.

      Please include a note in the Materials and Methods for Monocle analysis in Figure 6.

      We conducted the Monocle analyses using Monocle 2 and Monocle 3 in the R environment. We have revised our Materials and Methods with further information on the analysis shown in Figure 6.

      In Supplement 2, the first column is labeled PEBC while the first row is labeled PEBZ versus all other rows and columns have corresponding names. I am guessing this is a typo and not different clusters?

      We appreciate the great effort of the reviewer in reviewing our manuscript. We have corrected the typo in the revised version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      1. The most important concern that I have refers to the FDTD simulations to characterize the ZMW, as shown in Appendix 2, Figure 4. So far, the explanations given in the caption of Figure 4 are confusing and misleading: the authors should provide more detailed explanations on how the simulations were performed and the actual definition of the parameters used. In particular:

      a. lines 1330-1332: it is not clear to me how the fluorescence lifetime can be calculated from the detected signal S (z), and why they are horizontal, i.e., no z dependence? Which lifetimes are the authors referring to?

      b. lines 1333-1335: Where do these values come from? And how do they relate to panels D & E? From what I can see in these panels the lifetimes are highly dependent on z and show the expected reduction of lifetime inside the nanostructures.

      c. lines 1336-1337: Why the quantum yield of the dyes outside the ZMW differs from those reported in the literature? In particular the changes of quantum yield and lifetime for Alexa 488 are very large (also mentioned in the corresponding part of Materials & Methods but not explained in any detail).

      We thank the Reviewer for his detailed questions on the FDTD simulations. We have now added the missing equation related to the computation of signal-averaged fluorescence lifetimes from the FDTD simulations. Specifically to the three points raised:

      a) The fluorescence lifetime is indeed not calculated from the detected signal S(z), but from the radiative and non-radiative rates in the presence of the ZMW as given in eq. 9-10. However, we use the detected signal S(z) to compute the average fluorescence lifetime over the whole z-profile of the simulation box, which we relate to the experimentally measured fluorescence lifetimes as given in Appendix 7, Figure 1. We have now added the equation to compute the signal-weighted fluorescence lifetimes, which we denote as <𝜏>S , in eq. 13 in the methods. To clarify this point, we have added the symbol <𝜏>S to the plots in Appendix 2, Figure 4 D-E and Appendix 7, Figure 1 C-D.

      b) The estimated lifetimes were obtained as the signal-weighted average over the lifetime profiles, (<𝜏>S) as given in the new eq. 13. All plotted quantities, i.e., the detection efficiency η, quantum yield ϕ, detected signal S(z), and fluorescence lifetime, are computed from the radiative and loss rates obtained from the FDTD simulation according to eqs. 8-11. To make this clearer, we have now added the new Appendix 2 – Figure 5 which shows the z-profiles of the quantities (radiative and loss rates) used to derive the experimental observables.

      c) There are multiple reasons for the differences between the quantum yields of the two analytes used in this study and the literature values. For cyanine dyes such as Alexa647, it is well known that steric restriction (as e.g. caused by conjugation to a biomolecule) can lead to an increase of the quantum yield and fluorescence lifetime. We observe a minor increase of the fluorescence lifetime for Alexa647 from the literature value of 1.17 ns to a value of 1.37 ns when attached to Kap95, which is indicative of this effect. In the submitted manuscript, this was discussed in the methods in lines 936-938 (lines 938-945 in the revised manuscript). For the dye Alexa488, which is used to label the BSA protein, this effect is absent. Instead, we observe (as the Reviewer correctly notes) a quite drastic reduction of the fluorescence lifetime compared to the unconjugated dye, from 4 ns to 2.3 ns. In cases where a single cysteine is labeled on a protein, such a drastic reduction of the quantum yield usually indicates the presence of a quenching moiety in proximity of the labeling site, such as tryptophan, which acts via the photo-induced electron transfer mechanism. Indeed, BSA contains two tryptophans that could be responsible for the low quantum yield of the conjugated dyes. The situation is complicated by the fact that BSA contains 35 cysteines that can potentially be labeled (although 34 are involved in disulfide bridges). The labeled BSA was obtained commercially and the manufacturer lists the degree of labeling as ~6 dye molecules per protein, with a relative quantum yield of 0.2 compared to the standard fluorescein. This corresponds to an absolute quantum yield of ~0.16, which is low compared to the literature value for Alexa488 of ~0.8.

      Based on the measured fluorescence lifetime, we estimate a quantum yield of 0.46, which is higher than the photometrically obtained value of 0.16 reported by the manufacturer. Fully quenched, nonfluorescent dyes will not contribute to the lifetime measurement but are detected in the photometric quantum yield estimates. The difference between the lifetime-based and photometric quantum yield estimates thus suggests that a fraction of the fluorophores is almost fully quenched. While it is unknown where the dyes are attached to the protein, the low quantum yield could be indicative of dye-dye interactions via π-π stacking, which can often lead to non-fluorescent dimers. This is supported by the fact that the manufacturer reports color differences between batches of labeled protein, which indicate spectral shifts of the absorption spectrum when dye-dye adducts are formed by π-π stacking. We have now added a short discussion of this effect in lines 938-941. We note that the conclusions drawn on the quenching effect of the metal nanostructure remain valid despite the drastic reduction of the quantum yield for Alexa488, which leads to a further quantum yield reduction of the partly quenched reference state.
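
      The lifetime-based estimate quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming that the radiative rate of the dye is unchanged by conjugation, so that the quantum yield scales with the fluorescence lifetime; the function name and the two-population "dark fraction" picture (which additionally assumes that dark and emitting dyes absorb equally) are illustrative assumptions and are not taken from the manuscript.

```python
def quantum_yield_from_lifetime(tau_ns, tau_ref_ns, phi_ref):
    """Scale a reference quantum yield by the lifetime ratio.

    Assumes the radiative rate is unchanged by conjugation, so that the
    quantum yield is proportional to the fluorescence lifetime.
    """
    if tau_ref_ns <= 0:
        raise ValueError("reference lifetime must be positive")
    return phi_ref * tau_ns / tau_ref_ns


# Values quoted in the response: free Alexa488 (~4 ns, quantum yield ~0.8)
# and BSA-conjugated Alexa488 (2.3 ns).
phi_lifetime = quantum_yield_from_lifetime(2.3, 4.0, 0.8)

# Photometric (absolute) value derived from the manufacturer's specification.
phi_photometric = 0.16

# Hypothetical two-population picture (an assumption, not from the source):
# a dark fraction with zero yield plus an emitting fraction with phi_lifetime.
dark_fraction = 1.0 - phi_photometric / phi_lifetime

print(f"lifetime-based quantum yield: {phi_lifetime:.2f}")
print(f"implied dark fraction: {dark_fraction:.2f}")
```

      Under these assumptions the lifetime ratio gives the quoted value of 0.46, and the gap to the photometric value would correspond to roughly two thirds of the dyes being dark.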

      2) A second important concern refers to Figure 3: Why is there so much variability in the burst intensities reported in panels C, D? They should correspond to single molecule translocation events and thus all have comparable intensity values. In particular, the data shown for BSA in panel D is highly puzzling, since it not only reflects a reduced number of bursts (which is the main finding) but also very low intensity values, suggesting a high degree of quenching of the fluorophore being proximal to the metal on the exit side of the pore. In fact, the count rates for BSA on the uncoated pore range from 50-100 kcounts/s, while on the coated pores they barely reach 30 kcounts/s, a clear indication of quenching. Importantly, and in direct relation to this, could the authors exclude the possibility that the low event rates measured on BSA are largely due to quenching of the dye by getting entangled in the Nsp mesh just underneath the pore but in close contact to the metal?

      The Reviewer raises a valid concern, but further analysis shows that this is unproblematic. Notably, the burst intensities are in fact not reduced, in contrast to the visual impression obtained from the time traces shown in the figure. The time trace of the BSA intensity is visually dominated by high-intensity bursts which mask the low-intensity bursts in the plot. In contrast, in Figure 3 the reduced number of BSA events results in a sparser distribution of the intensity spikes, which allows low-intensity events to be seen. Contrary to the visual impression, the spike-detection algorithm does not exhibit any bias in terms of the duration or the number of photons of the detected events between the different conditions for both BSA and Kap95, as shown in the new Appendix 7 – Figure 1. FCS analysis can be used to test whether the event duration varies between the different conditions shown in Figure 3 C-D. This analysis did not show a significant difference in the estimated diffusion time for BSA (Appendix 7 – Figure 1 C,D). Contrary to the suggestion of the Reviewer, we also do not observe any indication of quenching by the metal between uncoated and Nsp1-coated pores for BSA. Such quenching should result in differences in the fluorescence lifetimes, which, however, are not evident in our experimental data (Appendix 7 – Figure 1 F).

      3) Line 91: I suggest the authors remove the word "multiplexed" detection since it is misleading. Essentially the authors report on a two-color excitation/detection scheme which is far from being really multiplexing.

      We have changed the word to “simultaneous” now and hope this avoids further confusion.

      4) Line 121: why are the ZMW fabricated with palladium? Aluminum is the gold-standard to reduce light transmissivity. An explanation for the choice of this material would be appreciated by the community.

      In a previous study (Klughammer and Dekker, Nanotechnology, 2021), we established that palladium can have distinct advantages compared to other ZMW metals such as aluminum and gold, most prominently, an increased chemical stability and reduced photoluminescence. For this study, we chose palladium over aluminum as it allowed the use of simple thiol chemistry for surface modification. In the beginning of the project, we experimented with aluminum pores as well. We consistently found that the pores got closed after measuring their ionic conductance in chlorine-containing solutions such as KCl or PBS. This problem was avoided by choosing palladium.

      5) Lines 281-282: This statement is somewhat misleading, since it reads such that the molecules stay longer inside the pore. However, if I understand correctly, these results suggest that Kap95 stays closer to the metal on the exit side. This is because measurements are being performed on the exit side of the pore as the excitation field inside the pore is quite negligible.

      We thank the Reviewer for this comment and have clarified the text in lines 290-292 as suggested to: “(…) this indicates that, on the exit side, Kap95 diffuses closer to the pore walls compared to BSA due to interactions with the Nsp1 mesh”

      6) Lines 319-320: Although the MD simulations agree with the statement being written here, the variability could be also due to the fact that the proteins could interact in a rather heterogenous manner with the Nsp mesh on the exit side of the pore, transiently trapping molecules that then would stay longer and/or closer to the metal altering the emission rate of the fluorophores. Could the authors comment on this?

      The variation mentioned in the text refers to a pore-to-pore variation and thus needs to be due to a structural difference between individual pores. This effect would also need to be stable for the full course of an experiment, typically hours. We did not find any such differences in the fluorescence lifetimes measured on individual pores as suggested by the Reviewer. We think that the suggested mechanism would show up as distinct clusters in Appendix 7 – Figure 1 E,F, where we found no trace of such a change. If we understand correctly, the Reviewer suggests a mechanism, not based on changes in the Nup layer density, that would lead to a varying amount of trapping of proteins close to the surface. Such a behavior should show up in the diffusion time of each pore (Appendix 7 – Figure 1 C,D), where, however, we find no trace of such an effect.

      7) Lines 493-498: These claims are actually not supported by the experimental data shown in this contribution: a) No direct comparison in terms of signal-to-noise ratio between fluorescence-based and conductance-based readouts has been provided in the ms. b) I would change the word multiplexed by simultaneous since it is highly misleading. c) The results shown are performed sequentially and thus low throughput. d) Finally, the use of unlabeled components is dubious since the detection schemes relies on fluorescence and thus requiring labeling.

      We thank the Reviewer for pointing this out.

      a) We have now added a section in appendix 3 that discusses the signal-to-noise ratios. In brief, there are three observations that led us to conclude that ZMWs provide beneficial capabilities to resolve individual events from the background:

      1. The signal-to-background ratio was determined to be 67±53 for our ZMW data of Kap95 which is an order of magnitude higher compared to the ~5.6 value for a conductance-based readout.

      2. The detection efficiency for ZMWs is independent of the Kap95 occupancy within the pore. This is different from conductance based approaches that have reduced capability to resolve individual Kap95 translocations at high concentrations.

      3. The fraction of detected translocations is much higher for ZMWs than for conductance-based data (where lots of translocations occur undetected) and matches closer to the theoretical predictions.

      b) We have changed the wording accordingly.

      c) We agree with the Reviewer that our method is still low throughput. However, the throughput is markedly increased compared to previous conductance-based nanopore measurements. This is because we can test many (here up to 8, but potentially many more) pores per chip in one experiment, whereas conductance-based readouts are limited to a single pore. We have now changed the wording to “increased throughput” in line 507 to avoid confusion.

      d) We agree that only labeled components can be studied directly with our methods. However, the effect of unlabeled analytes can be assessed indirectly without any perturbation of the detection scheme due to the specificity of the fluorescent labeling. This is distinct from previous nanopore approaches using a conductance-based readout that lack specificity. In our study, we have for example used this advantage of our approach to access event rates at high concentrations (1000nM Kap95, 500nM BSA) and large pore diameters by reducing the fraction of labeled analyte in the sample. Finally, the dependence of the BSA leakage rate as a function of the concentration of Kap95 (Figure 6) relies on a specific readout of BSA events in the presence of large amounts of Kap95, which would be impossible in conductance-based experiments.

      8) Line 769: specify the NA of the objective. Using a very long working distance would also affect the detection efficiency. Have the authors considered the NA of the objective on the simulations of the detection efficiency? This information should be included and it is important as the authors are detecting single molecule events.

      We used an NA of 1.1 for the simulation of the Gaussian excitation field in the FDTD simulations, corresponding to the NA of the objective lens used in the experiments and as specified in the methods. The Reviewer is correct that the NA also affects the absolute detection efficiency of the fluorescence signal due to the finite opening angle of the collection cone of ~56˚. In our evaluation of the simulations, we have neglected this effect for simplicity, because the finite collection efficiency of the objective lens represents only an additional constant factor that does not depend on the parameters of the simulated system, such as the pore diameter. Instead, we focused solely on the effect of the ZMW and defined the detection efficiency purely based on the fraction of the signal that is emitted towards the detection side and can potentially be detected in the experiment, which also provides the benefit that the discussed numbers are independent of the experimental setup used.

      We have now clarified this in the methods text on lines 917-920.
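
      The ~56˚ collection angle follows from the numerical aperture through NA = n·sin(θ). The sketch below assumes a water-immersion objective with n = 1.33, which is not stated in this response; the isotropic-emitter fraction is included only to illustrate the size of the constant factor that is neglected, and ignores any redirection of the emission by the metal.

```python
import math


def collection_half_angle_deg(numerical_aperture, refractive_index):
    """Half-angle of the collection cone from NA = n * sin(theta)."""
    ratio = numerical_aperture / refractive_index
    if not 0.0 < ratio <= 1.0:
        raise ValueError("NA must be positive and not exceed the refractive index")
    return math.degrees(math.asin(ratio))


def isotropic_collection_fraction(half_angle_deg):
    """Fraction of the full sphere covered by a cone with the given half-angle."""
    return (1.0 - math.cos(math.radians(half_angle_deg))) / 2.0


# NA = 1.1 as quoted in the response; n = 1.33 (water) is an assumption.
half_angle = collection_half_angle_deg(1.1, 1.33)
fraction = isotropic_collection_fraction(half_angle)

print(f"collection half-angle: {half_angle:.1f} degrees")
print(f"solid-angle fraction for an isotropic emitter: {fraction:.2f}")
```

      Because this factor depends only on the objective, it rescales all simulated detection efficiencies equally and leaves comparisons between pore diameters unchanged.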

      9) Line 831: I guess that 1160ps is a mistake, right?

      This is not a mistake. We performed a tail fit of the fluorescence decay curves, meaning that the initial rise of the decay was excluded from the fit. The initial part of the fluorescence decay is dominated by the instrument response function (IRF) of the system, with an approximate width of ~500 ps. To minimize the influence of the IRF on the tail fit, we excluded the first ~1 ns of the fluorescence decay.
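
      The idea of a tail fit can be sketched as follows. This is a simplified illustration, assuming a mono-exponential decay and a log-linear least-squares fit; the fit model and estimator used in the manuscript are not specified in this response, and the synthetic decay, the crude IRF-like rise, and the function name are hypothetical. The 1.16 ns start time is the value queried by the Reviewer.

```python
import math


def tail_fit_lifetime(times_ns, counts, t_start_ns):
    """Estimate a mono-exponential lifetime from bins with t >= t_start_ns.

    Fits ln(counts) = a - t / tau by ordinary least squares, ignoring the
    early part of the decay that is dominated by the instrument response.
    """
    points = [(t, math.log(c)) for t, c in zip(times_ns, counts)
              if t >= t_start_ns and c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two bins in the tail")
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    # slope = sxy / sxx = -1 / tau
    return -sxx / sxy


# Synthetic, noise-free decay with tau = 2.3 ns; the first 0.5 ns are
# distorted by a linear rise to mimic the influence of the IRF.
times = [0.05 * i for i in range(200)]
counts = [1000.0 * math.exp(-t / 2.3) * min(1.0, t / 0.5) for t in times]

tau = tail_fit_lifetime(times, counts, t_start_ns=1.16)
print(f"tail-fit lifetime: {tau:.2f} ns")
```

      Starting the fit after the distorted region recovers the input lifetime in this idealized case; with real data the start time trades IRF bias against photon statistics.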

      10) Lines 913-917: Why are the quantum yield of Alexa 488 and lifetime so much reduced as compared to the published values in literature?

      See answer to point 1. We have added a short discussion at lines 938-941 where we speculate that the reduced quantum yield is most likely caused by dye-dye interactions due to the high degree of labeling of ~6 dyes per protein.

      11) Lines 1503-1509: The predicted lifetimes with the Nsp-1 coating have not been shown in Appendix 2 - Figure 4. How have they been estimated?

      We have not performed predictions of fluorescence lifetimes in the presence of an Nsp1 coating. Predictions of the fluorescence lifetime in the absence of the Nsp1 coating were obtained by assuming a uniform occupancy of the molecules over the simulation box. A prediction of the fluorescence lifetimes in the presence of the Nsp1 coating would require a precise knowledge of the spatial distribution of analytes, which depends, among other factors, on the extension of the Nsp1 brushes and the interaction strengths with the FG repeats. While simulations provide some insights on this, we consider a quantitative comparison of predicted and measured fluorescence lifetimes in the presence of the Nsp1 coating beyond the scope of the present study.
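
      For the uncoated case, the signal-weighted average described in the answer to point 1 can be sketched as below. This assumes a uniform z grid and uniform occupancy, so that each position is weighted only by its detected signal S(z); the exact form of eq. 13 is not reproduced in this response, and the profiles used here are hypothetical numbers for illustration.

```python
def signal_weighted_lifetime(signal, lifetime):
    """Return sum(S_i * tau_i) / sum(S_i) over positions on a uniform z grid."""
    if len(signal) != len(lifetime):
        raise ValueError("profiles must have the same length")
    total = sum(signal)
    if total <= 0:
        raise ValueError("total signal must be positive")
    return sum(s * t for s, t in zip(signal, lifetime)) / total


# Hypothetical profiles: positions inside the pore contribute little signal
# and have short lifetimes; positions outside dominate the average.
signal = [0.0, 1.0, 3.0]
lifetime_ns = [1.0, 2.0, 4.0]

average = signal_weighted_lifetime(signal, lifetime_ns)
print(f"signal-weighted lifetime: {average} ns")
```

      A non-uniform analyte distribution, as expected in the presence of the Nsp1 coating, would enter as an additional position-dependent weight, which is the quantity the authors state is not known precisely.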

      12) Lines 1534-1539: I disagree with this comment, since the measurements reported here have been performed outside the nano-holes, and thus the argument of Kap95 translocating along the edges of the pore and being responsible for the reduced lifetime does not make sense to me.

      In accordance with our answer to point 5 above, we have now changed the interpretation to the proximity of Kap95 to the metal surface on the exit side, rather than speculating on the path that the protein takes through the pore (lines 1662-1664), as follows:

      “This indicates that, in the presence of Nsp1, Kap95 molecules diffuse closer to or spend more time in proximity of the metal nanoaperture on the exit side.”

      Reviewer #2:

      (Numbers indicate the line number.)

      48: should cite more recent work: Timney et al. 2016, Popken et al. 2015

      59: should cite Zilman et al 2007, Zilman et al 2010

      62: should cite Zilman et al 2010

      We thank the Reviewer for the suggestions and have added them to the manuscript now.

      65: one should be careful in making statements that the "slow" phase is immobile, as it likely rapidly exchanging NTRs with the "fast" phase.

      We have removed this description and replaced it by “This 'slow phase' exhibits a reduced mobility due to the high affinity of NTRs to the FG-Nup mesh.” to avoid misunderstanding.

      67: Schleicher 2014 does not provide evidence of dedicated channels

      We agree with the Reviewer and therefore moved the reference to an earlier position in the sentence.

      74-75: must cite work by Lusk & Lin et al on origami nanochannels

      We thank the Reviewer for this suggestion. We have now added a reference to the nanotraps of Shen et al. 2021, JACS, in line 75. In addition, we now also refer to Shen et al. 2023, NSMB, in the discussion where viral transport is discussed.

      77: Probably Jovanovic- Talisman (2009)?

      We thank the Reviewer for pointing out this typo.

      93: should cite Auger & Montel et al., PRL 2014

      We thank the Reviewer for pointing out this reference. To give proper credit to previous ZMW work, we have now incorporated a sentence in lines 100-102 citing this reference.

      111-112: there appears to be some internal inconsistency between this interpretation and the BSA transport mostly taking place through the "central hole" (as seems to be implied by Equation (3)). Probably it should be specified explicitly that the "central hole" in large channels is a "void".

      We thank the Reviewer for this suggestion and have added a clarifying sentence.

      115-177: This competition was studied in Jovanovic-Talisman 2009 and theoretically analysed in Zilman et al Plos Comp Biol 2010. The differences in the results and the interpretation should be discussed.

      We agree. This is discussed in the discussion section (around line 594), and we have now added the reference to Zilman et al.

      Figure 2 Caption: "A constant flow..." - is it clear that this flow does not generate hydrodynamic flow through the pore?

      The Reviewer raises an important point. Indeed, the pressure difference over the membrane generates a hydrodynamic flow through the pore that leads to a reduction of the event rate compared to when no pressure is applied. However, as all experiments were performed under identical pressures, one can expect a proportional reduction of the absolute event rates due to the hydrodynamic flow against the concentration gradient. In other words, this will not affect the conclusions drawn on the selectivity, as it is defined as a ratio of event rates.

      We have now added additional data on the influence of the hydrodynamic flow on the translocation rate in Appendix 3 – Figure 2, where we have measured the signal of free fluorophores at high concentration on the exit side of the pore as a function of the applied pressure. The data show a linear dependence of the signal reduction on the applied pressure. At the pressure of 50 mbar used for the experiments, we see a ~5% reduction compared to the absence of pressure, implying that the reported absolute event rates are underestimated by only ~5%. Additionally, we have added such data for Kap95 translocations, which show a similar (although less consistent) effect. Measuring the event rate at zero flow is difficult, since this leads to an accumulation of fluorophores on the detection side.
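
      The argument that the flow changes absolute rates but not the selectivity can be illustrated with a short calculation. This is a sketch under the assumptions stated above: a reduction that is linear in pressure (~5% at 50 mbar) and identical for both analytes; the function name and the example rates are hypothetical.

```python
def flow_corrected_rate(measured_rate_hz, pressure_mbar,
                        reduction_per_mbar=0.05 / 50.0):
    """Estimate the zero-pressure event rate from a rate measured under flow.

    Assumes the relative reduction grows linearly with pressure, using the
    ~5% reduction at 50 mbar quoted in the response as the default slope.
    """
    reduction = reduction_per_mbar * pressure_mbar
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must lie in [0, 1)")
    return measured_rate_hz / (1.0 - reduction)


# Hypothetical measured event rates at 50 mbar.
kap95_measured = 9.5
bsa_measured = 1.9

kap95_corrected = flow_corrected_rate(kap95_measured, 50.0)
bsa_corrected = flow_corrected_rate(bsa_measured, 50.0)

# The selectivity is a ratio of rates, so the common factor cancels.
selectivity_measured = kap95_measured / bsa_measured
selectivity_corrected = kap95_corrected / bsa_corrected
print(kap95_corrected, selectivity_measured, selectivity_corrected)
```

      The cancellation holds only if both proteins experience the same relative reduction, which the Kap95 data mentioned above support less consistently than the free-fluorophore data.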

      Figure 3: it would help to add how long is each translocation, and what is the lower detection limit. A short explanation of why the method detects actual translocations would be good

      With our method, unfortunately, we cannot assess the duration of a translocation event since we only see the particle as it exits the pore. Instead, the measured event duration is determined by the time it takes for the particle to diffuse out of the laser focus. This is confirmed by FCS analysis of translocation events that show the same order of magnitude of diffusion times as for free diffusion (Appendix 7 – Figure 1 C,D), in contrast to a massively reduced diffusion time within a nanopore. In Figure 2D we show the detection efficiency at different locations around the ZMW as obtained from FDTD simulations and discuss the light blocking. This clearly shows that the vast majority of the fluorescence signal comes from the laser-illuminated side and therefore only particles that translocated through the ZMW are detected, as presented between lines 170-190. In Yang et al. 2023, bioRxiv (https://doi.org/10.1101/2023.06.26.546504) a more detailed discussion about the optical properties of Pd nanopores is given.

      This point also explains why we see actual translocations: since the light is blocked by the ZMW, fluorophores can only be detected after they have translocated. On parts of the membrane without pores, and upstream, the number of spikes found in a time trace was negligibly small. Additionally, if a significant part of the signal were contributed by fluorescence leaking from the dark top side, there should be no difference in the BSA event rate between small open and Nsp1-coated pores, which is contrary to what we observed.

      With respect to the lower detection limit for events: In the burst search algorithm we require a false positive rate of lower than 1 event in 100. Additionally, as described in Klughammer and Dekker, Nanotechnology (2021), we apply an empirical filtering to remove low signal-to-noise ratio events that contain fewer than 5 detected photons per event or a too low event rate. The event detection algorithm sets no lower limit on the duration of an event. Such a limit is instead set by the instrument and the maximum frequency at which it can detect photons. This time is below 1 μs. Practically, we don’t find events shorter than 10 μs, as can be seen in the distribution of events, where the detection limits can also be estimated (Appendix 7 – Figure 1 A and B).
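
      The empirical filtering step can be sketched as a simple post-selection on detected bursts. This is a minimal illustration, not the published burst search: events are represented as hypothetical (photon count, duration in ms) pairs, the 5-photon threshold is taken from the text, and the rate threshold is left as an optional parameter because its value is not given in this response.

```python
def filter_events(events, min_photons=5, min_rate_khz=None):
    """Keep events with enough photons and, optionally, a high enough rate.

    events: iterable of (n_photons, duration_ms) pairs.
    min_rate_khz: optional threshold on photons per millisecond (kHz);
    None disables the rate criterion.
    """
    kept = []
    for n_photons, duration_ms in events:
        if n_photons < min_photons:
            continue
        if min_rate_khz is not None and n_photons / duration_ms < min_rate_khz:
            continue
        kept.append((n_photons, duration_ms))
    return kept


# Hypothetical detected bursts: (photons, duration in ms).
bursts = [(3, 0.05), (12, 0.2), (5, 1.0)]

by_photons = filter_events(bursts)
by_photons_and_rate = filter_events(bursts, min_rate_khz=10.0)
print(by_photons)
print(by_photons_and_rate)
```

      Note that neither criterion imposes a minimum duration, consistent with the statement that the shortest detectable event is limited by the instrument rather than by the algorithm.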

      Equation (1): this is true only for passive diffusion without interactions (see eg Hoogenboom et al Physics Reports 2021 for review). Using it for pores with interactions would predict, for instance, that the inhibition of the BSA translocation comes from the decrease in D which is not correct.

      We agree with the Reviewer that this equation would not reproduce the measured data in a numerically correct way. We included it to justify why we subsequently fit a quadratic function to the data. As we write in line 260 we only used the quadratic equation “as a guide to the eye and for numerical comparison” and specifically don’t claim that this fully describes the translocation process. In this quadratic function, we introduced a scaling factor α that can be fitted to the data and thus incorporates deviations from the model. In appendix 5 we added a more elaborate way to fit the data including a confinement-based reduction of the diffusion coefficient (although not incorporating interactions). Given the variations of the measured translocation rates, the data is equally well described by both the simple and the more complex model function.

      Equation (1): This is not entirely exact, because the concentration at the entrance to the pore is lower than the bulk concentration, which might introduce corrections

      We agree with the Reviewer and have added that the concentration difference Δc is measured at the pore entrance and exit, and this may be lower than the bulk concentration. As described in our reaction to the Reviewer’s previous comment, equation (1) only serves as a justification to use the quadratic dependence and any deviations in Δc are absorbed into the prefactor α in equation (2).

      Equation (3): I don't understand how this is consistent with the further discussion of BSA translocation. Clearly BSA can translocate through the pore even if the cross-section is covered by the FG nups (through the "voids" presumably?).

      The Reviewer raises an important point here. Equation 3 can only be used for a pore radius r > r_prot + b. b was determined to be 11.5 nm and r_prot is 3.4 nm for BSA, so r must exceed ~15 nm. We would like to stress, however, that b does not directly give the height of a rigid Nsp1 ring but is related to the configuration of the Nsp1 inside the pore. Equation (3) (and equation (2)) were chosen because even these simple equations could fit the experimentally measured translocation rates well, and not because they would accurately model the setup in the pore. As we found from the simulations, the BSA translocations at low pore diameters presumably happen through transient openings of the mesh. The dynamics leading to the stochastic opening of voids on average leads to the observed translocation rate.
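
      The validity range quoted above can be checked numerically. The sketch below assumes, for illustration only, that equation (3) has the quadratic form α·(r − r_prot − b)², in analogy with the quadratic fit function described for equation (2); the actual equation is not reproduced in this response, and the function names and the value of α are hypothetical.

```python
def min_valid_radius_nm(r_prot_nm, b_nm):
    """Smallest pore radius for which the reduced-radius model is defined."""
    return r_prot_nm + b_nm


def translocation_rate_sketch(r_nm, r_prot_nm, b_nm, alpha):
    """Assumed quadratic dependence on the effective open radius.

    This form is an assumption made for illustration; it is only meaningful
    for r > r_prot + b.
    """
    effective = r_nm - r_prot_nm - b_nm
    if effective <= 0:
        raise ValueError("pore radius is outside the validity range")
    return alpha * effective ** 2


# Values quoted in the response for BSA.
r_min = min_valid_radius_nm(3.4, 11.5)
print(f"model defined for r > {r_min:.1f} nm")

# Hypothetical prefactor; a 30 nm radius lies inside the validity range.
rate = translocation_rate_sketch(30.0, 3.4, 11.5, alpha=1.0)
print(rate)
```

      Below this radius the model predicts no transport at all, which is why the residual BSA translocations through small pores are attributed to transient voids in the mesh rather than to a permanent open channel.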

      296-297: is it also consistent with the simulations?

      We compare the experimental and simulated b values in lines 387-388 and obtained b = 9.9 ± 0.1 nm from the simulations (as obtained from fitting the translocation rates and not from measuring the extension of the Nsp1 molecules) and 11.5 ± 0.4 nm from the experiments, which we find to be in good agreement.

      331: has it been established that the FG nups equilibrate on the microsecond scale?

      As an example, we have analyzed the simulation trajectory of the most dense nanopore (diameter = 40 nm, grafting = 1/200 nm²). In Author response image 1 we show for each of the Nsp1-proteins how the radius of gyration (Rg) changes in time over the full trajectory (2 μs + 5 μs). As expected, the Rg values reached the average equilibrium values very well within 2 μs simulation time, showing that the FG-Nups indeed equilibrate on the (sub)microsecond scale.

      Author response image 1.

      334-347: the details of the method should be explained explicitly in the supplementary (how exactly voids distributions are estimated and the PMF are calculated etc)

      The void analysis was performed with the software obtained from the paper of Winogradoff et al. In our Methods we provide an overview of how this software calculates the void probability maps and how these are converted into PMFs. For a more detailed description of how exactly the analysis algorithm is implemented in the software, we refer the reader to the original work. The analysis codes with the input files that were used in this manuscript have been made public ( https://doi.org/10.4121/22059227.v1 ) along with the manuscript.

      Equation (4) is only an approximation (which works fine for high barriers but not the low ones). Please provide citations/derivation.

      To our knowledge, the Arrhenius relation is a valid approximation for our nanopore simulations. We are not aware that it breaks down for low barriers and could not find mention of this in the literature. It would be helpful if the Reviewer could point us to relevant literature.

      Figure 4: how was transport rate for Kaps calculated?

      As mentioned in lines 388-391, we assumed that the Kap95 translocation rate through Nsp1-coated pores is equal to that for open pores, as we did not observe any significant hindrance of Kap95 translocation by the Nsp1 mesh in the experiment (Figure 4 A,C).

      378: It's a bit strange to present the selectivity ratio as prediction of the model when only BSA translocation rate was simulated (indirectly).

      We agree with the Reviewer that ideally we should also simulate the Kap95 translocation rate to obtain an accurate selectivity measure of the simulated nanopores. However, as the experiments showed very similar Kap95 translocation rates for open pores and Nsp1-coated pores, we believe it is reasonable to take the Kap95 rates for open and Nsp1-pores to be equal.

      Figure 5C and lines 397: I am a bit confused how is this consistent with Figure 4D?

      Figure 5C and figure 4D both display the same experimental data, where 4D only focuses on a low diameter regime. In relation to line 397 (now 407), the Nsp1 mesh within the 60-nm pore dynamically switches between closed configurations and configurations with an open channel. When taking the temporal average of these configurations, we find that the translocation rate is higher than for a closed pore but lower than for a fully open pore. The stochastic opening and closing of the Nup mesh results in the continuous increase of the translocation rates with increasing diameter, which is in contrast to a step-wise increase that would be expected from an instantaneous collapse of the Nsp1 mesh at a certain pore diameter.

      428-439: Please discuss the differences from Jovanovic-Talisman 2009.

      How our results for a Kap95 induced change of the BSA translocation rate are related to previous literature is discussed extensively in the lines 598-620.

      440: How many Kaps are in the pore at different concentrations?

      This is a very interesting question that we were, unfortunately, not able to answer within the scope of this project. With our fluorescence-based methods we could not determine this number because the excitation light does not reach well into the nanopore.

      In our previous work on Nsp1-coated SiN nanopores using conductance measurements, we quantified the drop in conductance at increasing concentrations of Kap95 (Fragasso et al., 2023, NanoResearch, http://dx.doi.org/10.1007/s12274-022-4647-1). From this, we estimated that on average ~20 Kap95 molecules are present in a pore with a diameter of 55 nm at a bulk concentration of 2 µM. In these experiments, however, the height of the pore was only ~20 nm, which is much shorter than the 100 nm long channel used here, and the grafting density of 1 per 21 nm2 was high compared to the grafting density here of 1 per 300 nm2. Assuming that the Kap95 occupancy scales linearly with the number of binding sites (FG repeats) in the vicinity of the pore, and hence the number of Nsp1 molecules bound to the pore, we would expect approximately 7 Kap95 molecules in a pore of similar diameter under saturating (> 1 µM) concentrations.

      On the other hand, the simulations showed that the density of Nsp1 within the pore is equal to the density within the 20-nm thick SiN pores (line 380). For the longer channel and lower grafting density used here, Nsp1 was also more constrained to the pore compared to thinner pores used in previous studies (Fragasso et al., 2023, NanoResearch), where the grafted protein spilled out from the nanopores. Thus assuming that the Kap95 occupancy depends on the protein density in the pore volume rather than the total protein amount grafted to the pore walls, we would estimate a number of 100 Kap95 molecules per pore.

      These varying numbers already show that we cannot accurately provide an estimate of the Kap95 occupancy within the pore from our data due to limitations of the ZMW approach.
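
      The two scaling arguments above can be reproduced with a few lines of arithmetic. The sketch below assumes an idealized cylindrical pore whose wall area is pi x diameter x length, the same 55 nm diameter in both geometries, and an equal Nsp1 density per pore volume for the second estimate. These simplifications and all function names are illustrative assumptions, not the authors' analysis code.

```python
import math

REF_KAP95 = 20.0      # Kap95 per pore estimated for the 20 nm thick SiN pore
DIAMETER_NM = 55.0    # pore diameter assumed equal in both geometries
REF_LENGTH_NM = 20.0  # pore height in the earlier conductance experiments
ZMW_LENGTH_NM = 100.0 # channel length used in this work
REF_GRAFT_NM2 = 21.0  # wall area per grafted Nsp1, earlier work
ZMW_GRAFT_NM2 = 300.0 # wall area per grafted Nsp1, this work


def grafted_nsp1_count(diameter_nm, length_nm, area_per_molecule_nm2):
    """Number of Nsp1 molecules grafted to the wall of a cylindrical pore."""
    wall_area_nm2 = math.pi * diameter_nm * length_nm
    return wall_area_nm2 / area_per_molecule_nm2


# Estimate 1: occupancy proportional to the number of grafted Nsp1 molecules.
ref_count = grafted_nsp1_count(DIAMETER_NM, REF_LENGTH_NM, REF_GRAFT_NM2)
zmw_count = grafted_nsp1_count(DIAMETER_NM, ZMW_LENGTH_NM, ZMW_GRAFT_NM2)
estimate_by_count = REF_KAP95 * zmw_count / ref_count

# Estimate 2: equal Nsp1 density in the pore volume, so occupancy is
# proportional to pore volume, i.e. to pore length at fixed diameter.
estimate_by_volume = REF_KAP95 * ZMW_LENGTH_NM / REF_LENGTH_NM

print(f"scaling with grafted Nsp1 count: {estimate_by_count:.0f} Kap95")
print(f"scaling with pore volume:        {estimate_by_volume:.0f} Kap95")
```

      Under these assumptions the two estimates come out at about 7 and 100 molecules, which is the spread referred to above.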

      445: how is this related to the BSA translocation increase?

      For the calculation of the selectivity ratio, we assumed the normalized Kap95 translocation rate to be independent of the Kap95 concentration. Hence, the observed trends of the selectivity ratios at different concentrations of Kap95, as shown in Figure 6 D, are solely due to a change in the BSA translocation rate at different concentrations of Kap95, as given in Figure 6 B,C.

      462-481: it's a bit confusing how this interfaces with the "void" analysis ( see my previous comments)

      We agree that the phenomenological descriptions in terms of transient openings (small, dynamic voids) that for larger pores become a constantly opened channel (a single large, static void) might cause some confusion to the reader. In the last part of the results, we aimed to relate the loss of the BSA rate to a change of the Nsp1 mesh. We acknowledge that the model of a rim of Nsp1 and an open center described in Figure 5F is highly simplified. We now explain this in the revised paper at lines 483-486 by referring to an effective layer thickness which holds true under the simplifying assumption of a central transport channel.

      Figure 6D: I think the illustration of the effect of kaps on the brush is somewhat misleading: at low pore diameters, it is possible that the opposite happens: the kaps concentrate the polymers towards the center of the pore. It should be also made clear that there are no kaps in simulations (if I understand correctly?)

      Indeed, at small pore diameters we think it would be possible to observe what the Reviewer describes. The illustration should only indicate what we think is happening for large pore diameters where we observed the opening of a central channel. To avoid confusion, we now shifted the sketches to panel G where the effective layer thickness is discussed.

      Indeed, as stated in lines 331-340 no Kap95 or BSA molecules were present in the simulations. We have now clarified this point in lines 872-876.

      518: Please provide more explanation on the role of hydrodynamics pressure.

      We have now performed additional experiments and quantified the effect of the pressure to be a ~5% reduction of the event rates, as described in the answer to a previous question above.  

      Reviewer #3 (Recommendations For The Authors):

      No experiments have been performed with the Ran-Mix regeneration system. It would be beneficial to add Ran-Mix to the trans compartment and see how this would affect Kap95 translocation events frequency and passive cargo diffusion. As the authors note in their outlook, this setup offers an advantage in using Ran-Mix and thus could also be considered here or in a future follow-up study.

      We thank the Reviewer for this suggestion. We think, however, that it is beyond the scope of this paper and an interesting subject for a follow-up study.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study and associated data is compelling, novel, important, and well-carried out. The study demonstrates a novel finding that different chemotherapeutic agents can induce nucleolar stress, which manifests with varying cellular and molecular characteristics. The study also proposes a mechanism for how a novel type of nucleolar stress driven by CDK inhibitors may be regulated. The study sheds light on the importance of nucleolar stress in defining the on-target and off-target effects of chemotherapy in normal and cancer cells.

      We are thankful to the reviewers and the editor for their feedback and thorough assessment of our work. Our responses to the comments and suggestions are below.

      Reviewer #1 (Public Review):

      The study titled "Distinct states of nucleolar stress induced by anti-cancer drugs" by Potapova and colleagues demonstrates that different chemotherapeutic agents can induce nucleolar stress, which manifests with varying cellular and molecular characteristics. The study also proposes a mechanism for how a novel type of nucleolar stress driven by CDK inhibitors may be regulated. As a reviewer, I appreciate the unbiased screening approach and I am enthusiastic about the novel insights into cell biology and the implications for cancer research and treatment. The study has several significant strengths: i) it highlights the understudied role of nucleolar stress in the on- and off-target effects of chemotherapy; ii) it defines novel molecular and cellular characteristics of the different types of nucleolar stress phenotypes; iii) it proposes novel modes of action for well-known drugs. However, there are several important points that should be addressed:

      • The rationale behind choosing RPE cells for the screen is unclear. It might be more informative to use cancer cells to study the effects of chemotherapeutic agents. Alternatively, were RPE cells selected to evaluate the side effects of these agents on normal cells? Clarifying these points in the introduction and discussion would guide the reader.

      RPE1, a non-cancer-derived cell line, was chosen for this study to evaluate the effects of anticancer drugs on normal nucleolar function, with the underlying premise that nucleolar stress in normal cells can contribute to non-specific toxicity. This clarification is added to the introduction. Another factor in selecting a normal cell line for the drug screen and subsequent experiments was the spectrum of known and unknown genetic and metabolic alterations present in various cancer cell lines. These variables are often unique to a particular cancer cell line and may or may not impact the nucleolar proteome and function. Therefore, the nucleolar stress response can be influenced by the spectrum of alterations inherent to each cancer. Our primary focus was to determine the impact of these drugs under normal conditions.

      That said, the selected hits of main drug classes were validated in a panel of cell lines that included two other hTERT lines (BJ5TA and CHON-002) and two cancer lines (DLD1 and HCT116). In cancer cells, starting nucleolar normality scores were lower than in hTERT cells, suggesting that genetic and metabolic changes in these cells may indeed affect nucleolar morphology. Nonetheless, all drugs from a panel of selected hits from different target classes were validated in both cancer cell lines (Fig. 2F).

      • Figure 2F indicates that DLD1 and HCT116 cells are less sensitive to nucleolar changes induced by several inhibitors, including CDK inhibitors. It would be crucial to correlate these differences with cell viability. Are these differences due to cell-type sensitivity or variations in intracellular drug levels? Assessing cell viability and intracellular drug concentration for the same drugs and cells would provide valuable insights.

      One of the reasons for the reduced magnitude of the effects of selected drugs in DLD1 and HCT116 cells is their lower baseline normality scores compared to hTERT cells (now shown in Sup. Fig. 1B-C). Other potential factors include proteomic and metabolic shifts and alterations in signaling pathways that control ribosome production. The less-likely possibility of variations in intracellular drug levels cannot be excluded, but measuring this for every compound in every cell line was not feasible in this study. These limitations are now noted in the results section.

      Regarding the point about viability - our initial screen output, in addition to normality scores, included cell count (cumulative count of cells in all imaged fields), which serves as a proxy for viability. By this measure, all hit compounds in our screen were cytostatic or cytotoxic in RPE1 cells (Fig. 2C). The impact of these drugs on the viability of cancer cells that can have various degrees of addiction to ribosome biogenesis merits a separate study of a large cancer cell line panel.

      • Have the authors interpreted nucleolar stress as the primary cause of cell death induced by these drugs? When cells treated with CDK inhibitors exhibit the dissociated nucleoli phenotype, is this effect reversible? Is this phenotype indicative of cell death commitment? Conducting a washout experiment to measure the recovery of nucleolar function and cell viability would address these questions.

      Whether nucleolar toxicity is the primary cause of cytotoxicity for a given chemotherapy drug is an incisive and thought-provoking question. Our screen did not discern whether the cytotoxic effects of our hits were due to inhibition of their intended targets, their impact on the nucleolus, or a combined effect. This point is now mentioned in the results section. Regarding the reversibility of the nucleolar disassembly phenotype seen with CDK inhibitors: in the case of flavopiridol, which is a reversible CDK inhibitor, we demonstrated that nucleoli re-assembled within 4-6 hours after the drug was washed out. An example of this is shown in Sup. Figure 3 and in Video 5. For these experiments, cells were pretreated with the drug for 5 hours, not long enough to cause cell death.

      • The correlation between the loss of Treacle phosphorylation and nucleolar stress upon CDK inhibition is intriguing. However, it remains unclear how these two events are related. Would Treacle knockdown yield the same nucleolar phenotype as CDK inhibition? Moreover, would point mutations that abolish Treacle phosphorylation prevent its interaction with Pol-I? Experiments addressing these questions would enhance our understanding of the correlation/causation between Treacle phosphorylation and the effects of CDK inhibition on nucleolar stress.

      We agree that the Treacle finding is interesting and warrants further investigation. In our attempts to knock down Treacle with siRNA, its protein levels were reduced by no more than 50%, which was not sufficient to cause a strong nucleolar stress response. Therefore, these data were not incorporated into the manuscript. However, in our view, Treacle is unlikely to be the only nucleolar CDK substrate whose dephosphorylation is causing the “bare scaffold” phenotype caused by the transcriptional CDK inhibitors. Our phospho-proteomics studies identified multiple nucleolar CDK substrates with established roles in the formation of the nucleolus. For instance, the granular component protein Ki-67 was also dephosphorylated on multiple sites and dispersed throughout the nucleus (shown in Sup. Fig 4). Given that CDKs typically phosphorylate many substrates that can have multiple phosphorylation sites, identifying a sole protein or phosphorylation site responsible for nucleolar disassembly may be an unattainable target.

      Overall, this study is significant and novel as it sheds light on the importance of nucleolar stress in defining the on-target and off-target effects of chemotherapy in normal and cancer cells.

      Thank you, we appreciate the positive and constructive assessment of our study.

      Reviewer #2 (Public Review):

      This is an interesting study with high-quality imaging and quantitative data. The authors devise a robust quantitative parameter that is easily applicable to any experimental system. The drug screen data can potentially be helpful to the wider community studying nucleolar architecture and the effects of chemotherapy drugs. Additionally, the authors find Treacle phosphorylation as a potential link between CDK9 inhibition, rDNA transcription, and nucleolar stress. Therefore I think this would be of broad interest to researchers studying transcription, CDKs, nucleolus, and chemotherapy drug mechanisms. However, the study has several weaknesses in its current form as outlined below.

      1) Overall the study seems to suffer from a lack of focus. At first, it feels like a descriptive study aimed at characterizing the effect of chemotherapy drugs on the nucleolar state. But then the authors dive into the mechanism of CDK inhibition and then suddenly switch to studying biophysical properties of nucleolus using NPM1. Figure 6 does not enhance the story in any way; on the contrary, the findings from Fig. 6 are inconclusive and therefore could lead to some confusion.

      This study was specifically designed to examine a broad range of chemotherapy drugs. The newly created nucleolar normality score enabled us to measure nucleolar stress precisely and in high throughput. Our primary objective was to find drugs that disrupt the normal nucleolar morphology and then study in-depth the most interesting and novel hits. We have made revisions to emphasize that these are the primary focal points of the manuscript.

      As context, we were motivated to explore the biophysical properties of the nucleolus because they are thought to underlie its formation and function, which also suggested a potential predictive value for modeling nucleolar responses to drug treatments. For this, we edited the RPE1 cell line by endogenously tagging NPM1, a granular component protein that behaves in line with the phase-separation paradigm in vitro and when over-expressed. We fully expected to confirm that its behavior in vivo would be consistent with LLPS, but instead found that even in an untreated scenario, the dynamics of endogenous NPM1 could not be fully explained by the phase separation theory (Fig. 6 A-C). Our message is that accurately predicting drug responses using the nucleolar normality score as a readout, based on our current understanding of the biophysical forces governing nucleolar assembly, is unworkable. For instance, normality scores decrease and NPM1 dynamics increase radically when CDKs are inhibited, without changes in NPM1 concentration or concentrations of other protein components (Fig.6 E-H). These observations are important because they highlight our gaps in understanding the relative contribution of phase separation versus active assembly in nucleolar formation. We believe that these observations are worth sharing with the scientific community.

      2) The justification for pursuing CDK inhibitors is not clear. Some of the top hits in the screen were mTOR, PI3K, HSP90, Topoisomerases, but the authors fail to properly justify why they chose CDKi over other inhibitors.

      We decided to focus on CDK inhibitors for several reasons. First, their effects were completely new and unexpected, suggesting the existence of an unknown mechanism regulating nucleolar structure and function. In addition, CDK inhibitors caused a very strong and distinct nucleolar stress phenotype with the lowest normality scores that merited its own term, the “bare scaffold” phenotype. One more reason for pursuing CDK-inhibiting drugs was their high rate of failure in clinics because of the intense and hard-to-explain toxicity. We suspect that this toxicity may be due at least in part to their profound effect on nucleolar organization and ribosome production throughout the body. We stated this rationale more explicitly in the manuscript.

      3) In addition to poor justification, it seems like a very superficial attempt at deciphering the mechanism of CDK9i-mediated nucleolar stress. I think the most interesting part of the study is the link between CDK9, Pol I transcription, and nucleolar stress. But the data presented is not entirely convincing. There are several important controls missing as detailed below.

      We agree with the reviewer that follow-up studies of CDK9, Pol I, and nucleolar stress connection are important long-term goals. However, the primary objective of this study was to ascertain the scope of anticancer agents that can cause nucleolar stress and the establishment of nucleolar stress categories. This is an important advance and could serve as the foundation for a standalone in-depth study or multiple studies. We have included the complete screen, proteomics, and phospho-proteomics results (Sup. Tables 1, 2, and 3), which will enable other investigators to mine the screen information based on their specific interests. Furthermore, we have made multiple text revisions to clarify rationale and interpretation, and incorporated additional data that strengthen the manuscript.

      4) The authors did not test if inhibition of CDK7 and/or CDK12 also induces nucleolar stress. CDK7 and CDK12 are also major kinases of RNAPII CTD, just like CDK9. Importantly, there are well-established inhibitors against both these kinases. It is not clear from the text whether these inhibitors were included in the screen library.

      Our anticancer compound library contained the CDK7 inhibitor THZ1·2HCl, and it was a hit at both 1 and 10 µM concentrations (Sup. Table 1). However, its nucleolar stress phenotype was morphologically distinct from CDK9 inhibitors, resembling the stress caps phenotype instead of the bare scaffold phenotype. We did not pursue CDK7 because of its two hard-to-separate functions: in addition to its role as an RNAPII CTD kinase, it also acts as a CDK-activating kinase (CAK) by promoting the associations of multiple CDKs with their cyclin partners. This dual role of CDK7 makes the interpretation of the THZ1-induced nucleolar stress phenotype difficult because it could be attributed to either or both of these functions. Moreover, it was reported to cause DNA damage, which may explain why it causes stress caps. An image depicting the nucleolar stress phenotype caused by THZ1·2HCl is provided in Author response image 1.

      Author response image 1.

      Control and THZ1 - treated RPE1 cells, images from screen plates.

      We are not aware of specific inhibitors of CDK12, as they also reportedly inhibit CDK13. None of the CDK12/CDK13 inhibitors were present in our library, therefore we can neither confirm nor exclude the possible involvement of these kinases in regulating nucleolar structure. Many other existing CDK inhibitors were absent from our library. Our work highlights the importance of assessing their potential to induce nucleolar stress and offers an approach for this assessment.

      5) In Figure 4E, the authors show that Pol I is reduced in nucleolus/on rDNA. The authors should include an orthogonal method like chromatin fractionation and/or ChIP

      We acknowledge the reviewer’s request for additional validation of reduced occupancy of rDNA by Pol I. Nucleolar chromatin fractionation in cells treated with CDK inhibitors is unlikely to work due to nearly complete nucleolar disassembly. Chromatin immunoprecipitation would require finding and validating a suitable ChIP-grade antibody. Moreover, the evaluation of repetitive regions by ChIP is non-trivial and error-prone. To help address this request and further confirm the POLR1A immunofluorescence results in 4E, we included additional immunofluorescence data obtained with a different POLR1A antibody (Sup. Fig. 3D), and the results were similar.

      6) In Fig. 5D, in vitro kinase lacks important controls. The authors should include S to A mutants of Treacle S1299A/S1301A to demonstrate that CDK9 phosphorylates these two residues specifically.

      7) To support their model, the authors should test if overexpression of Treacle mutants S1299A/S1301A can partially phenocopy the nucleolar stress seen upon CDK9 inhibition. This would considerably strengthen the author's claim that reduced Treacle phosphorylation leads to Pol I disassociation from rDNA and consequently leads to nucleolar stress.

      8) Additionally, it would be interesting if S1299D/S1301D mutants could partially rescue CDK9 inhibition.

      Points (6-8):

      We reiterate that transcriptional CDKs target multiple nucleolar proteins, and the observed phenotype might be due to the combined effects of de-phosphorylation of multiple substrates. We concur that deconstructing the role of Treacle phosphorylation sites is very interesting and warrants further in-depth studies. The phospho-proteomics enrichment method, while an effective first-pass strategy, might not capture 100% of the phosphorylated sites. Treacle is a phospho-protein with an abundance of serine and threonine residues. It could potentially have been selectively dephosphorylated on more sites than were detected by this method. Therefore, the suggested mutations may not be the exclusive contributors responsible for the functional phenotype. Additionally, overexpressing Treacle impairs the viability of RPE1 cells, complicating the interpretation of experiments involving overexpression of both wild-type and mutant proteins. A conceivable strategy would involve generating phosphomimetic and non-phosphorylatable mutants by gene editing, studying their interactions by biochemical approaches, and determining their impact on nucleolar function, but this may take years of additional work. We hope that our work will inspire further studies that explore Treacle phosphorylation and other functions of transcriptional CDKs in nucleolar formation.

      Thank you for the thoughtful review and suggestions.

      Reviewer #2 (Recommendations For The Authors):

      1) The manuscript could be re-organized to focus on 'CDK9-Treacle-Pol I-nucleolar stress' as the central part of the story.

      While we acknowledge this suggestion, it's important to emphasize that the primary focus of this manuscript is on the identification of anticancer drugs that induce nucleolar stress and the establishment of nucleolar stress categories.

      2) Include a "no ATP" control in the in vitro kinase assay and indicate molecular sizes.

      We provided an additional kinase assay (Sup. Fig. 4B) that includes no ATP control lanes and a fragment of a Coomassie blue stained gel showing molecular weight markers. No ATP control assays (lanes 4 and 5) were blank as expected. Molecular weight markers were added to all other kinase assays based on the known sizes of isolated Pol II holoenzyme subunits Rpb1 (191 kDa) and Rpb2 (138 kDa).

      3) For in vitro phosphorylation, please provide an explanation for using CDK9/cyclin K instead of Cyclin T1 which is the predominant cyclin for CDK9

      Recombinant CDK9/cyclin K complex was used for in vitro kinase assays for a technical reason: CDK9/cyclin T obtained from the same vendor appeared to be low quality, as it showed only minimal activity toward our positive control, the isolated Pol II complex. The kinase assays using recombinant CDK9/cyclin T in parallel with CDK9/cyclin K are now presented in Sup. Fig. 4B. The first two assays in this experiment contained Pol II as a substrate, and it is evident that Pol II was phosphorylated much more strongly by CDK9/cyclin K than by CDK9/cyclin T (comparing lane 1 vs lane 2). Therefore, the lack of detectable Treacle phosphorylation by CDK9/cyclin T (lane 7), in contrast to strong phosphorylation by CDK9/cyclin K (lane 6), was likely attributable to poor reagent quality rather than physiological differences. We can conclude that CDK9/cyclin K reliably phosphorylates Treacle in vitro, but CDK9/cyclin T kinase assays were inconclusive.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study presents a novel pipeline for the large-scale genomic prediction of members of the non-ribosomal peptide group of pyoverdines based on a dataset from nearly 2000 Pseudomonas genomes. The advance presented in this study is largely based on solid evidence, although some main claims are only incompletely supported. This study on bacterial siderophores has broad theoretical and practical implications beyond a singular subfield.

      Thank you for the supportive and encouraging words. We appreciate the editor’s and reviewers’ careful and professional assessment of this manuscript. The reviewers’ scrutiny has helped us to improve the presentation and discussion of our work. We have now carefully revised the manuscript following their instructive suggestions and comments. Please find below our detailed responses (marked in blue) to each of the comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript introduces a bioinformatic pipeline designed to enhance the structure prediction of pyoverdines, revealing an extensive and previously overlooked diversity in siderophores and receptors. Utilizing a combination of feature sequence and phylogenetic approaches, the method aims to address the challenging task of predicting structures based on dispersed gene clusters, particularly relevant for pyoverdines.

      Predicting structures based on gene clusters is still challenging, especially pyoverdines as the gene clusters are often spread to different locations in the genome. An improved method would indeed be highly useful, and the diversity of pyoverdine gene clusters and receptors identified is impressive.

      However, so far the method basically aligns the structural genes and domains involved in pyoverdine biosynthesis and then predicts A domain specificity to predict the encoded compounds. Both methods are not particularly new as they are included in other tools such as PRISM (10.1093/nar/gkx320) or Sandpuma (https://doi.org/10.1093/bioinformatics/btx400) among others. The study claims superiority in A domain prediction compared to existing tools, yet the support is currently limited, relying on a comparison solely with AntiSMASH. A more extensive and systematic comparison with other tools is needed.  

      Thanks for pointing this out. In the revised manuscript, we have included a comprehensive comparative analysis, in which we compared our pipeline to six different commonly used methods: NP.searcher, PRISM4, AdenPredictor, SeMPI2, SANDPUMA, and antiSMASH5 (see Supplementary_table 6 for details, and lines 281-286). These approaches either consist of a single specific algorithm or integrate several methods. Our approach performs best (see table below), demonstrating a clear improvement over previous tools. The improvements are due to several methodological differences inherent to our approach. Additionally, while exploring existing prediction tools, we found that some had not been maintained for years. For instance, we were unable to access NRPSsp (www.nrpssp.com) and NRPSpredictor2 (http://nrps.informatik.uni-tuebingen.de/). Below, we briefly explain these differences, particularly in relation to PRISM and SANDPUMA, as highlighted by the reviewer.

      Author response table 1.

      PRISM annotates biosynthetic gene clusters (BGC) and reconstructs the linear structures of NRPS synthetases, with this function depending on proper annotations of open reading frames. This pipeline can have difficulties in assembling the linear structure into a final product. In our approach, we found that the annotations of NRPS genes are frequently truncated because of sequencing errors and annotation issues. Our method fixes this problem by rescanning all possible reading frames of the BGC to rebuild complete pyoverdine synthetase genes.
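
      To illustrate what rescanning all possible reading frames involves, the sketch below enumerates the six reading frames of a nucleotide sequence (three offsets on each strand). It shows only this generic first step under simple assumptions (unambiguous A/C/G/T input); how the pipeline then detects and stitches synthetase genes is not described here, and the function names are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Return the six reading frames, each trimmed to whole codons.

    Keys are '+1', '+2', '+3' for the forward strand and
    '-1', '-2', '-3' for the reverse-complement strand.
    """
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            usable = len(strand_seq) - offset
            usable -= usable % 3  # drop a trailing partial codon
            frames[f"{strand}{offset + 1}"] = strand_seq[offset:offset + usable]
    return frames


example = six_frames("ATGAAATTTGG")
for name, frame in example.items():
    print(name, frame)
```

      Each frame could then be translated and searched for NRPS domains, so that a gene split by a frameshift in the annotation can still be recovered in a neighbouring frame.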

      SANDPUMA and our approach are based on similar ideas (using the prediCAT algorithm) to predict A domain substrates, namely by using the closest annotated reference A domain. However, our method uses a self-adaptive feature extraction step to reduce the confounding influence of phylogeny. This small adjustment significantly improves the performance of our approach and even works well for small training sets (101 experimentally validated A domains with our approach as opposed to 494 A domains used by SANDPUMA from MIBiG).
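
      The underlying idea of assigning a substrate from the closest annotated reference A domain can be sketched as a nearest-neighbour lookup. The example below is a toy illustration only: it scores fixed-length feature strings by fraction identity, uses made-up feature strings and labels, and does not implement prediCAT or the self-adaptive feature extraction step described above.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length feature strings."""
    if len(a) != len(b):
        raise ValueError("feature strings must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_substrate(query, references):
    """Return (substrate, score) of the reference most similar to the query.

    `references` is a list of (feature_string, substrate_label) pairs.
    """
    best_features, best_substrate = max(
        references, key=lambda ref: identity(query, ref[0])
    )
    return best_substrate, identity(query, best_features)


# Hypothetical reference set: the feature strings and labels are placeholders,
# not real A domain specificity codes.
toy_references = [
    ("ACDEFG", "Ser"),
    ("ACDKLM", "Lys"),
]

substrate, score = predict_substrate("ACDEFM", toy_references)
print(substrate, round(score, 2))
```

      In such a scheme, a query whose best score is low against every reference would be a candidate for an unknown substrate rather than a confident prediction; any threshold for that decision would have to be chosen empirically.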

      Additionally, in contradiction to the authors' claims, the method's applicability seems constrained to well-known and widely distributed gene clusters. The absence of predictions for new amino acids raises concerns about its generalizability to NRPS beyond the studied cases.

      We thank the reviewers for this comment. We acknowledge that our method cannot directly predict new amino acids. Nevertheless, for several reasons we believe that our approach is not constrained and can be widely applied in the future.

      First, our method can identify A domains that select new unknown amino acid substrates. In fact, three of the four unresolved cases in our experimental verification analysis (Fig. 3d) represent new amino acids. Obviously, experimental verification is required to characterize the unknown substrate. Once verified, the new A domains and their substrates can expand the reference dataset, allowing targeted improvement of our phylogeny-focused prediction technique. We now discuss this aspect in lines 634-645.

      Second, although the overall substrate diversity in NRPS is high across the microbial kingdom, our analysis suggests that the number of amino acids used for a specific group of secondary metabolites quickly reaches a saturation point. The discovery rate of new amino acids was 1.7% for our experimental Pseudomonas data set (Fig. 3d), and it was even 0.0% for the Burkholderiales data set. This suggests that as the database expands, the discovery rate of novel amino acid substrates is expected to drop rapidly.

      Third, we acknowledge that the inability to predict the substrates of unknown domains is a common limitation among all knowledge-guided learning algorithms, including ours. However, we have made significant improvements in prediction accuracy. As the database grows, we expect the rate of unknown substrates to decrease, and the prediction accuracy to increase.

      The manuscript lacks clarity on how the alignment of structural genes operates when dealing with multiple NRPS gene clusters on different genome contigs. How would the alignment of each BGC work?

      We thank the reviewers for this comment. The pyoverdine molecules consist of a conserved fluorescent chromophore (Flu) and a peptide chain (Pep), both synthesized by NRPS enzymes. In most instances (over 90%), Flu and Pep are produced by two separate biosynthetic gene clusters (BGCs). In these cases, we merge the two BGCs by positioning Flu at the head and Pep at the tail. For the remaining less than 10%, there are two scenarios: 1. Flu and Pep are located on the same BGC, which eliminates any issues with BGC alignment. 2. In very rare cases, Flu and Pep are synthesized by three BGCs. Here, Flu is still synthesized by one BGC at the head, while Pep is produced by two BGCs. We put the BGC containing the Thioesterase (TE) domain as the tail and the BGC not containing the TE domain in the middle.

      (see lines 165-169).
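
      The merging rule described above can be written down as a short ordering function. This is a sketch only: the dictionary keys (`name`, `role`, `has_te`) and the cluster names are hypothetical, and each cluster is assumed to have already been labelled as chromophore (Flu) or peptide chain (Pep).

```python
def order_pyoverdine_bgcs(bgcs):
    """Order BGCs as Flu (head), Pep without TE (middle), Pep with TE (tail).

    `bgcs` is a list of dicts with keys 'name', 'role' ('Flu' or 'Pep'),
    and 'has_te' (True if the cluster contains a thioesterase domain).
    """
    flu = [b for b in bgcs if b["role"] == "Flu"]
    pep = [b for b in bgcs if b["role"] == "Pep"]
    # False sorts before True, so the TE-containing cluster ends up last.
    pep_sorted = sorted(pep, key=lambda b: b["has_te"])
    return [b["name"] for b in flu + pep_sorted]


# Rare three-cluster case with made-up cluster names.
toy_clusters = [
    {"name": "bgc_c", "role": "Pep", "has_te": True},
    {"name": "bgc_a", "role": "Flu", "has_te": False},
    {"name": "bgc_b", "role": "Pep", "has_te": False},
]
merged_order = order_pyoverdine_bgcs(toy_clusters)
```

      The common two-cluster case follows from the same rule, and the single-cluster case needs no ordering at all.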

      Another critical concern is that a main challenge in NRPS structure prediction is not the backbone prediction but rather the prediction of tailoring reactions, which is not addressed in the manuscript at all, and this limitation extensively restricts the applicability of the method.

      While we thank the reviewer for this comment, we only partly agree with it. Peptide backbone predictions are still a significant challenge. This challenge is clearly visible in our new analysis comparing prediction accuracies of different pipelines, such as antiSMASH5, PRISM4, AdenPredictor, SeMPI2, NP.searcher, and SANDPUMA. Unresolved and wrong substrate predictions are still common, highlighting the importance of our contribution in developing a new approach with improved accuracy.

      However, we agree with the reviewer that our current algorithm does not predict tailoring reactions (now discussed on lines 680-685). Although tailoring reactions are important for predicting the final NRPS product structure, none of the other existing pipelines address this issue either, and it remains a challenge for future work. For our study, it is important to note that the specificity of pyoverdines is primarily determined by the backbone composition, whereas tailoring reactions seem to play a minor role.

      The manuscript presents a potentially highly useful bioinformatic pipeline for pyoverdine structure prediction, showcasing a commendable exploration of siderophore diversity. However, some of the claims made remain unsubstantiated. Overall, while the study holds promise, further validation and refinement are required to fulfill its potential impact on the field of bioinformatic structure prediction.

      Thank you for the supportive and encouraging words. We deeply appreciate your constructive comments and suggestions. 

      Reviewer #2 (Public Review):

      Pyoverdines, siderophores produced by many Pseudomonads, are one of the most diverse groups of specialized metabolites and are frequently used as model systems. Thousands of Pseudomonas genomes are available, but large-scale analyses of pyoverdines are hampered by the biosynthetic gene clusters (BGCs) being spread across multiple genomic loci and existing tools' inability to accurately predict amino acid substrates of the biosynthetic adenylation (A) domains. The authors present a bioinformatics pipeline that identifies pyoverdine BGCs and predicts the A domain substrates with high accuracy. They tackled a second challenging problem by developing an algorithm to differentiate between outer membrane receptor selectivity for pyoverdines versus other siderophores and substrates. The authors applied their dataset to thousands of Pseudomonas strains, producing the first comprehensive overview of pyoverdines and their receptors and predicting many new structural variants.

      The A domain substrate prediction is impressive, including the correction of entries in the MIBiG database. Their high accuracy came from a relatively small training dataset of A domains from 13 pyoverdine BGCs. The authors acknowledge that this small dataset does not include all substrates, and correctly point out that new sequence/structure pairs can be added to the training set to refine the prediction algorithm. 

      The authors could have been more comprehensive in finding their training set data. For instance, the authors claim that histidine "had not been previously documented in pyoverdines", but the sequenced strain P. entomophila L48, incorporates His (10.1007/s10534-009-9247-y). 

      Thank you for highlighting this issue. We agree that stating histidine has not been reported before in pyoverdine was incorrect. We have reviewed the full text and made the necessary corrections.

      The primary reason for excluding the sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) from the training set is that the pyoverdine structures of these strains were not determined solely through experimental methods. In these works, the pyoverdine structures were predicted based on the synthetic gene sequence using bioinformatic analysis, followed by structural analysis experiments based on this predicted structure. We found that pre-prediction probably introduced biases into downstream analyses. Specifically, in the case of Pseudomonas entomophila L48, we discovered inaccuracies in the annotation of certain domains (see figures below). For example, the third A domain of the peptide chain in P. entomophila L48 pyoverdine was initially annotated with Dab specificity. However, upon closer examination, it appears to differ significantly from other Dab references (top) or Dab from our experimentally validated (right) domains (left panel in the figure below). By analyzing the interface (I) domain (10.1073/pnas.1903161116) in its predicted site, we suggested that it should actually recognize OHHis. The OHAsp domain of P. entomophila L48 reported in the paper is actually close in sequence similarity to the OHAsp domain (left panel in the figure below), while the Ala domain reported is more similar to the Ser domain (right panel in the figure below). For these reasons, we did not include this supervised pyoverdine structure analysis strain in the training set data.

      Author response image 1.

      The workflow cannot differentiate between different variants of Asp and OHOrn, and it's not clear if this is a limitation of the workflow, the training data, or both. 

      Thanks for pointing this out. It is generally challenging to differentiate between variants of the same amino acid (for all the algorithms existing to date). In this sense, it is a limitation of ours but also of all other workflows. Nonetheless, we wish to stress that we observed feature sequence divergence (using the A motif4-5 region), which helped us to separate some (but not all) of the Asp and Orn variants. For example, separations between Asp variants are distinct (left panel in the figure below). To be on the conservative side, we only differentiated between OHAsp and Asp for our predictions, although differentiation between DOHAsp and OHAsp would also be possible. In the case of Orn variants, there was a clear separation between Orn and the OHOrn variants (right panel). In contrast, it was difficult to differentiate between the subgroups of OHOrn variants. We believe that no A domain prediction tool will be able to solve this issue. Instead, it would be important to include information on substrate-modifying enzymes in future approaches.

      Author response image 2.

      The prediction workflow holds up well in Burkholderiales A domains, however, they fail to mention in the main text that they achieved these numbers by adding more A domains to their training set.

      We thank the reviewers for this comment. We apologize for not having mentioned the training data set in the main text, although we described it in detail in the methods section (lines 714-732). We now provide more details on the analysis procedure in the main text (lines 307-313). It is important to note that we did not add more A domains to the training data set but built up a new independent data set for Burkholderiales. The aim was to mirror the analysis we performed for pyoverdines with a completely new data set, featuring 124 A domains for training and 178 A domains as a test set.

      To validate their predictions, they elucidated structures of several new pyoverdines, and their predictions performed well. However, the authors did not include their MS/MS data, making it impossible to validate their structures. In general, the biggest limitation of the submitted manuscript is the near-empty methods section, which does not include any experimental details for the 20 strains or details of the annotation pipeline (such as "Phydist" and "Syndist"). The source code also does not contain the requisite information to replicate the results or re-use the pipeline, such as the antiSMASH version and required flags. That said, skimming through the source code and data (kindly provided upon request) suggests that the workflow itself is sound and a clear improvement over existing tools for pyoverdine BGC annotation.

      Thank you for highlighting these issues. We agree that the methods section is short. This is because the entire paper is a step-by-step methodological introduction to our pipeline. We have now carefully revised the main text to add the information requested by the reviewer. Moreover, we have included a supplementary file with the MS/MS data of the experimentally analyzed pyoverdine structures. Finally, we now include a link to a one-click online notebook that can be used to replicate the annotation and substrate prediction results, together with a more detailed explanation of the code. See: https://drive.google.com/drive/folders/1JsfyPUGDTFo8BDDZk8JLSvKry8emzMhr?usp=drive_link

      Predicting outer membrane receptor specificity is likewise a challenging problem and the authors have made a promising achievement by finding specific gene regions that differentiate the pyoverdine receptor FpvA from FpvB and other receptor families. Their predictions were not tested experimentally, but the finding that only predicted FpvA receptors were proximate to the biosynthesis genes lends credence to the predictive power of the workflow. The authors find predicted pyoverdine receptors across an impressive 468 genera, an exciting finding for expanding the role of pyoverdines as public goods beyond Pseudomonas. However, whether or not these receptors can recognize pyoverdines (and if so, which structures!) remains to be investigated.

      Thank you for the supportive and encouraging words. The bioinformatic analysis and experimental testing of pyoverdine-receptor matching is complicated and it is not part of this paper. We treated it in a separate manuscript in which we developed an experimentally verified co-evolution algorithm that matches pyoverdines to receptors. With this algorithm, we can identify self-receptors (i.e. receptors used to take up the self-produced pyoverdine), and therefore establish pyoverdine sharing and interaction networks across strains in communities.

      Please see DOI:10.1101/2023.11.05.565711 for details.

      In all, the authors have assembled a rich dataset that will enable large-scale comparative genomic analyses. This dataset could be used by a variety of researchers, including those studying natural product evolution, public good eco/evo dynamics, and NRPS engineering.

      Thank you for the supportive and encouraging words. We are grateful for the reviewers’ instructive suggestions and comments.

      Reviewer #3 (Public Review):

      Summary:

      Secondary metabolites are produced by numerous microorganisms and have important ecological functions. A major problem is that neither the function of a secondary metabolite enzyme nor the resulting metabolite can be precisely predicted from gene sequence data.

      In the current paper, the authors addressed this highly relevant question.

      The authors developed a bioinformatic pipeline to reconstruct the complete secondary metabolism pathway of pyoverdines, a class of iron-scavenging siderophores produced by Pseudomonas spp. These secondary metabolites are biosynthesized by a series of nonribosomal peptide synthetases and require a specific receptor (FpvA) for uptake. The authors combined knowledge-guided learning with phylogeny-based methods to predict with high accuracy encoding NRPSs, substrate specificity of A domains, pyoverdine derivatives, and receptors. After validation, the authors tested their pipeline with sequence data from 1664 phylogenetically distinct Pseudomonas strains and were able to determine 18,292 enzymatic A domains involved in pyoverdine synthesis, reliably predicted 97.8% of their substrates, identified 188 different pyoverdine molecule structures and 4547 FpvA receptor variants belonging to 94 distinct groups. All the results and predictions were clearly superior to predictions that are based on antiSMASH. Novel pyoverdine structures were elucidated experimentally by UHPLC-HR-MS/MS.

      To assess the extendibility of the pipeline, the authors chose Burkholderiales as a test case which led to the results that the pipeline consistently maintains high prediction accuracy within Burkholderiales of 83% which was higher than for antiSMASH (67%).

      Together, the authors concluded that supervised learning based on a few known compounds produced by species from the same genus probably outperforms generalized prediction algorithms trained on many products from a diverse set of microbes for NRPS substrate predictions. As a result, they also show that both pyoverdine and receptor diversity have been vastly underestimated.

      Strengths:

      The authors developed a very useful bioinformatic pipeline with high accuracy for secondary metabolites, at least for pyoverdines. The pipelines have several advantages compared to existing pipelines like the extensively used antiSMASH program, e.g. it can be applied to draft genomes, shows reduced erroneous gene predictions, etc. The accuracy was impressively demonstrated by the discovery of novel pyoverdines whose structures were experimentally substantiated by UHPLC-HR-MS/MS.

      The manuscript is very well written, and the data and the description of the generation of pipelines are easy to follow.

      Weaknesses:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains.

      Thanks for your positive and encouraging comment. Regarding your only major comment, we think that the design concept of our pipeline has the potential to be applied to more complex non-ribosomal peptides. Currently, our method is tailored to accurately predict the structural composition of the Pseudomonas siderophore pyoverdine (see also response 3). A key point emphasized in our article is the importance of considering phylogeny in developing substrate prediction algorithms for A domains. Currently, the main challenge in advancing these algorithms is the limited availability of data on A domains and their corresponding substrates. However, with the future accumulation of more reference data, we are confident that the design principles of our method will enable precise predictions of the structural compositions of all products synthesized by non-ribosomal peptide synthetases (see our discussions in lines 634-645).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I believe that the manuscript would benefit from focusing solely on the task of improving pyoverdine predictions. This aspect alone is significant, and robustly supporting this claim would strengthen the manuscript. The diversity analysis provided is valuable and would undoubtedly benefit the scientific community. However, additional systematic comparisons with other methods are necessary. Furthermore, clarification of certain terms, such as 'feature-based' (e.g., whether it refers to NRPS domains or CDS), would enhance clarity.

      Thank you for the supportive and encouraging words. We followed the reviewer’s suggestion and now provide the requested method comparison, see also response 2 for details. Furthermore, we have carefully checked the main text to clarify terms whenever needed. Specifically, we now define the terms “feature sequence” and “feature sequence distance” in lines 227-229.  

      Additionally, several minor points could be improved upon:

      In line 85, clarification is needed on how pyoverdine genes were identified.

      Thank you for your thorough review. In the introduction section, we provided a brief overview of our work, while the detailed methodology is outlined in the results section on lines 160-174.

      In line 382, it would be helpful to know the source of the sequences.

      We agree and have now carefully revised the manuscript following your suggestions (lines 403-405).

      Line 392 could be explained more clearly. Does it mean that the authors used an hmm search to search pHMMs against each reference sequence?

      Thanks for your comment. Yes, we used an hmm search to search pHMMs against each reference sequence. We have now revised the manuscript to improve explanations (lines 413-418).

      Reviewer #2 (Recommendations For The Authors):

      The authors state they "elucidated the chemical structure of the 20 pyoverdines using culture-based methods combined with UHPLC-HR-MS/MS", so I was alarmed to see that KR and LB already published several of those structures in the cited paper. I hope that this "double dipping" will be fixed in a revision process.

      Thank you for pointing this out. We agree that we have not explained clearly enough what steps were conducted in this study and which data were used from a previous paper (https://doi.org/10.1007/s00216-022-03907-w). The genomes of the 20 strains used for the verification analysis (Fig. 3d) were sequenced as part of this study (access code now provided). 14 out of the 20 pyoverdine structures were elucidated with UHPLC-HR-MS/MS in this study. For 6 out of the 20 pyoverdines, we had structural information already at hand from the previous paper. We have now clarified these details in our manuscript (lines 276-280). 

      Thank you for providing the source code and data, and I hope that the final non-redundant dataset will be uploaded to Zenodo or another repository. Please deposit the 20 newly sequenced genomes to GenBank or another public repository. Please also show the UHPLC-HR-MS/MS data, preferably in the form of raw data uploaded to GNPS.

      We have followed the reviewer’s advice and deposited our data:

      - The sequences of the 20 newly sequenced strains are available on ENA accession PRJEB76792.

      - The MS/MS plots of the 14 newly analyzed pyoverdines are shown in the Supplementary Materials.

      - We provide a one-click online notebook to allow readers to replicate the pyoverdine cluster annotation and substrate prediction of the 20 experimentally analyzed strains.

      I suggest adding "at least" or a similar qualifier when the 73 variants are mentioned unless the literature search was truly exhaustive. What were the criteria for inclusion of the 13 strains in Table S2? For instance, sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) were not included.

      Thank you for your comment. We have now carefully revised the manuscript following your suggestions (lines 291-295). Regarding the criteria for including the 13 strains in Table S2, we aimed to select strains with high credibility for inclusion in the training set data. The primary reason for excluding the two strains from the training set is that their siderophore structures were analyzed through supervised experiments. We wanted to avoid any form of bias that bioinformatic pre-predictions could introduce to downstream analyses (see Response 13 for details).

      OHAsp in pyoverdines has been reported to arise from hydroxylation of Asp after it's already been activated by the A domain (10.1073/pnas.1903161116). Was there a clear difference between A domains that lead to Asp and OHAsp? Conversely, acetylation and formylation of OHOrn occur before adenylation. Can your workflow be used to differentiate cOHOrn, fOHOrn, and AcOHOrn, which are currently difficult to predict through genome mining?

      Thank you for these considerations. We treated these aspects in our response 8.  

      Throughout, define non-proteinogenic AA substrate abbreviations (ex: Rsc, Dab).

      Revised as per suggestion (lines 329-333).

      Additional line comments:

      189: Mention PhyloPhlAn in the main text.

      Revised as per suggestion (line 189).

      191: Define these filtering/selection criteria.

      Thanks for your comment, we have added the criteria in the main text (line 196 and line 198). 

      309, 620: An A domain presumably loading histidine is present in sequenced strain P. entomophila L48 (10.1007/s10534-009-9247-y). Please also clarify that Val has previously been seen in a pyoverdine (it is in Table S1) albeit not sequenced.

      We have clarified these aspects as per suggestion (lines 314-315 and line 630).

      310: The pipeline can "highlight" new substrates, but not identify them.

      Revised as per suggestion (line 295).

      354: Please clarify "13 amino acid substrates form the core of all the 188 pyoverdine structures", considering that 279 A domain substrates couldn't be predicted.

      Thanks for your comments. We have now clarified “our analysis found that 13 amino acids form the main structural substrates of all the 188 pyoverdine structures.” (lines 360-363)

      630: "discovered" implies that there is experimental evidence. I suggest something like "here we predicted 151 putatively new variants".

      Revised as per suggestion (line 648).

      Reviewer #3 (Recommendations For The Authors):

      Weakness:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains

      Thanks for your comment. Please see our Responses 3+13 above, where we treat this concern in detail. Moreover, we discussed the possibility of extension to other groups of secondary metabolites in our discussion. We believe that we deliver a balanced view on the applicability of our approach and the next steps to be taken.  

      Please comment on this aspect.

      Minor:

      (1)  When you speak about "synthesis" it is rather biosynthesis. Synthesis is chemical synthesis.

      Please replace all instances of the word synthesis with biosynthesis.

      Revised as per suggestion.

      (2)  Line 188: synthetase is rather synthetases

      Revised as per suggestion (line 191).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Point 1: While the manuscript is methodologically sound, the following aspects of image acquisition and data analysis need to be clarified to ensure replicability and reproducibility. The authors state that the sample is a "population-derived adult lifespan sample", the lack of demographic information makes it impossible to know if the sample is truly representative. Though this may seem inconsequential, education may impact both cognitive performance and functional activation patterns. Moreover, the authors do not report race/ethnicity in the manuscript. This information is essential to ensure representativeness in the sample. It is imperative that barriers to study participation within minoritized groups are addressed to ensure rigor and reproducibility of findings.

      First, the section Methods-Participants has been updated to refer readers to a prior article where the sample’s demographics are broken down into nine decile age groups (see Wu et al. 2023 Table 1), including information about their education levels. Second, we have updated the Data Availability section text to indicate that all Cam-CAN IDs are included in the available OSF datasets, allowing anyone to verify additional participant demographics described in the Cam-CAN protocol article (Shafto et al., 2014). Third, we have updated the Participants section text to refer to another prior study that reported on the representativeness of the Cam-CAN sample, indicating that at least some elements of the sample have been independently deemed representative (e.g., sex).

      Page-24

      “A healthy population-derived adult lifespan human sample (N = 223; ages approximately uniformly distributed from 19 - 87 years; females = 112; 50.2%) was collected as part of the Cam-CAN study (Stage 3 cohort; Shafto et al., 2014). Participants were fluent English speakers in good physical and mental health, based on the Cam-CAN cohort’s exclusion criteria which includes poor mini mental state examination, ineligibility for MRI and medical, psychiatric, hearing or visual problems. Throughout analyses, age is defined at the Home Interview (Stage 1; Shafto et al., 2014). The study was approved by the Cambridgeshire 2 (now East of England–Cambridge Central) Research Ethics Committee and participants provided informed written consent. Further demographic information of the sample is reported in Wu et al. (2023) and is openly available (see section Data Availability) with a recent report indicating the representativeness of the sample across sexes (Green et al., 2018).”

      Page-30

      “Raw and minimally pre-processed MRI (i.e., from automatic analysis; Taylor et al., 2017) and behavioural data are available by submitting a data request to Cam-CAN (https://camcan-archive.mrc-cbu.cam.ac.uk/dataaccess/). The univariate and multivariate ROI data, and behavioural data, can be downloaded from the Open Science Framework, which includes Cam-CAN participant identifiers allowing the retrieval of any additional demographic data (https://osf.io/v7kmh), while the analysis code is available on GitHub.”

      Point 2: For the whole-brain analysis in which the ROIs were derived, the authors used threshold-free cluster enhancement (TFCE; Smith & Nichols 2009). The methodological paper cited suggests that individuals' TFCE image should still be corrected for multiple comparisons using the following: "to correct for multiple comparisons, one [...] has to build up the null distribution (across permutations of the input data) of the maximum (across voxels) TFCE score, and then test the actual TFCE image against that. Once the 95th percentile in the null distribution is found then the TFCE image is simply thresholded at this level to give inference at the p < 0.05 (corrected) level." (Smith & Nichols, 2009). Although the authors mention that clusters were estimated using 2000 permutations, there is no mention of the TFCE image itself being thresholded. While this would impact the overall size of the ROIs used in the study, the remaining analyses are methodologically sound.

      We have updated the text to detail the t=1.97 (i.e., p = .05) threshold we applied before interpretation of the resultant TFCE images to the section: Experimental Design & Statistical Analysis. This threshold value can also be verified in the analytics code that is referenced on GitHub from the section Data Availability within the requisite toolbox functions: https://github.com/kamentsvetanov/CommonalityAnalysis/blob/main/code/ca_vba_tfce_threshold.m#L24 and https://github.com/kamentsvetanov/CommonalityAnalysis/blob/main/code/external/ca_matlab_tfce_transform.m

      Page-30

      “For whole-brain voxelwise analyses, clusters were estimated using threshold-free cluster enhancement (TFCE; Smith & Nichols 2009) with 2000 permutations and the resulting images were thresholded at a t-statistic of 1.97 before interpretation.”
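
      For reference, the max-statistic correction quoted from Smith & Nichols (2009) in Point 2 can be sketched as follows. This is a generic illustration, not the code in the linked repository: it assumes the per-permutation TFCE score maps have already been computed, represents each map as a flat list of voxel scores, and uses hypothetical function names and toy numbers.

```python
def max_stat_threshold(null_maps, alpha=0.05):
    """Return the (1 - alpha) quantile of the per-permutation maximum score.

    `null_maps` is an iterable of score maps (lists of floats), one per
    permutation of the input data.
    """
    maxima = sorted(max(scores) for scores in null_maps)
    # Position of the (1 - alpha) quantile, clamped to the last element.
    idx = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[idx]


def threshold_map(observed, threshold):
    """Zero out voxels whose observed score does not exceed the threshold."""
    return [score if score > threshold else 0.0 for score in observed]


# Toy null distribution: 100 permutations whose maxima are 1, 2, ..., 100.
toy_null = [[float(i), i / 2.0] for i in range(1, 101)]
cutoff = max_stat_threshold(toy_null)
surviving = threshold_map([10.0, 96.5, 200.0], cutoff)
```

      Because each permutation contributes only its maximum across voxels, the resulting cutoff controls the family-wise error rate over the whole map.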

      Point 3: The authors should consider moving the ROI section to results. The way the manuscript currently reads, the ROIs seem to be derived a priori as opposed to being derived from activation maps in the current study.

      After consideration of this point, we have decided to leave the methodological details regarding the definition of ROIs in the methods, to maintain the focus of the Results section. However, we have improved signposting in the results section to highlight that the ROIs were derived from the overlapped activation maps.

      Page-8

      “Crucially, two areas of the brain showed spatially-overlapping positive effects of age and performance, which is suggestive of an age-related compensatory response (Figure 2A yellow intersection). These were in bilateral cuneal cortex (Figure 2B magenta) and bilateral frontal cortex (Figure 2B brown), the latter incorporating parts of the middle frontal gyri and anterior cingulate. Therefore, based on traditional univariate analyses, these are two candidate regions for age-related functional compensation (Cabeza et al. 2013; 2018). Accordingly, we defined regions of interest within these two regions using the overlap activation maps (see section: ROIs) to be used for subsequent univariate and multivariate analysis.”

      Point 4: The manuscript can be strengthened by explaining why the authors chose a greedy search algorithm over a dynamic Bayesian model.

      The text has been updated to note that the computationally efficient greedy search implementation is appropriate given the size of the fMRI cohort dataset.

      Page-28

      “The pattern weights specifying the mapping of data features to the target variable are optimized with a greedy search algorithm using a standard variational scheme (Friston et al., 2007) which was particularly appropriate given the large dataset.”

      Reviewer #2:

      Point 1: However, it might have been nice to see an analysis of a more crystallised intelligence task included too, as a contrast since this is an area that does not demonstrate such a decline (and perhaps continues to improve over aging).

      We (Samu et al., 2017) have previously investigated, but failed to find, univariate evidence for functional compensation in this cohort’s performance on a sentence comprehension task that is more closely aligned to a measure of crystallised intelligence. Based on the additional previous studies where we have applied these types of univariate and multivariate criteria of functional compensation (Morcom & Henson, 2018; Knights et al., 2021), we have consistently observed that the uni-/multivariate effects are in the same direction. Therefore, we would not initially expect a different conclusion here, where the univariate and multivariate effects suggest different outcomes. Notably, the univariate analysis approach in Samu et al. (2017) did differ from focusing on the age x behaviour interaction term here, so it could still be worth future investigation, but it does seem less likely that evidence of compensation would be observed than for fluid intelligence. However, as the Reviewer suggests, such a task may make another good contrast to show evidence against the existence of functional compensation (as in Morcom & Henson, 2018; Knights et al., 2021).

      Point 2: Figure 1B: Consider adding coefficients describing relationships to plots.

      Annotations of the coefficients have been added to Figure 1B.

      Point 3: Figure 2C. The scale of the axis for RSFA-Scales cuneal cortex ROI activations should be the same as the other 3 plots.

      Figure axes are updated such that ROIs are on matching scales, according to whether data were RSFA-scaled or not.

      Point 4: Figure 2C. Adding in the age ranges for each of the three groups following the tertile split may be informative to the reader.

      The age group tertile definition used for Figure 2C visualisations is now added to the Figure description.

      Page-10

      “Figure 2. Univariate analysis. (A) Whole-brain effects of age and performance. Age (green) and performance (red) positively predicted unique aspects of increased task activation, with their spatial overlap (yellow) being overlaid on a template MNI brain, using p < 0.05 TFCE. (B) Intersection ROIs. A bilateral cuneal (magenta) and frontal cortex (brown) ROI were defined from voxels that showed a positive and unique effect of both age and performance (yellow map in Figure 2A). (C) ROI Activation. Activation (raw = left; RSFA-scaled = right) is plotted against behavioural performance based on a tertile split between three age groups (19-44, 45-63 & 64-87 years).”

      Reviewer #3:

      Point 1: [Public Review] 1) I don't quite follow the argumentation that compensatory recruitment would need to show via non-redundant information carried by any given non-MDN region (cf. p14). Wouldn't the fact that a non-MDN region carries task-related information be sufficient to infer that it is involved in the task and, if activated increasingly with increasing age, that its stronger recruitment reflects compensation, rather than inefficiency or dedifferentiation? Put differently, wouldn't "more of the same" in an additional region suffice to qualify as compensation, as compared to the "additional information in an additional region" requirement set by the authors? As a consequence, in my honest opinion, showing that decoding task difficulty from non-MDN ROIs works better with higher age would already count as evidence for compensation, rather than asking for age-related increases in decoding boosts obtained from adding such ROIs. It would be interesting to see whether the arguably redundant frontal ROI would satisfy this less demanding criterion. At any rate, it seems useful to show whether the difference in log evidence for the real vs. shuffled models is also related to age.

      We agree with the logic for conducting a weaker assessment of functional compensation whereby a brain region does not necessarily have to provide a unique contribution beyond that of the ordinarily activated task-relevant network. However, although non-unique recruitment is predicted by a compensation theory, it can also be explained by a nonspecific mechanism that recruits multiple regions in tandem. In contrast, unique additional recruitment is compatible with compensation but not with nonspecific recruitment. In this article, and those prior (Morcom & Henson, 2018; Knights et al. 2021), we have also deliberately avoided using the specific kind of analysis proposed (i.e., testing for an effect of age on differential log evidence) because these would involve applying statistical tests directly to the log evidence, a variable that is already a statistical test output.

      Nevertheless, temporarily putting these caveats aside, we did run the suggested test. Results from multiple regression showed that using log evidence from frontal cortex models still did not meet this less demanding criterion for functional compensation as there was an effect of age in the opposite direction to that expected by functional compensation: there was a significant negative effect of age (t(218) = -7.95, p < .001) indicating that as age increased, the difference in log evidence decreased. This effect is visualised below for transparency, but we preferred not to add this information to the article because we do not wish to encourage using this kind of analysis for the reason mentioned above. Thus, although our main multivariate test of interest is stringent, the additional step of mapping log evidence back to the boost-likelihood categories (e.g., boost vs. no difference to model performance) lends itself to the more appropriate logistic regression statistical approach.

      Author response image 1.

      Negative effect of age on MVB log evidence model outcomes for frontal cortex.

      A different approach that could be taken to assess a more lenient definition of functional compensation would be to analyse the effects of age on the spread of multivariate responses predicting task difficulty (i.e., standard deviation of fitted MVB voxel weights; also see Morcom & Henson, 2018; Knights et al., 2021) specifically from models that only include the candidate ‘compensation’ ROIs.

      Accordingly, these analyses and their discussion have been added to the article. To summarise, these analyses showed that (1) the frontal cortex still did not show evidence of functional compensation (i.e., a negative effect of age like in Morcom & Henson, 2018) and (2) no effect of age on the cuneal ROI, implying that the original model comparison approach (i.e., Figure 2C in the manuscript now) can provide more sensitivity for detecting evidence of functional compensation (perhaps because of the importance of including task-relevant network responses when building decoding models).

      Page-15

      “As a final analysis, we also tested a more lenient definition of functional compensation, whereby the multivariate contribution from the “compensation ROI” does not necessarily need to be above and beyond that of the task-relevant network (Morcom & Henson, 2018; Knights et al., 2021). To do this, we again assessed whether age was associated with an increase in the spread (standard deviation) of the weights over voxels, for smaller models containing only the cuneal or frontal ROI. This tested whether increased age led to more voxels carrying substantial information about task difficulty, a pattern predicted by functional compensation (but also consistent with non-specific additional recruitment). In this case, the results of this test did not support functional compensation, as there was no effect detected for the cuneal cortex and even a negative effect of age for the frontal cortex where the spread of the information across voxels was lower for older age (Figure 3C; Table 2).”

      Page-21

      “The age- and performance-related activation in our frontal region satisfied the traditional univariate criteria for functional compensation, but our multivariate (MVB) model comparison analysis showed that additional multivariate information beyond that in the MDN was absent in this region, which is inconsistent with the strongest definition of compensation. In fact, the results from the spread analysis showed that as age increased, this frontal area processed less, rather than more, multivariate information about the cognitive outcome (Figure 3C) as previously observed in two (memory) tasks for a comparable ROI within the same Cam-CAN cohort (Morcom & Henson, 2018).”

      Page-24

      “This said, univariate criteria for functional compensation will continue to play a role in hypothesis testing. For instance, the over-additive interaction observed in the cuneal cortex - where the increase in activity with better performance is more pronounced in older adults - offers stronger evidence of compensation compared to the simple additive effect of age and performance observed in the frontal cortex (Figure 2C). So far, the two studies that have combined these rigorous univariate, behavioral and multivariate approaches to assess functional compensation (i.e., Knights et al., 2021; the present study) have generally found converging evidence regardless of the method used. However, it is important to note that the MVB approach uniquely shifts the focus from individual differences to the specific task-related information that compensatory neural activations are assumed to carry and provides a specific test of region- (or network-) unique information. With further studies, it may also be that multivariate approaches prove more sensitive for detecting compensation effects than when using mean responses over voxels (e.g., Friston et al., 1995) particularly since over-additive effects are challenging to observe because compensatory effects are typically ‘partial’ and do not fully restore function (for review see Scheller et al., 2014; Morcom & Johnson, 2015). Within the multivariate analysis options themselves, it is also interesting to highlight that the stringent MVB boost likelihood analysis could detect functional compensation unlike the more lenient analysis focusing on the spread of MVB voxel weights. This suggests the importance of including task-relevant network responses when building decoding models to assess compensation.”

      Page-32

      “Alongside the MVB boost analysis, we also included an additional measure using the spread (standard deviation) of voxel classification weights (Morcom & Henson, 2018). This measure indexes the absolute amplitude of voxel contributions to the task, reflecting the degree to which multiple voxels carry substantial task-related information. When related to age this can serve as a multivariate index of information distribution, unlike univariate analyses. However, it is worth highlighting that even if an ROI shows an effect of age on this spread measure, such an effect could instead be explained by a non-specific mechanism that represents the same information in tandem across multiple regions (rather than reflecting compensation) as seen previously (Knights et al., 2021; also see Morcom & Johnson, 2015). Thus, it is the MVB boost analysis that is the most compelling assessment of functional compensation because it can directly detect novel information representation.”
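
      To make the spread measure concrete, the sketch below computes the standard deviation of a set of voxel weights for two hypothetical weight vectors. It is only an illustration: the weights are invented, the function name is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not specify which was used.

```python
from statistics import pstdev


def weight_spread(voxel_weights: list[float]) -> float:
    """Spread of fitted voxel weights, taken here as the population SD (assumption)."""
    return pstdev(voxel_weights)


# Hypothetical weights: few voxels contribute strongly vs. many voxels contribute strongly.
weights_low_amplitude = [0.10, -0.10, 0.05, -0.05, 0.00, 0.02]
weights_high_amplitude = [1.00, -1.20, 0.80, -0.90, 1.10, -0.70]

spread_low = weight_spread(weights_low_amplitude)
spread_high = weight_spread(weights_high_amplitude)
print(f"low-amplitude spread:  {spread_low:.3f}")
print(f"high-amplitude spread: {spread_high:.3f}")
```

      A larger spread indicates that more voxels carry weights far from zero. In the analysis described above, this per-participant value would then be related to age in a regression model.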

      Point 2: [Public Review] 2) Relatedly, does the observed boost in decoding by adding the cuneal ROI (in older adults) really reflect "additional, non-redundant" information carried by this ROI? Or could it be that this boost is just a statistical phenomenon that is obtained because the cuneus just happens to show a more clear-cut, less noisy difference in hard vs. easy task activation patterns than does the MDN (which itself may suffer from increased neural inefficiency in older age), and thus the cuneus improves decoding performance without containing additional (novel) pieces of information (but just more reliable ones)? If so, the compensation account could still be maintained by reference to the less demanding rationale for what constitutes compensation laid out above.

      We agree that this is a possibility and have added this as an additional explanation to the Discussion. We have also discussed why we think it is a less likely possibility, but do concede that it cannot be ruled out currently.

      Page-20

      “Another possibility is that the age-related increases in fMRI activations (for hard versus easy) in one or both of our ROIs do not reflect greater fMRI signal for hard problems in older than younger people, but rather lower fMRI signal for easy problems in the older. Without a third baseline condition, we cannot distinguish these two possibilities in our data. However, a reduced “baseline” level of fMRI signal (e.g., for easy problems) in older people is consistent with other studies showing an age-related decline in baseline perfusion levels, coupled with preserved capacity of cerebrovascular reactivity to meet metabolic demands of neuronal activity at higher cognitive load (Calautti et al., 2001; Jennings et al., 2005). Though age-related decline in baseline perfusion occurs in the cuneal cortex (Tsvetanov et al., 2021), the brain regions showing modulation of behaviourally-relevant Cattell fMRI activity by perfusion levels did not include the cuneal cortex (Wu et al., 2023). This suggests that the compensatory effects in the cuneus are unlikely to be explained by age-related hypo-perfusion, consistent with the minimal effect here of adjusting for RSFA (Figure 2C).

      One final possibility is whether the observed boost in decoding from adding the cuneal ROI simply reflects less noisy task-related information (i.e., a better signal-to-noise ratio (SNR)) than the MDN and, consequently, the boosted decoding is the result of more resilient patterns of information (rather than the representation of additional information) based on a steeper age-related decline of SNR in the MDN. Overall then, as none of the explanations above agree with all aspects of the results, to functionally explain the role of the cuneal cortex in this task would require further investigation.”

      Point 3: [Public Review] 3) On page 21, the authors state that "...traditional univariate criteria alone are not sufficient for identifying functional compensation." To me, this conclusion is quite bold as I'd think that this depends on the univariate criterion used. For instance, it could be argued that compensation should be more clearly indicated by an over-additive interaction as observed for the relationship of cuneal activity with age and performance (i.e., the activity increase with better performance becomes stronger with age), rather than by an additive effect of age and performance as observed for the prefrontal ROI (see Fig. 2C). In any case, I'd appreciate it if the authors discussed this issue and the relationship between univariate and multivariate results in more detail (e.g. how may differences in sensitivity between the two approaches have contributed), in particular since the sophisticated multivariate approach used here is not widely established in the field yet.

      We have now considered this point further in a section of the Discussion (which is merged with points 1 & 2 above) about the relevance and distinction of univariate / multivariate criteria for functional compensation. As described in text below, whilst we agree that univariate / behavioural approaches have a role in testing functional compensation, we still view the MVB boost analysis to be a particularly compelling approach for assessing this theory.

      Page-22

      “This said, univariate criteria for functional compensation will continue to play a role in hypothesis testing. For instance, the over-additive interaction observed in the cuneal cortex - where the increase in activity with better performance is more pronounced in older adults - offers evidence of compensation compared to the simple additive effect of age and performance observed in the frontal cortex (Figure 2C). However, the conclusions that can be drawn from age-related differences in cross-sectional associations of brain and behaviour are limited, mainly because individual performance differences are largely lifespan-stable (see Lindenberger et al., 2011; Morcom & Johnson, 2015). So far, the two studies that have combined these univariate-behavioral and multivariate approaches to assess functional compensation (i.e., Knights et al., 2021; the present study) have generally found converging evidence regardless of the method used. However, it is important to note that the MVB approach uniquely shifts the focus from individual differences to the specific task-related information that compensatory neural activations are assumed to carry. With further studies, it may also be that multivariate approaches prove more sensitive for detecting compensation effects than when using mean responses over voxels (e.g., Friston et al., 1995) particularly since over-additive effects are challenging to observe because compensatory effects are typically ‘partial’ and do not fully restore function. Within the multivariate analysis options themselves, it is also interesting to highlight that the stringent MVB boost likelihood analysis could detect functional compensation unlike the more lenient analysis focusing on the spread of MVB voxel weights. This suggests the importance of including task-relevant network responses when building decoding models to assess compensation.”

      Point 4: [Public Review] 4) As to the exclusion of poorly performing participants (see p24): If only based on the absolute number of errors, wouldn't you miss those who worked (overly) slowly but made few errors (possibly because of adjusting their speed-accuracy tradeoff)? Wouldn't it be reasonable to define a criterion based on the same performance measure (correct - incorrect) as used in the main behavioural analyses?

      This is a good point, though if we were to exclude participants using a chance level exclusion rate based on the formulae used for measuring behavioural performance, this removes identical subjects to those originally excluded. Based on this, the text has been updated to reflect this more parsimonious approach for defining exclusion criteria.

      Page-25

      “In a block design, participants completed eight 30-second blocks which contained a series of puzzles from one of two difficulty levels (i.e., four hard and four easy blocks completed in an alternating block order; Figure 1A). The fixed block time allowed participants to attempt as many trials as possible. Therefore, to balance speed and accuracy, behavioural performance was measured by subtracting the number of incorrect from correct trials and averaging over the hard and easy blocks independently (i.e., ((hard correct - hard incorrect) + (easy correct - easy incorrect))/2; Samu et al., 2017). For assessing reliability and validity, behavioural performance (total number of puzzles correct) was also collected from the same participants during a full version of the Cattell task (Scale 2 Form A) administered outside the scanner at Stage 2 of the Cam-CAN study (Shafto et al., 2014). Both the in- and out-of-scanner measures were z-scored. We excluded participants (N = 28; 17 females) who performed at chance level ((correct + incorrect) / incorrect < 0.5) on the fMRI task, leading to the same subset as reported in Samu et al. (2017).”
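
      The performance formula quoted above can be written out as a short worked example. This is a minimal sketch with invented trial counts; the function names are hypothetical, and the use of the sample standard deviation for z-scoring is an assumption, as the text does not state which was used.

```python
from statistics import mean, stdev


def performance_score(hard_correct: int, hard_incorrect: int,
                      easy_correct: int, easy_incorrect: int) -> float:
    """((hard correct - hard incorrect) + (easy correct - easy incorrect)) / 2."""
    hard = hard_correct - hard_incorrect
    easy = easy_correct - easy_incorrect
    return (hard + easy) / 2


def z_score(values: list[float]) -> list[float]:
    """Standardise scores across participants (sample SD is an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical counts for three participants.
scores = [
    performance_score(hard_correct=6, hard_incorrect=4, easy_correct=20, easy_incorrect=2),
    performance_score(hard_correct=3, hard_incorrect=5, easy_correct=12, easy_incorrect=2),
    performance_score(hard_correct=5, hard_incorrect=5, easy_correct=16, easy_incorrect=2),
]
z_scores = z_score(scores)
print(scores)
print(z_scores)
```

      Because incorrect trials are subtracted, a participant who attempts many trials quickly but inaccurately does not gain an advantage over one who works more slowly and accurately.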

      Point 5: [Public Review] 5) Did the authors consider testing for negative relationships between performance and brain activity, given that there is some literature arguing that neural efficiency (i.e. less activation) is the hallmark of high intelligence (i.e. high performance levels in the Cattell task)? If that were true, at least for some regions, the set of ROIs putatively carrying task-related information could be expanded beyond that examined here. If no such regions were found, it would provide some evidence bearing on the neural efficiency hypothesis.

      No, we did not test for negative relationships between performance and brain activity in this study. However, in Wu et al. (2023) we did specifically test for this and neither of the relevant results reported in section 3.3.1 (i.e., unique relationship between activity and performance) nor section 3.3.2 (i.e., age-related relationship between activity and performance) showed the queried direction of effects. Note that the negative effect in section 3.3.2 (Age U Performance) is a more unique suppression effect representing a positive relationship between performance and activity where this becomes stronger as age is added to the model.

      Point 6: [Recommendations for the authors] 1) Page 26: It is not quite clear how the authors made sure their age and performance covariates functioned as independent regressors in the univariate group-level GLM, given the correlation between age and performance (i.e. shared variance).

      We included age and performance as covariates (of the age x performance effect of interest) by simply including these as independent regressors in the group-level GLM design matrix in addition to the interaction term (i.e., activity ~ age*performance + covariates, which is equivalent to activity ~ age:performance + age + performance + covariates; Wilkinson & Rogers 1973 notation), allowing us to examine the unique variance explained by each predictor (Table 1 and Table 2) and to control for their shared variance.
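
      The expansion of age*performance into two main effects plus an interaction can be illustrated by building the corresponding design matrix by hand. The sketch below is not the analysis code used in the study. The data are invented, the function names are hypothetical, and forming the interaction as the product of the standardised predictors is an assumption.

```python
from statistics import mean, stdev


def standardise(values: list[float]) -> list[float]:
    """Centre to mean 0 and scale to unit sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def design_matrix(age: list[float], performance: list[float],
                  sex: list[int]) -> list[list[float]]:
    """Columns: intercept, age, performance, age:performance, sex (covariate)."""
    age_z = standardise(age)
    perf_z = standardise(performance)
    return [
        [1.0, a, p, a * p, float(s)]
        for a, p, s in zip(age_z, perf_z, sex)
    ]


# Hypothetical participants.
age = [25.0, 40.0, 55.0, 70.0]
performance = [12.0, 10.0, 9.0, 5.0]
sex = [0, 1, 0, 1]

X = design_matrix(age, performance, sex)
for row in X:
    print([round(value, 3) for value in row])
```

      Fitting activity against these columns yields one coefficient per predictor, each reflecting variance not shared with the other columns, which is the sense in which the main effects act as covariates of the interaction term.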

      We should note that while the GLM approach we used accounts for unique and shared effects, it does not explicitly report shared effects in its standard output. To directly examine shared variance, one would need to employ commonality analysis. For reference, results from a commonality analysis on this task have been previously reported in Wu et al. (2023).

      Prompted by this point, we have made some further minor improvements to help ensure our methodological steps are reproducible, as highlighted below.

      Page-30

      “Continuous age and behavioural performance variables were standardised and treated as linear predictors in multiple regression throughout the behavioural (Figure 1B), whole-brain voxelwise (Figure 1C/2A), univariate (Table 1; Figure 1B/2B) and MVB (Table 2; Figure 3) analyses. Throughout, sex was included as a covariate. The models, including interaction terms, can be described, according to Wilkinson & Rogers’ (1973) notation, as activity ~ age * performance + covariates (which is equivalent to activity ~ age:performance + age + performance + covariates), allowing us to examine the unique variance explained by each predictor (Table 1) and to control for their shared variance. For whole-brain voxelwise analyses, clusters were estimated using threshold-free cluster enhancement (TFCE; Smith & Nichols 2009) with 2000 permutations and the resulting images were thresholded at a t-statistic of 1.97 before interpretation. Bonferroni correction was applied to a standard alpha = 0.05 based on the two ROIs (cuneal and frontal) that were examined. For Bayes Factors, interpretation criteria norms were drawn from Jarosz & Wiley (2014).”

      Point 7: [Recommendations for the authors] 2) Figure 3: I suggest changing the subheading in panel B to "Joint vs. MDN-only Model," in line with the wording in the main text.

      The subheading of Figure 3B is updated as suggested to `Joint vs. MDN-only Model`.

      Point 8: [Recommendations for the authors] 3) In Figures 1C and 2A, MNI z coordinates should be added to the section views. The appreciation of Figure 2B could be enhanced by adding some rendering with a saggital (medial and/or lateral) view.

      The slice mosaics in Figure 1C and 2A are now updated with each slice’s MNI Z coordinates and mentioned in the figure descriptions.

      Point 9: [Recommendations for the authors] 4) Page 7 (l. 135): What exactly is meant by "lateral occipital temporal cortex"?

      The text is updated to specify the anatomical landmarks that were used for guidance when referring to activation within the lateral occipital temporal cortex, based on ROI criteria definitions used in Knights, Mansfield et al. (2021):

      Page-7 Line-135:

      “Additional activation was observed bilaterally in the inferior/ventral and lateral occipital temporal cortex (i.e., a cluster around the lateral occipital sulcus that extended anteriorly beyond the anterior occipital sulcus), likely due to the visual nature of the task.”

      Point 10: [Recommendations for the authors] 5) On p18ff. (ll. 259-318) the authors discuss in quite some detail how the age-related decoding boost seen with the cuneus ROI can be functionally explained, but it seems like none of the explanations agrees with all aspects of the results. While this is not a major problem for the paper, it may be advisable if this part of the discussion ends with a clearer statement that this issue is not fully solved yet and provides material for future research.

      A more direct sentence has been added to make it clear that future investigation will be needed to explain the role of the cuneal cortex here.

      Page-20 Line-322:

      “Another possibility is that the age-related increases in fMRI activations (for hard versus easy) in one or both of our ROIs do not reflect greater fMRI signal for hard problems in older than younger people, but rather lower fMRI signal for easy problems in the older. Without a third baseline condition, we cannot distinguish these two possibilities in our data. However, a reduced “baseline” level of fMRI signal (e.g., for easy problems) in older people is consistent with other studies showing an age-related decline in baseline perfusion levels, coupled with preserved capacity of cerebrovascular reactivity to meet metabolic demands of neuronal activity at higher cognitive load (Calautti et al., 2001; Jennings et al., 2005). Though age-related decline in baseline perfusion occurs in the cuneal cortex (Tsvetanov et al., 2021), the brain regions showing modulation of behaviourally-relevant Cattell fMRI activity by perfusion levels did not include the cuneal cortex (Wu et al., 2023). This suggests that the compensatory effects in the cuneus are unlikely to be explained by age-related hypo-perfusion, consistent with the minimal effect here of adjusting for RSFA (Figure 2C). Overall then, as none of the explanations above agree with all aspects of the results, to functionally explain the role of the cuneal cortex in this task will require further investigation.”

      Point 11: [Recommendations for the authors] 6) The threshold choice for Bayesian log evidence (> 3) should be motivated in some more detail, rather than just pointing to a book reference, as there is no established convention in the field, the choice may depend on the type of data and/or analysis, and a sizeable part of the readership may not be deeply familiar with the particular Bayesian approach used here.

      The text is updated to further clarify our motivation for using the log evidence difference > 3 criterion:

      Page-29

      “The outcome measure was the log evidence for each model (Morcom & Henson, 2018; Knights et al., 2021). To test whether activity from an ROI is compensatory, we used an ordinal boost measure (Morcom & Henson, 2018; Knights et al., 2021) to assess the contribution of that ROI for the decoding of task-relevant information (Figure 3B). Specifically, Bayesian model comparison assessed whether a model that contains activity patterns from a compensatory ROI and the MDN (i.e., a joint model) boosted the prediction of task-relevant information relative to a model containing the MDN only. The compensatory hypothesis predicts that the likelihood of a boost to model decoding will increase with older age. The dependent measure, for each participant, was a categorical recoding of the relative model evidence to indicate the outcome of the model comparison. The three possible outcomes were: a boost to model evidence for the joint vs. MDN-only model (difference in log evidence > 3), ambiguous evidence for the two models (difference in log evidence between -3 to 3), or a reduction in evidence for the joint vs. MDN-only model (difference in log evidence < -3). These values were selected because a log difference of three corresponds to a Bayes Factor of 20, which is generally considered strong evidence (Lee & Wagenmakers, 2014). Further, with uniform priors, this chosen criterion (difference in log evidence > 3) corresponds to a p-value of p<~.05 (since the natural logarithm of 20 is approximately three, as evidence for the alternative hypothesis).”
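
      The recoding of log evidence differences into the three ordinal outcomes, and the link between a log difference of 3 and a Bayes Factor of about 20, can be sketched as follows. This is an illustration only: the function names and example log evidence values are hypothetical and are not taken from the MVB toolbox.

```python
import math

LOG_EVIDENCE_THRESHOLD = 3.0  # criterion described in the text


def bayes_factor(log_evidence_difference: float) -> float:
    """Convert a difference in (natural) log evidence to a Bayes Factor."""
    return math.exp(log_evidence_difference)


def boost_category(log_ev_joint: float, log_ev_mdn_only: float,
                   threshold: float = LOG_EVIDENCE_THRESHOLD) -> str:
    """Recode joint vs. MDN-only model evidence into an ordinal outcome."""
    difference = log_ev_joint - log_ev_mdn_only
    if difference > threshold:
        return "boost"
    if difference < -threshold:
        return "reduction"
    return "ambiguous"


# Hypothetical log evidence values for three participants.
examples = [(-100.0, -104.5), (-100.0, -101.0), (-100.0, -96.0)]
categories = [boost_category(joint, mdn) for joint, mdn in examples]
print(categories)
print(f"Bayes Factor at a log difference of 3: {bayes_factor(3.0):.2f}")
```

      The resulting categories are ordinal, which is why they suit a logistic regression on age rather than a test applied directly to the log evidence values.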

      Point 12: [Recommendations for the authors] 7) Adding page numbers would be helpful.

      Page numbers have been added to the manuscript file – apologies for this oversight.

      References

      Green, E., Bennett, H., Brayne, C., & Matthews, F. E. (2018). Exploring patterns of response across the lifespan: The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study. BMC Public Health, 18, 1-7.

      Knights, E., Mansfield, C., Tonin, D., Saada, J., Smith, F. W., & Rossit, S. (2021). Hand-selective visual regions represent how to grasp 3D tools: brain decoding during real actions. Journal of Neuroscience, 41(24), 5263-5273.

      Samu, D., Campbell, K. L., Tsvetanov, K. A., Shafto, M. A., & Tyler, L. K. (2017). Preserved cognitive functions with age are determined by domain-dependent shifts in network responsivity. Nature Communications, 8(1), 14743.

      Shafto, M. A., Tyler, L. K., Dixon, M., Taylor, J. R., Rowe, J. B., Cusack, R., ... & Cam-CAN. (2014). The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurology, 14, 1-25.

      Wu, S., Tyler, L. K., Henson, R. N., Rowe, J. B., & Tsvetanov, K. A. (2023). Cerebral blood flow predicts multiple demand network activity and fluid intelligence across the adult lifespan. Neurobiology of Aging, 121, 1-14.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviewer):

      It is not clear from the analysis presented in the paper how persistent those environmentally induced changes are: do they remain with the bats until the end of their lives?

      Currently, the long-term effects of enrichment on the bats remain uncertain. Preliminary results suggest that these differences may persist throughout the bats’ lifetimes; however, further data analysis is ongoing to determine the extent of these effects. We now also address this in the Discussion of the manuscript.

      Reviewer #2 (Public Reviewer):

      (1) Assessing personality metrics and the indoor paradigm: While I applaud this effort and think the metrics used are justified, I see a few issues in the results as they are currently presented:

      (a) [Major] I am somewhat concerned that here, the foraging box paradigm is being used for two somewhat conflicting purposes: (1) assessing innate personality and (2) measuring changes in personality as a result of experience. If the indoor foraging task is indeed meant to measure and reflect both at the same time, then perhaps this can be made more explicit throughout the manuscript. In this circumstance, I think the authors could place more emphasis on the fact that the task, at later trials/measurements, begins to take on the character of a "composite" measure of personality and experience.

      Personality traits should generally be stable over time, but personality can also somewhat change with experience. We used the foraging box to assess individual personality, but we also examined the assumption that what we are measuring is a proxy of personality and hence is stable over time. We now clarify this in the manuscript. 

      (b) [Major] Although you only refer to results obtained in trials 1 and 2 when trying to estimate "innate personality" effects, I am a little worried that the paradigm used to measure personality, i.e. the stable components of behavior, is itself affected by other factors such as age (in the case of activity, Fig. 1C3, S1C1-2), the environment (see data re trial 3), and experience outdoors (see data re trials 4/5).

      We found that boldness was the most consistent trait, persisting between trials 1 and 5, i.e., 144 days apart on average. We therefore used boldness as the primary parameter for assessing the effects of personality on outdoor behavior. While we evaluated other traits for completeness, boldness was the only one that consistently met the criteria for personality, which is why we focused on it in our analyses. The other traits, which were not stable over time, could be used to assess the effects of experience on behavior.

      Ideally, a study that aims to disentangle the role of predisposition from early-life experience would have a metric for predisposition that is relatively unchanging for individuals, which can stand as a baseline against a separate metric that reflects behavioral differences accumulated as a result of experience.

      I would find it more convincing that the foraging box paradigm can be used to measure personality if it could be shown that young bats' behavior was consistent across retests in the box paradigm prior to any environmental exposure across many baseline trials (i.e. more than 2), and that these "initial settings" were constant for individuals. I think it would be important to show that personality is consistent across baseline trials 1 and 2. This could be done, for example, by reproducing the plots in Fig. 1C1-3 while plotting trial 1 against trial 2. (I would note here that if a significant, positive correlation were to be found (as I would expect) between the measures across trial 1 and 2, it is likely that we would see the "habituation effect" the authors refer to expressed as a steep positive slope on the correlation line (indicating that bold individuals on trial 1 are much bolder on trial 2).)

      We agree and thus used boldness, which was found to be stable over five trials (three of which were without external experience). We note that even if boldness, as we measured it, increased over time, the differences between individuals remained similar, which is what is expected from a personality trait measured several times in the same paradigm (after the animal acquires experience).

      (c) Related to the previous point, it was not clear to me why the data from trial 2 (the second baseline trial) was not presented in the main body of the paper, and only data from trial 1 was used as a baseline.

      We added a main figure showing the correlation between the two baseline trials.

      In the supplementary figure and table, you show that the bats tended to exhibit more boldness and exploratory behavior, but fewer actions, in trial 2 as compared with trial 1. You explain that this may be due to habituation to the experimental setup, however, the precise motivation for excluding data from trial 2 from the primary analyses is not stated. I would strongly encourage the authors to include a comparison of the data between the baseline trials in their primary analysis (see above), combine the information from these trials to form a composite baseline against which further analyses are performed, or further justify the exclusion of data as a baseline.

      We had no intention of excluding data from baseline 2. As we have shown several times before (e.g., Harten, 2021), bats’ boldness, as we measure it in the box experiment, increases over sessions performed close together in time. This means that boldness in trial 2 was higher than in trials 1 and 3, which made the data less suitable for a linear model. Moreover, our measurement of boldness is capped (with a maximum of 1), again making it less suitable for a linear model. However, following the reviewer’s question, we have now run all analyses with the data from trial 2 included. Not only did the results remain the same, but some of the models also fit better (based on the AIC criterion). We added this information to the revised manuscript.

      (2) Comparison of indoor behavioral measures and outdoor behavioral measures: Regarding the final point in the results, correlation between indoor personality on Trial 4 and outdoor foraging behavior: It is not entirely clear to me what is being tested (neither the details of the tests nor the data or a figure are plotted). Given some of the strong trends in the data - namely, (1) how strongly early environment seems to affect outdoor behavior, (2) how strongly outdoor experience affects boldness, measured on indoor behavior (Fig. 1D) - I am not convinced that there is no relationship, as is stated here, between indoor and outdoor behavior. If this conclusion is made purely on the basis of a p-value, I would suggest revisiting this analysis.

      We agree that the relationship between indoor personality measures and outdoor foraging behavior is of great interest and had expected to find some correspondence between the two. To test this, we conducted multiple GLM analyses using the different indoor behavioral traits as predictors of outdoor behaviors. These analyses did not reveal any significant correlations. We also performed a separate analysis using PC1 (derived from the indoor behavioral variables) as a predictor, and again found no significant associations with outdoor behavior.

      We were indeed surprised by this outcome. It is possible that the behavioral traits we assessed indoors (boldness, exploration, and activity) do not fully capture the dimensions of behavior that are most relevant to foraging in the wild. For example, traits such as neophobia or decision-making under risk, which we did not assess directly, may have had stronger predictive value for outdoor behavior. We now highlight this point more clearly in the Discussion and acknowledge the possibility that alternative or additional personality traits might have revealed meaningful relationships.

      (3) Use of statistics/points regarding the generalized linear models: While I think the implementation of the GLMM models is correct, I am not certain that the interpretation of the GLMM results is entirely correct for cases where multivariate regression has been performed (Tables 4s and S1, and possibly Table 3). (You do not present the exact equation you used for each model (this would be a helpful addition to the methods), therefore it is somewhat difficult to evaluate if the following critique properly applies, however...)

      The "estimate" for a fixed effect in a regression table gives the difference in the outcome variable for a 1 unit increase in the predictor variable (in the case of numeric predictors) or for each successive "level" or treatment (in the case of categorical variables), compared to the baseline, the intercept, which reflects the value of the outcome variable given by the combination of the first value/level of all predictors. Therefore, for example, in Table 4a - Time spend outside: the estimate for Bat sex: male indicates (I believe) the difference in time spent outside for an enriched male vs. an enriched female, not, as the authors seem to aim to explain, the effect of sex overall. Note that the interpretation of the first entry, Environmental condition: impoverished, is correct. I refer the authors to the section "Multiple treatments and interactions" on p. 11 of this guide to evaluating contrasts in G/LMMS: https://bbolker.github.io/mixedmodelsmisc/notes/contrasts.pdf

      We are not certain we fully understand the comment; however, if our understanding is correct, we respectfully disagree. A GLM analysis without interaction terms, as conducted in our study, functions as a multiple linear regression, wherein each factor's estimate reflects its individual effect on the dependent variable. For example, in the case of sex, it examines the effect of sex on the time spent outside independently of enrichment. An interaction term would be needed to test sex*enrichment. We have added the models’ formulas, and we hope this clarifies our approach.
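
      The distinction under discussion can be illustrated with a minimal sketch. It uses synthetic cell means (not the study's data) and an ordinary least-squares fit, i.e. the identity-link Gaussian special case of a GLM, which is an assumption about the model family. The helper `fit_ols` and the dummy variables `imp` and `male` are hypothetical names introduced only for this illustration. In the additive model, the sex estimate is the male-female difference adjusted for environment and applies equally at both environment levels; once a sex*environment interaction is added under treatment coding, the same coefficient describes the sex difference within the reference (enriched) level only.

```python
# Illustrative sketch with synthetic numbers (NOT the study's data).
# Hypothetical names: fit_ols, imp (1 = impoverished), male (1 = male).

def fit_ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

# Synthetic cell means for (imp, male); the sex difference is 2 among
# enriched bats and 4 among impoverished bats, so an interaction exists.
cell_means = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 7.0, (1, 1): 11.0}
rows = [(imp, male, mean + noise)
        for (imp, male), mean in cell_means.items()
        for noise in (-0.5, 0.5)]  # two observations per cell
y = [r[2] for r in rows]

# Additive model: y ~ 1 + imp + male
b_add = fit_ols([[1.0, imp, male] for imp, male, _ in rows], y)
# Model with interaction: y ~ 1 + imp + male + imp:male
b_int = fit_ols([[1.0, imp, male, imp * male] for imp, male, _ in rows], y)

print("additive    male estimate:", round(b_add[2], 6))
print("interaction male estimate:", round(b_int[2], 6))
print("interaction imp:male term:", round(b_int[3], 6))
```

      In this balanced toy design the additive sex estimate equals the average of the two within-environment differences, whereas the interaction model reports the enriched-only difference plus a separate interaction term.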

      Reviewer #1 (Recommendations for the authors):

      I would recommend the following:

      (1) As video tracking and behavioral analysis software is widespread, it would be great to see this applied to the bats' indoor behavior to answer questions such as: how do the bat's velocity, heading, or acceleration correlate with the behavioral measures of boldness, activity, or exploration? In the same vein, can one infer boldness, activity, or exploration from measured bat velocity or other parameters? I think this will further make the indoor behavior more quantitative.

      In a tent of the size used in our study, bats’ flight behavior tends to be highly stereotypical: they typically perch on the wall, take off, circle the tent (sometimes multiple times), and then either land or not, and enter or not. Flight velocity is largely determined by individual maneuverability and the physical constraints of the space; thus, precise tracking is unlikely to provide further insight into boldness. In contrast, decision-making behaviors (such as whether to land or enter) more accurately reflect personality traits, as we have shown previously (Harten et al., 2018). Moreover, accurate 3D tracking in such an environment is possible but difficult because of the many blind spots that result from the cameras being inside the 3D volume. Nonetheless, we quantified flight activity and assessed its correlation with the other behavioral axes. As it was highly correlated with general activity, we did not include it as an independent parameter in the main analysis. However, in response to the reviewer’s suggestion, we now present this analysis in the Supplementary Materials.

      (2) It is not clear whether the bats come from the same genetic background. They might, but this is not mentioned in the methods under the experimental subjects.

      We have shown in the past that there are no familial relations in a randomly caught sample of bats in the colony where we usually work (Harten et al., 2018). The bats were caught in three unrelated wild colonies. The text referring to the table was clarified in the revised manuscript.

      (3) It will be great to include the author's thoughts about mechanisms underlying those environmentally induced changes in behavior in the discussion section along with how this will affect the bats' social foraging abilities. Another question that comes to mind is whether growing up with a large number of bats constitute an enriched environment in itself.

      We agree that this could count as an enrichment, and we thus ensured similar group sizes in both groups for this reason. We clarify this in the revised manuscript. 

      We have elaborated on the underlying mechanisms in the discussion, focusing on how they contribute to behavioral changes.

      Reviewer #2 (Recommendations for the authors):

      (1) Outdoor foraging behavior

      If I understand correctly, the data you display in Fig. 3A is only from the 2nd to 3rd weeks of exploration, i.e. just before the first post-exploration trial.

      What does the data look like for the second outdoor exploration data, i.e. before the final trial?

      Is there a specific reason why these measures were only computed on the GPS data from the 3rd week outside? If so, can this sampling of the data be motivated or briefly addressed (in the methods and wherever else necessary)?

      To allow a comparison between individuals, we had to restrict ourselves to a period for which we had data from many individuals (some disappeared later on).

      Following the reviewer's suggestion, we added a supplementary figure that includes days 21-26.

      I would find it important and of great interest to see movement maps for more animals, as these give very rich information that is not entirely captured by the three proxies of outdoor activity.

      Are these four exemplary animals sampled from both seasons?

      Did you check to see if there were any overall differences in outdoor foraging behavior as a function of the season in which the bats were captured?

      Yes, the samples represent individuals from both tested years. This was clarified, and additional examples were included in a supplementary figure.

      Variable of time spent outdoors: You mention that you did not include the nights that the bat spent in the colony in these calculations. Did you also look to see if 'the number of nights when the bats left the colony' predicted the bat's earlier enrichment treatment? This could also be interesting to consider.

      In response to the reviewer’s comment, we conducted an additional analysis to test whether the proportion of nights each bat spent foraging outside the roost was predicted by its earlier environmental condition (enriched vs. impoverished). We also examined whether sex or age influenced this variable. This analysis showed no significant effect of environmental condition, sex, or age on the proportion of nights spent foraging outside the roost.

      [Following on point 3 in public review...]

      When wishing to discuss the effect/significance of predictors overall, it is common to present the modelling results as an analysis of variance table. See, for example, the two-way anova section (p. 182) in the book Practical Regression and ANOVA using R: https://cran.r-project.org/doc/contrib/Faraway-PRA.pdf

      I think the output of passing the model object to an "anova" yields the table that you may be looking for, where the variance accounted for by a predictor is given overall, and not just relative to the first level of all predictors. Naturally, this information can be used in combination with the information provided by the raw model output presented in the paper.

      I assume you have done this analysis in R, but am not sure, as the statistical software used is not mentioned. There are several packages in R that allow users to quickly plot the graphical interaction of the parameters they use in models, which aids in interpreting results. It would be good to check results of model fitting in this manner.

      Relatedly, I was unable to locate the data and code for this paper using the DOI provided. Neither searching the internet using the doi nor entering the doi on the Mendeley Data website returned the right results. I tried searching Mendeley Data using the senior author's last name, but the most recent entry does not appear to be from this paper. https://data.mendeley.com/datasets/fr48bmnhxj/1

      We thank the reviewer for the helpful comment. The analysis was indeed conducted in MATLAB, and this has now been clarified in the manuscript. We have also revised the result tables to improve clarity and included the exact formulas used for each model. Regarding the data availability, the reviewer is correct — the dataset had not yet been published at the time of submission. It is now available at the provided DOI link.

      Suggestions and questions for the present paper, grouped thematically:

      [Major] Expansion and development of results: I thought there were many interesting and suggestive points in this data that could be expanded upon. I mention some of these here. While the authors of course do not need to implement all of these suggestions, I think the paper would benefit from a more substantial presentation of this rich data set:

      (a) Individual differences as such are not emphasized in the paper so much, as the analyses, particularly those expressed as boxplots, are grouped. The scatter plots in Figure 1 give the richest insight into how individual behavior changes throughout the course of the experiment. I would advocate for the authors to show additional comparisons using such scatter plots (perhaps in the supplementary, if needed).

      We thank the reviewer and have added scatter plots to Figure 2.

      (b) In the second paragraph of the results, the authors introduce the concept of a pareto front and that of personality archetypes (lines 101-107). I found this very interesting, but these concepts were never reiterated upon later in the results or in the discussion. In fact, at many points, I found myself curious as to how the three indoor measures of personality might be combined to form a composite measure of personality (and likewise for outdoor measures). Have you tried to combine measures into a composite and tried to measure whether this composite metric provides any additional insight into these phenomena? For example, what if you mapped the starting position of each bat as a point in a three-dimensional space, given by the three personality measures, and then evaluated their trajectory through this space with measurements taken at later trials. Could innate personality be interpreted as the starting vector in this space (measured across the two baseline trials)? 

      Following the reviewer’s (justified) curiosity we ran a PCA analysis on the behavioral data from trials 1 and 5 and found that there is a significant correlation between the individual scores on PC1. This can be thought of as a measurement that takes both boldness and exploration into account (the weight of activity was very low). We added this information to the revised manuscript and also use this new behavioral parameter as a predisposition in the models (instead of exploration and activity). 
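
      As a rough illustration of this kind of analysis, the sketch below computes a first principal component from three standardized behavioral measures and correlates individual PC1 scores between two trials. All values are synthetic, and the procedure is an assumption (PCA fitted on the pooled, z-scored data of both trials; Pearson correlation of the scores); the exact steps used in the manuscript may differ. The function names are hypothetical.

```python
import math

# Illustrative sketch with synthetic values (NOT the study's data).
# Columns: boldness, exploration, activity. All names are hypothetical.

def zscore(rows):
    """Standardize each column to zero mean and unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(z):
    """Sample covariance matrix of already centered rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_pc(cov, iters=500):
    """Leading eigenvector of a covariance matrix by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v)))
             for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a)
                    * sum((y - mb) ** 2 for y in b))
    return num / den

# Synthetic individuals: a stable latent trait drives boldness and
# exploration in both trials; activity is unrelated to it.
n = 8
trial1, trial5 = [], []
for i in range(n):
    latent = i / (n - 1)
    wobble = ((i * 5) % n) / n  # deterministic pseudo-noise in [0, 1)
    trial1.append([latent, 0.8 * latent + 0.2 * wobble, wobble])
    trial5.append([latent + 0.1, 0.8 * latent + 0.1 + 0.1 * wobble,
                   1.0 - wobble])

z = zscore(trial1 + trial5)          # pool both trials before the PCA
pc1 = first_pc(covariance(z))
scores = [sum(w * x for w, x in zip(pc1, row)) for row in z]
r = pearson(scores[:n], scores[n:])  # trial 1 vs. trial 5, same individuals

print("PC1 loadings:", [round(w, 3) for w in pc1])
print("PC1 score correlation between trials:", round(r, 3))
```

      Because the sign of a principal component is arbitrary, only the between-trial correlation of the scores is interpretable here, not the sign of the loadings.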

      Could environmental exposure be quantified as a warping of the trajectory through this space? Finally, could outdoor experience also be incorporated to evaluate how an individual arrives at its final measurement of personality combined with experience (trial 5)?

      The paper currently tries to explain outdoors behavior given personality and not vice versa. While this is a very interesting suggestion, we feel that adding this analysis would make the premise of the paper less clear and since the paper is already somewhat complex, we prefer to leave this analysis for a future study. 

      Examining the 3D trajectories of the individuals through the personality space did not reveal any immediately clear pattern (triangles mark the first trial and colours depict the environmental treatment):

      Author response image 1.

      Related to this point: I think the strongest part of the paper is the result showing that bats exposed to enriched environments explore farther, more often, and over larger distances than bats that were raised in an impoverished environment.

      We completely agree and have tried to emphasize this further.

      (c) While these results of the outdoor GPS tracking are very clear, I wish that more information were extracted from the tracking data, which is incredibly rich and certainly can be used to derive many interesting parameters beyond those that the authors have shown here. Examples might include: distance travelled (as opposed to estimated km2 or farthest point), a metric of navigational ability (how much "dead reckoning" the animal engages in). I even wonder if the areas or landmarks visited by the enriched bats might be found to be more complex, challenging, or richer by some measure.

      This study was a first step, aiming to establish a connection between early exposure and outdoor foraging.

      We agree that many more analyses can be done and that those related to navigation capabilities are indeed missing. We are still collecting data on these bats and hope to present a more advanced analysis covering a time span of years.

      (d) Related to the above point: I find it very interesting that in 3 of the 4 bats for which you show exemplary movement data (Fig. 3, panels B and C), they appear to travel to the farthest distances and cover the most ground early on, and become more "conservative" in their flight paths on later evenings. This point is not explored in the discussion, nor related to earlier measurements.

      During the first months of exploration, bats will occasionally perform long exploratory flights in between bouts of shorter flights in which they return to nearby familiar trees. This behavior can be seen in more detail in Harten et al., Science, 2020. We are currently quantifying this more carefully for another study.

      (e) Finally, my points about the possible strength of a composite measure of the three personality metrics is related to my concern about one of the conclusions, which is that innate personality does not have an effect on outdoor foraging behavior. I think the manner in which this was tested statistically is likely to bias the results against finding such a result given that personality metrics are used to predict outdoor behaviors in an individual manner (6 models in total, each examining a single comparison of predisposition to outdoor behavior), while both indoor personality metrics (Fig 1B) and outdoor behaviors appear to be correlated with each other (Table 5).

      Are there other analyses you have performed that are not presented in the paper and that have led you to conclude that there is no relationship here?

      We agree with the reviewer that our findings do not exclude an effect of innate personality on foraging but only suggest no such effect for the parameters we measured. That said, we did expect to find an effect of boldness, because this parameter has been shown to differ strongly between groups (Harten et al., 2018) and to correlate with other behavioral parameters. We were therefore surprised to find no significant effects.

      Following the reviewer’s previous comment, we have now also tested another predisposition parameter, the PC1 score, and found that it did not explain foraging either.

      (f) Personality measured before and after early environmental exposure (related to point (a) above): I find it interesting that the positive correlation in boldness between baseline and post-enrichment or baseline and post-release suggests that the individuals that were the most bold remained bold (and likewise for less adventurous individuals). The correlation for activity, too, still suggests that more active individuals early in life are likely to remain very active after enrichment, even accounting for the fact that activity is confounded with age.

      Perhaps you could place some emphasis on the fact that the initial variation between individuals also appears to be relatively stable over repeated trials. You might also consider measuring this directly (population variance over successive trials; relationship of population variance on indoor measures vs. outdoor measures...)

      Yes, this is a main point of interest. We emphasize it further in the revised manuscript.

      (g) Effect of indoor behavior following early experience on outdoor behavior: You evaluate the effect of predisposition (measured on baseline trial 1) and environmental condition on measures of outdoor activity (Table 4). I wonder if you also tried using indoor behavioral measures measured on the post-enrichment trial 3 to predict outdoor foraging behavior.

      Assuming that these measures are in fact reflecting a combination of predisposition and accumulated experience, then measurements at this closer time point may tell you how the combination of innate traits and early acquired experience affect behavior in the wild.

      We appreciate the reviewer’s insightful suggestion to test whether indoor behavior from post-enrichment Trial 3, reflecting both innate traits and experience, predicts outdoor foraging behavior. We conducted this analysis, but found that the boldness in Trial 3 did not significantly predict any of the outdoor activity measures.

      (2) [Minor] Age/development: While the authors discuss the effect of their manipulations on behavioral measures, they do not much discuss the effect of age.

      I think it would be important to include at some point a mention of the developmental stages of Rousettus, giving labels to certain age ranges, e.g. pup, juvenile, adult, and to provide more context about the stages at which bats were tested in the discussion. Presently, age is only really mentioned as an explanation for declining activity levels, but I wonder if it might also have an influence on boldness.

      It would also be very elegant, for figures where age is given in days, to additionally label them with these stages.

      All bats were juveniles during the trials (approximately 4 to 8 months old), so they could not be divided into distinct age groups. To assess the effect of age, it was included as a predictor (in days) in the GLM analysis.

      (3) [Major] Effect of early experience and outdoor experience on the indoor task: In the paragraph on lines 278-285, you argue that the effect of seeing early-enriched bats exhibit more boldness in trial 5 was likely due to post-sampling bias...

      I tend to disagree with this conclusion. I actually find this result both interesting and intuitive - that bats that were exposed to an enriched environment and have had experience in the wild, show much bolder activity on a familiar indoor foraging test (i.e. outside experience has made the animals bolder than before) (Fig 1, lines 159-161, Fig. S1). I did not notice this possibility mentioned in the discussion of the results.

      I also do not fully understand this argument. Could you please explain further?

      We accept the reviewer's comment and have updated the manuscript (lines 336-346) to explain the two hypotheses more clearly and to argue that it is difficult to tell them apart with the current data.

      [Minor] You also say that "this difference... can be seen in Figure 2 when examining only the bats that had remained until the last trial (Figure 2A2)." Do you mean supplementary Figure S1 A2? In fact, I am entirely unclear on what data is plotted in the supplementary Figure S1 and what differentiates the two columns of figures and the two models presented in the supplementary table. Did you plot data similar to that in Figure 2, with only bats that were present for all trials, but not show this data?

      There was a mistake: what was previously referred to as 2A2 is actually S2 A2.

      On the right side—only among the individuals with GPS data—the change is already evident at Baseline 2, where only the bolder individuals remain. If you have suggestions for a better analysis approach, we would be happy to hear them.

      Minor points

      General points regarding figures:

      For Figures 2 and 3A1-3 (as well as Fig. S1): Authors must show the raw data points over the box plots. It is very difficult to interpret the data and conclusions without being able to see the true distribution.

      Done

      For all figures showing grouped individual data, please annotate all panels or sets of boxplots with the number of bats whose data entered into each, as it is a little difficult to keep track of the changing sample sizes across experimental stages.

      In response to the reviewer’s request, and to enhance transparency, we have added individual data points to all boxplots in the relevant figures. This allows visual estimation of sample sizes and makes it easier to track changes in sample size across experimental stages. While numerical annotations are not included on the figures, the exact number of bats contributing to each group is provided in the Methods section (Table 8), ensuring this information is readily accessible to readers.

      Unless I've missed the reason behind differences in axis labelling across the figures, it seems that trials are not always referred to consistently. E.g. Fig. 1 labels say "Trial 1 (baseline)" and fig. 2 labels say "Baseline 1 0 days." I'm not entirely sure if these correspond to exactly the same data. If so, perhaps the labels can be made uniform. I think the descriptive ones (Baseline 1, Postenrichment...) may be more helpful to the reader than providing the trial number (Trial 1, etc....).

      Done

      Figure 1:

      Very good Fig. 1A and 1B.

      For panels C1-3 & D, I think it would make it easier for the reader if the personality measure labels were placed at the top of each panel, e.g. "Boldness (entrance proportion)". The double axis labels are not only harder to read, they are also redundant, as the personality measure label repeats on both axes.

      Done

      Panel C1: For the first panel in this sequence, I think it would be elegant to include an annotation in the figure that indicates what the datapoints lying on either side of the dashed line means, i.e. "bolder after enrichment treatment" in the upper left corner, and "bolder before enrichment treatment" in the bottom right corner.

      Panel C2: It appears as though many of the data points in this panel overlap, and it appears to me that the blue data points in particular are overlaid by the orange ones. I am guessing this happens because proportion values based on entrances to only 6 boxes end up giving a more "discrete" looking distribution. I wonder if you can find a way to allow all the data to be visible by, e.g., jittering the data slightly; if there is rounding being done to the proportions, perhaps don't round them so that minute differences will allow them to escape the overlap; or possibly split the panel by enrichment treatment.

      Caption for C1-3: it may be helpful to mention the correlation line color scheme: "enriched (blue lines), the impoverished (orange lines)". The caption also says positive correlations were found for "both environments together," but this correlation line is not shown. Perhaps mention "(not shown)" or show line. Please rephrase the sentence "Dashed line represents the Y=X line." for more transparency and clarity. I understand you mean an "equality" or "unity" line, but perhaps you can explicitly state the information that this line provides, something like e.g. "Dashed line indicates equal values measured on both trials."

      We added the line as a reference, and the caption was corrected.

      Figure 3:

      Panels B1-C2: I would suggest giving these panels supertitles that indicate that B panels are enriched, C panels are impoverished, and that each panel is data from a different individual.

      The legend was revised to describe the figure more clearly.

      General points regarding tables:

      Please revisit tables for formatting and typos, particularly in Table 4. Please also revise table captions for clarity. E.g. "first exploration as predisposition" to "Exploration (Baseline 1)" or similar

      Done

      Supplementary Tables and Figure: these are missing captions and explanations.

      The missing parts were added and corrected.

      Points of clarification/style:

      It would seem to me more logical to present the results shown in Table 3 before those in Table 2, given that the primary in-lab manipulation is discussed with relation to Table 3, and the analysis in Table 2 is discussed rather as a limitation (though I believe this result can be expanded upon further, see above).

      For the activity metric, I would suggest showing this data as actions/hour instead of actions/minute. I think it is much more intuitive to consider, for example, that a bat makes 2 actions every hour, than that it makes 0.002 actions per minute.

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors present a model for multisensory correlation detection that is based on the neurobiologically plausible Hassenstein Reichardt detector. It modifies their previously reported model (Parise & Ernst, 2016) in two ways: a bandpass (rather than lowpass) filter is initially applied and the filtered signals are then squared. The study shows that this model can account for synchrony judgement, temporal order judgement, etc in two new data sets (acquired in this study) and a range of previous data sets.

      Strengths:

      (1) The model goes beyond descriptive models such as cumulative Gaussians for TOJ and differences in cumulative Gaussians for SJ tasks by providing a mechanism that builds on the neurobiologically plausible Hassenstein-Reichardt detector.

      (2) This modified model can account for results from two new experiments that focus on the detection of correlated transients and frequency doubling. The model also accounts for several behavioural results from experiments including stochastic sequences of A/V events and sine wave modulations.

      Additional thoughts:

      (1) The model introduces two changes: bandpass filtering and squaring of the inputs. The authors emphasize that these changes allow the model to focus selectively on transient rather than sustained channels. But shouldn't the two changes be introduced separately? Transients may also be detected for signed signals.

      We updated the original model because our new psychophysical evidence demonstrates the fundamental role of unsigned transient for multisensory perception. While the original model received input from sustained unimodal channels (low-pass filters), the new version receives input from unsigned unimodal transient channels. Transient channels are normally modelled through bandpass filters (to remove the DC and high-frequency signal components) and squaring (to remove the sign). While these may appear as two separate changes in the model, they are, in fact, a single one: the substitution of sustained with unsigned transient channels (for a similar approach, see Stigliani et al. 2017, PNAS). Either change alone would not be sufficient to implement a transient channel that accounts for the present results.

      That said, we were also concerned with introducing too many changes in the model at once. Indeed, we simply modelled the unimodal transient channels as a single band-pass filter followed by squaring. This is already a stripped-down version of the unsigned transient detectors proposed by Adelson and Bergen in their classic Motion Energy model. The Adelson and Bergen model consisted of two biphasic temporal filters 90 degrees out of phase (i.e., quadrature filters), whose output is later combined. While a simpler implementation of the transient channels was sufficient in the present study, the full model may be necessary for other classes of stimuli (including speech, Parise, 2024, bioRxiv). Therefore, for completeness, we now include in the Supplementary Information a formal description of the full model, and validate it by simulating our two novel psychophysical studies. See Supplementary Information “The quadrature MCD model” section and Supplementary Figure S8.
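
      The band-pass-then-square idea described above can be illustrated with a minimal sketch. It assumes a band-pass filter built as the difference of two exponential low-pass kernels; the kernel shape, the time constants (`tau_fast`, `tau_slow`), and the function names are illustrative assumptions, not the filters or parameters fitted in the manuscript.

```python
import numpy as np

def lowpass_kernel(tau, dt, duration=1.0):
    """Exponential low-pass impulse response, normalised to unit sum (assumed shape)."""
    t = np.arange(0.0, duration, dt)
    k = np.exp(-t / tau)
    return k / k.sum()

def unsigned_transient(signal, dt=0.001, tau_fast=0.02, tau_slow=0.1):
    """Band-pass filter (difference of two low-pass filters), then square."""
    fast = np.convolve(signal, lowpass_kernel(tau_fast, dt))[: len(signal)]
    slow = np.convolve(signal, lowpass_kernel(tau_slow, dt))[: len(signal)]
    bandpassed = fast - slow   # sustained (DC) component cancels out
    return bandpassed ** 2     # squaring discards the sign of the change

dt = 0.001
t = np.arange(0.0, 3.0, dt)
onset = (t >= 1.0).astype(float)   # intensity steps up at 1 s
offset = 1.0 - onset               # intensity steps down at 1 s
resp_on = unsigned_transient(onset, dt)
resp_off = unsigned_transient(offset, dt)
```

      In this sketch, a step onset and a step offset produce the same response from 1 s onward (the offset signal also shows a start-up transient at time zero, because the input is implicitly zero before the recording starts). This is the sense in which such a channel is unsigned.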

      (2) Because the model is applied only to rather simple artificial signals, it remains unclear to what extent it can account for AV correlation detection for naturalistic signals. In particular, speech appears to rely on correlation detection of signed signals. Can this modified model account for SJ or TOJ judgments for naturalistic signals?

      It can. In a recent series of studies we have demonstrated that a population of spatially-tuned MCD units can account for audiovisual correlation detection for naturalistic stimuli, including speech (e.g. the McGurk Illusion). Once again, unsigned transients were sufficient to replicate a variety of previous findings. We have now extended the discussion to cover this recent research: Parise, C. V. (2024). Spatiotemporal models for multisensory integration. bioRxiv, 2023-12.

      Even Nidiffer et al. (2018) which is explicitly modelled by the authors report a significant difference in performance for correlated and anti-correlated signals. This seems to disagree with the results of study 1 reported in the current paper and the model's predictions. How can these contradicting results be explained? If the brain detects correlation on signed and unsigned signals, is a more complex mechanism needed to arbitrate between those two?

      We believe the reviewer here refers to our Experiment 2 (where, like Nidiffer et al. (2018), we used periodic stimuli), not Experiment 1, which consists of step stimuli. We were also puzzled by the difference between our Experiment 2 and Nidiffer et al. (2018): we induced frequency doubling, Nidiffer did not. Based on quantitative simulations, we concluded that this difference could be attributed to the fact that while Nidiffer included on each trial an intensity ramp in their periodic audiovisual stimuli, we did not. As a result, when considering the ramp (unlike in Nidiffer’s analyses), all audiovisual signals used by Nidiffer were positively correlated (irrespective of frequency and phase offset), while our signals in Experiment 2 were sometimes correlated and other times not (depending on the phase offset). This important simulation is included in Supplementary Figure S7; we also have now updated the text to better highlight the role of the pedestal in determining the direction of the correlation.

      (3) The number of parameters seems quite comparable for the authors' model and descriptive models (e.g. PSF models). This is because time constants require refitting (at least for some experimental data sets) and the correlation values need to be passed through a response mode (i.e. probit function) to account for behavioural data. It remains unclear how the brain adjusts the time constants to different sensory signals.

      This is a deep question. For simplicity, here the temporal constants were fitted to the empirical psychometric functions. To avoid overfitting, whenever possible we fitted such parameters over some training datasets, while trying to predict others. However, in some cases, it was necessary to fit the temporal constants to specific datasets. This may suggest that the temporal tuning of those units is not crystalised to some pre-defined values, but is adjusted based on recent perceptual history (e.g., the sequence of trials and stimuli participants are exposed to during the various experiments).

      For transparency, here we show how varying the tuning of the temporal constants of the filters affects the goodness of fit of our new psychophysical experiments (Supplementary Figure S8). As can be readily appreciated, the relative temporal tuning of the unimodal transient detectors was critical, though their absolute values could vary over a range of about 15 to over 100 ms. The tuning of the low-pass filters of the correlation detector (not shown here) displayed much lower temporal sensitivity over a range between 0.1 s and over 1 s.

      This simulation shows the impact of temporal tuning in our simulations; however, the question remains as to how such a tuning gets selected in the first place. An appealing explanation relies on natural scene statistics: units are temporally tuned to the most common audiovisual stimuli. Although our current empirical evidence does not allow us to quantitatively address this question, in previous simulations (see Parise & Ernst, 2016, Supplementary Figure 8), by analogy with visual motion adaptation, we show how the temporal constants of our model can dynamically adjust and adapt to recent perceptual history. We hope these new and previous simulations address the question about the nature of the temporal tuning of the MCD units.

      (4) Fujisaki and Nishida (2005, 2006) proposed mechanisms for AV correlation detection based on the Hassenstein-Reichardt motion detector (though not formalized as a computational model).

      This is correct, Fujisaki and Nishida (2005, 2007) also hypothesized that AV synchrony could be detected using a mechanism analogous to motion detection. Interestingly, however, they ruled out such a hypothesis, as their “data do not support the existence of specialized low-level audio-visual synchrony detectors”. Yet, along with our previous work (Parise & Ernst, 2016, where we explicitly modelled the experiments of Fujisaki and Nishida), the present simulations quantitatively demonstrate that a low-level AV synchrony detector is instead sufficient to account for audiovisual synchrony perception and correlation detection. We now credit Fujisaki and Nishida in the modelling section for proposing that AV synchrony can be detected by a cross-correlator.

      Finally, we believe the reviewer is referring to the 2005 and 2007 studies of Fujisaki and Nishida (not 2006); here are the full references of the two articles we are referring to:

      Fujisaki, W., & Nishida, S. Y. (2005). Temporal frequency characteristics of synchrony–asynchrony discrimination of audio-visual signals. Experimental Brain Research, 166, 455-464.

      Fujisaki, W., & Nishida, S. Y. (2007). Feature-based processing of audio-visual synchrony perception revealed by random pulse trains. Vision Research, 47(8), 1075-1093.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-written manuscript that seeks to detail the performance of two human psychophysical experiments designed to look at the relative contributions of transient and sustained components of a multisensory (i.e., audiovisual) stimulus to their integration. The work is framed within the context of a model previously developed by the authors and is now somewhat revised to better incorporate the experimental findings. The major takeaway from the paper is that transient signals carry the vast majority of the information related to the integration of auditory and visual cues, and that the Multisensory Correlation Detector (MCD) model not only captures the results of the current study but is also highly effective in capturing the results of prior studies focused on temporal and causal judgments.

      Strengths:

      Overall the experimental design is sound and the analyses are well performed. The extension of the MCD model to better capture transients makes a great deal of sense in the current context, and it is very nice to see the model applied to a variety of previous studies.

      Weaknesses:

      My one major issue with the paper revolves around its significance. In the context of a temporal task(s), is it in any way surprising that the important information is carried by stimulus transients? Stated a bit differently, isn't all of the important information needed to solve the task embedded in the temporal dimension? I think the authors need to better address this issue to punch up the significance of their work.

      In hindsight, it may appear unsurprising that transient signals carry most information for audiovisual integration. Yet, somewhat unexpectedly, this has never been investigated using perhaps the most diagnostic psychophysical tools for perceived crossmodal timing, namely temporal order and simultaneity judgments, along with carefully designed experiments with quantitative predictions for the effect of either channel. The fact that the results conform to intuitive expectations further supports the value of the present work: empirically grounding what is intuitively expected. This offers solid psychophysical evidence that one can build on for future advancements. Importantly, developing a model that builds on our new results and uses the same parameters to predict a variety of classic experiments in the field further supports the current approach.

      If “significance” is intended as shaking previous intuitions or theories, then no: this is not a significant contribution. If instead, by significance we intend to build a solid empirical and theoretical ground for future work, then we believe this study is not significant, it is foundational. We hope that this work's significance is better captured in our discussion.

      On a side note, there is an intriguing factor around transient vs. sustained channels: what matters is the amount of change, not the absolute stimulus intensity. Previous studies, for example, have suggested a positive cross modal mapping between auditory loudness and visual lightness or brightness [Odegaard et al., 2004]. This study, conversely, challenges this view and demonstrates that what matters for multisensory integration in time is not the intensity of a stimulus, but changes thereof.

      In a more minor comment, I think there also needs to be a bit more effort into articulating the biological plausibility/potential instantiations of this sustained versus transient dichotomy. As written, the paper suggests that these are different "channels" in sensory systems, when in reality many neurons (and neural circuits) carry both on the same lines.

      The reviewer is right, in our original manuscript we glossed over this aspect. We have now expanded the introduction to discuss their anatomical basis. However, we are not assuming any strict dichotomy between transient and sustained channels; rather, our results and simulations demonstrate that transient information is sufficient to account for audiovisual temporal integration.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Related to point 2 of the public review, can the authors provide additional results showing that the model can also account for naturalistic signals and more complex stochastic signals?

      While working on this manuscript, we were also working in parallel on a project related to audiovisual integration of naturalistic signals. A pre-print is available online [Parise, 2024, bioRxiv], and the related study is now discussed in the conclusions.

      (2) As noted in the public review, Fujisaki and Nishida (2005, 2006) already proposed mechanisms for AV correlation detection based on the Hassenstein-Reichardt motion detector. Their work should be referenced and discussed.

      We have now acknowledged the contribution of Fujisaki and Nishida in the modelling section, when we first introduce the link between our model and the Hassenstein-Reichardt detectors.

      (3) Experimental parameters: Was the phase shift manipulated in blocks? If yes, what about temporal recalibration?

      To minimise the effect of temporal recalibration, the order of trials in our experiments was randomised. Nonetheless, we can directly assess potential short-term recalibration effects by plotting our psychophysical responses against both the current SOA, and that of the previous trials. The resulting (raw) psychometric surfaces below are averaged across observers (and conditions for Experiment 1). In all our experiments, responses are obviously dependent on the current SOA (x-axis). However, the SOA of the previous trials (y-axis) does not seem to meaningfully affect simultaneity and temporal order judgments. The psychometric curves above the heatmaps represent the average psychometric functions (marginalized over the SOA of the previous trial).
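
      The previous-trial analysis described above can be sketched as follows. This is a minimal illustration for a single observer, assuming binary responses (e.g. "synchronous" = 1) stored in presentation order; the function name and data layout are hypothetical and not taken from the authors' analysis code.

```python
import numpy as np

def psychometric_surface(soa, response):
    """Mean response for each (previous SOA, current SOA) pair.

    soa, response: 1-D sequences in presentation order (one observer).
    Returns (levels, surface), where surface[i, j] is the mean response on
    trials whose previous SOA was levels[i] and current SOA was levels[j].
    Cells without any trials are NaN.
    """
    soa = np.asarray(soa, dtype=float)
    response = np.asarray(response, dtype=float)
    levels = np.unique(soa)
    # The first trial has no predecessor, so it is dropped.
    prev, curr, resp = soa[:-1], soa[1:], response[1:]
    surface = np.full((levels.size, levels.size), np.nan)
    for i, p in enumerate(levels):
        for j, c in enumerate(levels):
            mask = (prev == p) & (curr == c)
            if mask.any():
                surface[i, j] = resp[mask].mean()
    return levels, surface

# Toy data: responses follow the current SOA only.
levels, surface = psychometric_surface(
    [-100, 100, -100, 100, 100, -100], [0, 1, 0, 1, 1, 0]
)
```

      If responses depend only on the current SOA, as in the toy data, the surface varies along its columns (current SOA) but not along its rows (previous SOA).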

      All in all, the present analyses demonstrate negligible temporal recalibration across trials, likely induced by a random sequence of lags or phase shifts. Therefore, when estimating the temporal constants of the model, it seems reasonable to ignore the potential effects of temporal recalibration. To avoid increasing the complexity of the present manuscript, we would prefer not to include the present analyses in the revised version.

      Author response image 1.

      Effect of previous trial. Psychometric surfaces for Experiments 1 and 2 plotted against the lag in the current vs. the previous trial. While psychophysical responses are strongly modulated by the lag in the current trial (horizontal axis), they are relatively unaffected by the lag in the previous trial (vertical axis).

      (4) The model predicts no differences for experiment 1 and this is what is empirically observed. Can the authors support these null results with Bayes factors?

      This is a good suggestion: we have now added a Bayesian repeated measures ANOVA to the analyses of Experiment 1. As expected, these analyses provide further, though mild, evidence in support of the null hypothesis (see Table S2). For completeness, the new Bayesian analyses are presented alongside the previous frequentist ones in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to consider the effects of phonotactics on the effectiveness of memory reactivation during sleep. They have created artificial words that are either typical or atypical and showed that reactivation improves memory for the latter but not the former.

      Comment 1:

      Strengths:

      This is an interesting design and a creative way of manipulating memory strength and typicality. In addition, the spectral analysis on both the wakefulness data and the sleep data is well done. The article is clearly written and provides a relevant and comprehensive review of the literature and of how the results contribute to it.

      We thank the reviewer for his/her positive evaluation of our manuscript. 

      Comment 2:

      Weaknesses:

      (1) Unlike most research involving artificial language or language in general, the task engaged in this manuscript did not require (or test) learning of meaning or translation. Instead, the artificial words were arbitrarily categorised and memory was tested for that categorisation. This somewhat limits the interpretation of the results as they pertain to language science, and qualifies comparisons with other language-related sleep studies that the manuscript builds on.

      We thank the reviewer for this comment. We agree that we did not test for meaning or translation but used a categorization task in which we trained subjects to discriminate artificial words according to their reward associations (rewarded vs. non-rewarded). Previous language studies (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967) used artificial words to investigate implicit learning of hidden grammar rules. Here, the language researchers studied generalization of the previously learned grammar knowledge by testing subjects’ ability to correctly categorize a novel set of artificial words into rule-congruent versus rule-incongruent words. These differences from our study design might limit the comparability between the results of previous language studies of artificial grammar learning and our findings. We now discuss this aspect as a limitation of our novel paradigm.

      We added the following sentences to the discussion on p.14, ll. 481-488:

      Based on our paradigm, we investigated categorization learning of artificial words according to their reward associations (rewarded vs. unrewarded) and did not study aspects of generalization learning of artificial grammar rules (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967). This difference might limit the comparability between these previous language-related studies and our findings. However, the usage of artificial words with distinct phonotactical properties provided a successful way to manipulate learning difficulty and to investigate the effect of word properties on TMR, whereas our reward categorization learning paradigm had the advantage of increasing the relevance of word learning through incentives.

      Comment 3:

      (2) The details of the behavioural task are hard to understand as described in the manuscript. Specifically, I wasn't able to understand when words were to be responded to with the left or right button. What were the instructions? Were half of the words randomly paired with left and half with right and then half of each rewarded and half unrewarded? Or was the task to know if a word was rewarded or not and right/left responses reflected the participants' guesses as to the reward (yes/no)? Please explain this fully in the methods, but also briefly in the caption to Figure 1 (e.g., panel C) and in the Results section.

      We thank the reviewer for this comment and added additional sentences into the document to provide additional explanations. We instructed the participants to respond to each word by left- and right-hand button presses, whereas one button means the word is rewarded and the other button means the word is unrewarded. The assignment of left- and right-hand button presses to their meanings (rewarded versus unrewarded) differed across subjects. In the beginning, they had to guess. Then over trial repetitions with feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words.        

      We added the following sentences to the results section on p.5, ll. 161-168: 

      As a two-alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by left- or right-hand button presses, where one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points). In the beginning, they had to guess. By three presentations of each word in randomized order and by feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words (Fig. 1c).

      We added the following sentences to the caption of Figure 1 on p.6, ll. 188-194:

      As a two-alternative forced-choice task, responses of left- and right-hand button presses were assigned to the rewarded and the unrewarded word category, respectively. The participants were instructed to respond to each word by left- or right-hand button presses, where one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points). d) Feedback matrix with the four answer types (hits: rewarded and correct; CR, correct rejections: unrewarded and correct; misses: rewarded and incorrect; FA, false alarms: unrewarded and incorrect) with regard to response and reward assignment of the word.

      We added the following sentences to the methods on p.19, ll. 687-692:  

      As a two-alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by left- or right-hand button presses, where one button means the word is rewarded (gain of money points) and the other button means the word is unrewarded (avoid the loss of money points).

      Comment 4:  

      (3) Relatedly, it is unclear how reward or lack thereof would translate cleanly into a categorisation of hits/misses/correct rejections/false alarms, as explained in the text and shown in Figure 1D. If the item was of the non-rewarded class and the participant got it correct, they avoided loss. Why would that be considered a correct rejection, as the text suggests? It is no less of a hit than the rewarded-correct, it's just the trial was set up in a way that limits gains. This seems to mix together signal detection nomenclature (in which reward is uniform and there are two options, one of which is correct and one isn't) and loss-aversion types of studies (in which reward is different for two types of stimuli, but for each type you can have H/M/CR/FA separably). Again, it might all stem from me not understanding the task, but at the very least this required extended explanations. Once the authors address this, they should also update Fig 1D. This complexity makes the results relatively hard to interpret and the merit of the manuscript hard to access. Unless there are strong hypotheses about reward's impact on memory (which, as far as I can see, are not at the core of the paper), there should be no difference in the manner in which the currently labelled "hits" and "CR" are deemed - both are correct memories. Treating them differently may have implications on the d', which is the main memory measure in the paper, and possibly on measures of decision bias that are used as well.

      We thank the reviewer for this comment giving us the opportunity to clarify. As explained in the previous comment, for our two alternative forced-choice task, we instructed the participants to press one button when they were thinking the presented word is rewarded and the other button, when they were thinking the word is unrewarded. Based on this instruction, we applied the signal detection theory (SDT), because the subjects had the task to detect when reward was present or to reject when reward was absent. Therefore, we considered correct responses of words of the rewarded category as hits and words of the unrewarded category as correct rejections (see Table below). However, the reviewer is correct because in addition to false alarms, we punished here the incorrect responses by subtraction of money points to control for alternative task strategies of the participants instead of reward association learning of words. We agree that further explanation/argumentation to introduce our nomenclature is necessary.  

      Author response table 1.

      We adjusted the results section on p.5, ll. 169-177:

      To obtain a measurement of discrimination memory with respect to the potential influence of the response bias, we applied the signal detection theory (Green and Swets, 1966). Because we instructed the participants to respond to each word by left- or right-hand button presses, such that one button means reward is present whereas the other button means reward is absent, we considered correct responses of words of the rewarded category as hits and words of the unrewarded category as correct rejections. Accordingly, we assigned the responses with regard to the reward associations of the words to the following four response types: hits (rewarded, correct); correct rejections (unrewarded, correct); misses (rewarded, incorrect); and false alarms (unrewarded, incorrect). Depending on their responses, subjects received money points (Fig. 1d).
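
      For readers unfamiliar with signal detection theory, the two measures referred to in this response (d' and the c-criterion) can be computed from the four response counts as in the minimal sketch below. The function name is hypothetical, and the log-linear correction for extreme rates is an assumption made for illustration; the manuscript excerpt does not state which correction, if any, was used.

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from the four response counts."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Log-linear correction (an assumption of this sketch): keeps both
    # rates strictly between 0 and 1 so that z() stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion

d_chance, c_chance = sdt_measures(20, 20, 20, 20)  # guessing
d_good, c_good = sdt_measures(30, 10, 10, 30)      # above chance, unbiased
```

      Under this convention, d' is zero at chance performance, and a positive criterion indicates a tendency to give the "unrewarded" response more often than the "rewarded" one.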

      Comment 5:

      (4) The study starts off with a sample size of N=39 but excludes 17 participants for some crucial analyses. This is a high number, and it's not entirely clear from the text whether exclusion criteria were pre-registered or decided upon before looking at the data. Having said that, some criteria seem very reasonable (e.g., excluding participants who were not fully exposed to words during sleep). It would still be helpful to see that the trend remains when including all participants who had sufficient exposure during sleep. Also, please carefully mention for each analysis what the N was.

      Our study was not pre-registered. Including all the subjects independent of low pre-sleep memory performance, but with respect to a decent number of reactivations (> 160 reactivations, every word at least 2 times), resulted in a new dataset with 15 and 13 participants of the high- and low-PP cueing condition, respectively. Here, statistical analyses no longer revealed a significant overnight change in memory performance in the high-PP cueing condition (Δ memory (d'): t(14) = 1.67, p = 0.12), whereas the increase of the bias in decision making towards risk avoidance still remained significant (Δ bias (c-criterion): t(14) = 3.36, p = 0.005).

      We modified and added the following sentences to the discussion on p.13, ll. 456-458:

      Our study has limitations due to a small sample size and between-subject comparisons. The criteria of data analyses were not pre-registered and the p-values of our behavior analyses were not corrected for multiple comparisons.

      Comment 6:             

      (5) Relatedly, the final N is low for a between-subjects study (N=11 per group). This is adequately mentioned as a limitation, but since it does qualify the results, it seemed important to mention it in the public review.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-488: 

      To control for potential confounders despite the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest for future research to replicate our study by conducting a within-subject design with cueing of subsets of previously learned low- and high-PP words providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 7:

      (6) The linguistic statistics used for establishing the artificial words are all based on American English, and are therefore in misalignment with the spoken language of the participants (which was German). The authors should address this limitation and discuss possible differences between the languages. Also, if the authors checked whether participants were fluent in English they should report these results and possibly consider them in their analyses. In all fairness, the behavioural effects presented in Figure 2A are convincing, providing a valuable manipulation test.

      We thank the reviewer for pointing to the misalignment between the German-speaking participants and the artificial words based on American English. Further, we did not assess the English language capability of the participants to control for it as a potential confounder, whereas comparative control analyses revealed no significant differences between the two cueing groups in pre-sleep memory performance (see Table S1).

      We now discuss these points as limitations on p.14, ll. 473-481:

      Further, we used artificial words based on American English in combination with German-speaking participants, whereas language differences of pronunciation and phoneme structures might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages are considered to belong to the same language family (Eberhard et al., 2019) and the phonological distance between English and German is quite short compared, for example, to Korean (Luef and Resnik, 2023). Thus, major common phonological characteristics across both languages are still preserved. In addition, our behavior analyses revealed robust word discrimination learning and distinct memory performance according to different levels of phonotactic probabilities, providing evidence of successful experimental manipulation.

      Comment 8:

      (7) With regard to the higher probability of nested spindles for the high- vs low-PP cueing conditions, the authors should try and explore whether what the results show is a general increase for spindles altogether (as has been reported in the past to be correlated with TMR benefit and sleep more generally) or a specific increase in nested spindles (with no significant change in the absolute numbers of post-cue spindles). In both cases, the results would be interesting, but differentiating the two is necessary in order to make the claim that nesting is what increased rather than spindle density altogether, regardless of the SW phase.

      We conducted additional analyses based on detected sleep spindles to address this question.

      We added the following section to the supplementary data on pp. 31-32, ll. 1007-1045:  

      After conducting a sleep spindle detection (frequency range of 12-16Hz, see methods for details), we compared the sleep spindle density between the TMR conditions of high- and low-PP, showing no significant difference (see Fig. S8a and Table S9). Next, we subdivided the detected sleep spindles into coupled and uncoupled sleep spindles with the previously detected slow waves (SW; analyses of Fig. 4). Sleep spindles were defined as coupled when their amplitude peak occurred during the SW up-state phase (0.3 to 0.8s time-locked to the SW troughs). A two-way mixed design ANOVA on the amplitude size of the sleep spindles with the cueing group as a between-subject factor (high-PP-cued vs. low-PP-cued) and SW-coupling as a within-subject factor (coupled vs. uncoupled) showed a significant interaction effect (cueing group × SW-coupling: F(1,20) = 4.51, p = 0.046, η2 = 0.18), a significant main effect of SW-coupling (F(1,20) = 85.02, p < 0.001, η2 = 0.81), and a trend of significance of the main effect of the cueing group (F(1,20) = 3.54, p = 0.08). Post-hoc unpaired t-tests revealed a significantly higher amplitude size of the coupled sleep spindles of the cueing group of high- compared to low-PP (t(20) = 2.13, p = 0.046, Cohen’s d = 0.91; Fig. S8b) and no significant group difference of the uncoupled sleep spindles (t(20) = 1.62, p = 0.12). An additional comparison of the number of coupled sleep spindles between the cueing groups revealed no significant difference (see Table S9).

      Here, we found that detected sleep spindles coupled to the SW up-state phase occurred with higher amplitude after TMR presentations of the high-PP words in comparison to the low-PP words, whereas the sleep spindle density and the number of sleep spindles coupled to the SW up-state phase did not differ between the cueing conditions.

      We added the following sentences to the methods on pp. 22-23, ll. 822-839:  

      Sleep spindle analyses 

      We detected fast sleep spindles by band-pass filtering (12-16Hz) the signal of the Pz electrode during the auditory cueing trials in the time windows of -2 to 8s according to stimulus onsets. The amplitude threshold was calculated individually for each subject as 1.25 standard deviations (SDs) from the mean. The beginning and end times of the sleep spindles were then defined as the points at which the amplitude fell below 0.75 SDs before and after the detected sleep spindle. Only sleep spindles with a duration of 0.5-3 s were included in subsequent analyses. 
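      The thresholding and duration criteria described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that band-pass filtering and amplitude-envelope extraction have already been done upstream, that "1.25 SDs from the mean" and "0.75 SDs" mean mean + 1.25 SD and mean + 0.75 SD of that envelope, and that the function name `detect_spindles` and its parameters are hypothetical.

```python
from statistics import mean, pstdev


def detect_spindles(envelope, fs, detect_sd=1.25, bound_sd=0.75,
                    min_dur=0.5, max_dur=3.0):
    """Return (start_s, end_s) tuples of candidate spindles.

    envelope : amplitude envelope of the already band-pass-filtered signal
               (assumption: filtering is done elsewhere)
    fs       : sampling rate in Hz
    """
    mu, sd = mean(envelope), pstdev(envelope)
    detect_thr = mu + detect_sd * sd   # assumed reading of "1.25 SDs from the mean"
    bound_thr = mu + bound_sd * sd     # assumed reading of "0.75 SDs"
    events = []
    i, n = 0, len(envelope)
    while i < n:
        if envelope[i] > detect_thr:
            # Walk outwards until the envelope falls below the boundary threshold.
            start = i
            while start > 0 and envelope[start - 1] >= bound_thr:
                start -= 1
            end = i
            while end < n - 1 and envelope[end + 1] >= bound_thr:
                end += 1
            duration = (end - start + 1) / fs
            # Keep only events lasting 0.5-3 s, as stated in the methods.
            if min_dur <= duration <= max_dur:
                events.append((start / fs, (end + 1) / fs))
            i = end + 1
        else:
            i += 1
    return events
```

      With a synthetic envelope sampled at 10 Hz that contains one 1 s burst and one 0.2 s burst, only the 1 s burst passes the duration criterion.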

      To compare the sleep spindle densities between the different cueing conditions of high- and low-PP, we computed the grand average sleep spindle density distribution in number per trial with a bin size of 0.5s from -0.5 to 6s time-locked to stimulus onset in each condition (see Fig. S8a and Table S9).     
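      The binning step can be sketched as follows, purely as an illustration. The excerpt does not state which time point of a spindle (onset or peak) is binned, so the sketch simply takes one time stamp per spindle relative to stimulus onset. The function name `spindle_density` and the input layout (one list of time stamps per trial) are assumptions.

```python
def spindle_density(spindle_times_per_trial, t_min=-0.5, t_max=6.0, bin_size=0.5):
    """Spindles per trial in consecutive bins of `bin_size` seconds.

    spindle_times_per_trial : list of lists; each inner list holds spindle
        time stamps (s, relative to stimulus onset) for one trial.
    Returns one value per bin covering the half-open range [t_min, t_max).
    """
    n_bins = round((t_max - t_min) / bin_size)
    counts = [0] * n_bins
    for trial in spindle_times_per_trial:
        for t in trial:
            idx = int((t - t_min) // bin_size)
            if 0 <= idx < n_bins:      # ignore events outside the window
                counts[idx] += 1
    n_trials = len(spindle_times_per_trial)
    return [c / n_trials for c in counts]
```

      With the default arguments this yields 13 bins; the per-condition curves could then be averaged across participants to obtain a grand average.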

      Based on the detected slow waves and sleep spindles, we defined coupling events as cases in which the positive amplitude peak of a detected sleep spindle occurred during the slow wave up-state phase, in a time window of 0.3 to 0.8s relative to the trough of a slow wave.
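      The coupling rule can be written compactly, as in the hypothetical helper below. It assumes that spindle peak times and slow-wave trough times are given in seconds on the same clock; the names `is_coupled` and `split_by_coupling` are illustrative only.

```python
def is_coupled(spindle_peak_s, sw_trough_times_s, window=(0.3, 0.8)):
    """True if the spindle peak falls 0.3-0.8 s after any slow-wave trough."""
    lo, hi = window
    return any(lo <= spindle_peak_s - t <= hi for t in sw_trough_times_s)


def split_by_coupling(spindle_peaks_s, sw_trough_times_s):
    """Partition spindle peak times into (coupled, uncoupled) lists."""
    coupled, uncoupled = [], []
    for peak in spindle_peaks_s:
        target = coupled if is_coupled(peak, sw_trough_times_s) else uncoupled
        target.append(peak)
    return coupled, uncoupled
```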

      We computed the averaged amplitude size of each detected sleep spindle by calculating the mean of the absolute amplitude values of all negative and positive peaks within a detected sleep spindle (see Fig. S8b).

      We added the following sentences to the results on p.10, ll. 338-343:  

      By conducting additional analyses based on detection of fast sleep spindles (12-16Hz; see methods), we confirmed that fast sleep spindles during the SW up-states (from 0.3 to 0.8s after the SW trough) occurred with significantly higher amplitude after the cueing presentation of high- compared to low-PP words, whereas the sleep spindle density and the number of sleep spindles coupled to the SW up-state did not differ between the cueing conditions (see Fig. S8 and Table S9).

      Reviewer #2 (Public Review):

      Summary:

      The work by Klaassen & Rasch investigates the influence of word learning difficulty on sleep-associated consolidation and reactivation. They elicited reactivation during sleep by applying targeted memory reactivation (TMR) and manipulated word learning difficulty by creating words more similar (easy) or more dissimilar (difficult) to our language. In one group of participants, they applied TMR of easy words and in another group of participants, they applied TMR of difficult words (between-subjects design). They showed that TMR leads to higher memory benefits in the easy compared to the difficult word group. On a neural level, they showed an increase in spindle power (in the up-state of an evoked response) when easy words were presented during sleep.

      Comment 9:

      Strengths:

      The authors investigate a research question relevant to the field, that is, which experiences are actually consolidated during sleep. To address this question, they developed an innovative task and manipulated difficulty in an elegant way.

      Overall, the paper is clearly structured, and results and methods are described in an understandable way. The analysis approach is solid.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 10:

      (1) Sample size

      For a between-subjects design, the sample size is too small (N = 22). The main finding (also found in the title "Difficulty in artificial word learning impacts targeted memory reactivation") is based on an independent samples t-test with 11 participants/group.

      The authors explicitly mention the small sample size and the between-subjects design as a limitation in their discussion. Nevertheless, making meaningful inferences based on studies with such a small sample size is difficult, if not impossible.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-473: 

      To control for potential confounders apart from the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest that future research replicate our study using a within-subject design with cueing of subsets of previously learned low- and high-PP words, providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 11:

      (2) Choice of task

      Though the task itself is innovative, there would have been tasks better suited to address the research question. The main disadvantage the task and the operationalisation of memory performance (d') have is that single-trial performance cannot be calculated. Consequently, choosing individual items for TMR is not possible.

      Additionally, TMR of low vs. high difficulty is conducted between subjects (and independently of pre-sleep memory performance) which is a consequence of the task design.

      The motivation for why this task has been used is missing in the paper.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase the motivation of the participants, as they could receive additional monetary compensation according to their learning and memory task performance. Furthermore, we designed the task with the overall possibility of translating it to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To consider the beneficial effect of reward related information on sleep dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words to gain and to avoid losses of money points.  

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors investigated the effects of targeted memory reactivation (TMR) during sleep on memory retention for artificial words with varying levels of phonotactical similarity to real words. The authors report that the high phonotactic probability (PP) words showed a more pronounced EEG alpha decrease during encoding and were more easily learned than the low PP words. Following TMR during sleep, participants who had been cued with the high PP TMR, remembered those words better than 0, whilst no such difference was found in the other conditions. Accordingly, the authors report higher EEG spindle band power during slow-wave up-states for the high PP as compared to low PP TMR trials. Overall, the authors conclude that artificial words that are easier to learn, benefit more from TMR than those which are difficult to learn.

      Comment 12 & 13:

      Strengths:

      (1) The authors have carefully designed the artificial stimuli to investigate the effectiveness of TMR on words that are easy to learn and difficult to learn due to their levels of similarity with prior word-sound knowledge. Their approach of varying the level of phonotactic probability enables them to have better control over phonotactical familiarity than in a natural language, and they are thus able to disentangle which properties of word learning contribute to TMR success.

      (2) The use of EEG during wakeful encoding and sleep TMR sheds new light on the neural correlates of high PP vs. low PP both during wakeful encoding and cue-induced retrieval during sleep.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 14:

      (1) The present analyses are based on a small sample and comparisons between participants. Considering that the TMR benefits are based on changes in memory categorization between participants, it could be argued that the individuals in the high PP group were more susceptible to TMR than those in the low PP group for reasons other than the phonotactic probabilities of the stimuli (e.g., these individuals might be more attentive to sounds in the environment during sleep). While the authors acknowledge the small sample size and between-subjects comparison as a limitation, a discussion of an alternative interpretation of the data is missing.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. We thank the reviewer for this helpful comment and now discuss these limitations in more detail by adding alternative explanations and further suggestions for future research to overcome these limitations.

      We added the following sentences to the discussion on p.14, ll. 465-473: 

      To control for potential confounders apart from the influence of difficulty in word learning on TMR, we compared parameters of sleep, the pre-sleep memory performance and the vigilance shortly before the post-sleep memory test, revealing no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors differed between the groups, such as the individual susceptibility to TMR. To rule out these alternative explanations based on individual factors, we suggest that future research replicate our study using a within-subject design with cueing of subsets of previously learned low- and high-PP words, providing all conditions within the same individuals as shown in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 15:

      (2) While the one-tailed comparison between the high PP condition and 0 is significant, the ANOVA comparing the four conditions (between subjects: cued/non-cued, within-subjects: high/low PP) does not show a significant effect. With a non-significant interaction, I would consider it statistically inappropriate to conduct post-hoc tests comparing the conditions against each other. Furthermore, it is unclear whether the p-values reported for the t-tests have been corrected for multiple comparisons. Thus, these findings should be interpreted with caution.

      We thank the reviewer for this comment, which gives us the opportunity to correct our analyses and clarify them with additional description. Indeed, we first investigated overnight changes in behavioral performance within the four conditions, conducting t-tests against 0 of Δ-values of d' and c-criterion. Whereas for all our statistical analyses the p-value was set at p < 0.05 for two-tailed testing, we did not correct the p-values of our behavior analyses for multiple comparisons. To subsequently investigate differences between conditions, we conducted additional ANOVAs. We agree with the reviewer that without significant results of the ANOVA, post-hoc analyses should not be conducted. Also taking into account the recommendation of reviewer 1, we now include post-hoc pairwise comparisons only when the interaction effect of the ANOVA revealed at least a trend of significance (p < 0.1).

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference to other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Further, we mentioned the lack of correction for multiple comparisons as a limitation of our results in the discussion on p.13, ll. 456-458:  

      The criteria of data analyses were not pre-registered and the p-values of our behavior analyses were not corrected for multiple comparisons.

      We added the following sentences to the methods p.23, ll. 842-849:

      To analyze overnight changes of sleep behavioral data within TMR conditions, we conducted at first dependent sample t-tests against 0 of Δ-values (post-sleep test minus pre-sleep test) of d' and c-criterion (see Fig. 3). Two-way mixed design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend of significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons by independent and dependent sample t-tests. For all behavior statistical analyses, the p-value was set at p < 0.05 for two-tailed testing. A p-value < 0.1 and > 0.05 was reported as a trend of significance.
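      For readers less familiar with signal detection measures, the Δ-values can be illustrated with the standard textbook definitions d' = z(H) − z(FA) and c = −0.5 · (z(H) + z(FA)). The excerpt does not state how hit and false-alarm rates were defined or corrected, so the sketch below assumes rates strictly between 0 and 1; all function names are hypothetical.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (z-transform of a proportion).
_z = NormalDist().inv_cdf


def d_prime(hit_rate, fa_rate):
    """Sensitivity index; rates must lie strictly between 0 and 1."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Response bias c; negative values indicate a liberal bias."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))


def overnight_change(pre, post):
    """Delta-value as described in the text: post-sleep minus pre-sleep."""
    return post - pre
```

      A positive Δ d' then indicates better discrimination after sleep than before; the Δ-values per participant would be the input to the t-tests against 0 and the mixed ANOVAs described above.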

      Comment 16:

      (3) With the assumption that the artificial words in the study have different levels of phonotactic similarity to prior word-sound knowledge, it was surprising to find that the phonotactic probabilities were calculated based on an American English lexicon whilst the participants were German speakers. While it may be the case that the between-language lexicons overlap, it would be reassuring to see some evidence of this, as the level of phonotactic probability is a key manipulation in the study.

      We thank the reviewer for pointing out the misalignment between the German-speaking participants and the artificial words based on American English. In line with this recommendation, we added a more detailed argument to the manuscript for our assumption that major common phonetic characteristics across both languages are still preserved.

      We now discuss these aspects on p.14, ll. 473-481:

      Further, we used artificial words based on American English in combination with German-speaking participants, whereas language differences in pronunciation and phoneme structures might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages are considered to belong to the same language family (Eberhard et al., 2019) and the phonological distance between English and German is quite short compared, for example, to Korean (Luef and Resnik, 2023). Thus, major common phonological characteristics across both languages are still preserved. In addition, our behavior analyses revealed robust word discrimination learning and distinct memory performance according to different levels of phonotactic probabilities, providing evidence of successful experimental manipulation.

      Comment 17:

      (4) Another manipulation in the study is that participants learn whether the words are linked to a monetary reward or not, however, the rationale for this manipulation is unclear. For instance, it is unclear whether the authors expect the reward to interact with the TMR effects.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase the motivation of the participants, as they could receive additional monetary compensation according to their learning and memory task performance. Furthermore, we designed the task with the overall possibility of translating it to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To consider the beneficial effect of reward related information on sleep dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words to gain and to avoid losses of money points.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comment 18:

      (1) Please clearly define all linguistics terms - and most importantly the term "phonotactics" - at first use.

      We thank the reviewer for this recommendation and we added the definition of phonotactics and further reduced the diversity of linguistic terms to improve readability. 

      We added the following sentences to the beginning of the introduction on p.3, ll. 72-76:

      One critical characteristic of similarity to pre-existing knowledge in auditory word processing is its speech sound (phoneme) pattern. In phonology as the field of language specific phoneme structures, phonotactics determines the constraints of word phoneme composition of a specific language.

      Comment 19:

      (2) Some critical details about the methods should be included in the Results section to make it comprehensible. For example, the crucial differences between G1-4 words should be addressed in the Results, not only in Figure 1.

      According to the recommendation, we added this information to the results section.  We added the following sentences to the results section on p.4, ll. 145-154:

      To study the impact of difficulty in word learning on TMR, we developed a novel learning paradigm. We formed four sets of artificial words (40 words per set; see Table S3 and S4) consisting of different sequences of two vowels and two consonants. Here, we subdivided the alphabet into two groups of consonants (C1: b, c, d, f, g, h, j, k, l, m; C2: n, p, q, r, s, t, v, w, x, z) and vowels (V1: a, e, i; V2: o, u, y). Four-letter words were created by selecting letters from the vowel and consonant groups according to four different sequences (G1: C1, V1, V2, C2; G2: C1, V1, C2, V2; G3: V1, C1, C2, V2; G4: V1, C1, V2, C2; Fig. 1a; see methods for further details). Comparison analyses between the sets revealed significant differences in phonotactic probability (PP; Fig. 1b; unpaired t-tests: G1 / G2 > G3 / G4, p < 0.005, values of Cohen’s d > 0.71).
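      The four letter sequences can be made concrete with a short sketch that enumerates every four-letter candidate for each pattern. It is illustrative only: all letters are treated as lowercase, and how the 40 words per set were chosen from these candidates is not described in this excerpt, so no selection step is shown. The names `PATTERNS` and `candidates` are hypothetical.

```python
from itertools import product

# Letter groups as listed in the text (all treated as lowercase).
GROUPS = {
    "C1": "bcdfghjklm",
    "C2": "npqrstvwxz",
    "V1": "aei",
    "V2": "ouy",
}

# Order of letter groups for each of the four word sets.
PATTERNS = {
    "G1": ("C1", "V1", "V2", "C2"),
    "G2": ("C1", "V1", "C2", "V2"),
    "G3": ("V1", "C1", "C2", "V2"),
    "G4": ("V1", "C1", "V2", "C2"),
}


def candidates(pattern_name):
    """All four-letter strings that follow the given group sequence."""
    slots = [GROUPS[g] for g in PATTERNS[pattern_name]]
    return ["".join(letters) for letters in product(*slots)]
```

      Each pattern yields 10 × 3 × 3 × 10 = 900 candidate strings, for example "baon" for G1 and "abno" for G3.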

      Comment 20:

      (3) Was scoring done both online and then verified offline? If so, please note that.

      We have now included this information.

      We adjusted the method section on p.21, ll. 765-769:   

      The sleep stages of NREM 1 to 3 (N1 to N3), wake, and REM sleep were scored offline and manually according to the criteria of the American Academy of Sleep Medicine (AASM) by visual inspection of the signals of the frontal, central, and occipital electrodes over 30s epochs (Iber et al., 2007). Based on offline scoring, we confirmed TMR exposure during N2 and N3 and no significant differences (p-values > 0.05) of sleep parameters between the cueing groups (see Table S2).  

      Comment 21:

      (4) In Figure 2, please arrange the panel letters in an easier-to-read way (e.g., label upper right panel b with a different letter).

      We have now rearranged the panel letters according to the recommendation.

      We adjusted Figure 2 on p.8, ll. 242-258:     

      Comment 22:

      (5) In the first paragraph on TMR effects, please note which memory measure you are comparing (i.e., d').

      We added this information according to the recommendation.  

      We adjusted the sentence of the results on p.8, ll. 260-263:

      To examine whether TMR during sleep impacts memory consolidation of discrimination learning with respect to learning difficulty, we calculated the overnight changes by subtracting the pre- from the post-sleep memory performance based on d'-values of the reactivated sequences (cued) and non-reactivated sequences (uncued).

      Comment 23:

      (6) Please show the pre-sleep and post-sleep test scores for both word categories (not only the delta). It may be best to show this as another data point in Fig 2a, but it may be helpful to also see this split between cued and uncued.

      We added the pre-sleep and post-sleep test scores with the individual data points as an additional figure. 

      We added the following figure to the supplementary data on p.28, ll. 936-940:  

      Comment 24:

      (7) In the sentence "An additional two-way mixed design ANOVA on the same values with cueing as a between-subject factor (cued vs. uncued) ...", a more exact phrasing for the last parentheses would probably be "(high-PP-Cued vs Low-PP-Cued)". Both groups were cued.

      We thank the reviewer for pointing this out. According to the recommendation, we corrected the descriptions of the two-way mixed design ANOVAs. In addition, we detected an incorrect assignment of the conditions in the ANOVAs and corrected the reported values.

      We adjusted the sentences and corrected the values on p.9, ll. 271-275 and ll. 289-291: 

      An additional two-way mixed design ANOVA on the same values with the factor cueing (cued vs. uncued) as a within-subject factor and group as a between-subject factor revealed trends of significance (p < 0.1) for the interaction (cueing × group: F(1,20) = 3.47, p = 0.08) and the main effect of group (F(1,20) = 3.28, p = 0.09). The main effect of cueing was not significant (F(1,20) = 0.58, p = 0.46).

      An ANOVA on c-criterion changes showed no significant effects (interaction cueing × group: F(1,20) = 2.66, p = 0.12; main effect cueing: F(1,20) = 2.08, p = 0.17; main effect group: F(1,20) = 0.38, p = 0.55).

      Comment 25:

      (8) In the same ANOVA, please mention that there is a trend toward an interaction effect. If there wasn't one, the post-hoc comparison would be unwarranted. Please consider noting other p<0.1 p-values as a trend as well, for consistency.

      Following this recommendation, we now include post-hoc pairwise comparisons only after confirming at least a trend toward an interaction effect of these ANOVAs, and we consistently report a p-value < 0.1 and > 0.05 as a trend of significance.

      We added the following sentences to the methods p.23, ll. 844-849:

      Two-way mixed design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend of significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons by independent and dependent sample t-tests. For all behavior statistical analyses, the p-value was set at p < 0.05 for two-tailed testing. A p-value < 0.1 and > 0.05 was reported as a trend of significance.

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference to other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Comment 26:      

      (9) Please consider adding an analysis correlating spindle power with memory benefit across participants. Even if it is non-significant, it is important to report given that some studies have found such a relationship.

      According to this recommendation, we conducted additional correlation analyses.

      We added the following sentences to the manuscript into the results (pp. 10-11, ll. 346-349), the discussion (p.12, ll. 413-417), and the methods (p.23, ll. 864-867):   

      Whereas we found a significant group difference in spindle power nested during SW up-states, conducting further whole sample (n = 22) correlation analyses between the individual spindle power values of the significant cluster and the overnight changes of behavior measurements revealed no significant correlations (Δ d': r = 0.16, p = 0.48; Δ c-criterion: r = 0.19, p = 0.40).

      In addition to our result of the significant group difference, we failed to find significant correlations between SW nested spindle power values and overnight changes in behavior measurements, whereas previous studies reported associations of SW and spindle activities during sleep with the integration of new memories in pre-existing knowledge networks (Tamminen et al., 2013, 2010).

      By using the same extracted power values (0.3 to 0.8s; 11-14Hz; Pz, P3, P4, O2, P7) per subject, we performed whole sample (n = 22) Pearson correlation analyses between these power values and the overnight changes of behavior measurements of the cued condition (Δ d' and Δ c-criterion).
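      The correlation step itself is simple enough to sketch. The example below computes only the Pearson coefficient between two equally long lists (for instance, one power value and one Δ d' value per participant); the associated p-value requires the t-distribution and is not shown. The function name `pearson_r` is hypothetical and no study data are reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```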

      Reviewer #2 (Recommendations For The Authors):

      (1) Choice of task

      Comment 27:      

      In general, I find your task well-designed and novel. In light of your research question, however, I wonder why you chose this task. When you outlined the research question in the introduction, I expected a task similar to Schreiner et al. (2015). For example, participants have to associate high PP words with each other and low PP words. The advantage here would be that you could test the benefits of TMR in a within-subjects design (for example, cueing half of the remembered high and half of the remembered low PP words).

      Please see our previous response at comment 14.    

      Comment 28:

      Why did you decide to introduce a reward manipulation?

      Please see our previous response at comment 11.    

      Comment 29:

      Why did you do the cueing on a category level (cueing all high PP or all low PP words instead of single word cueing or instead of cueing 20 reward high-PP, 20 unrewarded high-PP plus 20 reward low-PP and 20 unrewarded low-PP)? Both alternatives would have provided you the option to run your statistics within participants.

      Please see our previous response at comment 14.    

      Comment 30:

      (2) Between-subjects design and small sample size.

      Why did you decide on a between-subjects design that severely reduces your power?

      Why did you just collect 22 participants with such a design? Were there any reasons for this small sample size? Honestly, I think publishing a TMR study with healthy participants and such a small sample size (11 participants for some comparisons) is not advisable.

      Please see our previous response at comment 14.

      Comment 31:

      (3) Encoding performance.

      Is d' significantly above 0 in the first repetition round? I would assume that the distinction between rewarded and non-rewarded words is just possible after the first round of feedback.

      Indeed, conducting t-tests against 0 revealed significantly increased d'-values in the first repetition round (2nd presentation) in both PP conditions (high-PP: 0.85 ± 0.09, t(32) = 9.17, p < 0.001; low-PP: 0.62 ± 0.09, t(32) = 6.83, p < 0.001).  

      Comment 32:

      (4) Encoding response options

      If you want to you could make it more explicit what exactly the response options are. I assume that one button means a word has a high reward and the other button means a word has a low reward. Making it explicit increases the understanding of the results section.

      Please see our previous response at comment 3.

      Comment 33:           

      (5) Alpha desynchronisation.

      Relative change

      Why did you subtract alpha power during the 1st presentation from alpha power during 2nd and 3rd presentation? You baseline-corrected already and individually included the 1st, 2nd, and 3rd repetition in your behavioural analysis.

      Based on this analysis, we aimed to examine the relative change in alpha power between PP-conditions of memory-relevant word repetitions. Therefore, to extract memory relevant changes of EEG activities, the first word presentation of naive stimulus processing could serve as a more representative baseline condition covering the time-window of interest of 0.7 to 1.9 s after the stimulus onset compared to a baseline condition before stimulus onset (-1 to -0.1s). 

      To explain the rationale of the analyses with the baseline condition more clearly, we added this information to the results section on p.7, ll. 222-226:

      We obtained the changes in power values by subtracting the first from the second and third presentation for the high- and low-PP condition, respectively. Here, the first word presentation of naive stimulus processing served us with a more representative baseline condition covering the time-window of interest of 0.7 to 1.9 s after the stimulus onset to examine relevant changes of encoding.  

      Comment 34:

      (6) Alpha desynchronisation as a neural correlate of encoding depth & difficulty?

      "In addition to the behavior results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth."

      Given that the low-PP words are more difficult to learn, I was expecting to see higher alpha desynchronisation in the low-PP relative to the high-PP words. Could you outline in a bit more detail how your findings fit into the literature (e.g., Simon Hanslmayr did a lot of work on this)?

      I would also advise you to add citations e.g., after your sentence in the quote above ("as an assumed neural correlate of encoding depth").

      We thank the reviewer for the recommendation giving us the opportunity to discuss in more detail how our results relate to previous findings. 

      We added additional sentences to the discussion on p.13, ll. 441-455:    

      Additional studies linked alpha desynchronization to cognitive effort and cognitive load (Proskovec et al., 2019; Zhu et al., 2021). So, one could expect to observe higher alpha desynchronization in the more difficult to learn condition of low-PP compared to high-PP. On the other hand, numerous studies investigating oscillatory correlates of learning and memory showed that alpha desynchronization is associated with memory across different tasks, modalities and experimental phases of encoding and retrieval (Griffiths et al., 2016, 2021, 2019a, 2019b; Hanslmayr et al., 2009; Michelmann et al., 2016). Strikingly, Griffiths and colleagues (Griffiths et al., 2019a) revealed by simultaneous EEG-fMRI recordings a negative correlation between the occurrence of patterns of stimulus-specific information detected by fMRI and cortical alpha/beta suppression. Here, the authors suggested that a decrease of alpha/beta oscillations might represent the neuronal mechanism of unmasking the task-critical signal by simultaneous suppression of task-irrelevant neuronal activities to promote information processing. Following this interpretation, we assume that over the course of learning elevated memory processing of the easier to learn stimuli is associated with enhanced information processing and thus accompanied by higher cortical alpha desynchronization in comparison to the more difficult to learn stimuli.

      In addition, we added the mentioned quote on p.7, ll. 239-240:

      In addition to the behavior results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth (Griffiths et al., 2021; Hanslmayr et al., 2009).

      Comment 35:

      (7) Exclusion criterion.

      Why did you use a d' > 0.9 as a criterion for data inclusion?

      This criterion ensured that each included subject had at least in one PP-condition a d' > 1.05 of pre-sleep memory performance, which corresponds to a general accuracy rate of 70%. 

      Accordingly, we adjusted these sentences of the method section on p.19, ll. 677-680: 

      Data were excluded from subjects who did not reach the minimal learning performance of d' > 1.05 during the pre-sleep memory test in at least one of the two PP conditions, whereas this threshold value corresponds to accuracy rates of 70% (n = 5). In addition, we excluded one subject who showed a negative d' in one PP condition of the pre-sleep memory test (n = 1). 
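
      The correspondence between d' > 1.05 and an accuracy of 70% can be checked with a short calculation. The sketch below assumes an unbiased observer and equal numbers of target and non-target trials, so that 70% accuracy means a hit rate of 0.70 and a false alarm rate of 0.30; these assumptions and the function names are ours and are not stated in the response. It uses the standard signal detection definitions d' = z(H) - z(FA) and c = -0.5 (z(H) + z(FA)).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z-transform


def d_prime(hit_rate, fa_rate):
    """Sensitivity index: d' = z(hit rate) - z(false alarm rate)."""
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(fa_rate)


def criterion_c(hit_rate, fa_rate):
    """Response bias: c = -0.5 * (z(H) + z(FA)); c > 0 is a conservative bias."""
    return -0.5 * (_STD_NORMAL.inv_cdf(hit_rate) + _STD_NORMAL.inv_cdf(fa_rate))


# Assumed unbiased observer at 70% accuracy: H = 0.70, FA = 0.30.
threshold_d_prime = d_prime(0.70, 0.30)   # about 1.05
threshold_bias = criterion_c(0.70, 0.30)  # about 0 (no bias)
```

      Under these assumptions the threshold rounds to 1.05, which matches the value given in the revised methods text. Hit or false alarm rates of exactly 0 or 1 would need a correction before the z-transform, which this sketch does not include.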

      Comment 36:

      (8) Coherence of wording.

      When you talk about your dependent variable (d') you sometimes use sensitivity. I would stick to one term.

      We replaced the word sensitivity with d'.    

      (9) Criterion

      Comment 37:

      Why do you refer to a change in criterion (Figure 3b, axis labels) as a change in memory? Do you think the criterion says something about memory?

      We corrected the axis label of Figure 3b and deleted here the word memory.

      Comment 38:

      Additionally, why did you analyse the effect of TMR on the criterion? Do you expect the criterion to change due to sleep-dependent memory consolidation? This section would benefit from more explanation. Personally, I am very interested in your thoughts and your hypothesis (if you had one, if not that is also fine but then, make it explicit that it was an exploratory analysis).

      By conducting exploratory analyses of overnight changes of the c-criterion measurements, we aimed to examine the bias of decision-making to provide comprehensive data according to the framework of the signal detection theory. Regarding the previous literature showing mainly beneficial effects of sleep on learning and memory, we focused with our hypothesis on d' and explored additionally the c-criterion.

      Despite our task design with gains/hits of +10 money points and losses/FAs of -8 (instead of -10), the subjects showed already during the pre-sleep memory task significant biases towards loss avoidance in both PP conditions (t-tests against 0: high-PP: 0.44 ± 0.07, t(21) = 5.63, p < 0.001; low-PP: 0.47 ± 0.09, t(21) = 5.51, p < 0.001). As already reported in the preprint, we found an additional significant increase of c-criterion by TMR solely for the high-PP words (see Fig. 3b). Even by integrating subjects with poor pre-sleep memory performance (high-PP-cueing group: n = 15; low-PP-cueing group: n = 13), t-tests against 0 revealed a significant increase of the high-PP cueing condition (t(14) = 3.36, p = 0.005) and no significant overnight changes in the other conditions (high-PP uncued: t(12) = 1.39, p = 0.19; low-PP cued: t(12) = 1.47, p = 0.17; low-PP uncued: t(14) = -0.20, p = 0.84). These exploratory findings on c-criterion suggest potential applications of TMR to affect decision-making biases in combination with reward learning.      

      We revised the manuscript mentioning the exploratory character of the c-criterion analyses of the results on p.9, ll. 282-283 and of the discussion on p.12, ll. 400-402:  

      We examined next as an exploratory analysis whether TMR conditions influence biases in decision-making.

      By conducting an additional exploratory analysis, we observed a significant change of the decision bias in the cueing condition of the easy to learn words and no overnight changes in the other conditions.

      Comment 39:

      (10) You detected SWs in the time range of 0-6 sec post sound stimulation. How was the distribution of all detected SW down-states in this time range? (You could plot a histogram for this.)

      We have now illustrated the detected SWs in the time range of 0 to 6 s after stimulus onset.

      We added a histogram to the supplementary section on p.30, ll. 982-986:  

      Reviewer #3 (Recommendations For The Authors):

      Comment 40:

      (1) In line with the weakness outlined above, I would recommend including a discussion of how the between-subject comparison and small sample size could affect the results and provide alternative interpretations.

      Please see our previous response at comment 14.

      Comment 41:

      (2) Regarding my point about statistical comparisons, I would recommend that the authors follow best practice guidelines for post-hoc tests and multiple comparisons. In Figures 3a and b, I would also recommend removing the stars indicating significance from the post-hoc tests (if this is what they reflect). Perhaps this link will be useful: https://www.statology.org/anova-post-hoc-tests/

      Please see our previous response at comment 15.    

      Comment 42:

      (3) Furthermore, to address any doubts about the possible phonotactic probability differences between languages, I would recommend that the authors show whether the languages overlap, the level of English fluency in the German-speaking participants, and/or another way of reassuring that this is unlikely to have affected the results.

      Please see our previous response at comment 7.    

      Comment 43:

      (4) In the introduction, I would recommend that the authors outline a clear rationale for the reward/no reward manipulation.

      Please see our previous response at comment 11.    

      Comment 44:

      (5) Figure 1c: Please include what response options participants had, e.g., 'rewarded/not rewarded'. This would make the type of categorization clearer to the reader.

      Please see our previous response at comment 3.

      Comment 45:

      (6) It is unclear whether the additional ANOVA conducted on the time and frequency of the identified clusters included all channels or only the channels contributing to the cluster. Consider clarifying this in the relevant methods and results. Furthermore, I would recommend labelling this as a posthoc test as this analysis was guided by an initial peak at the data and the timings, frequencies, and channels of interest were not selected a-priori.

      We thank the reviewer for this recommendation and have labelled the additional repeated-measure ANOVA as a post-hoc test. Further, we now mention the channels used (Pz and Cz) for this analysis.

      We adjusted the results section on p.7, ll. 230-233 and the methods section on p.23, ll. 858-860:            

      A post-hoc repeated-measure ANOVA on alpha power changes (merged over Pz and Cz electrodes) with PP (high vs. low) and presentations (2 to 3) as within-subjects factors revealed a main effect of PP (F(1,32) = 5.42, p = 0.03, η2 = 0.15), and a significant interaction (F(1,32)  = 7.38, p = 0.01, η2 = 0.19; Fig. 2e).

      After confirming the existence of a significant cluster, we conducted an additional post-hoc repeated-measure ANOVA with averaged values of the identified time and frequency range of interest and merged over the Pz and Cz electrodes (see Fig. 2e).

      Comment 46:

      (7) Figure 3: To better illustrate within- vs. between-subjects comparisons and promote transparency, please add individual points and lines between the within-subjects conditions.

      According to this recommendation, we changed Figure 3 to show the individual data points connected by lines.

      We modified Figure 3 on p.9, ll. 299-303:  

      Comment 47:

      (8) For the SW density time-bin analyses, please include statistics for all comparisons (i.e., through 0 s to 3 s) and say whether these were corrected for multiple comparisons.

      According to this recommendation, we have now included statistics for all comparisons.

      We added table S6 to the supplementary data on p.29, l.962:

      Comment 48:

      (9) Consider reporting effect sizes.

      We thank the reviewer for this recommendation and have now added effect sizes for significant results.

      Comment 49:

      (10) For transparency and replicability, consider including a list of the four stimulus sets including their phoneme and biphone probabilities.

      We included a list of the four stimulus sets with their phoneme and biphone probabilities.

      We added table S3 and table S4 to the supplementary data on pp. 26-27:       

      References

      Asfestani MA, Brechtmann V, Santiago J, Peter A, Born J, Feld GB. 2020. Consolidation of Reward Memory during Sleep Does Not Require Dopaminergic Activation. J Cogn Neurosci 32:1688– 1703. doi:10.1162/JOCN_A_01585

      Batterink LJ, Oudiette D, Reber PJ, Paller KA. 2014. Sleep facilitates learning a new linguistic rule. Neuropsychologia 65:169–79. doi:10.1016/j.neuropsychologia.2014.10.024

      Batterink LJ, Paller KA. 2017. Sleep-based memory processing facilitates grammatical generalization: Evidence from targeted memory reactivation. Brain Lang 167:83–93. doi:10.1016/J.BANDL.2015.09.003

      Bohn OS, Best CT. 2012. Native-language phonetic and phonological influences on perception of American English approximants by Danish and German listeners. J Phon 40:109–128. doi:10.1016/J.WOCN.2011.08.002

      Cairney SA, Guttesen A á. V, El Marj N, Staresina BP. 2018. Memory Consolidation Is Linked to Spindle-Mediated Information Processing during Sleep. Curr Biol 28:948-954.e4. doi:10.1016/j.cub.2018.01.087

      Eberhard DM, Simons GF, Fennig CD. 2019. Ethnologue: Languages of the world. SIL International. Online version: http://www.ethnologue.com.

      Fischer S, Born J. 2009. Anticipated reward enhances offline learning during sleep. J Exp Psychol Learn Mem Cogn 35:1586–1593. doi:10.1037/A0017256

      Green DM, Swets JA. 1966. Signal detection theory and psychophysics. Oxford, England: John Wiley.

      Griffiths B, Mazaheri A, Debener S, Hanslmayr S. 2016. Brain oscillations track the formation of episodic memories in the real world. Neuroimage 143:256–266. doi:10.1016/j.neuroimage.2016.09.021

      Griffiths BJ, Martín-Buro MC, Staresina BP, Hanslmayr S, Staudigl T. 2021. Alpha/beta power decreases during episodic memory formation predict the magnitude of alpha/beta power decreases during subsequent retrieval. Neuropsychologia 153. doi:10.1016/j.neuropsychologia.2021.107755

      Griffiths BJ, Mayhew SD, Mullinger KJ, Jorge J, Charest I, Wimber M, Hanslmayr S. 2019a. Alpha/beta power decreases track the fidelity of stimulus specific information. Elife 8. doi:10.7554/eLife.49562

      Griffiths BJ, Parish G, Roux F, Michelmann S, van der Plas M, Kolibius LD, Chelvarajah R, Rollings DT, Sawlani V, Hamer H, Gollwitzer S, Kreiselmeyer G, Staresina B, Wimber M, Hanslmayr S. 2019b. Directional coupling of slow and fast hippocampal gamma with neocortical alpha/beta oscillations in human episodic memory. Proc Natl Acad Sci U S A 116:21834–21842. doi:10.1073/pnas.1914180116

      Hanslmayr S, Spitzer B, Bäuml K-H. 2009. Brain oscillations dissociate between semantic and nonsemantic encoding of episodic memories. Cereb Cortex 19:1631–40. doi:10.1093/cercor/bhn197

      Iber C, Ancoli‐Israel S, Chesson AL, Quan SF. 2007. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. Westchester, IL: American Academy of Sleep Medicine.

      Klaassen AL, Heiniger A, Sánchez PV, Harvey MA, Rainer G. 2021. Ventral pallidum regulates the default mode network, controlling transitions between internally and externally guided behavior. Proc Natl Acad Sci U S A 118:1–10. doi:10.1073/pnas.2103642118

      Lansink CS, Goltstein PM, Lankelma J V., McNaughton BL, Pennartz CMA. 2009. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol 7. doi:10.1371/JOURNAL.PBIO.1000173

      Luef EM, Resnik P. 2023. Phonotactic Probabilities and Sub-syllabic Segmentation in Language Learning. Theory Pract Second Lang Acquis 9:1–31. doi:10.31261/TAPSLA.12468

      Michelmann S, Bowman H, Hanslmayr S. 2016. The Temporal Signature of Memories: Identification of a General Mechanism for Dynamic Memory Replay in Humans. PLoS Biol 14:e1002528. doi:10.1371/journal.pbio.1002528

      Proskovec AL, Heinrichs-Graham E, Wilson TW. 2019. Load Modulates the Alpha and Beta Oscillatory Dynamics Serving Verbal Working Memory. Neuroimage 184:256. doi:10.1016/J.NEUROIMAGE.2018.09.022

      Reber AS. 1967. Implicit learning of artificial grammars. J Verbal Learning Verbal Behav 6:855–863. doi:10.1016/S0022-5371(67)80149-X

      Schreiner T, Rasch B. 2015. Boosting vocabulary learning by verbal cueing during sleep. Cereb Cortex 25:4169–4179. doi:10.1093/cercor/bhu139

      Sterpenich V, van Schie MKM, Catsiyannis M, Ramyead A, Perrig S, Yang H-D, Van De Ville D, Schwartz S. 2021. Reward biases spontaneous neural reactivation during sleep. Nat Commun 12:1–11. doi:10.1038/s41467-021-24357-5

      Tamminen J, Lambon Ralph MA, Lewis PA. 2013. The role of sleep spindles and slow-wave activity in integrating new information in semantic memory. J Neurosci 33:15376–15381. doi:10.1523/JNEUROSCI.5093-12.2013

      Tamminen J, Payne JD, Stickgold R, Wamsley EJ, Gaskell MG. 2010. Sleep spindle activity is associated with the integration of new memories and existing knowledge. J Neurosci 30:14356–60. doi:10.1523/JNEUROSCI.3028-10.2010

      Zhu Y, Wang Q, Zhang L. 2021. Study of EEG characteristics while solving scientific problems with different mental effort. Sci Rep 11. doi:10.1038/S41598-021-03321-9

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the researchers aimed to investigate the cellular landscape and cell-cell interactions in cavernous tissues under diabetic conditions, specifically focusing on erectile dysfunction (ED). They employed single-cell RNA sequencing to analyze gene expression patterns in various cell types within the cavernous tissues of diabetic individuals. The researchers identified decreased expression of genes associated with collagen or extracellular matrix organization and angiogenesis in several cell types, including fibroblasts, chondrocytes, myofibroblasts, valve-related lymphatic endothelial cells, and pericytes. They also discovered a newly identified marker, LBH, that distinguishes pericytes from smooth muscle cells in mouse and human cavernous tissues. Furthermore, the study revealed that pericytes play a role in angiogenesis, adhesion, and migration by communicating with other cell types within the corpus cavernosum. However, these interactions were found to be significantly reduced under diabetic conditions. The study also investigated the role of LBH and its interactions with other proteins (CRYAB and VIM) in maintaining pericyte function and highlighted their potential involvement in regulating neurovascular regeneration. Overall, the manuscript is well-written and the study provides novel insights into the pathogenesis of ED in patients with diabetes and identifies potential therapeutic targets for further investigation.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, the authors performed single cell RNA-sequencing of cells from the penises of healthy and diabetes mellitus model (STZ injection-based) mice, identified Lbh as a marker of penis pericytes, and report that penis-specific overexpression of Lbh is sufficient to rescue erectile function in diabetic animals. In public human single cell RNA-seq datasets, the authors report that LBH is similarly specific to pericytes and downregulated in diabetic patients. Additionally, the authors report discovery of CRYAB and VIM1 as protein interacting partners with LBH.

      The authors' contributions are of interest to the erectile dysfunction community and their Lbh overexpression experiments are especially interesting and well-conducted. However, claims in the manuscript regarding the specificity of Lbh as a pericyte marker, the mechanism by which Lbh overexpression rescues erectile function, cell-cell interactions impaired by diabetes, and protein-interaction partners require qualification or further evidence to justify.

      Major claims and evidence:

      1) Marker gene specificity and quantification: One of the authors' major contributions is the identification of Lbh as a marker of pericytes in their data. The authors present qualitative evidence for this marker gene relationship, but it is unclear from the data presented if Lbh is truly a specific marker gene for the pericyte lineage (either based on gene expression or IF presented in Fig. 2D, E). Prior results (see Tabula Muris Consortium, 2018) suggest that Lbh is widely expressed in non-pericyte cell types, so the claims presented in the manuscript may be overly broad. Even if Lbh is not a globally specific marker, the authors' subsequent intervention experiments argue that it is still an important gene worth studying.

      Answer: We appreciate this comment. In our scRNAseq data for the mouse cavernosum tissues, previously known markers such as Rgs5, Pdgfrb, Cspg4, Kcnj8, Higd1b, and Cox4i2 were found to be expressed not exclusively in pericytes, while Lbh exhibited specific expression patterns in pericytes (Fig. 2 and Supplementary Fig. 5). LBH expression was easily distinguishable from α-SMA, not only in mouse cavernosum but also in dorsal artery and dorsal vein tissues within penile tissues. This distinctive expression pattern of LBH was also observed in the human cavernous pericytes (Fig. 5). Then, we examined Lbh expression patterns in various mouse tissues using the mouse single-cell atlas (Tabula Muris), although endothelial and pericyte clusters were not subclustered in most tissues from Tabula Muris. To identify pericytes, we relied on the expression pattern of known marker genes (Pecam1 for endothelial cells, Rgs5, Pdgfrb, and Cspg4 for pericytes). Lbh was expressed in pericytes of the bladder, heart and aorta, kidney, and trachea but not as specifically as in penile pericytes (Supplementary Fig. 6A-D). However, it is worth noting that other known pericyte markers also did not exhibit exclusive expression in pericytes across all the tissues we analyzed. Therefore, in certain tissues, particularly in mouse penile tissues, Lbh may be a valuable marker in conjunction with other established pericyte marker genes for distinguishing pericytes.

      2) Cell-cell communication and regulon activity changes in the diabetic penis: The authors present cell-cell communication analysis and TF regulon analysis in Fig 3 and report differential activities in healthy and DM mice. These results are certainly interesting, however, no statistical analyses are performed to justify claimed changes in the disease state and no validations are performed. It is therefore challenging to interpret these results, and the relevant claims do not seem well supported.

      Answer: In response to these helpful suggestions, we calculated statistical significance and performed experimental validation. CellphoneDB permutes the cluster labels of all cells 1000 times and calculates the mean(mean(molecule 1 in cluster X), mean(molecule 2 in cluster Y)) at each time for each interaction pair, for each pairwise comparison between two cell types. We only considered interactions in which the difference in means calculated by these permutations were greater than 0.25-fold between diabetes and normal. Also, we considered that the interactions with P-value < 0.05 were significant.
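
      The label-permutation procedure described above can be sketched in a few lines. This is a minimal, self-contained illustration under our own assumptions and does not call the CellphoneDB package: the function names, the toy expression values, and the way the empirical p-value is counted are hypothetical choices for illustration, and the 0.25-fold filter between diabetic and normal samples is not reproduced here.

```python
import random
from statistics import mean


def interaction_score(expr_1, expr_2, labels, cluster_x, cluster_y):
    """mean(mean(molecule 1 in cluster X), mean(molecule 2 in cluster Y))."""
    in_x = [v for v, lab in zip(expr_1, labels) if lab == cluster_x]
    in_y = [v for v, lab in zip(expr_2, labels) if lab == cluster_y]
    return mean([mean(in_x), mean(in_y)])


def permutation_test(expr_1, expr_2, labels, cluster_x, cluster_y,
                     n_perm=1000, seed=0):
    """Shuffle cluster labels n_perm times and return (observed score, p-value).

    The p-value is the fraction of shuffles whose score is at least as large
    as the observed one (an assumed, commonly used empirical definition).
    """
    rng = random.Random(seed)
    observed = interaction_score(expr_1, expr_2, labels, cluster_x, cluster_y)
    shuffled = list(labels)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps cluster sizes, breaks cell identity
        score = interaction_score(expr_1, expr_2, shuffled, cluster_x, cluster_y)
        if score >= observed:
            n_at_least += 1
    return observed, n_at_least / n_perm


# Toy data (hypothetical): 10 pericytes followed by 10 endothelial cells.
labels = ["pericyte"] * 10 + ["endothelial"] * 10
ligand = [5.0] * 10 + [0.0] * 10    # molecule 1, high only in pericytes
receptor = [0.0] * 10 + [4.0] * 10  # molecule 2, high only in endothelial cells
observed, p_value = permutation_test(ligand, receptor, labels,
                                     "pericyte", "endothelial")
```

      Because shuffling only reassigns labels, cluster sizes stay fixed, and the null distribution reflects what the score would look like if expression were unrelated to cell type.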

      To assess differential regulon activities of transcription factor (SCENIC) between diabetic and normal pericytes, we utilized a generalized linear model with scaled activity scores for each cell as input. These scaled regulon activity values for angiogenesis-related TFs exhibited differences between diabetic and normal pericytes. The results of the generalized linear model revealed that Klf5, Egr1, and Junb were TFs with significantly altered regulon activities in diabetic pericytes. Experimental data indicated that the expression level of Lmo2, Junb, Elk1, and Hoxd10 was higher (Hoxd10) or lower (Lmo2, Junb, Elk1) in diabetic pericytes compared to normal pericytes (Supplementary Fig. 9). We have added the scaled regulon activity values and statistical significance in Fig. 3E.

      3) Rescue of ED by Lbh overexpression: This is a striking and very interesting result that warrants attention. By simple overexpression of the pericyte marker gene Lbh, the authors report rescue of erectile function in diabetic animals. While mechanistic details are lacking, the phenomenon appears to have a large effect size and the experiments appear sophisticated and well conducted. If anything, the authors appear to underplay the magnitude of this result.

      Answer: We appreciate this comment. Therefore, we have added relevant clarification in the revised manuscript discussion section to emphasize the importance of LBH overexpression on rescuing ED as follows: “To test our hypothesis, we utilized the diabetes-induced ED mouse model, commonly employed in various studies focusing on microvascular complications associated with type 1 diabetes. We observed that the overexpression of LBH in diabetic mice led to the restoration of reduced erectile function by enhancing neurovascular regeneration. However, this study primarily demonstrated the observed phenomenon without delving into the detailed mechanisms. Nonetheless, these results of LBH on erections provide us with new strategies for treating ED and should be of considerable concern.” (Please see revised ‘Discussion’)

      4) Mechanistic claims for rescue of ED by Lbh overexpression: The authors claim that cell type-specific effects on MPCs are responsible for the rescue of erectile function induced by Lbh overexpression. This causal claim is unsupported by the data, which only show that Lbh overexpression influences MPC performance. In vivo, it's likely that Lbh is being over expressed by diverse cell types, any of which could be the causal driver of ED rescue. In fact, the authors report rescue of cell type abundance in endothelial cells and neuronal cells. Therefore, it cannot be concluded that MPC effects alone or in principal are responsible for ED rescue.

      Answer: We agree with these claims. Therefore, we have added relevant clarifications in the discussion section of the revised manuscript. Our findings suggest that LBH can affect the function of cavernous pericytes, although we cannot definitively specify which particular cavernous cell types are affected by the overexpressed LBH, whether it be cavernous endothelial cells, smooth muscle cells, or others. Subsequent research will be required to conduct more comprehensive mechanistic investigations, such as in vitro studies using cavernous endothelial cells, smooth muscle cells, and fibroblasts to address these knowledge gaps. (Please see revised ‘Discussion’)

      5) Protein interaction data: The authors claim that CRYAB and VIM1 are novel interacting partners of LBH. However, the evidence presented (2 blots in Fig. 6A,B) lack the relevant controls. It is possible that CRYAB and VIM1 are cross-reactive with the anti-LBH antibody or were not washed out completely. The abundance of bands on the Coomassie stain in Fig. 6A suggests that either event is plausible. Therefore, the evidence presented is insufficient to support the claim that CRYAB and VIM1 are protein interacting partners of LBH.

      Answer: We agree with these claims. Therefore, we have added the relevant controls (Input) and performed Co-IP (IP: CRYAB or VIM, WB: LBH) to demonstrate that CRYAB and VIM1 are not simply antigens cross-reactive with the LBH antibody. Our results show that we can detect the expression of CRYAB and VIM after LBH IP, and we also detect the expression of LBH after CRYAB and VIM IP. In addition, it can be seen from our results that the binding of LBH to VIM is higher than that of CRYAB. Regardless, these results indicate that the binding of CRYAB or VIM to LBH is not a random phenomenon. (Please see revised ‘Result’ and ‘Figure 6B’)

      Impact: These data will trigger interest in Lbh as a target gene within the erectile dysfunction community.

      Reviewer #3 (Public Review):

      Bae et al. described the key roles of pericytes in cavernous tissues in diabetic erectile dysfunction using both mouse and human single-cell transcriptomic analysis. Erectile dysfunction (ED) is caused by dysfunction of the cavernous tissue and affects a significant proportion of men aged 40-70. The most common treatment for ED is phosphodiesterase 5 inhibitors; however, these are less effective in patients with diabetic ED. Therefore, there is an unmet need for a better understanding of the cavernous microenvironment, cell-cell communications in patients with diabetic ED, and the development of new therapeutic treatments to improve the quality of life.

      Pericytes are mesenchymal-derived mural cells that directly interact with capillary endothelial cells (ECs). They play a vital role in the pathogenesis of erectile function as their interactions with ECs are essential for penile erection. Loss of pericytes has been associated with diabetic retinopathy, cancer, and Alzheimer's disease and has been investigated in relation to the permeability of cavernous blood vessels and neurovascular regeneration in the authors' previous studies. This manuscript explores the mechanisms underlying the effect of diabetes on pericyte dysfunction in ED. Additionally, the cellular landscape of cavernous tissues and cell type-specific transcriptional changes were carefully examined using both mouse and human single-cell RNA sequencing in diabetic ED. The novelty of this work lies in the identification of a newly identified pericyte (PC)-specific marker, LBH, in mouse and human cavernous tissues, which distinguishes pericytes from smooth muscle cells. LBH not only serves as a cavernous pericyte marker, but its expression level is also reduced in diabetic conditions. The LBH-interacting proteins (Cryab and Vim) were further identified in mouse cavernous pericytes, indicating that these signaling interactions are critical for maintaining normal pericyte function. Overall, this study demonstrates the novel marker of pericytes and highlights the critical role of pericytes in diabetic ED.

      Reviewer #1 (Recommendations For The Authors):

      1) The methods are poorly written. It lacks specific information on the sample size, experimental design, and data analysis methods employed. The absence of these crucial details makes it difficult to evaluate the robustness and reliability of the findings.

      Answer: We agree with the reviewer’s suggestion. We have now revised the methods of our manuscript and added detailed information or references. For sample size, we have added detailed information in the Figure legends. (Please see revised ‘Method’, Figure Legend, and Supplementary information.)

      2) The cell number in the scRNA-seq analysis is small (~12000) and some minor cell types are probably underrepresented. It is not clear whether the authors pooled the cells from different mice as one sample, or replicates in different groups have been included. It will be helpful to label different samples in the UMAP. The authors should repeat the experiments with more replicates to increase the cell number and validate the findings.

      Answer: We understand the reviewer's concern, but due to the small size of mouse penile tissue, we had to pool 5 corpus cavernosum tissues for each group for scRNA-seq analysis. Moreover, mouse penile tissue is highly resistant to dissociation, which made it challenging to isolate single cells using conventional single-cell separation methods. Consequently, we had to increase the concentration of the enzyme to finally obtain 12,894 cells. Rather than conducting a repetitive scRNAseq analysis on the same mouse model, we validated our findings in human cavernous single-cell transcriptome data. This analysis allowed us to confirm the presence of pericytes in human corpus cavernosum, specific expression of LBH in human cavernous pericytes, and the identification of relevant GO terms associated with pericyte functions (Figure 5). We have added this information to ‘Method’ (Please see revised ‘Method’).

      3) Functional studies are lacking to justify how manipulating LBH expression or its interacting proteins might lead to effective therapeutic approaches for diabetic ED.

      Answer: We have performed a functional study to evaluate whether modulating LBH expression might lead to effective therapeutic approaches for diabetic ED, as shown in Figure 4G. Assessment of intracavernous pressure (ICP) is the most representative test for evaluating erectile function. Therefore, we modulated LBH expression in the penis of diabetic mice and assessed the erectile function of the mice by intracavernous pressure. However, we have not performed ICP studies or related in vitro studies (migration, survival experiments) to assess whether LBH-interacting proteins have the same effect.

      4) Although the abstract identifies novel targets for potential interventions, such as LBH and its interacting proteins, the clinical relevance of these findings remains uncertain. The authors should include a discussion regarding the translation of these discoveries into therapeutic strategies or their potential impact on patients with diabetes and ED.

      Answer: We appreciate the reviewer's suggestion and have added a discussion as per the reviewer’s recommendation (Please see revised ‘Discussion’).

      5) While the study highlights the importance of pericytes in penile erection, it fails to mention the broader context of other cell types involved in the pathogenesis of ED. Neglecting to discuss potential contributions from endothelial cells, smooth muscle cells, or neural elements limits the comprehensive understanding of the cellular interactions underlying diabetic ED.

      Answer: We agree with the reviewer's suggestion and have added a discussion regarding the significance of other cell populations in penile tissues, such as endothelial cells, smooth muscle cells, fibroblasts, and neural elements, along with the rationale for our focus on pericytes (Please see revised ‘Discussion’).

      Reviewer #2 (Recommendations For The Authors):

      We congratulate the authors on an interesting study. We were especially excited to see their Lbh overexpression results. However, we felt other claims in the paper could benefit from additional investigation, analysis, and statistical rigor. We have provided a set of suggestions for improvement below.

      Major points:

      1) Pericyte marker gene proposal: See public review for commentary on the following suggested experiments. The authors should perform binary classification analysis using Lbh and report the performance of this gene as a marker (e.g. using the area under the receiver operating characteristic, accuracy, precision and recall). Further, they should consider performing this analysis for all other genes in their data to determine whether Lbh is the best marker gene.

      Answer: We appreciate this comment. Using the FindMarkers function, we measured AUC scores for Rgs5, Pln, Ednra, Npy1r, Atp1b2, and Gpc3, each treated as a binary classifier distinguishing pericytes from the other cell types in mouse penile tissue. Rgs5 had the highest AUC, but it was also expressed in SMCs in our data. Pln, Ednra, Gpc3, and Npy1r also appeared to be candidate markers, but our literature search excluded them because they are also expressed in the SMCs of other tissues or in different cell types. The AUC score of Lbh was over 0.7, its expression in SMCs has not been reported in previous studies, and we ultimately confirmed experimentally that Lbh is specific to penile pericytes. We have added this to the manuscript.
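
      To make the marker-gene AUC concrete, the following is a minimal sketch, not the analysis used in the study. It computes the AUC of a single gene as the probability that a randomly chosen pericyte has higher expression than a randomly chosen non-pericyte, counting ties as one half. The function name `marker_auc` and the expression values are hypothetical and chosen only for illustration.

```python
def marker_auc(expr_in_target, expr_in_other):
    """AUC of one gene used as a binary classifier.

    Equals P(target > other) + 0.5 * P(target == other), computed by
    comparing every target cell against every non-target cell.
    """
    if not expr_in_target or not expr_in_other:
        raise ValueError("both groups need at least one cell")
    score = 0.0
    for x in expr_in_target:
        for y in expr_in_other:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(expr_in_target) * len(expr_in_other))


# Hypothetical normalized expression values (illustration only).
pericyte_expr = [2.1, 1.8, 0.0, 2.5]
other_expr = [0.0, 0.0, 0.3, 1.9, 0.0]

auc = marker_auc(pericyte_expr, other_expr)
print(f"AUC = {auc:.3f}")
```

      An AUC of 0.5 means the gene carries no information about cell identity, and 1.0 means it separates the two groups perfectly. The pairwise loop is quadratic in the number of cells, which is acceptable for a small cluster but would normally be replaced by a rank-based computation on full datasets.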

      Author response table 1.

      Robust differential expression analysis should also be performed for this gene (if not all) and the statistics should be reported, given known issues with the statistical approach used by the authors for differential expression (see: Squair 2021, 10.1038/s41467-021-25960-2). The authors should also report the number of cells involved in these comparisons, as the number of pericytes in the data (Fig 1B) appears quite small.

      Answer: We appreciate this comment. We used “MAST” to identify differentially expressed genes; this test is often used to find DEGs in single-cell RNA data. However, because the pseudobulk method has advantages over single-cell DEG methods (Squair 2021, 10.1038/s41467-021-25960-2), we additionally performed DEG analysis with DESeq2 to confirm whether Lbh can distinguish pericytes from other cell types in penile tissue. Even when tested with DESeq2, Lbh expression was significantly higher in pericytes than in other penile cell types (adjusted p-value = 2.694475e-07 for pericytes vs SMCs; adjusted p-value = 3.700118e-58 for pericytes vs the other cell types). Mouse penile tissue is small, and pericytes are relatively few compared with fibroblasts and chondrocytes. In our mouse penile scRNA-seq data, the number of pericytes is as follows: normal, 58; diabetes, 116. Despite the limited number of cells, we were able to establish statistical significance in our analyses.

      Immunostaining results in Fig. 2D, E should likewise be quantified. At present, it's unclear that LBH and aSMA are mutually exclusive as claimed. The authors should also investigate Lbh expression in public single cell genomics data, rather than performing candidate gene literature searches. For example, the Tabula Muris suggests Lbh is expressed widely outside pericytes.

      Answer: For Figure 2D and E, the aim of these analyses was to assess whether the distributions of LBH and other cellular markers overlap and whether they can be distinguished. We think that some of the overlapping staining in the tissue may be caused by multilayered cellular structures, so cell staining would be more convincing. Therefore, we quantified the percentage of LBH- or α-SMA-expressing pericytes and the relative expression in smooth muscle cells by cell staining (Supplementary Fig. 5E). We found that only 3% of smooth muscle cells expressed LBH, 67% of mouse cavernous pericytes (MCPs) expressed α-SMA, and more than 97% of MCPs expressed LBH. These results support the specific expression of LBH in MCPs. This information was added as ‘Supplementary Fig. 5E’ (Please see revised ‘Supplementary information’). We also examined Lbh expression patterns in various mouse tissues using the public mouse single-cell atlas (Tabula Muris), and provided a detailed response in reviewer 2’s public review 1.

      Even if Lbh is not the best marker, the authors' intervention experiment still motivates study of the gene, but these analyses would help contextualize the result for readers.

      2) Statistical analyses for cell-cell communication and TF regulon analysis: See public review for context on these comments. The authors should perform statistical tests to evaluate the significance of differences detected for each of these analyses. For example, generalized linear models can be used to assess the significance of TF regulon activity scores from SCENIC, and permutation tests can be used to measure the significance of cell-cell interaction score changes. Without these statistical tests, it's challenging for a reader to interpret whether the results reported are meaningful or within the realm of experimental noise.

      Answer: We appreciate this comment. We calculated the statistical significance of the TF regulon analyses as suggested by the reviewer and described the statistical calculation method for cell-cell communication in detail. We provided a detailed response in reviewer 2’s public review 2.
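
      The permutation test mentioned in the reviewer's comment can be sketched as follows. This is a generic two-group label-shuffling test on the difference in means, offered as an illustration under stated assumptions rather than as the procedure applied to the interaction scores in the study. The function name `permutation_p_value`, the number of permutations, the fixed seed, and the score values are all hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled n_perm times; the p-value is the fraction
    of shuffles whose absolute mean difference is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical interaction scores for two conditions (illustration only).
normal_scores = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
diabetic_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

p = permutation_p_value(normal_scores, diabetic_scores)
print(f"permutation p-value = {p:.4f}")
```

      The +1 correction keeps the estimated p-value strictly positive, so a finite number of shuffles never reports exactly zero.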

      3) Mechanism of ED rescue by Lbh overexpression: To support this claim, the authors would need to perform an experiment where Lbh is over expressed specifically in MPCs (using e.g. a specific promoter on their LTV construct, or a transgenic line with a cell type-specific Cre-Lox system). Absent these data, the claim should be removed.

      Answer: We agree with the reviewer's suggestion and we have reworked the claim that ‘LBH overexpression is affected by pericytes during ED recovery’ and have added relevant clarification in the Discussion section to clearly state that LBH overexpression may affect many cavernosum cells, such as cavernous endothelial cells, smooth muscle cells, fibroblasts, and pericytes (Please see revised ‘Result’ and ‘Discussion’)

      4) Protein interaction claims: This experiment would require that the authors perform a similar pull-down with LBH KO cells and/or a reciprocal Co-IP (e.g. IP: CRYAB or VIM1, WB: LBH) to demonstrate CRYAB and VIM1 are not simply cross-reactive antigens to their LBH antibody. Further, these experiments appear to only have a single replicate for each condition. The authors should either remove associated claims, or perform a Co-IP experiment with the relevant controls with sufficient replication.

      Answer: We agree with the claims. Therefore, we have included the necessary controls (Input) and performed Co-IP (IP: CRYAB or VIM1, WB: LBH) to demonstrate that CRYAB and VIM1 are not simply cross-reactive antigens to their LBH antibody. Our results show that we can detect the expression of CRYAB and VIM after LBH IP, and we also detect the expression of LBH after CRYAB and VIM IP. In addition, it can be seen from our results that the binding of LBH to VIM is higher than that of CRYAB. Regardless, these results indicate that the binding of CRYAB or VIM to LBH is not a random phenomenon. Additionally, all IP experiments were replicated at least three times. (Please see revised ‘Result’ and ‘Figure 6B’)

      Minor Points:

      • The reference "especially in men" on line 56 seems odd given that only males can experience penile erectile dysfunction.

      Answer: We agree with the reviewer's suggestion and have removed the description 'especially male' (Please see revised ‘Introduction’)

      • Line 109, it's unclear what genes showed altered expression in Schwann cells.

      Answer: We apologize for the confusion. There were no significant differentially expressed genes between normal and diabetes in Schwann cells. We revised this part in the manuscript. (Schwann cells showed an increased expression compared to normal cells in diabetes, though not significant. In Schwann cells, there were no significant DEGs between diabetic and normal cells.)

      • It would be helpful for readers to see an analysis of the cell types that are transduced in the Lbh overexpression experiment in vivo. At present, some pericyte specificity is implied, but not demonstrated.

      Answer: We appreciate this comment. Our findings suggest that LBH can affect the function of cavernous pericytes, although we cannot definitively conclude which specific cavernous cell types are affected by the overexpressed LBH, whether cavernous endothelial cells, smooth muscle cells, or others. Subsequent research will be required to conduct more comprehensive mechanistic investigations, such as in vitro studies using cavernous endothelial cells, smooth muscle cells, and fibroblasts, to address these knowledge gaps. These points were also mentioned in the manuscript.

      • To improve clarity and enhance readability, define abbreviations before their initial usage in the text. For instance, in the second paragraph of the Introduction, the abbreviation 'ECs' is used without prior definition. It can be inferred that it is referring to endothelial cells, mentioned in parentheses in the subsequent sentence.

      Answer: We agree with the reviewer's suggestion to expand acronyms and have ensured that all acronyms are defined in the revised manuscript before they are used for the first time in the text (Please see revised Manuscript).

      • It is important to include relevant references that align with the content being discussed. For example, in the Introduction, pericytes are described as being involved in various processes such as angiogenesis, vasoconstriction, and permeability. The text refers to a single reference, a review by Gerhardt and Betsholtz, which primarily focuses on pericytes' role in regulating angiogenesis. Adding additional sources, such as the review by Bergers and Song (Neuro Oncol., 2005), is recommended.

      Answer: We agree with the reviewer's suggestion and have added the reference as the reviewer recommended (Please see revised Manuscript and reference).

      • Figure 3E: it is stated that a panel of 53 angiogenesis factors was tested and that only MMP3 showed increased expression. However, various unlabeled spots appear to show changed expression patterns. It would be helpful to show a summary graph with the relative intensities of the full array of factors tested.

      Answer: We agree with the reviewer’s suggestion and now show the density of all spots in the angiogenesis array in Supplementary Table 1. The spots we selected had an expression density above 1500 and a change ratio greater than 1.2. (Please see revised ‘Supplementary information’)
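
      As an illustration of the two selection criteria above, the sketch below filters array spots by density and fold change. The response does not state which condition's density the 1500 threshold applies to or how the change ratio is oriented, so this sketch assumes the threshold applies to the larger of the two densities and that the ratio is treated over control. The function name `select_spots`, the spot names, and the density values are hypothetical.

```python
def select_spots(spots, min_density=1500.0, min_ratio=1.2):
    """Return (name, ratio) for spots passing both thresholds.

    spots: iterable of (name, control_density, treated_density).
    Assumption: a spot passes the density filter if the larger of its
    two densities exceeds min_density, and the ratio is treated/control.
    """
    selected = []
    for name, control, treated in spots:
        if control <= 0:
            continue  # ratio is undefined without a positive control signal
        ratio = treated / control
        if max(control, treated) > min_density and ratio > min_ratio:
            selected.append((name, ratio))
    return selected


# Hypothetical spot densities (illustration only).
example_spots = [
    ("factor_A", 2000.0, 3000.0),  # dense and up 1.5-fold: kept
    ("factor_B", 500.0, 900.0),    # large ratio but too faint: dropped
    ("factor_C", 2000.0, 2100.0),  # dense but ratio only 1.05: dropped
]

print(select_spots(example_spots))
```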

      Reviewer #3 (Recommendations For The Authors):

      Detailed statistical power calculation

      Data availability statement (were both mouse and human scRNA data deposited in GEO with a token, and when will they be released to the public?)

      Answer: Human scRNA data have been deposited in GEO under accession number GSE206528. Our mouse scRNA dataset has been uploaded to KoNA and is available for download (https://www.kobic.re.kr/kona/review?encrypt_url=amlod2FucGFya3xLQUQyMzAxMDEz)

      Major concerns about this work

      1) In the single cell RNAseq data collected for mouse diabetic ED (Fig 1B), FB are the most abundant cell population compared to PC, EC, SMC and other clusters. The rationale for studying FB clusters (in Figure 1, D-F) instead of the PC cluster is unclear. Which cluster DEG did the authors annotate for Fig 1G-H?

      Answer: We understand the reviewer's suggestion and confusion. Although other major cell populations in penile tissue such as smooth muscle cells, endothelial cell, and fibroblasts have been extensively studied, pericytes have mainly been investigated in the context of the central nervous system (CNS). For example, in the CNS, pericytes are involved in maintaining the integrity of the brain's blood-brain barrier (BBB) [PMID: 27916653], regulating blood flow at capillary junctions [PMID: 33051294], and promoting neuroinflammatory processes [PMID: 31316352], whose dysfunction is considered an important factor in the progression of vascular diseases such as Alzheimer's disease [PMID: 24946075]. But little is known about the role of pericytes in penile tissue [PMID: 35865945; PMID: 36009395; PMID: 26044953]. In order to explore the role of pericytes in repairing the corpus cavernosum vascular and neural tissues damaged by DM, we focused on pericytes, which are multipotent perivascular cells that contribute to the generation and repair of various tissues in response to injury. Although recent studies have shown that pericytes are involved in physiological mechanisms of erection, little is known about their detailed mechanisms. We have also added this rationale in discussion.

      No single-cell-level study had previously been conducted in mouse penile tissues. Therefore, before delving into pericytes, we aimed to identify overall transcriptome differences between normal and diabetic conditions in mouse penile tissues. We presented the analyses of FB, which make up the largest proportion among the cell types in the mouse penis, in Fig. 1D-F. The analysis of other cell types is provided in Supplementary Fig. 1-4. Fig. 1G-H show GO terms for the fibroblast clusters. We added this information to the figure.

      2) Fig 2 is the critical data to show Lbh is a cavernous PC specific marker. More PC violin plots to identify PC cluster such as Cspg4, Kcnj8, Higd1b, Cox4i2 and more SMC violin plots to identify SMC cluster such as Acta2, Myh11, Tagln, Actg2 should be used for inclusion and exclusion of PC (the same concern applied to human scRNAseq in Fig 5B).

      Answer: We appreciate this comment. We examined the expression of other marker genes of pericytes and SMCs. Although some marker genes were rarely expressed in the mouse penis data (Kcnj8, Higd1b), the expression of marker genes tended to be relatively high in each cluster. The expression of Cspg4 and Cox4i2 was higher in pericytes than in SMCs, while the expression of Acta2, Myh11, and Tagln was higher in SMCs than in pericytes. Actg2 was specifically expressed in SMCs. Through the gene set enrichment test as well as the expression of known cell type marker genes, we identified that the annotation of pericyte and SMC was appropriate (Fig. 2B and Fig. 5C). We added the violin plots of these marker genes in Supplementary Fig. 5.

      Author response image 1.

      (Mouse)

      In human penis data, ACTA2 and MYH11 were expressed in SMCs, pericytes, and myofibroblasts, as in the previous paper [PMID: 35879305]. Among pericyte markers, the number of cells expressing KCNJ8 and HIGD1B was small. The cluster we annotated as pericyte was double positive for pericyte markers CSPG4 and COX4I2. ACTG2, a marker for SMC, was expressed more highly in SMC than in pericytes and myofibroblasts. As in the mouse penis data, we identified that the annotation of each cell type was appropriate through the gene set enrichment test in the human penis data. We added the violin plots of CSPG4, COX4I2, and ACTG2 in Supplementary Fig. 11.

      Author response image 2.

      (Human)

      When exploring Lbh expression levels in "Database of gene expression in adult mouse brain and lung vascular and perivascular cells" from https://betsholtzlab.org/VascularSingleCells/database.html, Lbh is not uniquely expressed in PC, suggesting its tissue-specific expression level. This difference should be discussed in the Discussion section.

      Answer: We appreciate this valuable comment. For the answer to this comment, we extensively analyzed Lbh expression patterns in various mouse tissues using the public mouse single-cell atlas (Tabula Muris) as also suggested by Reviewer 2. Please see our detailed response in reviewer 2’s public review 1.

      3) In prior studies on PC morphology and location (PMID: 21839917), they reside in capillaries (diameter less than 10um) or distal vessels (diameter less than 25um) and have oval cell body and long processes. Due to the non-specificity of Pdgfrb, SMC are positive for Pdgfrb staining (this has been shown in many publications that SMC are Pdgfrb+; unfortunately, NG2 antibody also stains for both PC and SMC). Therefore, the LBH immunostaining (in Fig 2D and 2E of large-sized vessels) are very likely for SMC identity, not PC. PC should be in close contact with CD31+ ECs in healthy conditions. The LBH immunostaining of PC in both mouse and human tissues (Fig 4) must be replaced and better characterized.

      Answer: We agree with the reviewer's suggestion. As is widely known, pericytes are primarily located in capillaries, where they surround endothelial cells of blood vessels. However, recent discoveries have identified cells with pericyte-like characteristics in the walls of large blood vessels, challenging the traditional concept [PMID: 27268036]. In our study, we observed minimal overlap in staining between LBH and α-SMA, suggesting that the cells expressing LBH were not smooth muscle cells but possibly pericyte-like cells in large vessels. In small vessels within the bladder, kidney, and even the aorta, we found LBH-expressing cells surrounding CD31-expressing vessels, consistent with the known characteristics of pericytes. Further research is needed to comprehend the differences in LBH expression and its characteristics in both large and small blood vessels. We have added discussions and references for this issue (Please see revised ‘Discussion’ and ‘Reference’).

      4) How were the mouse cavernous pericytes isolated? What was their purity?

      Answer: We isolated mouse cavernous pericytes following our own and other previously published methods. We used pigment epithelium-derived factor (PEDF), which removes non-pericytic cells [PMID: 30929324, 23493068]. Although we have no purity data such as FACS, other staining results support the notion that this method yields pericytes of high purity. (Please see ‘Method’ section).

      5) Can mouse scRNAseq cell-cell communication in Fig 3 be reproducible in human scRNAseq cell-cell communication? The results in human ED are more clinically significant than in mouse data.

      Answer: In human scRNAseq data, the difference between angiogenesis-related interactions between normal and diabetes was not as significant as that in mouse data. Because the cell type composition of the human and mouse penis is not completely identical, there are limitations in comparing cell-cell interactions. However, in the human penis data, some interactions related to angiogenesis between pericytes and other cell types were decreased in diabetes compared to normal (boxed parts).

      Author response image 3.

      6) Fibroblasts also express Vim. Murine PC VIM/CRYAB (should be written as Vim/Cryab as mouse proteins) direct interaction with Lbh is unclear from Lbh IP as Fig 6A red boxes showed a wide range of sizes. Where is the band for Lbh? Do human PC LBH interact with VIM/CRYAB?

      Answer: We agree with the reviewer's comment. VIM is a type III intermediate filament protein expressed in many cell types. We have added the relevant controls (Input) and performed Co-IP (IP: CRYAB or VIM, WB: LBH) to demonstrate that CRYAB and VIM are not simply cross-reactive antigens to the LBH antibody. In the western blot study, the LBH band was detected between 35 kDa and 48 kDa. In Figure 6A, we detected CRYAB in band 1 and VIM in bands 2 and 3. This may be due to the formation of dimers or multimers by VIM. We did not use human PCs for IP studies because IP requires large amounts of protein, which makes IP studies using human pericytes challenging. Nevertheless, the interaction between LBH and CRYAB in humans has been reported through a fluorescence resonance energy transfer assay and an affinity chromatography assay [PMID:34000384, PMID:20587334].

      7) In Fig 6H and I, why does CRYAB expression significantly reduce in vitro and in vivo under diabetic conditions, whereas VIM expression significantly increases?

      Answer: As the reviewer pointed out, and as we have discussed in the manuscript, CRYAB is known to promote angiogenesis. Diabetes reduces CRYAB expression, so angiogenesis may be impaired. Furthermore, VIM is a multifunctional protein that interacts with several other proteins with multiple functions under various pathophysiological conditions. Many reports show that VIM expression is increased under diabetic conditions [PMID: 28348116 and PMID: 32557212], and VIM deficiency protects against obesity and insulin resistance in patients with type 2 diabetes. Therefore, we hypothesize that exogenous LBH may bind to the increased VIM in diabetic conditions and inactivate its effects, thereby achieving the protective effect. This needs to be tested in further studies.

      8) The therapeutic strategies targeting (Lbh-Cryab-Vim) on mouse diabetic ED model is not investigated and need to be further validated and discussed.

      Answer: As the reviewer pointed out, in this study we did not evaluate the targeted therapeutic strategy for LBH-CRYAB-VIM in a mouse diabetic ED model. We only identified the binding potential of these three proteins. Evaluation of this treatment strategy requires further study. For example, we can employ shRNA lentivirus, either alone or in combination, to downregulate CRYAB expression [PMID: 31612679] in normal mice, utilize a lentiviral vector CMV-GFP-puro-vimentin to overexpress Vimentin [PMID: 36912679], and then treat with LBH to evaluate whether the LBH effect still exists (in vivo erectile function study and in vitro angiogenesis assay). We include this information in the Discussion section as a limitation of this study (Please see revised ‘Discussion’).

      9) The Discussion of current knowledge of pericytes in diabetic ED and other diseases and the significance of this study as well as clinical implications, should be expanded.

      Answer: As the reviewers pointed out, we have expanded the current knowledge of pericytes in diabetic ED and other diseases (CNS disease) and clinical implications as follows: “Although other major cell populations in penile tissue such as smooth muscle cells, endothelial cell, and fibroblasts have been extensively studied, pericytes have mainly been investigated in the context of the central nervous system (CNS). For example, in the CNS, pericytes are involved in maintaining the integrity of the brain's blood-brain barrier (BBB), regulating blood flow at capillary junctions, and promoting neuroinflammatory processes, whose dysfunction is considered an important factor in the progression of vascular diseases such as Alzheimer's disease. But little is known about the role of pericytes in penile tissue.” (Please see revised ‘Discussion’).

      10) How many clinical samples were used? How many times did each experiment repeat?

      Answer: As the reviewer pointed out, the clinical sample information was added to the ‘Method’ section. A total of four human samples were used in this study (‘human corpus cavernosum tissues were obtained from two patients with congenital penile curvature (59-year-old and 47-year-old) who had normal erectile function during reconstructive penile surgery and two patients with diabetic ED (69-year-old and 56-year-old) during penile prosthesis implantation.’). For in vivo study, we quantified four different fields from human samples.

      Minor concerns

      1) Fig 1A, why normal mouse's body size is the same as DM?

      Answer: As the reviewer pointed out, in Figure 1A the normal and DM mice may not appear significantly different in size, but there are indeed notable differences in body weight and size. The body weight of the normal mice we used was about 30 grams, while that of DM mice was generally less than 24 grams. We found that we had omitted the physiological and metabolic parameters from the in vivo studies (ICP function study). Therefore, we have added them in Supplementary Table 2 (Please see revised ‘Supplementary information’).

      2) The label and negative, and positive controls for Fig 6B are missing.

      Answer: We thank the reviewer for pointing this out. We have added the relevant controls (Input) and performed Co-IP (IP: CRYAB or VIM1, WB: LBH) to demonstrate that CRYAB and VIM1 are not simply cross-reactive antigens to the LBH antibody, and all IP experiments were replicated at least three times. (Please see revised ‘Result’ and ‘Figure 6B’)

      3) The limitation of this study and future work should be discussed.

      Answer: As the reviewer pointed out, we have added the limitation of this study and future direction in the discussion section (Please see revised ‘Discussion’).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report an fMRI investigation of the neural mechanisms by which selective attention allows capacity-limited perceptual systems to preferentially represent task-relevant visual stimuli. Specifically, they examine competitive interactions between two simultaneously-presented items from different categories, to reveal how task-directed attention to one of them modulates the activity of brain regions that respond to both. The specific hypothesis is that attention will bias responses to be more like those elicited by the relevant object presented on its own, and further that this modulation will be stronger for more dissimilar stimulus pairs. This pattern was confirmed in univariate analyses that measured the mass response of a priori regions of interest, as well as multivariate analyses that considered the patterns of evoked activity within the same regions. The authors follow these neuroimaging results with a simulation study that favours a "tuning" mechanism of attention (enhanced responses to highly effective stimuli, and suppression for ineffective stimuli) to explain this pattern.

      Strengths:

      The manuscript clearly articulates a core issue in the cognitive neuroscience of attention, namely the need to understand how limited perceptual systems cope with complex environments in the service of the observer's goals. The use of a priori regions of interest, and the inclusion of both univariate and multivariate analyses as well as a simple model, are further strengths. The authors carefully derive clear indices of attentional effects (for both univariate and multivariate analyses) which makes explication of their findings easy to follow.

      Weaknesses:

      There are some relatively minor weaknesses in presentation, where the motivation behind some of the procedural decisions could be clearer. There are some apparently paradoxical findings reported -- namely, cases in which the univariate response to pairs of stimuli is greater than to the preferred stimulus alone -- that are not addressed. It is possible that some of the main findings may be attributable to range effects: notwithstanding the paradox just noted, it seems that a floor effect should minimise the range of possible attentional modulation of the responses to two highly similar stimuli. One possible limitation of the modelled results is that they do not reveal any attentional modulation at all under the assumptions of the gain model, for any pair of conditions, implying that as implemented the model may not be correctly capturing the assumptions of that hypothesis.

      We thank the reviewer for the constructive comments. In response, in the current version of the manuscript we have improved the presentation. We further discuss how the response in paired conditions is in some cases higher than the response to the preferred stimulus in this letter. For this, we provide a vector illustration, and a supplementary figure of the sum of weights to show that the weights of isolated-stimulus responses for each category pair are not bound to the similarity of the two isolated responses.

      Regarding the simulation results, we have clarified that the univariate effect of attention is not the attentional modulation itself, but the change in the amount of attentional modulation in the two paired conditions. We provide an explanation for this in this letter below, and have changed the term “attentional modulation” to “univariate shift” in the manuscript to avoid the confusion.

      Reviewer #2 (Public Review):

      Summary:

      In an fMRI study requiring participants to attend to one or another object category, either when the object was presented in isolation or with another object superimposed, the authors compared measured univariate and multivariate activation from object-selective and early visual cortex to predictions derived from response gain and tuning sharpening models. They observed a consistent result across higher-level visual cortex that more-divergent responses to isolated stimuli from category pairs predicted a greater modulation by attention when attending to a single stimulus from the category pair presented simultaneously, and argue via simulations that this must be explained by tuning sharpening for object categories.

      Strengths:

      - Interesting experiment design & approach - testing how category similarity impacts neural modulations induced by attention is an important question, and the experimental approach is principled and clever.

      - Examination of both univariate and multivariate signals is an important analysis strategy.

      - The acquired dataset will be useful for future modeling studies.

      Weaknesses:

      - The experimental design does not allow for a neutral 'baseline' estimate of neural responses to stimulus categories absent attention (e.g., attend fixation), nor of the combination of the stimulus categories. This seems critical for interpreting results (e.g., how should readers understand univariate results like that plotted in Fig. 4C-D, where the univariate response is greater for 2 stimuli than one, but the analyses are based on a shift between each extreme activation level?).

      We are happy to clarify our research rationale. We aimed to compare responses in paired conditions when the stimuli were kept constant while varying the attentional target. After showing that the change in the attentional target resulted in a response change, we compared the amount of this response change across different stimulus category pairs to investigate how representational similarity between the target and the distractor affects the response modulation caused by the attentional shift. While an estimate of the neural responses in the absence of attention might be useful for other modeling studies, it would not provide us with more information than the current data to answer the question of this study.

      Regarding the univariate results in Fig. 4C-D (and other equivalent ROI results in the revised version) and our analyses, we did not impose any limit on the estimated weights of the two isolated responses in the paired response and thus the sum of the two weights could be any number. We however see that the naming of “weighted average”, which implies a sum of weights being capped at one, has been misleading . We have now changed the name of this model to “linear combination” to avoid confusion

      Previous studies (Reddy et al., 2009, Doostani et al., 2023) using a similar approach have shown a related results pattern: the response to multiple stimuli is higher than the average, but lower than the sum of the isolated responses, which is exactly what our results suggest. We have added discussion on this topic in the Results section in lines 409-413 for clarification:

      “Note that the response in paired conditions can be higher or lower than the response to the isolated more preferred stimulus (condition Mat), depending on the voxel response to the two presented stimuli, as previously reported (Doostani et al. 2023). This is consistent with previous studies reporting the response to multiple stimuli to be higher than the average, but lower than the sum of the response to isolated stimuli (Reddy et al. 2009).”

      We are not sure what the reviewer means by “each extreme activation level”. Our analyses are based on all four conditions. The two isolated conditions are used to calculate the distance measures and the two paired conditions are used for calculating the shift index. Please note that either the isolated or the paired conditions could show the highest response and we see both cases in our data. For example, as shown in Figure 4A in EBA, the isolated Body condition and the paired BodyatCar condition show the highest activation levels for the Body-Car pair, whereas in Figure 4C, the two paired conditions (BodyatCat and BodyCatat) elicit the highest response.

      - Related, simulations assume there exists some non-attended baseline state of each individual object representation, yet this isn't measured, and the way it's inferred to drive the simulations isn't clearly described.

      We agree that the simulations assume a non-attended baseline state, and that we did not measure that state empirically. We needed this non-attended response in the simulations to test which attention mechanism led to the observed results. Thus, we generated the non-attended response using the data reported in previous neural studies of object recognition and attention in the visual cortex (Ni et al., 2012, Bao and Tsao, 2018). Note that the simulations are checking for the profile of the modulations based on category distance. Thus, they do not need to exactly match the real isolated responses in order to show the effect of gain and tuning shift on the results. We include the clarification and the range of neural responses and attention parameters used in the simulations in the revised manuscript in lines 327-333:

      “To examine which attentional mechanism leads to the effects observed in the empirical data, we generated the neural response to unattended object stimuli as a baseline response in the absence of attention, using the data reported by neural studies of object recognition in the visual cortex (Ni et al., 2012, Bao and Tsao, 2018). Then, using an attention parameter for each neuron and different attentional mechanisms, we simulated the response of each neuron to the different task conditions in our experiment. Finally, we assessed the population response by averaging neural responses.”

      - Some of the simulation results seem to be algebraic (univariate; Fig. 7; multivariate, gain model; Fig. 8)

      This is correct. We have used algebraic equations for the effect of attention on neural responses in the simulations. In fact, thinking about the two models of gain and tuning shift leads to the algebraic equations, which in turn logically lead to the observed results if no noise is added to the data. The simulations are helpful for visualizing these logical conclusions. Also, after assigning different noise levels to each condition for each neuron, the results are no longer algebraic, as shown in the updated Figure 7 and Figure 8.

      - Cross-validation does not seem to be employed - strong/weak categories seem to be assigned based on the same data used for computing DVs of interest - to minimize the potential for circularity in analyses, it would be better to define preferred categories using separate data from that used to quantify - perhaps using a cross-validation scheme? This appears to be implemented in Reddy et al. (2009), a paper implementing a similar multivariate method and cited by the authors (their ref 6).

      Thank you for pointing out the missing details about how we used cross-validation. In the univariate analysis, we did use cross validation, defining preferred categories and calculating category distance on one half of the data and calculating the univariate shift on the other half of the data. Similarly, we employed cross-validation for the multivariate analysis by using one half of the data to calculate the multivariate distance between category pairs, and the other half of the data to calculate the weight shift for each category pair. We have now added this methodological information in the revised manuscript.
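
      The split-half logic described above can be sketched as follows. This is only an illustrative sketch, not the analysis code used in the study: the odd/even split of runs, the dictionary layout, the condition names (`x`, `y`, `attend_x`, `attend_y`) and the definition of the shift as a simple difference between the two paired conditions are all assumptions made for the example.

```python
from statistics import mean

def split_half(runs):
    """Split runs into two independent halves (odd/even split is an assumption)."""
    return runs[0::2], runs[1::2]

def cross_validated_shift(runs):
    """Define preference and distance on one half, measure the shift on the other."""
    define_half, measure_half = split_half(runs)

    # Half 1: isolated conditions only, used to label the preferred category
    # and to compute the (univariate) category distance.
    iso_x = mean(r["x"] for r in define_half)
    iso_y = mean(r["y"] for r in define_half)
    preferred, other = ("x", "y") if iso_x >= iso_y else ("y", "x")
    distance = abs(iso_x - iso_y)

    # Half 2: paired conditions only, used to compute the attentional shift
    # (hypothetical definition: attend-preferred minus attend-other).
    shift = (mean(r["attend_" + preferred] for r in measure_half)
             - mean(r["attend_" + other] for r in measure_half))
    return preferred, distance, shift

# Made-up responses of one voxel in four runs.
runs = [
    {"x": 2.0, "y": 1.0, "attend_x": 2.4, "attend_y": 1.8},
    {"x": 2.2, "y": 0.8, "attend_x": 2.6, "attend_y": 1.6},
    {"x": 1.8, "y": 1.2, "attend_x": 2.2, "attend_y": 2.0},
    {"x": 2.0, "y": 1.0, "attend_x": 2.5, "attend_y": 1.7},
]
preferred, distance, shift = cross_validated_shift(runs)
```

      Because the preference label and the distance never see the data used for the shift, noise that happens to favor one category in the first half cannot inflate the shift measured in the second half.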

      - Multivariate distance metric - why is correlation/cosine similarity used instead of something like Euclidean or Mahalanobis distance? Correlation/cosine similarity is scale-invariant, so changes in the magnitude of the vector would not change distance, despite this likely being an important data attribute to consider.

      Since we are considering response patterns as vectors in each ROI, there is no major difference between the two measures of similarity. Using Euclidean distance as a measure of distance (i.e. the inverse of similarity), we observed the same relationship between weight shift and Euclidean category distance. There was a positive correlation between weight shift and the Euclidean category distance in all ROIs (ps < 0.01, ts > 2.9) except for V1 (p = 0.5, t = 0.66). We include this information in the revised manuscript in the Results section lines 513-515:

      “We also calculated category distance based on the Euclidean distance between response patterns of category pairs and observed a similarly positive correlation between the weight shift and the Euclidean category distance in all ROIs (ps < 0.01, ts > 2.9) except V1 (p = 0.5, t = 0.66).”
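
      The reviewer's point about scale invariance can be illustrated with a small sketch. The response patterns below are made-up numbers, and the helper name `correlation_distance` is hypothetical; the sketch only shows that doubling one pattern leaves the correlation distance unchanged while changing the Euclidean distance.

```python
import math

def correlation_distance(u, v):
    """1 minus the Pearson correlation between two response patterns."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return 1.0 - num / den

# Made-up voxel patterns for two categories in one ROI.
body = [1.0, 2.0, 3.0, 4.0]
car = [2.0, 1.0, 4.0, 3.0]
car_scaled = [2.0 * c for c in car]  # same pattern, doubled amplitude

corr_d = correlation_distance(body, car)
corr_d_scaled = correlation_distance(body, car_scaled)  # unchanged by scaling
eucl_d = math.dist(body, car)
eucl_d_scaled = math.dist(body, car_scaled)             # changes with scaling
```

      Reporting both measures, as done above, therefore covers the case in which response magnitude carries information that a correlation-based distance would ignore.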

      - Details about simulations implemented (and their algebraic results in some cases) make it challenging to interpret or understand these results. E.g., the noise properties of the simulated data aren't disclosed, nor are precise (or approximate) values used for simulating attentional modulations.

      We clarify that the average response to each category was based on previous neurophysiology studies (Ni et al., 2012, Bao and Tsao, 2018). The attentional parameter was also chosen based on previous neurophysiology (Ni et al., 2012) and human fMRI (Doostani et al., 2023) studies of visual attention by randomly assigning a value in the range from 1 to 10. We have included the details in the Methods section in lines 357-366:

      “We simulated the action of the response gain model and the tuning sharpening model using numerical simulations. We composed a neural population of 4 × 10^5 neurons in equal proportions body-, car-, cat- or house-selective. Each neuron also responded to object categories other than its preferred category, but to a lesser degree and with variation. We chose neural responses to each stimulus from a normal distribution with a mean of 30 spikes/s and a standard deviation of 10, and each neuron was randomly assigned an attention factor in the range between 1 and 10 using a uniform distribution. These values are comparable with the values reported in neural studies of attention and object recognition in the ventral visual cortex (Ni et al. 2012, Bao and Tsao 2018). We also added Poisson noise to the response of each neuron (Britten et al. 1993), assigned randomly for each condition of each neuron.”

      - Eye movements do not seem to be controlled nor measured. Could it be possible that some stimulus pairs result in more discriminable patterns of eye movements? Could this be ruled out by some aspect of the results?

      Subjects were instructed to direct their gaze towards the fixation point. Given the variation in the pose and orientation of the stimuli, it is unlikely that eye movements would help with the task. Eye movements have been controlled in previous experiments with individual stimulus presentation (Xu and Vaziri-Pashkam, 2019) and across attentional tasks in which colored dots were superimposed on the stimuli (Vaziri-Pashkam and Xu, 2017) and no significant difference for eye movement across categories or conditions was observed. As such, we do not think that eye movements would play a role in the results we are observing here.

      - A central, and untested/verified, assumption is that the multivariate activation pattern associated with 2 overlapping stimuli (with one attended) can be modeled as a weighted combination of the activation pattern associated with the individual stimuli. There are hints in the univariate data (e.g., Fig. 4C; 4D) that this might not be justified, which somewhat calls into question the interpretability of the multivariate results.

      If the reviewer is referring to the higher response in the paired compared to the isolated conditions, as explained above, we have not forced any limit on the sum of the estimated weights to equal 1 or 2. Therefore, our model is an estimation of a linear combination of the two multivariate patterns in the isolated conditions. In fact, Reddy et al. (2009, reference 6) reported that while the combination is closer to a weighted average than to a weighted sum, the sum of the weights is on average larger than 1. In Figure 4C and 4D the responses in the paired conditions are higher than either of the isolated-condition responses. This suggests that the weights for the linear combination of isolated responses in the multivariate analysis should add up to more than one. This is what we find in our results. We have added a supplementary figure to Figure 6, depicting the sum of weights for different category pairs in all ROIs. The figure illustrates that in each ROI, the sum of weights is greater than 1 for some category pairs. It is however noteworthy that we normalized the weights in each condition by the sum of weights to calculate the weight shift in our analysis. The amount of the weight shift was therefore not affected by the absolute value of the weights.
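
      The effect of normalizing by the sum of weights can be sketched as below. The exact formula for the weight shift is not given in this response, so the definition used here (normalized weight of a stimulus when attended minus its normalized weight when unattended) and the function names are assumptions for illustration only.

```python
def normalized(w_x, w_y):
    """Divide each weight by the sum of the two weights."""
    total = w_x + w_y
    return w_x / total, w_y / total

def weight_shift(weights_attend_x, weights_attend_y):
    """Hypothetical weight shift for stimulus x.

    Each argument is a (w_x, w_y) pair estimated in one paired condition:
    the first with x attended, the second with y attended.
    """
    x_when_attended, _ = normalized(*weights_attend_x)
    x_when_unattended, _ = normalized(*weights_attend_y)
    return x_when_attended - x_when_unattended

# Same relative weighting, different sums of weights (1.0 versus 1.5).
shift_sum_1 = weight_shift((0.6, 0.4), (0.4, 0.6))
shift_sum_1_5 = weight_shift((0.9, 0.6), (0.6, 0.9))
```

      Under this definition both cases give the same shift, so a pair whose weights sum to more than 1 is not credited with a larger attentional effect merely because its paired responses are larger overall.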

      - Throughout the manuscript, the authors consistently refer to "tuning sharpening", an idea that's almost always used to reference changes in the width of tuning curves for specific feature dimensions (e.g., motion direction; hue; orientation; spatial position). Here, the authors are assaying tuning to the category (across exemplars of the category). The link between these concepts could be strengthened to improve the clarity of the manuscript.

      The reviewer brings up an excellent point. Whereas tuning curves have been extensively used for feature dimensions such as stimulus orientation or motion direction, here, we used the term to describe the variation in a neuron’s response to different object stimuli.

      With a finite set of object categories, as is the case in the current study, the neural response in object space is discrete, rather than a continuous curve illustrated for features such as stimulus orientation. However, since more preferred and less preferred features (objects in this case) can still be defined, we illustrated the neural response using a hypothetical curve in object space in Figure 3 to show how it relates with other stimulus features. Therefore, here, tuning sharpening refers to the fact that the response to the more preferred object categories has been enhanced while the response to the less preferred stimulus categories is suppressed.

      We clarify this point in the revised manuscript in the Discussion section lines 649-659:

      “While tuning curves are commonly used for feature dimensions such as stimulus orientation or motion direction, here, we used the term to describe the variation in a neuron’s response to different object stimuli. With a finite set of object categories, as is the case in the current study, the neural response in object space is discrete, rather than a continuous curve illustrated for features such as stimulus orientation. The neuron might have tuning for a particular feature such as curvature or spikiness (Bao et al., 2020) that is present to different degrees in our object stimuli in a continuous way, but we are not measuring this directly. Nevertheless, since more preferred and less preferred features (objects in this case) can still be defined, we illustrate the neural response using a hypothetical curve in object space. As such, here, tuning sharpening refers to the fact that the response to the more preferred object categories has been enhanced while the response to the less preferred stimulus categories is suppressed.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      a. The authors should address the apparent paradox noted above (and report whether it is seen in other regions of interest as well). On what model would the response to any pair of stimuli exceed that of the response to the preferred stimulus alone? This implies some kind of Gestalt interaction whereby the combined pair generates a percept that is even more effective for the voxels in question than the "most preferred" one?

      The response to a pair of stimuli can exceed the response to each of the stimuli presented in isolation if the voxel is responsive to both stimuli and as long as the voxel has not reached its saturation level. This phenomenon has been reported in many previous studies (Zoccolan et al., 2005, Reddy et al., 2009, Ni et al., 2012, Doostani et al., 2023) and can be modeled using a linear combination model which does not limit the weights of the isolated responses to equal 1 (Doostani et al., 2023). Note that the “most preferred” stimulus does not necessarily saturate the voxel response, thus the response to two stimuli could be more effective based on voxel responsiveness to the second stimulus.

      As for the current study, the labels “more preferred” and “less preferred” are only relatively defined (as explained in the Methods section), meaning that the more preferred stimulus is not necessarily the most preferred stimulus for the voxels. Furthermore, the presented stimuli are semi-transparent and presented with low-contrast, which moves the responses further away from the saturation level. Based on reported evidence for multiple-stimulus responses, responses to single stimuli are in many cases sublinearly added to yield the multiple-stimulus response (Zoccolan et al., 2005, Reddy et al., 2009, Doostani et al., 2023). This means that the multiple-stimulus response is lower than the sum of the isolated responses and not lower than each of the isolated responses. Therefore, it is not paradoxical to observe higher responses in paired conditions compared to the isolated conditions. We observe similar results in other ROIs, which we provide as supplementary figures to Figure 4 in the revised manuscript.

      We address this observation and similar reports in previous studies in the Results section of the revised manuscript in lines 409-413:

      “Note that the response in paired conditions can be higher or lower than the response to the isolated more preferred stimulus (condition Mat), depending on the voxel preference for the two presented stimuli, as previously reported (Doostani et al., 2023). This is consistent with previous studies reporting the response to multiple stimuli to be higher than the average, but lower than the sum of the response to isolated stimuli (Reddy et al., 2009).”

      b. Paradox aside, I wondered to what extent the results are in part explained by range limits. Take two categories that evoke a highly similar response (either mean over a full ROI, or in the multivariate sense). That imposes a range limit such that attentional modulation, if it works the way we think it does, could only move responses within that narrow range. In contrast, the starting point for two highly dissimilar categories leaves room in principle for more modulation.

      We do not believe that the results can be explained by range limits because responses in paired conditions are not limited by the isolated responses, as can be observed in Figure 4. However, to rule out the possibility of the similarity between responses in isolated conditions affecting the range within which responses in paired conditions can change, we turned to the multivariate analysis. We used the weight shift measure as the change in the weight of each stimulus with the change in the attentional target. In this method, no matter how close the two isolated vectors are, the response to the pair could still have a whole range of different weights of the isolated responses. We have plotted an example illustration of two-dimensional vectors for better clarification. Here, the vectors Vxat and Vyat denote the responses to the isolated x and y stimuli, respectively, and the vector Pxaty denotes the response to the paired condition in which stimulus x is attended. The weights a1 and a2 are illustrated in the figure, and are equal to the regression coefficients if we solve the equation Pxaty = [a1 a2] [Vxat Vyat]’, i.e. Pxaty = a1·Vxat + a2·Vyat. While the weight values depend on the amplitudes of and the angles between the three vectors, they are not limited by a smaller angle between Vxat and Vyat.
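
      A minimal sketch of the two-regressor fit described above, under the assumption that the weights are obtained by ordinary least squares; the function name `fit_weights` and the two-dimensional example vectors are hypothetical. It shows that the same weights are recovered whether the isolated patterns are orthogonal or nearly parallel.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_weights(p, v_x, v_y):
    """Least-squares weights (a1, a2) such that p is approximately a1*v_x + a2*v_y.

    Solves the 2x2 normal equations directly; v_x and v_y must not be collinear.
    """
    sxx, syy, sxy = dot(v_x, v_x), dot(v_y, v_y), dot(v_x, v_y)
    spx, spy = dot(p, v_x), dot(p, v_y)
    det = sxx * syy - sxy * sxy
    a1 = (spx * syy - spy * sxy) / det
    a2 = (spy * sxx - spx * sxy) / det
    return a1, a2

def combine(a1, a2, v_x, v_y):
    return [a1 * x + a2 * y for x, y in zip(v_x, v_y)]

# Dissimilar isolated patterns (orthogonal vectors).
far_x, far_y = [1.0, 0.0], [0.0, 1.0]
# Similar isolated patterns (small angle between the vectors).
near_x, near_y = [1.0, 0.0], [0.9, 0.1]

# Build paired responses with the same underlying weights in both cases.
far_fit = fit_weights(combine(0.8, 0.3, far_x, far_y), far_x, far_y)
near_fit = fit_weights(combine(0.8, 0.3, near_x, near_y), near_x, near_y)
```

      In practice a small angle makes the estimate more sensitive to noise, since the determinant shrinks, but it does not restrict the range of weight values that can be obtained.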

      We have updated Figure 2 in the manuscript to avoid the confusion. We have also added a figure including the sum of weights for different category pairs in different regions, showing that the sum of weights are not dependent on the similarity between the two stimuli. The conclusions based on the weight shift are therefore not confounded by the similarity between the two stimuli.

      c. Finally, related to the previous point, while including V1 is a good control, I wonder if it is getting a "fair" test here, because the range of responses to the four categories in this region, in terms of (dis)similarity, seems compressed relative to the other categories.

      We believe that V1 is getting a fair test because the single-subject range of category distance in V1 is similar to LO, as can be observed in Author response image 1:

      Author response image 1.

      Range of category distance in each ROI averaged across participants

      The reason that V1 is showing a more compressed distance range on the average plot is that the category distance in V1 is not consistent among participants. Although the average plots are shown in Figure 5 and Figure 6, we tested statistical significance in each ROI based on single-subject correlation coefficients.

      Please also note that a more compressed range of dissimilarity does not necessarily lead to a less strong effect of category distance on the effect of attention. For instance, while LO shows a more compressed dissimilarity range for the presented categories compared to the other object selective regions, it shows the highest correlation between weight shift and category distance. Furthermore, as illustrated in Figure 5, no significant correlation is observed between univariate shift and category distance in V1, even though the range of the univariate distance in V1 is similar to LO and pFs, where we observed a significant correlation between category distance and univariate shift.

      d. In general, the manuscript does a very good job explaining the methods of the study in a way that would allow replication. In some places, the authors could be clearer about the reasoning behind those methodological choices. For example: - How was the sample size determined?

      Estimating conservatively based on the smallest amount of attentional modulation we observed in a previous study (Doostani et al., 2023), we chose a medium effect size (0.3). For a power of 0.8, the minimum number of participants should be 16. We have added the explanation to the Methods section in lines 78-81:

      “We estimated the number of participants conservatively based on the smallest amount of attentional modulation observed in our previous study (Doostani et al., 2023). For a medium effect size of 0.3 and a power of 0.8, we needed a minimum number of 16 participants.”

      - Why did the authors choose those four categories? What was the evidence that would suggest these would span the range of similarities needed here?

      We chose these four categories based on a previous behavioral study reporting the average reaction time of participants when detecting a target from one category among distractors from another category (Xu and Vaziri-Pashkam, 2019). Ideally the experiment should include as many object categories as possible. However, since we were limited by the duration of the experiment, the number of conditions had to be controlled, leading to a maximum of 4 object categories. We chose two animate and two inanimate object categories to include categories that are more similar and more different based on previous behavioral results (Xu and Vaziri-Pashkam, 2019). We included body and house categories because they are both among the categories to which highly responsive regions exist in the cortex. We chose the two remaining categories based on their similarity to body and house stimuli. In this way, for each category there was another category that elicited similar cortical responses, and two categories that elicited different responses. While we acknowledge that the chosen categories do not fully span the range of similarities, they provide an observable variety of similarities in different ROIs which we find acceptable for the purposes of our study.

      We include this information in the Methods section of the revised manuscript in lines 89-94:

      “We included body and house categories because there are regions in the brain that are highly responsive and unresponsive to each of these categories, which provided us with a range of responsiveness in the visual cortex. We chose the two remaining categories based on previous behavioral results to include categories that provided us with a range of similarities (Xu and Vaziri-Pashkam, 2019). Thus, for each category there was a range of responsiveness in the brain and a range of similarity with the other categories.”

      - Why did the authors present the stimuli at the same location? This procedure has been adopted in previous studies, but of course, it does also move the stimulus situation away from the real-world examples of cluttered scenes that motivate the Introduction.

      We presented the stimuli at the same location because we aimed to study the mechanism of object-based attention and this experimental design helped us isolate it from spatial attention. We do not think that our design moves the stimulus situation away from real-world examples in such a way that our results are not generalizable. We include real-world instances, as well as a discussion on this point, in the Discussion section of the revised manuscript, in lines 611-620:

      “Although examples of superimposed cluttered stimuli are not very common in everyday life, they still do occur in certain situations, for example reading text on the cellphone screen in the presence of reflection and glare on the screen or looking at the street through a patterned window. Such instances recruit object-based attention which was the aim of this study, whereas in more common cases in which attended and unattended objects occupy different locations in space, both space-based and object-based attention may work together to resolve the competition between different stimuli. Here we chose to move away from usual everyday scenarios to study the effect of object-based attention in isolation. Future studies can reveal the effect of target-distractor similarity, i.e. proximity in space, on space-based attention and how the effects caused by object-based and space-based attention interact.”

      - While I'm not concerned about this (all relevant comparisons were within-participants) was there an initial attempt to compare data quality from the two different scanners?

      We compared the SNR values of the two groups of participants and observed no significant difference between these values (ps > 0.34, ts < 0.97). We have added this information to the Methods section.

      Regarding the observed effect, we performed a t-test between the results of the participants from the two scanners. For the univariate results, the observed correlation between univariate attentional modulation and category distance was not significantly different for participants of the two scanners in any ROIs (ps > 0.07, ts < 1.9). For the multivariate results, the observed correlation between the weight shift and multivariate category distance was not significantly different in any ROIs (ps > 0.48, ts < 0.71) except for V1 (p = 0.015, t = 2.75).

      We include a sentence about the comparison of the SNR values in the preprocessing section in the revised manuscript.

      e. There are a couple of analysis steps that could be applied to the existing data that might strengthen the findings. For one, the authors have adopted a liberal criterion of p < 0.001 uncorrected to include voxels within each ROI. Why, and to what extent is the general pattern of findings robust over more selective thresholds? Also, there are additional regions that are selective for bodies (fusiform body area) and scenes (occipital place area and retrosplenial cortex). Including these areas might provide more diversity of selectivity patterns (e.g. different responses to non-preferred categories) that would provide further tests of the hypothesis.

      We selected this threshold to allow for selection of a reasonable number of voxels in each hemisphere across all participants. To check whether the effect is robust over more selective thresholds, we exemplarily redefined the left EBA region using p < 0.0001 and p < 0.00001 and observed that the weight shift effect remained equivalent. We have made a note of this analysis in the Results section. As for the additional regions suggested by the reviewer, we chose not to include them because they could not be consistently defined in both hemispheres of all participants. Please note that the current ROIs also show different responses to non-preferred categories (e.g. in LO and pFs). We include this information in the Methods section in lines 206-207:

      “We selected this threshold to allow for selection of a reasonable number of voxels in each hemisphere across all participants.”

      And in the Results section in lines 509-512:

      “We performed the analysis including only voxels that had a significantly positive GLM coefficient across the runs and observed the same results. Moreover, to check whether the effect is robust over more selective thresholds for ROI definition, we redefined the left EBA region with p < 0.0001 and p < 0.00001 criteria. We observed a similar weight shift effect for both criteria.”

      f. One point the authors might address is the potential effect of blocking the paired conditions. If I understood right, the irrelevant item in each paired display was from the same category throughout a block. To what extent might this knowledge shape the way participants attend to the task-relevant item (e.g. by highlighting to them certain spatial frequencies or contours that might be useful in making that particular pairwise distinction)? In other words, are there theoretical reasons to expect different effects if the irrelevant category is not predictable?

      We believe that the participants’ knowledge about the distractor does not significantly affect our results because our results are in agreement with previous behavioral data (Cohen et al., 2014, Xu and Vaziri-Pashkam, 2019), in which the distractor could not be predicted. These reports suggest there is a theoretical reason to expect similar effects if the participants could not predict the distractor. To directly test this, one would need to perform an fMRI experiment using an event-related design, an interesting avenue for future research.

      We have made a note of this point in the Discussion section of the revised manuscript in lines 621-626:

      “Please note that we used a blocked design in which the target and distractor categories could be predicted across each block. While it is possible that the current design has led to an enhancement of the observed effect, previous behavioral data (Cohen et al., 2014, Xu and Vaziri-Pashkam, 2019) have reported the same effect in experiments in which the distractor was not predictable. To study the effect of predictability on fMRI responses, however, an event-related design is more appropriate, an interesting avenue for future fMRI studies.”

      g. The authors could provide behavioural data as a function of the specific category pairs. There is a clear prediction here about which pairs should be more or less difficult.

      We provide the behavioral data as a supplementary figure to Figure 1 in the revised manuscript. We however do not see differences in behavior for the different category pairs. This is because our fMRI task was designed to make sure the participants could properly attend to the target in all conditions. The task was rather easy across all conditions and, due to the ceiling effect, there was no significant difference between behavioral performance for different category pairs. However, the effect of category pair on behavior has been previously tested and reported in a visual search paradigm with the same categories (Xu and Vaziri-Pashkam, 2019), which was in fact the basis for our choice of categories in this study (as explained in response to point “d” above).

      h. Figure 4 shows data for EBA in detail; it would be helpful to have a similar presentation of the data for the other ROIs as well.

      We provide data for all ROIs as figure supplements 1-4 to Figure 4 in the revised manuscript.

      i. For the pFs and LOC ROIs, it would be helpful to have an indication of what proportion of voxels was most/least responsive to each of the four categories. Was this a relatively even balance, or generally favouring one of the categories?

      In LO, the proportion of voxels most responsive to each of the four categories was relatively even for Body (31%) and House (32%) stimuli, which was higher than the proportion of Car- and Cat-preferring voxels (18% and 19%, respectively). In pFs, 40% of the voxels were house-selective, while the proportion was relatively even for voxels most responsive to bodies, cars, and cats with 21%, 17%, and 22% of the voxels, respectively. We include the percentage of voxels most responsive to each of the four categories in each ROI as Appendix 1-table 1.

      j. Were the stimuli in the localisers the same as in the main experiment?

      No, we used different sets of stimuli for the localizers and the main experiment. We have added the information in line 146 of the Methods section.

      Reviewer #2 (Recommendations For The Authors):

      (1) Why are specific ROIs chosen? Perhaps some discussion motivating these choices, and addressing the possible overlap between these and retinotopic regions (based on other studies, or atlases - Wang et al, 2015) would be useful.

      Considering that we used object categories, we decided to look at general object-selective regions (LO, pFS) as well as regions that are highly selective for specific categories (EBA, PPA). We also looked at the primary visual cortex as a control region. We have added this clarification in the Methods section lines 128-133:

      “Considering that we used object categories, we investigated five different regions of interest (ROIs): the object-selective areas lateral occipital cortex (LO) and posterior fusiform (pFs) as general object-selective regions, the body-selective extrastriate body area (EBA) and the scene-selective parahippocampal place area (PPA) as regions that are highly selective for specific categories, and the primary visual cortex (V1) as a control region. We chose these regions because they could all be consistently defined in both hemispheres of all participants and included a large number of voxels.”

      (2) The authors should consider including data on the relative prevalence of voxels preferring each category for each ROI (and/or the mean activation level across voxels for each category for each ROI). If some ROIs have very few voxels preferring some categories, there's a chance the observed results are a bit noisy when sorting based on those categories (e.g., if a ROI has essentially no response to a given pair of categories, then there's not likely to be much attentional modulation detectable, because the ROI isn't driven by those categories to begin with).

      We thank the reviewer for the insightful comment.

      We include the percentage of voxels most responsive to each of the four categories in each ROI in the Appendix (Appendix 1-table 1; please see the answer to point “i” of the first reviewer).

      We also provide a table of average activity across voxels for each category in all ROIs as Appendix 1-table 2.

      As shown in the table, voxels show positive activity for all categories in all ROIs except for PPA, where voxels show no response to body and cat stimuli. This might explain why we observed a marginally significant correlation between weight shift and category distance in PPA only. As the reviewer mentions, since this region does not respond to body and cat stimuli, we do not observe a significant change in response due to the shift in attention for some pairs. We include the table in the Appendix and add the explanation to the Results section of the revised manuscript in lines 506-508:

      “Less significant results in PPA might arise from the fact that PPA shows no response to body and cat stimuli and little response to car stimuli (Appendix 1-table 2). Therefore, it is not possible to observe the effect of attention for all category pairs.”

      a. Related - would it make sense to screen voxels for inclusion in analysis based on above-baseline activation for one or both of the categories? [could, for example, imagine you're accidentally measuring from the motor cortex - you'd be able to perform this analysis, but it would be largely nonsensical because there's no established response to the stimuli in either isolated or combined states].

      We performed all the analyses including only voxels that had a significantly positive GLM coefficient across the runs, and the results remained the same. We have added the explanation in the Results section in lines 509-510.

      (3) Behavioral performance is compared against chance level, but it doesn't seem that 50% is chance for the detection task. The authors write on page 4 that the 1-back repetition occurred between 2-3 times per block, so it doesn't seem to be the case that each stimulus had a 50% chance of being a repetition of the previous one.

      We apologize for the mistake in our report. We have reported the detection rate for the target-present trials (2-3 per block), not the behavioral performance across all trials. We have modified the sentence in the Results section.

      (4) Authors mention that the stimuli are identical for 2-stimulus trials where each category is attended (for a given pair) - but the cue is different, and the cue appears as a centrally-fixated word for 1 s. Is this incorporated into the GLM? I can't imagine this would have much impact, but the strict statement that the goals of the participant are the only thing differentiating trials with otherwise-identical stimuli isn't quite true.

      The word cue was not incorporated as a separate predictor into the GLM. As the reviewer notes, the signals related to the cue and stimuli are mixed. But given that the cues are brief and in the form of words rather than images, they are unlikely to have an effect on the response in the regions of interest.

      To be more accurate, we have included the clarification in the Methods section in lines 181-182:

      “We did not enter the cue to the GLM as a predictor. The obtained voxel-wise coefficients for each condition are thus related to the cue and the stimuli presented in that condition.”

      And in the Results section in lines 425-428:

      “It is important to note that since the cue was not separately modeled in the GLM, the signals related to the cue and the stimuli were mixed. However, given that the cues were brief and presented in the form of words, they are unlikely to have an effect on the responses observed in the higher-level ROIs.”

      (5) Eq 5: I expected there to be some comparison of a and b directly as ratios (e.g., a_1 > b_1, as shown in Fig. 2). The equations used here should be walked through more carefully - it's very hard to understand what this analysis is actually accomplishing. I'm not sure I follow the explanation of relative weights given by the authors, nor how that maps onto the delta_W quantity in Equation 5.

      We provide a direct comparison of a and b, as well as a more thorough clarification of the analysis, in the Methods section in lines 274-276:

      “We first projected the paired vector on the plane defined by the isolated vectors (Figure 2A) and then determined the weight of each isolated vector in the projected vector (Figure 2B).”

      And in lines 286-297:

      “A higher a1 compared to a2 indicates that the paired response pattern is more similar to V_{x^at} compared to V_{y^at}, and vice versa. For instance, if we calculate the weights of the Body and Car stimuli in the paired response related to the simultaneous presentation of both stimuli, we can write in the LO region: V_{Body^at Car} = 0.81 V_{Body} + 0.31 V_{Car}, V_{Body Car^at} = 0.43 V_{Body} + 0.68 V_{Car}. Note that these weights are averaged across participants. As can be observed, in the presence of both body and car stimuli, the weight of each stimulus is higher when attended compared to the case when it is unattended. In other words, when attention shifts from body to car stimuli, the weight of the isolated body response (V_{Body}) decreases in the paired response. We can therefore observe that the response in the paired condition is more similar to the isolated body response pattern when body stimuli are attended and more similar to the isolated car response pattern when car stimuli are attended.”

      And lines 303-306:

      “As shown here, even when body stimuli are attended, the effect of the unattended car stimuli is still present in the response, shown in the weight of the isolated car response (0.31). However, this weight increases when attention shifts towards car stimuli (0.68 in the attended case).”

      We also provide more detailed clarification for the 𝛥w and the relative weights in lines 309-324:

      “To examine whether this increase in the weight of the attended stimulus was constant or depended on the similarity of the two stimuli in cortical representation, we defined the weight shift as the multivariate effect of attention:

      Δw = a1/(a1+a2) – b1/(b1+b2)    (5)

      Here, a1, a2, b1, and b2 are the weights of the isolated responses, estimated using Equation 4. We calculate the weight of the isolated x response once when attention is directed towards x (a1), and a second time when attention is directed towards y (b1). In each case, we calculate the relative weight of the isolated x in the paired response by dividing the weight of the isolated x by the sum of weights of x and y (a1+a2 when attention is directed towards x, and b1+b2 when attention is directed towards y). We then define the weight shift, Δw, as the change in the relative weight of the isolated x response in the paired response when attention shifts from x to y. A higher Δw for a category pair indicates that attention is more efficient in removing the effect of the unattended stimulus in the pair. We used relative weights as a normalized measure to compensate for the difference in the sum of weights for different category pairs. Thus, using the normalized measure, we calculated the share of each stimulus in the paired response. For instance, considering the Body-Car pair, the share of the body stimulus in the paired response was equal to 0.72 and 0.38, when body stimuli were attended and unattended, respectively. We then calculated the change in the share of each stimulus caused by the shift in attention using a simple subtraction (Equation 5: Δw=0.34 for the above example of the Body-Car pair in LO) and used this measure to compare between different pairs.”

      We hope that this clarification makes it easier to understand the multivariate analysis and the weight shift calculation in Equation 5.

      We additionally provide the values of the weights (a1, b1, a2, and b2 ) for each category pair averaged across participants as Appendix 1 -table 4.
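
      The projection and weight-shift steps described in this response can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' analysis code. It assumes that projecting the paired pattern onto the plane spanned by the two isolated patterns is equivalent to an ordinary least-squares fit without an intercept; the function names and the synthetic, noise-free patterns are hypothetical, and only the four weights (0.81, 0.31, 0.43, 0.68) are taken from the text.

```python
import numpy as np

def isolated_weights(v_pair, v_x, v_y):
    """Weights of two isolated patterns in a paired pattern.

    Solving v_pair ~ w_x * v_x + w_y * v_y by least squares gives the
    orthogonal projection of v_pair onto the plane spanned by v_x and v_y.
    """
    basis = np.column_stack([v_x, v_y])  # shape (n_voxels, 2)
    weights, *_ = np.linalg.lstsq(basis, v_pair, rcond=None)
    return float(weights[0]), float(weights[1])

def weight_shift(a1, a2, b1, b2):
    """Equation 5: change in the relative weight of x when attention moves from x to y."""
    return a1 / (a1 + a2) - b1 / (b1 + b2)

# Synthetic isolated patterns over 200 voxels (hypothetical data).
rng = np.random.default_rng(0)
v_body = rng.normal(size=200)
v_car = rng.normal(size=200)

# Paired patterns built from the group-averaged LO weights quoted above.
v_pair_body_attended = 0.81 * v_body + 0.31 * v_car
v_pair_car_attended = 0.43 * v_body + 0.68 * v_car

a1, a2 = isolated_weights(v_pair_body_attended, v_body, v_car)
b1, b2 = isolated_weights(v_pair_car_attended, v_body, v_car)
delta_w = weight_shift(a1, a2, b1, b2)
print(f"a1={a1:.2f}, a2={a2:.2f}, b1={b1:.2f}, b2={b2:.2f}, delta_w={delta_w:.2f}")
```

      Because the synthetic paired patterns lie exactly in the plane of the isolated patterns, the fit recovers the four weights, and the resulting weight shift rounds to the Δw = 0.34 quoted for the Body-Car pair in LO.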

      (6) For multivariate analyses (Fig. 6A-E), x axis is normalized (pattern distance based on Pearson correlation), while the delta_W does not seem to be similarly normalized.

      We calculated ΔW by dividing the weights in each condition by the sum of weights in that condition. Thus, we use relative weights which are always in the range of 0 to 1, and ΔW is thus always in the range of -1 to 1. This means that both axes are normalized. Note that even if one axis were not normalized, the relationship between the independent and the dependent variables would remain the same despite the change in the range of the axis.

      (7) Simulating additional scenarios like attention to both categories just increasing the mean response would be helpful - is this how one would capture results like those shown in some panels of Fig. 4?

      We did not have a condition in which participants were asked to attend to both categories. Therefore it was not useful for our simulations to include such a scenario. Please also note that the goal of our simulations is not to capture the exact amount of attentional modulation, but to investigate the effect of target-distractor similarity on the change in attentional modulation (univariate shift and weight shift).

      As for the results in some panels of Figure 4, we have explained the reason underlying higher responses in paired conditions compared to isolated conditions in our response to the “weaknesses” section of the second reviewer. We hope that these points satisfy the reviewer’s concern regarding the results in Figure 4 and our simulations.

      (8) Lines 271-276 - the "latter" and "former" are backwards here I think.

      We believe that the sentence was correct, but confusing. We have rephrased the sentence to avoid confusion in lines 371-376 of the revised manuscript:

      “We modeled two neural populations: a general object-selective population in which each voxel shows preference to a particular category and voxels with different preferences are mixed in with each other (similar to LO and pFS), and a category-selective population in which all voxels have a similar preference for a particular category (similar to EBA and PPA).”

      (9) Line 314 - "body-car" pair is mentioned twice in describing the non-significant result in PPA ROI.

      Thank you for catching the typo. We have changed the second Body-Car to Body-Cat.

      (10) Fig. 5 and Fig. 6 - I was expecting to see a plot that demonstrated variability across subjects rather than across category pairs. Would it be possible to show the distribution of each pair's datapoints across subjects, perhaps by coloring all (e.g.) body-car datapoints one color, all body-cat datapoints another, etc? This would also help readers better understand how category preferences (which differ across ROIs) impact the results.

      We demonstrated variability across category pairs rather than subjects because we aimed to investigate how the variation in the similarity between categories (i.e. category distance) affected the univariate and multivariate effects of attention. The variability across subjects is reflected in the error bars in the bar plots of Figure 5 and Figure 6.

      Here we show the distribution of each category pair’s data points across subjects by using a different color for each pair:

      Author response image 2.

      Univariate shift versus category distance including single-subject data points in all ROIs.

      Author response image 3.

      Weight shift versus category distance including single-subject data points in all ROIs.

      As can be observed in the figures, category preference has little impact on the results. Rather, the similarity in the preference (in the univariate case) or the response pattern (in the multivariate case) to the two presented categories is what impacts the amount of the univariate shift and the weight shift, respectively. For instance, in EBA we observe a low amount of attentional shift both for the Body-Cat pair, with two stimuli for which the ROI is highly selective, and the Car-House pair, including stimuli to which the region shows little response. A similar pattern is observed in the object-selective regions LO and pFs which show high responses to all stimulus categories.

      We believe that the figures including the data points related to all subjects are not strongly informative. However, we agree that using different colors for each category pair helps the readers better understand that category preference has little impact on the results in different ROIs. We therefore present the colored version of Figure 5 and Figure 6 in the revised manuscript, with a different color for each category pair.

      (11) Fig. 5 and Fig. 6 use R^2 as a dependent variable across participants to conclude a positive relationship. While the positive relationship is clear in the scatterplots, which depict averages across participants for each category pair, it could still be the case that there are a substantial number of participants with negative (but predictive, thus high positive R^2) slopes. For completeness and transparency, the authors should illustrate the average slope or regression coefficient for each of these analyses.

      We concluded the positive relationship and calculated the significance in Figure 5 and Figure 6 using the correlation r rather than r^2. This is why the result was not significantly positive in V1. We acknowledge that the use of r-squared in the bar plot leads to confusion. We have therefore changed the bar plots to show the correlation coefficient instead of the r-squared. Furthermore, we have added a table of the correlation coefficient for all participants in all ROIs for the univariate and weight shift analyses supplemental to Figure 5 and Figure 6, respectively.
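
      The reviewer's concern about r-squared can be made concrete with a small sketch. The data below are hypothetical and are not taken from the study: two simulated participants have equally tight but opposite relationships between category distance and attentional shift, so they share the same r-squared while differing in the sign of r and of the regression slope.

```python
import numpy as np

# Hypothetical category distances for six category pairs.
distance = np.array([0.10, 0.20, 0.30, 0.40, 0.50, 0.60])

# Two hypothetical participants with opposite linear relationships.
shift_increasing = 0.05 + 0.50 * distance
shift_decreasing = 0.40 - 0.50 * distance

def correlation_summary(x, y):
    """Pearson r, r-squared, and least-squares slope of y on x."""
    r = float(np.corrcoef(x, y)[0, 1])
    slope, _intercept = np.polyfit(x, y, 1)
    return {"r": r, "r_squared": r ** 2, "slope": float(slope)}

increasing = correlation_summary(distance, shift_increasing)
decreasing = correlation_summary(distance, shift_decreasing)
print(increasing)
print(decreasing)
```

      Reporting r, or the slope, therefore preserves the direction of the relationship that r-squared discards.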

      (12) No statement about data or analysis code availability is provided

      Thanks for pointing this out. The fMRI data is available on OSF. We have added a statement about it in the Data Availability section of the revised manuscript in line 669.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment #1: It is unclear how the fraction of NK cell populations is quantified in the spatial-seq datasets. Figures display spatial data with expression scores, but the method for calculating the score and determining NK cell presence in tumor tissue is ambiguous. Clarification is needed on whether the identification relied solely on visual inspection or if quantitative analyses using other criteria were conducted.

      Thank you for your questions. We removed the background and made the corresponding modifications as requested. We used the AddModuleScore function in Seurat to score the main immune subpopulations in the spatial-seq data, using the gene sets identified in the single-cell-seq data. The tumor and non-tumor regions were identified by immunohistochemistry as well as by the cell clusters in spatial-seq; this delineation is coarse, so we cannot precisely quantify NK cell presence in each region. Nevertheless, the difference in NK cell presence between tumor and non-tumor regions is observable by visual inspection. The methodology has been added to the revised manuscript (lines 190-193).

      Comment #2: The authors do not provide a clear definition of "resting" NK cells. It remains unclear whether they refer to a senescent state or a non-matured NK cell population. Furthermore, the criteria used to define resting and activated cells based on the expression of KIR2DL4, GPR183, GPR171, CD69, IFNG, GZMK, TTC38, CD160, and PLEKHF1 in Figure 4 are not well-defined. The expression patterns of these genes in Figure 4D are not distinct, and it is unclear which combination of genes was used to classify the populations. Clarification is needed on whether the presence of GZMK alone defines resting NK cells, or if the presence of any of the described genes (GZMK, TTC38, or CD160) is sufficient. Additionally, the method used for this classification, whether visual or algorithm-based, should be described.

      Thank you for your question. The resting and activated NK cells were defined by the preferential expression of the described resting NK genes (AZU, BPI, CAMP, CD160, CD2, CDHR1, CEACAM8, DEFA4, ELANE, GFI1, GZMK, KLRC4, MGAM, MS4A3, NME8, PLEKHF1, TEP1, TRBC1, TTC38, ZNF135) and activated NK genes (APOBEC3G, APOL6, CCL4, CCND2, CD69, CDK6, CSF2, DPP4, FASLG, GPR171, GPR18, GRAP2, IFNG, KIR2DL4, KIR2DS4, LTA, LTB, NCR3, OSM, PTGER2, SOCS1, TNFSF14) in CIBERSORT. In fact, these marker genes were not specifically expressed in a single NK cell subset. On the other hand, combined with further verification by flow cytometric analysis, the resting NK cells tend to resemble decidual-like NK cells and tumor-infiltrated NK cells, with higher expression of CD9, CD49a and PD-1.

      Comment #3: Criteria used to define high or low NK cell presence/infiltration in Figure 5 are not described in the main text or figure legend. Since, the claim that the presence of the resting or activated NK cells predicts cancer prognosis is based on this figure, this needs to be clearly described.

      Thank you for your questions. The percentages of activated and resting NK cells in TCGA and GSE29623 were determined by CIBERSORT. Additionally, the infiltration of activated and resting NK cells was scored with the AddModuleScore function using the gene sets of activated and resting NK cells identified in single-cell-seq, and the differences in the presence of activated and resting NK cells between tumor and non-tumor regions were also assessed by visual inspection. We have amended the main text and figure legend accordingly in the revised manuscript.

      Comment #4: The absence of FMO controls for KIR2DL4 or GZMK and the lack of increase in GZMK expression during co-culture with tumour lines raises concerns since GZMK was used as a defining feature of resting NK cells.

      Thank you for your questions. We performed a new batch of flow cytometry experiments, with FMO controls set up for all the markers used, to define the positive gate locations precisely.

      Author response image 1.

      The positive gate locations of CD56, GZMK, KIR2DL4, CD9, CD49a, PD-1 defined according to the FMO control.

      Comment #5: All the co-cultures were performed with tumour cell line only and no healthy cells, such as human foreskin fibroblasts, were used as control. In the absence of a non-tumour cell line, it is very difficult to draw any conclusions. Furthermore, to claim that resting or activated NK cells are responsible for tumour migration or proliferation, it is important to at least isolate resting and activated NK cells ex vivo and culture with tumour lines, instead of NK cell lines.

      Thank you for your questions. Following your suggestion, NK cells were co-cultured with human foreskin fibroblasts (HFF), and the phenotype was assessed by flow cytometry. When co-cultured with HFF in direct contact (CN group), NK cells also shifted towards a tissue-infiltrating state (high expression of CD9). However, this domestication effect was significantly weaker than when NK cells were co-cultured with tumor cells. Additionally, whereas supernatant from the NK and HCT direct-contact co-culture (CNS group) significantly increased the migration of fresh HCT cells, supernatant from the NK and HFF direct-contact co-culture (CNS group) produced only a limited increase in the migration of fresh HCT cells (no statistical significance was found), and no increase was seen with supernatant from NK cells cultured in HFF supernatant (SNS group) or in fresh medium (MNS group). Finally, we tried to isolate resting and activated NK cells from fresh colon cancer surgical specimens. Unfortunately, the NK cells recovered were too few to perform further functional experiments such as migration and proliferation assays.

      Author response image 2.

      Phenotype switch of NK cells in different co-culture systems and the corresponding NK cell-mediated effect on the migration of fresh colon cancer cells (HCT-116). A-B: NK cells underwent a phenotype switch (high expression of CD9) when co-cultured with HCT and HFF; the phenotype switch was more obvious when co-cultured with HCT. CN: NK cells co-cultured with HCT/HFF; SN: NK cells cultured with supernatant of HCT/HFF; MN: NK cells cultured in fresh medium. C-E: Transwell assay showed that only tumor co-cultured NK cells induced the migration of colon cancer cells (HCT-116). CNS: Colon cancer cells were cultured in the supernatant from the co-culture system in which NK and HCT/HFF were cultured in direct contact; SNS: Colon cancer cells were cultured in the supernatant from the co-culture system in which NK cells were cultured with supernatant of HCT/HFF; MNS: Colon cancer cells were cultured in fresh medium.

      Comment #6: It seems that flow cytometric analyses and GZMK and KIR2DL4 staining were performed without cell permeabilization. Could authors confirm if this is accurate, or if they performed intracellular staining instead?

      Thank you for your questions. For GZMK, which is known as a secretory protein, flow cytometric analyses were performed both with (Fig.3) and without cell fixation and permeabilization, and no significant differences were found among the groups in either case. However, GZMK was nearly all negative without fixation and permeabilization, whereas it was all positive with fixation and permeabilization. The flow cytometry conditions for GZMK may need further optimization, or GZMK may not be a suitable flow cytometric marker for resting NK cells. For membrane proteins such as CD56, CD9, CD49a, KIR2DL4 and PD-1, on the other hand, staining was performed without cell permeabilization.

      Author response image 3.

      Phenotype switch (CD56+, GZMK+) of NK cells was analyzed by FACS after fixation and permeabilization in the different co-culture groups. CN: NK cells co-cultured with colon cancer cells; SN: NK cells cultured with supernatant of cancer cells; MN: NK cells cultured in fresh medium.

      Comment #7: The identity of the published datasets used for analysis is not provided, and references are not cited in the results section.

      Thank you for your questions. We apologize for this omission. We have added the information in the revised manuscript (Materials and Methods section, lines 123-128).

      Comment #8: References are difficult to locate, as the main text follows APA style while the reference section is organized numerically with no clear order.

      Thank you for your questions. We have modified the format of the references in the revised manuscript.

      Comment #9: The volcano plots in Figure 3 showing DEGs between tumor and healthy tissue NK cells are not described clearly, and the authors did not discuss the significance of the genes highlighted in the plot.

      Thank you for your questions. The volcano plots in Figure 3 show the DEGs between colon cancer with metastasis and without metastasis in the TCGA database. We focused on the genes enriched in the pathway “Natural killer cell mediated cytotoxicity” and found that nearly all the genes enriched in this pathway were down-regulated in colon cancer with metastasis. We have modified the description in the Results section and added a description of the importance of these genes in the Discussion section of the revised manuscript (lines 322-326).

      Comment #10: The meaning of "M0" and "M1" in Figures 5A and 5B is unclear and should be defined in the text.

      Thank you for your questions. "M0" and "M1" in Figures 5A and 5B mean “colon cancer without metastasis” and “colon cancer with metastasis”, respectively. We have added these definitions in the revised manuscript (lines 350-354).

      Comment #11: Terms such as "dynamic remodelling of NK cells" and "landscape of NK cells" are used without explanation, necessitating clarification of their meaning.

      Thank you for your questions. We have clarified these terms in the revised manuscript (lines 331-334).

      Comment #12: In vitro assays are described vaguely, making it difficult for readers to understand. More clarity is needed in describing these assays.

      Thank you for your questions. We have added clarification in the revised manuscript (lines 205-211).

      Reviewer #2:

      Comment #1: This manuscript investigates the role of the abundant NK cells that are observed in colon cancer liver metastasis using sequencing and spatial approaches in an effort to clarify the pro and anti-tumorigenic properties of NK cells. This descriptive study characterises different categories of NK cells in tumor and tumor-adjacent tissues and some correlations. An attempt has been made using pseudotime trajectory analysis but no models around how these NK cells might be regulated are provided.

      Thank you for your questions. The single-cell sequencing data included in this study consist of CD45-positive immune cells and do not include tumor cells, so cellular communication analysis between NK cells and tumor cells cannot be conducted. The changes in NK cells can only be predicted through pseudotime trajectory analysis. Our hypothesis is that tumor cells domesticate NK cells into tumor-infiltrated NK cells through direct contact, and flow cytometry experiments have also confirmed that tumor cells can only exert such domestication through direct contact with NK cells (with prominently high expression of CD9). However, the detailed mechanism remains unclear.

      Comment #2: A small number of patients are analyzed in this study. The descriptive gene markers, while interesting, need to be further validated to understand how strong this analysis might be and its potential application.

      Thank you for your questions. The sample size included in this study is indeed somewhat small, which is a limitation of our study. However, this is the only large-sample single-cell sequencing dataset we could find that includes primary colon cancer tissues, paired paratumor normal colon tissues, paired liver metastatic cancer tissues, and paired paratumor normal liver tissues. We will expand the sample size to further verify the current conclusions in subsequent experiments. In addition, the marker genes of the different NK groups used in this study follow CIBERSORT's classification of activated NK cells and resting NK cells, which is a widely recognized indicator. We will verify the expression and clinical application value of the screened genes in tissues in subsequent studies.

      Comment #3: Figure 1C and other figures throughout the paper. It is not clear how marker genes were selected.

      Thank you for your questions. The marker genes displayed in Figure 3C were the highly variable genes of each cell group as well as the marker genes of each immune cell type, such as T cells (CD3D, CD3E), NK cells (NKG7, KLRD1), monocytes (LYZ, S100A8, S100A9), B cells (CD79A), plasma cells (JCHAIN, IGHA1, IGHA2), and neutrophils (CXCL8, FCGR3B).

      Comment #4: Figure 1E. P and T have not been defined. Lines should not connect the datasets as they are independent assessments.

      Thank you for your questions. P and T mean paratumor normal tissues and tumor tissues, respectively; these definitions have been added to the caption of Figure 1E. Additionally, the single-cell sequencing samples included in the study were paired, with primary colon cancer tissues, paired normal tissues adjacent to colon cancer, paired liver metastatic cancer tissues, and paired normal liver tissues from 20 colon cancer patients with liver metastasis. A paired test analysis was therefore performed.

      Comment #5: Figure 2C. It is unclear what ST-P1 means. This is not a particularly informative figure.

      Thank you for your questions. We apologize for the annotation error. The figure shows the spatial transcriptome of the primary colon cancer tissue and liver metastasis tissue of four patients. We have made the modifications in the revised manuscript.

      Comment #6: Multiple figures - abbreviations are used but not provided in the legend. They occur in the text but are not directly related to the figures where they are used to label axes or groups.

      Thank you for your questions. We have rechecked and made corresponding modifications in the revised manuscript.

      Comment #7: Patients: it is not clear what other drugs patients have been exposed to or basic data (sex, age, underlying conditions etc)

      Thank you for your questions. The baseline data of the patients in the SC dataset and the ST dataset are shown in Author response table 1 and Author response table 2 below, respectively. They were not presented previously because no analysis related to patient characteristics was performed in the current study.

      Author response table 1.

      Baseline data of the patients from the single-cell sequencing database.

      Author response table 2.

      Baseline data of the patients from the spatial transcriptome database.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the dynamics of a neural network model characterized by sparsely connected clusters of neuronal ensembles. They found that such a network could intrinsically generate sequence preplay and place maps, with properties like those observed in the real-world data. Strengths of the study include the computational model and data analysis supporting the hippocampal network mechanisms underlying sequence preplay of future experiences and place maps.

      Previous models of replay or theta sequences focused on circuit plasticity and usually required a pre-existing place map input from the external environment via upstream structures. However, those models failed to explain how networks support rapid sequential coding of novel environments or simply transferred the question to the upstream structure. On the contrary, the current proposed model required minimal spatial inputs and was aimed at elucidating how a preconfigured structure gave rise to preplay, thereby facilitating the sequential encoding of future novel environments.

      In this model, the fundamental units for spatial representation were clusters within the network. Sequential representation was achieved through the balance of cluster isolation and their partial overlap. Isolation resulted in a self-reinforced assembly representation, ensuring stable spatial coding. On the other hand, overlap induced activation transitions across clusters, enabling sequential coding.

      This study is important when considering that previous models mainly focused on plasticity and experience-related learning, while this model provided us with insights into how network architecture could support rapid sequential coding with large capacity, upon which learning could occur efficiently with modest modification via plasticity.

      I found this research very inspiring and, below, I provide some comments aimed at improving the manuscript. Some of these comments may extend beyond the scope of the current study, but I believe they raise important questions that should be addressed in this line of research.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail because, in its current form, it risks suggesting that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only be later mapped using those clusters. While this organization might indeed appear as randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would appear otherwise 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we have revised the text to clarify that the random clustering is random with respect to any future, novel environment (lines 111-114 and 710-712).

      Lines 111-114: “To reconcile these experimental results, we propose a model of intrinsic sequence generation based on randomly clustered recurrent connectivity, wherein place cells are connected within multiple overlapping clusters that are random with respect to any future, novel environment.”

      Lines 710-712: “Our results suggest that the preexisting hippocampal dynamics supporting preplay may reflect general properties arising from randomly clustered connectivity, where the randomness is with respect to any future, novel experience.”

      The cause of clustering could be prior experiences (e.g. Bourjaily and Miller, 2011) or developmental programming (e.g. Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022), and we have modified lines 116 and 714-718 to state this.

      Lines 116: Added citation of “Perin et al., 2011”

      Lines 714-718: “Synaptic plasticity in the recurrent connections of CA3 may primarily serve to reinforce and stabilize intrinsic dynamics, which could be established through a combination of developmental programming (Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022) and past experiences (Bourjaily and Miller, 2011), rather than creating spatial maps de novo.”

      We thank the reviewer for suggesting that the results of Liu et al., 2021 strengthen the support for our modeling motivations. We agree, and we now cite their finding that the hippocampal representations of novel environments emerged rapidly but were initially generic and showed greater discriminability from other environments with repeated experience in the environment (lines 130-134).

      Lines 130-134: “Further, such preexisting clusters may help explain the correlations that have been found in otherwise seemingly random remapping (Kinsky et al., 2018; Whittington et al., 2020) and support the rapid hippocampal representations of novel environments that are initially generic and become refined with experience (Liu et al., 2021).”

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pair A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Explanation 1 is correct: Our cluster-activation analyses (Figure 5) showed that the parameter values that generate preplay correspond to the parameter regions that support sustained cluster activity over multiple decoding time bins, which led us to the conclusion of the reviewer’s first proposed explanation.

      We have now added additional analyses supporting the conclusion that cluster-wise activity is the main driver of preplay rather than individual cell-identity (Figures 6 and 7). In Figure 6 we show that cluster-identity alone is sufficient to produce significant preplay by performing decoding after shuffling cell identity within clusters, and in Figure 7 we show that this result holds true when considering the sequence of spiking activity within population bursts rather than the spatial decoding.

      Lines 495-515: The pattern of preplay significance across the parameter grid in Figure 4f shows that preplay only occurs with modest cluster overlap, and the results of Figure 5 show that this corresponds to the parameter region that supports transient, isolated cluster-activation. This raises the question of whether cluster-identity is sufficient to explain preplay. To test this, we took the sleep simulation population burst events from the fiducial parameter set and performed decoding after shuffling cell identity in three different ways. We found that when the identities of all cells within a network are randomly permuted, the resulting median preplay correlation shift is centered about zero (t-test 95% confidence interval, -0.2018 to 0.0012) and preplay is not significant (distribution of p-values is consistent with a uniform distribution over 0 to 1, chi-square goodness-of-fit test p=0.4436, chi-square statistic=2.68; Figure 6a). However, performing decoding after randomly shuffling cell identity between cells that share membership in a cluster does result in statistically significant preplay for all shuffle replicates, although the magnitude of the median correlation shift is reduced for all shuffle replicates (Figure 6b). The shuffle in Figure 6b does not fully preserve each cell’s cluster identity because a cell that is in multiple clusters may be shuffled with either a cell in a single cluster or a cell in multiple clusters that are not identical. Performing decoding after doing within-cluster shuffling of only cells that are in a single cluster results in preplay statistics that are not statistically different from the unshuffled statistics (t-test relative to median shift of un-shuffled decoding, p=0.1724, 95% confidence interval of -0.0028 to 0.0150 relative to the reference value; Figure 6c). Together these results demonstrate that cluster-identity is sufficient to produce preplay.

      Lines 531-551: While cluster-identity is sufficient to produce preplay (Figure 6b), the shuffle of Figure 6c is incomplete in that cells belonging to more than one cluster are not shuffled. Together, these two shuffles leave room for the possibility that individual cell-identity may contribute to the production of preplay. It might be the case that some cells fire earlier than others, both on the track and within events. To test the contribution of individual cells to preplay, we calculated for all cells in all networks of the fiducial parameter point their mean relative spike rank and tested if this is correlated with the location of their mean place field density on the track (Figure 7). We find that there is no relationship between a cell’s mean relative within-event spike rank and its mean place field density on the track (Figure 7a). This is the case when the relative rank is calculated over the entire network (Figure 7, “Within-network”) and when the relative rank is calculated only with respect to cells with the same cluster membership (Figure 7, “Within-cluster”). However, because preplay events can proceed in either track direction, averaging over all events would average out the sequence order of these two opposite directions. We performed the same correlation but after reversing the spike order for events with a negative slope in the decoded trajectory (Figure 7b). To test the significance of this correlation, we performed a bootstrap significance test by comparing the slope of the linear regression to the slope that results when performing the same analysis after shuffling cell identities in the same manner as in Figure 6. We found that the linear regression slope is greater than expected relative to all three shuffling methods for both the within-network mean relative rank correlation (Figure 7c) and the within-cluster mean relative rank correlation (Figure 7d).
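The direction-corrected rank analysis described above can be illustrated with a minimal sketch. It assumes that a cell's rank within an event is taken from its first spike and is normalized to the range 0 to 1; the response does not specify either choice, and the function names and toy spike times are hypothetical.

```python
def relative_ranks(first_spike_times):
    """Rank cells by first-spike time within one event, scaled to [0, 1]."""
    cells = sorted(first_spike_times, key=first_spike_times.get)
    n = len(cells)
    return {c: (i / (n - 1) if n > 1 else 0.5) for i, c in enumerate(cells)}


def mean_relative_rank(events, slopes=None):
    """Average each cell's relative rank over all events it participates in.

    events: list of dicts {cell_id: first spike time within the event}
    slopes: optional list of decoded-trajectory slopes, one per event; ranks
            are inverted (rank -> 1 - rank) for events with a negative slope.
    """
    sums, counts = {}, {}
    for k, event in enumerate(events):
        flip = slopes is not None and slopes[k] < 0
        for cell, rank in relative_ranks(event).items():
            sums[cell] = sums.get(cell, 0.0) + (1.0 - rank if flip else rank)
            counts[cell] = counts.get(cell, 0) + 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Toy example (hypothetical spike times): one forward and one reverse event.
events = [{0: 0.01, 1: 0.02, 2: 0.03}, {0: 0.03, 1: 0.02, 2: 0.01}]
uncorrected = mean_relative_rank(events)
corrected = mean_relative_rank(events, slopes=[1.0, -1.0])
```

In this toy case the uncorrected mean ranks are identical for every cell because the two directions cancel, whereas the corrected ranks recover the underlying order. This is the averaging-out effect that motivates the inversion step.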

      Lines 980-1000:

      “Cell identity shuffled decoding

      We performed Bayesian decoding on the fiducial parameter set after shuffling cell identities in three different manners (Figures 6 and 7). To shuffle cells in a cluster-independent manner (“Across-network shuffle”), we randomly shuffled the identity of cells during the sleep simulations. To shuffle cells within clusters (“Within-cluster shuffle”), we randomly shuffled cell identity only between cells that shared membership in at least one cluster. To shuffle cells within only single clusters (“Within-single-cluster shuffle”), we shuffled cells in the same manner as the within-cluster shuffle but excluded any cells from the shuffle that were in multiple clusters.

      To test for a correlation between spike rank during sleep PBEs and the order of place fields on the track (Figure 7), we calculated for each excitatory cell in each network of the fiducial parameter set its mean relative spike rank and correlated that with the location of its mean place field density on the track (Figure 7a). To account for event directionality, we calculated the mean relative rank after inverting the rank within events that had a negatively sloped decoded trajectory (Figure 7b). We calculated mean relative rank for each cell relative to all cells in the network (“Within-network mean relative rank”) and relative to only cells that shared cluster membership with the cell (“Within-cluster mean relative rank”). We then compared the slope of the linear regression between mean relative rank and place field location against the slope that results when applying the same analysis to each of the three methods of cell identity shuffles for both the within-network regression (Figure 7c) and the within-cluster regression (Figure 7d).”
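The three cell-identity shuffles described above could be sketched as follows. This is one possible implementation, not the authors' code: the restricted shuffles are realized here as random disjoint pairwise swaps so that every cell is guaranteed to exchange identity only with a cell sharing a cluster, and the function name, the `membership` format, and the mode strings are hypothetical.

```python
import random


def shuffle_identities(membership, mode, seed=0):
    """Return a dict mapping each cell to the cell whose identity it takes.

    membership: dict {cell_id: frozenset of cluster ids the cell belongs to}
    mode: "across_network", "within_cluster", or "within_single_cluster"
    """
    rng = random.Random(seed)
    cells = list(membership)
    if mode == "across_network":
        # Cluster-independent shuffle: permute all identities.
        permuted = cells[:]
        rng.shuffle(permuted)
        return dict(zip(cells, permuted))
    if mode == "within_cluster":
        eligible = [c for c in cells if membership[c]]
    elif mode == "within_single_cluster":
        # Cells in more than one cluster are excluded and keep their identity.
        eligible = [c for c in cells if len(membership[c]) == 1]
    else:
        raise ValueError(f"unknown mode: {mode}")
    mapping = {c: c for c in cells}
    rng.shuffle(eligible)
    unpaired = set(eligible)
    for cell in eligible:
        if cell not in unpaired:
            continue
        unpaired.discard(cell)
        # Candidate partners must share at least one cluster with this cell.
        partners = sorted(p for p in unpaired if membership[cell] & membership[p])
        if partners:
            partner = rng.choice(partners)
            unpaired.discard(partner)
            mapping[cell], mapping[partner] = partner, cell
    return mapping


# Toy membership (hypothetical): cell 2 is in two clusters, cell 5 in none.
membership = {
    0: frozenset({0}), 1: frozenset({0}), 2: frozenset({0, 1}),
    3: frozenset({1}), 4: frozenset({1}), 5: frozenset(),
}
across = shuffle_identities(membership, "across_network")
within = shuffle_identities(membership, "within_cluster")
single = shuffle_identities(membership, "within_single_cluster")
```

Using disjoint swaps avoids a subtle issue with permuting each cluster in turn: successive permutations can carry an identity through a multi-cluster cell to a cell that shares no cluster with its original owner.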

      We also now show that the sequence of cluster-activation in events with 3 active clusters does not match the sequence of cluster biases on the track above chance levels and that events with fewer active clusters have the largest increase in median weighted decode correlation (Figure 5—figure supplement 1), showing that the reviewer’s second explanation is not the case.

      Lines 466-477: “The results of Figure 5 suggest that cluster-wise activation may be crucial to preplay. One possibility is that the random overlap of clusters in the network spontaneously produces biases in sequences of cluster activation which can be mapped onto any given environment. To test this, we looked at the pattern of cluster activations within events. We found that sequences of three active clusters were not more likely to match the track sequence than chance (Figure 5—figure supplement 1a). This suggests that preplay is not dependent on a particular biased pattern in the sequence of cluster activation. We then asked if the number of clusters that were active influenced preplay quality. We split the preplay events by the number of clusters that were active during each event and found that the median preplay shift relative to shuffled events with the same number of active clusters decreased with the number of active clusters (Spearman’s rank correlation, p=0.0019, ρ=-0.13; Figure 5—figure supplement 1b).”

      Lines 1025-1044:

      “Active cluster analysis

      To quantify cluster activation (Figure 5), we calculated the population rate for each cluster individually as the mean firing rate of all excitatory cells belonging to the cluster smoothed with a Gaussian kernel (15 ms standard deviation). A cluster was defined as ‘active’ if at any point its population rate exceeded twice that of any other cluster during a PBE. An active cluster’s duration of activation was defined as the duration for which it was the most active cluster.

      To test whether the sequence of activation in events with three active clusters matched the sequence of place fields on the track, we performed a bootstrap significance test (Figure 5—figure supplement 1). For all events from the fiducial parameter set that had three active clusters, we calculated the fraction in which the sequence of the active clusters matched the sequence of the clusters’ left vs right bias on the track in either direction. We then compared this fraction to the distribution expected from randomly sampling sequences of three clusters without replacement.

      To determine if there was a relationship between the number of active clusters within an event and its preplay quality, we performed a Spearman’s rank correlation between the number of active clusters and the normalized absolute weighted correlation across all events at the fiducial parameter set. The absolute weighted correlations were z-scored based on the absolute weighted correlations of the time-bin shuffled events that had the same number of active clusters.”
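The definition of an active cluster given above can be made concrete with a short sketch. It reads "exceeded twice that of any other cluster" as exceeding twice the rate of every other cluster at the same time point, which is an interpretation rather than something the response states. The 1 ms bin width, the array layout, and all function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_smooth(x, sigma_bins):
    """Smooth a 1D signal with a normalized Gaussian kernel (truncated at 4 sigma)."""
    half = int(np.ceil(4 * sigma_bins))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma_bins) ** 2)
    kernel /= kernel.sum()
    full = np.convolve(x, kernel)          # length len(x) + 2 * half
    return full[half:half + len(x)]        # keep the part aligned with x


def active_clusters(spikes, membership, dt=0.001, sigma=0.015):
    """Find active clusters in one population burst event.

    spikes: (n_cells, n_bins) array of spike counts for excitatory cells
    membership: (n_cells, n_clusters) boolean array of cluster membership
    Returns (list of active cluster indices, {cluster: duration in seconds}).
    """
    n_clusters = membership.shape[1]
    # Cluster population rate in Hz: mean over member cells, then smoothed.
    rates = np.stack([
        gaussian_smooth(spikes[membership[:, k]].mean(axis=0) / dt, sigma / dt)
        for k in range(n_clusters)
    ])
    most_active = rates.argmax(axis=0)
    active, duration = [], {}
    for k in range(n_clusters):
        others = np.delete(rates, k, axis=0).max(axis=0)
        if np.any(rates[k] > 2 * others):
            active.append(k)
            # Time during which this cluster has the highest (nonzero) rate.
            duration[k] = float(np.sum((most_active == k) & (rates[k] > 0)) * dt)
    return active, duration


# Toy event (hypothetical): clusters 0 and 1 burst in turn, cluster 2 is silent.
spikes = np.zeros((6, 400))
membership = np.zeros((6, 3), dtype=bool)
membership[0:2, 0] = membership[2:4, 1] = membership[4:6, 2] = True
spikes[0:2, 100:120] = 1
spikes[2:4, 250:270] = 1
active, duration = active_clusters(spikes, membership)
```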

      We also now add control simulations showing that without the cluster-dependent bias the population burst events no longer significantly decode as preplay (Figure 4—figure supplement 4e).

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.)

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed preplay is correlated with SWI, but maybe the correlation is caused by both of them being correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe the statement is not entirely appropriate: a network similar to a ring lattice can still balance cluster isolation and cluster overlap while having a small SWI due to long paths between some node pairs. Both cluster structure and long-range connections could contribute to SWI. The authors only discuss the necessity of cluster structure, but why long-range connections are important should also be discussed. I guess long-range connections could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We have added a figure illustrating different types of networks and their corresponding SWI (Figure 1—figure supplement 1) and a corresponding description in the main text (lines 228-234).

      Lines 228-234: “A ring lattice network (Figure 1—figure supplement 1a) exhibits high clustering but long path lengths between nodes on opposite sides of the ring. In contrast, a randomly connected network (Figure 1—figure supplement 1c) has short path lengths but lacks local clustered structure. A network with small world structure, such as a Watts-Strogatz network (Watts and Strogatz, 1998) or our randomly clustered model (Figure 1—figure supplement 1b), combines both clustered connectivity and short path lengths. In our clustered networks, for a fixed connection probability the SWI increases with more clusters and lower cluster participation…”
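The contrast between a ring lattice and a randomly connected network can be checked numerically with a small sketch. It does not compute the SWI itself. It only measures the two quantities that the passage above contrasts, the mean clustering coefficient and the mean shortest path length, and it uses undirected graphs for simplicity. The graph sizes and function names are illustrative assumptions.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Undirected ring in which each node links to its k nearest neighbours per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def random_graph(n, n_edges, seed=0):
    """Undirected graph with n_edges edges drawn uniformly at random."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in rng.sample(list(combinations(range(n), 2)), n_edges):
        adj[i].add(j)
        adj[j].add(i)
    return adj


def mean_clustering(adj):
    """Average fraction of each node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        deg = len(nbrs)
        if deg < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (deg * (deg - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring = ring_lattice(200, 4)          # 800 edges, degree 8
rand = random_graph(200, 800)        # same number of edges
ring_stats = (mean_clustering(ring), mean_path_length(ring))
rand_stats = (mean_clustering(rand), mean_path_length(rand))
```

With matched edge counts, the ring lattice should show much higher clustering and much longer paths than the random graph, which is the trade-off that a small-world network resolves.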

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We have modified lines 690-692 to clarify that that statement is specific to our model.

      Lines 690-692: “In our clustered network structure, such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index (SWI, Figure 1g; Neal, 2015; Neal, 2017).”

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      We have added a brief explanation of this in the text in lines 281-284.

      Lines 281-284: “During simulated sleep, sparse, stochastic spiking spontaneously generates sufficient excitement within the recurrent network to produce population burst events resembling preplay (Figure 2d-f)”

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Our clusters correspond to functional assemblies in that cells that share a cluster membership have more-similar place fields and are more likely to reactivate together during population burst events. In Author response image 1, we show for an example network at the fiducial parameter set the Pearson correlation between all pairs of place fields split by whether the cells share membership in a cluster (blue) or do not (red).

      Author response image 1.

      We expect an assembly analysis would identify assemblies similarly to the experimental data, but we see this additional analysis as a future direction. We have added a description of this correspondence in the text at lines 134-137.

      Lines 134-137: “Such clustered connectivity likely underlies the functional assemblies that have been observed in hippocampus, wherein groups of recorded cells have correlated activity that can be identified through independent component analysis (Peyrache et al., 2010; Farooq et al., 2019).”

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally. We report here preliminary results supporting this as an interesting future direction.

      Author response image 2.

      We performed a similar analysis to that reported in Figure 3C of Dragoi and Tonegawa, 2013. We determined the statistical significance of each event individually for each of the two environments by testing whether the decoded event’s absolute weighted correlation exceeded the 99th percentile of the corresponding shuffle events. We then fit a linear regression to the fraction of events that were significant for each of the two tracks and that were significant to either of the two tracks (left panel of above figure). We then estimated the track capacity as the number of tracks at the point where the linear regression reached 100% of the network capacity. We find that applying this analysis to our fiducial parameter set returns an estimate of ~8.6 tracks (Dragoi and Tonegawa, 2013, found ~15 tracks).
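The extrapolation step described above can be written out as a minimal sketch. It assumes an ordinary least-squares line with a free intercept fitted to three points (each track alone at x = 1, either track at x = 2); whether the original regression was set up exactly this way is not stated. The function name and the input fractions are hypothetical and are not results from the model.

```python
import numpy as np


def track_capacity(frac_each_track, frac_either):
    """Number of tracks at which the fitted line reaches a fraction of 1.

    frac_each_track: fractions of events significant for each single track
    frac_either: fraction of events significant for at least one of two tracks
    """
    x = np.array([1.0] * len(frac_each_track) + [2.0])
    y = np.array(list(frac_each_track) + [frac_either])
    slope, intercept = np.polyfit(x, y, 1)   # least-squares straight line
    return (1.0 - intercept) / slope


# Hypothetical fractions chosen only to make the arithmetic easy to follow:
# 10% of events significant per track and 20% for either track gives a line
# with slope 0.1 through the origin, so the capacity is 10 tracks.
capacity = track_capacity([0.10, 0.10], 0.20)
```

Because the line is extrapolated from only two distinct x values, small changes in the fitted fractions shift the estimate substantially, consistent with the caveat about accuracy noted in the response.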

      We performed this same analysis for each parameter point in our main parameter grid (right panel of above figure). The parameter region that produces significant preplay (Figure 4f) corresponds to the region that has a track capacity of approximately 8-25 tracks. In the parameter grid region that does not produce preplay, the estimated track capacity approaches the high values that this analysis would produce when applied to events that are significant only at the false-positive rate. This analysis is based on the assumption that each preplay event would significantly correspond to at least one future event. Interesting interpretation issues arise when applying this analysis to parameter regions that do not produce statistically significant preplay, which we leave to future directions to address.

      We note two differences between our analysis here and that in Dragoi and Tonegawa, 2013. First, their track capacity analysis was performed on spike sequences rather than decoded spatial sequences, which is the focus of our manuscript. Second, they recorded rats exploring three novel tracks, while in our manuscript we only simulated two novel tracks, which reduces the accuracy of our linear extrapolation of track capacity.

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a spiking network model with clustered neurons produces intrinsic spike sequences when driven with a ramping input, which are recapitulated in the absence of input. This behavior is only seen for some network parameters (neuron cluster participation and number of clusters in the network), which correspond to those that produce a small world network. By changing the strength of ramping input to each network cluster, the network can show different sequences.

      Strengths:

      A strength of the paper is the direct comparison between the properties of the model and neural data.

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm that the resulting tuning curves are a better representation of position or duration.

      We thank the reviewer for pointing out this important limitation. We see extensive testing of the time vs space coding properties of this network as a future direction, but we have performed simulations that demonstrate the robustness of place field coding to variations in traversal speeds and added the results as a supplemental figure (Figure 3—figure supplement 1).

      Lines 332-336: “To verify that our simulated place cells were more strongly coding for spatial location than for elapsed time, we performed simulations with additional track traversals at different speeds and compared the resulting place fields and time fields in the same cells. We find that there is significantly greater place information than time information (Figure 3—figure supplement 1).”

      Lines 835-841: “To compare coding for place vs time, we performed repeated simulations for the same networks at the fiducial parameter point with 1.0x and 2.0x of the original track traversal speed. We then combined all trials for both speed conditions to calculate both place fields and time fields for each cell from the same linear track traversal simulations. The place fields were calculated as described below (average firing rate within each of the fifty 2-cm long spatial bins across the track) and the time fields were similarly calculated but for fifty 40-ms time bins across the initial two seconds of all track traversals.”
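The place-field and time-field calculation described above might look like the following sketch for a single cell. The bin counts and widths follow the quoted text (fifty 2-cm spatial bins, fifty 40-ms time bins over two seconds), while the constant-speed traversal from position zero, the example speeds of 0.5 and 1.0 m/s, and the function name are assumptions made for illustration.

```python
import numpy as np

N_BINS = 50
TRACK_LENGTH = 1.0   # metres: fifty 2-cm spatial bins
TIME_WINDOW = 2.0    # seconds: fifty 40-ms time bins


def place_and_time_fields(trials):
    """Compute one cell's place field and time field (both in Hz).

    trials: list of (spike_times_in_s, speed_in_m_per_s); each traversal is
            assumed to start at position 0 and run at constant speed.
    """
    pos_edges = np.linspace(0.0, TRACK_LENGTH, N_BINS + 1)
    time_edges = np.linspace(0.0, TIME_WINDOW, N_BINS + 1)
    pos_counts = np.zeros(N_BINS)
    pos_occupancy = np.zeros(N_BINS)
    time_counts = np.zeros(N_BINS)
    time_occupancy = np.zeros(N_BINS)
    for spike_times, speed in trials:
        spike_times = np.asarray(spike_times, dtype=float)
        duration = TRACK_LENGTH / speed
        spike_times = spike_times[spike_times < duration]
        pos_counts += np.histogram(spike_times * speed, bins=pos_edges)[0]
        pos_occupancy += np.diff(pos_edges) / speed
        time_counts += np.histogram(spike_times, bins=time_edges)[0]
        # Time spent inside each time bin before the traversal ends.
        in_bin = np.minimum(time_edges[1:], duration) - time_edges[:-1]
        time_occupancy += np.clip(in_bin, 0.0, None)
    place_field = pos_counts / pos_occupancy
    with np.errstate(invalid="ignore", divide="ignore"):
        time_field = np.where(time_occupancy > 0, time_counts / time_occupancy, 0.0)
    return place_field, time_field


# Toy cell (hypothetical) that fires at 0.51 m regardless of running speed.
trials = [([1.02], 0.5), ([0.51], 1.0)]
place_field, time_field = place_and_time_fields(trials)
```

For this toy cell the place field has a single peak while the time field is split across two bins, which is the signature of position coding that the speed manipulation is designed to reveal.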

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to variations in feedforward input is important. We have added new simulation results (Figure 4—figure supplement 4) showing that the existence of preplay in our model is robust to variations in the form of input.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Lines 413-420: To test the robustness of our results to variations in input types, we simulated alternative forms of spatially modulated feedforward inputs. We found that with no parameter tuning or further modifications to the network, the model generates robust preplay with variations on the spatial inputs, including inputs of three linearly varying cues (Figure 4—figure supplement 4a) and two stepped cues (Figure 4—figure supplement 4b-c). The network is impaired in its ability to produce preplay with binary step location cues (Figure 4—figure supplement 4d), when there is no cluster bias (Figure 4—figure supplement 4e), and at greater values of cluster participation (Figure 4—figure supplement 4f).

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 9b (original Figure 7b) the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson et al., 1996; Pavlides, et al., 2019).

      Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The reason for a low level of correlation in the model is because cluster membership is combinatorial, whereby cells that share membership in one cluster can also belong to separate/distinct other clusters, rendering their activity less correlated than might be anticipated.

      We have added text at lines 627-630 clarifying these points.

      Lines 628-631: “Cells that share membership in a cluster will have some amount of correlation in their remapping due to the cluster-dependent cue bias, which is consistent with experimental results (Hampson et al., 1996; Pavlides et al., 2019), but the combinatorial nature of cluster membership renders the overall place field map correlations low (Figure 9b).”

      Reviewer #3 (Public Review):

      Summary:

      This work offers a novel perspective on the question of how hippocampal networks can adaptively generate different spatial maps and replays/preplays of the corresponding place cells, without any such maps pre-existing in the network architecture or its inputs. Unlike previous modeling attempts, the authors do not pre-tune their model neurons to any particular place fields. Instead, they build a random, moderately-clustered network of excitatory (and some inhibitory) cells, similar to CA3 architecture. By simulating spatial exploration through border-cell-like synaptic inputs, the model generates place cells for different "environments" without the need to reconfigure its synaptic connectivity or introduce plasticity. By simulating sleep-like random synaptic inputs, the model generates sequential activations of cells, mimicking preplays. These "preplays" require small-world connectivity, so that weakly connected cell clusters are activated in sequence. Using a set of electrophysiological recordings from CA1, the authors confirm that the modeled place cells and replays share many features with real ones. In summary, the model demonstrates that spontaneous activity within a small-world structured network can generate place cells and replays without the need for pre-configured maps.

      Strengths:

      This work addresses an important question in hippocampal dynamics. Namely, how can hippocampal networks quickly generate new place cells when a novel environment is introduced? And how can these place cells preplay their sequences even before the environment is experienced? Previous models required pre-existing spatial representations to be artificially introduced, limiting their adaptability to new environments. Other models depended on synaptic plasticity rules which made remapping slower than what is seen in recordings. This modeling work proposes that quickly-adaptive intrinsic spiking sequences (preplays) and spatially tuned spiking (place cells) can be generated in a network through randomly clustered recurrent connectivity and border-cell inputs, avoiding the need for pre-set spatial maps or plasticity rules. The proposal that small-world architecture is key for place cells and preplays to adapt to new spatial environments is novel and of potential interest to the computational and experimental community.

      The authors do a good job of thoroughly examining some of the features of their model, with a strong focus on excitatory cell connectivity. Perhaps the most valuable conclusion is that replays require the successive activation of different cell clusters. Small-world architecture is the optimal regime for such a controlled succession of activated clusters.

      The use of pre-existing electrophysiological data adds particular value to the model. The authors convincingly show that the simulated place cells and preplay events share many important features with those recorded in CA1 (though CA3 ones are similar).

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input. We show in Figure 4—figure supplement 4 that our preplay results are robust to several variations in the location-cue inputs. However, given that a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked place-field like input, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      We apologize for a lack of clarity if we have caused confusion about the type of inputs and if we implied an absence of spatially-tuned information in the network. In order for place fields to appear the network must receive spatial information, which we model as linearly-varying cues and illustrate in Figure 1b and describe in the caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683). Such input is not place-field like, as the small bias to any cell linearly decreases from one boundary of the track or the other.

      The cluster-dependent bias, which is also described in the same lines (Figure 1 caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683)), only affects the strength of the spatial cues that are present during simulated run periods. Crucially, this cluster-dependent bias is absent during sleep simulations when preplay occurs, which is why preplay can equally correlate with place field sequences in any context.

      We have modified the text (lines 207-210, 218, and 824-827) to clarify these points. We have also added results from a control simulation (Figure 4—figure supplement 4e) showing that preplay is not generated in the absence of the cluster-dependent bias.

      Lines 207-210: “This bias causes cells that share cluster memberships to have more similar place fields during the simulated run period, but, crucially, this bias is not present during sleep simulations so that there is no environment-specific information present when the network generates preplay.”

      Lines 218: “Second, to incorporate cluster-dependent correlations in place fields, a small…”

      Lines 824-827: “The addition of this bias produced correlations in cells’ spatial tunings based on cluster membership, but, importantly, this bias was not present during the sleep simulations, and it did not lead to high correlations of place-field maps between environments (Figure 9b).”

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells. We have added a brief discussion of this to the manuscript.

      Lines 733-739: “Additionally, the in vivo microcircuitry of CA3 is complex and includes aspects such as nonlinear dendritic computations and a variety of inhibitory cell types (Rebola et al., 2017). This microcircuitry is crucial for explaining certain aspects of hippocampal function, such as ripple and gamma oscillogenesis (Ramirez-Villegas et al., 2017), but here we have focused on a minimal model that is sufficient to produce place cell spiking activity that is consistent with experimentally measured place field and preplay statistics.”

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we have modified lines 114-116 to address the clustered connectivity reported in CA3.

      Lines 114-116: “Such clustering is a common motif across the brain, including the CA3 region of the hippocampus (Guzman et al., 2016) as well as cortex (Song et al., 2005), …”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Based on Figure 3, the place fields are not uniformly distributed in the maze. Meanwhile, based on Figure 1b and Methods, the total input seems to be uniform across the maze. Why does the uniform total external input lead to nonuniform network activities?

      While the total input to the network is constant across the maze, the input to any individual cell can peak only at either end of the track. All excitatory cells receive input from both the left-cue and the right-cue with different input strengths. By chance and due to the cluster-dependent bias some cells will have stronger input from one cue than the other and will therefore be more likely to have a place field toward that side of the track. However, no cell receives a peak of input in the center of the track. We have modified lines 141-143 to clarify this.

      Lines 141-143: “While the total input to the network is constant as a function of position, each cell only receives a peak in its spatially linearly varying feedforward input at one end of the track.”
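      The distinction between constant total input and end-peaked per-cell input can be illustrated with a minimal sketch. This is not the model code: the function name `cue_drive`, the weight range, the track discretization, and the trick of reusing the left-cue weights (shifted by one cell) as right-cue weights to make the summed input exactly flat are all assumptions made for illustration.

```python
import numpy as np

def cue_drive(positions, w_left, w_right):
    """Feedforward drive to each cell at each position on a unit-length track.

    positions: shape (P,), values in [0, 1]
    w_left, w_right: hypothetical per-cell cue weights, shape (N,)
    Returns an array of shape (N, P).
    """
    left_cue = 1.0 - positions   # strongest at the left end, decays linearly
    right_cue = positions        # strongest at the right end, grows linearly
    return np.outer(w_left, left_cue) + np.outer(w_right, right_cue)

rng = np.random.default_rng(0)
positions = np.linspace(0.0, 1.0, 101)
w_left = rng.uniform(0.5, 1.5, size=50)
# Assumption: right-cue weights are the left-cue weights shifted by one cell,
# so both cues carry the same summed weight across the population.
w_right = np.roll(w_left, 1)

drive = cue_drive(positions, w_left, w_right)
total = drive.sum(axis=0)          # summed input at each position
peak_bins = drive.argmax(axis=1)   # position bin of each cell's maximal input
```

      In this sketch `total` is flat across the track, while every entry of `peak_bins` is the first or last position bin, so no cell receives its maximal feedforward input in the interior of the track.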

      (2) I find these sentences confusing: "...we expected that the set of spiking events that significantly decode to linear trajectories in one environment (Figure 4) should decode with a similar fidelity in another environment..." (Lines 513-515) and "As expected... but not with the place fields of trajectories from different environments (Figure 7c)" (Line 517-520). What is the expectation for cross-environment decoding? Should they be similar or different? Also, in Figure 7c, the example is not fully convincing. In the figure caption, it states that decoding is significant in the top row but not in the bottom row, but they look similar across rows.

      Original lines 513-515 refer to the entire set of events, while original lines 517-520 refer to one example event. The sleep events are simulated without any track-specific information present, so the degree to which preplay occurs when decoding based on the place fields of a specific future track should be independent of any particular track when considering the entire set of decoded PBEs, as shown in Figure 9d (original Figure 7). However, because there is strong remapping across tracks (Figure 9b), an individual event that shows a strong decoded trajectory based on the place fields of one track (Figure 9c, top row) should show chance levels of a decoded trajectory when decoded with the place fields of an alternative track (Figure 9c, bottom row).

      We have revised lines 643-650 for clarity, and we have added statistics for the events shown in Figure 9c.

      Lines 644-651: “Since the place field map correlations are high for trajectories on the same track and near zero for trajectories on different tracks, any individual event would be expected to have similar decoded trajectories when decoding based on the place fields from different trajectories in the same environment and dissimilar decoded trajectories when decoding based on place fields from different environments. A given event with a strong decoded trajectory based on the place fields of one environment would then be expected to have a weaker decoded trajectory when decoded with place fields from an alternative environment (Figure 9c).”

      Lines 604-608: “(c) An example event with a statistically significant trajectory when decoded with place fields from Env. 1 left (absolute correlation at the 99th percentile of time-bin shuffles) but not when decoded with place fields of the other trajectories (78th, 45th, and 63rd percentiles, for Env. 1 right, Env. 2 left, and Env. 2 right, respectively).”

      (3) In Methods, the equation at line 610, E in the last term should be E_ext.

      We modeled the feedforward inputs as excitatory connections with the same reversal potential as the recurrent excitatory connections, so  is the proper value.

      (4) Equation line 617 states that conductances follow exponential decay, but the initial conductances of g_I, g_E, and g_SRA are not specified.

      We have added a description of the initial values in lines 760-764.

      Lines 760-764: “Initial feed-forward input conductances were set to values approximating their steady-state values by randomly selecting values from a Gaussian with a mean of   and a standard deviation of . Initial values of the recurrent conductances and the SRA conductance were set to zero.”

      (5) In the parameter table below line 647, W_E-E, W_E-I, and W_I-E are not described in the text.

      We have clarified in lines 757-760 that the step increase in conductance corresponds to these parameter values.

      Lines 757-760: “A step increase in conductance occurs at the time of each spike by an amount corresponding to the connection strength for each synapse ( for E-to-E connections, for E-to-I connections, and  for I-to-E connections), or by  for .”
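      The conductance rule described in these two passages (exponential decay between spikes, a step increase at each spike, and a specified initial value) can be sketched as follows. The function name `simulate_conductance` and every numerical value used with it are placeholders chosen for illustration, not the parameters of the model.

```python
import numpy as np

def simulate_conductance(spike_times, step, tau, dt, t_end, g0=0.0):
    """Conductance trace with exponential decay and a step increase per spike.

    spike_times: presynaptic spike times in seconds
    step: conductance increment per spike (the connection strength)
    tau: decay time constant in seconds
    g0: initial conductance (zero here, as stated for the recurrent
        and SRA conductances)
    """
    n_steps = int(round(t_end / dt))
    spike_steps = {int(round(t / dt)) for t in spike_times}
    decay = np.exp(-dt / tau)       # exact decay factor over one time step
    g = np.empty(n_steps + 1)
    g[0] = g0
    for k in range(1, n_steps + 1):
        g[k] = g[k - 1] * decay     # exponential decay toward zero
        if k in spike_steps:
            g[k] += step            # step increase at the time of a spike
    return g

# Hypothetical example: one spike at 10 ms, 5 ms decay constant.
trace = simulate_conductance([0.010], step=1.0, tau=0.005, dt=1e-4, t_end=0.020)
```

      One time constant after the spike, the trace has fallen to `step * exp(-1)`, which is a quick check that the decay factor is applied per time step as intended.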

      (6) On line 660, "...Each environment and the sleep session had unique context cue input weights...". Does that mean that within a sleep session, the network received the same context input? How strongly are the sleep dynamics driven by that context input rather than by intrinsic dynamics? Usually, sleep activity is high dimensional, what would happen if the input during sleep is more stochastic?

      Yes, within a sleep session each network receives a single set of context inputs, which are implemented as independent Poisson spike trains (so being independent, in small time-windows the dimensionality is equal to the number of neurons). The effects of any particular set of sleep context cue inputs should be minor, since the standard deviation of the input weights, , is small. Further, because the preplay analysis is performed across many networks at each parameter point, the observation of preplay is independent of any particular realization of either the recurrent network or the sleep context inputs.

      Further exploring the effects of more biophysically realistic neural dynamics during simulated sleep is an interesting future direction.

      (7) One bracket is missing in the denominator in line 831.

      We have fixed this error.

      Line 1005: “)” -> “()”

      Reviewer #2 (Recommendations For The Authors):

      - I would suggest the authors cite Chenkov et al 2017, PLOS Comp Bio, in which "replay" sequences were produced in clustered networks, and discuss how their work differs.

      We have included a contrast of our model to that of Chenkov et al., 2017 in lines 73-78.

      Lines 73-78: “Related to replay models based on place-field distance-dependent connectivity is the broader class of synfire-chain-like models. In these models, neurons (or clusters of neurons) are connected in a 1-dimensional feed-forward manner (Diesmann et al., 1999; Chenkov et al., 2017). The classic idea of a synfire-chain has been extended to include recurrent connections, such as by Chenkov et al., 2017; however, such models still rely on an underlying 1-dimensional sequence of activity propagation.”

      - Figure legend 2e says "replay", should be "preplay".

      We have fixed this error.

      Line 255: “(e) Example preplay event…”

      - How much does the context cue affect the result? e.g. Is sleep notably different with different sleep context cues?

      As discussed above in our response to Reviewer 1, the context cue weights have a small standard deviation, , which means that differences in the effects of different realizations of the context inputs are small. Different sets of context cues will cause cells to have slightly higher or lower spiking rates during sleep simulations, but because there is no correlation between the sleep context cue and the place field simulations there should be no effect on preplay quality.

      - Figure 4 should include a control with a single cluster.

      We thank the reviewer for this suggestion and have added additional control simulations.

      In our model, the recurrent structure of a network with a single cluster is equivalent to a cluster-less random network. Additionally, any network where cluster participation equals the number of clusters is equivalent to a cluster-less random network, since all neurons belong to all clusters and can therefore potentially connect to any other neuron. Such a condition corresponds to a diagonal boundary where the number of clusters equals the cluster participation, which occurs at higher values of cluster participation than we had shown in our primary parameter grid.

      We now include simulation results that extend to this boundary, corresponding to cluster-less networks (Figure 4—figure supplement 4f). Networks at these parameter points do not show preplay. See our earlier response for the new text associated with Figure 4—figure supplement 4.
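      The equivalence between full cluster participation and a cluster-less random network can be checked with a small sketch. It assumes, for simplicity, that every cell belongs to exactly `participation` clusters (an integer) and that a connection is permitted only between cells sharing at least one cluster; the function name `shared_cluster_mask` and the network sizes are illustrative and not taken from the model.

```python
import numpy as np

def shared_cluster_mask(n_cells, n_clusters, participation, rng):
    """Boolean (n_cells, n_cells) mask of cell pairs sharing at least one cluster."""
    membership = np.zeros((n_cells, n_clusters), dtype=int)
    for i in range(n_cells):
        # Assumption: each cell joins exactly `participation` distinct clusters.
        chosen = rng.choice(n_clusters, size=participation, replace=False)
        membership[i, chosen] = 1
    shared = (membership @ membership.T) > 0   # count of shared clusters per pair
    np.fill_diagonal(shared, False)            # no self-connections
    return shared

rng = np.random.default_rng(1)
n_cells = 30
clustered = shared_cluster_mask(n_cells, n_clusters=5, participation=1, rng=rng)
full = shared_cluster_mask(n_cells, n_clusters=5, participation=5, rng=rng)
single = shared_cluster_mask(n_cells, n_clusters=1, participation=1, rng=rng)
```

      With participation equal to the number of clusters, or with a single cluster, every off-diagonal pair is eligible for connection, so connectivity drawn from the mask is indistinguishable from an unclustered random network. With lower participation, only a subset of pairs is eligible.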

      - The results of Figure 4 are very noisy. I would recommend increasing the sampling, both in terms of the number of population events in each condition and the number of conditions.

      We have run simulations for longer durations (300 seconds) and with more networks (20) to produce more accurate empirical values for the statistics calculated across the parameter grids in Figures 3 and 4. Our additional simulations (Figure 4—figure supplement 4) provide support that the parameter region of preplay significance is reliable.

      Lines 831-833: “For the parameter grids in Figures 3 and 4 we simulated 20 networks with 300 s long sleep sessions in order to get more precise empirical estimates of the simulation statistics.”

      - It's not entirely clear what's different between the analysis described in lines 334-353, and the preplay analysis in Figure 2. In general, the description of this result was difficult to follow, as it included a lot of text that would be better served in the methods.

      In Figure 2 we first introduce the Bayesian decoding method, but it is not until Figure 4 that the shuffle-based significance testing is first introduced. We have simplified the description of the shuffle comparison in lines 371-375 and now refer the reader to the methods for details.

      Lines 371-375: “We find significant preplay in both our reference experimental data set (Shin et al., 2019; Figure 4a, b; see Figure 4—figure supplement 1 for example events) and our model (Figure 4c, d) when analyzed by the same methods as Farooq et al., 2019, wherein the significance of preplay is determined relative to time-bin shuffled events (see Methods). For each detected event we calculated its absolute weighted correlation. We then generated 100 time-bin shuffles of each event, and for each shuffle recalculated the absolute weighted correlation to generate a null distribution of absolute weighted correlations.”
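      The shuffle comparison described in this passage can be sketched as follows. This is a minimal illustration, not the analysis code: it assumes the decoded event is available as a position-by-time posterior matrix, and the function names `weighted_correlation` and `shuffle_percentile` are hypothetical. Only the absolute weighted correlation, the time-bin shuffle, and the count of 100 shuffles come from the text.

```python
import numpy as np

def weighted_correlation(posterior):
    """Correlation between position and time, weighted by decoded probability.

    posterior: non-negative array of shape (n_positions, n_time_bins).
    """
    n_pos, n_t = posterior.shape
    x = np.arange(n_pos)[:, None]
    t = np.arange(n_t)[None, :]
    w = posterior / posterior.sum()
    mean_x = (w * x).sum()
    mean_t = (w * t).sum()
    cov_xt = (w * (x - mean_x) * (t - mean_t)).sum()
    var_x = (w * (x - mean_x) ** 2).sum()
    var_t = (w * (t - mean_t) ** 2).sum()
    return cov_xt / np.sqrt(var_x * var_t)

def shuffle_percentile(posterior, n_shuffles=100, rng=None):
    """Absolute weighted correlation of an event and its percentile among
    time-bin shuffles of the same event."""
    rng = np.random.default_rng() if rng is None else rng
    actual = abs(weighted_correlation(posterior))
    n_t = posterior.shape[1]
    null = np.array([
        abs(weighted_correlation(posterior[:, rng.permutation(n_t)]))
        for _ in range(n_shuffles)
    ])
    return actual, 100.0 * np.mean(null < actual)
```

      An event whose decoded position advances steadily across time bins yields a high absolute weighted correlation that shuffling the time bins destroys, so it lands in the upper tail of its own null distribution.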

      - Many of the figures have low text resolution (e.g. Figure 6).

      We have now fixed this.

      - How does the clustered small world network compare to e.g. a small world ring network as used in Watts and Strogatz 1998?

      As described in our above response to Reviewer 1's fourth point, we have added a supplementary figure (Figure 1—figure supplement 1, with corresponding text) comparing our model with the Watts-Strogatz model.

      Reviewer #3 (Recommendations For The Authors):

      Figure 5 would benefit from a plot of the overlap of activated clusters per event.

      In our cluster activation analysis in Figure 5, we defined a cluster as “active” if at any point in the event its population rate was twice that of any other cluster. We used this definition—which permits no overlap of activated clusters—rather than a definition based on a z-scoring of the rate, because we determined that preplay required periods of spiking dominated by individual clusters.

      Author response image 3.

      The choice of such a definition is supported by our observation that most spiking activity within an event is dominated by whichever cluster is most active at each point in time. In the left panel of the above figure we show the distribution of the average fraction of spikes within each event that came from the most active cluster at each point in time. The right panel shows the distribution of the average across time within each event of the ratio of the population activity rate of the most active cluster to the second most active cluster. The data for both panels comes from all events at the fiducial parameter set.
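      The “active cluster” criterion described above can be sketched in a few lines. The sketch assumes per-cluster population rates have already been computed in time bins within an event, and it reads “twice that of any other cluster” as at least twice the rate of the second most active cluster. The function name `active_clusters` and the example rates are hypothetical.

```python
import numpy as np

def active_clusters(rates, ratio=2.0):
    """Indices of clusters that dominate the population rate at some time bin.

    rates: array of shape (n_clusters, n_time_bins), with n_clusters >= 2.
    A cluster counts as active if, in at least one bin, its rate is nonzero
    and at least `ratio` times the rate of every other cluster.
    """
    ordered = np.sort(rates, axis=0)
    top, second = ordered[-1], ordered[-2]
    leader = np.argmax(rates, axis=0)              # most active cluster per bin
    dominant = (top > 0) & (top >= ratio * second)
    return np.unique(leader[dominant])

# Hypothetical event: cluster 0 dominates bin 0, nothing dominates bin 1,
# and cluster 1 dominates bin 2.
example_rates = np.array([[10.0, 1.0, 1.0],
                          [2.0, 1.0, 9.0],
                          [1.0, 1.0, 2.0]])
example_active = active_clusters(example_rates)
```

      Because at most one cluster can satisfy the criterion in any given bin, this definition cannot mark two clusters as simultaneously active, which is the no-overlap property noted in the response.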

      Author response image 4.

      Rather than overlapping at a given moment in time, clusters might have overlap in their probability of being active at some point within an event. We do find that there is a small but significant correlation in cluster co-activation. For each network we calculated the activation correlation across events for each pair of clusters (example network shown in the left panel). We compared the distribution of resulting absolute correlations against the values that result after shuffling the correlations between cluster activations (right panel, all correlations for all networks from the fiducial parameter point).

      Figures 4e/f are referred to as 4c/d in the text (pg 14).

      We have fixed this error.

      Lines 400-412: “4c” -> “4e” and “4d” -> “4f”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The Notch signaling pathway plays an important role in many developmental and disease processes. Although well-studied there remain many puzzling aspects. One is the fact that as well as activating the receptor through trans-activation, the transmembrane ligands can interact with receptors present in the same cell. These cis-interactions are usually inhibitory, but in some cases, as in the assays used here, they may also be activating. With a total of 6 ligands and 4 receptors, there is potentially a wide array of possible outcomes when different combinations are co-expressed in vivo. Here the authors set out to make a systematic analysis of the qualitative and quantitative differences in the signaling output from different receptor-ligand combinations, generating sets of "signaling" (ligand expressing) and "receiving" (receptor +/- ligand expressing cells).

      The readout of pathway activity is transcriptional, relying on the fusion of GAL4 in the intracellular part of the receptor. Positive ligand interactions result in the proteolytic release of Gal4 that turns on the expression of H2B-citrine. As an indicator of ligand and receptor expression levels, they are linked via TA to H2B mCherry and H2B mTurq expression respectively. The authors also manipulate the expression of the glycosyltransferase Lunatic-Fringe (LFng) that modifies the EGF repeats in the extracellular domains impacting their interactions. The testing of multiple ligand-receptor combinations at varying expression levels is a tour de force, with over 50 stable cell lines generated, and yields valuable insights although as a whole, the results are quite complex.

      Strengths:

      Taking a reductionist approach to testing systematically differences in the signaling strength, binding strength, and cis-interactions from the different ligands in the context of the Notch1 and Notch 2 receptors (they justify well the choice of players to test via this approach) produces a baseline understanding of the different properties and leads to some unexpected and interesting findings. Notably:

      - Jag1 ligand expressing cells failed to activate Notch1 receptor although were capable of activating Notch2. Conversely, Jag2 cells elicited the strongest activation of both receptors. The results with Jag1 are surprising also because it exhibits some of the strongest binding to plate-bound ligands. The failure to activate Notch1 has major functional significance and it will be important in the future to understand the mechanistic basis.

      - Jagged ligands have the strongest cis-inhibitory effects and the receptors differ in their sensitivity to cis-inhibition by Dll ligands. These observations are in keeping with earlier in vivo and cell culture studies. More referencing of those would better place the work in context but it nicely supports and extends previous studies that were conducted in different ways.

      - Responses to most trans-activating ligands showed a degree of ultrasensitivity but this was not the case for cis-interactions where effects were more linear. This has implications for the way the two mechanisms operate and for how the signaling levels will be impacted by ligand expression levels.

      - Qualitatively similar results are obtained in a second cell line, suggesting they reflect fundamental properties of the ligands/receptors.

      We appreciate the positive and constructive feedback.

      Weaknesses:

      One weakness is that the methods used to quantify the expression of ligands and receptors rely on the co-translation of tagged nuclear H2B proteins. These may not accurately capture surface levels/correctly modified transmembrane proteins. In general, the multiple conditions tested partly compensate for the concerns - for example, as Jag1 cells do activate Notch2 even if they do not activate Notch1 some Jag1 must be getting to the surface. But even with Notch2, Jag1 activities are on the lower side, making it important to clarify, especially given the different outcomes with the plated ligands. Similarly, is the fact that all ligands "signalled strongest to Notch2" an inherent property or due to differences in surface levels of Notch 2 compared to Notch1? The results would be considerably strengthened by calibration of the ligand/receptor levels (and ideally their sub-cellular localizations). Assessing the membrane protein levels would be relatively straightforward to perform on some of the basic conditions because their ligand constructs contain Flag tags, making it plausible to relate surface protein to H2B, and there are antibodies available for Notch1 and Notch2.

      We agree that mCherry fluorescence does not provide a direct readout of active surface ligand levels. As the reviewer points out, the ability of Jag1 to activate Notch2 demonstrates that expressed Jag1 is competent for signaling. Further, in some cases, Jag1-Notch2 activation can be comparable to Dll1-Notch2 activation (Figure 2A). Following the reviewer’s suggestion, we performed a Western blot for multiple expression levels for each of three surface ligands (Dll1, Dll4, Jag1) (Figure 2—figure supplement 2). This blot revealed a signal for surface expression of Jag1. Interpretation is complicated by the expected dependence of the efficiency of surface protein purification on the number of primary amines in the protein, which varies among these ligands, and qualitatively correlates with the staining intensity. While this makes quantitative interpretation difficult, this result further supports the notion that Jag1 is present on the cell surface. Finally, we note that high signaling activity need not, in general, directly correlate with surface expression levels. In fact, one study showed an example in which increased ligand activity occurred with decreased basal ligand surface levels (Antfolk et al., 2017). While one would ideally like to know all parameters of the system, including surface protein levels, rates of recycling, etc. the perspective taken here is that the net effect of these many post-translational processing steps can be subsumed into the overall relationship between the expression of the protein (which, in our case, is read out by the co-translational reporter) and its activity, which is relevant for the behavior of developmental circuits, among other systems. To address this comment, we now explicitly mention the limitation of mCherry as a proxy for surface protein, and add a reference to previous work highlighting the relationship between surface levels and ligand activity.

      In terms of the dependence of signaling on Notch levels, the metric of signaling activity used here is explicitly normalized by the mTurquoise co-translational reporter of Notch expression to account for differences in receptor expression across receiver clones. We have added a new figure to show the variation in expression (Figure 1—figure supplement 1A) and to demonstrate this normalization (Figure 1—figure supplement 5). Having said that, as the reviewer correctly points out, we cannot directly address the dependence on surface receptor levels with mTurquoise alone. To address this comment, we have added a figure that shows cotranslational and surface receptor expression for a subset of our receiver clones (Figure 1—figure supplement 1B). Although antibody binding strengths may vary, it appears unlikely that higher surface levels could explain most ligands’ preferential activation of Notch2 over Notch1, since Notch2 levels were lower than Notch1 levels in both surface expression and cotranslational expression.

      Cis-activation as a mode of signaling has only emerged from these synthetic cell culture assays raising questions about its physiological relevance. Cis-activation is only seen at the higher ligand (Dll1, Dll4) levels, how physiological are the expression levels of the ligands/receptors in these assays? Is it likely that this would make a major contribution in vivo? Is it possible that the cells convert themselves into "signaling" and "receiving" sub-populations within the culture by post-translational mechanism? Again some analysis of the ligand/receptors in the cultures would be a valuable addition to show whether or not there are major heterogeneities.

      The cis-activation results in this paper are, as the reviewer points out, conducted in synthetic cell culture assays. Cis-activation is observed across a large dynamic range of ligand expression, possibly including non-physiologically high levels. However, our previous work (Nandagopal et al., eLife 2019) showed that cis-activation does not require over-expression, as it occurred in unmodified Caco-2 and NMuMG cells with their endogenous ligand and receptor expression levels. As shown here in Figure 4B, cis-activation for Notch2 increases monotonically and is substantial even at intermediate ligand concentrations. In other cases, cis-activation is maximal at intermediate concentrations. We agree that the in vivo role remains unclear, and is difficult to determine due to the typical close contacts among cells in tissues. Therefore, these assays do not speak to in vivo relevance. Note that we can, however, rule out the possibility of trans signaling between well-mixed cell populations at these densities (Figure 4A).

      It is hard to appreciate how much cell-to-cell variability in the "output" there is. For example, low "outputs" could arise from fewer cells becoming activated or from all cells being activated less. As presented, only the latter is considered. That may be already evident in their data, but not easy for the reader to distinguish from the way they are presented. For example, in many of the graphs, data have been processed through multiple steps of normalization. Some discussion/consideration of this point is needed.

      We agree that in different experiments changes in a mean response can reflect changes in fraction of activated cells, or level of activation or some combination of both. In this work, most assays were conducted by flow cytometry, which provides a full distribution of cellular responses. We provided distributions for some experiments in the supplementary figures (i.e., Figure 4—figure supplement 1, and Figure 5—figure supplement 4). The sheer number of experiments and samples prevents us from displaying all underlying histograms. Therefore, we have provided all flow data sets in an extensive archive that is publicly available on data.caltech.edu (https://doi.org/10.22002/gjjkn-wrj28).
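
      The distinction drawn above, between fewer cells responding and all cells responding less, can be made concrete with a small sketch. It assumes per-cell reporter values (as flow cytometry provides) and a hypothetical gating threshold; the function name, the threshold, and the numbers are illustrative and are not taken from the deposited data sets.

```python
from statistics import mean


def decompose_mean_response(values, threshold):
    """Split a population mean into fraction-active and mean-of-active parts.

    `values` are per-cell reporter readings; `threshold` is a hypothetical
    gate separating "activated" from "non-activated" cells.
    """
    active = [v for v in values if v > threshold]
    inactive = [v for v in values if v <= threshold]
    return {
        "overall_mean": mean(values),
        "fraction_active": len(active) / len(values),
        "mean_active": mean(active) if active else 0.0,
        "mean_inactive": mean(inactive) if inactive else 0.0,
    }


# Two made-up populations with the same mean but a different structure.
few_cells_strongly_on = [0.0] * 8 + [10.0] * 2  # 20% of cells fully on
all_cells_weakly_on = [2.0] * 10                # every cell partially on

a = decompose_mean_response(few_cells_strongly_on, threshold=1.0)
b = decompose_mean_response(all_cells_weakly_on, threshold=1.0)
```

      Both populations share the same overall mean, so only the full distribution (or a decomposition like this one) separates the two scenarios.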

      Impact:

      Overall, cataloging the outcomes from the different ligand-receptor combinations, both in cis and trans, yields a valuable baseline for those investigating their functional roles in different contexts. There is still a long way to go before it will be possible to make a predictive model for outcomes based on expression levels, but this work gives an idea about the landscape and the complexities. This is especially important now that signaling relationships are frequently hypothesized based on single-cell transcriptomic data. The results presented here demonstrate that the relationships are not straightforward when multiple players are involved.

      We appreciate this concise impact summary, and agree with its conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors extend their previous studies on trans-activation, cis-inhibition (PMID: 25255098), and cis-activation (PMID: 30628888) of the Notch pathway. Here they create a large number of cell lines using CHO-K1 and C2C12 cells expressing either Notch1-Gal4 or Notch2-Gal4 receptors which express a fluorescent protein upon receptor activation (receiver cells). For cis-inhibition and cis-activation assays, these cells were engineered to express one of the four canonical Notch ligands (Dll1, Dll4, Jag1, Jag2) under tetracycline control. Some of the receiver cells were also transfected with a Lunatic fringe (Lfng) plasmid to produce cells with a range of Lfng expression levels. Sender cells expressing all of the canonical ligands were also produced. Cells were mixed in a variety of co-culture assays to highlight trans-activation, cis-activation, and cis-inhibition. All four ligands were able to trans-activate Notch1 and Notch2, except that Jag1 did not trans-activate Notch1. Lfng enhanced trans-activation of both Notch receptors by Dll1 and Dll4, and inhibited Notch1 activation by Jag2 and Notch2 activation by both Jag1 and Jag2. Cis-expression of all four ligands was predominantly inhibitory, but Dll1 and Dll4 showed strong cis-activation of Notch2. Interestingly, cis-ligands preferentially inhibited trans-activation by the same ligand, with varying effects on other trans-ligands.

      Strengths:

      This represents the most comprehensive and rigorous analysis of the effects of canonical ligands on cis- and trans-activation, and cis-inhibition, of Notch1 and Notch2 in the presence or absence of Lfng so far. Studying cis-inhibition and cis-activation is difficult in vivo due to the presence of multiple Notch ligands and receptors (and Fringes) that often occur in single cells. The methods described here are a step towards generating cells expressing more complex arrays of ligands, receptors, and Fringes to better mimic in vivo effects on Notch function.

      In addition, the fact that their transactivation results with most ligands on Notch1 and 2 in the presence or absence of Lfng were largely consistent with previous publications provides confidence that the authors' assays are working properly.

      We appreciate the thoughtful comments and feedback.

      Weaknesses:

      It was unusual that the engineered CHO cells expressing Notch1-Gal4 were not activated at all by co-culture with Jag1-expressing CHO cells. Many previous reports have shown that Jag1 can activate Notch1 in co-culture assays, including when Notch1 was expressed in CHO cells. Interestingly, when the authors used Jag1-Fc in a plate coating assay, it did activate Notch1 and could be inhibited by the expression of Lfng.

      In our assays, we do in fact also see some signaling of Jag1 to Notch1, especially when dLfng is coexpressed (Figure 2—figure supplement 4, formerly Figure 2—figure supplement 3). While these levels are lower than those observed for other ligand-receptor combinations, they are significantly elevated compared to baseline. In specific natural contexts, it will be important to determine whether the weak but non-zero Jag1-Notch1 signaling acts negatively to suppress signaling from other ligands, or provides weak but potentially functionally important levels of signaling. Evidence for both modes exists in the literature. To address this, we have expanded the discussion of Jag1-Notch1 signaling and added references to other work on Jag1-Notch1 signaling to the Discussion section.

      The cell surface level of the ligands was determined by flow cytometry of a co-translated fluorescent protein. Some calibration of the actual cell surface levels with the fluorescent protein would strengthen the results.

      This issue was also raised by Reviewers #1 and #3. Please see responses to Reviewer #1, above.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports a comprehensive analysis of Notch-Delta/Jagged signaling inclusive of the human Notch1 and Notch2 receptors and DLL1, DLL4, JAG1, and JAG2 ligands. Measurements encompassed signaling activity for ligand trans-activation, cis-activation, cis-inhibition, and activity modulation by Lfng. The most striking observations of the study are that JAG1 has no detectable activity as a Notch1 ligand when presented on a cell (though it does have activity when immobilized on a surface), even though it is an effective cis-inhibitor of Notch1 signaling by other ligands, and that DLL1 and DLL4 exhibit cis-activating activity for Notch1 and especially for Notch2. Notwithstanding the artificiality of the system and some of its shortcomings, the results should nevertheless be a valuable resource for the Notch signaling community.

      Strengths:

      (1)  The work is systematic and comprehensive, addressing questions that are of importance to the community of researchers investigating mammalian Notch proteins, their activation by ligands, and the modulation of ligand activity by LFng.

      (2)  A quantitative and thorough analysis of the data is presented.

      Weaknesses:

      (1) The manuscript is primarily descriptive and does not delve into the underlying, mechanistic origin or source of the different ligand activities.

      We agree that the goals of this paper were largely to discover the range of signaling modes that occur. A mechanistic analysis would be beyond the scope of this work, but we agree it is an important next step.

      (2) The amount of ligand or receptor expressed is inferred from the flow cytometry signal of a co-translated fluorescent protein-histone fusion, and is not directly measured. The work would be more compelling if the amount of ligand present on the cell surface were directly measured with anti-ligand antibodies, rather than inferred from measurements of the fluorescent protein-histone fusion.

      This issue was also raised by Reviewers #1 and #2. Please see responses to Reviewer #1, above.

      (3) It would be helpful to see plots of the raw activity data before transformation and normalization, because the plots present data after several processing steps, and it is not clear how the processed data relate to the original values determined in each measurement.

      We included examples showing how raw data is processed in Figure 4—figure supplement 1 and Figure 5—figure supplement 4. The sheer number of experiments precludes including similar figures for all data sets. However, all raw and processed data and data analysis code is publicly available at (https://doi.org/10.22002/gjjkn-wrj28).

      (4) The authors use sparse plating of engineered cells with parental cells (expressing no ligand or receptor) to measure cis-activation. However, the cells divide within the culture period of 22-24 h and can potentially trans-activate each other.

      If measured cis-activation signal arises solely from trans-activation, then the measured cis-activation signal per cell should increase with cell density, since trans-activation per cell does depend on cell density (Figure 4A). However, for the strongest cis-activators (Dll1- and Dll4-Notch2), signaling magnitude is similar when these cells are cultured sparsely or at confluence, which would otherwise allow efficient trans signaling (Figure 5A). Thus, for Dll1- and Dll4-Notch2 receivers, total signaling strength per cell depends little or not at all on the opportunity to signal intercellularly. Moreover, cis-activation signal for the Dll1- and Dll4-Notch2 combinations exceeded the maximum trans-signaling levels we could achieve for the same receivers when cis-ligand was suppressed (Figure 4B). These results argue that cis interactions dominate signaling in this context. However, we have not ruled out the possibility that trans-signaling between sister cells after division contributes to the comparatively weak cis-activation observed for Notch1 receivers.

      Reviewer #1 (Recommendations For The Authors):

      As outlined in the public review, there is a question of whether the nuclear H2B accurately reflects the surface levels of the transmembrane proteins (ligand and receptor). Clearly, it would not be feasible to check levels in all of the experimental conditions, but some baseline conditions should be analyzed.

      We addressed this above.

      Reviewer #2 (Recommendations For The Authors):

      (1)  As mentioned above, it was unusual that Jag1 did not activate Notch1 in co-culture assays, but did activate Notch1 in plate-coating assays. The authors should add some text to the Discussion to explain why they think this is happening in their engineered cells. One possibility is that the CHO cells express Manic fringe (Mfng) which is known to reduce Jag1-Notch1 activation. Data for Mfng levels in CHO cells were not included in Supplemental Table 2. Knocking down all three Fringes in CHO cells might increase Jag1-Notch1 activation.

      This is already addressed in a sentence in the results: “Strikingly, while Jag1 sender cells failed to activate Notch1 receivers above background (Figure 2D), plate-bound Jag1-ext-Fc activated Notch1 only ~3-fold less efficiently than it activated Notch2 (Figure 3B-D). This suggests that the natural endocytic activation mechanism, or potential differences in tertiary structure between the expressed and recombinant Jag1 extracellular domains, could play roles in preventing Jag1-Notch1 signaling in coculture.” Regarding the point about Mfng, we added a note to Supplementary Table about other CHO-K1 expression data.

      (2) Figure 1-supplemental figure 1: Both the Notch1-Jag1 and Notch1-Jag2 cells show high expression of Jag1 in low 4epi, but any higher concentration reduces to control levels. How much of a problem is this for interpreting your data?

      This was not the ideal behavior, but by binning cells by co-translational reporters for ligand expression, we were able to obtain enough cells in intermediate bins. (Note: Figure 1—figure supplement 1 is now Figure 1—figure supplement 2.)

      (3)  Figure 1C legend: Are these stably-expressing cells or Tet-off cells? Please state in legend.

      The figure legend has been updated.

      (4)  Figure 1E: How long is the knockdown of Rfng and Lfng effective? Does it affect the expression of Lfng later?

      siRNA effects generally last for at least 72-96 hours, so we do not anticipate this being an issue.

      (5) Page 9: "Lfng significantly decreased trans-activation of both receptors by Jag1 (>2.5-fold)". If there is no Jag1-Notch1 activation, how can Lfng decrease trans-activation?

      We added a note in the main text to clarify that while Jag1-Notch1 signaling is relatively low, it can still be detectably decreased.

      (6) Figure 4A legend: Please define what "2.5k ea senders and Rec" means. In the text, it says "To focus on cis-interactions alone, we then cultured receiver cells at low density, amid an excess of wildtype CHO-K1 cells" (page 14).

      This was clarified in the text.

      (7)  Page 14: "By contrast, Notch2 was cis-activated by both Dll1 and Dll4, to levels exceeding those produced by trans-activation by high-Dll1 senders (Figure 4B, lower left)." Where is the trans-activation data? 4B, lower right?

      We updated this reference in the main text.

      (8)  Page 16: "For Notch2-Dll1 and Notch2-Dll4, single cell reporter activities correlated with cis-ligand expression, regardless of whether cells were pre-induced at a high or low culture density (Figure 4D)." It appears that Notch2-Dll1 has lower Notch activation at sparse culture than confluent.

      We agree that the level of signaling is lower in sparse compared to confluent cultures on average. This is explained by the sensitivity of the Tet-OFF promoter to culture density (Figure 4—figure supplement 2). However, the key point of this experiment is the positive correlation, which is consistent with cis-activation, and inconsistent with the pre-generation of NEXT hypothesis diagrammed in Figure 4C, which would not be expected to produce such a correlation.

      (9a) For the creation of the C2C12-Nkd cells: Has genomic sequencing been done to confirm editing of Notch2 and Jag1 loci?

      We confirmed the knockdown but did not do genomic sequencing.

      (9b) The gel in Figure 7-Supplement 1C is not adequate for showing loss of Jag1. It should be repeated.

      In this case, we have only the single gel. We added a note in figure legend that no duplicate was performed.

      (10) Figure 7A: Which Fringes are expressed in C2C12 cells? You should provide a rationale for knocking down just Rfng.

      Figure 7—figure supplement 1A shows the levels of expression in C2C12. Note that Mfng is not highlighted because its levels were undetectable.

      (11) Figure 7-Supplement 1D: This is confusing. Notch2 levels are not reduced in the left panel, and Notch1 and Notch2 levels are not reduced in the right panel?

      C2C12-Nkd cells exhibit reduced levels of Notch1 and Notch3. This can be seen in Figure 7—figure supplement 1A. Panel D presents the results of additional siRNA knockdown, performed to prevent subsequent up-regulation of Notch1 and Notch3 during the assay. These knockdown results were variable, as shown. The Notch2 siRNA knockdown was not essential for these experiments, but performed despite very low levels of Notch2 to begin with. In the revision, we have added this note to the Methods.

      Reviewer #3 (Recommendations For The Authors):

      (1) The results section of the manuscript is very dense and difficult to follow, as are the figure legends.

      We appreciate the criticism, and regret that it is not easier to read in its current form.

      (2) The authors could emphasize areas of concordance with published results (where available) to place their artificial, engineered system into a better biological context. Are there any examples of studies in whole organisms where cis-activation plays a role?

      We are not aware of examples of cis-activation in whole organisms at this point.

      (3) How do the authors rationalize the different responses of Notch1 to cell-presented Jag1 as opposed to immobilized Jag1, where its signal strength is second in rank order on a molar basis?

      This comment was addressed above in response to the first recommendation from Reviewer #2.

      It is also difficult to understand Figure 2—figure supplement 3B, in which it appears that Jag1 induces a Notch1 reporter response when Lfng is knocked down (dLfng), and how those data relate to the inactive response to Jag1 shown in the main figures.

      The issue here is a difference of normalization. Figure 2A in the main text is normalized to the sender expression level, i.e. relative signaling strength. By contrast, Figure 2—figure supplement 4B (previously Figure 2—figure supplement 3B) shows absolute signaling activity, which can appear higher because it does not normalize for ligand expression. For Jag1-Notch1 signaling in particular, substantial signaling required very high levels of Jag1. We have added a new figure to demonstrate these two types of normalization (Figure 2—figure supplement 1A).

      See Author response image 1 below for a direct comparison of these two normalization modes using data from both Figure 2A and Figure 2—figure supplement 4B. Note how the Jag1-Notch1 signaling activities that are nonzero in the top plot go to zero in the bottom plot as a result of normalizing the values to ligand expression.

      Author response image 1. Comparison of normalization modes in Figure 2A and Figure 2—figure supplement 4B (formerly 3B). Normalized trans-activation signaling activities for different ligand-receptor combinations (with dLfng only), either with further normalization to ligand expression (bottom row) or without further normalization (top row). Normalized signaling activity is defined as reporter activity (mCitrine, A.U.) divided by cotranslational receptor expression (mTurq2, A.U.), normalized to the strongest biological replicate-averaged signaling activity across all ligand-receptor-Lfng combinations in this experiment. Saturated data points, defined here as those with normalized signaling activity over 0.75 in both dLfng and Lfng conditions, were excluded. Colors indicate the identity of the trans-ligand expressed by cocultured sender cells. Error bars denote bootstrapped 95% confidence intervals (Methods), in this case sampled from the number of biological replicates given in the legend—n1 (for Notch1) or n2 (for Notch2). See Methods and Figure 2A caption for more details. Note that the only difference between this figure and the new Figure 2—figure supplement 1A is that this figure additionally includes the Jag1-high data from Figure 2—figure supplement 4B.
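
      The quantities defined in this caption can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: all readings and ligand expression values are hypothetical, "normalization to ligand expression" is assumed to be a simple division, and the percentile bootstrap is one common way to obtain a 95% interval that may differ in detail from the procedure described in the Methods.

```python
import random
from statistics import mean


def signaling_activity(mcitrine, mturq2):
    """Reporter signal divided by the co-translational receptor reporter."""
    return mcitrine / mturq2


def normalize_to_strongest(activities):
    """Scale so the strongest replicate-averaged condition has mean 1."""
    strongest = max(mean(reps) for reps in activities.values())
    return {cond: [a / strongest for a in reps] for cond, reps in activities.items()}


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap interval for the mean of replicates."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical (mCitrine, mTurq2) readings per biological replicate, in A.U.
raw = {
    "Dll1-Notch1": [(800.0, 1000.0), (900.0, 1000.0)],
    "Jag1-Notch1": [(150.0, 1000.0), (250.0, 1000.0)],
}
# Hypothetical sender ligand expression (A.U.); Jag1 senders are set very high.
ligand_expression = {"Dll1-Notch1": 1.0, "Jag1-Notch1": 20.0}

activities = {
    cond: [signaling_activity(c, t) for c, t in reps] for cond, reps in raw.items()
}
normalized = normalize_to_strongest(activities)

# Mode 1: activity without further normalization (absolute signaling).
absolute = {cond: mean(reps) for cond, reps in normalized.items()}
# Mode 2: activity further divided by ligand expression (relative strength).
relative = {cond: absolute[cond] / ligand_expression[cond] for cond in absolute}

ci_low, ci_high = bootstrap_ci(normalized["Jag1-Notch1"])
```

      With these made-up numbers, the Jag1-Notch1 activity is clearly nonzero in the absolute mode but becomes small relative to Dll1-Notch1 once divided by the high ligand expression, which mirrors the contrast between the two rows of the image.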

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Herrmannova et al explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e and h) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the Translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      The manuscript has improved with the new corrections. I appreciate the authors' attention to the minor comments, which have been fully solved. The authors have not, however, provided additional experimental evidence that uORF-mediated translation of Raf-1 mRNA depends on an intact eIF3 complex, nor have they addressed the consequences of such regulation for cell physiology. While I understand that this is a subject of follow-up research, the authors could have at least included their explanations/ speculations regarding major comments 2-4, which in my opinion could have been useful for the reader.

      Our explanations/speculations regarding major comments 2 and 3 were included in the Discussion. We apologize for this misunderstanding, as we thought that we were supposed to explain our ideas only in the responses. We did not discuss comment 4, however, as we are not sure what the true effect is and did not want to speculate too freely in our manuscript. We thank this reviewer for their insightful comments and understanding.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The authors report the potential translational regulation of Raf kinase by re-initiation. It would be interesting to show that Raf is indeed regulated by uORF-mediated translation, and that this is dependent on an intact eIF3 complex. Analyzing the potential consequences of Raf1 regulation for cancer cell proliferation or apoptosis would be a plus.

      We agree that this is an interesting and likely possibility. In fact, another clue that translation of Raf1 is regulated by uORFs comes from Bohlen et al. 2023 (PMID: 36869665), where they showed that RAF1 translation is dependent on PRRC2 proteins (that promote leaky scanning through these uORFs). We noted in the discussion that our results from eIF3d/e/hKD and the PRRC2A/B/CKD partly overlap. It is a subject of our follow-up research to investigate whether eIF3 and PRRC2 cooperate to regulate translation of this important mRNA.

      (2) The authors show that eIF3 d/e -but not 3h- has an effect on cell proliferation. First, this indicates that proliferation does not fully correlate with eIF3 integrity. Depletion of eIF3d does not affect the integrity of eIF3, yet the effects on proliferation are similar to those of eIF3e. What is the possibility that changes in proliferation reflect functions of eIF3d outside the eIF3 complex? What could be the real consequences of disturbing eIF3 integrity for the mammalian cell? Please, discuss.

      Yes, proliferation does not fully correlate with eIF3 integrity. Downregulation of eIF3 subunits that leads to disintegration of the eIF3 YLC core (a, b, c, g, i) has a more detrimental effect on growth and translation than downregulation of the peripheral subunits (e, k, l, f, h, m). Our previous studies (Wagner et al. 2016, PMID: 27924037 and Herrmannová et al. 2020, PMID: 31863585) indicate that the YLC core of eIF3 can partially support translation even without its peripheral subunits. In this respect eIF3d (as a peripheral subunit) is an amazing exception, suggesting it may have some specialized function(s). Whether this function resides outside of the eIF3 complex we do not know, but we do not think so, mainly because in the absence of eIF3e, its interaction partner, eIF3d gets rapidly degraded. Therefore, it is not very likely that eIF3d exists alone outside of the eIF3 complex with moonlighting functions elsewhere. We think that eIF3d, as a head-interacting subunit close to an important head ribosomal protein RACK1 (a landing pad for regulatory proteins), is a target of signaling pathways, which may make it important for translation of specific mRNAs. In support of this idea, eIF3d (in the context of entire eIF3) together with DAP5 was shown to promote translation by an alternate cap-dependent (eIF4F-independent) mechanism (Lee et al. 2016, PMID: 27462815; de la Parra et al. 2018, PMID: 30076308). In addition, the eIF3d function (also in the context of entire eIF3) was shown to be regulated by stress-triggered phosphorylation (Lamper et al. 2020, PMID: 33184215).

      (3) Figure 6D: Surprisingly, reduced levels of ERK1/2 upon eIF3d/e-KD are compensated by increased phosphorylation of ERK1/2 and net activation of c-Jun. Please comment on the functional consequences of buffering mechanisms that the cell deploys in order to counteract compromised eIF3 function. Why would the cell activate precisely the MAPK pathway to compensate for a compromised eIF3 function?

      This we do not know. We can only speculate that when translation is compromised, cells try to counteract it in two ways: 1) they produce more ribosomes to increase translational rates and 2) activate MAPK signaling to send pro-growth signals, which can in the end further boost ribosome biogenesis.

      (4) Regarding DAP-sensitive transcripts, can the authors discuss in more detail the role of eIF3d in alternative cap-dependent translation versus re-initiation? Are these transcripts being translated by a canonical cap- and uORF-dependent mechanism or by an alternative cap-dependent mechanism?

      This is indeed not an easy question. On one hand, it was shown that DAP5 facilitates translation re-initiation after uORF translation in a canonical cap-dependent manner. This mechanism is essential for translation of the main coding sequence (CDS) in mRNAs with structured 5' leaders and multiple uORFs (Weber et al. 2022, PMID: 36473845; David et al., 2022, PMID: 35961752). On the other hand, DAP5 was proposed to promote alternative, eIF4F-independent but cap-dependent translation, as it can substitute for the function of the eIF4F complex in cooperation with eIF3d (de la Parra et al., 2018, PMID: 30076308; Volta et al., 2021, PMID: 34848685). Overall, these observations paint a picture too complex for us to propose a clear scenario of what is going on between these two proteins on individual mRNAs. We speculate that both mechanisms are taking place and that the specific mechanism of translation initiation differs for differently arranged mRNAs.

      Minor comments:

      (5) Figure S2C: why is there a strong reduction of the stop codon peak for 3d and 3h KDs?

      We have checked the Ribowaltz profiles of all replicates (in the Supplementary data we are showing only a representative replicate I) and the stop codon peak differs a lot among the replicates. We think that this way of plotting was optimized for calculation and visualization of P-sites and triplet periodicity and thus is not suitable for this type of comparison among samples. Therefore, we have performed our own analysis where the 5’ ends of reads are used instead of P-sites and triplicates are averaged and normalized to CDS (please see below), so that all samples can be compared directly in one plot (same as Fig. S13A but for the stop codon). We can see that the stop codon peak really differs and is the smallest for eIF3hKD. However, these changes are in the range of 20% and we are not sure about their biological significance. We therefore refrain from drawing any conclusions. In general, a reduced stop codon peak may signal faster termination or increased stop codon readthrough, but the latter should be accompanied by an increased ribosome density in the 3’UTR, which is not the case. A defect in termination efficiency would instead be manifested by an increased stop codon peak.

      Author response image 1.

       

      (6) Figures 5 and S8: Adding a vertical line at 'zero' in all cumulative plots will help the reader understand the author's interpretation of the data. 

      We have added a dashed grey vertical line at zero as requested. However, for interpretation of these plots, the reader should focus on the colored curve and whether it is shifted with respect to the grey curve (background) or not. A shift to the right indicates increased expression, while a shift to the left indicates decreased expression. The reported p-value then indicates the statistical significance of the shift.
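
      How a shifted cumulative curve is read can be illustrated with a short sketch. It builds empirical cumulative distributions for a hypothetical gene group and a hypothetical background, compares them at the zero line, and estimates a p-value for the shift. The values are invented, and the permutation test on the median difference is only a stand-in, since the test behind the reported p-values is not stated in this response.

```python
import random
from bisect import bisect_right
from statistics import median


def ecdf(sample):
    """Return a function giving the fraction of `sample` at or below x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def permutation_pvalue(group, background, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of medians."""
    rng = random.Random(seed)
    observed = median(group) - median(background)
    pooled = list(group) + list(background)
    k = len(group)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = median(pooled[:k]) - median(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical log2 fold changes: background centred on zero, group shifted right.
background = [x / 10 for x in range(-10, 11)]
group = [x / 10 + 0.8 for x in range(-5, 6)]

background_cdf = ecdf(background)
group_cdf = ecdf(group)

# Fraction of genes at or below zero, i.e. where each curve crosses the zero line.
below_zero_background = background_cdf(0.0)
below_zero_group = group_cdf(0.0)

shift, p_value = permutation_pvalue(group, background)
```

      A colored curve that crosses the zero line lower than the grey curve, as the hypothetical group does here, corresponds to a shift to the right and therefore to increased expression.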

      (7) The entire Figure 2 are controls that can go to Supplementary Material. The clustering of Figure S3B could be shown in the main Figure, as it is a very easy read-out of the consistent effects of the KDs of the different eIF3 subunits under analysis.

      We have moved the entire Figure 2 to Supplementary Material as suggested (the original panels can be found as Supplementary Figures 1B, 1C and 3A). Figure S3B is now the main Figure 2E. 

      (8) There are 3 replicates for Ribo-Seq and four for RNA-Seq. Were these not carried out in parallel, as is usually done in Ribo-Seq experiments? Why is there an extra replicate for RNA-Seq?

      Yes, the three replicates were carried out in parallel. We decided to add the fourth replicate in RNA-Seq to increase the data robustness, as the RNA-Seq is used for normalization of FP to calculate the TE, which was the main metric analyzed in this article. We had the option to add the fourth replicate as we originally prepared five biological replicates for all samples, but after performing the control experiments, we selected only the 3 best replicates for the Ribo-Seq library preparation and sequencing.
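
      The role of RNA-Seq as the denominator of TE can be shown with a minimal sketch. It assumes that footprint (FP) and RNA counts are already normalized for library size and uses a pseudocount to avoid division by zero; the function names, the pseudocount, and the counts are hypothetical, and a dedicated differential-expression tool would normally replace this simple ratio.

```python
from math import log2


def translation_efficiency(fp_count, rna_count, pseudocount=1.0):
    """Footprint signal per unit of mRNA, with a small stabilizing pseudocount."""
    return (fp_count + pseudocount) / (rna_count + pseudocount)


def log2_te_change(fp_kd, rna_kd, fp_nt, rna_nt):
    """log2 change in TE between a knockdown (KD) and the non-targeting (NT) control."""
    return log2(translation_efficiency(fp_kd, rna_kd)) - log2(
        translation_efficiency(fp_nt, rna_nt)
    )


# Hypothetical gene A: footprints halve while mRNA stays constant,
# so the change is translational.
gene_a = log2_te_change(fp_kd=49, rna_kd=199, fp_nt=99, rna_nt=199)

# Hypothetical gene B: footprints halve because mRNA halves,
# so TE is unchanged despite the drop in footprints.
gene_b = log2_te_change(fp_kd=49, rna_kd=99, fp_nt=99, rna_nt=199)
```

      Because every TE value inherits the noise of its RNA-Seq denominator, an additional RNA-Seq replicate tightens the estimate for all genes at once.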

      (9) Please, add another sheet in Table S2 with the names of all genes that change only at the translation (RPF) levels.

      As requested, we have added three extra sheets (one for each downregulation) for differential FP with Padjusted <0.05 in the Spreadsheet S2. We also provide a complete unfiltered differential expression data (sheet named “all data”), so that readers can filter out any relevant data based on their interest.

      (10) Page 5, bottom: ' ...we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules...'. This is not true for eIF3d, as shown in Fig1B and mentioned in Results.

      This reviewer is correct. By this generalized statement, we were trying to summarize our previous results from Wagner et al., 2014, PMID: 24912683; Wagner et al., 2016, PMID: 27924037 and Herrmannova et al., 2020, PMID: 31863585. The eIF3d downregulation is the only exception that does not affect expression of any other eIF3 subunit. Therefore, we have rewritten this paragraph accordingly: “We recently reported a comprehensive in vivo analysis of the modular dynamics of the human eIF3 complex (Wagner et al, 2020; Wagner et al, 2014; Wagner et al., 2016). Using a systematic individual downregulation strategy, we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules leading to the formation of partial eIF3 subcomplexes with limited functionality (Herrmannova et al, 2020). eIF3d is the only exception in this respect, as its downregulation does not influence expression of any other eIF3 subunit.”

      (11) Page 10, bottom: ' The PCA plot and hierarchical clustering... These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d.' This is already obvious in the polysome profiles of Figure S2C.

      We agree that this result is surely not surprising given the polysome profile and growth phenotype analyses of eIF3hKD. But still, we think that the PCA plot and hierarchical clustering results represent valuable controls. Nonetheless, we rephrased this section to note that this result agrees with the polysome profiles analysis: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      (12) Page 12: ' As for the eIF3dKD "unique upregulated" DTEGs, we identified one interesting and unique KEGG pathway, the ABC transporters (Supplementary Figure 5A, in green).' This sentence is confusing, as there are more pathways that are significant in this group, so it is unclear why the authors consider it 'unique'.

      The eIF3dKD “unique upregulated” group comprises genes with increased TE only in eIF3dKD but not in eIF3eKD or eIF3hKD (500 genes, Fig 2G). All these 500 genes were examined for enrichment in the KEGG pathways, and the top 10 significant pathways were reported (Fig S6A). However, 8 out of these 10 pathways were also significantly enriched in other gene groups examined (e.g. eIF3d/eIF3e common). Therefore, the two remaining pathways (“ABC transporters” and “Other types of O-glycan biosynthesis”) are truly unique for eIF3dKD. We wanted to highlight the ABC transporters group in particular because we find it rather interesting (for the reasons mentioned in the article). We have corrected the sentence in question to avoid confusion: “Among the eIF3dKD “unique upregulated” DTEGs, we identified one interesting KEGG pathway, the ABC transporters, which did not show up in other gene groups (Supplementary Figure 6A, in green). A total of 12 different ABC transporters had elevated TE (9 of them are unique to eIF3dKD, while 3 were also found in eIF3eKD), 6 of which (ABCC1-5, ABCC10) belong to the C subfamily, known to confer multidrug resistance with alternative designation as multidrug resistance protein (MRP1-5, MRP7) (Sodani et al, 2012).

      Interestingly, all six of these ABCC transporters were upregulated solely at the translational level (Supplementary Spreadsheet S2).”    

      (13) Note typo ('Various') in Figure 4A.

      Corrected

      (14) The introduction could be shortened.

      This is a very subjective requirement. In fact, when this manuscript was reviewed in NAR, we were asked by two reviewers to expand it substantially. Because several research topics come together in this work (e.g., translational regulation, eIF3 structure and function, and MAPK/ERK signaling), we are convinced that each of them demands a comprehensive introduction for non-experts. Therefore, with all due respect to this reviewer, we ultimately did not shorten it.

      Reviewer #2 (Recommendations For The Authors):

      - In Figure 2, it would be useful to know why eIF3d is destabilized by eIF3e knockdown - is it protein degradation and why do the eIF3d/e knockdowns not more completely phenocopy each other when there is the same reduction to eIF3d as in the eIF3d knockdown sample?

      Yes, we do think that protein degradation lies behind the eIF3d destabilization in the eIF3eKD, but we have not yet directly demonstrated this. However, we have shown that eIF3d mRNA levels are not altered in eIF3eKD and that Ribo-Seq data indicate no change in TE or FP for eIF3d-encoding mRNA in eIF3eKD. Nonetheless, it is important to note (and we discuss it in the article) that eIF3d levels in eIF3dKD are lower than eIF3d levels in eIF3eKD (please see Supplementary Figure 1C). In fact, we believe that this is one of the main reasons for the differences between the eIF3d and eIF3e knockdowns.

      - The western blots in Figures 4 and 6 show modest changes to target protein levels and would be strengthened by quantification.

      We have added the quantifications as requested by this reviewer and by Reviewer #3.

      - For Figure 4, this figure would be strengthened by experiments showing if the increase in ribosomal protein levels is correlated with actual changes to ribosome biogenesis.

      As suggested, we performed polysome profiling in the presence of EDTA to monitor changes in the 60S/40S ratio, indicating a potential imbalance in the biogenesis of individual ribosome subunits. We found that it was not affected (Figure 3G). In addition, we performed the same experiment, normalizing all samples to the same number of cells (cells were carefully counted before lysis). In this way, we confirmed that eIF3dKD and eIF3eKD cells indeed contain a significantly increased number of ribosomes, in agreement with the western blot analysis (Figure 3H).

      - In Figure 6, there needs to be a nuclear loading control.

      This experiment was repeated with Lamin B1 used as a nuclear loading control – it is now shown as Fig. 5F.

      - For Figure 8, these findings would be strengthened using luciferase reporter assays where the various RNA determinants are experimentally tested. Similarly, 5′ TOP RNA reporters would have been appreciated in Figure 4.

      This is indeed a logical continuation of our work, which represents the current work in progress of one of the PhD students. We apologize, but we consider this time- and resource-demanding analysis out of scope of this article.

      Reviewer #3 (Recommendations For The Authors):

      (1) Within the many effects observed, it is mentioned that eIF3d is known to be overexpressed while eIF3e is underexpressed in many cancers, but knockdown of either subunit decreases MDM2 levels, which would be expected to increase P53 activity and decrease tumor cell transformation. In contrast, they also report that 3e/3d knockdown dramatically increases levels of cJUN, presumably due to increased MAPK activity, and is expected to increase protumor gene expression. Additional discussion is needed to clarify the significance of the findings, which are a bit confusing.

      This is indeed true. However, considering the complexity of eIF3, the largest initiation factor among all, as well as the broad portfolio of its functions, it is perhaps not so surprising that the observed effects are complex and may seem even contradictory in respect to cancer. To acknowledge that, we expanded the corresponding part of discussion as follows: “Here, we demonstrate that alterations in the eIF3 subunit stoichiometry and/or eIF3 subcomplexes have distinct effects on the translatome; for example, they affect factors that play a prominent (either positive or negative) role in cancer biology (e.g., MDM2 and cJUN), but the resulting impact is unclear so far. Considering the complex interactions between these factors as well as the complexity of the eIF3 complex per se, future studies are required to delineate the specific oncogenic and tumor suppressive pathways that play a predominant role in mediating the effects of perturbations in the eIF3 complex in the context of neoplasia.”

      (2) There are places in the text where the authors refer to changes in transcriptional control when RNA levels differ, but transcription versus RNA turnover wasn't tested, e.g. page 16 and Figure S10, qPCR does not confirm "transcriptional upregulation in all three knockdowns" and page 19 "despite apparent compensatory mechanisms that increase their transcription."

      This is indeed true, the sentences in question were corrected. The term “increased mRNA levels” was used instead of transcriptional upregulation (increased mRNA stabilization is also possible).

      (3) Similarly, the authors suggest that steady-state LARP1 protein levels are unaffected based on ribosome footprint counts (page 21). It is incorrect to assume this, because ribosome footprints can be elevated due to stalling on RNA that isn't being translated and doesn't yield more protein, and because levels of translated RNA/synthesized proteins do not always reflect steady-state protein levels, especially in mutants that could affect lysosome levels and protein turnover. Also page 12, 1st paragraph suggests protein production is down when ribosome footprints are changed.

      Yes, we are well aware of this known limitation of Ribo-Seq analysis. Therefore, the steady-state protein levels of our key hits were verified by western blotting. In addition, we have removed the sentence about LARP1 because it was based on Ribo-Seq data only, without experimental evaluation of the steady-state LARP1 protein levels.

      (4) The translation buffering effect is not clear in some Figures, e.g. S6, S8, 8A, and B. The authors show a scheme for translationally buffered RNAs being clustered in the upper right and lower left quadrants in S4H (translation up with transcript level down and v.v.), but in the FP versus RNA plots, the non-TOP RNAs and 4E-P-regulated RNAs don't show this behavior, and appear to show a similar distribution to the global changes. Some of the right panels in these figures show modest shifts, but it's not clear how these were determined to be significant. More information is needed to clarify, or a different presentation, such as displaying the RNA subsets in the left panels with heat map coloring to reveal whether RNAs show the buffered translation pattern defined in purple in Figure S4H, or by reporting a statistical parameter or number of RNAs that show behavior out of total for significance. Currently the conclusion that these RNAs are translationally buffered seems subjective since there are clearly many RNAs that don't show changes, or show translation-only or RNA-only changes.

      We would like to clarify that S4H does not indicate a necessity for changes in FPs in the buffered subsets. Although opposing changes in total mRNA and FPs are classified as buffering, often we also consider the scenario where there are changes to the total mRNA levels not accompanied by changes in ribosome association.

      In figure S6, the scatterplots indicate a high density of genes shifted towards negative fold changes on the x-axis (total mRNA). This is also reflected in the empirical cumulative distribution functions (ecdfs) for the log2 fold changes in total mRNA in the far right panels of A and B, and the lack of changes in log2 fold change for FPs (middle panels). Similarly, in figure S8, the scatterplots indicate a density of genes shifted towards positive fold changes on the x-axis for total mRNA. The ecdfs also demonstrate that there is a significant directional shift in log2 fold changes in the total mRNA that is not present to a similar degree in the FPs, consistent with translational offsetting. It is rightly pointed out that not all genes in these sets follow the same pattern of regulation. We have revised the title of Supplementary Figure S6 (now S7) to reflect this. However, we would like to emphasize that these figures are not intended to communicate that all genes within these sets of interest are regulated in the same manner, but rather that when considered as a whole, the predominant effect seen is that of translational offsetting (directional shifts in the log2 fold change distribution of total mRNA that are not accompanied by similar shifts in FP mRNA log2 fold changes).

      The significance of these differences was determined by comparing the ecdfs of the log2 fold changes for the genes belonging to a particular set (e.g. non-TOP mTOR-sensitive, p-eIF4E-sensitive) against all other expressed genes (background) using a Wilcoxon rank sum test. This allows identification of significant shifts in the distributions that have a clear directionality (that is, an overall increase or decrease in the fold changes of FPs or total mRNA compared to background). If log2 fold changes are different from background, but without a clear directionality (equally likely to be increased or decreased), the test will not yield a significant result. This approach allows assessment of the overall behavior of gene signatures within a given dataset in a manner that is completely threshold-independent, such that it does not rely on classification of genes into different regulatory categories (translation only, buffering, etc.) based on significance or fold-change cut-offs (as in S4H). Therefore, we believe that this unbiased approach is well-suited for identifying cases when there are many genes that follow similar patterns of regulation within a given dataset.
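      To make the directionality argument concrete, the following is a minimal sketch of a two-sided rank sum comparison between a gene set and the background. It is not the code used for the manuscript: the function names, the toy log2 fold-change values, and the use of a plain normal approximation (no tie or continuity correction) are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(gene_set, background):
    """Two-sided Wilcoxon rank sum test (normal approximation, illustrative only).

    Returns (z, p). A negative z means the gene set tends to have lower
    values than the background; a positive z means higher values.
    """
    n1, n2 = len(gene_set), len(background)
    ranks = average_ranks(list(gene_set) + list(background))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2      # Mann-Whitney U for the gene set
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail
    return z, p


# Hypothetical log2 fold changes, invented for this example.
background = [x / 10 for x in range(-20, 21)]               # centered on zero
shifted_set = [-3.05 + 0.1 * i for i in range(15)]          # directional shift down
spread_set = [s * (2.05 + 0.1 * i) for i in range(5) for s in (-1, 1)]  # up and down

z_shift, p_shift = rank_sum_test(shifted_set, background)
z_spread, p_spread = rank_sum_test(spread_set, background)
print(f"shifted set: z={z_shift:.2f}, p={p_shift:.2g}")
print(f"spread set:  z={z_spread:.2f}, p={p_spread:.2g}")
```

      In this toy case the set that moves in one direction gives a strongly negative z, whereas the set that differs from background equally in both directions gives z close to zero, which mirrors the point that non-directional differences do not yield a significant result.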

      (5) Page 10-"These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d" ...These results suggest that eIF3h has less impact on the translatome, not that it does so differently. If it were changing translation by a different mechanism, I would not expect it to cluster with control.

      This sentence was rewritten as follows: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      Other minor issues:

      (1) There are some typos: Figure 2 leves, Figure 4 variou,

      Corrected.

      (2) Figure 3, font for genes on volcano plot too small

      Yes, maybe, however the resolution of this image is high enough to enlarge a certain part of it at will. In our opinion, a larger font would take up too much space, which would reduce the informativeness of this graph.

      (3) Figure S5, highlighting isn't defined.

      The figure legend for S5A (now S6A) states: “Less significant terms ranking 11 and below are in grey. Terms specifically discussed in the main text are highlighted in green.” Perhaps it was overlooked by this reviewer.

      (4) At several points the authors refer to "the MAPK signaling pathway", suggesting there is a single MAPK that is affected, e.g in the title, page 3, and other places when it seems they mean "MAPK signaling pathways" since several MAPK pathways appear to be affected.

      We apologize for any terminological inaccuracies. There are indeed several MAPK pathways operating in cells. In our study, we focused mainly on the MAPK/ERK pathway. The confusion probably stems from the fact that the corresponding term in the KEGG pathway database is labeled "MAPK signaling pathway" and this term, although singular, includes all MAPK pathways. We have carefully reviewed the entire article and have corrected the term used accordingly to either: 1) MAPK pathways in general, 2) the MAPK/ERK pathway for this particular pathway, or 3) "MAPK signaling pathway", where the KEGG term is meant.

      (5) Some eIF3 subunit RNAs have TOP motifs. One might expect 3e and 3h levels to change as a function of 3d knockdown due to TOP motifs but this is not observed. Can the authors speculate why the eIF3 subunit levels don't change but other TOP RNAs show TE changes? Is this true for other translation factors, or just for eIF3, or just for these subunits? Could the Western blot be out of linear range for the antibody or is there feedback affecting eIF3 levels differently than the other TOP RNAs, or a protein turnover mechanism to maintain eIF3 levels?

      This is indeed a very interesting question. In addition to the mRNAs encoding ribosomal proteins, we examined all TOP mRNAs and added an additional sheet to the S2 supplemental spreadsheet with all TOP RNAs listed in (Philippe et al., 2020, PMID: 32094190). According to our Ribo-Seq data, we could expect to see increased protein levels of eIF3a and eIF3f in eIF3dKD and eIF3eKD, but this is not the case, as judged from extensive western blot analysis performed in (Wagner et al., 2016, PMID: 27924037). Indeed, we cannot rule out the involvement of a compensatory mechanism monitoring and maintaining the levels of eIF3 subunits at steady-state – increasing or decreasing them if necessary, which could depend on the TOP motif-mediated regulation. However, we think that in our KDs, all non-targeted subunits that lose their direct binding partner in eIF3 due to siRNA treatment become rapidly degraded. For example, co-downregulation of subunits d, k and l in eIF3eKD is very likely caused by protein degradation as a result of a loss of their direct binding partner – eIF3e. Since we showed that the yeast eIF3 complex assembles co-translationally (Wagner et al., 2020, PMID: 32589964), and there is no reason to think that mammalian eIF3 differs in this regard, our working hypothesis is that free subunits that are not promptly incorporated into the eIF3 complex are rapidly degraded, and the presence or absence of the TOP motif in the 5’ UTR of their mRNAs has no effect. As for the other TOP mRNAs, translation factors eEF1B2, eEF1D, eEF1G, eEF2 have significantly increased FPs in both eIF3dKD and eIF3eKD, but we did not check their protein levels by western blotting and therefore cannot conclude anything specific.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The detailed, thorough critique provided by the three reviewers is very much appreciated. We believe the manuscript is greatly improved by the changes we have made based on those reviews. The major changes are described below, followed by a point by point response.

      Major Changes:

      (1) We revised our model (old Fig. 10; new Fig. 9) to keep the explanation focused on the data shown in the current study. Specifically, references to GTP/GDP states of Rab3A and changes in the presynaptic quantum have been removed and the mechanisms depicted are confined to pre- or post-synaptic Rab3A participating in either controlling release of a trophic factor that regulates surface GluA2 receptors (pre- or postsynaptic) or directly affecting fusion of GluA2-receptor containing vesicles (postsynaptic).

      (2) We replaced all cumulative density function plots and ratio plots, based on multiple quantile samples per cell, with box plots of cell means. This affects new Figures 1, 2, 3, 5, 6, 7 and 8. All references to “scaling,” “divergent scaling,” or “uniform scaling,” have been removed. New p values for comparison of means are provided above every box plot in Figures 1, 2, 3, 5, 6, 7 and 8. The number of cultures is provided in the figure legends.

      (3) We have added frequency to Figures 1, 2 and 8. Frequency values overall are more variable, and the effect of activity blockade less robust, than for mEPSC amplitudes. We have added text indicating that the increase in frequency after activity blockade was significant in neurons from cultures prepared from WT in the Rab3A+/- colony but not cultures prepared from KO mice (Results, lines 143 to 147, new Fig. 1G, H). The TTX-induced increase in frequency was significant in the NASPM experiments before NASPM, but not after NASPM (Results, lines 231 to 233, new Fig. 3, also cultures from WT in Rab3A+/- colony). The homeostatic plasticity effect on frequency did not reach significance in WT on WT glia cultures or WT on KO glia cultures, possibly due to the variability of frequency, combined with smaller sample sizes (Results, lines 400 to 403, new Fig. 8). In the cultures prepared from WT mice in the Rab3A+/Ebd colony, there was a trend towards higher frequency after TTX that did not reach statistical significance, and in cultures prepared from mutant mice, the p value was large, suggesting disruption of the effect, which appears to be due to an increase in frequency in untreated cultures, similar to the behavior of mEPSC amplitudes in neurons from mutant mice (Results, lines 161-167). In sum, the effect of activity on frequency requires Rab3A and Ca2+-permeable receptors, and is mimicked by the presence of the Rab3A Earlybird mutant. We have also added a discussion of these results (Discussion, lines 427-435).

      (4) In the revised manuscript we have added analysis of VGLUT1 levels for the same synaptic sites that we previously analyzed for GluA2 levels, and these data are described in Results, lines 344 to 371, and appear in new Table 2. In contrast to previous studies, we did not find any evidence for an increase in VGLUT1 levels after activity blockade. We reviewed those studies to determine whether there might be differences in the experimental details that could explain the lack of effect we observed. In (De Gois et al., 2005), the authors measured mRNA and performed western blots to show increases in VGLUT1 after TTX treatment in older rat cortical cultures (DIV 19). The study performs immunofluorescence imaging of VGLUT1 but only after bicuculline treatment (it decreases), not after TTX treatment. In (Wilson et al., 2005), the hippocampal cultures are treated with AP5, not TTX, and the VGLUT1 levels in immunofluorescence images are reported relative to synapsin I. That the type of activity blockade matters is illustrated by the failure of Wilson and colleagues to observe a consistent increase in VGLUT1/Synapsin ratio in cultures treated with AMPA receptor blockade (NBQX; supplementary information). These points have been added to the Discussion, lines 436 to 447.
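      As an illustration of Major Change (2), the sketch below contrasts the sample size obtained by pooling individual events with the sample size obtained by first reducing each cell to its mean. It is only a schematic under stated assumptions: the cell labels and amplitude values are invented, and `cell_means` is a hypothetical helper, not a function from our analysis pipeline.

```python
from statistics import mean, median

# Hypothetical mEPSC amplitudes (pA) per recorded cell; values are invented.
control_cells = {
    "cell_1": [10.2, 12.1, 9.8, 11.5],
    "cell_2": [13.0, 14.2, 12.7],
    "cell_3": [11.1, 10.4, 12.0, 11.8, 10.9],
}


def cell_means(cells):
    """Reduce each cell to a single value (its mean amplitude)."""
    return [mean(amplitudes) for amplitudes in cells.values()]


# Pooling every event treats events from the same cell as independent,
# so n grows with the number of events sampled per cell.
pooled_events = [a for amplitudes in control_cells.values() for a in amplitudes]

# Using one mean per cell keeps n equal to the number of cells recorded.
control_means = cell_means(control_cells)

print("n (pooled events):", len(pooled_events))
print("n (cell means):   ", len(control_means))
print("median of cell means:", round(median(control_means), 2))
```

      A box plot of `control_means` against the corresponding TTX-treated means would then summarize one point per cell, which is the unit of comparison used for the p values reported above the box plots.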

      Reviewer #1:

      (1) (model…is not supported by the data), (2) (The analysis of mEPSC data using quantile sampling…), (3) (…statistical analysis of CDFs suffers from n-inflation…), (4) (How does recording noise and the mEPSC amplitude threshold affect “divergent scaling?”) (5) (…justification for the line fits of the ratio data…), (7) (A comparison of p-values between conditions….) and (10) (Was VGLUT intensity altered in the stainings presented in the manuscript?)

      The major changes we made, described above, address Reviewer #1’s points. The remaining points are addressed below.

      (6) TTX application induces a significant increase in mEPSC amplitude in Rab3A-/- mice in two out of three data sets (Figs. 1 and 9). Hence, the major conclusion that Rab3A is required for homeostatic scaling is only partially supported by the data. 

      The p values based on CDF comparisons were problematic, but the point we were making is that they were much larger for amplitudes measured in cultures prepared from Rab3A-/- mice (Fig. 1, p = 0.04) compared to those from cultures prepared from Rab3A+/+ mice (Fig. 1, p = 4.6 × 10^-4). Now that we are comparing means, there are no significant TTX-induced effects on mEPSC amplitudes for Rab3A-/- data. However, acknowledging that some increase after activity blockade remains, we describe homeostatic plasticity as being impaired or not significant, rather than abolished, by loss of Rab3A (Abstract, lines 37 to 39; Results, lines 141 to 143; Discussion, lines 415 to 418).

      (8) There is a significant increase in baseline mEPSC amplitude in Rab3AEbd/Ebd (15 pA) vs. Rab3AEbd/+ (11 pA) cultures, but not in Rab3A-/- (13.6 pA) vs. Rab3A+/- (13.9 pA). Although the nature of scaling was different between Rab3AEbd/Ebd vs. Rab3AEbd/+ and Rab3AEbd/Ebd with vs. without TTX, the question arises whether the increase in mEPSC amplitude in Rab3AEbd/Ebd is Rab3A dependent. Could a Rab3A independent mechanism occlude scaling?

      The Reviewer is concerned that the increase in mEPSC amplitude in the presence of the Rab3A point mutant may be through a ‘non-Rab3A’ mechanism (a concern raised by the lack of such effect in cultures from the Rab3A-/- mice), and secondly, that the already large mEPSC cannot be further increased by the homeostatic plasticity mechanism. It must always be considered that a mutant with an altered genetic sequence may bind to novel partners, causing activities that would not be either facilitated or inhibited by the original molecule. We have added this caveat to Results, lines 180 to 186. We added that a number of other manipulations, implicating individual molecules in the homeostatic mechanism, have caused an increase in mEPSC amplitude at baseline, potentially nonspecifically occluding the ability of activity blockade to induce a further increase (Results, lines 186 to 189). Still, it is a strong coincidence that the novel activity of the mutant Rab3A would affect mEPSC amplitude, the same characteristic that is affected by activity blockade in a Rab3A-dependent manner, a point which we added to Results, lines 189 to 191.

      (9) Figure 4: NASPM appears to have a stronger effect on mEPSC frequency in the TTX condition vs. control (-40% vs -15%). A larger sample size might be necessary to draw definitive conclusions on the contribution of Ca2+-permeable AMPARs.

      Our results, even with the modest sample size of 11 cells, are clear: NASPM does not disrupt the effect of TTX treatment on mEPSC amplitude (new Fig. 3A). It also looks like NASPM has a greater effect on frequency in TTX-treated cells; we note this, but point out that, nevertheless, these mEPSCs are not contributing to the increase in mEPSC amplitude (Results, lines 238-241).

      (11) The change in GluA2 area or fluorescence intensity upon TTX treatment in controls is modest. How does the GluA2 integral change?

      We had reported that GluA2 area showed the most prominent increase following activity blockade, with intensity changing very little. When we examined the integral, it closely matched the change in area. We have added the values for integral to new Fig. 5 D, H; new Fig. 6 A-C; new Fig. 7 A-C and new Table 1 (for GluA2) and new Table 2 (for VGLUT1). These results are described in the text in the following places: Results, lines 289-292; 298-299; 311-319; 328-324. For VGLUT1, both area and intensity changed modestly, and the integral appeared to be a combination of the two, being higher in magnitude and resulting in smaller p values than either area or intensity (Results, lines 344-348; 353-359; new Table 2).

      (12) The quantitative comparison between physiology and microscopy data is problematic. The authors report a mismatch in ratio values between the smallest mEPSC amplitudes and the smallest GluA2 receptor cluster sizes (l. 464; Figure 8). Is this comparison affected by the fluorescence intensity threshold? What was the rationale for a threshold of 400 a.u. or 450 a.u.? How does this threshold compare to the mEPSC threshold of 3 pA.

      This concern is partially addressed by no longer comparing the rank ordered mEPSC amplitudes with the rank ordered GluA2 receptor characteristics. We had used multiple thresholds in the event that an experiment was not analyzable with the chosen threshold (this in fact happened for VGLUT1, see end of this paragraph). We created box plots of the mean GluA2 receptor cluster size, intensity and integral, for experiments in which we used all three thresholds, to determine if the effect of activity blockade was different depending on which threshold was applied, and found that there was no obvious difference in the results (Author response image 1). Nevertheless, since there is no need to use a different threshold for any of the 6 experiments (3 WT and 3 KO), for new Figures 5, 6 and 7 we used the same threshold for all data, 450; described in Methods, lines 746 to 749. For VGLUT1 levels, it was necessary to use a different threshold for Rab3A+/+ Culture #1 (400), but a threshold of 200 for the other five experiments (Methods, lines 751-757). The VGLUT1 immunofluorescent sites in Culture #1 had higher levels overall, and the low threshold caused the entire AOI to be counted as the synapse, which clearly included background levels outside of the synaptic site. Conversely, using a threshold of 400 on the other experiments meant that the synaptic site found by the automated measurement tool was much smaller than what was visible by eye. In our judgement it would have been meaningless to adhere to a single threshold for VGLUT1 data.

      Author response image 1.

      Using different thresholds does not substantially alter GluA2 receptor cluster size data. A) Rab3A+/+ Culture #1, size data for three different thresholds, depicted above each graph. B) Rab3A+/+ Culture #2, size data for three different thresholds, depicted above each graph. Note scale bar in A is different from B, to highlight differences for different thresholds. (Culture #3 was only analyzed with 450 threshold).
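      The dependence of the three readouts on the threshold can be sketched as follows. This is a simplified illustration, not the automated measurement tool used in the study: the pixel values are invented, `measure_site` is a hypothetical helper, and the definitions used here (area as the number of pixels at or above threshold, intensity as their mean, integral as their sum) are assumptions made for the example.

```python
def measure_site(aoi, threshold):
    """Return (area, mean_intensity, integral) for pixels at or above threshold.

    aoi is a 2D list of pixel intensities (arbitrary units) for one
    area of interest; the definitions are illustrative assumptions.
    """
    selected = [px for row in aoi for px in row if px >= threshold]
    area = len(selected)                     # number of pixels in the site
    integral = sum(selected)                 # summed intensity over the site
    mean_intensity = integral / area if area else 0.0
    return area, mean_intensity, integral


# Hypothetical 3 x 3 AOI; values are invented for illustration.
aoi = [
    [100, 300, 420],
    [460, 700, 480],
    [390, 455, 120],
]

for threshold in (400, 450):
    area, intensity, integral = measure_site(aoi, threshold)
    print(f"threshold={threshold}: area={area}, "
          f"intensity={intensity:.2f}, integral={integral}")
```

      Lowering the threshold enlarges the detected site and pulls in dimmer pixels, so area and integral rise while mean intensity falls; with a threshold that is too low for a bright culture, every pixel in the AOI would be selected, which corresponds to the VGLUT1 situation described for Culture #1.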

      The conclusion that an increase in AMPAR levels is not fully responsible for the observed mEPSC increase is mainly based on the rank-order analysis of GluA2 intensity, yielding a slope of ~0.9. There are several points to consider here: (i) GluA2 fluorescence intensity did increase on average, as did GluA2 cluster size. (ii) The increase in GluA2 cluster size is very similar to the increase in mEPSC amplitude (each approx. 18-20%). (iii) Are there any reports that fluorescence intensity values are linearly reporting mEPSC amplitudes (in this system)? Antibody labelling efficiency, and false negatives of mEPSC recordings may influence the results. The latter was already noted by the authors.

      Our comparison between mEPSC amplitude and GluA2 receptor cluster characteristics has been reexamined in the revised version using means rather than rank-ordered data in rank-order plots or ratio plots. Importantly, all of these methods revealed that in one out of three WT cultures (Culture #3) GluA2 receptor cluster size (old Fig. 8, old Table 1; new Fig. 6, new Table 1), intensity and integral (new Fig. 6, new Table 1) values decreased following activity blockade while in the same culture, mEPSC amplitudes increased. It is based on this lack of correspondence that we conclude that increases in mEPSC amplitude are not fully explained by increases in GluA2 receptors, and suggest there may be other contributors. These points are made in the Abstract (lines 108-110); Results (lines 319 to 326; 330-337; 341-343) and the Discussion (lines 472 to 474). To our knowledge, there are not any reports that quantitatively compare receptor levels (area, intensity or integrals) to mEPSC amplitudes in the same cultures. We examined the comparisons very closely for 5 studies that used TTX to block activity and examined receptor levels using confocal imaging at identified synapses (Hou et al., 2008; Ibata et al., 2008; Jakawich et al., 2010a; Xu and Pozzo-Miller, 2017; Dubes et al., 2022). We were specifically looking for whether the receptor data were more variable than the mEPSC amplitude data, as we found. However, for 4 of the studies, sample sizes were very different so that we cannot simply compare the p values. Below is a table of the comparisons.

      Author response table 1.

      In Xu 2017 the sample sizes are close enough that we feel comfortable concluding that the receptor data were slightly more variable (p < 0.05) than mEPSC data (p<0.01) but recognize that it is speculative to say our finding has been confirmed. A discussion of these articles is in Discussion, lines 456-474.

      (iv) It is not entirely clear if their imaging experiments will sample from all synapses. Other AMPAR subtypes than GluA2 could contribute, as could kainate or NMDA receptors.

      While our imaging data only examined GluA2, we used the application of NASPM to demonstrate that Ca2+-permeable receptors did not contribute quantitatively to the increase in mEPSC amplitude following TTX treatment. Since GluA3 and GluA4 are also Ca2+-permeable, the findings in new Figure 3 (old Fig. 4) likely rule out these receptors as well. There are also reports that Kainate receptors are Ca2+-permeable and blocked by NASPM (Koike et al., 1997; Sun et al., 2009), suggesting the NASPM experiment also rules out the contribution of Kainate receptors. Finally, given our recording conditions, which included normal magnesium levels in the extracellular solution as well as TTX to block action-potential evoked synaptic transmission, NMDA receptors would not be available to contribute currents to our recordings due to block by magnesium ions at resting Vm. These points have been added to the Methods section, lines 617 to 677 (NMDA); 687-694 (Ca2+-permeable AMPA receptors and Kainate receptors).

      Furthermore, the statement “complete lack of correspondence of TTX/CON ratios” is not supported by the data presented (l. 515ff). First, under the assumption that no scaling occurs in Rab3A-/-, the TTX/CON ratios show a 20-30% change, which indicates the variation of this readout. Second, the two examples shown in Figure 8 for Rab3A+/+ are actually quite similar (culture #1 and #2), particularly when ignoring the leftmost section of the data, which is heavily affected by the raw values approaching zero.

      We are no longer presenting ratio plots in the revised manuscript, so our conclusion that mEPSC amplitude data do not always correspond to GluA2 receptor data is no longer based on the difference in behavior of TTX/CON ratio values, but only on the difference in direction of the TTX effect in one out of three cultures. We agree with the reviewer that the ratio plots are much more sensitive to differences between control and treated values than the rank order plot, and we feel these differences are important; for example, there is still a homeostatic increase in the Rab3A-/- cultures, and the effect is still divergent rather than uniform. However, the comparison of ratio data will be presented elsewhere.

      (13) Figure 7A: TTX CDF was shifted to smaller mEPSC amplitude values in Rab3A-/- cultures. How can this be explained?

      While this result is most obvious in CDF plots, we still observe a trend towards smaller mEPSC amplitudes after TTX treatment in two of three individual cultures prepared from Rab3A-/- mice when comparing means (new Fig. 7, Table 1) which did not reach statistical significance for the pooled data (new Fig. 5, new Table 1). There was not any evidence of this decrease in the larger data set (new Fig. 1) nor for Rab3A-/- neurons on Rab3A+/+ glia (new Fig. 8). Given that this effect is not consistent, we did not comment on it in the revised manuscript. It may be that there is a non-Rab3A-dependent mechanism that results in a decrease in mEPSC amplitude after activity blockade, which normally pulls down the magnitude of the activity-dependent increase typically observed. But studying this second component would be difficult given its magnitude and inconsistent presentation.

      Reviewer #1 (Recommendations For the Authors):

      (1) Abstract, last sentence: The conclusion of the present manuscript should be primarily based on the results presented. At present, it is mainly based on a previous publication by the authors.

      We have revised the last sentence to reflect actual findings of the current study (Abstract, lines 47 to 49).

      (2) Line 55: “neurodevelopmental”

      This phrase has been removed.

      (3) Line 56: “AMPAergic” should be replaced by AMPAR-mediated

      This sentence was removed when all references to “scaling” were removed; no other instances of “AMPAergic” are present.

      (4) Figure 9: The use of BioRender should be disclosed in the Figure Legend.

      We used BioRender in new Figures 3, 7 and 8, and now acknowledge BioRender in those figure legends.

      (5) Figure legends and results: The number of cultures should be indicated for each comparison.

      Number of cultures has been added to the figure legends.

      (6) Line 289: A comparison of p-values between conditions does not allow any meaningful conclusions.

      Agreed, therefore we have removed CDFs and the KS test comparison p values. All comparisons in the revised manuscript are for cell means.

      (7) Line 623ff: The argument referring to NMJ data is weak, given that different types of receptors are involved.

      We still think it is valid to point out that Rab3A is required for the increase in mEPC at the NMJ but that ACh receptors do not increase (Discussion, lines 522 to 525). We are not saying that postsynaptic receptors do not contribute in cortical cultures, only that there could be another Rab3A-dependent mechanism that also affects mEPSC amplitude.

      (8) Plotting data points outside of the ranges should be avoided (e.g., Fig. 2Giii, 7F).

      These two figures are no longer present in the revised manuscript. In revising figures, we made sure no other plots have data points outside of the ranges.

      (9) The rationale for investigating Rab3AEbd/Ebd remains elusive and should be described.

      A rationale for investigating Rab3AEbd/Ebd is that if the results are similar to the KO, it strengthens the evidence for Rab3A being involved in homeostatic synaptic plasticity. In addition, since its phenotype of early awakening was stronger than that demonstrated in Rab3A KO mice (Kapfhamer et al., 2002), it was possible we would see a more robust effect. These points have been added to the Results, lines 118 to 126.

      (10) Figures 3 and 4, as well as Figure 5 and 6 could be merged.

      In the revised version, Figure 3 has been eliminated since its main point was a difference in scaling behavior. Figure 4 has been expanded to include a model of how NASPM could reduce frequency (new Fig. 3). Images of the pyramidal cell body have been added to Figure 5 (new Fig. 4), and Figure 6 has been completely revised and now includes pooled data for both Rab3A+/+ and Rab3A-/- cultures, for mEPSC amplitude, GluA2 receptor cluster size, intensity and integral.

      (11) Figure 5: The legend refers to MAP2, but this is not indicated in the figure.

      MAP2 has now been added to the labels for each image and described in the figure legend (new Fig. 4).

      Reviewer #2:

      Technical concerns:

      (1) The culture condition is questionable. The authors saw no NMDAR current present during spontaneous recordings, which is worrisome since NMDARs should be active in cultures with normal network activity (Watt et al., 2000; Sutton et al., 2006). It is important to ensure there is enough spiking activity before doing any activity manipulation. Similarly it is also unknown whether spiking activity is normal in Rab3AKO/Ebd neurons.

      In the studies cited by the reviewer, NMDA currents were detected under experimental conditions in which magnesium was removed. In our recordings, we have normal magnesium (1.3 mM) and also TTX, which prevents the necessary depolarization to allow inward current through NMDA receptors. This point has been added to our Methods, lines 674 to 677. We acknowledge we do not know the level of spiking in cultures prepared from Rab3A+/+, Rab3A-/- or Rab3AEbd/Ebd mice. Given the similar mEPSC amplitude for untreated cultures from WT and KO studies, we think it unlikely that activity was low in the latter, but it remains a possibility for untreated cultures from Rab3AEbd/Ebd mice, where mEPSC amplitude was increased. These points are added to the Methods, lines 615 to 622.

      (2) Selection of mEPSC events is not conducted in an unbiased manner. Manually selecting events is insufficient for cumulative distribution analysis, where small biases could skew the entire distribution. Since the authors claim their ratio plot is a better method to detect the uniformity of scaling than the well-established rank-order plot, it is important to use an unbiased population to substantiate this claim.

      We no longer include any cumulative distributions or ratio plot analysis in the revised version. We have added the following text to Methods, lines 703 to 720:

      “MiniAnalysis selects many false positives with the automated feature when a small threshold amplitude value is employed, due to random fluctuations in noise, so manual re-evaluation of the automated process is necessary to eliminate false positives. If the threshold value is set high, there are few false positives but small amplitude events that visually are clearly mEPSCs are missed, and manual re-evaluation is necessary to add back false negatives or the population ends up biased towards large mEPSC amplitudes. As soon as there is a manual step, bias is introduced. Interestingly, a manual re-evaluation step was applied in a recent study that describes their process as ‘unbiased’ (Wu et al., 2020). In sum, we do not believe it is currently possible to perform a completely unbiased detection process. A fully manual detection process means that the same criterion (“does this look like an mEPSC?”) is applied to all events, not just the false positives, or the false negatives, which prevents the bias from being primarily at one end or the other of the range of mEPSC amplitudes. It is important to note that when performing the MiniAnalysis process, the researcher did not know whether a record was from an untreated cell or a TTX-treated cell.”

      (3) Immunohistochemistry data analysis is problematic. The authors only labeled dendrites without doing cell-fills to look at morphology, so it is questionable how they differentiate branches from pyramidal neurons and interneurons. Since glutamatergic synapse on these two types of neuron scale in the opposite directions, it is crucial to show that only pyramidal neurons are included for analysis.

      We identified neurons with a pyramidal shape and a prominent primary dendrite at 60x magnification without the zoom feature. This should have been made clear in the description of imaging. We have added an image of the two selected cells to our figure of dendrites (old Fig. 5, new Fig. 4), and described this process in the Methods, lines 736 to 739, and Results, lines 246 to 253. Given the morphology of the neurons selected it is highly unlikely that the dendrites we analyzed came from interneurons.
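
      The magnesium-block argument in the response to technical concern (1) can be illustrated with a short calculation. This is a minimal sketch and not part of the analysis: it assumes the widely used empirical description of voltage-dependent Mg2+ block from Jahr and Stevens (1990), and the resting potential of -70 mV is an assumed value rather than one taken from the recordings. The function and variable names are illustrative.

```python
import math


def nmda_unblocked_fraction(v_mv: float, mg_mm: float) -> float:
    """Fraction of NMDA receptor conductance free of Mg2+ block.

    Empirical form (Jahr and Stevens, 1990), with V in mV and Mg2+ in mM:
        1 / (1 + ([Mg2+]o / 3.57) * exp(-0.062 * V))
    """
    return 1.0 / (1.0 + (mg_mm / 3.57) * math.exp(-0.062 * v_mv))


# 1.3 mM extracellular Mg2+ matches the recording conditions stated above.
# -70 mV is an ASSUMED resting potential, used only for illustration.
fraction_at_rest = nmda_unblocked_fraction(v_mv=-70.0, mg_mm=1.3)
fraction_depolarized = nmda_unblocked_fraction(v_mv=0.0, mg_mm=1.3)

print(f"unblocked fraction at -70 mV: {fraction_at_rest:.3f}")
print(f"unblocked fraction at   0 mV: {fraction_depolarized:.3f}")
```

      Under these assumptions only a few percent of the NMDA receptor conductance is unblocked near rest, compared with roughly three quarters at 0 mV, which is the depolarization that TTX prevents.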

      Conceptual Concerns

      The only novel finding here is the implicated role for Rab3A in synaptic scaling, but insights into mechanisms behind this observation are lacking. The authors claim that Rab3A likely regulates scaling from the presynaptic side, yet there is no direct evidence from data presented. In its current form, this study’s contribution to the field is very limited.

      We have demonstrated that loss of Rab3A and expression of a Rab3A point mutant disrupt homeostatic plasticity of mEPSC amplitudes, and that in the absence of Rab3A, the increase in GluA2 receptors at synaptic sites is abolished. Further, we show that this effect cannot be through release of a factor, like TNFα, from astrocytes. In the new version, we add the finding that VGLUT1 is not increased after activity blockade, ruling out this presynaptic factor as a contributor to homeostatic increases in mEPSC amplitude. We show for the first time by examining mEPSC amplitudes and GluA2 receptors in the same cultures that the increases in GluA2 receptors are not as consistent as the increases in mEPSC amplitude, suggesting the possibility of another contributor to homeostatic increases in mEPSC amplitude. We first proposed this idea in our previous study of Rab3A-dependent homeostatic increases in mEPC amplitudes at the mouse neuromuscular junction. In sum, we dispute that there is only one novel finding and that we have no insights into mechanism. We acknowledge that we have no direct evidence for regulation from the presynaptic side, and have removed this claim from the revised manuscript. We have retained the Discussion of potential mechanisms affecting the presynaptic quantum and evidence that Rab3A is implicated in these mechanisms (vesicle size, fusion pore kinetics; Discussion, lines 537 to 563). One way to directly show that the amount of transmitter released for an mEPSC has been modified after activity blockade is to demonstrate that a fast off-rate antagonist has become less effective at inhibiting mEPSCs (because the increased glutamate released outcompetes it; see (Liu et al., 1999) and (Wilson et al., 2005) for example experiments). This set of experiments is underway but will take more time than originally expected, because we are finding surprisingly large decreases in frequency, possibly the result of mEPSCs with very low glutamate concentration that are completely inhibited by the dose used. Once mEPSCs are lost, it is difficult to compare the mEPSC amplitude before and after application of the antagonist. Therefore we intend to include this experiment in a future report, once we determine the reason for the frequency reduction or can find a dose where this does not occur.

      (1) Their major argument for this is that homeostatic effects on mEPSC amplitudes and GluA2 cluster sizes do not match. This is inconsistent with reports from multiple labs showing that upscaling of mEPSC amplitude and GluA2 accumulation occur side by side during scaling (Ibata et al., 2008; Pozo et al., 2012; Tan et al., 2015; Silva et al., 2019). Further, because the acquisition and quantification methods for mEPSC recordings and immunohistochemistry imaging are entirely different (each with its own limitations in signal detection), it is not convincing that the lack of proportional changes must signify a presynaptic component.

      Within the analyses in the revised manuscript, which are now based only on comparison of cell/dendrite means, we find a very good match in the magnitude of increase for the pooled data of mEPSC amplitudes and GluA2 receptor cluster sizes (+19.7% and +20.0% respectively; new Table 1). However, when looking at individual cultures, we had one of three WT cultures in which mEPSC amplitude increased 17.2% but GluA2 cluster size decreased 9.5%. This result suggests that while activity blockade does lead to an increase in GluA2 receptors, the effect is more variable than that for mEPSC amplitude. We went back to published studies to see if this has been previously observed, but found that it was difficult to compare because the sample sizes were different for the two characteristics (see Author response table 1). We included these particular 5 studies because they use the same treatment (TTX), examine receptors using imaging of identified synaptic sites, and record mEPSCs in their cultures (although the authors do not indicate that imaging and recordings are done simultaneously on the same cultures). Only one of the studies listed by the Reviewer is in our group (Ibata et al., 2008). The study by (Tan et al., 2015) uses western blots to measure receptors; the study by (Silva et al., 2019) blocks activity using a combination of AMPA and NMDA receptor blockers; the study by (Pozo et al., 2012) correlates mEPSC amplitude changes with imaging, not in response to activity blockade but in response to changing the expression of GluA2. While it may seem like splitting hairs to reject studies that use other treatment protocols, there is ample evidence that the mechanisms of homeostatic plasticity depend on how activity was altered; see the following studies for several examples (Sutton et al., 2006; Soden and Chen, 2010; Fong et al., 2015). A discussion of the 5 articles we selected is in the revised manuscript, Discussion, lines 456 to 474.

      In sum, we provide evidence that activity blockade is associated with an overall increase in GluA2 receptors; what we propose is that this increase, being more variable, does not fully explain the increase in mEPSC amplitude. However, we acknowledge that the disparity could be explained by the differences in limitations of the two methods (Discussion, lines 469-472).

      (2) The authors also speculate in the discussion that presynaptic Rab3A could be interacting with retrograde BDNF signaling to regulate postsynaptic AMPARs. Without data showing Rab3A-dependent presynaptic changes after TTX treatment, this argument is not compelling. In this retrograde pathway, BDNF is synthesized in and released from dendrites (Jakawich et al., 2010b; Thapliyal et al., 2022), and it is entirely possible for postsynaptic Rab3A to interfere with this process cell-autonomously.

      We have added the information that Rab3A could control BDNF from the postsynaptic cell and included the two references provided by the reviewer, Discussion, lines 517 to 518. We have added new evidence, recently published, that the Rab3 family has been shown to regulate targeting of EGF receptors to rafts (among other plasma membrane molecules), with Rab3A itself clearly present in nonneuronal cells (Diaz-Rohrer et al., 2023) (added to Discussion, lines 509 to 515).

      (3) The authors propose that a change in AMPAR subunit composition from GluA2-containing ones to GluA1 homomers may account for the distinct changes in mEPSC amplitudes and GluA2 clusters. However, their data from the NASPM wash-in experiments clearly show that the GluA1 homomer contributions have not changed before and after TTX treatment.

      We have revised this section in the Discussion, lines 534 to 536, to clarify that any change due to GluA1 homomers should have been detectable by a greater ability of NASPM to reverse the TTX-induced increase.

      Reviewer #2 (Recommendations for the Authors):

      For authors to have more convincing arguments in general, they will need to clarify/improve certain details in their data collection by addressing the above technical concerns. Additionally, the authors should design experiments to test whether Rab3A regulates scaling from pre- or post-synaptic site. For example, they could sparsely knock out Rab3A in WT neurons to test the postsynaptic possibility. On the other hand, their argument for a presynaptic role would be much more compelling if they could show whether there are clear functional changes such as in vesicle sizes and release probability in the presynaptic terminal of Rab3AKO neurons.

      An important next step is to identify whether Rab3A is acting pre- or post-synaptically (Discussion, lines 572 to 573), but these experiments will be undertaken in the future. It would not add much to simply show vesicle size is altered in the KO (and we do not necessarily expect this since mEPSC amplitude is normal in the KO). It will be very difficult to establish that vesicle size is changing with activity blockade and that this change is prevented in the Rab3A KO, because we are looking for a ~25% increase in vesicle volume, which would correspond to a ~7.5% increase in diameter. Finally, we do not believe demonstrating changes in release probability tell us anything about a presynaptic role for Rab3A in regulating the size of the presynaptic quantum.
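
      The relationship between vesicle volume and diameter quoted above follows from simple geometry. The following is a minimal sketch, assuming vesicles are perfect spheres so that volume scales with the cube of diameter; the function name is illustrative.

```python
def diameter_increase_for_volume_increase(volume_increase: float) -> float:
    """Fractional increase in the diameter of a sphere whose volume grows
    by the fraction `volume_increase` (0.25 means +25%).

    Volume scales with diameter cubed, so the diameter scales with the
    cube root of the volume ratio.
    """
    return (1.0 + volume_increase) ** (1.0 / 3.0) - 1.0


diameter_increase = diameter_increase_for_volume_increase(0.25)
print(f"diameter increase: {diameter_increase * 100:.1f}%")
```

      For a 25% increase in volume this gives an increase in diameter of a little under 8%, in line with the approximate figure given above and small enough to be hard to resolve against normal variation in vesicle size.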

      Reviewer #3 (Public Review)

      Weaknesses: However, the rather strong conclusions on the dissociation of AMPAR trafficking and synaptic response are made from somewhat weaker data. The key issue is the GluA2 immunostaining in comparison with the mEPSC recordings. Their imaging method involves only assessing puncta clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, judging from the sample micrographs (Fig. 5). To my knowledge, this is a new and unvalidated approach that could represent a particular subset of synapses not representative of the synapses contributing to the mEPSC change (they are also sampling different neurons for the two measurements; an additional unknown detail is how far from the cell body were the analyzed dendrites for immunostaining.) While the authors acknowledge that a sampling issue could explain the data, they still use this data to draw strong conclusions about the lack of AMPAR trafficking contribution to the mEPSC amplitude change. This apparent difference may be a methodological issue rather than a biological one, and at this point it is impossible to differentiate these. It will unfortunately be difficult to validate their approach. Perhaps if they were to drive NMDA-dependent LTD or chemLTP, and show alignment of the imaging and ephys, that would help. More helpful would be recordings and imaging from the same neurons but this is challenging. Sampling from identified synapses would of course be ideal, perhaps from 2P uncaging combined with SEP-labeled AMPARs, but this is more challenging still. But without data to validate the method, it seems unwarranted to make such strong conclusions such as that AMPAR trafficking does not underlie the increase in mEPSC amplitude, given the previous data supporting such a model.

      In the new version, we soften our conclusion regarding the mismatch between GluA2 receptor levels and mEPSC amplitudes, now only stating that receptors may not be the sole contributor to the TTX effect on mEPSC amplitude (Discussion, lines 472 to 474). With our analysis in the new version focusing on comparisons of cell means, the GluA2 receptor cluster size and the mEPSC amplitude data match well in magnitude for the data pooled across the 3 matched cultures (20.0% and 19.7%, respectively, see new Table 1). However, in one of the three cultures the direction of change for GluA2 receptors is opposite that of mEPSC amplitudes (Table 1, Culture #3, -9.5% vs +17.2%, respectively).
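
      The distinction between pooled and per-culture comparisons of cell means can be illustrated with a short sketch. The values below are hypothetical and chosen only to show the arithmetic; they are not data from this study, and the names `percent_change` and `cultures` are illustrative.

```python
from statistics import mean


def percent_change(control, treated):
    """Percent change of the treated mean relative to the control mean."""
    control_mean = mean(control)
    return (mean(treated) - control_mean) / control_mean * 100.0


# HYPOTHETICAL cell means in arbitrary units (not data from the study).
cultures = {
    "culture_1": {"CON": [10.0, 12.0, 11.0], "TTX": [14.0, 15.0, 13.0]},
    "culture_2": {"CON": [9.0, 11.0, 10.0], "TTX": [12.0, 13.0, 14.0]},
    "culture_3": {"CON": [12.0, 13.0, 11.0], "TTX": [11.0, 10.0, 12.0]},
}

# Effect of treatment within each culture.
per_culture = {
    name: percent_change(cells["CON"], cells["TTX"])
    for name, cells in cultures.items()
}

# Effect after pooling cell means across all cultures.
pooled_con = [v for cells in cultures.values() for v in cells["CON"]]
pooled_ttx = [v for cells in cultures.values() for v in cells["TTX"]]
pooled = percent_change(pooled_con, pooled_ttx)

for name, change in per_culture.items():
    print(f"{name}: {change:+.1f}%")
print(f"pooled: {pooled:+.1f}%")
```

      In this made-up example the pooled change is positive even though one culture changes in the opposite direction, which is the kind of pattern that only becomes visible when cultures are examined individually.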

      It is unlikely that the lack of matching of homeostatic plasticity in one culture, but very good matching in two other cultures, can be explained by an unvalidated focus on puncta associated with MAP2 positive dendrites. We chose to restrict analysis of synaptic GluA2 receptors to the primary dendrite in order to reduce variability, reasoning that we are always measuring synapses for an excitatory pyramidal neuron, synapses that are relatively close to the cell body, on the consistently identifiable primary dendrite. We measured how far this was for the two cells depicted in old Figure 5 (new Fig. 4). Because we always used the 5X zoom window which is a set length, and positioned it within ~10 microns of the cell body, these cells give a ballpark estimate for the usual distances. For the untreated cell, the average distance from the cell body was 38.5 ± 2.8 µm; for the TTX-treated cell, it was 42.4 ± 3.2 µm (p = 0.35, Kruskal-Wallis test). We have added these values to the Results, lines 270 to 274.

      We did not mean to propose that AMPA receptor levels do not contribute at all to mEPSC amplitude, and we acknowledge there are clear cases where the two characteristics change in parallel (for example, in the study cited by Reviewer #2, (Pozo et al., 2012), increases in GluA2 receptors due to exogenous expression are closely matched by increases in mEPSC amplitudes.) What our matched culture experiments demonstrate is that in the case of TTX treatment, both GluA2 receptors and mEPSC amplitudes increase on average, but sometimes mEPSC amplitudes can increase in the absence of an increase in GluA2 receptors (Culture #3, Rab3A+/+ cultures), and sometimes mEPSC amplitudes do not increase even though GluA2 receptor levels do increase (Culture #3, Rab3A-/- cultures). Therefore, it would not add anything to our argument to examine receptors and mEPSCs in NMDA-dependent LTP, a different plasticity paradigm in which changes in receptors and mEPSCs may more closely align. It has been demonstrated that mEPSCs of widely varying amplitude can be recorded from a single synaptic site (Liu and Tsien, 1995), so we would need to measure a large sample of individual synapse recordings to detect a modest shift in average values due to activity blockade. In addition, it would be essential to express fluorescent AMPA receptors in order to correlate receptor levels in the same cells we record from (or at the same synapses). And yet, even after these heroics, one is still left with the issue that the two methods, electrophysiology and fluorescent imaging, have distinct limitations and sources of variability that may obscure any true quantitative correlation.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is quite unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. It is also unclear why the authors argue this proves that NASPM was at an effective concentration (lines 399-400). Further, the amplitude data show a strong trend towards smaller amplitude. The p value for both control and TTX neurons was 0.08 – it is very difficult to argue that there is no effect. And the decrease is larger in the TTX neurons. Considering the strong claims for a presynaptic locus and the use of this data to justify only looking at GluA2 by immunostaining, these data do not offer much support of the conclusions. Between the sampling issues and perhaps looking at the wrong GluA subunit, it seems premature to argue that trafficking is not a contributor to the mEPSC amplitude change, especially given the substantial support for that hypothesis. Further, even if trafficking is not the major contributor, there could be shifts in conductance (perhaps due to regulation of auxiliary subunits) that does not necessitate a pre-synaptic locus. While the authors are free to hypothesize such a mechanism, it would be prudent to acknowledge other options and explanations.

      We have created a model cartoon to explain how NASPM could reduce mEPSC frequency (new Fig. 3D). mEPSCs that arise from a synaptic site that has only Ca2+-permeable AMPA receptors will be completely blocked by NASPM, if the NASPM concentration is maximal. The reason we conclude that we have sufficient NASPM reaching the cells is that the frequency is decreased, as expected if there are synaptic sites with only Ca2+-permeable AMPA receptors. We previously were not clear that there is an effect of NASPM on mEPSC amplitude, although it did not reach statistical significance (new Fig. 3B). Where there is no effect is on the TTX-induced increase in mEPSC amplitude, which remains after the acute NASPM application (new Fig. 3A). We have revised the description of these findings in Results, lines 220 to 241. In reviewing the literature further, we could find no previous studies demonstrating an increase in conductance in GluA2 or Ca2+-impermeable receptors, only in GluA1 homomers. In other words, any conductance change would have been due to a change in GluA1 homomers, and should have been visible as a disruption of the homeostatic plasticity by NASPM application. We have added text to Results, lines 211 to 217; 236-241; Discussion, lines 420 to 422; 526-536 and Methods, lines 685 to 695 regarding this point.

      The frequency data are missing from the paper, with the exception of the NASPM dataset. The mEPSC frequencies should be reported for all experiments, particularly given that Rab3A is generally viewed as a pre-synaptic protein regulating release. Also, in the NASPM experiments, the average frequency is much higher in the TTX treated cultures. Is this statistically above control values?

      This comment is addressed by the major change #3, above.

      Unaddressed issues that would greatly increase the impact of the paper:

      (1) Is Rab3A acting pre-synaptically, post-synaptically, or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where is it acting (pre or post) would aid substantially in understanding its role (and particularly the hypothesized and somewhat novel idea that the amount of glutamate released per vesicle is altered in HSP). They could use sparse knockdown of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      This is similar to the request of Reviewer #2, Recommendations to the Authors. An important next step is to identify whether Rab3A is working pre- or postsynaptically. However, it is possible that it is acting pre-synaptically to anterogradely regulate trafficking of AMPAR, as we have depicted in our model, new Fig. 9. To demonstrate that the presynaptic quantum is being altered, we would need to show that vesicle size is increased, or the amount of transmitter being released during an mEPSC is increased after activity blockade. To that end, we are currently performing experiments using a fast off-rate antagonist. As described above in response to Reviewer #2’s Conceptual Concerns, we find dramatic decreases in frequency not explained by the 30-60% inhibition observed for the largest amplitude mEPSCs, which suggests the possibility that small mEPSCs are more sensitive than large mEPSCs and therefore may have less transmitter. Due to these complexities and the delay while we test other antagonists to see if the effect is specific to fast off-rate antagonists, we are not including these results here.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs and/or a decrease of GABA-packaging in vesicles (i.e., the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      It will be important to determine if homeostatic synaptic plasticity at inhibitory synapses on excitatory neurons is sensitive to Rab3A deletion, especially in light of the fact that unlike many of the other molecules implicated in homeostatic increases in mEPSCs, Rab3A is not a molecule known to be selective for glutamate receptor trafficking (in contrast to Arc/Arg3.1 or GRIP1, for example). Such a study would warrant its own publication.

      Reviewer #3 (Recommendations for the Authors):

      There are a number of minor points or suggestions for the authors:

      Is RIM1 part of this pathway (or expected to be)? Some discussion of this would be nice.

      RIM, Rab3-interacting molecule, has been implicated at the Drosophila neuromuscular junction in a presynaptic form of homeostatic synaptic plasticity in which evoked release is increased after block of postsynaptic receptors (Muller et al., 2012), a plasticity that also requires Rab3-GAP (Muller et al., 2011). To our knowledge there is no evidence that RIM is involved in the homeostatic plasticity of mEPSC amplitude after activity blockade by TTX. The Rim1a KO does not have a change in mEPSC amplitude relative to WT (Calakos et al., 2004), but that is not unexpected given the normal mEPSC amplitude in neurons from cultures prepared from Rab3A-/- mice in the current study. It would be interesting to look at homeostatic plasticity in cortical cultures prepared from Rim1a or other RIM deletion mice, but we have not added these points to the revised manuscript since there are a number of directions one could go in attempting to define the molecular pathway and we feel it is more important to discuss the potential location of action and physiological mechanisms.

      Is the Earlybird mutation a GOF? More information about this mutation would help.

      We have added a description of how the Earlybird mutation was identified, in a screen for rest:activity mutants (Results, lines 118 to 123). Rab3A Earlybird mice have a shortened circadian period, shifting their wake cycle earlier and earlier. When Rab3A deletion mice were tested in the same activity raster plot measurements, the shift was smaller than that for the Earlybird mutant, suggesting the possibility that it is a dominant negative mutation.

      The high K used in the NASPM experiments seems a bit unusual. Have the authors done high K/no drug controls to see if this affects the synapses in any way?

      We used the high K based on previous studies that indicated the blocking effect of the Ca2+-permeable receptor blockers was use-dependent (Herlitze et al., 1993; Iino et al., 1996; Koike et al., 1997). We reasoned that a modest depolarization would increase the frequency of AMPA receptor mEPSCs and allow access of the NASPM. We have added this point to the Methods, lines 695 to 708.

      The NASPM experiments do not show that GluA1 does not contribute (line 401), only that GluA1 homomers are not contributing (much – see above). GluA1/A2 heteromers are quite likely involved. Also, the SEM is missing from the WT pre/post NASPM data.

      Imaging of GluA2-positive sites will not distinguish between GluA2 homomers and GluA2-GluA1 heteromers, so we have added this clarification to Results, lines 242 to 246. We have remade the NASPM pre-post line plots so that the mean values and error bars are more visible (new Fig. 3B, C).

      It seems odd to speculate based on non-significant findings (line 650-1), with lower significance (p = 0.11) than findings being dismissed in the paper (NASPM on mEPSC amplitude; p = 0.08).

      We did not mean to dismiss the effect of NASPM on mEPSC amplitude (new Fig. 3B), rather, we dismiss the effect of NASPM on the homeostatic increase in mEPSC amplitude caused by TTX treatment (new Fig. 3A). We have emphasized this distinction in Results, lines 223 to 225, and Discussion, lines 420 to 422, as well as adding that the stronger effect of NASPM on frequency after TTX treatment suggests an activity-dependent increase in the number of synapses expressing only Ca2+ permeable homomers (Results, lines 236 to 241; Discussion, lines 431 to 435).

      Fig. 4 could be labeled better (to make it clear that B is amplitude and C is freq from the same cells).

      Fig. 4 has been revised—now the amplitude and frequency plots from the same condition (new Fig. 3, B, C; CON or TTX) are in a vertical line and the figure legend states that the frequency data are from the same cells as in Fig. 3A.

      The raw amplitude data seems a bit hidden in the inset panels – I would suggest these data are at least as important as the cumulative distributions in the main panel. Maybe re-organizing the figures would help.

      We have removed all cumulative distributions, rank order plots, and ratio plots. The box plots are now full size in new Figures 1, 2, 5, 6, 7 and 8.

      I’m not sure I would argue in the paper that 12 cells a day is a limiting issue for experiments. It doesn’t add anything and doesn’t seem like that high a barrier. It is fine to just say it is difficult and therefore there is a limited amount of data meeting the criteria.

      We have removed the comment regarding difficulty.

      Calakos N, Schoch S, Sudhof TC, Malenka RC (2004) Multiple roles for the active zone protein RIM1alpha in late stages of neurotransmitter release. Neuron 42:889-896.

      De Gois S, Schafer MK, Defamie N, Chen C, Ricci A, Weihe E, Varoqui H, Erickson JD (2005) Homeostatic scaling of vesicular glutamate and GABA transporter expression in rat neocortical circuits. J Neurosci 25:7121-7133.

      Diaz-Rohrer B, Castello-Serrano I, Chan SH, Wang HY, Shurer CR, Levental KR, Levental I (2023) Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120:e2207461120.

      Dubes S, Soula A, Benquet S, Tessier B, Poujol C, Favereaux A, Thoumine O, Letellier M (2022) miR-124-dependent tagging of synapses by synaptopodin enables input-specific homeostatic plasticity. EMBO J 41:e109012.

      Fong MF, Newman JP, Potter SM, Wenner P (2015) Upward synaptic scaling is dependent on neurotransmission rather than spiking. Nat Commun 6:6339.

      Herlitze S, Raditsch M, Ruppersberg JP, Jahn W, Monyer H, Schoepfer R, Witzemann V (1993) Argiotoxin detects molecular differences in AMPA receptor channels. Neuron 10:1131-1140.

      Hou Q, Zhang D, Jarzylo L, Huganir RL, Man HY (2008) Homeostatic regulation of AMPA receptor expression at single hippocampal synapses. Proc Natl Acad Sci U S A 105:775-780.

      Ibata K, Sun Q, Turrigiano GG (2008) Rapid synaptic scaling induced by changes in postsynaptic firing. Neuron 57:819-826.

      Iino M, Koike M, Isa T, Ozawa S (1996) Voltage-dependent blockage of Ca(2+)-permeable AMPA receptors by joro spider toxin in cultured rat hippocampal neurones. J Physiol 496 (Pt 2):431-437.

      Jakawich SK, Neely RM, Djakovic SN, Patrick GN, Sutton MA (2010a) An essential postsynaptic role for the ubiquitin proteasome system in slow homeostatic synaptic plasticity in cultured hippocampal neurons. Neuroscience 171:1016-1031.

      Jakawich SK, Nasser HB, Strong MJ, McCartney AJ, Perez AS, Rakesh N, Carruthers CJ, Sutton MA (2010b) Local presynaptic activity gates homeostatic changes in presynaptic function driven by dendritic BDNF synthesis. Neuron 68:1143-1158.

      Kapfhamer D, Valladares O, Sun Y, Nolan PM, Rux JJ, Arnold SE, Veasey SC, Bucan M (2002) Mutations in Rab3a alter circadian period and homeostatic response to sleep loss in the mouse. Nat Genet 32:290-295.

      Koike M, Iino M, Ozawa S (1997) Blocking effect of 1-naphthyl acetyl spermine on Ca(2+)-permeable AMPA receptors in cultured rat hippocampal neurons. Neurosci Res 29:27-36.

      Liu G, Tsien RW (1995) Properties of synaptic transmission at single hippocampal synaptic boutons. Nature 375:404-408.

      Liu G, Choi S, Tsien RW (1999) Variability of neurotransmitter concentration and nonsaturation of postsynaptic AMPA receptors at synapses in hippocampal cultures and slices. Neuron 22:395-409.

      Muller M, Pym EC, Tong A, Davis GW (2011) Rab3-GAP controls the progression of synaptic homeostasis at a late stage of vesicle release. Neuron 69:749-762.

      Muller M, Liu KS, Sigrist SJ, Davis GW (2012) RIM controls homeostatic plasticity through modulation of the readily-releasable vesicle pool. J Neurosci 32:16574-16585.

      Pozo K, Cingolani LA, Bassani S, Laurent F, Passafaro M, Goda Y (2012) beta3 integrin interacts directly with GluA2 AMPA receptor subunit and regulates AMPA receptor expression in hippocampal neurons. Proc Natl Acad Sci U S A 109:1323-1328.

      Silva MM, Rodrigues B, Fernandes J, Santos SD, Carreto L, Santos MAS, Pinheiro P, Carvalho AL (2019) MicroRNA-186-5p controls GluA2 surface expression and synaptic scaling in hippocampal neurons. Proc Natl Acad Sci U S A 116:5727-5736.

      Soden ME, Chen L (2010) Fragile X protein FMRP is required for homeostatic plasticity and regulation of synaptic strength by retinoic acid. J Neurosci 30:16910-16921.

      Sun HY, Bartley AF, Dobrunz LE (2009) Calcium-permeable presynaptic kainate receptors involved in excitatory short-term facilitation onto somatostatin interneurons during natural stimulus patterns. J Neurophysiol 101:1043-1055.

      Sutton MA, Ito HT, Cressy P, Kempf C, Woo JC, Schuman EM (2006) Miniature neurotransmission stabilizes synaptic function via tonic suppression of local dendritic protein synthesis. Cell 125:785-799.

      Tan HL, Queenan BN, Huganir RL (2015) GRIP1 is required for homeostatic regulation of AMPAR trafficking. Proc Natl Acad Sci U S A 112:10026-10031.

      Thapliyal S, Arendt KL, Lau AG, Chen L (2022) Retinoic acid-gated BDNF synthesis in neuronal dendrites drives presynaptic homeostatic plasticity. Elife 11.

      Wilson NR, Kang J, Hueske EV, Leung T, Varoqui H, Murnick JG, Erickson JD, Liu G (2005) Presynaptic regulation of quantal size by the vesicular glutamate transporter VGLUT1. J Neurosci 25:6221-6234.

      Wu YK, Hengen KB, Turrigiano GG, Gjorgjieva J (2020) Homeostatic mechanisms regulate distinct aspects of cortical circuit dynamics. Proc Natl Acad Sci U S A 117:24514-24525.

      Xu X, Pozzo-Miller L (2017) EEA1 restores homeostatic synaptic plasticity in hippocampal neurons from Rett syndrome mice. J Physiol 595:5699-5712.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study enhances our understanding of the effects of landscape context on grassland plant diversity and biomass. Notably, the authors use a well-designed field sampling method to separate the effects of habitat loss and fragmentation per se. Most of the data and analyses provide solid support for the findings that habitat loss weakens the positive relationship between grassland plant richness and biomass.

      Response: Thanks very much for organizing the review of the manuscript. We are grateful to you for the recognition. We have carefully analyzed all comments of the editors and reviewers and revised our manuscript to address them. All comments and recommendations are helpful and constructive for improving our manuscript. We describe our response to each comment in detail below.

      In addition to the reviewers' assessments, we have the following comments on your paper.

      (1) Some of the results are not consistent between figures. The relationships between overall species richness and fragmentation per se are not consistent between Figs. 3 and 5. The relationships between aboveground biomass and habitat loss are not consistent between Figs. 4 and 5. How shall we interpret these inconsistent results?

      Response: Thanks for your insightful comments. The reason for these inconsistencies is that the linear regression model did not take into account the complex causal relationships (including direct and indirect effects) among the different influencing factors. The results in Figures 3 and 4 represent only the pairwise relationship pattern and the relative importance, respectively. The causal effects of habitat loss and fragmentation per se on plant richness and above-ground biomass should be interpreted based on the structural equation model results (Figure 6). We have revised the data analysis to resolve these inconsistencies (Lines 225-228).

      In the revised manuscript, we have added the interpretation for these inconsistent results. The inconsistent effects between Figures 3 and 6 suggest that fragmentation per se actually had a positive effect on plant richness after accounting for the effects of habitat loss and environmental factors simultaneously.

      The inconsistent effects between Figures 4 and 6 are because the effects of habitat loss and fragmentation per se on above-ground biomass were mainly mediated by plant richness and environmental factors, which had no significant direct effect (Figure 6). Thus, habitat loss and fragmentation per se showed no significant relative effects on above-ground biomass after controlling the effects of plant richness and environmental factors (Figure 4).
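      The difference between a pairwise relationship and a path-model estimate can be written out for the simplest case. The following is a sketch only, assuming a linear path model with a single mediator and uncorrelated errors. The symbols are illustrative and are not taken from the manuscript: X stands for habitat loss, M for plant richness, and Y for above-ground biomass.

```latex
\begin{aligned}
M &= a\,X + \varepsilon_M, \\
Y &= c'\,X + b\,M + \varepsilon_Y, \\
\text{total effect of } X \text{ on } Y &= c' + a\,b .
\end{aligned}
```

      Under these assumptions, a pairwise regression of Y on X estimates the total effect c' + ab, whereas the path model reports the direct effect c' separately from the indirect effect ab. A non-significant direct effect is therefore compatible with a significant pairwise relationship when the effect is carried mainly by the mediator.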

      (2) One of the fragmentation indices, mean patch area metric, seems to be more appropriate as a measure of habitat loss, because it represents "a decrease in grassland patch area in the landscape".

      Response: Thanks for your insightful comments. We apologize for causing this confusion. The mean patch area metric in our study represents the mean size of grassland patches in the landscape for a given grassland amount. Previous studies have often used the mean patch area metric as a measure of fragmentation, which can reflect the processes of local extinction in the landscape (Fahrig, 2003; Fletcher et al., 2018). We have revised the definition of the mean patch area metric and added its ecological implication in the revised manuscript to resolve this confusion.

      (3) It is important to show both the mean and 95% CI (or standard error) of the slope coefficients regarding to Figs. 3 and 6.

      Response: Thanks for your suggestions. We have added the 95% confidence intervals to Figure 3 and Figure 6 in the revised manuscript.

      (4) It would be great to clarify what patch-level and landscape-level studies are in lines 302-306. Note that this study assesses the effects of landscape context on patch-level variables (i.e., plot-based plant richness and plot-based grassland biomass) rather than landscape-level variables (i.e., the average or total amount of biomass in a landscape).

      Response: Thanks for your insightful comment. We agree with your point that our study investigated the effect of fragmented landscape context (habitat loss and fragmentation per se) on plot-based plant richness and plot-based above-ground biomass rather than landscape-level variables.

      Therefore, we no longer discussed the differences between the patch-level and landscape-level studies here, instead focusing on the different ecological impacts of habitat loss and fragmentation per se in the revised manuscript.

      Line 369-374:

      “Although habitat loss and fragmentation per se are generally highly associated in natural landscapes, they are distinct ecological processes that determine decisions on effective conservation strategies (Fahrig, 2017; Valente et al., 2023). Our study evaluated the effects of habitat loss and fragmentation per se on grassland plant diversity and above-ground productivity in the context of fragmented landscapes in the agro-pastoral ecotone of northern China, with our results showing the effects of these two facets to not be consistent.”

      (5) One possible way to avoid the confusion between "habitat fragmentation" and "fragmentation per se" could be to say "habitat loss and fragmentation per se" when you intend to express "habitat fragmentation".

      Response: Thanks for your constructive suggestions. To avoid this confusion, we no longer mention habitat fragmentation in the revised manuscript but instead express it as habitat loss and fragmentation per se.

      Reviewer #1 (Public Review):

      This is a well-designed study that explores the BEF relationships in fragmented landscapes. Although there are massive studies on BEF relationships, most of them were conducted at local scales, few considered the impacts of landscape variables. This study used a large dataset to specifically address this question and found that habitat loss weakened the BEF relationships. Overall, this manuscript is clearly written and has important implications for BEF studies as well as for ecosystem restoration.

      Response: We are grateful to you for the recognition and constructive comments. All the comments and suggestions are very constructive for improving this manuscript. We have carefully revised the manuscript following your suggestions. All changes are marked in red font in the revised manuscript.

      My only concern is that the authors should clearly define habitat loss and fragmentation. Habitat loss and fragmentation are often associated, but they are different terms. The authors consider habitat loss a component of habitat fragmentation, which is not reasonable. Please see my specific comments below.

      Response: We agree with your point. In the revised manuscript, we no longer consider habitat loss and fragmentation per se as two facets of habitat fragmentation. We have clearly defined habitat loss and fragmentation per se and explicitly evaluated their relative effects on plant richness, above-ground biomass, and the BEF relationship.

      Reviewer #1 (Recommendations For The Authors):

      Title: It is more proper to say habitat loss, rather than habitat fragmentation.

      Response: Thanks for your suggestion. We have revised the title to “Habitat loss weakens the positive relationship between grassland plant richness and above-ground biomass”

      Line 22, remove "Anthropogenic", this paper is not specifically discussing habitat fragmentation driven by humans.

      Response: Thanks for your suggestion. We have removed the “Anthropogenic” from this sentence.

      Line 26, revise to "we investigated the effects of habitat loss and fragmentation per se on plant richness... in grassland communities by using a structural equation model".

      Response: Thanks for your suggestion. We have revised this sentence.

      Line 25-28:

      “Based on 130 landscapes identified by a stratified random sampling in the agro-pastoral ecotone of northern China, we investigated the effects of landscape context (habitat loss and fragmentation per se) on plant richness, above-ground biomass, and the relationship between them in grassland communities using a structural equation model.”

      Line 58-60, habitat fragmentation generally involves habitat loss, but habitat loss is independent of habitat fragmentation, it is not a facet of habitat fragmentation.

      Response: Thanks for your insightful comment. We no longer consider habitat loss and fragmentation per se as two facets of habitat fragmentation. In the revised manuscript, we treat habitat loss and fragmentation per se as two different processes in fragmented landscapes.

      Line 65-67, this sentence is not very relevant to this paragraph and can be deleted.

      Response: Thanks for your suggestion. We have deleted this sentence from the paragraph.

      Line 87-90, these references are mainly based on microorganisms, are there any references based on plants? These references are more relevant to this study. In addition, this is a key mechanism mentioned in this study, this section needs to be strengthened with more evidence and further exploration.

      Response: Thanks for your comment and suggestion. We have added some plant-based references here to strengthen the evidence for the mechanism by which habitat specialisation determines the BEF relationship.

      Line 89-95:

      “In communities, specialists with specialised niches in resource use may contribute complementary roles to ecosystem functioning, whereas generalists that are unspecialised in resource use may contribute redundant roles to ecosystem functioning due to overlapping niches (Dehling et al., 2021; Denelle et al., 2020; Gravel et al., 2011; Wilsey et al., 2023). Therefore, communities composed of specialists should have a higher niche complementarity effect in maintaining ecosystem functions and a more significant BEF relationship than communities composed of generalists.”

      Denelle, P., Violle, C., DivGrass, C., Munoz, F. 2020. Generalist plants are more competitive and more functionally similar to each other than specialist plants: insights from network analyses. Journal of Biogeography 47: 1922-1933.

      Dehling, D.M., Bender, I.M.A., Blendinger, P.G., Böhning-Gaese, K., Muñoz, M.C., Neuschulz, E.L., Quitián, M., Saavedra, F., Santillán, V., Schleuning, M., Stouffer, D.B. 2021. Specialists and generalists fulfil important and complementary functional roles in ecological processes. Functional Ecology 35: 1810-1821.

      Wilsey, B., Martin, L., Xu, X., Isbell, F., Polley, H.W. 2023. Biodiversity: Net primary productivity relationships are eliminated by invasive species dominance. Ecology Letters.

      Line 129-130, Although you can use habitat loss in the discussion or the introduction, here preferably use habitat amount or habitat area, rather than habitat loss in this case. Habitat loss represents changes in habitat area, but the remaining grasslands could be the case of natural succession or other processes, rather than loss of natural habitat.

      Response: Thanks for your insightful comment. We agree with your point. In the revised manuscript, we have explicitly stated that habitat loss was represented by the loss of grassland amount in the landscape.

      Since the remaining grassland fragments in this region were mainly caused by grassland loss due to human activities such as cropland expansion (Chen et al., 2019; Yang et al., 2020), we used the percentage of non-grassland cover in the landscape to represent habitat loss in our study.

      Line 132-135:

      “Habitat loss was represented by the loss of grassland amount in the landscape. As the remaining grassland fragments in this region were mainly caused by grassland loss due to human activities such as cropland expansion (Chen et al., 2019; Yang et al., 2020), the percentage of non-grassland cover in the landscape was used in our study to represent habitat loss.”

      Lines 245-246, please also give more details of the statistical results, such as n, r value et al in the text.

      Response: Thanks for your suggestion. We have added the details of the statistical results in the revised manuscript.

      Line 283-290:

      “Habitat loss was significantly negatively correlated with overall species richness (R = -0.21, p < 0.05, Figure 3a) and grassland specialist richness (R = -0.41, p < 0.01, Figure 3a), but positively correlated with weed richness (R = 0.31, p < 0.01, Figure 3a). Fragmentation per se was not significantly correlated with overall species richness and grassland specialist richness, but was significantly positively correlated with weed richness (R = 0.26, p < 0.01, Figure 3b). Habitat loss (R = -0.39, p < 0.01, Figure 3c) and fragmentation per se (R = -0.26, p < 0.01, Figure 3d) were both significantly negatively correlated with above-ground biomass.”

      Fig. 5, is there any relationship between habitat amount and fragmentation per se in this study?

      Response: Thanks for your insightful comment. We have considered a causal relationship between habitat loss and fragmentation per se in the structural equation model. We have discussed this relationship in the revised manuscript.

      Line 290-293, how about the BEF relationships with different fragmentation levels? I may have missed something somewhere, but it was not shown here.

      Response: Thanks for your insightful comment. We have added the BEF relationships with different fragmentation per se levels here.

      Line 323-340:

      “The linear regression models showed that habitat loss had a significant positive modulating effect on the positive relationship between plant richness and above-ground biomass, and fragmentation per se had no significant modulating effect (Figure 5). The positive relationship between plant richness and above-ground biomass weakened with increasing levels of habitat loss, strengthened and then weakened with increasing levels of fragmentation per se.

      Author response image 1.

      Relationships between grassland plant richness and above-ground biomass at different levels of habitat loss and fragmentation per se from 130 landscapes in the Tabu River Basin, a typical agro-pastoral ecotone of northern China: (a) high habitat loss and low fragmentation per se, (b) high habitat loss and moderate fragmentation per se, (c) high habitat loss and high fragmentation per se, (d) moderate habitat loss and low fragmentation per se, (e) moderate habitat loss and moderate fragmentation per se, (f) moderate habitat loss and high fragmentation per se, (g) low habitat loss and low fragmentation per se, (h) low habitat loss and moderate fragmentation per se. The R2 values in each panel are from linear regression models. The n in each panel is the number of surveying sites used in the linear regression models. The blue solid and dashed trend lines represent the significant and not significant effects, respectively. The shaded area around the trend line represents the 95% confidence interval. * represent significance at the 0.05 level. ** represent significance at the 0.01 level.”

      Discussion

      The Discussion (Section 4.2) needs to be revised and focused on your key findings, it is habitat loss, not fragmentation per se, that weakens the BEF relationships.

      Response: Thanks for your insightful comment and suggestion. In the revised manuscript, we have rephrased the Discussion (Section 4.2) to mainly discuss the inconsistent effects of habitat loss and fragmentation per se on the BEF relationship.

      Line 414-416:

      “4.2 Habitat loss rather than fragmentation per se weakened the magnitude of the positive relationship between plant diversity and ecosystem function”

      The R2 in the results are low (e.g., Fig. 3), please also mention other variables that might influence the observed pattern in the Discussion, such as soil and topography, though I understand it is difficult to collect such data in this study.

      Response: Thanks for your insightful comment and suggestion. We agree with you and reviewer 3 that the impact of environmental factors should also be considered.

      Therefore, we have considered two environmental factors related to water and temperature (soil water content and land surface temperature) in the analysis and discussed their impacts on plant diversity and above-ground biomass in the revised manuscript.

      Lines 344-345, its relative importance was stronger in the intact landscape than that of the fragmented landscape?

      Response: We apologize for causing this confusion. We have rephrased this sentence.

      Line 422-426:

      “Our study found grassland plant diversity showed a stronger positive impact on above-ground productivity than landscape context and environmental factors. This result is consistent with findings by Duffy et al. (2017) in natural ecosystems, indicating grassland plant diversity has an important role in maintaining grassland ecosystem functions in the fragmented landscapes of the agro-pastoral ecotone of northern China.”

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Yan et al. assess the effect of two facets of habitat fragmentation (i.e., habitat loss and habitat fragmentation per se) on biodiversity, ecosystem function, and the biodiversity-ecosystem function (BEF) relationship in grasslands of an agro-pastoral ecotone landscape in northern China. The authors use stratified random sampling to select 130 study sites located within 500m-radius landscapes varying along gradients of habitat loss and habitat fragmentation per se. In these study sites, the authors measure grassland specialist and generalist plant richness via field surveys, as well as above-ground biomass by harvesting and dry-weighting the grass communities in each 3 x 1m2 plots of the 130 study sites. The authors find that habitat loss and fragmentation per se have different effects on biodiversity, ecosystem function and the BEF relationship: whereas habitat loss was associated with a decrease in plant richness, fragmentation per se was not; and whereas fragmentation per se was associated with a decrease in above-ground biomass, habitat loss was not. Finally, habitat loss, but not fragmentation per se was linked to a decrease in the magnitude of the positive biodiversity-ecosystem functioning relationship, by reducing the percentage of grassland specialists in the community.

      Strengths:

      This study by Yan et al. is an exceptionally well-designed, well-written, clear and concise study shedding light on a longstanding, important question in landscape ecology and biodiversity-ecosystem functioning research. Via a stratified random sampling approach (cf. also "quasi-experimental design" Butsic et al. 2017), Yan et al. create an ideal set of study sites, where habitat loss and habitat fragmentation per se (usually highly correlated) are decorrelated and hence, separate effects of each of these facets on biodiversity and ecosystem function can be assessed statistically in "real-world" (and not experimental, cf. Duffy et al. 2017) communities. The authors use adequate and well-described methods to investigate their questions. The findings of this study add important empirical evidence from real-world grassland ecosystems that help to advance our theoretical understanding of landscape-moderation of biodiversity effects and provide important guidelines for conservation management.

      Weaknesses:

      I found only a few minor issues, mostly unclear descriptions in the study that could be revised for more clarity.

      Response: Thanks very much for your review of the manuscript. We are grateful to you for the recognition. All the comments and suggestions are very insightful and constructive for improving this manuscript. We have carefully studied the literature you recommended and revised the manuscript following your suggestions. All changes are marked in red font in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments

      (1) Some aspects of the Methods section were not entirely clear to me, could you revise them for more clarity?

      (a) Whereas you describe 4 main facets of fragmentation per se that are used to create the PC1 as a measure of overall fragmentation per se, it looks as if this PC1 is mainly driven by 3 facets only (ED, PD and AREA_MN), and patch isolation (nearest neighbour distance, ENN) having a relatively low loading on PC1 (Figure A1). I think it would be good to discuss this fact and the consequences of it, that your definition of fragmentation is focused more on edge density, patch density and mean patch area, and less on patch isolation in your Discussion section?

      Response: Thanks for your insightful comment and suggestion. We agree with your point. We have discussed this fact and its implications for understanding the effects of fragmentation per se in our study.

      Line 384-389:

      “However, it is important to stress that the observed positive effect of fragmentation per se does not imply that increasing the isolation of grassland patches would promote biodiversity, as the metric of fragmentation per se used in our study was more related to patch density, edge density and mean patch area while relatively less related to patch isolation (Appendix Table A1). The potential threats from isolation still need to be carefully considered in the conservation of biodiversity in fragmented landscapes (Haddad et al., 2015).”

      (b) Also, from your PCA in Figure A1, it seems that positive values of PC1 mean "low fragmentation", whereas high values of PC1 mean "high fragmentation", however, in Figure A2, the inverse is shown (low values of PC1 = low fragmentation, high values of PC1 = high fragmentation). Could you clarify in the Methods section, if you scaled or normalized the PC1 to match this directionality?

      Response: We apologize for causing this confusion. To be consistent with the direction of change in fragmentation per se, we took the inverse of the PC1 as a single fragmentation per se index, which was positively correlated with patch density, edge density, and the mean nearest-neighbor distance metric, and negatively correlated with mean patch area (Appendix Figure A1 and Table A1). We have clarified this point in the Methods section.

      Line 160-163:

      “We took the inverse of the PC1 as a single fragmentation per se index, which was positively correlated with patch density, edge density, mean nearest-neighbor distance metric, and negatively with mean patch area (Appendix Figure A1 and Table A1).”

      (c) On line 155 you describe that you selected at least 20 landscapes using stratified sampling from each of the eight groups of habitat amount and fragmentation combination. Could you clarify: 1) did you randomly sample within these groups with a minimum distance condition or was it a non-random selection according to other criteria? (I think you could move the "To prevent overlapping landscapes..." sentence up here to the description of the landscape selection process) 2) Why did you write "at least 20 landscapes" - were there in some cases more or less landscapes selected? 130 study landscapes divided by 8 groups only gives you 16.25, hence, at least for some groups there were less than 20 landscapes? Could you describe your final dataset in more detail, i.e. the number of landscapes per group and potential repercussions for your analysis?

      Response: Thanks for your insightful comments. In the revised manuscript, we have rephrased the Methods to provide more detail on how the sampling landscapes were selected.

      (1) Line 169-172

      We randomly selected at least 20 grassland landscapes with a minimum distance condition using stratified sampling from each of the remaining eight grassland types as alternative sites for field surveys. The minimum distance between each landscape was at least 1000 m to prevent overlapping landscapes and potential spatial autocorrelation.

      (2) Line 184-191

      We selected at least 20 grassland landscapes of each type to ensure enough alternative sites for the field survey. This was necessary because the habitat type of some selected sites was not natural grassland (for example, abandoned agricultural land), and field surveys might not be permitted at some of the selected sites.

      Thus, we finally established 130 sites in the field survey. The types of the 130 sites were: 19 high-moderate, 14 high-low, 19 moderate-high, 16 moderate-moderate, 18 moderate-low, 16 low-high, 17 low-moderate, 11 low-low habitat amount and fragmentation per se.
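      The selection step described above (random draws within each landscape type, subject to a minimum spacing of 1000 m) can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes a greedy accept/reject pass over shuffled candidates, with candidate centres given as projected coordinates in metres. The function name and input layout are hypothetical.

```python
import math
import random


def sample_with_min_distance(candidates, n_per_group, min_distance=1000.0, seed=0):
    """Randomly pick up to n_per_group landscapes from each group.

    candidates: list of (group, x, y) tuples with coordinates in metres.
    A candidate is accepted only if its centre lies at least min_distance
    from every centre already accepted, whatever its group.
    """
    rng = random.Random(seed)
    shuffled = list(candidates)
    rng.shuffle(shuffled)  # random visiting order within and across groups

    selected = []
    counts = {}
    for group, x, y in shuffled:
        if counts.get(group, 0) >= n_per_group:
            continue  # this stratum is already full
        far_enough = all(
            math.hypot(x - sx, y - sy) >= min_distance
            for _, sx, sy in selected
        )
        if far_enough:
            selected.append((group, x, y))
            counts[group] = counts.get(group, 0) + 1
    return selected
```

      With 500 m radius landscapes, a spacing of at least 1000 m between centres keeps the buffers from overlapping. A greedy pass can return fewer than n_per_group sites for a stratum when suitable candidates are scarce.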

      (d) On line 166, you describe that you established 130 sites of 30 m by 30 m - I assume they were located (more or less) exactly in the centre of the selected 500 m - radius landscapes? Were they established so that they were fully covered with grassland? And more importantly, how did you establish the 10 m by 10 m areas and the 1 m2 plots within the 30 m by 30 m sites? Did you divide the 30 m by 30 m areas into three rectangles of 10 m by 10 m and then randomly established 1 m2 plots? Were the 1 m2 plots always fully covered with grassland/was there a minimum distance to edge criterion? Please describe with more detail how you established the 1 m2 study sites, and how many there were per landscape.

      Response: Thanks for your insightful comments. In the revised manuscript, we have provided more detailed information on how we set up the 130 sites of 30 m by 30 m and the three 1 m by 1 m plots within each site.

      (1) As these 130 sites were selected based on the calculation of the moving window, they were located (more or less) exactly in the centre of the 500-m radius buffer.

      (2) These sites were fully covered with grassland because their size (30 m by 30 m) was the same as the size of the grassland cell (30 m by 30 m) used in the calculation of the moving window.

      (3) We randomly set up three 1 m * 1 m plots in a flat topographic area at the 10 m * 10 m centre of each site. Thus, there was a minimum distance of 10 m to the edge for each 1 m * 1 m plot.

      (4) There are three 1 m * 1 m plots per landscape.

      Line 182-191:

      “Based on the alternative sites selected above, we established 130 sites (30 m * 30 m) between late July to mid-August 2020 in the Tabu River Basin in Siziwang Banner, Inner Mongolia Autonomous Region (Figure 1). The types of the 130 sites were: 19 high-moderate, 14 high-low, 19 moderate-high, 16 moderate-moderate, 18 moderate-low, 16 low-high, 17 low-moderate, 11 low-low habitat amount and fragmentation per se. In order to exclude the impact of historical agricultural activities, the habitat type of the established sites was natural grasslands with regional vegetation characteristics. Each site was not abandoned agricultural land, and there was no sign of agricultural reclamation.

      At the 10 m * 10 m center of each site, we randomly set up three 1 m * 1 m plots in a flat topographic area to investigate grassland vascular plant diversity and above-ground productivity.”

      (e) Line 171: could you explain what you mean by reclaimed?

      Response: Thanks for your comment. Here, “reclaimed” refers to land affected by historical agricultural activities. We have rephrased this sentence to make it more explicit.

      Line 186-189:

      “In order to exclude the impact of historical agricultural activities, the habitat type of the established sites was natural grasslands with regional vegetation characteristics. Each site was not abandoned agricultural land, and there was no sign of agricultural reclamation.”

      (f) Line 188 ff.: Hence your measure of productivity is average-above ground biomass per 1 m2. I think it would add clarity if you highlighted this more explicitly.

      Response: Thanks for your suggestion. We have highlighted that productivity in our study was the average above-ground biomass of the three 1 m * 1 m plots at each site.

      Line 215-217:

      “For each site, we calculated the mean vascular plant richness of the three 1 m * 1 m plots, representing the vascular plant diversity, and mean above-ground biomass of the three 1 m * 1 m plots, representing the above-ground productivity.”
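      The per-site averaging described above can be sketched as follows. The site identifiers, field names, and values are hypothetical and serve only to show the calculation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical plot-level records: three 1 m * 1 m plots per site.
plots = [
    {"site": "S001", "richness": 12, "biomass_g": 80.0},
    {"site": "S001", "richness": 14, "biomass_g": 95.0},
    {"site": "S001", "richness": 10, "biomass_g": 95.0},
    {"site": "S002", "richness": 8, "biomass_g": 60.0},
    {"site": "S002", "richness": 9, "biomass_g": 70.0},
    {"site": "S002", "richness": 10, "biomass_g": 50.0},
]


def site_means(records):
    """Return {site: (mean richness, mean above-ground biomass)}."""
    by_site = defaultdict(list)
    for rec in records:
        by_site[rec["site"]].append(rec)
    return {
        site: (mean(r["richness"] for r in recs),
               mean(r["biomass_g"] for r in recs))
        for site, recs in by_site.items()
    }


site_summary = site_means(plots)
```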

      (2) All figures are clear and well-designed!

      (a) Just as a suggestion: in Figures 3 and 6, you could maybe add the standard errors of the mean as well?

      Response: Thanks for your suggestion. In the revised manuscript, we have added the standard errors of the mean in Figures 3 and 6.

      (b) Figure 4: Could you please clarify: Which models were the optimal models on which these model-averaged standardized parameter estimates were based on? And hence, the optimal models contained all 4 predictors (otherwise, no standardized parameter estimate could be calculated)? Or do these model-averaged parameters take into account all possible models (and not only the optimal ones)?

      Response: Thanks for your suggestion. We selected the four optimal models based on the AICc value to calculate the model-averaged standardized parameter estimates. The four optimal models contained all predictors in Figure 4. We have added the four optimal models in Appendix Table A3.

      Appendix:

      Author response table 1.

      Four optimal models of landscape context, environment factors, and plant diversity affecting above-ground biomass.

      Note: AGB: above-ground biomass; HL: habitat loss; FPS: fragmentation per se; SWT: soil water content; LST: land surface temperature; GSR: grassland specialist richness; WR: weed richness; **: significance at the 0.01 level.
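      For readers unfamiliar with AICc-based model averaging, a minimal sketch of the standard formulas follows. The AICc values and coefficient estimates are hypothetical placeholders, not the values in Appendix Table A3, and the parameter count `k` is assumed to include the intercept and the residual variance.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_values):
    """Relative support for each model, normalised to sum to one."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(estimates, weights):
    """Weighted mean of one standardised coefficient across the candidate models."""
    return sum(w * b for w, b in zip(weights, estimates))


# Hypothetical AICc values and estimates for four candidate models.
aicc_values = [412.3, 413.1, 413.8, 414.0]
estimates = [0.31, 0.28, 0.35, 0.30]
weights = akaike_weights(aicc_values)
averaged = model_average(estimates, weights)
```

      This sketch averages only over models that contain the predictor; whether coefficients for absent predictors are treated as zero is a separate design choice that the response does not specify.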

      (c) Please add in all Figures (i.e., Figures 4, 5 and 6, Figure 6 per "high, moderate and low-class") the number of study units the analyses were based on.

      Response: Thanks for your suggestion. In the revised manuscript, we have added the number of study units the analyses were based on in all Figures.

      (d) Figure 6: I think it would be more consistent to add a second plot where the BEF-relationship is shown for low, moderate and high levels of habitat fragmentation per se. Could you also add a clearer description in the Methods and/or Results section of how you assessed if habitat amount or fragmentation per se affected the BEF-relationship? I.e. based on the significance of the interaction term (habitat amount x species richness) in a linear model?

      Response: Thanks for your insightful comment and suggestion. We have added a second plot in Figure 5 to show the BEF relationship at low, moderate and high levels of fragmentation per se.

      Line 328-340:

      Author response image 2.

      Relationships between grassland plant richness and above-ground biomass at different levels of habitat loss and fragmentation per se from 130 landscapes in the Tabu River Basin, a typical agro-pastoral ecotone of northern China: (a) high habitat loss and low fragmentation per se, (b) high habitat loss and moderate fragmentation per se, (c) high habitat loss and high fragmentation per se, (d) moderate habitat loss and low fragmentation per se, (e) moderate habitat loss and moderate fragmentation per se, (f) moderate habitat loss and high fragmentation per se, (g) low habitat loss and low fragmentation per se, (h) low habitat loss and moderate fragmentation per se. The R2 values in each panel are from linear regression models. The n in each panel is the number of surveying sites used in the linear regression models. The blue solid and dashed trend lines represent the significant and not significant effects, respectively. The shaded area around the trend line represents the 95% confidence interval. * represents significance at the 0.05 level. ** represents significance at the 0.01 level.

      We determined whether habitat loss and fragmentation per se moderated the BEF relationship by testing the significance of their interaction term with plant richness. We have added a clearer description in the Methods section of the revised manuscript.

      Line 245-250:

      “We then assessed the significance of interaction terms between habitat loss and fragmentation per se and plant richness in the linear regression models to evaluate whether they modulate the relationship between plant richness and above-ground biomass. Further, we used a piecewise structural equation model to investigate the specific pathways in which habitat loss and fragmentation per se modulate the relationship between plant richness and above-ground biomass.”
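      The interaction test described above can be illustrated with an ordinary least-squares fit of biomass on richness, a landscape moderator, and their product. This is a sketch under stated assumptions: the data are simulated, the variable names are hypothetical, and only the t statistic of the interaction coefficient is computed (a p-value would require the t distribution, for example from a statistics package).

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_with_interaction(richness, moderator, biomass):
    """Fit biomass ~ 1 + richness + moderator + richness:moderator.

    Returns (coefficients, t statistic of the interaction term).
    """
    X = [[1.0, r, h, r * h] for r, h in zip(richness, moderator)]
    p = 4
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, biomass)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [y - sum(b * v for b, v in zip(beta, row)) for row, y in zip(X, biomass)]
    sigma2 = sum(e * e for e in resid) / (len(biomass) - p)
    # Last column of (X'X)^-1 gives the variance factor of the interaction term.
    inv_col = solve(xtx, [0.0, 0.0, 0.0, 1.0])
    se_interaction = math.sqrt(sigma2 * inv_col[3])
    return beta, beta[3] / se_interaction


# Simulated data: the richness effect weakens as habitat loss increases.
rng = random.Random(0)
richness = [rng.uniform(5, 25) for _ in range(130)]
habitat_loss = [rng.uniform(0, 1) for _ in range(130)]
biomass = [20 + 3 * r - 10 * h - 2 * r * h + rng.gauss(0, 1)
           for r, h in zip(richness, habitat_loss)]
beta, t_interaction = ols_with_interaction(richness, habitat_loss, biomass)
```

      A clearly negative interaction coefficient in such a fit would indicate that the richness-biomass slope becomes shallower as the moderator increases.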

      (3) While reading your manuscript, I missed a discussion on the potential non-linear effects of habitat amount and fragmentation per se. In your study, it seems that the effects of habitat amount and fragmentation per se on biodiversity and ecosystem function are quite linear, which contrasts previous research highlighting that intermediate levels of fragmentation/heterogeneity could maximise spatial asynchrony, biodiversity and ecosystem function (e.g. Redon et al. 2014, Thompson & Gonzalez 2016, Tscharntke et al. 2012, Wilcox et al. 2017). I think it would add depth to your study if you discussed your finding of linear effects of habitat amount and fragmentation on biodiversity, ecosystem functioning and BEF. For example:

      Response: Thanks for your constructive suggestions. We have carefully studied the literature (e.g. Redon et al. 2014, Thompson & Gonzalez 2016, Tscharntke et al. 2012, Wilcox et al. 2017), which highlights that intermediate levels of fragmentation/heterogeneity could maximise spatial asynchrony, biodiversity and ecosystem function.

      In the revised manuscript, we have added a discussion of the linear positive effects of fragmentation on plant diversity and above-ground productivity, including possible reasons for this linearity.

      Line 402-413:

      “In our study, a possible mechanism for the positive impacts of fragmentation per se on plant diversity and above-ground productivity (indirect positive impact via plant diversity) is that fragmentation per se increases the habitat heterogeneity in the landscape, which can promote biodiversity through spatial asynchrony and spatial insurance effects (Tscharntke et al., 2012). Previous studies indicated that heterogeneity typically has nonlinear effects on biodiversity and ecosystem function, as moderate heterogeneity can maximise spatial asynchrony (Redon et al., 2014; Wilcox et al., 2017). However, our study did not observe nonlinear patterns between fragmentation per se and plant diversity and above-ground productivity. This may be due to the low spatial heterogeneity of this area as a result of agricultural intensification (Benton et al., 2003; Chen et al., 2019). The gradient of fragmentation per se in our study may not cover the optimal heterogeneity levels for maximising plant diversity and above-ground productivity (Thompson and Gonzalez, 2016).”

      Meanwhile, we also discussed the nonlinear pattern of the BEF relationship with increasing levels of fragmentation per se to add depth to the discussion.

      Line 442-451:

      “In addition, our study found that the BEF relationship showed a nonlinear pattern with increasing levels of fragmentation per se. For a given level of habitat loss, the positive BEF relationship was strongest at moderate fragmentation per se level and became neutral at high fragmentation per se level. This can be explained by the increased spatial asynchrony at moderate fragmentation per se level, which can promote niche complementary among species in the community and thus strengthen the BEF relationship (Gonzalez et al., 2020; Thompson and Gonzalez, 2016; Tscharntke et al., 2012). The neutral BEF relationship at high fragmentation per se level may be due to edge effects enhancing environmental filtering, thereby leading to functional redundancy among species and decoupling the BEF relationship (Fetzer et al., 2015; Hu et al., 2016; Zambrano et al., 2019).”

      (a) Line 74-75: I was wondering if you also thought of spatial insurance effects or spatial asynchrony effects that can emerge with habitat fragmentation, which could lead to increased ecosystem functioning as well? (refs. above).

      Response: Thanks for your constructive suggestions. In the revised manuscript, we have explicitly considered spatial insurance effects and spatial asynchrony as important mechanisms by which fragmentation per se can increase plant diversity and ecosystem function and strengthen the BEF relationship.

      Line 74-77:

      “In theory, habitat loss and fragmentation per se can regulate ecosystem function and the BEF relationship by altering species composition, interactions, and spatial asynchrony regardless of changes in species richness (Liu et al., 2018; Thompson and Gonzalez, 2016; Tscharntke et al., 2012).”

      Line 402-408:

      “In our study, a possible mechanism for the positive impacts of fragmentation per se on plant diversity and above-ground productivity (indirect positive impact via plant diversity) is that fragmentation per se increases the habitat heterogeneity in the landscape, which can promote biodiversity through spatial asynchrony and spatial insurance effects (Tscharntke et al., 2012). Previous studies indicated that heterogeneity typically has nonlinear effects on biodiversity and ecosystem function, as moderate heterogeneity can maximise spatial asynchrony (Redon et al., 2014; Wilcox et al., 2017).”

      Line 442-451:

      “In addition, our study found that the BEF relationship showed a nonlinear pattern with increasing levels of fragmentation per se. For a given level of habitat loss, the positive BEF relationship was strongest at moderate fragmentation per se level and became neutral at high fragmentation per se level. This can be explained by the increased spatial asynchrony at moderate fragmentation per se level, which can promote niche complementary among species in the community and thus strengthen the BEF relationship (Gonzalez et al., 2020; Thompson and Gonzalez, 2016; Tscharntke et al., 2012). The neutral BEF relationship at high fragmentation per se level may be due to edge effects enhancing environmental filtering, thereby leading to functional redundancy among species and decoupling the BEF relationship (Fetzer et al., 2015; Hu et al., 2016; Zambrano et al., 2019).”

      (b) I was wondering, if this result of linear effects could also be the result of a fragmentation gradient that does not cover the whole range of potential values? Maybe it would be good to compare the gradient in habitat fragmentation in your study with a theoretical minimum maximum/considering that there might be an optimal medium degree of fragmentation.

      Response: Thanks for your insightful comment. We agree that the linear effect of fragmentation per se in our study may arise because the gradient of fragmentation per se in this region does not cover the optimal heterogeneity levels for maximising spatial asynchrony. This is mainly because agricultural intensification in the agro-pastoral ecotone of northern China could lead to lower spatial heterogeneity in this region. We have explicitly discussed this point in the revised manuscript.

      Line 406-413:

      “Previous studies indicated that heterogeneity typically has nonlinear effects on biodiversity and ecosystem function, as moderate heterogeneity can maximise spatial asynchrony (Redon et al., 2014; Wilcox et al., 2017). However, our study did not observe nonlinear patterns between fragmentation per se and plant diversity and above-ground productivity. This may be due to the low spatial heterogeneity of this area as a result of agricultural intensification (Benton et al., 2003; Chen et al., 2019). The gradient of fragmentation per se in our study may not cover the optimal heterogeneity levels for maximising plant diversity and above-ground productivity (Thompson and Gonzalez, 2016).”

      (4) Some additional suggestions:

      (a) Line 3: Maybe add "via reducing the percentage of grassland specialists in the community"?

      Response: Thanks for your suggestion. We have revised this sentence.

      Line 19:

      “Habitat loss can weaken the positive BEF relationship via reducing the percentage of grassland specialists in the community”

      (b) Lines 46-48: Maybe add "but see: Duffy, J.E., Godwin, C.M. & Cardinale, B.J. (2017). Biodiversity effects in the wild are common and as strong as key drivers of productivity. Nature."

      Response: Thanks for your suggestion. We have added this reference here.

      Line 47-49:

      “When research expands from experiments to natural systems, however, BEF relationships remain unclear in the natural assembled communities, with significant context dependency (Hagan et al., 2021; van der Plas, 2019; but see Duffy et al., 2017).”

      (c) Lines 82-87 and lines 90-93: Hence, your study actually is in contrast to these findings, i.e., fragmented landscapes do not necessarily have a lower fraction of grassland specialists? If yes, could you highlight this more explicitly?

      Response: Thanks for your insightful comment. We have explicitly highlighted this point in the revised manuscript.

      Line 434-439:

      “Meanwhile, our study demonstrates that habitat loss, rather than fragmentation per se, can decrease the degree of habitat specialisation by leading to the replacement of specialists by generalists in the community, thus weakening the BEF relationship. This is mainly because fragmentation per se did not decrease the grassland specialist richness in this region, whereas habitat loss decreased the grassland specialist richness and led to the invasion of more weeds from the surrounding farmland into the grassland community (Yan et al., 2022; Yan et al., 2023).”

      (d) Line 360: Could you add some examples of these multiple ecosystem functions you refer to?

      Response: Thanks for your suggestion. We have added some examples of these multiple ecosystem functions here.

      Line 456-457:

      “Therefore, future studies are needed to focus on multiple ecosystem functions, such as below-ground productivity, litter decomposition, soil carbon stocks, etc.”

      Reviewer #3 (Public Review):

      Summary:

      The authors aim to solve how landscape context impacts the community BEF relationship. They found habitat loss and fragmentation per se have inconsistent effects on biodiversity and ecosystem function. Habitat loss rather than fragmentation per se can weaken the positive BEF relationship by decreasing the degree of habitat specialization of the community.

      Strengths:

      The authors provide a good background, and they have a good grasp of habitat fragmentation and BEF literature. A major strength of this study is separating the impacts of habitat loss and fragmentation per se using the convincing design selection of landscapes with different combinations of habitat amount and fragmentation per se. Another strength is considering the role of specialists and generalists in shaping the BEF relationship.

      Response: We are grateful for your recognition and constructive comments, which have helped us improve the manuscript. We have carefully revised the manuscript following your suggestions. All changes are marked in red font in the revised manuscript.

      Weaknesses:

      (1) The authors used five fragmentation metrics in their study. However, the choice of these fragmentation metrics was not well justified. The ecological significance of each fragmentation metric needs to be differentiated clearly. Also, these fragmentation metrics may be highly correlated with each other and redundant. I suggest author test the collinearity of these fragmentation metrics for influencing biodiversity and ecosystem function.

      Response: Thanks for your constructive suggestion. The fragmentation metrics used in our study represent different processes by which habitat is broken apart in the landscape and have been widely used in previous studies (Fahrig, 2003; Fahrig, 2017). In the revised manuscript, we have provided more detailed information about the ecological significance of these fragmentation indices.

      Line 142-148:

      “The patch density metric reflects the breaking apart of habitat in the landscape, which is a direct reflection of the definition of fragmentation per se (Fahrig et al., 2019). The edge density metric reflects the magnitude of the edge effect caused by fragmentation (Fahrig, 2017). The mean patch area metric and the mean nearest-neighbor distance metric are associated with the area and distance effects of island biogeography, respectively, reflecting the processes of local extinction and dispersal of species in the landscape (Fletcher et al., 2018).”

      Meanwhile, we have calculated the variance inflation factor (VIF) of each fragmentation metric to assess collinearity. The VIFs were all below four, suggesting no substantial multicollinearity among the metrics used to explain biodiversity and ecosystem function.

      Author response table 2.

      Variance inflation factors of habitat loss and fragmentation per se indices for influencing plant richness and above-ground biomass.
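      A variance inflation factor can be computed by regressing each predictor on all the others and taking 1 / (1 - R^2). The sketch below shows this calculation on two hypothetical metric columns; the values and the names `patch_density` and `edge_density` are placeholders, not the study data.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def vif(columns):
    """Variance inflation factor of each predictor column.

    columns: list of equal-length lists, one list per predictor.
    """
    n = len(columns[0])
    factors = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        X = [[1.0] + [c[r] for c in others] for r in range(n)]
        p = len(X[0])
        xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
        xty = [sum(row[a] * t for row, t in zip(X, target)) for a in range(p)]
        beta = solve(xtx, xty)
        fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
        mean_t = sum(target) / n
        ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
        ss_tot = sum((t - mean_t) ** 2 for t in target)
        r_squared = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r_squared))
    return factors


# Hypothetical values for two fragmentation metrics at six sites.
patch_density = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
edge_density = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
vif_values = vif([patch_density, edge_density])
```

      With only two predictors, both VIFs reduce to 1 / (1 - r^2), where r is their Pearson correlation.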

      (2) I found the local environmental factors were not considered in the study. As the author mentioned in the manuscript, temperature and water also have important impacts on biodiversity and ecosystem function in the natural ecosystem. I suggest authors include the environmental factors in the data analysis to control their potential impact, especially the structural equation model.

      Response: Thanks for your constructive suggestion. We agree with you that environmental factors should be considered in our study. In the revised manuscript, we have integrated two environmental factors related to water and temperature (soil water content and land surface temperature) into the data analysis to control their potential impact. The main results and conclusions of the revised manuscript are consistent with those of the previous manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) L60-63. The necessity to distinguish between habitat loss and fragmentation per se is not clearly stated. More information about biodiversity conservation strategies can be given here.

      Response: Thanks for your suggestion. In the revised manuscript, we have provided more evidence about the importance of distinguishing between habitat loss and fragmentation per se for biodiversity conservation.

      Line 62-67:

      “Habitat loss is often considered the major near-term threat to the biodiversity of terrestrial ecosystems (Chase et al., 2020; Haddad et al., 2015), while the impact of fragmentation per se remains debated (Fletcher Jr et al., 2023; Miller-Rushing et al., 2019). Thus, habitat loss and fragmentation per se may have inconsistent ecological consequences and should be considered simultaneously to establish effective conservation strategies in fragmented landscapes (Fahrig et al., 2019; Fletcher et al., 2018; Miller-Rushing et al., 2019).”

      (2) L73-77. The two sentences are hard to follow. Please rephrase to improve the logic. And I don't understand the "however" here. There is no twist.

      Response: Thanks for your suggestion. We have rephrased the two sentences to improve their logic.

      Line 74-79:

      “In theory, habitat loss and fragmentation per se can regulate ecosystem function and the BEF relationship by altering species composition, interactions, and spatial asynchrony regardless of changes in species richness (Liu et al., 2018; Thompson and Gonzalez, 2016; Tscharntke et al., 2012). This is because species in communities are not ecologically equivalent and may respond differently to habitat loss and fragmentation per se, and contribute unequally to ecosystem function (Devictor et al., 2008; Wardle and Zackrisson, 2005).”

      (3) L97. Are grasslands really the largest terrestrial ecosystem? Isn't it the forest?

      Response: We apologize for the confusion. We have rephrased this sentence.

      Line 101-104:

      “Grasslands have received considerably less attention, despite being one of the largest terrestrial ecosystems, and suffering severe fragmentation due to human activities, such as agricultural reclamation and urbanisation (Fardila et al., 2017).”

      (4) Fig.1, whether the four sample plots presented in panel b are from panel a. Please add the scale bar in panel b.

      Response: Thanks for your comment. The four sample plots presented in panel b are from panel a in Figure 1. We have also added the scale bar in panel b.

      (5) L105. This statement is too specific. Please remove and consider merging this paragraph with the next.

      Response: Thanks for your suggestion. We have removed this sentence and merged this paragraph with the next.

      (6) L157. The accuracy and kappa value of the supervised classification should be given.

      Response: Thanks for your suggestion. We have added the accuracy and kappa value of the supervised classification in the revised manuscript.

      Line 176-177:

      “The overall classification accuracy was 84.3 %, and the kappa coefficient was 0.81.”
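      Overall accuracy and Cohen's kappa are both derived from the classification confusion matrix. The sketch below shows the standard calculation on a hypothetical two-class matrix; it does not reproduce the study's classification results.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of validation samples of reference class i
    that were assigned to class j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical confusion matrix (rows: reference, columns: classified).
example = [[40, 10],
           [5, 45]]
overall_accuracy, kappa = accuracy_and_kappa(example)
```

      Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy.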

      (7) I would recommend the authors provide the list of generalists and specialists surveyed in the supplementary. Readers may not be familiar with the plant species composition in this area.

      Response: Thanks for your suggestion. We agree with your point and have provided the list of generalists and specialists surveyed in Appendix Table A4.

      Line 282-283:

      “A total of 130 vascular plant species were identified in our study sites, including 91 grassland specialists and 39 weeds (Appendix Table A4).”

      (8) Fig.4, it is better to add the results of variation partition to present the relative contribution of habitat fragmentation, environmental factors, and plant diversity.

      Response: Thanks for your suggestion. We have integrated the landscape context, environmental factors, and plant diversity into the multi-model averaging analysis and redrawn Figure 4 to present their relative importance for above-ground biomass.

      Line 313-319:

      Author response image 3.

      Standardised parameter estimates and 95% confidence intervals for landscape context, plant diversity, and environmental factors affecting above-ground biomass from 130 landscapes in the Tabu River Basin, a typical agro-pastoral ecotone of northern China. Standardised estimates and 95% confidence intervals are calculated by the multi-model averaging method based on the four optimal models affecting above-ground biomass (Appendix Table A3). ** represent significance at the 0.01 level.

      (9) Please redraw Fig.2 and Fig.5 to integrate the environmental factors. Add the R-square to Fig 5.

      Response: Thanks for your suggestion. We have integrated two environmental factors into the structural equation model and redrawn Figure 2 and Figure 5 in the revised manuscript. We have also added the R-squared values to Figure 5.

      (10) L354. The authors should be careful to claim that habitat loss could reduce the importance of plant diversity to ecosystem function. This pattern observed may depend on the type of ecosystem function studied.

      Response: Thanks for your suggestion. We have avoided this claim in the revised manuscript and explicitly discussed the importance of simultaneously focusing on multiple ecosystem functions, such as below-ground productivity, litter decomposition, soil carbon stocks, etc.

      Line 454-457:

      “This inconsistency can be explained by trade-offs between different ecosystem functions that may differ in their response to fragmentation per se (Banks-Leite et al., 2020). Therefore, future studies are needed to focus on multiple ecosystem functions, such as below-ground productivity, litter decomposition, soil carbon stocks, etc.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper reports a theoretical framework and methodology for identifying Cancer Driving Nucleotides (CDNs), primarily based on single nucleotide variant (SNV) frequencies. A variety of solid approaches indicate that a mutation recurring three or more times is more likely to reflect selection rather than being the consequence of a mutation hotspot. The method is rigorously quantitative, though the requirement for larger datasets to fully identify all CDNs remains a noted limitation. The work will be of broad interest to cancer geneticists and evolutionary biologists.

      The key criticism, “the requirement for larger datasets to fully identify all CDNs remains a noted limitation”, is also found in both reviews. We have clarified the issue in the main text; the relevant parts are copied below. The response below also addresses many comments in the reviews. In addition, the Discussion of eLife-RP-RA-2024-99341 has been substantially expanded to answer the questions of Reviewer 2.

      We shall answer the boldface comment in three ways. First, it can be answered using GENIE data. Fig. 7 of the main text (eLife-RP-RA-2024-99340) shows that, when n increases from ~1,000 to ~9,000, the number of discovered CDNs increases 3- to 5-fold, most of which come from the two-hit class. Hence, the power of discovering more CDNs with larger datasets is evident. By extrapolation, a sample size of 100,000 should be able to yield 90% of all CDNs, as calculated here. (Fig. 7 also addresses the queries of whether we have used datasets other than TCGA. We have indeed used all public data, including GENIE and COSMIC.)

      Second, the power of discovering more cancer driver genes by our theory is evident even without using larger datasets. Table 3 of the companion study (eLife-RP-RA-2024-99341) shows that, averaged across cancer types, the conventional method would identify 45 CDGs while the CDN method tallies 258 CDGs. The power of the CDN method is demonstrated. This is because the conventional approach has to identify CDGs (cancer driver genes) in order to identify the CDNs they carry. However, many CDNs occur in non-CDGs and are thus missed by the conventional approach. In Supplementary File S2, we have included a full list of CDNs discovered in our study, along with population allele frequency annotations from gnomAD. The distribution patterns of these CDNs across different cancer types show their pan-cancer properties as further explored in the companion paper.

      Third, while many, or even most CDNs occur in non-CDGs and are thus missed, the conventional approach also includes non-CDN mutations in CDGs. This is illustrated in Fig. 5 of the companion study (eLife-RP-RA-2024-99341) that shows the adverse effect of misidentifications of CDNs by the conventional approach. In that analysis, the gene-targeting therapy is effective if the patient has the CDN mutations on EGFR, but the effect is reversed if the EGFR mutations are non-CDN mutations.

      Reviewer #1 (Public Review):

      The authors developed a rigorous methodology for identifying all Cancer Driving Nucleotides (CDNs) by leveraging the concept of massively repeated evolution in cancer. By focusing on mutations that recur frequently in pan-cancer, they aimed to differentiate between true driver mutations and neutral mutations, ultimately enhancing the understanding of the mutational landscape that drives tumorigenesis. Their goal was to call a comprehensive catalogue of CDNs to inform more effective targeted therapies and address issues such as drug resistance.

      Strengths

      (1) The authors introduced a concept of using massively repeated evolution to identify CDNs. This approach recognizes that advantageous mutations recur frequently (at least 3 times) across cancer patients, providing a lens to identify true cancer drivers.

      (2) The theory showed the feasibility of identifying almost all CDNs if the number of sequenced patients increases to 100,000 for each cancer type.
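      To illustrate why recurrence is informative, the sketch below computes the chance that a single site is mutated in at least three patients under a deliberately simplified neutral model. This is an assumption-laden illustration, not the authors' method: it assumes one uniform per-site, per-patient mutation probability `u` and independent patients, and the value of `u` used here is a hypothetical placeholder. The actual analysis also has to account for variation in mutation rate among sites.

```python
import math


def prob_at_least(hits, n_patients, u):
    """P(K >= hits) for K ~ Binomial(n_patients, u).

    u is the assumed neutral probability that a given site is mutated
    in one patient (hypothetical, uniform across sites).
    """
    below = sum(
        math.comb(n_patients, k) * u ** k * (1.0 - u) ** (n_patients - k)
        for k in range(hits)
    )
    return 1.0 - below


# Hypothetical neutral rate; compare two cohort sizes.
u_neutral = 1e-6
p_small = prob_at_least(3, 1_000, u_neutral)
p_large = prob_at_least(3, 100_000, u_neutral)
```

      Under this simplified model the neutral probability of three or more hits rises with cohort size, which is one reason recurrence thresholds need to be interpreted relative to the number of patients sequenced.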

      Weaknesses

      (1) The methodology remains theoretical and no novel true driver mutations were identified in this study.

      We now address the weakness criticism, which is gratefully received.

      The second part of the criticism (no novel true driver mutations were identified in this study) has been answered in the long response to the eLife assessment above. The first part, “The methodology remains theoretical”, is somewhat unclear. It might simply be a lead-in to the second part. In case it is not, we interpret “theoretical” to mean “lacking experimental proof” and answer below.

      As Reviewer #1 noted, a common limitation of theoretical and statistical analyses of cancer drivers is the need to validate their selective advantage through in vitro or in vivo functional testing. This concern is echoed by both reviewers in the companion paper (eLife-RP-RA-2024-99341), prompting us to consider the methodology for functional testing of potential cancer drivers. An intuitive approach would involve introducing putative driver mutations into normal cells and observing phenotypic transformation in vitro and in vivo. In a recent stepwise-edited human melanoma model, Hodis et al. demonstrated that disease-relevant phenotypes depend on the “correct” combinations of multiple driver mutations (Hodis et al. 2022). Other high-throughput strategies can be broadly categorized into two approaches: (1) introducing candidate driver mutations into pre-malignant model systems that already harbor a canonical mutant driver (Drost and Clevers 2018; Grzeskowiak et al. 2018; Michels et al. 2020) and (2) introducing candidate driver mutations into growth factor-dependent cell models and assessing their impact on resulting fitness (Bailey et al. 2018; Ng et al. 2018). The underlying assumption of these strategies is that the fitness outcomes of candidate driver mutations are influenced by pre-existing driver mutations and the specific pathways or cancer hallmarks being investigated. This confines the functional test of potential cancer driver mutations to conventional cancer pathways. A comprehensive identification of CDNs is therefore crucial to overcome these limitations. In conjunction with other driver signal detection methods, our study aims to provide a more comprehensive profile of driver mutations, thereby enabling the functional testing of drivers involved in non-conventional cancer evolution pathways.

      (2) Different cancer types have unique mutational landscapes. The methodology, while robust, might face challenges in uniformly identifying CDNs across various cancers with distinct genetic and epigenetic contexts.

      We appreciate the comment. Indeed, different cancer types should have different genetic and epigenetic landscapes. In that case, one might have expected CDNs to be poorly shared among cancer types. However, as reported in Fig. 4 of the companion study, the sharing of CDNs across cancer types is far more common than the sharing of CDGs (Cancer Driving Genes). We suggest that CDNs offer much higher resolution than CDGs, in which the signals are diluted by non-driver mutations. In other words, although the mutational landscape may be cancer-type specific, the pan-cancer selective pressure may be sufficiently high to permit the detection of CDN sharing among cancer types.

      Below, we respond in greater detail. Epigenetic factors, such as chromatin states, methylation/acetylation levels, and replication timing, can provide valuable insights when analyzing mutational landscapes at a regional scale (Stamatoyannopoulos et al. 2009; Lawrence et al. 2013; Makova and Hardison 2015; Baylin and Jones 2016; Alexandrov et al. 2020; Abascal et al. 2021; Sherman et al. 2022). However, at the site-specific level, the effectiveness of these covariates in predicting mutational landscapes depends on their integration into a detailed model. Overemphasizing these covariates could lead to false negatives for known driver mutations (Hess et al. 2019; Elliott and Larsson 2021). In Figure 3B of the main text, we illustrate the discrepancy between the mutation rate predictions from Dig and empirical observation. Ideally, no covariates would be needed with sufficiently large sample sizes: each mutable genomic site would then have enough mutations to reach statistical significance and, consequently, synonymous mutations would suffice to characterize the mutational landscape. In this sense, the integration of mutational covariates represents a compromise under the current sample size. In our study, the effect of unique mutational landscapes is captured by E(u), the mean mutation rate for each cancer type. We further accounted for the variability of site-level mutability using a gamma distribution. The primary goal of our study is to determine the upper limit of mutation recurrences under mutational mechanisms only. While selection acts blindly with respect to genomic features, mutational hotspots should exhibit common characteristics determined by their underlying mechanisms. In the main text, we attempted to identify such shared features among CDNs. Until these mutational mechanisms are fully understood, CDNs should be considered as potential driver mutations.
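To make the idea of an upper limit on recurrence under mutation alone concrete, here is a minimal sketch. It assumes, purely for illustration, that the number of hits at a site across patients is Poisson with a gamma-distributed site rate, so that the marginal count is negative binomial. The function names, the shape parameter, and every numeric value are hypothetical and are not taken from the manuscript.

```python
import math


def neg_binom_pmf(x, mean, shape):
    """P(X = x) for a Poisson count whose rate is gamma-distributed.

    `mean` is the expected number of hits per site and `shape` is the
    gamma shape parameter (smaller shape = more rate variation among sites).
    """
    log_p = (
        math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
        + shape * math.log(shape / (shape + mean))
        + x * math.log(mean / (shape + mean))
    )
    return math.exp(log_p)


def expected_sites_at_or_above(i, n_sites, n_patients, mean_rate, shape):
    """Expected number of sites hit at least i times under mutation alone."""
    mean_hits = n_patients * mean_rate
    # Sum the upper tail directly; 1 - (lower tail) loses precision
    # when the tail probability is very small.
    tail = sum(neg_binom_pmf(x, mean_hits, shape) for x in range(i, i + 500))
    return n_sites * tail


if __name__ == "__main__":
    # Hypothetical inputs: 1e6 sites, 1000 patients, E(u) = 1e-6 per site per patient.
    for i in (1, 2, 3, 4):
        print(i, expected_sites_at_or_above(i, 1e6, 1000, 1e-6, shape=1.0))
```

Under assumptions of this kind, the expected number of neutral sites falls steeply as the recurrence threshold i rises, which is the intuition behind treating high-recurrence sites as candidates for selection.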

      (3) L223, the statement "In other words, the sequences surrounding the high-recurrence sites appear rather random.". Since it was a pan-cancer analysis, the unique patterns of each cancer type could be strongly diluted in the pan-cancer data.

      We now state that the analyses of mutation characteristics were also applied to the individual cancer types and found no pattern that deviates from randomness. Nevertheless, it may be argued that, with the exception of those with sufficiently large sample sizes such as lung and breast cancers, most datasets do not have the power to reject the null hypothesis. To alleviate this concern, we applied the ResNet and LSTM/GRU methods to search for potential mutation motifs within each cancer type. All of these methods are more powerful than the one originally used, but the results are the same: no cancer type yields a mutation pattern that can reject the null hypothesis of randomness (see below).

      As a positive control, we used these methods to discover the splice sites of human exons. When the sequences are aligned with the splice site at the center (position 51 in the following plot), the sequence motif looks like:

      Author response image 1.

      5-prime

      Author response image 2.

      3-prime

      However, to account for the potential influence of distance from the mutant site in motif analysis, we randomly shuffled the splice sites within a specified window around the alignment center, and their sequence logo now looks like:

      Author response image 3.

      5-prime shuffled

      Author response image 4.

      3-prime shuffled

      Author response image 5.

      random sequences from coding regions

      The classification results for the shuffled 5-prime (donor), 3-prime (acceptor) and random sequences from coding regions (Random CDS) are presented in Author response table 1. (The accuracy for the aligned sequences, approximately 99%, is not shown here.)

      Author response table 1.

      With these positive controls (splice site motifs) validating our methodology, we applied the same model structure to train and test for potential mutational motifs at CDN sites. All models achieved approximately 50% accuracy in CDN motif analysis, suggesting that the sequence contexts surrounding CDN sites are not significantly different from other coding regions of the genome. This further implies that the recurrence of mutations at CDN sites is more likely driven by selection than by mutational mechanisms.

      Note that this preliminary analysis may be limited by insufficient training data for CDN sites. Future studies will require larger sample sizes and more sophisticated models to address these limitations.
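The shuffled positive control described above can be pictured with a small sketch. It assumes that "shuffling" means offsetting the window center by a random number of bases relative to the true site, so that a classifier cannot rely on the motif sitting at a fixed position. The function name, window sizes, and toy sequence are hypothetical and only illustrate the idea.

```python
import random


def jittered_window(sequence, site_index, half_width, max_shift, rng):
    """Cut a window of 2*half_width + 1 bases around a randomly shifted center.

    Returns the window and the shift that was applied, so the true site
    sits at position (half_width - shift) inside the window.
    """
    shift = rng.randint(-max_shift, max_shift)
    center = site_index + shift
    start, end = center - half_width, center + half_width + 1
    if start < 0 or end > len(sequence):
        raise ValueError("window runs past the end of the sequence")
    return sequence[start:end], shift


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sequence with a 'GT' donor-like dinucleotide starting at index 100.
    toy = "A" * 100 + "GT" + "C" * 100
    aligned, _ = jittered_window(toy, 100, 50, 0, rng)       # site fixed at the center
    shuffled, shift = jittered_window(toy, 100, 50, 10, rng)  # site moved off-center
    print(len(aligned), aligned[50], shift, shuffled[50 - shift])
```

For a balanced two-class task, accuracy near 50% is what a classifier achieves by guessing, which is why the roughly 50% result for CDN sites is read as the absence of a detectable sequence motif.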

      (4) To solidify the findings, the results need to be replicated in an independent dataset.

      Figure 7 validates our CDN findings using the GENIE dataset, which primarily consists of targeted sequencing data from various panels. By focusing on the same genomic regions sequenced by GENIE, we observed a 3-5 fold increase in the number of discovered CDNs as sample size increased from approximately 1000 to 9000. Moreover, the majority of CDNs identified in TCGA were confirmed as CDNs in GENIE.

      (5) The key scripts and the list of key results (i.e., CDN sites with i ≥ 3) need to be shared to enable replication, validation, and further research. So far, only CDN sites with i ≥ 20 have been shared.

      We have now updated the “Data Availability” section in the main text. The corresponding scripts for key results are available on GitLab at: https://gitlab.com/ultramicroevo/cdn_v1.

      (6) The versions of data used in this study are not clearly detailed, such as the specific version of gnomAD and the version and date of TCGA data downloaded from the GDC Data Portal.

      The versions of data sources have now been updated in the revised manuscript.

      Recommendations For The Authors:

      (1) L119, states "22.7 million nonsynonymous sites," but Table 1 lists the number as 22,540,623 (22.5 million). This discrepancy needs to be addressed for consistency.

      (2) Figure 2B, there is an unexplained drop in the line at i = 6 and 7 (from 83 to 45). Clarification is needed on why this drop occurs.

      (3) Figure 3A, for the CNS type, data for recurrence at 8 and 9 are missing. An explanation should be provided for this absence.

      (4) L201, the title refers to "100-mers," but L218 mentions "101-mers." This inconsistency needs to be corrected to ensure clarity and accuracy.

      (5) Figures 6 and 7 currently lack titles. Titles should be added to these figures to improve readability.

      Thanks. All corrections have been incorporated into the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose that cancer-driver mutations can be identified by Cancer Driving Nucleotides (CDNs). CDNs are defined as SNVs that occur frequently in genes. There are many ways to define cancer driver mutations, and the strengths and weaknesses are the reliance on statistics to define them.

      Strengths:

      There are many well-known approaches and studies that have already identified many canonical driver mutations. A potential strength is that mutation frequencies may be able to identify as yet unrecognized driver mutations. They use a previously developed method to estimate mutation hotspots across the genome (Dig, Sherman et al 2022). This publication has already used cancer sequence data to infer driver mutations based on higher-than-expected mutation frequencies. The advance here is to further illustrate that recurrent mutations (estimated at 3 or more mutations (CDNs) at the same base) are more likely to be the result of selection for a driver mutation (Figure 3). Further analysis indicates that mutation sequence context (Figure 4) or mutation mechanisms (Figure 5) are unlikely to be major causes for recurrent point mutations. Finally, they calculate (Figure 6) that most driver mutations identifiable by the CDN approach could be identified with about 100,000 to one million tumor coding genomes.

      Weaknesses:

      The manuscript does provide specific examples where recurrent mutations identify known driver mutations but do not identify "new" candidate driver mutations. Driver mutation validation is difficult and at least clinically, frequency (ie observed in multiple other cancer samples) is indeed commonly used to judge if an SNV has driver potential. The method would miss alternative ways to trigger driver alterations (translocations, indels, epigenetic, CNVs). Nevertheless, the value of the manuscript is its quantitative analysis of why mutation frequencies can identify cancer driver mutations.

      Recommendations For The Authors

      Whereas the analysis of driver mutations in WES has been extensive, the application of the method to WGS data (ie the noncoding regions) would provide new information.

      We appreciate Reviewer #2's suggestion to apply our method to noncoding regions. Currently, the background mutation model is based on site-level mutations in coding regions, which hinders its direct application to other mutation types such as CNVs, translocations and indels. We acknowledge that the proportion of patients with a driver event involving CNVs (73%) is comparable to that of coding point mutations (76%), as reported in the PCAWG analysis (Fig. 2A from Campbell et al., 2020). In future studies, we will attempt to establish a CNV-based background mutation rate model to identify positive selection signals driving tumorigenesis.

      References

      Abascal F, Harvey LMR, Mitchell E, Lawson ARJ, Lensing SV, Ellis P, Russell AJC, Alcantara RE, Baez-Ortega A, Wang Y, et al. 2021. Somatic mutation landscapes at single-molecule resolution. Nature:1–6.

      Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, et al. 2020. The repertoire of mutational signatures in human cancer. Nature 578:94–101.

      Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B, et al. 2018. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173:371-385.e18.

      Baylin SB, Jones PA. 2016. Epigenetic Determinants of Cancer. Cold Spring Harb Perspect Biol 8:a019505.

      Campbell PJ, Getz G, Korbel JO, Stuart JM, Jennings JL, Stein LD, Perry MD, Nahal-Bose HK, Ouellette BFF, Li CH, et al. 2020. Pan-cancer analysis of whole genomes. Nature 578:82–93.

      Drost J, Clevers H. 2018. Organoids in cancer research. Nat Rev Cancer 18:407–418.

      Elliott K, Larsson E. 2021. Non-coding driver mutations in human cancer. Nat Rev Cancer 21:500–509.

      Grzeskowiak CL, Kundu ST, Mo X, Ivanov AA, Zagorodna O, Lu H, Chapple RH, Tsang YH, Moreno D, Mosqueda M, et al. 2018. In vivo screening identifies GATAD2B as a metastasis driver in KRAS-driven lung cancer. Nat Commun 9:2732.

      Hess JM, Bernards A, Kim J, Miller M, Taylor-Weiner A, Haradhvala NJ, Lawrence MS, Getz G. 2019. Passenger Hotspot Mutations in Cancer. Cancer Cell 36:288-301.e14.

      Hodis E, Triglia ET, Kwon JYH, Biancalani T, Zakka LR, Parkar S, Hütter J-C, Buffoni L, Delorey TM, Phillips D, et al. 2022. Stepwise-edited, human melanoma models reveal mutations’ effect on tumor and microenvironment. Science 376:eabi8175.

      Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, Carter SL, Stewart C, Mermel CH, Roberts SA, et al. 2013. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499:214–218.

      Makova KD, Hardison RC. 2015. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 16:213–223.

      Michels BE, Mosa MH, Streibl BI, Zhan T, Menche C, Abou-El-Ardat K, Darvishi T, Członka E, Wagner S, Winter J, et al. 2020. Pooled In Vitro and In Vivo CRISPR-Cas9 Screening Identifies Tumor Suppressors in Human Colon Organoids. Cell Stem Cell 26:782-792.e7.

      Ng PK-S, Li J, Jeong KJ, Shao S, Chen H, Tsang YH, Sengupta S, Wang Z, Bhavana VH, Tran R, et al. 2018. Systematic Functional Annotation of Somatic Mutations in Cancer. Cancer Cell 33:450-462.e10.

      Sherman MA, Yaari AU, Priebe O, Dietlein F, Loh P-R, Berger B. 2022. Genome-wide mapping of somatic mutation rates uncovers drivers of cancer. Nat Biotechnol 40:1634–1643.

      Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov GV, Mirkin SM, Sunyaev SR. 2009. Human mutation rate associated with DNA replication timing. Nat Genet 41:393–395.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:  

      Wang et al. investigate sexual dimorphic changes in the transcriptome of aged humans. This study relies upon analysis of the Genotype-Tissue Expression dataset that includes 54 tissues from human donors. The authors investigate 17,000 transcriptomes from 35 tissues to investigate the effect of age and sex on transcriptomic variation, including the analysis of alternative splicing. Alternative splicing is becoming more appreciated as an influence in the aging process, but how it is affected by sexual dimorphism is still largely unclear. The authors investigated multiple tissues but ended up distilling brain tissue down to four separate regions: decision, hormone, memory, and movement. Building upon prior work, the authors used an analysis method called principal component-based signal-to-variation ratio (pcSVR) to quantify differences between sex or age by considering data dispersion. This method also considers differentially expressed genes and alternative splicing events. 

      Strengths:  

      (1) The authors investigate sexual dimorphism on gene expression and alternative splicing events with age in multiple tissues from a large publicly available data set that allows for reanalysis. 

      (2) Furthermore, the authors take into account the ethnic background of donors. Identification of aging-modulating genes could be useful for the reanalysis of prior data sets. 

      Weaknesses:  

      The models built off of the GTEx dataset should be tested in another data set (ex. Alzheimer's disease) where there are functional changes that can be correlated. Gene-length-dependent transcription decline, which occurs with age and disease, should also be investigated in this data set for potential sexual dimorphism. 

      We appreciate the reviewer’s constructive feedback and acknowledgment of the strengths of our study. The detailed results are included in the ‘Recommendations for the authors’ from the editorial office. Below we summarize our responses to this reviewer's concerns:

      (1) Independent Alzheimer’s disease (AD) datasets:

      We acknowledge the importance of validating our models beyond GTEx to assess how well they generalize from aging to Alzheimer’s disease. While GTEx provides valuable transcriptomic data across multiple tissues, it lacks direct functional assessments linked to disease states. We had already analyzed RNA-seq data from ROSMAP and GEO in Figure 4, focusing on sex-biased gene expression and splicing changes between aging and AD. The results showed a male-biased association with Alzheimer’s disease at AS resolution, indicating that AS changes during aging could contribute more to AD in males than in females. We have added a highlight of this analysis to the manuscript (Pages 6-7).

      (2) Sexual dimorphism in Gene-Length-Dependent Transcription Decline (GLTD) 

      We appreciate the reviewer’s suggestion to explore gene-length-dependent transcription decline (GLTD), which has been implicated in both aging and disease. As the reviewer suggested, our analysis revealed that GLTD exhibits sex-biased patterns in different tissues, aligning with recent literature on sex-dimorphic transcriptional aging. Our findings also revealed that longer genes with greater transcriptional decline are enriched in AD-related pathways. We have incorporated this new analysis in the ‘Recommendations for the authors’ in Author response image 5-6 and expanded the discussion of the biological relevance. 

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Wang et al analyze ~17,000 transcriptomes from 35 human tissues from the GTEx database and address transcriptomic variations due to age and sex. They identified both gene expression changes as well as alternative splicing events that differ among sexes. Using breakpoint analysis, the authors find sex dimorphic shifts begin with declining sex hormone levels with males being affected more than females. This is an important pan-tissue transcriptomic study exploring age and sex-dependent changes although not the first one. 

      Strengths:  

      (1) The authors use sophisticated modeling and statistics for differential, correlational, and predictive analysis. 

      (2) The authors consider important variables such as genetic background, ethnicity, sampling bias, sample sizes, detected genes, etc. 

      (3) This is likely the first study to evaluate alternative splicing changes with age and sex at a pan-tissue scale. 

      (4) Sex dimorphism with age is an important topic and is thoroughly analyzed in this study.

      Weaknesses:

      (1) The findings have not been independently validated in a separate cohort or through experiments. Only selective splicing factor regulation has been verified in other studies. 

      (2) It seems the authors have not considered PMI or manner of death as a variable in their analysis. 

      (3) The manuscript is very dense and sometimes difficult to follow due to many different types of analyses and correlations. 

      (4) Short-read data can detect and quantify alternative splicing events with only moderate confidence and therefore the generalizability of these findings remains to be experimentally validated. 

      We appreciate the thorough review and thoughtful feedback. We have addressed the reviewer’s concerns and added clarification. The detailed results are included in Recommendations for the authors. Here are the summaries.

      (1) Challenge of independent validation in separate cohorts

      • The GTEx dataset includes the most comprehensive transcriptome resource for studying population-level differences in age and sex across tissues, particularly including large-scale brain samples. This provides a unique opportunity to analyze sex-dimorphic aging and the relevance of age-associated diseases.  Several technical issues, including cell type heterogeneity, postmortem artifacts, as well as sequencing biases, lead to technical challenges in different cohorts.

      • As the reviewer mentioned, we analyzed transcriptomic data from Shen et al. (2024) and compared them with GTEx results (Author response image 2). Limited overlap in differentially expressed genes again highlighted the challenges in cross-dataset validation due to the differences in cell composition and data processing (peripheral blood mononuclear cells (PBMCs) vs whole blood). 

      • Because human brain transcriptome data covering different age and sex groups are limited, we turned to mouse hippocampus datasets generated by mass spectrometry (MS), which include young and old as well as female and male groups. The results validated the expression of splicing factors in the brain (Author response image 9). This cross-species consistency supports the robustness of our findings in human brain aging.

      (2) Effects of Postmortem Interval, Manner of Death, and Time of Death

      • We agree that sample collection could introduce confounding effects. To address this, we calculated the correlations of the estimated confounding factors with Postmortem Interval (PMI), Manner of Death (DTHMNNR), and Time of Death (DTHTIME and DTHSEASON). We observed strong correlations for some surrogate variables in most tissues, indicating that these factors could be effectively regressed out in our analysis (Recommendations for the authors, Figure S4 and R8). 

      • In addition, we re-evaluated our analyses while incorporating PMI as a covariate in our models. Our results align with our initial findings (Author response image 1), suggesting that age- and sex-dependent transcriptomic changes are not strongly confounded by PMI and confirming that our model has controlled PMI. These results are detailed in ‘Recommendations for the authors’ and included in Figure S4C-E with the description in text, Page 5. 

      (3) Readability of manuscript and flow of analyses

      • In summary, our study first examined global alternative splicing (AS) and gene expression (GE) across all tissues before focusing on specific regions for deeper insights. To improve clarity, we have made the following revisions:

      • Added clearer statements when transitioning between all-tissue and brain-specific analyses (Page 6-7).

      • Modified the subtitles of the Results section to distinguish all-tissue from brain analyses (Page 6).

      • These refinements could enhance the manuscript’s structure, making the flow of analysis and conclusions more intuitive for readers.

      (4) Limitations of short-read RNA-seq for splicing analysis

      • Short-read RNA-seq provides only moderate confidence in detecting and quantifying full-length isoforms. However, its higher sequencing depth makes it more suitable for quantifying changes in alternative splicing (AS) events.

      • Our analysis focused on splicing event-level quantification, applying stringent filters and using our GPU-based tool, which showed strong concordance with RT-PCR and other pipelines. Therefore, we also cited and included the updated Paean manuscript that benchmarks its performance in AS analysis.

      Reviewer #3 (Public review): 

      Summary:  

      In this study, Wang et al utilized the available GTEx data to compile a comprehensive analysis that attempt to reveal aging-related sex-dimorphic gene expression as well as alternative splicing changes in humans. 

      The key conclusions based on their analysis are that:

      (1) extensive sex-dimorphisms during aging with distinct patterns of change in gene expression and alternative splicing (AS), and 

      (2) the male-biased age-associated AS events have a stronger association with Alzheimer's disease, and

      (3) the female-biased events are often regulated by several sex-biased splicing factors that may be controlled by estrogen receptors.

      They further performed break-point analysis and revealed that in males there are two main breakpoints around ages 35 and 50, while in females, there is only one breakpoint at 45. 

      Strengths:  

      This study sets an ambitious goal, leveraging the extensive GTEx dataset to investigate aging-related, sex-dimorphic gene expression and alternative splicing changes in humans. The research addresses a significant question, as our understanding of sex-dimorphic gene expression in the context of human aging is still in its early stages. Advancing our knowledge of these molecular changes is vital for identifying therapeutic targets for age-related diseases and extending the human health span. The study is highly comprehensive, and the authors are commendable for their attempted thorough analysis of both gene expression and alternative splicing - an area often overlooked in similar studies. 

      We thank this reviewer for the insightful review and recognition of our study's significance. We agree with the reviewer on the value of examining sex-dimorphic gene expression and alternative splicing in aging using the GTEx dataset. This is indeed essential for developing potential therapeutic targets for age-related diseases and for extending human health span.

      Weaknesses:  

      Due to the inherent noise within the GTEx dataset - which includes numerous variables beyond aging and sex - there are significant technical concerns surrounding this study. Additionally, the lack of cross-validation with independent, existing data raises questions about whether the observed gene expression changes genuinely reflect those associated with human aging. For instance, the break-point analysis in this study identifies two major breakpoints in males around ages 35 and 50, and one breakpoint in females at age 45; however, these findings contradict a recent multi-omics longitudinal study involving 108 participants aged 25 to 75 years, where breakpoints at 44 and 60 years were observed in both males and females (Shen et al, 2024). These issues cast doubt on the robustness of the study's conclusions. Specific concerns are outlined below: 

      References: 

      Ferreira PG, Muñoz-Aguirre M, Reverter F, Sá Godinho CP, Sousa A, Amadoz A, Sodaei R, Hidalgo MR, Pervouchine D, Carbonell-Caballero J et al (2018) The effects of death and post-mortem cold ischemia on human tissue transcriptomes. Nature Communications 9: 490. 

      Shen X, Wang C, Zhou X, Zhou W, Hornburg D, Wu S, Snyder MP (2024) Nonlinear dynamics of multi-omics profiles during human aging. Nature Aging. 

      Wucher V, Sodaei R, Amador R, Irimia M, Guigó R (2023) Day-night and seasonal variation of human gene expression across tissues. PLOS Biology 21: e3001986. 

      (1) The primary method used in this study is linear regression, incorporating age, sex, and age-by-sex interactions as covariates, alongside other confounding factors (such as ethnicity) as unknown variables. However, the analysis overlooks two critical known variables in the GTEx dataset: time of death (TOD) and postmortem interval (PMI). Both TOD and PMI are recorded for each sample and account for substantial variance in gene expression profiles. A recent study by Wucher et al.(Wucher et al, 2023) demonstrated the powerful impact of TOD on gene expression by using it to reconstruct human circadian and even circannual datasets. Similarly, Ferreira et al. (Ferreira et al, 2018) highlighted PMI's influence on gene expression patterns. Without properly adjusting for these two variables, confidence in the study's conclusions remains limited at best. 

      We thank the reviewer for raising this important point regarding the impact of post-mortem interval (PMI) and time of death (TOD) on gene expression, where TOD includes both the season of death (DTHSEASON) and the time of day (DTHTIME). To address this point, we carefully evaluated whether our linear model controlled for these factors as potential confounders. 

      Our results showed that PMI and TOD significantly correlated with the estimated covariates in most tissues, suggesting that their effects could be effectively regressed out using our model (Figure S4).  As the reviewers and editors suggested, we have now included this correlation analysis in the updated Figure S4C-E and the text in the Results section, citing relevant literature [1,2] (Page 5). 

      Author response image 1.

      The results of differential gene expression analysis with vs without the inclusion of PMI correction as a known covariate. The scatter plots show the correlations of significance levels (p-values, left panel) and effect sizes (coefficients, right panel) of sex (A) and age (B). Whole-blood tissue is used as an example.

       

      In addition, we repeated the differential analysis with PMI included as a covariate in the regression models and re-evaluated the age- and sex-related transcriptomic changes. Using whole-blood gene expression as an example, our revised analysis shows that including PMI among the covariates has minimal impact on the significance levels and effect sizes of sex and age (i.e., p-values and coefficients, respectively), indicating that our findings are robust to this confounding factor (Author response image 1). 
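The comparison of models with and without PMI can be illustrated with a minimal sketch on synthetic data. It is not the authors' pipeline: the data are simulated, the helper names `ols` and `simulate_and_fit` are hypothetical, and the real models also include surrogate variables and other covariates. The sketch only shows that when a covariate is roughly independent of age and sex, adding it changes their fitted coefficients very little.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (normal equations)."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def simulate_and_fit(seed=0, n=500):
    """Simulate one gene and fit it with and without PMI as a covariate."""
    rng = random.Random(seed)
    age = [rng.uniform(20, 70) for _ in range(n)]
    sex = [float(rng.randint(0, 1)) for _ in range(n)]
    pmi = [rng.gauss(10, 3) for _ in range(n)]  # drawn independently of age and sex
    # Hypothetical true effects: age 0.02, sex 0.5, PMI 0.1, plus noise.
    expr = [
        0.02 * a + 0.5 * s + 0.1 * p + rng.gauss(0, 0.5)
        for a, s, p in zip(age, sex, pmi)
    ]
    without_pmi = ols([age, sex], expr)    # [intercept, age, sex]
    with_pmi = ols([age, sex, pmi], expr)  # [intercept, age, sex, pmi]
    return without_pmi, with_pmi


if __name__ == "__main__":
    without_pmi, with_pmi = simulate_and_fit()
    print("age coefficient:", without_pmi[1], with_pmi[1])
    print("sex coefficient:", without_pmi[2], with_pmi[2])
```

If PMI were strongly correlated with age or sex in the real data, the two sets of coefficients could diverge, which is why the scatter-plot comparison in Author response image 1 is informative.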

      (2) To demonstrate that their analysis is robust and that the covariates TOD and PMI are otherwise negligible - the authors should cross-validate their findings with independent datasets to confirm that the identified gene expression changes are reproducible for some tissues. For instance, the recent study by Shen et al. (Shen et al., 2024) in Nature Aging offers an excellent dataset for cross-validation, particularly for blood samples. Comparing the GTEx-derived results with this longitudinal transcriptome dataset would enable verification of gene expression changes at both the individual gene and pathway levels. Without such validation, confidence in the study's conclusions remains limited. 

      We thank the reviewer for the insightful suggestion regarding cross-validation with independent datasets. We understand that validating findings across datasets is crucial for ensuring robustness. As the reviewer suggested, we examined whether findings from the GTEx data are shared with the study by Shen et al. (2024) in Nature Aging. However, after comparing their results with our GTEx results in whole blood tissue, we found that the overlap of differentially expressed genes is limited (Fig. 3). We found a large proportion of age-associated genes in the GTEx data, whereas just 54 genes are age-associated in Shen et al.’s PBMC data. 3 in 7 genes are differentially expressed in both datasets (Fig. 3A). Additionally, we performed the functional enrichment analysis on the GTEx-specific age-associated genes.

      We observed a strong enrichment in the biological pathways related to neutrophil functions and innate immune responses, which are specific to the cell compositions in whole blood rather than PBMC (Fig. 3B).

      Author response image 2.

      The comparison between the gene expression of whole blood tissue from GTEx and PBMCs from Shen et al. (A) The bar plot shows the number of age (left panel) or sex-associated  (right panel) genes in the two datasets. The grey bars highlight the proportion of overlapped genes in both datasets. (B) The top 10 significantly enriched biological processes in the GTEx-specific age-associated genes. The color bar shows the number of age-associated genes in specific pathways.

      These discrepancies highlighted the crucial factors in cross-dataset comparison:

      • Cell compositions: GTEx used whole blood, which contains all blood components, including neutrophils and erythrocytes, whereas PBMCs contain lymphocytes and monocytes. Under the influence of granulocytes and red blood cells in whole blood, the gene expression profiles between these two datasets are different.

      • Biological functions: Whole blood includes both innate and adaptive immune components; thus, aging-related gene expression changes in whole blood may include a broader systemic response than those in PBMCs. This difference in biological context contributes to the observed variation in the differentially expressed genes, as demonstrated by our functional enrichment analysis (Fig. 3B). 

      • Sequencing biases and data processing: The two datasets were generated using different RNAseq processing pipelines, including distinct normalization, batch correction, and quantification methodologies. These technical differences may introduce systematic variations that complicate direct cross-validation.

      Due to these fundamental problems, a direct one-to-one validation between the two datasets is challenging. We understand the importance of independent dataset validation and appreciate the reviewer’s suggestion. Future studies could perform this validation more precisely if comparable whole-blood-based datasets become available. In addition, GTEx provides nearly a thousand whole blood samples, making it a large-scale, comprehensive, and clinically relevant dataset for studying aging-related changes, particularly in innate immunity and inflammation, which are not well captured in PBMCs.

      (3) As a demonstration of the lack of such validation, in the Shen et al. study (Shen et al., 2024), breakpoints at 44 and 60 years were observed in both males and females, while this study identifies two major breakpoints in males around ages 35 and 50, and one breakpoint in females at age 45. What caused this discrepancy? 

      We thank the reviewer and the editors for both raising the non-linear multi-omic aging patterns observed by Shen et al. They observed two prominent crests around the ages of 45 and 60 in omics data.

      Similarly, we also identified two breakpoints in our analysis, with some differences in specific age breakpoints. These could be the result of sample preparation methods and breakpoint definition. These responses are also included in the editor’s recommendations.

      Definition of breakpoints vs crests:

      • Crests represent age-related molecular changes at each time point across the human lifespan. They indicate the number of molecules that are differentially expressed during aging (q < 0.05), without considering individual expression levels.

      • Our breakpoints, in contrast, are identified after filtering the chronological trends using the Autoregressive Integrated Moving Average (ARIMA) model. We calculated the rate of change at each age point using a smoothing approach and sliding windows. Breakpoints are defined as local maxima of this rate, selected by their distance to the nearest minimum relative to the global maximum. We did find some wide local peaks around age 60 in some tissues (Figure S10); however, we excluded these because of the strict cutoffs we applied to remove noise.
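
To make the breakpoint definition more concrete, here is a rough sketch of one way such a rule could work. It is not the authors' pipeline: a plain moving average stands in for the ARIMA filtering, "distance to the nearest minimum relative to the global maximum" is read as a prominence-style drop divided by the global maximum, and the threshold `min_rel_drop`, the function names, and the toy rates are all hypothetical.

```python
def moving_average(values, window):
    """Centered sliding-window mean; windows shrink at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def rate_of_change(values):
    """Absolute first difference between consecutive age points."""
    return [abs(b - a) for a, b in zip(values, values[1:])]

def breakpoints(ages, rates, min_rel_drop=0.5):
    """Return ages at local maxima whose drop to the neighbouring minima,
    divided by the global maximum, is at least min_rel_drop."""
    gmax = max(rates)
    picked = []
    for i in range(1, len(rates) - 1):
        if not (rates[i] > rates[i - 1] and rates[i] >= rates[i + 1]):
            continue  # not a local maximum
        left = i
        while left > 0 and rates[left - 1] < rates[left]:
            left -= 1  # walk downhill to the left minimum
        right = i
        while right < len(rates) - 1 and rates[right + 1] < rates[right]:
            right += 1  # walk downhill to the right minimum
        base = max(rates[left], rates[right])  # higher of the two minima
        if (rates[i] - base) / gmax >= min_rel_drop:
            picked.append(ages[i])
    return picked

# Hypothetical rates of change at ages 30, 35, ..., 70.
ages = list(range(30, 75, 5))
rates = [0.1, 0.2, 1.0, 0.2, 0.1, 0.15, 0.1, 0.8, 0.1]
print(breakpoints(ages, rates))  # the small bump at age 55 is filtered out
```

With a strict threshold, shallow or wide peaks are dropped, which mirrors how a broad peak near one age can be present in the curve yet not reported as a breakpoint.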

      Differences and similarities between sequenced tissues: 

      • Whole-blood vs PBMC: In the GTEx RNA-seq data used in our study, whole blood samples from donors were sequenced, whereas their study used PBMCs. Whole blood contains all blood components, including red blood cells, platelets, granulocytes (e.g., neutrophils), lymphocytes, and monocytes, while PBMCs represent a subset of white blood cells, primarily consisting of lymphocytes (T cells, B cells, NK cells) and monocytes, excluding granulocytes and erythrocytes. As we mentioned in the previous responses, the gene expression changes observed in whole blood capture the contributions of neutrophils and other granulocytes, which are neglected in the PBMC profile (also shown in Figure S11C). 

      • For the tissue shared between the two studies, skin, we looked at the non-linear changes during aging and found the same two breakpoints: 43 and 58. 

      Novelties in our study:

      • Whole blood can serve as a readily accessible resource for testing age-related disease biomarkers without cell separation, making it more practical for clinical applications.

      • Our analysis was performed on females and males separately. The main objective of our analysis is to compare the differences in aging rates between sexes. Our results reveal clear sex-specific differences across multiple human tissues. Therefore, the identified breakpoints may differ when sex effects are not taken into account, highlighting the specificity of our analysis. 

      • Additionally, our breakpoints are integrated across multiple tissues. Our results showed that there is a large diversity of aging patterns in different tissues.

      As the reviewers and editors suggested, we have added the following statements to clarify this distinction in the Discussion section: ‘Our analysis observed the non-linear aging patterns with two breakpoints, which is consistent with recent findings, with differences in specific age points due to sex differences as well as tissue diversities 3.’ (Page 14), and ‘These breakpoints could represent key junctures in the aging process that align with the non-linear patterns of aging and disease progression.’ (Page 15)

      (4) Although the alternative splicing analysis is intriguing, the authors did not differentiate between splicing events that alter the protein-coding sequence and those that do not. Many splicing changes occurring in the 5' UTR and 3' UTR regions do not impact protein coding, so it is essential to filter these out and focus specifically on alternative splicing events that can modify protein-coding sequences. 

      The reviewer raises an important point. In our study, we included the AS events in protein-coding genes to gain a comprehensive understanding of sex-biased age-associated splicing. As the reviewer suggested, focusing on coding-sequence-altering events is particularly relevant to protein function. To address this, we performed an additional analysis to specifically annotate sBASEs occurring within the coding sequence (defined as CDS-altering sBASEs) and reanalyzed their functional pathways and AD associations (Author response image 3).

      Our analysis revealed that most of the sBASEs are relevant to protein-coding sequences (CDS) across multiple tissues (Author response image 3A). We then confirmed our findings using CDS-altering sBASEs. We found that those sBASEs in brain regions were significantly enriched in pathways related to amyloid-beta formation and actin filament organization (Author response image 3B). Notably, male-biased sBASEs in decision-related brain regions were particularly associated with dendrite development and regulation of cell morphogenesis, highlighting the sex-specific roles of sBASEs in brain functions. Additionally, we performed a random forest classification using only CDS-altering sBASEs in AD datasets (Author response image 3C-D), again confirming the male-biased association between aging and AD.

      Overall, we found that most of the identified sBASEs could modify protein-coding sequences, and our main conclusions remain consistent even after filtering out non-coding events. 

      Nevertheless, in addition to AS events that impact protein sequences, alternative splicing in untranslated regions (UTRs) also plays a critical regulatory role. Splicing events in the 5′ UTR can influence translation efficiency by modifying upstream open reading frames (uORFs) or RNA secondary structures, while splicing in the 3′UTR can affect mRNA stability, localization, and translation by altering microRNA binding sites and RNA-binding protein interactions. Given these functional implications, we believe that UTR-targeted AS events should also be considered to supplement the understanding of post-transcriptional gene regulation in future research.

      Author response image 3.

      The distribution and functional relevance of sBASEs with coding effects. (A) The number of sBASEs and CDS-altering sBASEs across multiple tissues. The deeper bars show the number of sBASEs whose alternative splice sites are located in protein-coding regions. (B) GO biological pathways in each sex and brain region. Heatmap shows the sex-specific pathways that are significantly enriched by CDS-altering sBASEs in more than 2 brain regions and sex. (C) Correlation between AD-associated and age-associated AS changes across the CDS-altering sBASEs in females and males. (D) Performances of sex-stratified models predicted by CDS-altering sBASEs in 100 iterations using the random forest approach.

      (5) One of the study's main conclusions - that "male-biased age-associated AS events have a stronger association with Alzheimer's disease" - is not supported by the data presented in Figure 4A, which shows an association with "regulation of amyloid precursor formation" only in female, not male, alternative splicing genes. Additionally, the gene ontology term "Alzheimer's disease" is absent from the unbiased GO analysis in Figure S6. These discrepancies suggest that the focus on Alzheimer's disease may reflect selective data interpretation rather than results driven by an unbiased analysis. 

      We thank the reviewer for this point. In our functional analysis, we identified distinct biological processes enriched in female- and male-biased AS genes, such as the regulation of amyloid precursor formation in females and structural constituents of the cytoskeleton in males. However, Alzheimer’s disease (AD) is a complex neurodegenerative disorder with multiple pathological mechanisms beyond amyloid-beta (Aβ) formation, many of which are strongly age-related in both sexes. This complexity motivates us to explore novel relationships between splicing and AD in distinct sexes.

      Although Figure 4A shows the enrichment of “regulation of amyloid precursor formation” in female-biased AS events, this does not contradict the broader enrichment of AD-related processes in male-biased AS events. Our disease ontology analysis supports this finding, as male-biased age-associated AS events are enriched in neurodegenerative diseases, including cognitive disorders. Additionally, we considered not only individual GO terms but also the disease-associated transcriptomic signatures from AD-related datasets, which collectively indicate a stronger association in males. 

      Regarding Figure S6 mentioned by the reviewer, the GO term “Alzheimer’s disease” is not explicitly listed in the heatmap because we filtered the pathways that are consistently enriched in multiple tissues. As noted in the figure legend, we only displayed sex-specific GO terms that were significant in at least 15 tissues. Then, since the brain is highly affected by age-related processes and neurological conditions show sex differences, the sex-biased AS events could help explain differential susceptibility to age-related cognitive decline and neurodegeneration. That’s why we chose the brain data for detailed analysis.

      To improve clarity, we have revised the text to describe the purpose of our analysis in brain rather than other tissues (Page 6-7). We appreciate the reviewer’s feedback, and we will consider additional analyses to further explore the sex-biased AS as well as disease risk in other tissues.

      (6) The experimental data presented in Figures 5E - I merely demonstrate that estrogen receptor regulates the expression of two splicing factors, SRSF1 and SRSF7, in an estradiol-dependent manner. However, this finding does not support the notion that this regulation actually contributes to sex-dimorphic alternative splicing changes during human aging. Notably, the authors do not provide evidence that SRSF1 and SRSF7 expression changes actually occur in a sex-dependent manner with human aging (in a manner similar to TIA1). As such, this experimental dataset is disconnected from the main focus of the study and does not substantiate the conclusions on sex-dimorphic splicing during human aging. The authors performed RNAseq in wild-type and ER mutant cells, and they should perform a comprehensive analysis of ER-dependent alternative splicing and compare the results with the GTEx data. It should be straightforward. 

      Thanks for the reviewer’s feedback. The main purpose of the analyses in Figures 5E-I was to explore which factors affect the sex-biased expression of splicing factors during aging and substantially regulate alternative splicing (AS). To address the reviewer’s concerns, we have included additional analysis and explained the challenge of linking estrogen receptor (ER)-regulated splicing factors to sex-dimorphic AS changes during human aging in specific human cell types. 

      • As suggested by the reviewer, we first examined the expression changes of SRSF1 and SRSF7 during aging in males and females, like TIA1 in decision-related brain regions (Fig. 5I).

      • Secondly, the regulation is based on a highly complex regulatory network involving multiple splicing factors and cell heterogeneity. Due to these complexities, we did not directly overlap ER-dependent AS changes with sBASEs from the GTEx datasets. To address the reviewer’s concern, we supplemented the AS analysis in the GSE89888 dataset (Fig. 5H) and identified the estrogen-regulated AS events mediated by ESR1. We found that ~6% (26/396) of female-specific age-associated AS events were regulated by ESR1, of which 6 sBASEs can be regulated by female-biased splicing factors. The low overlap could be explained by the limited coverage of the different RNA-seq datasets and cell types used across these analyses. Notably, the results indicated that only a fraction of AS could be directly accounted for by estrogen via ESR1, suggesting the complexity of transcriptional and splicing regulatory networks during aging. 

      • Meanwhile, we downloaded independent experimental datasets to examine regulation by our candidate splicing factors. Because SRSF1 was identified as a potential regulator of sex-biased splicing, we analyzed RNA-seq data from SRSF1 knock-down (KD) glioblastoma cell lines (U87MG and U251), a type of brain cancer formed from astrocytes that support nerve cells 4. Using these brain cell lines, we indeed found that some age-associated sBASEs are regulated by SRSF1 (Author response image 4). Together, these results suggest that some of the SF-RNA regulatory relationships can be observed in another cellular system, further supporting our findings. 

      Due to the limitations of cell-based models and the complexity in the splicing regulatory network, it is challenging to directly validate aging regulation, particularly between different sexes, based on ER treatments in vivo. However, our findings still provide valuable mechanistic insights into ER-regulated splicing factors, implying their potential role in sex-biased aging.

      Author response image 4.

      SRSF1 regulations on specific sBASEs using SRSF1 knock-down RNA-seq data in GBM cells. Three examples are shown to be regulated during aging with significant changes between SRSF1 KD vs control in U251 and U87MG cell lines. The splicing diagrams are shown below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      The authors found that alternative splicing was affected by both sex and age across many tissues, with gene expression differences affected by both parameters only present in some tissues. This trend was consistent when the effects of sex chromosomes were subtracted from the analysis. The effect of aging on differential gene expression and alternative splicing was more prevalent in male than female samples. For analysis purposes, young subjects were deemed to be anyone under 40, and old subjects were over 60 years old. The authors then investigated if specific genes or alternative splicing events were responsible for these effects. Some candidate genes or splicing events were identified but there was little overlap between tissues, suggesting no universal gene or event as a driver of aging. Surrogate variables like the ethnic backgrounds of donors were also investigated. Ultimately the authors found that alternative splicing events showed a stronger sexual dimorphic effect with age than did differential gene expression and that at least for the brain, alternative splicing changes showed a bias for Alzheimer's disease in male samples. This was highlighted by examples of exon skipping in SCL43A2 and FAM107A in males that were associated respectively with plaques and tangles. 

      The authors go on to identify sexual dimorphic differences in splicing factors in particular brain regions during age. Finally, the authors performed analysis for aging-modulated genes, identifying nearly 1000 across the tissues, nearly 70% of which are sex-specific. Their work suggests that further analysis of these aging-modulated genes could be differentially modulating the transcriptome based on sex. The work is novel and interesting, especially investigating sexual dimorphism in alternative splicing. However, the work is still preliminary, and these assumptions need to be applied to other data sets beyond GTEx for validation as well as some other phenomena that need to be considered. I recommend major revisions to address the points below. 

      (1) At the beginning of the results section, the authors state that the brain is stratified into four functional regions. It would be useful to explicitly state those four regions in the text at that point. 

      We agree that specifying these regions early in the text will improve clarity and provide the reader with a clear understanding of the analysis. Following the reviewer’s suggestion, we revised the Results section (Page 3) to explicitly state the four functional brain regions as follows: ‘Due to data sparseness, the brain tissues were recombined into four functional regions (table S1), including hormone- or emotion-related region, movement-related region, memory-related region, and decision-related region (See Methods).’. This ensures that the regions are clearly defined before the subsequent analysis is presented. 

      (2) The manuscript becomes a bit confusing when the authors shift from all the tissues as a whole specifically to the brain and then back to the larger tissue set to make assumptions. This can be a bit confusing and should be better delineated.

      We thank the reviewer and editor for the feedback regarding the transitions between the analysis of all tissues and the brain-specific analysis. In our study, we first conducted a broad analysis of alternative splicing (AS) and gene expression (GE) across all tissues. For the AS analyses, we did sBASEs analysis in all tissues and then focused on specific tissue (i.e., brain) whose splicing changes are functionally enriched with age-related diseases.  For the GE analyses, we also analyzed the aging rate across tissues and identified the tissue-specific/shared patterns. 

      We agree that the shifts of the tissues for AS and GE may cause some confusion, and have made the following revisions to delineate why we focused on different tissues for distinct analyses:

      • We have added clear statements to better delineate when we shift focus from the analysis of all tissues to the region-specific analysis and vice versa. For instance, in the Results section (Page 6-7), we include a transitional phrase: ‘Having established patterns across all tissues, we now turn to a more focused analysis to investigate tissue-specific alternative splicing changes.’

      • To improve the overall structure, we have reorganized the Results section, adding distinct subheadings for the analysis of all tissues and the brain (Page 6), which should make the transition between these sections smoother and more intuitive for the reader.

      We believe that these revisions will make the manuscript’s structure clearer and allow the reader to better follow the flow of the analysis and the subsequent conclusions.

      (3) Gene-length-dependent transcription decline (GLTD) is another phenomenon that occurs with aging and is known to be associated with Alzheimer's disease [PMID38519330]. The authors should make some statement if this is present in their dataset and if any sexual dimorphism in tissues is present. 

      We thank the editors and reviewers for bringing up the possible connection of gene-length-dependent transcription decline (GLTD), which was reported to be associated with both aging and Alzheimer’s disease (AD). We appreciate the reviewer’s suggestion and have addressed whether GLTD is present in our dataset and whether any sex differences are observed in this context.

      We evaluated GLTD using the correlation between gene length and age-associated changes (i.e., the coefficients of the ‘age’ term in the linear regression model) in GTEx data. We did observe strong evidence of GLTD, particularly in the brain, heart, muscle, pancreas, spleen, and skin, among others (Author response image 5A). In brain, we performed the functional enrichment analysis on the genes with fold change > 2 and length > 10^5 bp (Author response image 5B). We found that these extremely long genes are significantly relevant to synapse and neuron functions. These findings align with previous studies showing that GLTD can occur with aging in the tissues that are relevant to Alzheimer’s disease, cardiovascular diseases, and common failures of metabolism (e.g., diabetes) [5,6]. However, GLTD was not a ubiquitous phenomenon across all tissues: the correlations could be positive in tissues like adipose and artery. These findings suggest that GLTD can vary and is tissue-specific in its manifestation during aging. 

      Author response image 5.

      (A) The correlation between gene length and age-associated changes across GTEx tissues in human samples. The correlation tests are evaluated using Spearman’s approach. The color bar indicates the -log10 transformed p-values in the correlation test. (B) The results of GO enrichment analysis using the genes with fold change > 2 and length > 10^5 bp. The parent terms calculated by ‘rrvgo’ with a similarity threshold of 0.9 are shown.
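
The GLTD test above correlates gene length with the per-gene age coefficient using Spearman's rank correlation. A minimal sketch of that statistic follows, written in plain Python so it is self-contained. The gene lengths and age coefficients are invented for illustration, and the helper names are hypothetical; in practice a library routine would normally be used and would also return a p-value.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical genes: length in bp and the 'age' coefficient from a regression.
gene_length = [2e3, 1.5e4, 8e4, 2.5e5, 1.2e6]
age_coef = [0.01, -0.01, -0.005, -0.03, -0.06]

rho = spearman(gene_length, age_coef)
print(round(rho, 3))  # negative rho: longer genes decline more with age
```

A negative rho corresponds to the GLTD pattern, while a positive rho would correspond to the opposite trend reported for tissues such as adipose and artery.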

      Regarding sexual dimorphism, we conducted this analysis in females and males separately (Author response image 6). We found that GLTD exists in both females and males in most tissues, such as brain, whole blood, and muscle, consistent with the previous results obtained without stratifying by sex. Interestingly, we observed sex-biased patterns in certain tissues. In particular, the left ventricle, pancreas, and hippocampus showed notable male-biased patterns in the degree of transcriptional decline with gene length, whereas skin, liver, small intestine, and esophagus showed female-biased patterns. These findings suggest that GLTD could be relevant to aging and age-related diseases, and that its extent and sexual dimorphism may vary depending on the tissue type. We hope this clarification addresses the reviewer’s concern and provides a more comprehensive understanding of the GLTD and sex differences observed in our dataset. 

      Author response image 6.

      The correlation between gene length and age-associated changes across tissues in females and males, respectively. The correlation tests are evaluated using the Spearman’s approach. The red dots indicate the significant correlations in females, while the navy dots show those in males.

      (4) Because the majority of this work has been performed in the GTEx dataset, applying this analysis to another publicly available dataset would be useful validation. For instance, the authors have interesting findings in the brain and correlations to Alzheimer's disease. Analysis of an existing RNAseq dataset from Alzheimer's disease patients and controls (with functional outcomes) would provide more evidence beyond the preliminary findings from GTEx. 

      We appreciate the reviewer’s suggestion on the validation of our findings by applying our analysis to independent RNA-seq datasets from Alzheimer’s disease patients. 

      • We have used two Alzheimer’s disease datasets, GEO and ROSMAP, to investigate the correlation between aging and Alzheimer’s disease (AD) and included these analyses in our study (Fig. 4B-C and Figure S8C).

      • In the Results section (Page 7), we have presented the results of this validation, where we identified correlations between sex-biased aging-related splicing changes and AD-related changes. These findings support the conclusions from the GTEx dataset and further strengthen the relevance of our results to AD.

      As suggested, we have updated the manuscript to more explicitly highlight this validation in the Discussion section (Page 12), noting: ‘We further validated our findings using Alzheimer’s disease dataset, ROSMAP, where we observed consistent correlations between aging-related splicing changes and Alzheimer’s disease-related changes, providing additional evidence for the robustness of our results.’ 

      Reviewer #2 (Recommendations for the authors): 

      (1) In the text (Introduction and Discussion), the authors mention analyzing 54 tissues, the abstract states 35 tissues, Table S1 lists 48, and Figure 2A-B shows 33. Could the authors please clarify exactly how many tissues they used? I am also confused by the sample numbers in Table S1. For example: for adipose-subcutaneous tissue, the total number of females is listed as 218 but the sum of young and old females is only 110. Does this mean some samples were excluded? What is the exclusion criterion? 

      We thank the reviewers and editors for pointing out the discrepancies regarding the number of tissues analyzed and the sample numbers in Table S1. We appreciate the opportunity to clarify these points:

      Number of tissues analyzed:

      • We downloaded and analyzed 17,382 samples in 54 tissues from GTEx in total (31 tissues and 13 brain regions), as mentioned in the Results, Methods, and Discussion sections. Table S1 lists 48 tissues (31 tissues, 13 brain regions, and 4 merged brain regions), which include a refined classification of the tissues we analyzed, accounting for the variations in brain region categorization in the dataset.

      • The discrepancy also arises from the different sample size cutoffs in specific analyses. For pcSVR analysis (Figure 2A-B), we did the subsampling for the permutation analysis for certain key findings, so we filtered a subset of 33 tissues (29 tissues and 4 merged brain regions), which included at least 3 samples in each age group in females or males. 

      • To resolve this, we have clarified the total number of tissues analyzed and aligned the numbers across the manuscript. In the revised manuscript, we now explicitly state in both the Abstract and Methods sections that 54 tissues were analyzed in the context of this study. We added a note in Methods to clarify that 35 tissues are 31 tissues and 4 merged brain regions (Page 16). In Figure 2A-B, we clarified that the 33 tissues are filtered due to the usage in this analysis (Page 17).
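
The sample-size filter behind the 33-tissue subset can be sketched as follows. This is only one reading of the rule "at least 3 samples in each age group in females or males": here every sex-by-age-group cell must reach the minimum. The tissue names and counts are placeholders, not values from Table S1, and `tissues_passing` is a hypothetical helper.

```python
MIN_SAMPLES = 3  # cutoff stated in the response

def tissues_passing(counts_by_tissue, min_samples=MIN_SAMPLES):
    """Keep tissues in which every (sex, age group) cell has enough samples."""
    return [tissue for tissue, cells in counts_by_tissue.items()
            if all(n >= min_samples for n in cells.values())]

# Placeholder counts keyed by (sex, age group).
counts = {
    "tissue_A": {("F", "Young"): 5, ("F", "Old"): 4, ("M", "Young"): 10, ("M", "Old"): 8},
    "tissue_B": {("F", "Young"): 6, ("F", "Old"): 2, ("M", "Young"): 9, ("M", "Old"): 7},
    "tissue_C": {("F", "Young"): 3, ("F", "Old"): 3, ("M", "Young"): 3, ("M", "Old"): 3},
}

kept = tissues_passing(counts)
print(kept)  # tissue_B is dropped because one cell has only 2 samples
```

A filter like this explains why the number of tissues in a given figure can be smaller than the number listed in the sample table.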

      Sample numbers in Table S1:

      • Regarding the sample sizes of age groups, the discrepancy occurred due to the classification of the age groups. We classify the samples into three: Young, Middle, and Old, as mentioned in the Results section (Page 4). 

      • Additionally, we excluded the sample sizes in 13 single brain regions. We aligned the total tissue number to 35 with our texts.
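
The gap between a tissue's total and its Young plus Old counts follows directly from the three-way age grouping. The sketch below assumes the cutoffs quoted in the reviewer's summary (young under 40, old over 60), with everyone in between treated as Middle; the exact handling of ages 40 and 60 is an assumption, and the ages are invented.

```python
from collections import Counter

def age_group(age, young_below=40, old_above=60):
    """Assign Young / Middle / Old; the boundary handling is an assumption."""
    if age < young_below:
        return "Young"
    if age > old_above:
        return "Old"
    return "Middle"

# Hypothetical donor ages for one tissue and sex.
ages = [23, 31, 38, 44, 52, 57, 60, 63, 68, 70]
counts = Counter(age_group(a) for a in ages)

total = sum(counts.values())
young_plus_old = counts["Young"] + counts["Old"]
print(dict(counts), total, young_plus_old)
```

The Middle donors are counted in the total but not in the Young or Old columns, so the two extreme groups never need to sum to the total.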

      We hope this resolves the confusion regarding the number of tissues and the sample sizes used in the analysis. These clarifications have been incorporated into the revised manuscript to ensure consistency.

      (2) Was post-mortem interval (PMI) or manner of death considered in the model? For example, traumatic death may have major consequences on gene expression. Similarly, a few tissues have low sample numbers, for example, kidney cortex and brain. The pooling of brain samples is explained and the kidney cortex is excluded, so why is it listed in Table S1? 

      Thank you for raising this important point regarding the potential impact of post-mortem interval (PMI) and manner of death (DTHMNNR) on gene expression. We carefully considered both factors as potential confounders in our analysis. 

      Specifically, to evaluate their impacts, we calculated the correlations between the coefficients of PMI or manner of death and the confounding factors. Our results showed that PMI and DTHMNNR are significantly correlated with the covariates in most tissues, suggesting that their effects could be effectively regressed out in our model (Figure S4). As mentioned for Figure S4 and Author response image 1, we conducted a differential analysis that incorporated PMI as a covariate in the regression models and re-evaluated the age- and sex-related transcriptomic changes to address this concern. These high correlations indicate that PMI has only a minor additional effect once the covariates are included in the model. As suggested by the reviewers and editors, we have now included this correlation analysis in Figure S4C-E and updated the text in the Results section (Page 5).

      Additionally, as noted in the responses above, Table S1 provides the general sample sizes of all GTEx tissues without filtering. We have modified the table to include a total of 35 tissues, including 31 non-brain tissues and 4 brain regions.

      (3) It might be important to show a simple visual of cohort details such as age ranges, sexes, ethnicities, PMIs, etc. 

      To address this, we added summary figures to illustrate the distributions of key demographic variables, including age, sex, BMI, ethnicity, post-mortem intervals (PMIs), and manner of death (DTHMNNR) (Author response image 7 and Author response image 8). This will provide readers with a clearer overview of the dataset composition and potential covariates affecting the analysis. 

      Author response image 7.

      Age (left panel), BMI (Body Mass Index) (middle panel), and PMI (Post-Mortem Interval) (right panel) distribution in GTEx v8 cohort.

      Author response image 8.

      Sex (left panel), ethnicity (middle panel), and manner of death (DTHMNNR) (right panel) distribution in GTEx v8 cohort.

      (4) Since this study is highly correlative, it is impossible to determine if the findings hold true without an independent cohort validation or experimental validation. They used the ROSMAP cohort for AD samples, and some splicing factors regulation but the generalizability to the age and sex effects have not been independently tested.

      The reviewer raises an important point regarding the independent validation of sex- and age-associated splicing changes associated with AD. We used GTEx primarily because it includes approximately 17,000 RNA-seq samples across multiple human tissues, making it the most comprehensive public resource for studying population-level differences in age and sex. In particular, its large-scale brain samples provide a unique opportunity to analyze transcriptomic changes in sex-dimorphic aging.

      We understand the reviewer’s concern that our findings are mainly supported by correlative evidence, which could be affected by dataset-specific biases. However, cross-validating transcriptomes across different datasets raises several technical issues, including limited comparability due to cell type heterogeneity, postmortem artifacts, and sequencing biases.

      Specifically, GTEx data is bulk RNA-seq that does not capture cell-type-specific transcriptomic changes. Given the cellular complexity of the brain and other tissues, observed differences in gene expression and splicing may be influenced by shifts in cellular composition rather than intrinsic transcriptional regulation. For example, we compared our results from GTEx whole blood with the analysis using an external dataset from Peripheral Blood Mononuclear Cells (PBMCs) provided by Shen et al. (2024) [3] (Author response image 2).  We observed limited overlap in differentially expressed genes between these datasets (probably because the whole blood contains diverse immune cell populations), highlighting the challenges in cross-dataset validation due to differences in tissue composition and sample processing.

      Therefore, we applied surrogate variable analysis (SVA) to minimize technical and biological confounders. This approach helped reduce biases ranging from genetic background to hidden batch effects, including postmortem artifacts, sequencing biases (Figure S4), and other covariates, and it helps us assess whether sex-biased splicing events are biologically meaningful rather than technical artifacts.

      In addition, to address the reviewer’s concern about splicing factor regulation, we looked for a dataset covering decision-related brain regions. Because human brain data covering different age and sex groups are limited, we used mouse hippocampus datasets that include young and old as well as female and male groups [7]. The analysis of protein levels from MS data identified sex-biased age-associated splicing factors, including Srsf1 and Srsf7. These changes are consistent with the findings from GTEx (Author response image 9), in line with the sex-biased splicing factor expression we observed during aging in the same region of the human brain. This cross-species consistency supports the robustness of our findings in human brain aging.

      Author response image 9.

      Protein levels of some male-specific splicing factors in human hippocampus quantified using MS data. The Y-axis shows the protein intensity. Different facets mean different sample batch sets. The yellow boxes indicate the protein levels in the young group, while the brown boxes indicate those in the old group.

      In summary, despite the inherent limitations of RNA-seq studies in sex- and age-related transcriptomics, we have made our best efforts to address these concerns through comparisons with external datasets, statistical corrections, and validation using proteomic data. We appreciate the reviewer’s feedback and have included additional discussion on these points (Page 13).

      (5) Are AS predictions from short-read data accurate enough to make the predictions the authors report? 

      The reviewer is correct that short-read sequencing has inherent limitations in reconstructing full-length isoforms. However, the higher sequencing depth of short reads makes it a better choice for quantifying the relative change of each AS event across different conditions. As a result, short-read data are extensively used in the splicing field to quantitatively measure AS changes. For this reason, we focused on the levels of alternative splicing events rather than on the quantification of full-length isoforms. We used a series of stringent filters in our analyses to increase the reliability of our results.

      Specifically, as mentioned in the Methods section, we filtered differential AS events by their junction read counts (JC), so that most of the reported events have JC higher than 10. We also used our GPU-based gene expression quantification tool, Paean, which performed better in cross-validation against quantitative RT-PCR results and gives results consistent with other pipelines. We cited an updated version of Paean that includes a comparison with other tools for analyzing AS. The manuscript on the new Paean version is under review at another journal, and we included the PDF of that manuscript (Fig. 3 in the Paean manuscript) in the revised documents.
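
      The junction-count filter described above can be illustrated with a minimal sketch. It is not the Paean implementation: the event records, the field names (`inclusion_jc`, `skipping_jc`) and the choice to average the counts across samples are assumptions made only for illustration. The threshold of 10 is the value quoted in the response.

```python
# Hypothetical sketch of a junction read count (JC) filter for AS events.
# Field names and the "mean across samples" rule are assumptions, not the
# authors' actual pipeline.

MIN_JUNCTION_READS = 10  # threshold quoted in the response


def filter_events(events, min_reads=MIN_JUNCTION_READS):
    """Return IDs of events whose mean total junction count exceeds min_reads."""
    kept = []
    for event in events:
        # Total junction support per sample = inclusion + skipping junction reads.
        totals = [
            inc + skip
            for inc, skip in zip(event["inclusion_jc"], event["skipping_jc"])
        ]
        if sum(totals) / len(totals) > min_reads:
            kept.append(event["id"])
    return kept


example_events = [
    {"id": "A", "inclusion_jc": [8, 12], "skipping_jc": [5, 7]},  # mean 16
    {"id": "B", "inclusion_jc": [2, 3], "skipping_jc": [1, 4]},   # mean 5
    {"id": "C", "inclusion_jc": [5, 5], "skipping_jc": [5, 5]},   # mean 10
]
kept_ids = filter_events(example_events)
```

      A strict inequality is used here because the response states "higher than 10"; an event with exactly 10 supporting reads is therefore dropped in this sketch.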

      (6) Along the same lines, the finding that male age-related AS events are linked to Alzheimer's disease somewhat contradicts epidemiological studies that show that even after adjusting for age, women still have a greater risk of developing Alzheimer's than men. The authors show a significant overlap with AD GE events in females but don't explain the discrepancy. 

      We appreciate the editor’s comment regarding these discrepancies with the epidemiological studies. Previous studies suggested that Alzheimer’s disease (AD) shows sex differences in its phenotypes, including cognitive decline and brain atrophy [8]. Analyses of sex and age effects in AD are indeed complex and depend on the criteria used in each study (molecular measures such as GE or AS vs epidemiological data), probably because it is difficult to capture how environmental exposures interact with biological pathways. We would like to raise three related points regarding this concern, which are also discussed in the revised manuscript.

      • As we have mentioned in the Discussion section, an early study investigated the relationship between age, sex, and cognitive function in a large cohort of 17,127 UK Biobank participants [9]. Their study highlighted more apparent age-related changes in cognitive function among men, suggesting a potential vulnerability of men to cognitive decline with age.  Their main conclusion is consistent with our findings. 

      • While men and women can both suffer from Alzheimer's disease, women are more likely to be diagnosed, possibly due to longer lifespans and potential differences in brain structure or other factors. Although women exhibit a higher overall risk of AD, they may also have distinct molecular compensatory mechanisms that influence disease progression. 

      • To avoid the age effect, in our AD datasets, including ROSMAP, we filtered the samples over 90 years old to match the number of both sexes and the age distribution between the AD and control groups. Our analysis avoided the age biases in comparing AD and control, suggesting the crucial roles of sBASEs in AD during male aging.

      Moreover, for gene expression (GE), we showed distinct patterns of AD-related genes in females with AS. These two molecular processes do not necessarily have the same functional impact. AS changes may precede or contribute to disease onset in different ways compared to GE alterations. Our study proposes underlying mechanisms linking cognitive disorders and alternative splicing (AS) at a higher molecular resolution.

      (7) Could the authors explain which sBASE subset they used for their random forest prediction model and what was the rationale? 

      We are sorry for omitting the details of how sBASEs (sex-biased age-associated splicing events) were selected for the random forest prediction model. We specifically used sBASEs that exhibited sex-biased changes in splicing associated with aging. We restricted this subset to sBASEs that could also be detected in the ROSMAP AD dataset, because sequencing depth and technical biases differ across datasets. These sBASEs were then input to a prediction model with the recursive feature elimination (RFE) feature selection algorithm, and their contributions were evaluated. In the revised manuscript, we added the details of this selection in the Methods (Page 7).

      (8) The breakpoint analysis is particularly interesting. Can this be speculated to correlate with the recent non-linear multi-omic aging patterns observed by Shen et al in Nature Aging? 

      Thank you for highlighting the interesting aspects of our breakpoint analysis and suggesting its potential correlation with the non-linear aging patterns observed by Shen et al. 

      Shen et al. observed two prominent crests around the ages of 45 and 60 using omics data. Similarly, we identified non-linear aging patterns with two breakpoints in our analysis. However, there are some notable differences in the specific breakpoints between the two studies, resulting from the breakpoint definition as well as the sample preparations. As discussed in the response accompanying Author response image 2, the differences come from the following aspects:

      The definition of breakpoints vs crests:

      • Crests represent age-related molecular changes at each time point across the human lifespan. They indicate the number of molecules that are differentially expressed during aging (q < 0.05), without considering individual expression levels.

      • Our breakpoints, in contrast, are identified after filtering the chronological trends based on the expression levels and calculating the rate of change at each age point using sliding windows. Breakpoints are defined as local maxima where the distance to the nearest minimum, relative to the global maximum, exceeds 10%. We did find some wide local peaks around 60 in some tissues, as shown in Figure S10; however, we excluded these because of our strict cutoffs.
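
      The breakpoint definition above can be sketched in code under stated assumptions. The snippet below is an illustration, not the authors' pipeline: the window size, the reading of "distance to the nearest minimum" as a height difference to the closest local minimum (ties resolved toward the earlier age), and all function names are assumptions. Only the 10% cutoff is taken from the response.

```python
# Hypothetical sketch: sliding-window rate of change, then breakpoints as
# local maxima that rise above the nearest local minimum by more than 10%
# of the global maximum. Names and window size are illustrative assumptions.


def sliding_rate(ages, values, window=3):
    """Absolute change per year across a centered window at each age point."""
    half = window // 2
    last = len(values) - 1
    rates = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(last, i + half)
        span = ages[hi] - ages[lo]
        rates.append(abs(values[hi] - values[lo]) / span if span else 0.0)
    return rates


def find_breakpoints(ages, rates, min_relative_drop=0.10):
    """Return ages of local maxima that pass the relative-height cutoff."""
    n = len(rates)
    top = max(rates) if rates else 0.0
    if n < 3 or top <= 0:
        return []
    # Local minima, with the two endpoints allowed as candidates.
    minima = [
        i for i in range(n)
        if (i == 0 or rates[i] <= rates[i - 1])
        and (i == n - 1 or rates[i] <= rates[i + 1])
    ]
    breakpoints = []
    for i in range(1, n - 1):
        if rates[i] > rates[i - 1] and rates[i] > rates[i + 1]:
            nearest = min(minima, key=lambda m: abs(m - i))
            if (rates[i] - rates[nearest]) / top > min_relative_drop:
                breakpoints.append(ages[i])
    return breakpoints


example_ages = [20, 30, 40, 50, 60, 70, 80]
example_rates = [0.0, 1.0, 0.2, 0.25, 0.2, 0.8, 0.1]
example_breakpoints = find_breakpoints(example_ages, example_rates)
```

      In this toy series the shallow bump at age 50 rises only 0.05 above its neighbouring minima, which is 5% of the global maximum, so it is discarded. This mirrors how a wide but low peak can be excluded by a strict cutoff.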

      The sequenced biosamples: 

      • Whole-blood vs Peripheral Blood Mononuclear Cells (PBMC): As mentioned in previous responses, in GTEx, whole blood samples from donors were sequenced, whereas their study used PBMCs. Whole blood contains all blood components, including red blood cells, platelets, granulocytes (e.g., neutrophils), lymphocytes, and monocytes, while PBMCs only represent a subset of white blood cells, primarily consisting of lymphocytes (T cells, B cells, NK cells) and monocytes, excluding granulocytes and erythrocytes. Gene expression changes observed in whole blood capture the contributions from neutrophils and other granulocytes, which are absent in PBMC analyses (as shown in Figure S11C and Author response image 2). Additionally, whole blood can serve as a readily accessible biomarker source for testing age-related diseases without the need for cell separation, making it a more practical option for clinical applications.

      • Skin is the one tissue shared by both studies. There, we looked at the non-linear changes during aging and found the same two breakpoints: 43 and 58.

      Sex-specific analysis in females and males:

      • The main objective of our analysis is to compare the differences in aging rates between sexes. Notably, the identified breakpoints may differ when sex effects are not taken into account, highlighting the importance of analyzing males and females separately.

      We have added the following statements to further clarify this connection: ‘Our analysis observed the nonlinear aging patterns with two breakpoints, which is consistent with recent findings (Nature Aging, 2024), with differences in specific age points due to the sex differences as well as tissue diversities.’ (Page 14), and ‘These breakpoints could represent key junctures in the aging process that align with the non-linear patterns of aging and disease progression.’ (Page 15)

      (9) Minor - the authors should refer to figures in the Discussion. They do so in some cases but this needs to be more extensive. 

      Thank you for pointing this out. In response, we have reviewed the Discussion section and added references to relevant figures where appropriate. In the section discussing the discrepancies between the profiles of GE vs. AS, we now refer to Figure 3 to highlight the earlier onset of different transcriptomic resolutions (Page 12). When describing the sex-specific age-associated AS changes and their associations with Alzheimer’s disease, we have added references to Figure 4 (Page 12). In the discussion of estrogen-mediated regulation of splicing factors, we have referred to Figure 5A, which details the construction of the RBP-RNA regulatory network integrating multi-dimensional data obtained through several orthogonal state-of-the-art approaches (Page 14).

      References:

      (1) Ferreira, P.G. et al. The effects of death and post-mortem cold ischemia on human tissue transcriptomes. Nature communications 9, 490 (2018).

      (2) Wucher, V., Sodaei, R., Amador, R., Irimia, M. & Guigó, R. Day-night and seasonal variation of human gene expression across tissues. PLoS Biology 21, e3001986 (2023).

      (3) Shen, X. et al. Nonlinear dynamics of multi-omics profiles during human aging. Nature aging, 116 (2024).

      (4) Zhou, X. et al. Splicing factor SRSF1 promotes gliomagenesis via oncogenic splice-switching of MYO1B. The Journal of clinical investigation 129, 676-693 (2019).

      (5) Soheili-Nezhad, S., Ibáñez-Solé, O., Izeta, A., Hoeijmakers, J.H. & Stoeger, T. Time is ticking faster for long genes in aging. Trends in Genetics 40, 299-312 (2024).

      (6) Brouillette, M. Gene length could be a critical factor in the aging of the genome. Proceedings of the National Academy of Sciences 121, e2416630121 (2024).

      (7) Keele, G.R. et al. Global and tissue-specific aging effects on murine proteomes. Cell reports 42 (2023).

      (8) Ferretti, M.T. et al. Sex differences in Alzheimer disease—the gateway to precision medicine. Nature Reviews Neurology 14, 457-469 (2018).

      (9) Foo, H. et al. Age-and sex-related topological organization of human brain functional networks and their relationship to cognition. Frontiers in aging neuroscience 13, 758817 (2021).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The work is a useful contribution towards understanding the role of archaeal and plant D-aminoacyl-tRNA deacylase 2 (DTD2) in deacylation and detoxification of D-Tyr-tRNATyr modified by various aldehydes produced as metabolic byproducts in plants. It integrates convincing results from both in vitro and in vivo experiments to address the long-standing puzzle of why plants outperform bacteria in handling reactive aldehydes and suggests a new strategy for stress-tolerant crops. The impact of the paper is limited by the fact that only one modified D-aminoacyl tRNA was examined, in lack of evidence that plant eEF1A mimics EF-Tu in protecting L-aminoacyl tRNAs from modification, and in failure to measure accumulation of toxic D-aminoacyl tRNAs or impairment of translation in plant cells lacking DTD2.

      We have now addressed all the drawbacks as follows:

      ‘only one modified D-aminoacyl tRNA was examined’

      We wish to clarify that only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009), and D-Tyr-tRNATyr is used as a model substrate to test DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007), and it also recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro, as shown in our earlier work (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain independent mechanism (Ahmad S. et al., eLife, 2013). To check the biochemical activity of DTD2 on D-Trp-tRNATrp, we have now done D-Trp, D-Tyr and D-Asp toxicity rescue experiments by expressing the archaeal DTD2 in dtd null E. coli cells. We found that DTD2 could rescue D-Trp toxicity as efficiently as D-Tyr and D-Asp toxicity (Figure: 1). Considering its action on multiple side chains of different chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain independent manner.

      Author response image 1.

      DTD2 recycles multiple D-aa-tRNAs with different side chain chemistry and size. Growth of wildtype (WT), dtd null strain (∆dtd), and Pyrococcus horikoshii DTD2 (PhoDTD2) complemented ∆dtd strains of E. coli K12 cells with 500 µM IPTG along with A) no D-amino acids, B) 2.5 mM D-tyrosine, C) 30 mM D-aspartate and D) 5 mM D-tryptophan.

      ‘lack of evidence that plant eEF1A mimics EF-Tu in protecting L-aminoacyl tRNAs from modification’

      To understand the role of plant eEF1A in protecting L-aa-tRNAs from aldehyde modification, we have done a thorough sequence and structural analysis. We analysed the aa-tRNA bound elongation factor structure from bacteria (PDB id: 1TTT) and found that the side chain of the amino acid in the amino acid binding site of EF-Tu is projected outside (Figure: 2A; 3A). In addition, the amino group of the amino acid is tightly selected by the main chain atoms of the elongation factor, leaving no space for aldehydes to enter and then modify the L-aa-tRNAs and Gly-tRNAs (Figure: 2B; 3B). Modelling of a D-amino acid (D-phenylalanine and the smallest chiral amino acid, D-alanine) in the same site shows serious clashes with main chain atoms of EF-Tu, indicating D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2C-E). Next, we superimposed the tRNA bound mammalian eEF-1A cryoEM structure (PDB id: 5LZS) with the bacterial structure to understand the structural differences in terms of tRNA binding and found that the elongation factor binds tRNA in a similar way (Figure: 3C-D). Modelling of D-alanine in the amino acid binding site of eEF-1A shows serious clashes with main chain atoms, indicating a general theme of D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2F; 3E). Structure-based sequence alignment of elongation factors from bacteria, archaea and eukaryotes (both plants and mammals) shows strict conservation of the amino acid binding site (Figure: 2G). This suggests that eEF-1A will mimic EF-Tu in protecting L-aa-tRNAs from reactive aldehydes. Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce the amino acid specific binding differences (Figure: 3F). However, those changes will have no influence when the D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site.
      We have now included this sequence and structural conservation analysis in our revised manuscript (in text: line no 107-129; Figure: 2 and S2). Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by the elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of the elongation factor in general across species.

      Author response image 2.

      Elongation factor enantio-selects L-aa-tRNAs through D-chiral rejection mechanism. A) Surface representation showing the cocrystal structure of EF-Tu with L-Phe-tRNAPhe. Zoomed-in image showing the binding of L-phenylalanine with side chain projected outside of binding site of EF-Tu (PDB id: 1TTT). B) Zoomed-in image of amino acid binding site of EF-Tu bound with L-phenylalanine showing the selection of amino group of amino acid through main chain atoms (PDB id: 1TTT). C) Modelling of D-phenylalanine in the amino acid binding site of EF-Tu shows severe clashes with main chain atoms of EF-Tu. Modelling of smallest chiral amino acid, alanine, in the amino acid binding site of EF-Tu shows D) no clashes with L-alanine and E) clashes with D-alanine. F) Modelling of D-alanine in the amino acid binding site of eEF-1A shows clashes with main chain atoms. (*Represents modelled molecule). G) Structure-based sequence alignment of elongation factor from bacteria, archaea and eukaryotes (both plants and animals) showing conserved amino acid binding site residues. (Key residues are marked with red star).

      Author response image 3.

      Elongation factor protects L-aa-tRNAs from aldehyde modification. A) Cartoon representation showing the cocrystal structure of EF-Tu with L-Phe-tRNAPhe (PDB id: 1TTT). B) Zoomed-in image of amino acid binding site of EF-Tu bound with L-phenylalanine (PDB id: 1TTT). C) Cartoon representation showing the cryoEM structure of eEF-1A with tRNAPhe (PDB id: 5LZS). D) Image showing the overlap of EF-Tu:L-Phe-tRNAPhe crystal structure and eEF-1A:tRNAPhe cryoEM structure (r.m.s.d. of 1.44 Å over 292 Cα atoms). E) Zoomed-in image of amino acid binding site of eEF-1A with modelled L-alanine (PDB id: 5LZS). (*Modelled) F) Overlap showing the amino acid binding site residues of EF-Tu and eEF-1A. (EF-Tu residues are marked in black and eEF-1A residues are marked in red).

      ‘failure to measure accumulation of toxic D-aminoacyl tRNAs or impairment of translation in plant cells lacking DTD2’

      We agree that measuring the accumulation of D-aa-tRNA adducts from plant cells lacking DTD2 is important. We tried extensively to characterise these adducts in dtd2 mutant plants through Northern blotting as well as mass spectrometry. However, because the affected tissue (root or shoot), the identity of the aa-tRNA, and the location of the aa-tRNA (cytosolic or organellar) are not known, we have so far been unsuccessful in identifying them from plants. Efforts are still underway to identify them from plant systems lacking DTD2. Meanwhile, we have used a bacterial surrogate system, E. coli, as used earlier in Mazeed M. et al., Science Advances, 2021, to show the accumulation of D-aa-tRNA adducts in the absence of dtd. We could identify the accumulation of both formaldehyde and MG modified D-aa-tRNA adducts via mass spectrometry (Figure: 4). These results are now included in the revised manuscript (in line no: 190-197 and Figure: S5).

      Author response image 4.

      Loss of DTD results in accumulation of modified D-aminoacyl adducts on tRNAs in E. coli. Mass spectrometry analysis showing the accumulation of aldehyde modified D-Tyr-tRNATyr in A) Δdtd E. coli, B) formaldehyde and D-tyrosine treated Δdtd E. coli, and C) MG and D-tyrosine treated Δdtd E. coli. ESI-MS based tandem fragmentation analysis for unmodified and aldehyde modified D-Tyr-tRNATyr in D) Δdtd E. coli, E) and F) formaldehyde and D-tyrosine treated Δdtd E. coli, G) and H) MG and D-tyrosine treated Δdtd E. coli.

      Response to Public Reviews:

      We are grateful for the reviewers’ positive feedback and their comments and suggestions on this manuscript. Reviewer 1 has indicated two weaknesses and Reviewer 2 has none. We have now addressed all the concerns of the Reviewers.

      Reviewer #1 (Public Review):

      Summary:

      This work is an extension of the authors' earlier work published in Sci Adv in 2001, wherein the authors showed that DTD2 deacylates N-ethyl-D-aminoacyl-tRNAs arising from acetaldehyde toxicity. The authors in this study, investigate the role of archaeal/plant DTD2 in the deacylation/detoxification of D-Tyr-tRNATyr modified by multiple other aldehydes and methylglyoxal (produced by plants). Importantly, the authors take their biochemical observations to plants, to show that deletion of DTD2 gene from a model plant (Arabidopsis thaliana) makes them sensitive to the aldehyde supplementation in the media especially in the presence of D-Tyr. These conclusions are further supported by the observation that the model plant shows increased tolerance to the aldehyde stress when DTD2 is overproduced from the CaMV 35S promoter. The authors propose a model for the role of DTD2 in the evolution of land plants. Finally, the authors suggest that the transgenic crops carrying DTD2 may offer a strategy for stress-tolerant crop development. Overall, the authors present a convincing story, and the data are supportive of the central theme of the story.

      We are happy that the reviewer found our work convincing and would like to thank the reviewer for finding our data supportive of the central theme of the manuscript.

      Strengths:

      Data are novel and they provide a new perspective on the role of DTD2, and propose possible use of the DTD2 lines in crop improvement.

      We are happy for this positive comment on the manuscript.

      Weaknesses:

      (a) Data obtained from a single aminoacyl-tRNA (D-Tyr-tRNATyr) have been generalized to imply that what is relevant to this model substrate is true for all other D-aa-tRNAs (term modified aa-tRNAs has been used synonymously with the modified Tyr-tRNATyr). This is not a risk-free extrapolation. For example, the authors see that DTD2 removes modified D-Tyr from tRNATyr in a chain-length dependent manner of the modifier. Why do the authors believe that the length of the amino acid side chain will not matter in the activity of DTD2?

      We thank the reviewer for bringing up this important point. As mentioned above, we wish to clarify that only half of the aminoacyl-tRNA synthetases are known to charge D-amino acids and only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used as a model substrate to test DTD activity in the field because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007). Moreover, we have previously shown that it recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro (Mazeed M. et al., Science Advances, 2021). We have earlier shown that DTD1, another conserved chiral proofreader across bacteria and eukaryotes, acts via a side chain independent mechanism (Ahmad S. et al., eLife, 2013). To check the biochemical activity of DTD2 on D-Trp-tRNATrp, we have now done D-Trp, D-Tyr and D-Asp toxicity rescue experiments by expressing the archaeal DTD2 in dtd null E. coli cells. We found that DTD2 could rescue D-Trp toxicity as efficiently as D-Tyr and D-Asp toxicity (Figure 1). Considering its action on multiple side chains of different chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side chain independent manner.

      (b) While the use of EFTu supports that the ternary complex formation by the elongation factor can resist modifications of L-Tyr-tRNATyr by the aldehydes or other agents, in the context of the present work on the role of DTD2 in plants, one would want to see the data using eEF1alpha. This is particularly relevant because there are likely to be differences in the way EFTu and eEF1alpha may protect aminoacyl-tRNAs (for example see description in the latter half of the article by Wolfson and Knight 2005, FEBS Letters 579, 3467-3472).

      We thank the reviewer for bringing up this important point. As mentioned above, to understand the role of plant eEF1A in protecting L-aa-tRNAs from aldehyde modification, we have done a thorough sequence and structural analysis. We analysed the aa-tRNA bound elongation factor structure from bacteria (PDB id: 1TTT) and found that the side chain of the amino acid in the amino acid binding site of EF-Tu is projected outside (Figure: 2A; 3A). In addition, the amino group of the amino acid is tightly selected by the main chain atoms of the elongation factor, leaving no space for aldehydes to enter and then modify the L-aa-tRNAs and Gly-tRNAs (Figure: 2B; 3B). Modelling of a D-amino acid (D-phenylalanine and the smallest chiral amino acid, D-alanine) in the same site shows serious clashes with main chain atoms of EF-Tu, indicating D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2C-E). Next, we superimposed the tRNA bound mammalian eEF-1A cryoEM structure (PDB id: 5LZS) with the bacterial structure to understand the structural differences in terms of tRNA binding and found that the elongation factor binds tRNA in a similar way (Figure: 3C-D). Modelling of D-alanine in the amino acid binding site of eEF-1A shows serious clashes with main chain atoms, indicating a general theme of D-chiral rejection during aa-tRNA binding by the elongation factor (Figure: 2F; 3E). Structure-based sequence alignment of elongation factors from bacteria, archaea and eukaryotes (both plants and mammals) shows strict conservation of the amino acid binding site (Figure: 2G). Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce the amino acid specific binding differences (Figure: 3F). However, those changes will have no influence when the D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site.
      We have now included this sequence and structural conservation analysis in our revised manuscript (in text: line no 107-129; Figure: 2 and S2). Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by the elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of the elongation factor in general across species.

      Reviewer #2 (Public Review):

      In bacteria and mammals, metabolically generated aldehydes become toxic at high concentrations because they irreversibly modify the free amino group of various essential biological macromolecules. However, these aldehydes can be present in extremely high amounts in archaea and plants without causing major toxic side effects. This fact suggests that archaea and plants have evolved specialized mechanisms to prevent the harmful effects of aldehyde accumulation.

      In this study, the authors show that the plant enzyme DTD2, originating from archaea, functions as a D-aminoacyl-tRNA deacylase. This enzyme effectively removes stable D-aminoacyl adducts from tRNAs, enabling these molecules to be recycled for translation. Furthermore, they demonstrate that DTD2 serves as a broad detoxifier for various aldehydes in vivo, extending its function beyond acetaldehyde, as previously believed. Notably, the absence of DTD2 makes plants more susceptible to reactive aldehydes, while its overexpression offers protection against them. These findings underscore the physiological significance of this enzyme.

      We thank the reviewer for the positive comments on the manuscript.

      Response to recommendation to authors:

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed reading the manuscript entitled, "Archaeal origin translation proofreader imparts multi aldehyde stress tolerance to land plants" from the Sankaranarayanan lab. This work is an extension of their earlier work published in Sci Adv in 2001, wherein they showed that DTD2 deacylates N-ethyl-D-aminoacyl-tRNAs arising from acetaldehyde toxicity. Now, the authors of this study (Kumar et al.) investigate the role of archaeal/plant DTD2 in the deacylation/detoxification of D-Tyr-tRNATyr modified by multiple other aldehydes and methylglyoxal (which are produced during metabolic reactions in plants). Importantly, the authors take their biochemical observations to plants, to show that deletion of DTD2 gene from a model plant (Arabidopsis thaliana) makes them sensitive to the aldehyde supplementation in the media especially in the presence of D-Tyr. These conclusions are further supported by the observation that the model plant shows increased tolerance to the aldehyde stress when DTD2 is overproduced from the CaMV 35S promoter. The authors propose a model for the role of DTD2 in the evolution of land plants. Finally, the authors suggest that the transgenic crops carrying DTD2 may offer a strategy for stress-tolerant crop development. Overall, the authors present a convincing story, and the data are supportive of the central theme of the story.

      We are happy that the reviewer enjoyed our manuscript and found our work convincing. We would also like to thank the reviewer for finding our data supportive of the central theme of the manuscript.

      I have the following observations that require the authors' attention.

      1) The title of the manuscript will be more appropriate if revised to, "Archaeal origin translation proofreader, DTD2, imparts multialdehyde stress tolerance to land plants".

      Both reviewers suggested changing the title. We have now changed the title based on Reviewer 2’s suggestion.

      2) Abstract (line 19): change, "physiologically abundantly produced" to "physiologically produced".

      As per the reviewer’s suggestion, we have now changed it to "physiologically produced".

      3) Introduction (line 50): delete, 'extremely'.

      We have removed the word 'extremely' from the Introduction.

      4) Line 79: change, "can be utilized" to "may be explored".

      We have changed "can be utilized" to "may be explored" as suggested by the reviewers.

      5) Results in general:

      (a) Data obtained from a single aminoacyl-tRNA (D-Tyr-tRNATyr) have been generalized to imply that what is relevant to this model substrate is true for all other D-aa-tRNAs (term modified aa-tRNAs has been used synonymously with the modified D-Tyr-tRNATyr). This is a risky extrapolation. For example, the authors see that DTD2 removes modified D-Tyr from tRNATyr in a chain-length dependent manner of the modifier. Why do the authors believe that the length of the amino acid side chain will not matter in the activity of DTD2?

      We thank the reviewer for bringing up this important point. As mentioned above, we wish to clarify that only half of the aminoacyl-tRNA synthetases are known to charge D-amino acids, and only D-Leu (Yeast), D-Asp (Bacteria, Yeast), D-Tyr (Bacteria, Cyanobacteria, Yeast) and D-Trp (Bacteria) show toxicity in vivo in the absence of a known DTD (Soutourina J. et al., JBC, 2000; Soutourina O. et al., JBC, 2004; Wydau S. et al., JBC, 2009). D-Tyr-tRNATyr is used in the field as a model substrate to test DTD activity because of the conserved toxicity of D-Tyr in various organisms. DTD2 has been shown to recycle D-Asp-tRNAAsp and D-Tyr-tRNATyr with the same efficiency both in vitro and in vivo (Wydau S. et al., NAR, 2007). Moreover, we have previously shown that it recycles acetaldehyde-modified D-Phe-tRNAPhe and D-Tyr-tRNATyr in vitro (Mazeed M. et al., Science Advances, 2021). We have also shown earlier that DTD1, another chiral proofreader conserved across bacteria and eukaryotes, acts via a side-chain-independent mechanism (Ahmad S. et al., eLife, 2013). To check the biochemical activity of DTD2 on D-Trp-tRNATrp, we have now performed D-Trp, D-Tyr and D-Asp toxicity rescue experiments by expressing the archaeal DTD2 in dtd-null E. coli cells. We found that DTD2 rescued D-Trp toxicity as efficiently as it rescued D-Tyr and D-Asp toxicity (Figure 1). Considering its action on multiple side chains of different chemistry and size, it can be proposed with reasonable confidence that DTD2 also operates in a side-chain-independent manner.

      (b) Interestingly, the authors do suggest (in the Materials and Methods section) that the experiments were performed with Phe-tRNAPhe as well as Ala-tRNAAla. If what is stated in Materials and Methods is correct, these data should be included to generalize the observations.

      We regret the confusing statement. We wish to clarify that L- and D-Tyr-tRNATyr were used for the TLC-based aldehyde modification assays, the EF-Tu-based protection assays and the deacylation assays; D-Phe-tRNAPhe was used to characterise aldehyde-based modification by mass spectrometry; and L-Ala-tRNAAla was used to check the modification propensity of multiple aldehydes. We used multiple aa-tRNAs to emphasize that aldehyde-based modifications are not specific to the identity of the aa-tRNA. All the data obtained with the respective aa-tRNAs are included in the manuscript.

      (c) While the use of EFTu supports that the ternary complex formation by the elongation factor can resist modifications of L-Tyr-tRNATyr by the aldehydes or other agents, in the context of the present work on the role of DTD2 in plants, one would want to see the data using eEF1alpha. This is particularly relevant because there are likely to be differences in the way EFTu and eEF1alpha may protect aminoacyl-tRNAs (for example see description in the latter half of the article by Wolfson and Knight 2005, FEBS Letters 579, 3467-3472).

      We thank the reviewer for bringing up this important point. As mentioned above, to understand the role of plant eEF1A in protecting L-aa-tRNAs from aldehyde modification, we have done a thorough sequence and structural analysis. We analysed the aa-tRNA bound elongation factor structure from bacteria (PDB ids: 1TTT) and found that the side chain of amino acid in the amino acid binding site of EF-Tu is projected outside (Figure: 2A; 3A). In addition, the amino group of amino acid is tightly selected by the main chain atoms of elongation factor thereby lacking a space for aldehydes to enter and then modify the L-aa-tRNAs and Gly-tRNAs (Figure: 2B; 3B). Modelling of D-amino acid (D-phenylalanine and smallest chiral amino acid, D-alanine) in the same site shows serious clashes with main chain atoms of EF-Tu, indicating D-chiral rejection during aa-tRNA binding by elongation factor (Figure: 2C-E). Next, we superimposed the tRNA bound mammalian eEF-1A cryoEM structure (PDB id: 5LZS) with bacterial structure to understand the structural differences in terms of tRNA binding and found that elongation factor binds tRNA in a similar way (Figure: 3C-D). Modelling of D-alanine in the amino acid binding site of eEF-1A shows serious clashes with main chain atoms, indicating a general theme of D-chiral rejection during aa-tRNA binding by elongation factor (Figure: 2F; 3E). Structure-based sequence alignment of elongation factor from bacteria, archaea and eukaryotes (both plants and mammals) shows a strict conservation of amino acid binding site (Figure: 2G). Minor differences near the amino acid side chain binding site (as indicated in Wolfson and Knight, FEBS Letters, 2005) might induce the amino acid specific binding differences (Figure: 3F). However, those changes will have no influence when the D-chiral amino acid enters the pocket, as the whole side chain would clash with the active site. 
      We have now included this sequence and structural conservation analysis in our revised manuscript (in text: line no 107-129; Figure: 2 and S2). Overall, our structural analysis suggests a conserved mode of aa-tRNA selection by elongation factor across life forms and therefore, our biochemical results with bacterial elongation factor Tu (EF-Tu) reflect the protective role of elongation factor in general across species.

      6) Results (line 89): Figure: 1C-G (not B-G).

      As correctly pointed out by the reviewer(s), we have changed it to Figure: 1C-G.

      7) Results (line 91): Figure: S1B-G (not C-G).

      We wish to clarify that this is correct.

      8) Line 97: change, "propionaldehyde" to "propionaldehyde (Figure: 1H)".

      As per the reviewer’s suggestion, we have now changed, "propionaldehyde" to "propionaldehyde (Figure: 1H)".

      9) Line 124: The statement, "DTD2 cleaved all modified D-aa-tRNAs at 50 pM to 500 nM range (Figure: 2A_D)" is not consistent with the data presented. For example, Figure 2D does not show any significant cleavage. Figure S2A-B also does not show cleavage.

      We thank the reviewers for pointing this out. We have changed the sentence to “DTD2 cleaved majority of aldehyde modified D-aa-tRNAs at 50 pM to 500 nM range".

      10) Line 131: Cleavage observed in Fig. S2E is inconsistent with the generalized statement on DTD1.

      We wish to clarify that the minimal activity seen in Fig. S2E is inconsistent with the general trend of DTD1’s biochemical activity on modified D-aa-tRNAs. In addition, we have earlier shown that D-aa-tRNA fits snugly in the active site of DTD1 (Ahmad S. et al., eLife, 2013), whereas the modified D-aa-tRNA cannot bind due to space constraints in the active site of DTD1 (Mazeed M. et al., Science Advances, 2021). Therefore, this minimal activity could be the result of a technical error during this biochemical experiment and could be considered as no activity.

      11) Lines 129-133: Citations of many figure panels particularly in the supplementary figures are inconsistent with generalized statements. This section requires a major rewrite or rearrangement of the figure panels (in case the statements are correct).

      We thank the reviewers for bringing forth this point and we have accordingly modified the statement into “DTD2 from archaea recycled short chain aldehyde-modified D-aa-tRNA adducts as expected (Figure: 3E-G) and, like DTD2 from plants, it did not act on aldehyde-modified D-aa-tRNAs longer than three chains (Figure: 3H; S3C-D; S4G-L)”.

      12) Line 142: I don't believe one can call PTH a proofreader. Its job is to recycle tRNAs from peptidyl-tRNAs.

      We thank the reviewers for raising this very important point. This is now corrected.

      13). Line 145: change, "DTD2 can exert its protection for" to "DTD2 may exert protection from".

      As per the reviewer’s suggestion, we have now changed "DTD2 can exert its protection for" to "DTD2 may exert protection from".

      14) Line 148: change, "a homozygous line (Figure: 3A) and checked for" to "homozygous lines (Figure: 3A) and checked them for".

      As per the reviewer’s suggestion, we have now changed, "a homozygous line (Figure: 3A) and checked for" to "homozygous lines (Figure: 3A) and checked them for".

      15) Line 148: Change, the sentence beginning with dtd2 as follows. Similar to earlier results30-32, dtd2-/- (dtd2 hereafter) plants were susceptible to ethanol (Figure: S4A) confirming the non-functionality DTD2 gene in dtd2 plants.

      As per the reviewer’s suggestion, we have now changed the sentence accordingly.

      16) Line 161: change, "linked" to "associated".

      As per the reviewer’s suggestion, we have now changed "linked" to "associated".

      17) Lines 173-176: It would be interesting to know how well the DTD2 OE lines do in comparison to the other known transgenic lines developed with, for example, ADH, ALDH, or AOX lines. Any ideas would help appreciate the observation with DTD2 OE lines!

      We greatly appreciate the reviewer’s suggestion. We have not done any comparison experiment with any transgenic lines so far. However, it can be potentially done in further studies with DTD2 OE lines.

      18) Line 194: change, "necessary" with "present".

      As per the reviewer’s suggestion, we have now changed "necessary" to "present".

      19) Line 210: what is meant by 'huge'? Would 'significant' sound better?

      As per the reviewer’s suggestion, we have now changed "huge" to "significant".

      20) Lines 239-243: This needs to be rephrased. Isn't alpha carbonyl of the carboxyl group that makes ester bond with the -CCA end of the tRNA required for DTD2 activity as well? Are you referring to the carbonyl group in the moiety that modifies the alpha-amino group? Please clarify. The cited reference (no. 64) of Atherly does not talk about it.

      We regret the confusing statement. To clarify, we were referring to the carbonyl carbon of the modification following the amino group of the amino acid in aa-tRNAs (Figure: 5). We have now included a figure (Figure: S4Q of the revised manuscript) comparing the position of the carbonyl group for better clarity. The cited reference, Atherly A. G., Nature, 1978, shows the activity of PTH on peptidyl-tRNAs, which possess a carbonyl carbon at the alpha position following the amino group of the amino acid in L-aa-tRNAs.

      Author response image 5.

      Figure showing the difference in the position of carbonyl carbon in acetonyl and acetyl modification on aa-tRNAs.

      21) Line 261: thrive (not thrives).

      As per the reviewer’s suggestion, we have now changed it to thrive.

      22) In Fig3A: second last lane, it should be dtd-/-:: AtDTDH150A (not dtd-/-:: AtDTDH150A).

      We thank the reviewers for pointing this out; we have corrected it.

      23). Materials and methods: Please clarify which experiments used tRNAPhe, tRNAAla, PheRS, etc. Also, please carefully check all other details provided in this section.

      As per the reviewer’s suggestion, we would like to provide a table below explaining the use of different substrates as well as enzymes in our experiments.

      Author response table 1.

      24) Figure legends (many places): p values higher than 0.05 (not less than) are denoted as ns.

      We thank the reviewers for pointing this out. We have corrected it.

      Reviewer #2 (Recommendations For The Authors):

      I have only minor comments for the authors:

      Title: I would replace "Archeal origin translation proofreader" with " A translation proofreader of archeal origin"

      As per the reviewer’s suggestion, we have now changed the title.

      Abstract: This section could benefit from some rewriting. For instance, at the outset, the initial logical connection between the first and second sentences of the abstract is somewhat unclear. At the very least, I would suggest swapping their order to enhance the narrative flow. Later in the text, the term "chiral proofreading systems" is introduced; however, it is only in a subsequent sentence that these systems are explained to be responsible for removing stable D-aminoacyl adducts from tRNA. Providing an immediate explanation of these systems would enhance the reader's comprehension. The authors switch from the past participle tense to the present tense towards the end of the text. I would recommend that they choose one tense for consistency. In the final sentence, I would suggest toning down the statement and replacing "can be used" with "could be explored." (https://www.nature.com/articles/d41586-023-02895-w). The same comment applies to the introduction, line 79.

      As per the reviewer’s suggestion, we have now changed the abstract appropriately.

      General note: Conventionally, the use of italics is reserved for the specific species "Arabidopsis thaliana," while the broader genus "Arabidopsis" is not italicized.

      We thank the reviewer for this pertinent suggestion. This is now corrected in the revised version of our manuscript.

      General note: I would advise the authors against employing bold characters in conjunction with colors in the figures.

      We thank the reviewer for this suggestion. We have now changed it appropriately in the revised version of our manuscript.

      Figure 1A: I recommend including the concentrations of the various aldehydes used in the experiment within the figure legend. While this information is available in the materials and methods section, it would be beneficial to have it readily accessible when analyzing the figure.

      As per the reviewer’s suggestion, we have now included the concentrations in figure legend.

      Figure 1I, J: some error bars are invisible.

      We thank the reviewers for pointing this out; we have corrected it.

      Figure 2M: The table could be simplified by removing aldehydes for which it was not feasible to demonstrate activity. The letter "M" within the cell labeled "aldehydes" appears to be a typographical error, presumably indicating the figure panel.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Figure 3: For consistency with the other panels in the figure, I recommend including an additional panel to display the graph depicting the impact of MG on germination.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Figure 4: Considering that only one plant is presented, it would be beneficial to visualize the data distribution for the other plants used in this experiment, similar to what the authors have done in panel A of the same figure.

      We thank the reviewer for bringing up this point. We wish to clarify that we performed the experiment with multiple plants. However, for the sake of clarity, we have included representative images. Moreover, we have included the quantitative data for multiple plants in Figure 3C-G.

      Figure 5E: The authors may consider presenting a chronological order of events as they believe they occurred during evolution.

      We thank the reviewer for the suggestion. However, it is very difficult to pinpoint the chronology of the events. Aldehydes are lethal to living systems because of their hyperreactivity, and such systems would have required immediate solutions to survive. Therefore, we think that both the problem (toxic aldehyde production) and its solution (expansion of the aldehyde-metabolising repertoire and recruitment of archaeal DTD2) might have appeared simultaneously.

      Figure 6: The model appears somewhat crowded, which may affect its clarity and ease of interpretation. The authors might also consider dividing the legend sentence into two separate sentences for better readability.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Line 149: I recommend explicitly stating that ethanol metabolism produces acetaldehyde. This clarification will help the general reader immediately understand why DTD2 mutant plants are sensitive to ethanol.

      As per the reviewer’s suggestion, we have now changed this appropriately.

      Line 289: there is a typographical error, "promotor" instead of the correct term "promoter.".

      We thank the referee for pointing this out; we have now corrected it.

      Figure S5: The root morphology of DTD2 OE plants appears to exhibit some differences compared to the WT, even in the absence of a high concentration of aldehydes. It would be valuable if the authors could comment on these observed differences unless they have already done so, and I may have overlooked it.

      We thank the referee for pointing this out. We do see minor differences in root morphology, but they are more pronounced with aldehyde treatments. The reason for this phenotype remains elusive, and we are trying to understand the role of DTD2 in root development in detail in further studies.

      Some Curiosity Questions (not mandatory for manuscript acceptance):

      1) Do DTD2 OE plants display an earlier flowering phenotype than wild-type Col-0?

      We have not done detailed phenotyping of DTD2 OE plants. However, our preliminary observations suggest no differences in flowering pattern as compared to wild-type Col-0.

      2) What is the current understanding of the endogenous regulation of DTD2?

      We have not done detailed analysis to understand the endogenous regulation of DTD2.

      3) Could the protective phenotype of DTD2 OE plants in the presence of aldehydes be attributed to additional functions of this enzyme beyond the removal of stable D-aminoacyl adducts from tRNAs?

      Based on the available evidence regarding the biochemical activity and in vivo phenotypes of DTD2, it appears that removal of stable D-aminoacyl adducts from tRNA is key for the protective phenotype of DTD2 OE.

      A Suggestion for Future Research (not required for manuscript acceptance):

      The authors could explore the possibility of overexpressing DTD2 in pyruvate decarboxylase transgenic plants and assess whether this strategy enhances flood tolerance without incurring a growth penalty under normal growth conditions.

      We thank the referee for this interesting suggestion for future research. We will surely keep this in mind while exploring the flood tolerance potential of DTD2 OE plants.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major change:

      All three of our reviewers raised the possibility that changes in movement during the time spent at the center ports could have contributed to changes in SWR rates. Analyses to address this possibility, based on the examination of trials with high and low speeds, were originally included in the supplement but we did not sufficiently highlight and explain these results. To rectify this, we have moved these results into a new main Figure 3 and now include a paragraph describing our interpretation of these results (page 9). We also include a more detailed description of the subjects’ behavior during port times – namely, that all subjects must remain quite stationary while at the reward ports in order to keep their nose in a specific position which keeps the port triggered. As a result, all subjects maintain head speeds well below our typical speed threshold for immobility while at the ports. This leads us to predict that any feedback based on periods of immobility alone (as requested by Reviewer 3) would show results very similar to our Control cohort and would not alter SWR rates seen during neurofeedback trials.

      Minor changes:

      (1) Reviewer 1 observed that our reported statistics appeared to be missing an interaction term showing that neurofeedback differentially affected the SWR rate/count pre- and post-reward. We apologize for a lack of clarity here: we fit pre- and post-reward times with separate linear mixed effects models, so this interaction term is neither expected nor defined in our model. We have added a sentence clarifying this aspect of our LME approach in the Methods section: “Each model is designed to compare samples from all trials of the control group to samples from neurofeedback and delay trials from the neurofeedback cohort for a specific time period (for instance, pre-reward-delivery at the center ports).” Combining both times in the same model would require adding an additional hierarchical level in order to preserve the pairing of the pre- and post-reward time period for each trial, which we are concerned would complicate the formulation and interpretation of the model. However, the reviewer raises a good point that the comparison between these two time periods reveals an additional difference between the trial types: SWR rate remains relatively consistent between the pre- and post-reward periods during neurofeedback trials, while delay and control trials show a clear increase in SWR rate between the two time periods. To visualize and quantify this effect, we calculated the difference in SWR rates between the two time periods and now include this plot as Supplementary Figure 2F, which is referenced on page 8 of the main text.
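
      As a rough illustration of the within-trial comparison described above, the following minimal sketch computes the post-reward minus pre-reward SWR rate for each trial and averages it per trial type. The function name, the tuple layout and all numeric values are hypothetical and chosen only for illustration; they are not the authors' code or data, and the sketch does not reproduce the linear mixed effects models.

```python
from collections import defaultdict
from statistics import mean


def swr_rate_change(trials):
    """Return the mean (post - pre) SWR rate for each trial type.

    `trials` is an iterable of (trial_type, pre_rate_hz, post_rate_hz) tuples.
    The difference is taken within each trial, so the pre/post pairing
    is preserved before averaging.
    """
    diffs = defaultdict(list)
    for trial_type, pre_rate, post_rate in trials:
        diffs[trial_type].append(post_rate - pre_rate)
    return {trial_type: mean(values) for trial_type, values in diffs.items()}


# Hypothetical values in Hz, made up for illustration (not study data).
example_trials = [
    ("neurofeedback", 1.0, 1.0),
    ("neurofeedback", 2.0, 2.5),
    ("delay", 0.5, 1.5),
    ("control", 0.25, 1.25),
]
changes = swr_rate_change(example_trials)
```

      Taking the difference per trial before averaging keeps the two time periods paired without adding a hierarchical level to a model, which is the simplification the response argues for.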

      (2) Reviewer 2 found our original title, “Neurofeedback training can modulate task-relevant memory replay in rats”, to be misleading and suggestive of a manipulation of memory content. We completely agree with the reviewer that our manipulation does not alter replay content, so, to be more specific and accurate, we have changed our title to their suggestion: “Neurofeedback training can modulate task-relevant memory replay rate in rats”.

      (3) Reviewer 2 also requested that we include analyses quantifying baseline SWR rates for each of our experimental subjects. Although we initially considered reporting our results in measures of change relative to each individual animal’s baseline, we decided against this approach for several reasons.

      First, it is important to clarify that we extensively train the animals on the task prior to implant, so we do not have access to a truly naïve, pre-behavior baseline SWR rate for any of our subjects. However, because the pre-implant training is conducted consistently between our neurofeedback and our control cohort, we have no reason to believe that the behavioral training prior to implant would introduce differences in SWR rate between the cohorts. Indeed, we find no difference in post-reward SWR rate (or SWR rate at the home well) when we quantify the first 250 trials of post-implant behavior for each subject (see panel A below). Note that we cannot compare the pre-reward SWR rate at this point, because it is influenced by the task structure which guarantees at least one SWR in each neurofeedback trial pre-reward.

      Further, we do find that SWR rate is quite consistent over many days of task performance in the control cohort (shown for the post-reward period in panel B below). This suggests that comparing the post-neurofeedback training SWR rates for the neurofeedback cohort to SWR rates throughout the training for the control cohort is not likely to be confounded by differing amounts of training experience. This is supported by our analyses in Figure 2, which show no differences in SWR rate between the two cohorts when considering pre- and post-reward times combined.

      Author response image 1.

      (A) SWR rate calculated during the post-reward period at the center port for the first 250 trials of post-implant behavior for each animal. Trials of all types are included (i.e., both neurofeedback trials and delay trials for the manipulation cohort). Groupwise comparison p=0.192. (B) Mean SWR rate during the post-reward period at the center port for each behavioral training epoch shows no systematic change over time across subjects within the control cohort.

      Finally, within each cohort, we found the overall SWR rates to be quite consistent across animals. If each subject in the neurofeedback cohort had shown dramatically different SWR rates at the beginning of neurofeedback training, we would have needed to express the effect of neurofeedback training relative to baseline for each animal. However, since the ranges of SWR rates were highly comparable, we felt that expressing our results as simple SWR rates, rather than as measures of relative change, was more accessible and made it easier to place our results within the context of the literature. Within the neurofeedback cohort, comparing neurofeedback to delay trials is inherently matched for baseline SWR rate since these comparisons are made within the same animal.

      (4) Finally, Reviewer 2 raises the possibility that older animals or those with cognitive deficits might respond to neurofeedback differently. We entirely agree with this possibility, and note this in our Discussion section: “Since the neurofeedback paradigm depends on the occurrence of at least a low endogenous rate of SWR occurrence, it would be important to implement neurofeedback training as a relatively early interventional strategy prior to extensive neurodegeneration, and training may take longer in aged or impaired subjects.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Yue et al. re-processed publicly available DNA methylation data (published in 2012 and 2017 from the Meissner lab) from pre- and post-implantation mouse embryos. Against the global wave of genome-wide reduction of DNA methylation occurring during pre-implantation development, they detected a slight increase (~1% on average) of DNA methylation at gene promoter regions during the transition from 8-cell to blastocyst stage. They claim that many such promoters are located in the X chromosome. Subsequently, they knocked down Dnmt3b (presumably because of its upregulation during the transition from the 8-cell to blastocyst stage) and detected the aberrant patterning of H3K27me3 in the mutant female embryos. Based on this observation, they claim that imprinted X-chromosome inactivation is impaired in the Dnmt3b-Kd pre-implantation embryos. Finally, they propose a model where such an increase of DNA methylation together with H3K27me3 regulates imprinted X-chromosome inactivation in the pre-implantation embryos. While their observation is of potential interest, the current version of the work fails to provide enough evidence to support their conclusions. Below are suggestions and comments on the manuscript.

      Major issues:

      (1) Sex of the embryos of the genome-wide bisulfite-sequencing data

      The authors re-analyzed publicly available genome-wide DNA methylation data from the Meissner lab published in 2012 and 2017. The former used reduced representation bisulfite sequencing (RRBS) and the latter used whole-genome bisulfite sequencing (WGBS). Based mainly on the RRBS data, Yue et al. detected de novo DNA methylated promoters during the transition from 8-cell to blastocyst against the global wave of genome-wide DNA demethylation. They claim that such promoter regions are enriched at the "inactive" X chromosome. However, it would be difficult to discuss DNA methylation at inactive X-chromosomes as the RRBS data were derived from a mixture of male and female embryos. It would also be notable that the increase of DNA methylation at these promoter regions is ~1% on average. Such a slight increase in DNA methylation during pre-implantation development could also be due to the developmental variations between the embryos or between the sexes of embryos.

      Thanks so much for your insightful comments. Whether de novo DNA methylation occurs in a sex-dimorphic manner would be of significance for our study. Based on your comments, we have added a reanalysis of publicly available single-cell multi-omics sequencing (COOL-seq) data from mouse early embryos (Guo et al., 2017). The results showed that both male and female embryonic cells gain DNA methylation during the transition from the 8-cell stage to ICM (Figure 1—figure supplement 1C-D; Lines 112-115 in the revised manuscript).

      With regard to the increase in the promoter regions, many previous studies have revealed that promoters and overlapping CGI regions, especially high-CpG promoters, consistently show low levels of DNA methylation (Auclair et al., 2014; Borgel et al., 2010; Dahlet et al., 2020). These relatively low basal levels make the increase seem slight. Thus, we have added relevant statements to clarify this point and have rewritten the corresponding sentences (Lines 116-118, 125-127 in the revised manuscript).

      In addition, using the single cell COOL-seq data, we also specifically reanalyzed the DNA methylation changes on the X chromosome in female embryos. The X chromosome showed a more notable increase than that on autosomes, and the female X chromosome showed a higher DNA methylation level than that of the male (Figure 3—figure supplement 2A-B; Lines 203-206 in the revised manuscript).

      Thanks again for your insightful and constructive comments that significantly strengthen our evidence. We have added these results in the revised manuscript.

      (2) Imprinted X-chromosome inactivation and evaluation of H3K27me3 (related to Figures 2C, D; 3F; Figure2-supplement 2 F, G; Figure3-supplement 3G)

      Based on the slight change in the H3K27me3 signals in the Dnmt3b-Kd blastocysts, the authors claim that imprinted X-chromosome inactivation is impaired in the mutant embryo. It would be not easy to reach this conclusion from such a rough analysis of H3K27me3 presented in Figure 2C, D. Rigorous quantification/evaluation of the H3K27me3 signals in the Dnmt3b-Kd embryos should be considered. Additional evidence for the impairment of H3K27me3 in the mutant embryos should also be provided (expression of a subset of X-linked genes by RNA-FISH or RT-PCR etc.). Though technically challenging, high-resolution genome-wide approach such as ChIP-seq of H3K27me3 in the Dnmt3b-kd female embryos (with traceable SNPs between maternal and paternal X chromosome to distinguish inactive and active X-chromosome) could more precisely evaluate regions that lose H3K27me3 in the X-chromosome (de novo DNA methylated promoters from 8-cell to blastocyst, for example).

      Thanks so much for your insightful comments that make our results more convincing. The H3K27me3 domain is a classic marker for the establishment of XCI, which is achieved through X chromosome-wide heterochromatinization and transcriptional repression (Chow and Heard, 2009; Heard et al., 2004; Huynh and Lee, 2005). Thus, in the present study, we performed immunostaining for H3K27me3 domains to evaluate the iXCI status in blastocysts, as previously reported (Fukuda et al., 2014; Gontan et al., 2018; Inoue et al., 2010; Tan et al., 2016). Based on your comments, we have added another statistical method to quantify the establishment of iXCI, i.e., the percentage of H3K27me3-positive and -negative cells among total trophoblast cells in female blastocysts with or without Dnmt3b knockdown. The result also indicated that Dnmt3b knockdown led to a significant loss of H3K27me3 domains from total trophoblast cells. Similarly, new data based on statistical analyses of total trophoblast cells have been added to the results for Dnmt3b knockout and 5-aza-dC (Figure 3F; Figure 3—figure supplement 3D, H in the revised manuscript).
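
      The per-blastocyst quantification described above can be sketched as follows. This is only an illustration of the arithmetic: the function name and the example calls are hypothetical and are not taken from the manuscript, and each trophoblast cell is assumed to have already been scored as H3K27me3-positive or -negative.

```python
from typing import Sequence


def percent_h3k27me3_positive(cell_calls: Sequence[bool]) -> float:
    """Percentage of trophoblast cells scored as carrying an H3K27me3 domain.

    cell_calls holds one boolean per scored trophoblast cell
    (True = H3K27me3 domain present). The scoring itself is assumed
    to have been done beforehand by manual or automated image analysis.
    """
    if not cell_calls:
        raise ValueError("no cells were scored for this blastocyst")
    return 100.0 * sum(cell_calls) / len(cell_calls)


# Hypothetical calls for a single female blastocyst (not real data).
example_calls = [True, True, False, True]
print(percent_h3k27me3_positive(example_calls))
```

      The percentage of negative cells is simply 100 minus this value, so reporting both adds no extra measurement.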

      To clarify the significance and reliability of detecting H3K27me3 domains, we have added a schematic diagram depicting the process of iXCI initiation and establishment, as well as the experimental design and workflows, to make our results easier to understand (Figure 3C in the revised manuscript).

      In addition, we agree that additional evidence will strengthen the conclusion. Thus, we have reanalyzed the RNA-seq and H3K27me3 ChIP-seq data from the extraembryonic ectoderm (ExE) of single E6.5 embryos that underwent Dnmt3a/3b knockout, because the preimplantation iXCI status is maintained in extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that the chromosome-wide loss of DNA methylation induced by Dnmt knockout led to a nearly complete loss of H3K27me3 on the paternal X chromosome (which is specifically inactivated in iXCI), along with a notable transcriptional upregulation across the chromosome. By contrast, these changes were not observed on the maternal X chromosome.

      We have added this result in the revised manuscript (Lines 253-261; Figure 3—figure supplement 4A in the revised manuscript).

      (3) Analysis of the developmental potential of Dnmt3b-kd embryos

      While the authors claim that Dnmt3b-mediated de novo DNA methylation plays an important role in imprinted X-chromosome inactivation, it remains unclear whether the analysis presented in Figure 4 is derived from "female" embryos. This analysis seemed confusing as the authors claim that de novo DNA methylation in the promoter regions during the transition from 8-cell to blastocyst regulates imprinted X-chromosome inactivation, but this should not happen in the male embryos. Was the impairment of embryonic proliferation and differentiation observed in both male and female embryos? Or is this specific to the female embryos? We think that the sex of the embryos would be critical for the analysis presented in Figure 4.

      Thanks so much for your constructive comments, which make our results smoother and clearer. Figure 4 mainly presents the developmental role of minor de novo methylation based on the integrated analysis of DNA methylation and gene expression dynamics from the 8-cell stage to the ICM. Because our data indicated that both male and female embryos undergo minor de novo methylation (Figure 1—figure supplement 1C-D in the revised manuscript), this section focused on genome-wide and general changes rather than on sexually dimorphic consequences.

      To avoid the possible confusion, we have reorganized the RESULTS AND DISCUSSION section and presented this section as Figure 2 in the revised manuscript, before the chromosomal distribution analysis and subsequent detection relevant to iXCI.

      Reviewer #2 (Public Review):

      Summary:

      Here, Yue et al. set out to determine if the low DNMT3B expression that is observed prior to de novo DNA methylation (before the blastocyst stage) has a function. Re-analyzing existing DNA methylation data from Smith et al. (2012) they find a small DNA methylation gain over a subset of promoters and gene bodies, occurring between the 8-cell and blastocyst stages, and refer to this as "minor de novo DNA methylation". They attempt to assess the relevance/functionality of this minor DNA methylation gain, and report reduced H3K27me3 in Dnmt3b knockdown (KD) trophoblast cells that normally undergo imprinted X-chromosome inactivation (iXCI) before the blastocyst stage. In addition, they assess the proliferation, differentiation, metabolic function, implantation rate, and live birth rate of Dnmt3b KD blastocysts.

      Strengths:

      Working with early embryos is technically demanding, making the well-designed experiments from this manuscript useful to the epigenetics community. Particularly, the DNMT3B expression and 5-mC staining at different embryonic stages.

      Thanks for your positive evaluation. We have revised the manuscript based on your comments, and the items that need to be addressed in detail are explained in the point-by-point response to each comment.

      Weaknesses:

      - Throughout the manuscript, please represent DNA methylation changes as delta DNA methylation instead of fold change.

      Thanks so much for your constructive comments. We have represented DNA methylation changes as “ΔDNA methylation” (Figure 2—figure supplement 1A; Figure 3—figure supplement 1A; Figure 3—figure supplement 3I in the revised manuscript).
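
      To illustrate why the reviewer's requested representation matters for lowly methylated regions, the following minimal sketch contrasts the two measures. The function names and the example values are hypothetical and are not taken from the manuscript.

```python
def delta_methylation(before: float, after: float) -> float:
    """Absolute change in methylation level, in percentage points."""
    return after - before


def fold_change(before: float, after: float) -> float:
    """Relative change; undefined when the baseline level is zero."""
    if before == 0:
        raise ValueError("fold change is undefined for a zero baseline")
    return after / before


# Hypothetical methylation levels (percent) before and after the transition.
examples = {
    "low-baseline promoter": (3.0, 6.0),
    "high-baseline region": (40.0, 43.0),
}

for name, (before, after) in examples.items():
    print(name, delta_methylation(before, after), fold_change(before, after))
```

      Both hypothetical regions gain three percentage points, yet the fold change is far larger for the low-baseline promoter, which is why an absolute difference is less likely to overstate small gains.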

      - Detailed methods on the re-analysis of the DNA methylation data from Smith et al. 2012 are missing from the materials and methods section. Was a minimum coverage threshold used?

      Thanks so much for your reminder. We have added relevant statements and provided the details of the coverage criteria in the Bioinformatics analysis subsection of the Materials and methods section as follows: RRBS data of mouse embryos (2-cell embryos, 4-cell embryos, 8-cell embryos, ICM, and E6.5 embryos) were downloaded from the published article by Smith et al. (Smith et al., 2012) (accession number: GSE34864). The methylation level was calculated as the number of “methylated” reads (reported as C) divided by the total number of “methylated” and “unmethylated” reads (reported as C or T). The genomic region information was downloaded from the mm9 Repeat Masker. As described in the published article, promoters were defined as 1 kb up- and downstream of the TSS and classified into high-density CpG promoters (HCP), intermediate-density CpG promoters (ICP) and low-density CpG promoters (LCP). Only CpG sites with at least fivefold coverage were included in the methylation analysis. We have added the relevant information to the revised manuscript (Lines 462-470 in the revised manuscript).
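
      The calculation described above can be sketched as follows. The per-CpG formula and the fivefold coverage threshold come from the response; the function names, the input layout, and the use of an unweighted mean to summarize a region are assumptions made for illustration and may differ from the authors' pipeline.

```python
from typing import Iterable, Optional, Tuple

MIN_COVERAGE = 5  # "at least fivefold coverage", as stated in the response


def cpg_methylation(methylated: int, unmethylated: int,
                    min_coverage: int = MIN_COVERAGE) -> Optional[float]:
    """Methylation level of one CpG: C reads / (C reads + T reads).

    Returns None when the site does not reach the coverage threshold.
    """
    total = methylated + unmethylated
    if total < min_coverage:
        return None
    return methylated / total


def region_methylation(sites: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Mean level over sufficiently covered CpGs in a region.

    Assumption: an unweighted mean across CpGs; the response does not
    state how per-site levels were aggregated per promoter.
    """
    levels = [cpg_methylation(m, u) for m, u in sites]
    covered = [level for level in levels if level is not None]
    if not covered:
        return None
    return sum(covered) / len(covered)


# Hypothetical (methylated, unmethylated) read counts for three CpGs.
promoter_sites = [(3, 7), (2, 2), (5, 5)]
print(region_methylation(promoter_sites))
```

      In this hypothetical promoter, the second CpG has only four reads and is therefore excluded before averaging.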

      - Detailed methods on the establishment and validation of Dnmt3b KO blastocysts and 5-aza-dC treated blastocysts are missing (related to Figure 2).

      Thanks so much for your detailed reminder. In the present study, we used a well-established Dnmt3b-deficient mouse model (Okano et al., 1999) to validate the role of minor de novo DNA methylation in iXCI establishment. Heterozygous Dnmt3b<sup>+/-</sup> mice, which carry one mutant Dnmt3b allele, were obtained from the Mutant Mouse Resource & Research Centers (MMRRC, NIH). Homozygous embryos were obtained by intercrossing Dnmt3b<sup>+/-</sup> male and female mice. Genotyping of the collected embryos was performed by PCR using primers designed according to the gene targeting strategy, following the MMRRC genotyping protocol (https://www.med.unc.edu/mmrrc/genotyping-protocols/mmrrc-center-protocol-29886/). We have provided the detailed methods in the revised manuscript (Lines 350-354; 391-393 in the revised manuscript). In addition, we added a schematic diagram depicting the processes of embryo collection and detection (Figure 3—figure supplement 3A in the revised manuscript).

      Similarly, we have provided relevant details of 5-aza-dC supplementation in the revised manuscript (Lines 412-415 in the revised manuscript) and added a schematic diagram depicting the details of experimental design and processes (Figure 3—figure supplement 3E in the revised manuscript).

      - Detailed methods on the re-analysis of the ChIPseq data from Liu et al. 2016 are missing from the materials and methods section.

      Thank you for pointing this out. The bigwig files of H3K27me3 ChIP-seq data were downloaded from the published article by Liu et al. (Liu et al., 2016) (accession number: GSE73952). These signal tracks were generated using the MACS2 (v2.0.10.20131216) pileup function and normalized to 1 million reads for visualization, as described in the original publication. We have added the relevant information to the MATERIALS AND METHODS section in the revised manuscript (Lines 474-479 in the revised manuscript).
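
      The normalization to 1 million reads mentioned above amounts to a simple scaling of the pileup signal by sequencing depth. The sketch below shows only that arithmetic; it is not the MACS2 implementation, and the function name and example values are hypothetical.

```python
from typing import List, Sequence


def scale_to_one_million(pileup: Sequence[float], total_reads: int) -> List[float]:
    """Scale raw pileup values so that libraries of different depth are comparable.

    Each value is multiplied by 1,000,000 / total_reads, i.e. the signal
    is expressed per million sequenced reads.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    factor = 1_000_000 / total_reads
    return [value * factor for value in pileup]


# Hypothetical pileup over four bins from a library of 2 million reads.
print(scale_to_one_million([2, 4, 0, 10], total_reads=2_000_000))
```

      With this scaling, the same raw pileup from a library twice as deep yields half the normalized signal, which keeps tracks from different stages visually comparable.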

      - Some of the data represented in bar graphs does not look convincing/significant. Maybe this data can be better represented differently, such as in box plots or violin plots, which would better represent the data.

      Thanks so much for your comments, which improve the presentation of our results. The relevant results have been changed to box plots in the revised manuscript (Figure 3E; Figure 3—figure supplement 3C; Figure 3—figure supplement 3G in the revised manuscript). In addition, to strengthen our evidence, we have added an alternative statistical method to quantify the establishment of iXCI, i.e., the percentage of H3K27me3-positive and -negative cells among total trophoblast cells in female blastocysts with or without Dnmt3b knockdown (Figure 3F; Figure 3—figure supplement 3D, H in the revised manuscript).

      - The relevance and rationale for experiments using 5-aza-dC treatment is unclear.

      Thanks so much for reminding us to make our results more informative and convincing. 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and thus has been frequently used to study the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005).

      In our study, to validate the function of minor de novo DNA methylation in iXCI, we took advantage of 5-aza-dC-induced DNMT inhibition. Although 5-aza-dC inhibits all DNMTs, it allows us to treat embryos transiently and specifically during the window of minor de novo DNA methylation (from the 8-cell to blastocyst stage). We have added these statements, as well as a schematic diagram depicting the experimental design, to the revised manuscript to make the rationale of our experiments clearer and easier to understand (Lines 183-188; Figure 3—figure supplement 3E in the revised manuscript).

      References

      Auclair, G., Guibert, S., Bender, A. and Weber, M. (2014). Ontogeny of CpG island methylation and specificity of DNMT3 methyltransferases during embryonic development in the mouse. Genome Biol. 15, 545.

      Borgel, J., Guibert, S., Li, Y., Chiba, H., Schubeler, D., Sasaki, H., Forne, T. and Weber, M. (2010). Targets and dynamics of promoter DNA methylation during early mouse development. Nat. Genet. 42, 1093-1100.

      Chen, Z., Yin, Q., Inoue, A., Zhang, C. and Zhang, Y. (2019). Allelic H3K27me3 to allelic DNA methylation switch maintains noncanonical imprinting in extraembryonic cells. Sci Adv 5, eaay7246.

      Chow, J. and Heard, E. (2009). X inactivation and the complexities of silencing a sex chromosome. Curr. Opin. Cell Biol. 21, 359-366.

      Dahlet, T., Argueso Lleida, A., Al Adhami, H., Dumas, M., Bender, A., Ngondo, R. P., Tanguy, M., Vallet, J., Auclair, G., Bardet, A. F., et al. (2020). Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat Commun 11, 3153.

      Fukuda, A., Tomikawa, J., Miura, T., Hata, K., Nakabayashi, K., Eggan, K., Akutsu, H. and Umezawa, A. (2014). The role of maternal-specific H3K9me3 modification in establishing imprinted X-chromosome inactivation and embryogenesis in mice. Nat Commun 5, 5464.

      Galupa, R. and Heard, E. (2015). X-chromosome inactivation: new insights into cis and trans regulation. Curr. Opin. Genet. Dev. 31, 57-66.

      Gontan, C., Mira-Bontenbal, H., Magaraki, A., Dupont, C., Barakat, T. S., Rentmeester, E., Demmers, J. and Gribnau, J. (2018). REX1 is the critical target of RNF12 in imprinted X chromosome inactivation in mice. Nat Commun 9, 4752.

      Guo, F., Li, L., Li, J., Wu, X., Hu, B., Zhu, P., Wen, L. and Tang, F. (2017). Single-cell multi-omics sequencing of mouse early embryos and embryonic stem cells. Cell Res. 27, 967-988.

      Heard, E., Chaumeil, J., Masui, O. and Okamoto, I. (2004). Mammalian X-chromosome inactivation: an epigenetics paradigm. Cold Spring Harb. Symp. Quant. Biol. 69, 89-102.

      Huynh, K. D. and Lee, J. T. (2005). X-chromosome inactivation: a hypothesis linking ontogeny and phylogeny. Nat. Rev. Genet. 6, 410-418.

      Inoue, K., Kohda, T., Sugimoto, M., Sado, T., Ogonuki, N., Matoba, S., Shiura, H., Ikeda, R., Mochida, K., Fujii, T., et al. (2010). Impeding Xist expression from the active X chromosome improves mouse somatic cell nuclear transfer. Science 330, 496-499.

      Liu, X. Y., Wang, C. F., Liu, W. Q., Li, J. Y., Li, C., Kou, X. C., Chen, J. Y., Zhao, Y. H., Gao, H. B., Wang, H., et al. (2016). Distinct features of H3K4me3 and H3K27me3 chromatin domains in pre-implantation embryos. Nature 537, 558-562.

      Maslov, A. Y., Lee, M., Gundry, M., Gravina, S., Strogonova, N., Tazearslan, C., Bendebury, A., Suh, Y. and Vijg, J. (2012). 5-aza-2'-deoxycytidine-induced genome rearrangements are mediated by DNMT1. Oncogene 31, 5172-5179.

      Oka, M., Meacham, A. M., Hamazaki, T., Rodic, N., Chang, L. J. and Terada, N. (2005). De novo DNA methyltransferases Dnmt3a and Dnmt3b primarily mediate the cytotoxic effect of 5-aza-2'-deoxycytidine. Oncogene 24, 3091-3099.

      Okano, M., Bell, D. W., Haber, D. A. and Li, E. (1999). DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247-257.

      Schulz, E. G. and Heard, E. (2013). Role and control of X chromosome dosage in mammalian development. Curr. Opin. Genet. Dev. 23, 109-115.

      Smith, Z. D., Chan, M. M., Mikkelsen, T. S., Gu, H. C., Gnirke, A., Regev, A. and Meissner, A. (2012). A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 484, 339-344.

      Tan, K., An, L., Miao, K., Ren, L., Hou, Z., Tao, L., Zhang, Z., Wang, X., Xia, W., Liu, J., et al. (2016). Impaired imprinted X chromosome inactivation is responsible for the skewed sex ratio following in vitro fertilization. Proc. Natl. Acad. Sci. U. S. A. 113, 3197-3202.

      Reviewer #1 (Recommendations For The Authors):

      Title

      It would be hard to understand what "co"-regulates means. Does this mean DNA methylation and H3K27me3 co-regulate imprinted X-chromosome inactivation? If so, the title can be reworded.

      Thanks for your insightful comments. The title has been changed to “A wave of minor de novo DNA methylation initiates in mouse 8-cell embryos and co-regulates imprinted X-chromosome inactivation with H3K27me3” (Line 2 in the revised manuscript).

      Text

      (1) As DNA methylation analysis is a primary part of this study, how they processed DNA methylation data can be added to the "Bioinformatics analysis" in the MATERIALS AND METHODS section.

      Thanks for your kind reminder. We have added relevant information in the Materials and methods section in the revised manuscript (Lines 462-474 in the revised manuscript).

      (2) It seems that recent literature has not been cited in the manuscript. Specifically, none of the papers after 2018 were cited. Recent relevant papers should also be cited throughout the manuscript.

      Thanks so much for your reminder. We have added more recent literature to update the relevant information, such as the evidence supporting the causal role between DNA methylation and XCI (Lines 225-228, 264-265 in the revised manuscript); the concurrent enrichment of DNA methylation and H3K27me3 in genes subject to XCI (Lines 301-303 in the revised manuscript); the dominant role of de novo methylation in X chromosome (Lines 253-256 in the revised manuscript), etc.

      (3) Line 56: The first report that describes the dynamics of DNMT3B expression in pre-implantation embryonic development (Hirasawa et al., 2007) is missing. This paper should be cited.

      We apologize for this oversight. We have added the relevant references and rewritten the sentence in the revised manuscript (Lines 56-57 in the revised manuscript). We believe you are referring to the report by Hirasawa et al. (2008), which presented the expression and subcellular localization of Dnmt3a and Dnmt3b in mouse oocytes and preimplantation embryos.

      (4) Line 98: It would be good to mention that the data were derived from reduced representation bisulfite sequencing as the authors used whole-genome bisulfite sequencing data from the same research group as well.

      Thanks for your kind reminder. As you suggested, we have added a description in the revised manuscript to emphasize that these data were derived from reduced representation bisulfite sequencing, whereas the other data were derived from whole-genome bisulfite sequencing (Lines 98-99, 111 in the revised manuscript).

      (5) Line 101: We first... "the preferential target of DNMT3B (Auclair et al., 2014; Borgel et al., 2010)". More recent literature (Baubec et al., 2016, Duymich et al., 2016, for example) showed that the preferential target of DNMT3B is not a promoter but a gene body. This sentence should be reworded.

      Thanks so much for your detailed reminder. As you have pointed out, “preferential target” is an inaccurate statement. Besides promoters, gene bodies and other elements also undergo de novo DNA methylation (Auclair et al., 2014; Dahlet et al., 2020; Duymich et al., 2016).

      We have rewritten the sentence as follows in the revised manuscript: “Promoter regions are important target sites of DNMT3B (Choi et al., 2011). The acquisition of DNA methylation in promoters, especially in intermediate and low CpG promoters, during implantation is largely dependent on DNMT3B and plays an important role in regulating developmental genes (Auclair et al., 2014; Borgel et al., 2010; Dahlet et al., 2020). Thus, among genomic regions that may undergo de novo DNA methylation, we initially focused our analysis on DNA methylation dynamics of promoters...” (Lines 100-106 in the revised manuscript)

      (6) Lines 108-109: It would be good to mention that these data were derived from whole-genome bisulfite sequencing.

      Thanks for your kind reminder. As aforementioned, we have added a description in the revised manuscript to distinguish between data derived from reduced representation bisulfite sequencing and whole-genome bisulfite sequencing (Lines 98-99, 111 in the revised manuscript).

      (7) Line 141: rXCI should be defined.

      Thanks for your kind reminder. We have added full descriptions and more necessary information about iXCI and rXCI to make our statements clearer and easier to understand (Lines 210-213 in the revised manuscript). In addition, we carefully checked the relevant descriptions throughout the manuscript, and each abbreviation (such as “ICM”) has been defined at its first occurrence. We have also replaced abbreviations that appear only once in the manuscript with their full terms (Lines 122, 212 in the revised manuscript).

      (8) Lines 145-149: The role of DNA methylation for imprinted X-inactivation has already been reported (Chiba et al., 2008). The relevant sentences should be reworded.

      Thanks so much for reminding us of the important earlier literature that explores the relationship between DNA methylation and XCI. However, the primary aim and hypothesis of the study by Chiba et al. are different from those of our study. Chiba et al. focused on whether DNA methylation is the imprinting mark responsible for monoallelic expression of Xist (the initiation event of iXCI), while our study focused on the role of DNA methylation in achieving X chromosomal heterochromatinization (the late event of iXCI).

      In detail, the study by Chiba et al. mainly explored why Xist is specifically expressed from the paternal allele and why iXCI occurs specifically on the paternal X chromosome in mouse preimplantation embryos. Because previous studies had suggested that genomic imprinting of Xist is established during oogenesis (Oikawa et al., 2014; Tada et al., 2000), Chiba et al. tested whether the DNA methylation imprint established during oogenesis is responsible for the monoallelic expression of Xist in preimplantation embryos. Analyses of DNA methyltransferase maternal knockout embryos revealed that oocyte DNA methylation is dispensable for Xist imprinting (Chiba et al., 2008). A follow-up study by Inoue et al. identified a broad H3K27me3 enrichment within the Xist 5’ region, which is established during oocyte growth and persists through preimplantation development, as the imprinting mark of Xist (Inoue et al., 2017). This series of studies is very important and allows us to understand the mechanism underlying paternal allele-specific iXCI in mouse preimplantation embryos and extraembryonic tissues.

      However, the hypothesis of our study is different. Based on the finding of minor de novo DNA methylation and its preferential distribution on the X chromosome, we speculated that the minor de novo methylation, which occurs from the 8-cell to blastocyst stage, may participate in achieving X chromosomal heterochromatinization. Although DNA methylation is essential for maintaining the X chromosome-wide transcriptional silencing of rXCI, its role in iXCI remains controversial. It has even been considered plausible that DNA methylation is not required for achieving iXCI, because preimplantation embryos undergo global and massive DNA demethylation.

      We have reorganized this paragraph and added relevant statements to make the background and discussion clearer and easier to understand (Lines 217-234 in the revised manuscript).

      (9) Lines 164-165: Information regarding Dnmt3b KO is missing. Did the authors generate an original KO line or use an already published one? It should be explicitly stated.

      Thank you so much for your kind reminder. The Dnmt3b heterozygous mice were obtained from the Mutant Mouse Resource & Research Centers (MMRRC), and Dnmt3b knockout (KO) embryos were generated by mating Dnmt3b heterozygous females with heterozygous males. The genotyping of Dnmt3b KO embryos was performed by PCR following the MMRRC genotyping protocol (https://www.med.unc.edu/mmrrc/genotyping-protocols/mmrrc-center-protocol-29886/). The relevant information has been added to the MATERIALS AND METHODS section in the revised manuscript (Lines 350-354; 391-393 in the revised manuscript).

      (10) Line 165: chemical-induced inhibition of DNMT3B. As 5-aza-dC also blocks DNMT3A and DNMT1, this sentence should be reworded.

      Thank you for your valuable comments. 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and has been frequently used to study the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005). Thus, although it inhibits all DNMTs, chemical inhibition of DNMTs has the advantage of allowing us to treat embryos transiently and specifically during the window of minor de novo DNA methylation (the 8-cell to blastocyst stage). We have rewritten the relevant sentences in the revised manuscript (Lines 183-188 in the revised manuscript).

      (11) Lines 171-174: "The role of de novo methylation in iXCI...". This possibility was already tested in the previous study from the Sasaki lab (Chiba et al., 2008).

      As mentioned above, the primary aim and hypothesis of the study by Chiba et al. are different from those of our study. Chiba et al. mainly explored why Xist is specifically expressed from the paternal allele and why iXCI occurs specifically on the paternal X chromosome in mouse preimplantation embryos, so they tested whether the DNA methylation imprint established during oogenesis is responsible for this monoallelic expression of Xist in preimplantation embryos (the initiation event of iXCI).

      By contrast, based on the finding of minor de novo DNA methylation and its preferential distribution on the X chromosome, we speculated that the minor de novo DNA methylation, which occurs from the 8-cell to blastocyst stage, may participate in achieving X chromosomal heterochromatinization (the late event of iXCI).

      Thanks so much for reminding us of this important literature, which makes our discussion more informative. We have reorganized this paragraph by rewriting or adding relevant statements to make the background and discussion clearer and easier to understand (Lines 217-231 in the revised manuscript). In addition, to avoid repetition and make our discussion more concise, we have removed the similar sentences at the end of this paragraph.

      (12) Lines 198-200: "Given DNA methylation...". These citations mention a general relationship between DNA methylation and H3K27me3 in cells in culture. As I believe the authors focus on X-chromosome inactivation in the female embryos, more relevant papers that discuss the order of the events for the establishment of H3K27me3 and DNA methylation in the inactive X-chromosome can be cited.

      Thanks so much for your comment to improve our discussion. It has been thought that during the late phase of rXCI in fully differentiated cells, gene silencing is achieved by PRC2 complex-induced H3K27me3, and then is further stably maintained by the redundant action of multiple layers of epigenetic modifications, including DNA methylation, to reach the maximum level of chromatin compaction (Chow and Heard, 2009; Heard et al., 2004; Pintacuda and Cerase, 2015). In line with this, a recent multifaceted analysis showed that DNA methylation and H3K27me3 are concurrently enriched in genes subject to XCI (Balaton and Brown, 2021). We have added these statements in the revised manuscript (Lines 295-303 in the revised manuscript).

      (13) Line 241: As 5-aza-dC blocks both de novo and maintenance DNA methylation, this sentence should be reworded.

      Thank you for your kind reminder. As you have mentioned above, 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and has been frequently used to study the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005). Thus, although it inhibits all DNMTs, chemical inhibition of DNMTs has the advantage of allowing us to treat embryos transiently and specifically during the window of minor de novo DNA methylation (the 8-cell to blastocyst stage). We have rewritten the relevant sentences in the revised manuscript (Lines 183-188 in the revised manuscript).

      Figures

      (1) Figure 1C, D: Do the rows in C and D show the corresponding genes?

      Figure 1C and D represent the DNA methylation changes of promoters (C) and gene bodies (D), respectively, during the transition from the 8-cell to blastocyst stage. The two datasets were analyzed independently, and the rows do not show corresponding genes. Since we have focused on minor de novo methylation in promoter regions, the gene body results have been removed from the revised manuscript to avoid confusion.

      (2) Figure 1G: Yy2 promoter gained DNA methylation during the transition from 8-cell to the blastocyst stage. Is this a representative locus for the de novo methylated promoters that are shown in Figure 1F where an increase of DNA methylation is about ~1% on average? Another representative locus could be shown instead of this gene promoter.

      Thanks so much for your detailed reminder. The inconsistency between the global methylation change and the bisulfite sequencing analysis of Yy2 may be due to methodological details, such as C-T conversion efficiency, the number of picked colonies, etc. Since we have confirmed the presence of minor de novo DNA methylation using different publicly available data, we have removed this result from the revised manuscript to avoid ambiguity.

      (3) Figures 2C and 3A: It would be helpful to mention what the arrowheads mean.

      Thanks so much for your detailed reminder. In Figure 2C, the arrowhead indicates the H3K27me3 domain and the blank arrowhead indicates a blastomere without an H3K27me3 domain. In Figure 3A, the arrowhead indicates the Xist RNA domain and the blank arrowhead indicates a blastomere without an Xist RNA domain. We have added this information to the revised manuscript (Lines 736-738, 747-749 in the revised manuscript).

      (4) Figure 3-figure supplement 2B: It would be hard to see whether H3K27me3 is enriched at the promoter regions of presented genes. It would be helpful to show the values for the Y-axis as in panel A.

      Thanks for your helpful reminder. We have added the scales to the figure to improve the result presentation (Figure 4—figure supplement 2B in the revised manuscript).

      (5) Figure 4-figure supplement 2: 5-aza-dC blocks not only the activity of DNMT3B but also DNMT1, and DNMT3A (all these DNMTs are expressed during pre-implantation embryos, see Hirasawa et al., 2007). This part can be omitted from the manuscript.

      Thanks for your insightful comments. As you have mentioned above, the relevance and rationale for experiments using 5-aza-dC treatment should be clarified. 5-aza-dC is a well-established global DNA hypomethylating agent that efficiently inhibits the activity of all DNMTs, and thus has been frequently used to study the maintenance of DNA methylation and de novo DNA methylation (Maslov et al., 2012; Oka et al., 2005).

      In our study, to validate the function of minor de novo DNA methylation in iXCI and blastocyst development, we took advantage of 5-aza-dC-induced DNMT inhibition, which, despite its lack of specificity among DNMTs, allows us to treat embryos transiently and specifically during the window of minor de novo DNA methylation (the 8-cell to blastocyst stage).

      Based on these considerations, we would like to retain this result and hope for your understanding.

      We have added these statements to the revised manuscript to make the rationale of our experiments clearer and easier to understand (Lines 183-188 in the revised manuscript) and added a schematic diagram depicting the experimental design (Figure 3—figure supplement 3E in the revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      Recommendations/concerns in the text:

      - Line 106, it is unclear what is meant by "in line with this"? Gene body DNA methylation is a characteristic of active transcription, so why would a gain in DNA methylation at promoters be in line with a gain in DNA methylation over gene bodies?

      Thank you so much for your comments, which pointed out our ambiguous statement. We meant that both promoter and gene body regions, albeit in small proportions, gain DNA methylation during the transition from the 8-cell to blastocyst stage. Following the comment by Reviewer #1, and since we have focused on minor de novo methylation in promoter regions, the gene body results have been removed from the revised manuscript to avoid confusion.

      - Line 111 & 114, can 6% DNA methylation really be considered "relatively hypermethylated" compared to 3% DNA methylation that is referred to as "more hypomethylated"?

      We apologize for our unclear and ambiguous statements. Here we focused on promoter regions. Many previous studies have shown that, compared with gene bodies and other genomic elements, promoters and overlapping CGI regions, especially high-CpG promoters, consistently show low levels of DNA methylation. We have added statements to clarify this and have rewritten the sentences in the revised manuscript (Lines 100-106, 116-118, 121, 124 in the revised manuscript).

      - Line 124, there are a number of processes identified, why only mention one in the text? Suggest changing writing to be more accurate, indicating what was included for the GO analysis and using the words "enriched for ... processes". Saying it may be linked to a process is an overstatement and not supported by further experiments/data.

      Thank you for your detailed comments, which make our results more informative. We have checked the relevant description and revised it as follows: By performing gene ontology enrichment analysis of genes that undergo minor or major de novo DNA methylation, respectively, we noticed that, besides many basic processes common to the two waves of de novo DNA methylation, genes subject to minor de novo DNA methylation were enriched in processes such as organic substance transport, chromosome organization, and cell fate specification (Lines 129-134 in the revised manuscript).

      - Lines 149 - 152: sentence/message unclear.

      We apologize for the ambiguous description. We have corrected the relevant descriptions as follows: To identify the biological function of minor de novo DNA methylation in iXCI, we knocked down Dnmt3b in preimplantation embryos by microinjecting Dnmt3b siRNA into zygotes (Lines 234-236 in the revised manuscript).

      - Lines 162-164: the data in Figure 2C/D does not support this statement, as it does not show H3K27me3 loss specifically at the inactive X-chromosome.

      Thank you for your insightful comments. Although H3K27me3 is globally enriched, the H3K27me3 domain detected by immunostaining is a classic marker of XCI establishment, reflecting X chromosome-wide heterochromatinization and transcriptional repression (Chow and Heard, 2009; Heard et al., 2004; Huynh and Lee, 2005). We therefore used immunostaining of H3K27me3 domains to evaluate iXCI establishment in blastocysts, as previously reported (Fukuda et al., 2014; Gontan et al., 2018; Inoue et al., 2010; Tan et al., 2016). To make our results more convincing, we have added another quantification of iXCI establishment, i.e., the percentage of H3K27me3-positive and -negative trophoblast cells among total trophoblast cells in female blastocysts with or without Dnmt3b knockdown.

      In addition, we have added a schematic diagram depicting the process of iXCI initiation and establishment, as well as the experimental design and workflow, to make the result easier to understand.

      We also agree that additional evidence would strengthen the conclusion. To this end, and to test whether loss of DNA methylation has a prolonged effect on iXCI, we reanalyzed the RNA-seq and H3K27me3 ChIP-seq data from extraembryonic ectoderm (ExE) of single E6.5 embryos that underwent Dnmt3a/3b knockout, because the preimplantation iXCI status is maintained in extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that chromosome-wide loss of DNA methylation led to a nearly complete loss of H3K27me3 on the paternal X chromosome (which is specifically inactivated in iXCI), along with notable transcriptional upregulation across the chromosome. By contrast, these changes were not observed on the maternal X chromosome (Lines 253-261; Figure 3—figure supplement 4A in the revised manuscript).

      - Lines 169-174: sentence/message unclear.

      As mentioned above, we have reorganized this paragraph by rewriting and adding statements on DNA methylation and XCI, to make the background and discussion clearer and easier to understand (Lines 217-234 in the revised manuscript). In addition, to avoid repetition and make our discussion more concise, we have removed the similar sentences at the end of this paragraph.

      - Lines 177-179: this statement is too bold. The data does not support "direct evidence".

      Thank you for your detailed reminder. We have rewritten the sentence to avoid confusion and overstatement (Lines 262-268 in the revised manuscript).

      - Line 198: these are not all enzymes, but could be referred to as chromatin modifiers.

      We apologize for the ambiguous description. As you suggested, we have corrected “enzymes” to “chromatin modifiers” (Lines 284, 287 in the revised manuscript).

      - Line 199: this statement is not correct in all contexts. There are many studies showing antagonism between DNA methylation and H3K27me3.

      Thank you for your careful review. As you have pointed out, the reported relationships between DNA methylation and H3K27me3 are divergent and largely controversial among studies. Under certain circumstances, DNA methylation antagonizes H3K27me3 at promoters by excluding components of PRC2 (the main complex responsible for H3K27me3 deposition) from their targets (Bartke et al., 2010; Jermann et al., 2014), while other studies have presented evidence that PRC2 and DNA methylation cooperate to achieve silencing (Hagarman et al., 2013; Vire et al., 2006). Thus, the relationship between DNA methylation and histone modifications is thought to be complex, possibly depending on cell type and/or genomic region. Both antagonism and coordination can be observed at different regulatory elements in mouse ES cells (King et al., 2016).

      We apologize for our incomplete statement, which arose because we mainly focused on their synergistic relationship. We have refined this section by rewriting relevant sentences and adding necessary statements (Lines 288-303 in the revised manuscript).

      - Lines 228-230: the developmental significance of DNA methylation homeostasis is already well-established. Please reference relevant papers showing this here.

      Thank you for this helpful suggestion. We have reorganized this section and added relevant references that highlight the developmental significance of DNA methylation homeostasis. The sentence has been rewritten and moved to the end of this paragraph (Lines 159-161 in the revised manuscript).

      - Line 238: an explanation/rationale for looking at energy metabolism is lacking.

      Thank you for your comments, which help make our results easier to understand. We examined energy metabolism mainly on the basis of the integrated analysis of DNA methylation and gene expression from 8-cell embryos to the ICM, which was intended to test the potential short- and long-term developmental consequences of minor de novo DNA methylation. Bioinformatic analysis suggested that many basic processes, such as cell differentiation, cell cycle, and metabolic regulation, may be regulated by minor de novo DNA methylation. Among the enriched genes, several are related to energy metabolism. In addition, because energy metabolism is crucial for supporting embryo differentiation and development, and oxidative phosphorylation (OXPHOS) metabolism is highly activated during the blastocyst stage (Zhao et al., 2021), we next examined the energy metabolism, particularly OXPHOS activity, of Dnmt3b-KD embryos. We have refined the section by rewriting the relevant sentence and adding necessary statements (Lines 175-179 in the revised manuscript).

      - Lines 246-248: Looking at the data in Figure 2 figure supplement 2, this statement is simply not true with regards to DNMT3B protein, and also global DNA methylation level is reduced in the Dnmt3b KD blastocyst, which could lead to defective major de novo DNA methylation.

      Thank you for your careful review. We have rewritten the sentence to make our statement more accurate and to avoid overstatement (Lines 188-190 in the revised manuscript).

      Recommendations/concerns relating to figures:

      Figure 1:

      - Of all genic promoters, how many were included in the analysis (contained sufficient coverage)? What cut-off/thresholds were used to consider DNA methylation gain at a promoter?

      Thanks for your comments. In total, 11,662 promoters were analyzed. Promoter methylation is generally low, particularly at the 8-cell stage, when minor de novo methylation has just been initiated. Because of these low basal levels, the increase that occurs before the blastocyst stage appears slight. To capture such slight changes, we used a relaxed threshold based on ΔDNA methylation. Only CpG sites with at least fivefold coverage were included in the methylation analysis, which was based on data from Smith et al. (2012). ΔDNA methylation greater than or less than 0 was defined as gain or loss of DNA methylation, respectively. We have added this information in the revised manuscript (Lines 462-470 in the revised manuscript).
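      The classification rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our actual pipeline: the function names, the input layout (methylated and total read counts per CpG for each promoter), and the choice to average per-CpG levels within a promoter are hypothetical. Only the fivefold coverage filter and the sign-based definition of gain and loss are taken from the description above.

```python
from statistics import mean

MIN_COVERAGE = 5  # CpG sites need at least fivefold coverage


def promoter_methylation(cpgs):
    """Mean methylation level over CpGs that pass the coverage filter.

    `cpgs` is a list of (methylated_reads, total_reads) tuples. This input
    layout is an assumption made for the example. Returns None when no CpG
    in the promoter reaches the minimum coverage.
    """
    levels = [m / n for m, n in cpgs if n >= MIN_COVERAGE]
    return mean(levels) if levels else None


def classify_promoter(cpgs_8cell, cpgs_icm):
    """Label a promoter by the sign of delta methylation (ICM minus 8-cell)."""
    before = promoter_methylation(cpgs_8cell)
    after = promoter_methylation(cpgs_icm)
    if before is None or after is None:
        return "insufficient coverage"
    delta = after - before
    if delta > 0:
        return "gain"
    if delta < 0:
        return "loss"
    return "unchanged"


# Hypothetical promoter with two covered CpGs at each stage.
print(classify_promoter([(0, 10), (1, 20)], [(1, 10), (2, 20)]))
```

      In this sketch, a promoter whose CpGs all fall below fivefold coverage at either stage is reported as having insufficient coverage rather than being assigned a gain or loss.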

      - Does an average methylation level of 0.02 represent 2% DNA methylation? Presuming yes, is the average 1.5% DNA methylation gain at promoters real? And meaningful? Especially compared to the gain in DNA methylation that takes place between ICM and E6.5 (Figure 1 Figure Supplement 1 D)

      As you have pointed out, an average methylation level of 0.02 represents 2% DNA methylation. As mentioned above, promoters exhibited an average DNA methylation gain of 1.5% during the transition from the 8-cell stage to the ICM. The slight increase may be mainly due to the relatively low basal levels. As you expected, compared with the comprehensive de novo DNA methylation during implantation, preimplantation de novo methylation is more modest and occurs at a small proportion of promoter regions, so we designated it minor de novo DNA methylation. It should also be mentioned that a proportion of these promoters continue to gain substantial DNA methylation during implantation. We have refined the relevant sentences to provide more detailed information on our results (Lines 125-127 in the revised manuscript).

      - Why is there a focus on promoters (which are not the preferential target of DNMT3B)?

      Thank you for your detailed reminder. As you have pointed out, “preferential target” seems to be an inaccurate statement. Besides promoters, gene bodies and other elements also undergo de novo DNA methylation (Auclair et al., 2014; Dahlet et al., 2020; Duymich et al., 2016). We focused on promoter regions based on the following considerations: (1) promoter regions are important target sites of DNMT3B (Choi et al., 2011); and (2) the acquisition of DNA methylation at promoters during implantation, especially at intermediate- and low-CpG promoters, is largely dependent on DNMT3B and plays an important role in regulating developmental genes (Auclair et al., 2014; Borgel et al., 2010; Dahlet et al., 2020). We have rewritten the relevant sentence in the revised manuscript (Lines 100-106 in the revised manuscript).

      - Figure 1H shows that promoters that gain DNA methylation during the "minor de novo DNA methylation" continue to gain DNA methylation during "de novo DNA methylation". Is the ~1.5% DNA methylation gain just the slow start of the main de novo DNA methylation wave?

      Your comments are very helpful for improving the description of our results. In the present study, our analysis indicated that a small proportion of promoters initially gain methylation during the transition from the 8-cell stage to the ICM. This finding challenges the current knowledge that (1) de novo DNA methylation occurs during implantation, by which globally hypomethylated blastocysts acquire genome-wide DNA methylation (Borgel et al., 2010; Dahlet et al., 2020; Smith et al., 2012); and (2) during preimplantation development, embryos undergo massive and global DNA demethylation.

      To distinguish our finding from the current knowledge of the timing and dynamics of DNA methylation during early development, we have designated the methylation gain during the transition from the 8-cell to the blastocyst stage as minor de novo DNA methylation.

      We agree with you that most of the promoters undergoing minor de novo methylation continue to gain DNA methylation during implantation, as revealed in Fig. 1F. We have refined the relevant statement in the revised manuscript (Lines 125-127 in the revised manuscript).

      - The GO analysis performed for Figure 1H, what was used as input? Promoters of genes that gain DNA methylation as identified in 1C?

      Thank you for your comments. For the GO analysis shown in Figure 1H, we used as input the genes whose promoter regions gained or lost DNA methylation, respectively, during the transition from the 8-cell stage to the ICM (identified in Figure 1C). This information has been clarified in the revised manuscript to ensure accuracy (Lines 129-134 in the revised manuscript).

      - Figure 1 figure supplement 1, is there only a fold change as threshold or also a calculated significance (eg. p-value/FDR)?

      Thanks for your valuable comments. Considering the relatively low DNA methylation levels at promoter regions and the slight changes occurring during preimplantation embryo development, we used a relaxed threshold based on ΔDNA methylation. Only CpG sites with at least fivefold coverage were included in the methylation analysis, which was based on data from Smith et al. (2012). ΔDNA methylation greater than or less than 0 was defined as gain or loss of DNA methylation, respectively. We have replaced the relevant figures and added this information in the revised manuscript (Figure 1—figure supplement 1D-E; Lines 125-127 in the revised manuscript).

      - To confirm DNMT3B is responsible for the DNA methylation gain: DNMT3B KD/KO followed by promoter DNA methylation analysis to confirm the promoters that gain DNA methylation between 8 cell and ICM don't gain DNA methylation in the absence of DNMT3B.

      We agree that additional evidence would strengthen the conclusion. To this end, we reanalyzed the RNA-seq and H3K27me3 ChIP-seq data from extraembryonic ectoderm (ExE) of single E6.5 embryos that underwent Dnmt3a/3b knockout, because the preimplantation iXCI status is maintained in extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that chromosome-wide loss of DNA methylation led to a nearly complete loss of H3K27me3 on the paternal X chromosome (which is specifically inactivated in iXCI), which also showed notable transcriptional upregulation across the chromosome. By contrast, these changes were not observed on the maternal X chromosome. We have added this result in the revised manuscript (Lines 253-261; Figure 3—figure supplement 4A in the revised manuscript).

      Figure 2:

      - Figure 2A: label missing for what the numbers on the y-axis represent.

      Thank you for pointing this out. We apologize for the oversight. We have added a y-axis label to Figure 2A to clarify what the numbers represent (Figure 3A in the revised manuscript).

      - Figure 2B: y-axis is % of methylated promoters compared to all promoters?

      Thank you for your suggestion. The y-axis in Figure 2B indeed represents the percentage of de novo methylated promoters relative to all promoters. As you have suggested, we have clarified this labeling in the revised manuscript (Figure 3B in the revised manuscript).

      - What is the delta DNA methylation gain specifically for X-linked promoters?

      Thank you for your reminder. To provide more convincing evidence, we reanalyzed single-cell COOL-seq data and specifically examined DNA methylation changes at X-chromosomal promoters in female embryos. The X chromosome showed a more notable increase in de novo methylated promoters than autosomes, and the female X chromosome showed higher DNA methylation levels than the male X chromosome (Figure 3—figure supplement 2A-B; Lines 203-206 in the revised manuscript).

      - Figure 2C: include representative images of separate channels to better see the signal of CDX2 and H3K27me3. Quantification would be better represented with box plots.

      Thank you for your helpful suggestions. We have added separate channel images in the revised manuscript. Additionally, we have adjusted the quantification to be represented as box plots, as you have suggested, to improve the accuracy and interpretability of the data presentation (Figure 3D-F in the revised manuscript).

      - Figure 2C: Does the H3K27me3 signal overlap with the location of the inactive X-chromosome (is there maybe denser DAPI or do IF combined with Xist RNA-FISH)?

      Thank you for your insightful comments. Although H3K27me3 is globally enriched, the H3K27me3 domain detected by immunostaining is a classic marker of XCI establishment, reflecting X chromosome-wide heterochromatinization and transcriptional repression (Chow and Heard, 2009; Heard et al., 2004; Huynh and Lee, 2005). We therefore used immunostaining of H3K27me3 domains to evaluate iXCI establishment in blastocysts, as previously reported (Fukuda et al., 2014; Gontan et al., 2018; Inoue et al., 2010; Tan et al., 2016). We made an effort to perform co-staining of H3K27me3 IF and Xist FISH, but were hindered by technical challenges, and we ask for your understanding. However, as mentioned above, H3K27me3 is a well-accepted marker of XCI status.

      In addition, to make our results more convincing, we have added an alternative quantification of iXCI establishment, i.e., the percentage of H3K27me3-positive and -negative trophoblast cells among total trophoblast cells in female blastocysts with or without Dnmt3b knockdown (Figure 3F; Lines 243-244 in the revised manuscript).
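      The per-blastocyst percentage described above amounts to the following calculation. The sketch is illustrative only: the function name and the input format (one Boolean call per trophoblast cell, True when an H3K27me3 domain is scored as present) are assumptions for the example, and the scoring of individual cells from the images is not modeled here.

```python
def h3k27me3_positive_percent(cell_calls):
    """Percentage of trophoblast cells scored as H3K27me3-positive.

    `cell_calls` holds one Boolean per trophoblast cell in a single female
    blastocyst (hypothetical input format). The percentage of negative
    cells is 100 minus the returned value.
    """
    if not cell_calls:
        raise ValueError("no trophoblast cells were scored")
    return 100.0 * sum(cell_calls) / len(cell_calls)


# Hypothetical blastocyst with 3 positive cells out of 4 scored cells.
print(h3k27me3_positive_percent([True, True, True, False]))
```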

      - Figure 2 figure supplement 2A: relative expression of Dnmt3b?

      Thanks for your detailed reminder. The data represent the relative expression level of Dnmt3b, as noted in the original figure legend. Based on your comments, we have added the gene name in the label of the Y-axis. Similarly, the protein name has been also added to make the results more informative (Figure 2 figure supplement 2A, C, E in the revised manuscript).

      - Figure 2 figure supplement 2B/C: in the text, line 153, it is stated that "Dnmt3b mRNA and protein levels were significantly reduced in morulae, but not in blastocysts compared to those of negative control (NC) group". These figures do not support that statement. The IF images show a loss of DNMT3B in the Dnmt3b KD blastocysts. The IF quantification seems to have fewer datapoints for the blastocyst, and looking at the bar graphs, there seems to be a trend towards reduced DNMT3B in both the morula and blastocyst, which would also explain the reduction in DNA methylation in both stages as shown in Figure 2 figure supplement 2D/E.

      Thank you for your careful review, which has made our statements more accurate. We have rewritten the sentence in the revised manuscript as follows: Dnmt3b mRNA and protein levels were significantly reduced in morulae, and tended to be lower in blastocysts compared to those of the negative control (NC) group. In addition, we have removed “transient” from the original statement “The transient inhibition of Dnmt3b” (Lines 168-170 in the revised manuscript).

      - Figure 2 figure supplement 2F/G: include representative IF images with separation of all channels and the merged image.

      Thank you for your suggestion. We have added the representative immunofluorescence (IF) images with separate channels and merged image in the revised manuscript (Figure 3—figure supplement 3B, F in the revised manuscript).

      - Figure 2 figure supplement 2H: Instead of showing log2FC in methylation levels, delta methylation would be more informative. Are these genes already inactivated at the 8-cell stage? Or are they active and become inactivated by the gain in DNA methylation? Doing qPCR for these genes, or looking at published RNAseq data would be informative. What happens to the expression of these genes in the Dnmt3b KD?

      Thanks for your suggestions. We now present DNA methylation changes as “ΔDNA methylation”. During mouse preimplantation development, iXCI is initiated in female embryos at early cleavage stages, depending on Xist upregulation around the 4- to 8-cell stage; Xist then specifically coats the paternal X chromosome and finally leads to chromosome-wide silencing via heterochromatinization in early blastocysts. Thus, these non-escaping genes, which are subject to XCI, would not yet be inactivated at the 8-cell stage.

      Author response image 1.

      The processes of iXCI initiation and establishment (left panel), and dynamics of total expression levels of X chromosome in male and female preimplantation embryos (right panel, note that X-dosage is balanced between sexes until the early blastocyst stage).

      As you expected, most of these representative non-escaping genes are downregulated during the transition from the 8-cell to the blastocyst stage, consistent with their gain of DNA methylation. Additionally, since the preimplantation iXCI status is maintained in extraembryonic cells (Galupa and Heard, 2015; Schulz and Heard, 2013), we further reanalyzed published RNA-seq data from extraembryonic ectoderm (ExE) of single E6.5 embryos that underwent DNA methyltransferase knockout (Chen et al., 2019). The results showed that chromosome-wide loss of DNA methylation led to chromosome-wide transcriptional upregulation on the paternal X chromosome, including at the loci of these non-escaping genes. We have added this result in the revised manuscript (Figure 3—figure supplement 3J; Figure 3—figure supplement 4A-B; Lines 253-261 in the revised manuscript).

      Figure 3:

      - Figure 3 figure supplement 1: representative IF image missing.

      Thanks for your kind reminder. We have added the representative IF images in the revised manuscript to provide a clearer illustration of the data (Figure 4—figure supplement 1A in the revised manuscript).

      - Figure 3 figure supplement 2B: scales are missing for the H3K27me3 ChIP-seq data (are the 8-cell and ICM tracks set to the same scale?). It looks like the ICM track is cut off at the top (peaks not fully displayed) and the data looks very sparse. A more informative analysis would be to do peak calling over promoters and compare 8-cell with ICM.

      Thanks for your detailed reminder. We apologize for the missing scales in the H3K27me3 ChIP-seq data. The 8-cell and ICM tracks were set to the same scale, and we have now added scales to the figure in the revised manuscript. Regarding the appearance you noted, the flattened peaks are not caused by the track being cut off, but rather by zooming into a specific region in the extended IGV files.

      These results are based on the reanalysis of publicly available data from pooled embryos, which provide only suggestive, not direct, evidence for a role of DNA methylation in promoting X-linked H3K27me3 enrichment in iXCI.

      To provide more convincing evidence, we reanalyzed the RNA-seq and H3K27me3 ChIP-seq data from extraembryonic ectoderm (ExE) of E6.5 female embryos that underwent Dnmt3a/3b knockout, because the preimplantation iXCI status is maintained in extraembryonic cells (Chen et al., 2019; Galupa and Heard, 2015; Schulz and Heard, 2013). The results showed that Dnmt knockout led to a nearly complete loss of H3K27me3 on the paternal X chromosome (which is specifically inactivated in iXCI), which also showed notable transcriptional upregulation across the chromosome. By contrast, these changes were not observed on the maternal X chromosome (Figure 3—figure supplement 4 in the revised manuscript). We have added these results in the revised manuscript.

      - Figure 3E: Given all tested proteins give a positive signal, it would have been good to include a negative control chromatin protein that is known to not interact with DNMT3B. Given both PRC2 and DNMT3B are chromatin-binding proteins, can the signal be a result of close proximity instead of a direct interaction?

      In the present study, to test the interaction between DNMT3B and PRC2 core components, we have used in situ proximity ligation assay (PLA), an increasingly popular technique for detecting the close proximity of two proteins in fixed samples using two primary antibodies (Alsemarz et al., 2018).

      Author response image 2.

      Schematic diagram of the principle of the in situ PLA.

      Compared with the classical co-immunoprecipitation (Co-IP) method, in situ PLA has advantages in (1) detecting low-input samples or proteins expressed at low levels, which is extremely difficult using Co-IP; and (2) providing in situ or subcellular information on protein-protein interactions. However, it should be noted that the maximal distance allowing this reaction is 40 nm, which is not small enough to demonstrate a physical interaction between the two antigens, but is sufficient to support very close “proximity”.

      In our study, in situ PLA, including the design of negative controls, was performed in accordance with the manufacturer’s instructions for the Duolink® In Situ Red Starter Kit (MilliporeSigma): “Technical negative controls included incubation with each primary antibody separately and no primary antibody”. We have refined the relevant sentence in the revised manuscript (Lines 308-310 in the revised manuscript).

      - Figure 3G: It would have been good to include a negative control, and DNase/benzonase to exclude DNA/RNA-mediated protein interaction.

      - (Of note, there have been previous studies reporting an interaction between PRC2 and DNMT3B in other cell types, such as in Weigert et al. 2023, but unfortunately, they don't seem to use DNase/benzonase either).

      The Co-IP analysis of DNMT3B and PRC2 core components in differentiated female ES cells was presented as additional supporting evidence. Because Co-IP analysis is extremely difficult in preimplantation embryos, we used in situ PLA to detect their interaction. However, the maximal distance allowing the in situ PLA reaction is 40 nm, which is not small enough to demonstrate a physical interaction (Alsemarz et al., 2018). Thus, we added a Co-IP analysis using differentiated female ES cells, in which rXCI occurs upon differentiation.

      Considering the importance and contribution of this result, we have moved it from the main figure to the supplemental figure (Figure 4—figure supplement 3H in the revised manuscript).

      - Figure 3 figure supplement 3G: what were the ESCs differentiated into? Did the Dnmt3b KO or Dnmt3a/b DKO show any differentiation defect?

      The mouse ESC line PGK12.1 is a well-established ex vivo model of rXCI. Under standard culture conditions, PGK12.1 cells are normally fated to neuroectodermal commitment.

      Author response image 3.

      Immunostaining of NESTIN, a neuroectodermal stem cell marker molecule, and NANOG in undifferentiated and differentiated PGK12.1 ESCs respectively.

      No differentiation defects have been observed in either Dnmt3b KO or Dnmt3a/3b DKO ESCs in our study. Dnmt KO/DKO/TKO ES cell lines have been successfully used as the model of interaction of DNA methylation and H3K27me3 deposition (King et al., 2016).

      Figure 4:

      - Figure 4B: Is there an explanation for seeing similar total cell numbers in Figure 4B, but showing decreased proliferation in Figure 4A?

      Thank you for your insightful comments. The EdU cell proliferation assay labels cells during the S phase of the cell cycle, as 5-ethynyl-2´-deoxyuridine (EdU) is incorporated into newly synthesized DNA. This labeling identifies cells undergoing DNA synthesis, but these cells may not have completed mitosis at the time of detection. As a result, the total cell number may not immediately reflect the decrease in proliferation observed in the treated group. To address this point, we have rewritten the sentences in the revised manuscript (Lines 174-175 in the revised manuscript).

      References

      Alsemarz, A., Lasko, P. and Fagotto, F. J. B. (2018). Limited significance of the in situ proximity ligation assay. bioRxiv, 411355.

      Auclair, G., Guibert, S., Bender, A. and Weber, M. (2014). Ontogeny of CpG island methylation and specificity of DNMT3 methyltransferases during embryonic development in the mouse. Genome Biol. 15, 545.

      Balaton, B. P. and Brown, C. J. (2021). Contribution of genetic and epigenetic changes to escape from X-chromosome inactivation. Epigenetics Chromatin 14, 30.

      Bartke, T., Vermeulen, M., Xhemalce, B., Robson, S. C., Mann, M. and Kouzarides, T. (2010). Nucleosome-interacting proteins regulated by DNA and histone methylation. Cell 143, 470-484.

      Borgel, J., Guibert, S., Li, Y., Chiba, H., Schubeler, D., Sasaki, H., Forne, T. and Weber, M. (2010). Targets and dynamics of promoter DNA methylation during early mouse development. Nat. Genet. 42, 1093-1100.

      Chen, Z., Yin, Q., Inoue, A., Zhang, C. and Zhang, Y. (2019). Allelic H3K27me3 to allelic DNA methylation switch maintains noncanonical imprinting in extraembryonic cells. Sci Adv 5, eaay7246.

      Chiba, H., Hirasawa, R., Kaneda, M., Amakawa, Y., Li, E., Sado, T. and Sasaki, H. (2008). De novo DNA methylation independent establishment of maternal imprint on X chromosome in mouse oocytes. Genesis 46, 768-774.

      Choi, S. H., Heo, K., Byun, H. M., An, W., Lu, W. and Yang, A. S. (2011). Identification of preferential target sites for human DNA methyltransferases. Nucleic Acids Res. 39, 104-118.

      Chow, J. and Heard, E. (2009). X inactivation and the complexities of silencing a sex chromosome. Curr. Opin. Cell Biol. 21, 359-366.

      Dahlet, T., Argueso Lleida, A., Al Adhami, H., Dumas, M., Bender, A., Ngondo, R. P., Tanguy, M., Vallet, J., Auclair, G., Bardet, A. F., et al. (2020). Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat Commun 11, 3153.

      Duymich, C. E., Charlet, J., Yang, X. J., Jones, P. A. and Liang, G. N. (2016). DNMT3B isoforms without catalytic activity stimulate gene body methylation as accessory proteins in somatic cells. Nat Commun 7, 11453.

      Fukuda, A., Tomikawa, J., Miura, T., Hata, K., Nakabayashi, K., Eggan, K., Akutsu, H. and Umezawa, A. (2014). The role of maternal-specific H3K9me3 modification in establishing imprinted X-chromosome inactivation and embryogenesis in mice. Nat Commun 5, 5464.

      Galupa, R. and Heard, E. (2015). X-chromosome inactivation: new insights into cis and trans regulation. Curr. Opin. Genet. Dev. 31, 57-66.

      Gontan, C., Mira-Bontenbal, H., Magaraki, A., Dupont, C., Barakat, T. S., Rentmeester, E., Demmers, J. and Gribnau, J. (2018). REX1 is the critical target of RNF12 in imprinted X chromosome inactivation in mice. Nat Commun 9, 4752.

      Hagarman, J. A., Motley, M. P., Kristjansdottir, K. and Soloway, P. D. (2013). Coordinate regulation of DNA methylation and H3K27me3 in mouse embryonic stem cells. PLoS One 8, e53880.

      Heard, E., Chaumeil, J., Masui, O. and Okamoto, I. (2004). Mammalian X-chromosome inactivation: an epigenetics paradigm. Cold Spring Harb. Symp. Quant. Biol. 69, 89-102.

      Huynh, K. D. and Lee, J. T. (2005). X-chromosome inactivation: a hypothesis linking ontogeny and phylogeny. Nat. Rev. Genet. 6, 410-418.

      Inoue, A., Jiang, L., Lu, F. and Zhang, Y. (2017). Genomic imprinting of Xist by maternal H3K27me3. Genes Dev. 31, 1927-1932.

      Inoue, K., Kohda, T., Sugimoto, M., Sado, T., Ogonuki, N., Matoba, S., Shiura, H., Ikeda, R., Mochida, K., Fujii, T., et al. (2010). Impeding Xist expression from the active X chromosome improves mouse somatic cell nuclear transfer. Science 330, 496-499.

      Jermann, P., Hoerner, L., Burger, L. and Schubeler, D. (2014). Short sequences can efficiently recruit histone H3 lysine 27 trimethylation in the absence of enhancer activity and DNA methylation. Proc. Natl. Acad. Sci. U. S. A. 111, E3415-3421.

      King, A. D., Huang, K., Rubbi, L., Liu, S., Wang, C. Y., Wang, Y., Pellegrini, M. and Fan, G. (2016). Reversible Regulation of Promoter and Enhancer Histone Landscape by DNA Methylation in Mouse Embryonic Stem Cells. Cell Rep. 17, 289-302.

      Maslov, A. Y., Lee, M., Gundry, M., Gravina, S., Strogonova, N., Tazearslan, C., Bendebury, A., Suh, Y. and Vijg, J. (2012). 5-aza-2'-deoxycytidine-induced genome rearrangements are mediated by DNMT1. Oncogene 31, 5172-5179.

      Oikawa, M., Inoue, K., Shiura, H., Matoba, S., Kamimura, S., Hirose, M., Mekada, K., Yoshiki, A., Tanaka, S., Abe, K., et al. (2014). Understanding the X chromosome inactivation cycle in mice: a comprehensive view provided by nuclear transfer. Epigenetics-Us 9, 204-211.

      Oka, M., Meacham, A. M., Hamazaki, T., Rodic, N., Chang, L. J. and Terada, N. (2005). De novo DNA methyltransferases Dnmt3a and Dnmt3b primarily mediate the cytotoxic effect of 5-aza-2'-deoxycytidine. Oncogene 24, 3091-3099.

      Pintacuda, G. and Cerase, A. (2015). X Inactivation Lessons from Differentiating Mouse Embryonic Stem Cells. Stem Cell Rev Rep 11, 699-705.

      Schulz, E. G. and Heard, E. (2013). Role and control of X chromosome dosage in mammalian development. Curr. Opin. Genet. Dev. 23, 109-115.

      Smith, Z. D., Chan, M. M., Mikkelsen, T. S., Gu, H. C., Gnirke, A., Regev, A. and Meissner, A. (2012). A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 484, 339-344.

      Tada, T., Obata, Y., Tada, M., Goto, Y., Nakatsuji, N., Tan, S., Kono, T. and Takagi, N. (2000). Imprint switching for non-random X-chromosome inactivation during mouse oocyte growth. Development 127, 3101-3105.

      Tan, K., An, L., Miao, K., Ren, L., Hou, Z., Tao, L., Zhang, Z., Wang, X., Xia, W., Liu, J., et al. (2016). Impaired imprinted X chromosome inactivation is responsible for the skewed sex ratio following in vitro fertilization. Proc. Natl. Acad. Sci. U. S. A. 113, 3197-3202.

      Vire, E., Brenner, C., Deplus, R., Blanchon, L., Fraga, M., Didelot, C., Morey, L., Van Eynde, A., Bernard, D., Vanderwinden, J. M., et al. (2006). The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439, 871-874.

      Zhao, J., Yao, K., Yu, H., Zhang, L., Xu, Y., Chen, L., Sun, Z., Zhu, Y., Zhang, C., Qian, Y., et al. (2021). Metabolic remodelling during early mouse embryo development. Nat Metab 3, 1372-1384.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      (1) In the "Introduction" section, an important aspect that requires attention pertains to the discussion surrounding the heterodimerization of CXCR4 and CCR5. Notably, the manuscript overlooks a recent study (https://doi.org/10.1038/s41467-023-42082-z) elucidating the mechanism underlying the formation of functional dimers within these G protein-coupled receptors (GPCRs)…The inclusion of this study within the manuscript would significantly enrich the contextual framework of the work, offering readers a comprehensive understanding of the current knowledge surrounding the structural dynamics and functional implications of CXCR4 and CCR5 heterodimerization.

      We thank the reviewer for his/her recommendation to enrich the contextual framework of our study. The Nature Communications paper by Di Marino et al. was published after we sent the first version of our manuscript to eLife, and therefore was not included in the discussion. As the reviewer rightly indicates, this paper elucidates the mechanism underlying the formation of functional dimers within CCR5 and CXCR4. Using metadynamics approaches, the authors emphasize the importance of distinct transmembrane regions for dimerization of the two receptors. In particular, CXCR4 shows two low energy dimer structures and the TMVI-TMVII helices are the preferred interfaces involved in the protomer interactions in both cases. Although the study uses in silico techniques, it also includes the molecular binding mechanism of CCR5 and CXCR4 in the membrane environment, as the authors generate a model in which the receptors are immersed in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) phospholipid bilayer with 10% cholesterol. This is an important point in this study, as membrane lipids also interact with membrane proteins, and the lipid composition affects CXCR4 oligomerization (Gardeta S.R. et al. Front. Immunol. 2023). In particular, Di Marino et al. find a cholesterol molecule placed in-between the two CXCR4 protomers where it engages a series of hydrophobic interactions with residues including Leu132, Val214, Leu216 and Phe249. Then, the polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes protomer binding. In our hands, the F249L mutation in CXCR4 reverted the antagonism of AGR1.137, suggesting that the compound binds, among others, this residue. We should, nonetheless, indicate that we analyzed receptor oligomerization and not CXCR4 dimerization, which was the main object of the Di Marino et al. study. 
It is therefore also plausible that residues other than those described as essential for CXCR4 dimerization participate in receptor oligomerization. We can speculate that AGR1.137 might affect cholesterol binding to CXCR4 and, therefore, alter dimerization/oligomerization. Additionally, the CXCR4 X-ray structure with PDB code 3ODU (Wu B. et al. Science 2010) experimentally shows two fatty acid molecules in contact with both TMV and TMVI. These molecules interact closely with hydrophobic residues in the protein, thereby stabilizing it in a hydrophobic environment. Although more experiments will be needed to clarify the mechanism involved, our results suggest that cholesterol and/or other lipids also play an important role in CXCR4 oligomerization and function, as seen for other GPCRs (Jakubik J. & El-Fakahany E.E. Int. J. Mol. Sci. 2021). However, we should also consider that factors not included in the analysis by Di Marino et al. can affect CXCR4 oligomerization; for instance, the co-expression of other chemokine receptors and/or other GPCRs that heterodimerize with CXCR4 might affect CXCR4 dynamics at the cell membrane, as might other membrane proteins such as CD4, which also forms complexes with CXCR4 (Martinez-Muñoz L. et al. Mol. Cell 2018).

      The revised discussion contains references to the study by Di Marino et al. to enrich the contextual framework of our data.

      (2) In "various sections" of the manuscript, there appears to be confusion surrounding the terminology used to refer to antagonists. It is recommended to provide a clearer distinction between allosteric and orthosteric antagonists to enhance reader comprehension. An orthosteric antagonist typically binds to the same site as the endogenous ligand, directly blocking its interaction with the receptor. On the other hand, an allosteric antagonist binds to a site distinct from the orthosteric site, inducing a conformational change in the receptor that inhibits the binding of the endogenous ligand. By explicitly defining the terms "allosteric antagonist" and "orthosteric antagonist" within the manuscript, readers will be better equipped to discern the specific mechanisms discussed in the context of the study.

      The behavior of the compounds described in our manuscript (AGR1.135 and AGR1.137) fits the definition of allosteric antagonists in that they bind to a site distinct from the orthosteric site, although they block only some ligand-mediated functions and not others. They are therefore not formally antagonists, and they should not be considered allosteric antagonists, as their binding to CXCR4 does not alter CXCL12 binding, although it might affect its affinity. In this sense, our compounds fit much better the concept of negative allosteric modulators (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013): they bind to a site distinct from the orthosteric site and selectively block some, but not all, of the downstream signaling pathways induced by the same endogenous agonist.

      To avoid confusion and to clarify the role of the compounds described in this study, we now refer to them as negative allosteric modulators throughout the manuscript.

      (3) In the Results section, the computational approach employed for "screening small compounds targeting CXCR4, particularly focusing on the inhibition of CXCL12-induced CXCR4 nanoclustering", requires clarification due to several points of incomprehension. The following recommendations aim to address these concerns and enhance the overall clarity of the section:

      (1) Computational Approach and Binding Mode Description: 

      -Explicitly describe the methodology for identifying the pocket/cleft area in angstroms (Å) on the CXCR4 protein structure. Include details on how the volume of the cleft enclosed by TMV and TMVI was determined, as this information is not readily apparent in the provided reference (https://doi.org/10.1073/pnas.1601278113).

      The identification of the cleft was based on the observations by Wu et al. (Wu B. et al. Science 2010), who described the presence of bound lipids in the area formed by TMV and TMVI, and those of Wescott et al. (Wescott M.P. et al. Proc. Natl. Acad. Sci. 2016) on the importance of TMVI in transmitting the conformational changes promoted by CXCL12 on CXCR4 towards the cytoplasmic surface of the receptor, thereby linking the binding site with signaling activation. Collectively, these results, and our previous data on the critical role of the N-terminal region of TMVI for CXCR4 oligomerization (Martinez-Muñoz L. et al. Mol. Cell 2018), focused our in silico screening on this region. Once we detected that several compounds bound CXCR4 in this region, the cleft properties were calculated by subtracting the compound structure. The resulting PDB was analyzed using the PDBsum server (Laskowski R.A. et al. Protein Sci. 2018). Volume calculations were obtained using the server's analysis of surface clefts by SURFNET (Laskowski R.A. J. Mol. Graph. 1995). The theoretical interaction surface between the selected compounds and CXCR4, and the atomic distances between the protein residues and the compounds, were calculated using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007) (Fig. I, only for review purposes). The analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1,381 Å³ that were not connected to the orthosteric site. In the case of AGR1.137, the data revealed two distinct clefts of 790 Å³ and 580 Å³ (Fig. I, only for review purposes). These details have been included in the revised manuscript (New Fig. 1A, Supplementary Fig. 8A, B).

      (4) Clarify the statement regarding the cleft being "surface exposed for interactions with the plasma membrane," particularly in the context of its embedding within the membrane.

      For GPCRs, transmembrane domains represent binding sites for bioactive lipids that play important functional and physiological roles (Huwiler A. & Zangemeister-Wittke U. Pharmacol. Ther. 2018). The channel between TMV and TMVI connects the orthosteric chemokine binding pocket to the lipid bilayer and is occupied by an oleic acid molecule, according to the CXCR4 structure published in 2010 (Wu B. et al. Science 2010). In addition, the target region contains residues involved in cholesterol (and perhaps other lipids) engagement (Di Marino et al. Nat. Commun. 2023). Taken together, these data support our statement that the cleft supports interactions between CXCR4 molecules and the plasma membrane. 

      Moreover, the data of Di Marino et al. also support that CCR5 and CXCR4 have a symmetric and an asymmetric binding mode. Therefore, either dimeric structure has the possibility to form trimers, tetramers, and even oligomers by using the free binding interface to complex with another protomer. This hypothesis suggests that the interaction of dimers to form oligomers should involve residues distinct from those included in the dimeric conformation.

      The sentence has been modified in the revised manuscript to clarify comprehension.

      (5) Discuss the rationale behind targeting the allosteric binding pocket instead of the orthosteric pocket, outlining potential advantages and disadvantages.

      The advantages and disadvantages of using negative allosteric modulators vs orthosteric antagonists have now been included in the revised discussion.

      The majority of GPCR-targeted drugs function by binding to the orthosteric site of the receptor, and are agonists, partial agonists, antagonists or inverse agonists. These orthosteric compounds can have off-target effects and poor selectivity due to highly homologous receptor orthosteric sites and to abrogation of spatial and/or temporal endogenous signaling patterns. 

      The alternative is to use allosteric modulators, which can tune the functions associated with the receptors without affecting the orthosteric site. They can be positive, negative or neutral modulators, depending on their effect on the functionality of the receptor (Foster D.J. & Conn P.J. Neuron 2017). For example, using a negative allosteric modulator of a chemokine receptor to dampen pathological signaling events, while retaining full signaling for non-pathological activities, might limit adverse effects (Kohout T.A. et al. J. Biol. Chem. 2004). In this case, the negative allosteric modulator 873140 blocks CCL3 binding on CCR5 but does not alter CCL5 binding (Watson C. et al. Mol. Pharmacol. 2005). In other cases, allosteric modulators can stabilize a particular receptor conformation and block others. The mechanism of action of maraviroc, the FDA-approved anti-HIV-1 CCR5 allosteric modulator, is attributed to its ability to modulate CCR5 dimer populations and their subsequent subcellular trafficking and localization to the cell membrane (Jin J. et al. Sci. Signal. 2018). Two CCR5 dimeric conformations that are imperative for membrane localization were present in the absence of maraviroc; however, an additional CCR5 dimer conformation was discovered after the addition of maraviroc, and all homodimeric conformations were further stabilized. This finding is consistent with the observation that CCR5 dimers and oligomers inhibit HIV host-cell entry, likely by preventing formation of the HIV-1 co-receptor.

      It is well known that GPCRs activate G proteins, but they also recruit additional proteins (e.g., β-arrestins) that induce signaling cascades which, in turn, can direct specific subsets of cellular responses independent of G protein activation (Eichel K. et al. Nature 2018) and are responsible for either therapeutic or adverse effects. Allosteric modulators can thus be used to block these adverse effects without influencing the therapeutic benefits. This was the case in the design of G protein-biased agonists for the kappa opioid receptor, which maintain the desirable antinociceptive and antipruritic effects and eliminate the sedative and dissociative effects in rodent models (Brust T.F. et al. Sci. Signal 2016).

      (6) Provide the PDB ID of the CXCR4 structure used as a template for modeling with SwissModel. Explain the decision to model the structure from the amino acid sequence and suggest an alternative approach, such as utilizing AlphaFold structures and performing classical molecular dynamics with subsequent clustering for the best representative structure.

      The PDB entry used as a template for modeling CXCR4 was 3ODU. This information was already included in the Materials and Methods section. At the time we performed these analyses, several crystallographic structures of CXCR4 in complex with different molecules and peptides had been deposited in the PDB. None of them included a construct containing the complete receptor sequence, because the N- and C-terminal ends of CXCR4 are very flexible loops and such a construct is not a suitable sample for X-ray structure resolution. In addition, the CXCR4 constructs contained T4 lysozyme inserted between helices TMV and TMVI to increase the stability of the protein, a common strategy used to facilitate crystallogenesis of GPCRs (Zou Y. et al. PLoS One 2012). Therefore, we generated a CXCR4 homology model using the SWISS-MODEL server (Waterhouse A. et al. Nucleic Acids Res. 2018). This program reconstructed the loop between TMV and TMVI, a domain particularly important in this study that was not present in any of the crystal structures available in the PDB. The model structure was, nonetheless, still incomplete, as it began at P27 and ended at S319 because the terminal ends were not resolved in the crystal structure used as a template. Nevertheless, we considered that these terminal ends were not involved in CXCR4 oligomerization.

      As AlphaFold was not available at the time we initiated this project, we did not use it. However, we have now updated our workflow to current methods and predicted the structure of the target using AlphaFold (Jumper J. et al. Nature 2021) and the sequence available under UniProt entry P61073. We prepared the ligands using OpenBabel (O’Boyle N.M. et al. J. Cheminformatics 2011), with Gasteiger charge assignment, and generated 10 conformers for each input ligand using the OpenBabel genetic algorithm. We then prepared the target structure with OpenMM, removing all waters and possible heteroatoms and adding all missing atoms. We next predicted the target binding pockets with fpocket (Le Guilloux V. et al. BMC Bioinformatics 2009), P2Rank (Krivak R. & Hoksza D. J. Cheminformatics 2018), and AutoDock AutoSite (Ravindranath P.A. & Sanner M.F. Bioinformatics 2016). We chose only those pockets between TMV and TMVI (see answer to point 3). We merged the results of the three programs into so-called consensus pockets, in which two pockets are considered sufficiently similar if at least 75% of their surfaces are shared (del Hoyo D. et al. J. Chem. Inform. Model. 2023). Among the consensus pockets, one was significantly larger than the others and was therefore selected. We then docked the ligand conformers in this pocket using AutoDock GPU (Santos-Martins D. et al. J. Chem. Theory Comput. 2021), LeDock (Liu N. & Xu Z. IOP Conf. Ser. Earth Environ. Sci. 2019), and Vina (Eberhardt J. et al. J. Chem. Inf. Model. 2021). The number of docked poses ranged from 210 to 287. We scored each pose with the Vina score using ODDT (Wójcikowski M. et al. J. Cheminform. 2015). Then, we clustered the different solutions into groups whose maximum RMSD was 1 Å. This resulted in 40 clusters, each represented by the pose with the maximum Vina score, and confirmed that the selected compounds bound this pocket (Author response image 1). When required, we calculated the binding affinity using Schrödinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013) in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3 Å from the ligand. This information has now been included in the revised version of the manuscript.
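      The pose-clustering step described above can be illustrated with a minimal sketch. It is not the authors' code: the response does not name the clustering algorithm, so a greedy leader scheme is assumed here, in which every member lies within the RMSD cutoff of its cluster representative. It also assumes that a lower (more negative) score is better, as is conventional for Vina-style scores in kcal/mol; the function names and toy coordinates are hypothetical.

```python
import math

def rmsd(pose_a, pose_b):
    """RMSD (in Å) between two poses given as lists of (x, y, z) atoms
    in the same atom order. No superposition is applied, because docked
    poses already share the receptor's coordinate frame."""
    squared = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))

def cluster_poses(poses, scores, cutoff=1.0):
    """Greedy leader clustering (an assumed scheme, see lead-in).

    Poses are visited from best to worst score. A pose joins the first
    cluster whose representative is within `cutoff` Å; otherwise it
    starts a new cluster and becomes its representative.
    Returns a list of (representative_index, member_indices)."""
    # Assumption: lower score = better pose.
    order = sorted(range(len(poses)), key=lambda i: scores[i])
    clusters = []
    for i in order:
        for rep, members in clusters:
            if rmsd(poses[i], poses[rep]) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append((i, [i]))
    return clusters

# Hypothetical toy data: three two-atom poses and their scores.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],  # 0.5 Å away from pose 0
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],  # far from both
]
scores = [-7.0, -8.0, -6.0]

for rep, members in cluster_poses(poses, scores):
    print(f"representative pose {rep}, members {members}")
```

      Because poses are visited in score order, each cluster's representative is automatically its best-scored member, which mirrors the selection rule stated in the response.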

      Author response image 1.

      AGR1.135 docking in CXCR4 using the updated protocol for ligand docking. Cartoon representation colored in gray with TMV and TMVI shown in blue and pink, respectively. AGR1.135 is shown in stick representation with carbons in yellow, oxygens in red and nitrogens in blue.

      (7) Specify the meaning of "minimal interaction energy" and where (if present) the interaction scores are reported in the text.

      By minimal interaction energy we mean the best docking score, that is, the best score obtained in our docking studies. These data were not included in the previous manuscript due to space restrictions but are now included in the revised manuscript.

      (8) You performed docking studies using GLIDE to identify potential binding sites for the small compounds on the CXCR4 protein. The top-scoring binders were then subjected to further refinement using PELE simulations. However, I realize that a detailed description of the specific binding modes of these compounds was not provided in the text. Please make the description of binding poses more detailed

      First, to assess the reliability of this method, a PELE study was carried out for the control molecule IT1t, a small drug-like isothiourea derivative that has been crystallized in complex with CXCR4 (PDB code: 3ODU). IT1t is a CXCR4 antagonist that binds to the CXCL12 binding cavity and inhibits HIV-1 infection (Das D. Antimicrob. Agents Chemother. 2015; Dekkers S. et al. J. Med. Chem. 2023). Of the five best trajectories, two had clearly better binding energies and corresponded to almost the same predicted pose of the molecule. Although the predicted binding mode was not exactly the same as the one in the crystal structure, the approximation was very good, which validates the approach. Although PELE is a suitable technique for finding potential binding sites, the predicted poses must subsequently be refined using docking programs.

      Analyzing the best trajectories for the remaining ligands, at least one of the best-scored poses was always located at the orthosteric binding site of CXCR4. Even though these poses showed good binding energies, they were discarded as the in vitro biological experiments indicated that the compounds were unable to block CXCL12 binding or CXCL12-mediated inhibition of cAMP release or CXCR4 internalization. Collectively, these data indicated that the selected compounds did not behave as orthosteric inhibitors of CXCR4. The CXCL12 binding pocket is the biggest cavity in CXCR4, and so PELE may tend to place the molecules near it. However, all the compounds presented other feasible binding sites with a comparable binding energy.

      AGR1.135 and AGR1.137 showed interesting poses between TMV and TMVI with very good binding energy (-51.4 and -37.2 kcal/mol, respectively). This was precisely the region we had previously selected for the in silico screening, as previously described (see response to point 3).

      AGR1.131 showed two poses with low binding energy that were placed between helices TMI and TMVII (-43.6 kcal/mol) and between helices TMV and TMVI (-39.8 kcal/mol). This compound was unable to affect CXCL12-mediated chemotaxis and was therefore used as an internal negative control, as it was selected in the in silico screening with the same criteria as the other compounds but failed to alter any CXCL12-mediated functions. PELE studies nonetheless provided different binding sites for each molecule, which had to be further studied using docking to obtain a more accurate binding mode. In agreement with the previous comment, we repeated the analysis using AlphaFold and the rest of the procedure described (see our response to point 6) and calculated the binding energies for all the compounds using Schrödinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013). Calculations were performed in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3 Å from the ligand. With the first method, AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -56.4 and -62.4 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -61.6 kcal/mol. With the second method, AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -57.9 and -67.6 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -62.2 kcal/mol.
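      For readers who want the two sets of MM-GBSA values side by side, the short tabulation below collects the numbers quoted in the paragraph above and computes the change on local minimization. The energies are copied from the response; the dictionary layout, the `minimization_shift` helper and the shift itself as a quantity of interest are illustrative assumptions, not part of the authors' analysis.

```python
# MM-GBSA binding energies (kcal/mol) as quoted in the response:
# "fixed" = ligand and target held fixed,
# "minimized" = atoms within 3 Å of the ligand energy-minimized.
energies = {
    "AGR1.135": {"site": "TMV-TMVI", "fixed": -56.4, "minimized": -57.9},
    "AGR1.137": {"site": "TMV-TMVI", "fixed": -62.4, "minimized": -67.6},
    "AGR1.131": {"site": "TMI-TMVII", "fixed": -61.6, "minimized": -62.2},
}

def minimization_shift(entry):
    """Change in binding energy on local minimization (hypothetical helper).
    Negative values mean the minimized complex scores as more favorable."""
    return round(entry["minimized"] - entry["fixed"], 1)

for name, entry in energies.items():
    print(
        f"{name:9s} {entry['site']:10s} "
        f"fixed {entry['fixed']:6.1f}  minimized {entry['minimized']:6.1f}  "
        f"shift {minimization_shift(entry):5.1f}"
    )
```

      The shift is a plain difference of the reported values and carries no statistical weight; the response reports a single value per compound and method.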

      This information is now included in the text.

      (9) (2) Experimental Design:-Justify the choice of treating Jurkat cells with a concentration of 50 μM of the selected compound. Consider exploring different concentrations and provide a rationale for the selected dosage. Additionally, clearly identify the type of small compound used in the initial experiment.

      The revised version contains a new panel in Fig. 1B showing a more detailed kinetic analysis with different concentrations (1-100 µM) of the compounds in the Jurkat migration experiments. In all cases, 100 µM nearly completely abrogated cell migration, but to reduce the amount of DMSO added to the cells we selected 50 µM for further experiments, as this concentration inhibited 50-75% of ligand-induced cell migration. Regarding the type of small compounds used in the initial experiments, they were included in the library described in reference #24 (Sebastián-Pérez V. et al. Med. Biol. Chem. 2017), which contains heterocyclic compounds. We would note that we do not consider AGR1.137 a final compound. We think that there is scope to develop AGR1.137-based second-generation compounds with greater solubility in water and greater specificity or affinity for CXCR4, and to evaluate delivery methods that might increase activity.

      (10) Avoid reporting details in rounded parentheses within the text; consider relocating such information to the Materials and Methods section or figure captions for improved readability.

      Most of the rounded parentheses within the text have been eliminated in the revised version of the manuscript to improve readability.

      (11) Elaborate on the virtual screening approach using GLIDE software, specifying the targeted site and methodology employed.

      For the virtual screening, we used the Glide module (SP and XP scoring functions) included in the Schrödinger software package, utilizing the corresponding 3D target structure and our MBC library (Sebastián-Pérez V. et al. J. Chem. Inf. Model. 2017). The center of the catalytic pocket was selected as the centroid of the grid. In the grid generation, a scaling factor of 1.0 for the van der Waals radii and a partial charge cutoff of 0.25 were used. The SP poses of each compound were then rescored with the Glide XP scoring function. In XP mode, ligand sampling was flexible, Epik state penalties were added, and an energy window of 2.5 kcal/mol was used for ring sampling. In the energy minimization step, the distance-dependent dielectric constant was 4.0, with a maximum of 100,000 minimization steps. In the clustering, poses were considered duplicates and discarded if both the RMS deviation was less than 0.5 Å and the maximum atomic displacement was less than 1.3 Å.
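      The duplicate-pose rule stated above combines two thresholds, and a small sketch may help show why both are needed. This is an independent illustration, not Glide's implementation: it assumes that both poses list the same atoms in the same order and ignores molecular symmetry; the function names and toy coordinates are hypothetical.

```python
import math

def atom_displacements(pose_a, pose_b):
    """Per-atom Euclidean distance (Å) between two poses of the same ligand."""
    return [math.dist(a, b) for a, b in zip(pose_a, pose_b)]

def is_duplicate(pose_a, pose_b, rms_cutoff=0.5, max_disp_cutoff=1.3):
    """Apply the rule quoted in the response: two poses are duplicates only
    if the RMS deviation is below 0.5 Å AND the largest single-atom
    displacement is below 1.3 Å."""
    d = atom_displacements(pose_a, pose_b)
    rms = math.sqrt(sum(x * x for x in d) / len(d))
    return rms < rms_cutoff and max(d) < max_disp_cutoff

# Hypothetical eight-atom ligand laid out along the x axis.
reference = [(float(i), 0.0, 0.0) for i in range(8)]

# One atom moved by 1.0 Å: RMS is about 0.35 Å and the maximum is 1.0 Å.
small_shift = [(0.0, 1.0, 0.0)] + reference[1:]

# One atom moved by 1.4 Å: RMS is about 0.49 Å (passes the RMS test),
# but the maximum displacement of 1.4 Å fails the second test.
one_atom_flip = [(0.0, 1.4, 0.0)] + reference[1:]

print(is_duplicate(reference, small_shift))
print(is_duplicate(reference, one_atom_flip))
```

      The second toy pose shows the purpose of the maximum-displacement test: a single substituent that moves substantially can leave the overall RMS deviation under 0.5 Å, so the RMS criterion alone would discard a pose that differs locally.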

      (12) Provide clarity on the statement that AGR1.131 "theoretically" binds the same motif, explaining the docking procedure used for this determination.

      In the in silico screening, AGR1.131 was one of the 40 selected compounds that showed, according to the PELE analysis (see answer to point 8), a pose with low binding energy (-39.8 kcal/mol) between the TMV and TMVI helices, which is the area selected for the screening. Using the initial workflow, however, its best pose was placed between helices TMI and TMVII (-43.7 kcal/mol). In conclusion, although AGR1.131 could also be placed at the TMV-TMVI cleft, its most favorable pose was in the area between TMI and TMVII. In addition, the compound was included in the biological screening, where it did not affect CXCL12-mediated chemotaxis. We thus decided to use it as an internal negative control, as it has a skeleton very similar to AGR1.135 and AGR1.137 and can interact with the TM domains of CXCR4 without promoting biological effects. This statement has been clarified in the revised text.

      (13) Toxicity Testing:

      -Enhance the explanation of the approach to testing the toxicity of the compound in Jurkat cells. Consider incorporating positive controls to strengthen the assessment and clarify the experimental design.

      All the selected compounds in the in silico screening were initially tested for propidium iodide incorporation in treated cells in a toxicity assay, and some of them were discarded for further experiments (e.g., AGR1.103 and VSP3.1).

      Jurkat cell viability was further evaluated by cell cycle analysis using propidium iodide. Supplementary Fig. 1B included the percentage of cells in each cell cycle phase, and the data indicated no significant differences between the treatments tested. Nevertheless, at the suggestion of the reviewer, and to clarify this issue, positive controls inducing Jurkat cell death (staurosporine and hydrogen peroxide) have also been included in the new Supplementary Fig. 2. The new figure also includes a table showing the percentage of cells in each cell-cycle phase.

      (14) In the Results section concerning "AGR1.135 and AGR1.137 blocking CXCL12-mediated CXCR4 nanoclustering and dynamics", several points can be improved to enhance clarity and coherence: 1. Specificity of Low Molecular Weight Compounds:  

      -Clearly articulate how AGR1.135 and AGR1.137 specifically target homodimeric CXCR4 and provide an explanation for their lack of impact on heterodimeric CXCR4-CCR5 in that region.

      First of all, we should clarify that when we talk about receptor nanoclustering, oligomers refer to complexes including 3 or more receptors and, therefore, the residues involved in these interactions can differ from those involved in receptor dimerization. Moreover, our FRET experiments did not indicate that the compounds alter receptor dimerization (see new Supplementary Fig. 7). Of note, mutant receptors unable to oligomerize can still form dimers (Martínez-Muñoz L. et al. Mol. Cell 2018; García-Cuesta E.M. et al. Proc. Natl. Acad. Sci. USA 2022). Additionally, we believe that these oligomers can also include other chemokine receptors/proteins expressed at the cell membrane, which we are currently studying using different models and techniques.

      We have results supporting the existence of CCR5/CXCR4 heterodimers (Martínez-Muñoz L. et al. Proc. Natl. Acad. Sci. USA 2014), in line with the data published by Di Marino et al. However, in the current study we have not evaluated the impact of the selected compounds on CXCR4 complexes other than CXCR4 oligomers. Our Jurkat cells do not express CCR5 and, therefore, we cannot discuss whether AGR1.137 affects CCR5/CXCR4 heterodimers. The chemokine field is very complex, and most receptors can form dimers (homo- and heterodimers) as well as oligomers when co-expressed (Martinez-Muñoz L. et al. Pharmacol. Ther. 2011). Evaluating different receptor combinations in the same experiment is a complex task, as the number of potential combinations between distinct expressed receptors makes the analysis very difficult. We started with CXCR4 as a model and will continue later with other possible CXCR4 complexes. In addition, for the analysis of CCR5/CXCR4 dynamics, it is much better to use dual-TIRF techniques, which allow the simultaneous detection of two distinct molecules coupled to different fluorochromes.

      Regarding the data of Di Marino et al., it is possible that the compounds might also affect heterodimeric conformations of CXCR4. This aspect has also been broached in the revised discussion. We would again note that we evaluated CXCR4 oligomers and not monomers or dimers; this is especially relevant when we compare the residues involved in these processes as they might differ depending on the receptor conformation considered. This issue was also hypothesized by Di Marino et al. (see our response to point 4).

      (15) When referring to "unstimulated" cells, provide a more detailed explanation to elucidate the experimental conditions and cellular state under consideration.

      Unstimulated cells refer to the cells in basal conditions, that is, cells in the absence of CXCL12. For TIRF-M experiments, transiently-transfected Jurkat cells were plated on glass-bottomed microwell dishes coated with fibronectin; these are the unstimulated cells. To observe the effect of the ligand, dishes were coated as above plus CXCL12 (stimulated cells). We have clarified this point in the material and methods section of the revised version.

      (16) 2. Paragraph Organization

      -Reorganize the second paragraph to eliminate redundancy and improve overall flow. A more concise and fluid presentation will facilitate reader comprehension and engagement.

      The second paragraph has been reorganized to improve overall flow.

      (17) Ensure that each paragraph contributes distinct information, avoiding repetition and redundancy.

      We have carefully revised each paragraph of the manuscript to avoid redundancy.

      (18) 3. Claim of Allosteric Antagonism:

      -Exercise caution when asserting that "AGR1.135 and AGR1.137 behave as allosteric antagonists of CXCR4" based on the presented results. Consider rephrasing to reflect that the observed effects suggest the potential allosteric nature of these compounds, acknowledging the need for further investigations and evidence.

      To avoid misinterpretations on the effect of the compounds on CXCR4, as we have commented in our response to point 2, we have substituted the term allosteric inhibitors with negative allosteric modulators, which refer to molecules that act by binding a site distinct from the orthosteric site, and selectively block some downstream signaling pathways, whereas others induced by the same endogenous or orthosteric agonist are unaffected (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013). Our data indicate that the selected small compounds do not block ligand binding or G protein activation or receptor internalization, but inhibit receptor oligomerization and ligand-mediated directed cell migration.

      (19) In the Results section discussing the "incomplete abolition of CXCR4-mediated responses in Jurkat cells by AGR1.135 and AGR1.137", several points can be refined for better clarity and completeness:  1. Inclusion of Positive Controls: 

      -Consider incorporating positive controls in relevant experiments to provide a comparative benchmark for assessing the impact of AGR1.135 and AGR1.137. This addition will strengthen the interpretation of results and enhance the experimental rigor. 

      The in vivo experiments (Fig. 7E,F) used AMD3100, an orthosteric antagonist of CXCR4, as a positive control. We also included AMD3100, as a positive control of inhibition when evaluating the effect of the compounds on CXCL12 binding (Fig. 3, new Supplementary Fig. 3). The revised version of the manuscript also includes the effect of this inhibitor on other relevant CXCL12-mediated responses such as cell migration (Fig. 1B), receptor internalization (Fig. 3A), cAMP production (Fig. 3C), ERK1/2 and AKT phosphorylation (Supplementary Fig. 4), actin polymerization (Fig. 4A), cell polarization (Fig. 4B, C) and cell adhesion (Fig. 4D), to facilitate the interpretation of the results and improve the experimental rigor.

      (20) 2. Clarification of Terminology: 

      -Clarify the term "CXCR4 internalizes" by providing context, perhaps explaining the process of receptor internalization and its relevance to the study.

      We refer to CXCR4 internalization as a CXCL12-mediated endocytosis process that results in a reduction of CXCR4 levels on the cell surface. We use CXCR4 internalization in this study for two purposes. First, for CXCR4 and other chemokine receptors, internalization processes are mediated by ligand-induced clathrin vesicles (Venkatesan et al. 2003), a process that triggers CXCR4 aggregation in these vesicles. We have previously determined that the oligomers of receptors detected by TIRF-M remain unaltered in cells treated with inhibitors of clathrin vesicle formation and of internalization processes (Martinez-Muñoz L. et al. Mol. Cell 2018). Moreover, we have described a mutant CXCR4 that cannot form oligomers but internalizes normally in response to CXCL12 (Martinez-Muñoz L. et al. Mol. Cell 2018). The observation in this manuscript of normal CXCL12-mediated endocytosis in the presence of the negative allosteric modulators of CXCR4 that abrogate receptor oligomerization reinforces the idea that the oligomers detected by TIRF-M are not related to the receptor aggregates involved in endocytosis. Second, receptor internalization is not affected by the allosteric compounds, indicating that they downregulate some CXCL12-mediated signaling events but not others (new Fig. 3).

      All these data have been included in the revised discussion of the manuscript.

      (21) Elaborate on the meaning of "CXCL12 triggers normal CXCR4mut internalization" to enhance reader understanding.

      We have previously described a triple-mutant CXCR4 (K239L/V242A/L246A; CXCR4mut). The mutant residues are located in the N-terminal region of TMVI, close to the cytoplasmic region, thus limiting the CXCR4 pocket described in this study (see our response to point 3). This mutant receptor dimerizes but neither oligomerizes in response to CXCL12 nor supports CXCL12-induced directed cell migration, although it can still trigger some Ca2+ flux and is internalized after ligand activation (Martinez-Muñoz L. et al. Mol. Cell 2018).  We use the behavior of this mutant (CXCR4mut) to show that the CXCR4 oligomers and the complexes involved in internalization processes are not the same and to explain why we evaluated CXCR4 endocytosis in the presence of the negative allosteric modulators.

      As we indicated in a previous answer to the reviewer, these issues have been re-elaborated in the revised version.

      (22) 3. Discrepancy in CXCL12 Concentration:

      -Address the apparent discrepancy between the text stating, "...were stimulated with CXCL12 (50 nM, 37°C)," and the figure caption (Fig. 3A) reporting a concentration of 12.5 nM. Rectify this inconsistency and provide an accurate and clear explanation.

      We apologize for this error, which is now corrected in the revised manuscript. With the exception of the cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, in the remaining experiments the optimal concentration of CXCL12 employed was 50 nM. These concentrations were optimized in previous works of our laboratory using the same type of experiment. We should also remark that in the experiments using lipid bilayers or TIRF-M experiments, CXCL12 is used to coat the plates and therefore it is difficult to determine the real concentration of the ligand that is retained in the surface of the plates after the washing steps performed prior to adding the cells. In addition, we use 100 nM CXCL12 to create the gradient in the chambers used to perform the directed-cell migration experiments.

      (23) 4. Speculation on CXCL12 Binding:

      -Refrain from making speculative statements, such as "These data suggest that none of the antagonists alters CXCL12 binding to CXCR4," unless there is concrete evidence presented up to that point. Clearly outline the results that support this conclusion.

      Figure 3B and Supplementary Figure 3 show CXCL12-ATTO700 binding by flow cytometry in cells pretreated with the negative allosteric modulators. We have also included AMD3100, the orthosteric antagonist, as a control for inhibition. While these experiments showed no major effect of the compounds on CXCL12 binding, we cannot discard small changes in the affinity of the interaction between CXCL12 and CXCR4. In consequence we have re-written these statements.

      (24) 5. Corroboration of Data:

      -Specify where the corroborating data from immunostaining and confocal analysis are reported, ensuring readers can access the relevant information to support the conclusions drawn in this section.

      In agreement with the suggestion of the reviewer, the revised manuscript includes data from immunostaining and confocal analysis to complement Fig. 4B (new Fig. 4C). The revised version also includes some representative videos for the TIRF experiments shown in Figure 2 to improve clarity.

      (25) In the Results section concerning "AGR1.135 and AGR1.137 antagonists and their direct binding to CXCR4", several aspects need clarification and refinement for a more comprehensive and understandable presentation: 1. Workflow Clarification:

      -Clearly articulate the workflow used for assessing the binding of AGR1.135 and AGR1.137 to CXCR4. Address the apparent contradiction between the inability to detect a direct interaction and the utilization of Glide for docking in the TMV-TMVI cleft.

      To address the direct interaction of the compounds with CXCR4, we intentionally avoided the modification of the small compounds with different labels, which could affect their properties. We therefore attempted a fluorescence spectroscopy strategy to formally prove the ability of the small compounds to bind CXCR4, but this failed because AGR1.135 is yellow in color, which interfered with the determinations. We also tried a FRET strategy (see new Supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers when AGR1.135 was evaluated, but again the yellow color interfered with FRET determinations. Moreover, AGR1.137 did not modify FRET efficiency of CXCR4 dimers. Therefore, we were unable to detect the interaction of the compounds with CXCR4 directly.

      We elected to develop an indirect strategy; in silico, we evaluated the binding-site using docking and molecular dynamics to predict the most promising CXCR4 binding residues involved in the interaction with the selected compounds. Next, we generated point mutant receptors of the predicted residues and re-evaluated the behavior of the allosteric antagonists in a CXCL12-induced cell migration experiment. Obviously, we first discarded those CXCR4 mutants that were not expressed on the cell membrane as well as those that were not functional when activated with CXCL12. Using this strategy, we eliminated the interference due to the physical properties of the compounds and demonstrated that if the antagonism of a compound is reversed in a particular CXCR4 mutant it is because the mutated residue participates or interferes with the interaction between CXCR4 and the compound, thus assuming (albeit indirectly) that the compound binds CXCR4. 

      To select the specific mutations included in the analysis, our strategy was to generate point mutations in residues present in the TMV-TMVI pocket of CXCR4 that were not directly proposed as critical residues involved in chemokine engagement, signal initiation, signal propagation, or G protein-binding, based on the extensive mutational study published by Wescott et al. (Wescott M.P. et al. Proc. Natl. Acad. Sci. USA 2016).

      (26) Provide a cohesive explanation of the transition from docking evaluation to MD analysis, ensuring a transparent representation of the methodology.

      Based on the aim of this work, the workflow shown in Author response image 2 was proposed to predict the binding mode of the selected molecules. First, a CXCR4 model was generated to reconstruct some unresolved parts of the protein structure; then a binding site search using PELE software was performed to identify the most promising binding sites; subsequently, docking studies were performed to refine the binding mode of the molecules; and finally, molecular dynamics simulations were run to determine the most stable poses and predict the residues that we should mutate to test whether the compounds interact with CXCR4.

      Author response image 2.

      Workflow followed to determine the binding mode of the studied compounds.

      (27) 2. Choice of Software and Techniques:

      -Justify the use of "AMBER14" and the PELE approach, considering their potential obsolescence.

      These experiments were performed five years ago when the project was initiated. As the reviewer indicates, AMBER14 and PELE approaches might perhaps be considered obsolescent. Thus, we have predicted the structure of the target using AlphaFold (Jumper J. et al, Nature 2021) and the sequence available under UniProt entry P61073. The complete analysis performed (see our response to point 4) confirmed that the compounds bound the selected pocket, as we had originally determined using PELE. These new analyses have been incorporated into the revised manuscript.

      (28) -Discuss the role of the membrane in the receptor-ligand interaction. Elaborate on how the lipidic double layer may influence the binding of small compounds to GPCRs embedded in the membrane.

      Biological membranes are vital components of living organisms, providing a diffusion barrier that separates cells from the extracellular environment, and compartmentalizing specialized organelles within the cell. In order to maintain the diffusion barrier and to keep it electrochemically sealed, a close interaction of membrane proteins with the lipid bilayer is necessary. It is well known that this is important, as many membrane proteins undergo conformational changes that affect their transmembrane regions and that may regulate their activity, as seen with GPCRs (Daemen F.J. & Bonting S.L., Biophys. Struct. Mech. 1977; Gether U. et al. EMBO J. 1997). The lateral and rotational mobility of membrane lipids supports the sealing function while allowing for the structural rearrangement of membrane proteins, as they can adhere to the surface of integral membrane proteins and flexibly adjust to a changing microenvironment. In the case of the first atomistic structure of CXCR4 (Wu B. et al. Science 2010), it was indicated that for dimers, monomers interact only at the extracellular side of helices V and VI, leaving at least a 4-Å gap between the intracellular regions, which is presumably filled by lipids. In particular, they indicated that the channel between TMV and TMVI that connects the orthosteric chemokine binding pocket to the lipid bilayer is occupied by an oleic acid molecule. Recently, Di Marino et al., analyzing the dimeric structure of CXCR4, found a cholesterol molecule placed in between the two protomers, where it engages a series of hydrophobic interactions with residues located in the area between TMI and TMVI (Leu132, Val214, Leu216, Leu246, and Phe249). The polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes its binding mode. This finding confirms that cholesterol might play an important role in mediating and stabilizing receptor dimerization, as seen in other GPCRs (Pluhackova, K., et al. PLoS Comput. Biol. 2016). 
In addition, we have previously observed that, independently of the structural changes on CXCR4 triggered by lipids, the local lipid environment also regulates CXCR4 organization, dynamics and function at the cell membrane and modulates chemokine-triggered directed cell migration. Prolonged treatment of T cells with bacterial sphingomyelinase promoted the complete and sustained breakdown of sphingomyelins and the accumulation of the corresponding ceramides, which altered both membrane fluidity and CXCR4 nanoclustering and dynamics. Under these conditions, CXCR4 retained some CXCL12-mediated signaling activity but failed to promote efficient directed cell migration (Gardeta S.R. et al. Front. Immunol. 2022). Collectively, these data demonstrate the key role that lipids play in the stabilization of CXCR4 conformations and in regulating its lateral mobility, influencing their associated functions. These considerations have been included in the revised version of the manuscript. 

      (29) 3. Stable Trajectories and Binding Mode Superimposition: -Specify the criteria for defining "stable trajectories" to enhance reader understanding.

      There are several ways to describe the stability of an MD simulation, based on the convergence of energies, distances or ligand-target interactions, among others. In this work, we use the expression “stable trajectories” to refer to simulations in which the ligand trajectory converges and the ligand RMSD does not fluctuate by more than 0.25 Å. This definition is now included in the revised text.
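As an illustration only, the RMSD criterion above could be checked along the following lines. This is a minimal sketch, not the analysis used in the study: the function name `is_stable`, the choice to measure fluctuation as the max-min range, and the restriction to the final half of the run (`tail_fraction`) are all assumptions introduced here; only the 0.25 Å threshold comes from the text.

```python
def is_stable(rmsd, tail_fraction=0.5, tolerance=0.25):
    """Check a per-frame ligand RMSD series (in Å) for stability.

    Hypothetical helper: the trajectory is called stable when the RMSD
    range (max - min) over the final `tail_fraction` of the frames does
    not exceed `tolerance`. The windowing and the use of the range are
    assumptions; the text only states the 0.25 Å threshold.
    """
    if not rmsd:
        raise ValueError("rmsd must contain at least one frame")
    if not 0 < tail_fraction <= 1:
        raise ValueError("tail_fraction must be in (0, 1]")
    # Index of the first frame belonging to the tail window.
    start = int(len(rmsd) * (1 - tail_fraction))
    tail = rmsd[start:]
    return max(tail) - min(tail) <= tolerance
```

A series that drifts early but settles, such as `[3.0, 2.0, 1.5, 1.4, 1.5, 1.45]`, passes this check, whereas one that still jumps by 0.6 Å in its final frames does not.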

      (30)  Clarify the meaning behind superimposing the two small compounds and ensure that the statement in the figure caption aligns with the information presented in the main text.

      We apologize for the error in the previous Fig. 5A and in its legend. The figure was created by superimposing the protein component of the poses for the two compounds, AGR1.135 and AGR1.137, rather than the compounds themselves. As panel 5A was confusing, we have modified all Fig. 5 in the revised manuscript to improve clarity.

      (31) 4. Volume Analysis and Distances:

      -Provide details on how the volume analysis was computed and how distances were accounted for. Consider adding a figure to illustrate these analyses, aiding reader comprehension.

      The cleft search and analysis were performed using the default settings of SURFNET (Laskowski R.A. J. Mol. Graph. 1995) included in the PDBsum server (Laskowski R.A. et al. Trends Biochem. Sci. 1997). The first run of the input model for CXCR4 3ODU identified a promising cleft of 870 Å³ in the lower half of the region flanked by TMV and TMVI, highlighting this area as a possible small molecule binding site (Fig. I, only for review purposes). Analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1381 Å³ that were not connected to the orthosteric site. The same procedure for AGR1.137 revealed two distinct clefts of 790 Å³ and 580 Å³, respectively (Fig. I, only for review purposes). Analysis of the atomic distances between the protein residues and the compounds was performed using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007). (Please see our response to point 3 and the corresponding figure).

      (32) 5. Mutant Selection and Relevance:

      -Clarify the rationale behind selecting the CXCR4 mutants used in the study. Consider justifying the choice and exploring the possibility of performing an alanine (ALA) scan for a more comprehensive mutational analysis.  

      The selection of the residues to be mutated along the cleft was first based on their presence in the proposed cleft and the direct interaction of the compounds with them, either by hydrogen bonding or by hydrophobic interactions. Secondly, none of the mutated residues belonged to the critical residues involved in transmitting the signal generated by the interaction of CXCL12 with the receptor. In any case, mutants producing a non-functional CXCR4 at the cell membrane were discarded after FACS analysis and chemotaxis experiments. Finally, the length and nature of the resulting mutations were designed mainly to occlude the cleft through the introduction of long residues such as lysines (I204K, L208K) or to alter hydrophobic interactions by changing the carbon side chain composition of the residues in the cleft. We agree that an alanine scan would have been an alternative strategy to evaluate the residues involved in the interactions of the compounds.

      (33) Reevaluate the statement regarding the relevance of the Y256F mutation for the binding of AGR1.137. If there is a significant impact on migration in the mutant (Fig. 6B), elaborate on the significance in the context of AGR1.137 binding.

      In the revised discussion we provide more detail on the relevance of Y256F mutation for the binding of AGR1.137 as well as for the partial effect of G207I and R235L mutations. The predicted interactions for each compound are depicted in new Fig. 6 C, D after LigPlot+ analysis (Laskowski R.A. & Swindells M.B. J. Chem. Inf. Model. 2011), showing that AGR1.135 interacted directly with the receptor through a hydrogen bond with Y256. When this residue was mutated to F, one of the anchor points for the compound was lost, weakening the potential interaction in the region of the upper anchor point.

      It is not clear how the Y256F mutation will affect the binding of AGR1.137, but other potential contacts cannot be ruled out since that portion of the compound is identical in both AGR1.135 and AGR1.137. This is especially true for its neighboring residues in the alpha helix, F249, L208, as shown in 3ODU structure (Fig. 6D), which are shown to be directly implicated in the interaction of both compounds. Alternatively, we cannot discard that Y256 interacts with other TMs or lipids stabilizing the overall structure, which could reverse the effect of the mutant at a later stage (Author response image 3).

      Author response image 3.

      Cartoon representation of Y256 and its intramolecular interactions in the CXCR4 Xray solved structure 3ODU. TMV helix is colored in blue and TMVI in pink.

      (34) Address the apparent discrepancy in residue involvement between AGR1.135 and AGR1.137, particularly if they share the same binding mode in the same cleft.

      AGR1.135 and AGR1.137 exhibit comparable yet distinct binding modes, engaging with CXCR4 within a molecular cavity formed by TMV and TMVI. AGR1.135 binds to CXCR4 through three hydrogen bonds, two on the apical side of the compound that interact with residues TMV-G207 and TMVI-Y256 and one on the basal side that interacts with TMVI-R235 (Fig. 5A). This results in a more extended and rigid conformation when sharing hydrogen bonds, with both TMs occupying a surface area of 400 Å² and a length of 20 Å in the cleft between TMV and TMVI (Supplementary Fig. 8A). AGR1.137 exhibits a distinct binding profile, interacting with a more internal region of the receptor. This interaction involves the formation of a hydrogen bond with TMIII-V124, which induces a conformational shift in the TMVI helix towards an active conformation (Fig. 5B; Supplementary Fig. 13). Moreover, AGR1.137 may utilize the carboxyl group of V124 in TMIII and overlap with AGR1.135 binding in the cavity, interacting with the other 19 residues dispersed between TMV and TMVI to create an interaction surface of 370 Å² along 20 Å (Supplementary Fig. 8B). This is illustrated in the new Fig. 5B. AGR1.137 lacks the phenyl ring present in AGR1.135, resulting in a shorter compound with greater difficulty in reaching the lower part of TMVI where R235 sits.

      Author response image 4.

      AGR1.135 and AGR1.137 interaction with TMV and TMVI.  The model shows the location of the compounds within the TMV-VI cleft, illustrated by a ribbon and stick representation. The CXCR4 segments of TMV and TMVI are represented in blue and pink ribbons respectively, and side chains for some of the residues defining the cavity are shown in sticks. AGR1.135 and AGR1.137 are shown in stick representation with carbon in yellow, nitrogen in blue, oxygen in red, and fluorine in green. Hydrogen bonds are indicated by dashed black lines, while hydrophobic interactions are shown in green. The figure reproduces the panels A, B of Fig. 5 in the revised manuscript.

      (35) In the Results section regarding "AGR1.137 treatment in a zebrafish xenograft model", the following points can be refined for clarity and completeness: 1. Cell Line Choice for Zebrafish Xenograft Model:

      -Explain the rationale behind the choice of HeLa cells for the zebrafish xenograft model when the previous experiments primarily focused on Jurkat cells. Address any specific biological or experimental considerations that influenced this decision.

      As far as we know, there are no available models of tumors in zebrafish using Jurkat cells. We looked for a tumoral cell system that expresses CXCR4 and could be transplanted into zebrafish. HeLa cells are derived from a human cervical tumor, express a functional CXCR4, and have been previously used for tumorigenesis analyses in zebrafish (Brown H.K. et al. Expert Opin. Drug Discover. 2017; You Y. et al. Front. Pharmacol. 2020). These cells grow in the fish and disseminate through the ventral area and can be used to determine primary tumor growth and metastasis. Nonetheless, we first analyzed in vitro the expression of a functional CXCR4 in these cells (Supplementary Fig. 10A), whether AGR1.137 treatment specifically abrogated CXCL12-mediated directed cell migration (Fig. 7A, B), and whether it affected cell proliferation (Supplementary Fig. 10B). As HeLa cells reproduce the in vitro effects detected for the compounds in Jurkat cells, we used this model in zebrafish. These issues were already discussed in the first version of our manuscript.

      (36) 2. Toxicity Assessment in Zebrafish Embryos: 

      -Clarify the basis for stating that AGR1.137 is not toxic to zebrafish embryos. Consider referencing the Zebrafish Embryo Acute Toxicity Test (ZFET) and provide relevant data on lethal concentration (LC50) and non-lethal toxic phenotypes such as pericardial edema, head and tail necrosis, malformation, brain hemorrhage, or yolk sac edema.

      Tumor growth and metastasis kinetics within the zebrafish model have been extensively evaluated in many publications (White R. et al. Nat. Rev. Cancer. 2013; Astell K.R. and Sieger D. Cold Spring Harb. Perspect. Med. 2020; Chen X. et al. Front. Cell Dev. Biol. 2021; Weiss J.M. et al. eLife 2022; Lindhal G. et al. NPJ Precis. Oncol. 2024). Our previous experience using this model shows that tumors start having a more pronounced proliferation and a lower degree of apoptosis from day 4 onwards, but we cannot keep the tumor-bearing larvae for that long for ethical reasons and also because we do not see much scientific benefit in unnecessarily extending the experiments. Anti-proliferative or pro-apoptotic effects of drugs can still be observed within the three days, even if this is then commonly seen as a larger reduction (instead of the smaller growth commonly seen in, for example, mouse tumor models) compared to controls. Before testing the compounds, we initially characterized the evolution of implanted tumors in our system and how much they metastasize over time in the absence of treatment (Author response image 5).

      The in vivo experiments were planned to validate efficacious concentrations of the investigated drugs rather than to derive in vivo IC50 or other values, which require testing of multiple doses. We have, however, included an additional concentration to show concentration-dependence and therefore on-target specificity of the drugs in the revised version of the manuscript (data also being elaborated in ongoing experiments). At this stage, we believe that adding the LC50 does not provide interesting new knowledge, and it is standard to only show results from the experimental endpoint (in our case 3 days post implantation). We agree that showing these new data points strengthens the manuscript and facilitates independent evaluation and conclusions to be drawn from the presented data. We have created new graphs where datapoints for each compound dose are shown.  

      Author response image 5.

      Evolution of the tumors and metastasis over time in the absence of any treatment. HeLa cells were labeled with 8 µg/mL Fast-DiI™ oil and then implanted in the dorsal perivitelline space of 2-day-old zebrafish embryos. Tumors were imaged within 2 hours of implantation and re-imaged every 24 h for three days. Changes in tumor size were evaluated as tumor area at day 1, 2 and 3 divided by tumor area at day 0, and metastasis was evaluated as the number of cells disseminated to the caudal hematopoietic plexus at day 1, 2 and 3 divided by the number of cells at day 3.
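For readers who wish to reproduce the tumor-size normalization described in this legend, a minimal sketch follows. It assumes the tumor areas for one larva are supplied as a list ordered from day 0 to day 3; the function name `relative_tumor_size` and the example values are hypothetical and are not data from the study.

```python
def relative_tumor_size(areas_by_day):
    """Normalize tumor areas to the day-0 area.

    `areas_by_day` is assumed to hold the tumor area of a single larva
    at day 0, 1, 2, 3 (in that order, any consistent unit). Returns the
    day 1..3 areas divided by the day-0 area, as in the figure legend.
    """
    if len(areas_by_day) < 2:
        raise ValueError("need the day-0 area and at least one later day")
    day0 = areas_by_day[0]
    if day0 <= 0:
        raise ValueError("day-0 tumor area must be positive")
    # Values below 1 indicate regression relative to implantation.
    return [area / day0 for area in areas_by_day[1:]]


# Hypothetical areas (arbitrary units), for illustration only.
example = relative_tumor_size([200.0, 220.0, 180.0, 150.0])
```

Dividing by the day-0 area makes larvae with differently sized implants comparable, which is presumably why the legend reports ratios rather than absolute areas.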

      Regarding the statement that AGR1.137 was not toxic, this was based on visual inspection of the zebrafish larvae at the end of the experiment, which also revealed a lack of drug-related mortality in these experiments. There are a number of differences in how our experiment was run compared with the standardized ZFET. ZFET evaluates toxicity from 0 hours post-fertilization to 1 or 2 days post-fertilization, whereas here we exposed zebrafish from 2 days post-fertilization to 5 days post-fertilization. The ZFET furthermore requires that the embryos are raised at 26°C, whereas we kept the temperature as close as possible to a physiologically relevant temperature for the tumor cells (36°C). In the ZFET, embryos are incubated in 96-well plates, whereas for our studies we required larger wells to be able to manipulate the larvae and avoid well edge-related imaging artefacts, and we therefore used 24-well plates. As such, the ZFET was for various reasons not applicable to our experimental settings. As we were not interested in rigorously determining the LD50 or other toxicity-related measurements, as our focus was instead on efficacy and we found that the targeted dose was tolerated, we did not evaluate multiple doses, including lethal doses of the drug, and are therefore not able to determine an LD50/LC50. We also did not find drug-induced non-lethal toxic phenotypes in this study, and so we cannot elaborate further on such phenotypes other than to simply state that the drug is well tolerated at the given doses. Therefore, the reference to ZFET in the manuscript was eliminated.

      (37) If supplementary information is available, consider providing it for a comprehensive understanding of toxicity assessments. 

      The effective concentration used in the zebrafish study was derived from the in vitro experiments. That being said, and as elaborated in our response to comment 36, we have added data for one additional dose to show the dose-dependent regulation of tumor growth and metastasis. 

      (38) 3. Optimization and Development of AGR1.137: 

      -Justify the need for further optimization and development of AGR1.137 if it has a comparable effect to AMD3100. Explain the specific advantages or improvements that AGR1.137 may offer over AMD3100. 

      AGR1.137 is highly hydrophobic and is very difficult to handle, particularly in in vivo assays; thus, for the negative allosteric modulators to be used clinically, it would be very important to increase their solubility in water. Contrastingly, AMD3100 is a water-soluble compound. Before using the zebrafish model, we performed several experiments in mice using AGR1.137, but the inhibitory results were highly variable, probably due to its hydrophobicity. We also believe that it would be important to increase the affinity of AGR1.137 for CXCR4, as the use of lower concentrations of the negative allosteric modulator would limit potential in vivo side effects of the drug. On the other hand, we are also evaluating distinct administration alternatives, including encapsulation of the compounds in different vehicles. These alternatives may also require modifications of the compounds. 

      AMD3100 is an orthosteric inhibitor and therefore blocks all the signaling cascades triggered by CXCL12. For instance, we observed that AMD3100 treatment blocked CXCL12 binding, cAMP inhibition, calcium flux, cell adhesion and cell migration (Fig. 3, Fig. 4), whereas the effects of AGR1.137 were restricted to CXCL12-mediated directed cell migration. Although AMD3100 was well tolerated by healthy volunteers in a single-dose study, it also promoted some mild and reversible events, including white blood cell count elevations and variations of urine calcium just beyond the reported normal range (Hendrix C.W. et al. Antimicrob. Agents Chemother. 2000). To treat viral infections, continuous daily dosing requirements of AMD3100 were impractical due to severe side effects including cardiac arrhythmias (De Clercq E. Front Immunol. 2015). For AMD3100 to be used clinically, it would be critical to control the timing of administration. In addition, side effects after long-term administration are a potential problem. Shorter-term usage and lower doses would be key to its success in clinical use (Liu T.Y. et al. Exp. Hematol. Oncol. 2016). The use of a negative allosteric modulator that blocks cell migration but does not affect other signaling pathways triggered by CXCL12 would be, at least in theory, more specific and produce fewer side effects. These ideas have been incorporated into the revised discussion to reflect potential advantages or improvements that AGR1.137 may offer over AMD3100.

      (39) 4. Discrepancy in AGR1.137 and AMD3100 Effects:

      -Discuss the observed discrepancy where AGR1.137 exhibits similar effects to AMD3100 but only after 48 hours. Provide insights into the temporal dynamics of their actions and potential implications for the experimental design.

      Images and data shown in Fig. 7E, F correspond to days 0 and 3 after HeLa cell implantation (tumorigenesis) and only to day 3 in the case of metastasis data. The revised version contains the effect of two distinct doses of the compounds (10 and 50 µM, for AGR1.135 and AGR1.137 and 1 and 10 µM for AMD3100). 

      (40) In the "Discussion" section, there are several points that require clarification and refinement to enhance the overall coherence and depth of the analysis:  1. Reduction of Side-Effects: 

      -Provide a more detailed explanation of how the identified compounds, specifically AGR1.135 and AGR1.137, contribute to the reduction of side effects. Consider discussing specific mechanisms or characteristics that differentiate these compounds from existing antagonists.

      The sentence indicating that AGR1.135 and AGR1.137 contribute to reducing side effects is entirely speculative, as we have no experimental evidence to support it. We have therefore corrected this in the revised version. The origin of the sentence was that orthosteric antagonists typically bind to the same site as the endogenous ligand, thus blocking its interaction with the receptor. Therefore, orthosteric inhibitors (i.e. AMD3100) block all signaling cascades triggered by the ligand and, consequently, their functional consequences. However, the compounds described in this project are essentially negative allosteric modulators, that is, they bind to a site distinct from the orthosteric site, inducing a conformational change in the receptor that does not alter the binding of the endogenous ligand, and therefore block some specific receptor-associated functions without altering others. We observed that AGR1.137 blocked receptor oligomerization and directed cell migration, whereas CXCL12 still bound CXCR4, triggered calcium mobilization, inhibited cAMP release and promoted receptor internalization. This is why we speculated on the limitation of side effects. The statements have been nonetheless revised in the new version of the manuscript.

      (41) 2. Binding Site Clarification:

      -Address the apparent discrepancy between docking the small compounds in a narrow cleft formed by TMV and TMVI helices and the statement that AGR1.131 binds elsewhere. Clarify the rationale behind this assertion

      After the in silico screening, a total of 40 compounds were selected. These compounds showed distinct degrees of interaction with the cleft formed by TMV and TMVI and even with other potential interaction sites on CXCR4, with the exception of the ligand binding site according to the data described by Wescott et al. (PNAS 2016 113:9928-9933), as this possibility was discarded in the initial approach of the in silico screening. According to PELE analysis, AGR1.131 was one of the 40 selected compounds that showed a pose with low binding energy, -39.8 kcal/mol, between TMV and TMVI helices, that is, it might interact with CXCR4 through the selected area for the screening. It nonetheless also showed a best pose placed between helices TMI and TMVII, -43.7 kcal/mol. In any case, the compound was included in the biological screening, where it was unable to impact CXCL12-mediated chemotaxis (Fig. 1B). We then focused on AGR1.135 and AGR1.137, as they showed a higher inhibitory effect on CXCL12-mediated migration, and on AGR1.131 as an internal negative control. AGR1.131 has a skeleton very similar to the other compounds (Fig. 1C) and can interact with the TM domains of CXCR4 without promoting effects. None of the three compounds affected CXCL12 binding, CXCL12-mediated inhibition of cAMP release, or receptor internalization. However, whereas AGR1.135 and AGR1.137 blocked CXCL12-mediated CXCR4 oligomerization and directed cell migration towards CXCL12 gradients, AGR1.131 had no effect in these experiments (Fig. 3, Fig. 4). 

      Next, we performed additional theoretical calculations (PELE, docking, MD) to inspect in detail the potential binding modes of active and inactive molecules. Based on these additional calculations, we found that whereas AGR1.135 and AGR1.137 showed preferential binding to the molecular pocket between TMV and TMVI, the best pose for AGR1.131 was located between TMI and TMVII, as the initial experiments indicated. These observations and data have been clarified in the revised discussion. 

      (42) 3. Impact of Chemical Modifications:

      -Discuss the consequences of the distinct chemical groups in AGR1.135, AGR1.137, and AGR1.131, specifically addressing how variations in amine length and chemical nature may influence binding affinity and biological activity. Provide insights into the potential effects of these modifications on cellular responses and the observed outcomes in zebrafish. 

      The main difference between AGR1.131 and the other two compounds is the higher flexibility of AGR1.131 due to the additional CH2 linker, together with the lack of a piperazine ring. The additional CH2 linking the phenyl ring increases the flexibility of AGR1.131 when compared with AGR1.135 and AGR1.137, and the absence of the piperazine ring might be responsible for its lack of activity, as it makes this compound able to bind to CXCR4 (Fig. 1C).

      AGR1.137 was chosen in a second round. The additional presence of the tertiary amine (in the piperazine ring) allows the formation of quaternary ammonium salts in the aqueous medium and its substituents to increase its solubility (Fig 1C). This characteristic might be related to the absence of toxic effects of the compound in the zebrafish model.

      (43) 4. Existence of Distinct CXCR4 Conformational States: 

      -Provide more detailed support for the statement suggesting the "existence of distinct CXCR4 conformational states" responsible for activating different signaling pathways. Consider referencing relevant studies or experiments that support this claim.

      Classical models of GPCR allostery and activation, which describe an equilibrium between a single inactive and a single signaling-competent active conformation, cannot account for the complex pharmacology of these receptors. The emerging view is that GPCRs are highly dynamic proteins, and ligands with varying pharmacological properties differentially modulate the balance between multiple conformations.

      Just as a single photograph from one angle cannot capture all aspects of an object in movement, no one biophysical method can visualize all aspects of GPCR activation. In general, there is a tradeoff between high-resolution information on the entire protein versus dynamic information on limited regions. In the former category, crystal and cryo-electron microscopy (cryoEM) structures have provided comprehensive, atomic-resolution snapshots of scores of GPCRs both in inactive and active conformations, revealing conserved conformational changes associated with activation. However, different GPCRs vary considerably in the magnitude and nature of the conformational changes in the orthosteric ligand-binding site following agonist binding (Venkatakrishnan A.J.V. et al. Nature 2016). Spectroscopic and computational approaches provide complementary information, highlighting the role of conformational dynamics in GPCR activation (Latorraca N.R.V. et al. Chem. Rev 2017). In the absence of agonists, the receptor population is typically dominated by conformations closely related to those observed in inactive-state crystal structures (Manglik A. et al. Cell 2015). While agonist binding drives the receptor population towards conformations similar to those in active-state structures, a mixture of inactive and active conformations remains, reflecting “loose” or incomplete allosteric coupling between the orthosteric and transducer pockets (Dror R.O. et al. Proc. Natl. Acad. Sci. USA 2011). Surprisingly, for some GPCRs, and under some experimental conditions, a substantial fraction of unliganded receptors already reside in an active-like conformation, which may be related to their level of basal or constitutive signaling (Staus D.P. et al. J. Biol. Chem. 2019; Ye L. et al. Nature 2016). In our case, the negative allosteric modulators did not alter ligand binding and had only minor effects on specific CXCL12-mediated functions such as inhibition of cAMP release or receptor internalization, among others, but failed to regulate CXCL12-mediated actin dynamics and receptor oligomerization. Collectively, these data suggest that the described compounds alter the active conformation of CXCR4 and therefore support the presence of distinct receptor conformations that explain a partial activation of the signaling cascade.

      All these observations are now included in the revised discussion of the manuscript.

      (44) 5. Equilibrium Shift and Allosteric Ligands: 

      -Clarify the statement about "allosteric ligands shifting the equilibrium to favor a particular receptor conformation". Support this suggestion with references or experimental evidence

      In a previous answer (see our response to point 2), we explain why we define the compounds as negative allosteric modulators. These compounds do not bind the orthosteric binding site or a site distinct from the orthosteric site that alters the ligand-binding site. Their effect should be due to changes in the active conformation of CXCR4, which allow some signaling events whereas others are blocked. Our functional data thus support that, through the same receptor, the compounds separate distinct receptor-mediated signaling cascades, that is, our data suggest that CXCR4 has conformational heterogeneity. It is known that GPCRs exhibit more than one “inactive” and “active” conformation, and the endogenous agonists stabilize a mixture of multiple conformations. Biased ligands or allosteric modulators can achieve their distinctive signaling profiles by modulating this distribution of receptor conformations (Wingler L.M. & Lefkowitz R.J. Trends Cell Biol. 2020). For instance, some analogs of angiotensin II do not appreciably activate Gq signaling (e.g., increases in IP3 and Ca2+) but still induce receptor phosphorylation, internalization, and mitogen-activated protein kinase (MAPK) signaling (Wei H. et al. Proc. Natl. Acad. Sci. USA 2003). Some of these ligands activate Gi and G12 in bioluminescence resonance energy transfer (BRET) experiments (Namkung Y. et al. Sci. Signal. 2018). A similar observation was described in the case of CCR5, where some chemokine analogs promoted G protein subtype-specific signaling bias (Lorenzen E. et al. Sci. Signal 2018). Structures of distinct GPCRs in the presence of different ligands vary considerably in the magnitude and nature of the conformational changes in the orthosteric ligand-binding site following agonist binding (Venkatakrishnan A.J.V. et al. Nature 2016). Yet, these changes modify conserved motifs in the interior of the receptor core and induce common conformational changes in the intracellular site involved in signal transduction. That is, these modifications might be considered distinct receptor conformations. 

      The revised discussion contains some of these interpretations to support our statement about the stabilization of a particular receptor conformation triggered by the negative allosteric modulators. 

      (45) 6. Refinement of Binding Mode: 

      -Clarify the workflow for obtaining the binding mode, particularly the role of GLIDE and PELE. Clearly explain how these software tools were used in tandem to refine the binding mode. 

      The computational sequential workflow applied in this project included, i) Protein model construction, ii) Virtual screening (Glide), iii) PELE, iv) Docking (AutoDock and Glide) and v) Molecular Dynamics (AMBER).

      Glide was applied for the structure-based virtual screening to explore which compounds could fit and interact with the previously selected binding site.

      After the identification of theoretically active compounds (modulators of CXCR4), additional calculations were done to identify a potential binding site. PELE was used in this sense, to study how the compounds could bind across the whole surface of the target (TMV-TMVI). By applying PELE, we avoided biasing the calculation, and we found that the trajectories with better interaction energies identified the cleft between TMV and TMVI as the binding site for AGR1.135 and AGR1.137, and not for AGR1.131. AGR1.131 showed a pose with low binding energy, -39.8 kcal/mol, between TMV and TMVI helices, that is, it might interact with CXCR4 in the selected area for the screening. But it also showed a better pose placed between helices TMI and TMVII, -43.7 kcal/mol (see our response to point 41). These data have now been confirmed using Schrodinger’s MM-GBSA procedure (see our response to points 6 and 8). In any case, the compound was included in the biological screening, where it was unable to affect CXCL12-mediated chemotaxis (Fig. 1B). Docking and MD simulations were then performed to study and refine the specific binding mode in this cavity. These data were important for choosing the CXCR4 mutations required to test whether the behavior of the compounds was reversed. In these experiments we also confirmed that AGR1.131 had a better pose on the TMI-TMVII region. 

      (46) 7. Impact of Compound Differences on CXCR4-F249L mutant: 

      -Provide visual aids, such as figures, and additional experiments to support the statement about differences in the behavior of AGR1.135 and AGR1.137 on cells expressing CXCR4-F249L mutant. Elaborate on the closer interaction suggested between the triazole group of AGR1.137 and the F249 residue

      At the reviewer’s suggestion, Fig. 5 has been modified to incorporate a closer view of the interactions identified and new panels in new Fig. 6 have been added to show in detail the effect of the mutations selected on the structure of the cleft between TMV and TMVI. The main difference between AGR1.135 and AGR1.137 is how the triazole group interacts with F249 and L216 (Author response image 6). In AGR1.137, the three groups are aligned in a parallel organization, which appears to be more effective: This might be due to a better adaptation of this compound to the cleft since there is only one hydrogen bond with V124. In AGR1.135, the compound interacts with the phenyl ring of F249 and has a stronger interaction at the apical edge to stabilize its position in the cleft. However, there is still an additional interaction present. When changing F249

      Author response image 6.

      Cartoon representation of the interaction of CXCR4 F249L mutant with AGR1.135 (A) and AGR1.137 (B). The two most probable conformations of Leucine rotamers are represented in cyan A and B conformations. Van der Waals interactions are depicted in blue cyan dashed lines, hydrogen bonds in black dashed lines. CXCR4 segments of TMV and TMVI are colored in blue and pink, respectively

      to L (Fig. VIIA, B, only for review purposes) and showing the two most likely rotamers resulting from the mutation, it is observed that rotamer B is in close proximity to the compound, which may cause the binding to either displace or adopt an alternative conformation that is easier to bind into the cleft. As previously mentioned, it is likely that AGR1.135 can displace the mutant rotamer and bind into the cleft more easily due to its higher affinity.

      (47) In the "Materials and Methods" section, the computational approach for the "discovery of CXCR4 modulators" requires significant revision and clarification. The following suggestions aim to address the identified issues: 1. Structural Modeling: 

      -Reconsider the use of SWISS-MODEL if there is an available PDB code for the entire CXCR4 structure. Clearly articulate the rationale for choosing one method over the other and explain any limitations associated with the selected approach. 

      The SWISS-MODEL server performs automated comparative modeling of 3D protein structures and pioneered the field of automated modeling. At the time we started this project, it was the most accurate method to generate reliable 3D protein structure models.

      As explained above, we have now predicted the structure of the target using AlphaFold (Jumper J. et al. Nature 2021) and performed several additional experiments that confirm that the small compounds bind the selected pocket as the original strategy indicated (see our response to point 6; Fig. II, only for review purposes).

      (48) 2. Parametrization of Small Compounds: 

      -Provide a detailed description of the parametrization process for the small compounds used in the study. Specify the force field and parameters employed, considering the obsolescence of AMBER14 and ff14SB. Consider adopting more contemporary force fields and parameterization strategies. 

      When we performed these experiments, some years ago, the force fields applied (ff14SB, AMBER14 used in MD or OPLS2004 in docking with Glide) were well accepted and were gold standards. It is, however, true that the force fields have evolved in the past few years. Moreover, in the case of the MD simulations, to consider the parameters of the ligands that are not contained within the force field, we performed an additional parameterization as a standard methodology. We first ran an ab initio optimization of the ligand geometry using B3LYP with the 6-311+g(d) basis set in Gaussian 09, Revision A.02, and then a single point energy calculation of ESP charges, with HF 6-311+g(d), on the optimized structure. As the last step of the parametrization, the antechamber module was used to adapt these charges and additional parameters for MD simulations.

      (49) 3. Treatment of Lipids and Membrane: 

      -Elaborate on how lipids were treated in the system. Clearly describe whether a membrane was included in the simulations and provide details on its composition and structure. Address the role of the membrane in the study and its relevance to the interactions between CXCR4 and small compounds 

      To stabilize CXCR4 and more accurately reproduce the real environment in the MD simulation, the system was embedded in a lipid bilayer using the Membrane Builder tool (Sunhwan J. et al. Biophys. J. 2009) from the CHARMM-GUI server. The membrane was composed of 175 molecules of the phospholipid 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) in each leaflet. The protein-membrane complex was solvated with TIP3 water molecules. Chloride ions were added up to a concentration of 0.15 M in water, and sodium ions were added to neutralize the system. This information was previously described in detail.

      (50) 4. Molecular Dynamics Protocol: 

      -Provide a more detailed and coherent explanation of the molecular dynamics protocol. Clarify the specific steps, parameters, and conditions used in the simulations. Ensure that the protocol aligns with established best practices in the field.

      Simulations were run on an Asus 1151 h170 LVX-GTX-980Ti workstation, with an Intel Core i7-6500 K Processor (12 M Cache, 3.40 GHz) and 16 GB DDR4 2133 MHz RAM, equipped with a Nvidia GeForce GTX 980Ti available for GPU (Graphics Processing Unit) computations. MD simulations were performed using AMBER14 (Case D.A. et al. AMBER 14, Univ. of California, San Francisco, USA, 2014) with ff14SB (Maier J.A. et al. J. Chem. Theory Comput. 2015) and lipid14 (Dickson C. J. et al. J. Chem. Theory Comput. 2014) force fields in the NPT thermodynamic ensemble (constant pressure and temperature). Minimization was performed using 3500 Steepest Descent steps and 4500 Conjugate Gradient steps three times, firstly considering only hydrogens, next considering only water molecules and ions, and finally minimizing all atoms. During equilibration, the system temperature was raised from 0 to 300 K at constant volume, fixing everything but ions and water molecules. After thermalization, several density equilibration phases were performed. In the production phase, 50 ns MD simulations without position restraints were calculated using a time step of 2 fs. Trajectories of the most interesting poses were extended to 150 ns. All bonds involving hydrogen atoms were constrained with the SHAKE algorithm (Lippert R.A. et al. J. Chem. Phys. 2007). A cutoff of 8 Å was used for the Lennard-Jones interaction and the short-range electrostatic interactions. Berendsen barostat (Berendsen H.J. et al. J. Chem. Phys. 1984) and Langevin thermostat were used to regulate the system pressure and temperature, respectively. All trajectories were processed using CPPTRAJ (Roe D.R. & Cheatham III T.E. J. Chem. Theory Comput. 2013) and visualized with VMD (Visual Molecular Dynamics) (Humphrey W. et al. J. Mol. Graphics. 1996). To reduce the complexity of the data, Principal Component Analysis (PCA) was performed on the trajectories using CPPTRAJ.
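
      As a rough cross-check of the scale of this protocol, the step counts implied by the stated settings can be worked out directly. The following is only an illustrative sketch: the function names are hypothetical, and it assumes that each of the three minimization passes ran the full 3500 Steepest Descent and 4500 Conjugate Gradient steps.

```python
# Illustrative arithmetic only. Function names are hypothetical;
# the numeric settings are the ones quoted in the protocol text.
FS_PER_NS = 1_000_000  # femtoseconds per nanosecond


def md_steps(length_ns: int, timestep_fs: int = 2) -> int:
    """Number of integration steps for a run of length_ns at timestep_fs."""
    return length_ns * FS_PER_NS // timestep_fs


def minimization_steps(sd: int = 3500, cg: int = 4500, passes: int = 3) -> int:
    """Total minimization steps, ASSUMING every pass runs all SD and CG steps."""
    return (sd + cg) * passes


if __name__ == "__main__":
    print("50 ns production run:", md_steps(50), "steps")
    print("150 ns extended run:", md_steps(150), "steps")
    print("minimization:", minimization_steps(), "steps")
```

      With a 2 fs time step, a 50 ns production run corresponds to 25 million integration steps and a 150 ns extension to 75 million.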

      (51) Consider updating the molecular dynamics protocol to incorporate more contemporary methodologies, considering advancements in simulation techniques and software.

      In our answer to points 6 and 47, we describe why we used the technology based on SWISS-MODEL and PELE analysis and how we have now used AlphaFold and other more contemporary methodologies to confirm that the small compounds bind the selected pocket.

      (52) Figure 1A: 

      •  Consider switching to a cavity representation for CXCL12 to enhance clarity and emphasize the cleft.

      Fig. 1A has been modified to emphasize the cleft.

      (53) Explicitly show the TMV-TMVI cleft in the figure for a more comprehensive visualization. 

      In Fig. 1A we have added an insert to facilitate TMV-TMVI visualization.

      (54) Figure 1B: 

      •  Clearly explain the meaning of the second DMSO barplot to avoid confusion. 

      To clarify this panel, we have modified the figure and the figure legend. Panel B now includes a complete titration of the three compounds analyzed in the manuscript.  The first bar shows cell migration in the absence of both treatment with AMD3100 and stimulation with CXCL12.  The second bar shows migration in response to CXCL12 in the absence of AMD3100. The third bar shows the effect of AMD3100 on CXCL12-induced migration, as a known control of inhibition of migration.  We hope that this new representation of the data results is clearer.

      (55) Figure 1C: 

      •  Provide a clear legend explaining the significance of the green shading on the small compounds. 

      The legend for Fig. 1C has been modified according to the reviewer’s suggestion.

      (56) Figure 2: 

      •  Elaborate on the role of fibronectin in the experiment and explain the specific contribution of CD86-AcGFP.

      The ideal situation for TIRF-M determinations is to employ cells on a physiological substrate complemented with or without chemokines. Fibronectin is a substrate widely used in different studies that allows cell adhesion, mimicking a physiological situation. Jurkat cells express alpha4beta1 and alpha5beta1 integrins that mediate adhesion to fibronectin (Seminario M.C. et al. J. Leuk. Biol. 1999).

      Regarding the use of CD86-AcGFP in TIRF-M experiments, we determine the number of receptors in individual trajectories of CXCR4 using, as a reference, the MSI value of CD86-AcGFP particles that strictly showed a single photobleaching step (Dorsch S. et al. Nat Methods 2009).

      We preferred to use CD86-AcGFP in cells instead of AcGFP on glass, to exclude any potential effect on the different photodynamics exhibited by AcGFP when bound directly to glass. In any case, this issue has been clarified in the revised version.
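
      The reference-based counting described above can be sketched as follows. This is a minimal illustration rather than our analysis pipeline: the function names are hypothetical, the intensity values in the usage example are invented, and it assumes that particle intensity scales linearly with the number of AcGFP-tagged receptors.

```python
# Minimal sketch of intensity-based receptor counting.
# ASSUMPTION: intensity scales linearly with the number of fluorophores.
# Function names and example intensities are hypothetical.
from statistics import mean
from typing import Sequence


def monomer_reference(cd86_intensities: Sequence[float]) -> float:
    """Average intensity of CD86-AcGFP particles that showed a single
    photobleaching step; used as the one-receptor reference."""
    if not cd86_intensities:
        raise ValueError("at least one monomer intensity is required")
    return mean(cd86_intensities)


def receptors_per_particle(particle_intensity: float, reference: float) -> float:
    """Estimated number of receptors in one tracked particle."""
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return particle_intensity / reference


if __name__ == "__main__":
    ref = monomer_reference([95.0, 100.0, 105.0])  # invented values
    print(receptors_per_particle(310.0, ref))      # estimate for one particle
```

      Using a monomeric membrane protein imaged in cells as the reference keeps the fluorophore in the same environment as the receptor being counted.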

      (57) Figure 3D: 

      •  Include a plot for the respective band intensity to enhance data presentation 

      The plot showing the band intensity analysis of the experiments shown in Fig. 3D was already included in the original version (see old Supplementary Fig. 3). However, in the revised version, we include these plots in the same figure as panels 3E and 3F.  As a control of inhibition of CXCL12 stimulation, we have also included a new figure (Supplementary Fig. 4) showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK as analyzed by western blot.

      (58) Consider adding AMD3100 as a control for comparison. 

      In agreement with the reviewer’s suggestion, we have added the effect of AMD3100 in most of the functional experiments performed.

      (59) Figure 4: 

      •  Address the lack of positive controls in Figure 4 and consider their inclusion for a more comprehensive analysis. 

      DMSO bars correspond to the control of the experiment, as they represent the effect of CXCL12 in the absence of any allosteric modulator. As previously described in this point-by-point reply, DMSO bars correspond to the control performed with the solvent with which the small compounds, at maximum concentration, are diluted.  Therefore, they show the effect of the solvent on CXCL12 responses. In any case, and in order to facilitate the comprehension of the figure we have also added the controls in the absence of DMSO to demonstrate that the solvent does not affect CXCL12-mediated functions, together with the effect of the orthosteric inhibitor AMD3100. In addition, we have also included representative images of the effect of the different compounds on CXCL12-induced polarization (Fig. 4C).

      (60) In Figure 4A, carefully assess overlapping error bars and ensure accurate interpretation. If necessary, consider alternative representation. 

      We have tried alternative representations of data in Fig. 4A, but in all cases the figure was unclear. We believe that the way we represent the data in the original manuscript is the clearest and most appropriate. Nevertheless, we have now included significance values as a table annexed to the figure, as well as the effect of AMD3100, as a control of inhibition.

      (61) Supplementary Figure 1A: 

      •  Improve the clarity of bar plots for better understanding. Consider reordering them from the most significant to the least. 

      This was a good idea, and therefore Supplementary Fig. 1A has been reorganized to improve clarity.

      (62) Supplementary Figure 1C: 

      •  Clarify the rationale behind choosing the 12.5 nM concentration and explain if different concentrations of CXCL12 were tested. 

      In old Supplementary Fig. 1C, we used untreated cells, that is, CXCL12 was not present in the assay.  These experiments were performed to test the potential toxicity of DMSO (solvent) or the negative allosteric modulators on Jurkat cells. The 12.5 nM concentration of CXCL12 mentioned in the figure legend applied only to panels A and B, as indicated in the figure legend. We previously optimized this concentration for Jurkat cells using different concentrations of CXCL12 between 5 and 100 nM.  Nevertheless, we have reorganized old supplementary fig. 1 and clarified the figure legend to avoid misinterpretations (see Supplementary Fig 1A, B and Supplementary Fig. 2A, B).

      (63) Explain the observed reduction in fluorescence intensity for AGR1.135. 

      The cell cycle analysis has been moved from Supplementary Fig. 1C to a new Supplementary Fig. 2.  It now includes the flow cytometry panels to show fluorescence intensity as a function of the number of cells analyzed (Panel 1A) as well as a table (panel B) with the percentage of cells in each phase of the cell cycle. We believe that the apparent reduction in fluorescence that the reviewer observes is mainly due to the number of events analyzed. However, we have changed the flow cytometry panels for others that are more representative and included a table with the mean of the different results. When we determined the percentage of cells in each cell cycle phase, we observed that it looks very similar in all the experimental conditions. That is, none of the compounds affected any of the cell cycle phases. We have also included the effect of H2O2 and staurosporine as control compounds inducing cell death and cell cycle alteration of Jurkat cells.

      (64) Supplementary Table 1: 

      •  Include a column specifying the scoring for each compound to provide a clear reference for readers. 

      To facilitate references to readers, we have now included the inhibitory effect of each compound on Jurkat cell migration in the revised version of this table. 

      (65) Minor Points 

      Page 2 - Abstract: Rephrase the first sentence of the abstract to enhance fluidity. 

      Although the entire manuscript was revised by a professional English editor, we appreciate the valuable comments of this reviewer and we have corrected these issues accordingly.

      (66) Page 2 - Abstract: Explicitly define "CXCR4" as "C-X-C chemokine receptor type 4" the first time it appears.

      We have not used C-X-C chemokine receptor type 4 the first time it appears in the abstract. CXCR4 is an acronym normally accepted to identify this chemokine receptor, and it is used as CXCR4 in many articles published in eLife. However, we introduce the complete name the first time it appears in the introduction.

      (67) Page 2 - Abstract: Explicitly define "CXCL12" as "C-X-C motif chemokine 12" the first time it is mentioned. 

      As we have discussed in the previous response, we have not used C-X-C motif chemokine 12 the first time CXCL12 appears in the abstract, as it is a general acronym normally accepted to identify this specific chemokine, even in eLife papers. However, we introduce the complete name the first time it appears in the introduction section.

      (68) Page 2 - Abstract: Explicitly define "TMV and TMVI" upon its first mention.

      The acronym TM has been defined as “Transmembrane” in the revised version

      (69) Page 2 - Abstract: Review the use of "in silico" in the sentence for accuracy and consider revising if necessary.

      With the term “in silico” we want to refer to those experiments performed on a computer or via computer simulation software. We have carefully reviewed its use in the new version of the manuscript.

      (70) Page 2 - Abstract: Add a comma after "compound" in the sentence, "We identified AGR1.137, a small compound that abolishes...".

      A comma after “compound” has been added in the revised sentence.

      (71) Page 2 - Significance Statement: Rephrase the first sentence of the "Significance Statement" to avoid duplication with the abstract.

      The first sentence of the Significance Statement has been revised to avoid duplication with the abstract. 

      (72) Page 2 - Significance Statement: Break down the lengthy sentence, "Here, we performed in silico analyses..." for better readability. 

      The sentence starting with “Here, we performed in silico analyses…” has been broken down in the revised manuscript.

      (73) Page 2 - Introduction: Replace "Murine studies" with a more specific term for clarity.

      The term “murine studies” is normally used to refer to experimental studies developed in mice. We have nonetheless rephrased the sentence.

      (74) Page 3 - Introduction: Rephrase the sentence for clarity: "Finally, using a zebrafish model, ..."

      The sentence has been now rephrased for clarity.

      (75) Results-AGR1.135 and AGR1.137 block CXCL12-mediated CXCR4 nanoclustering and dynamics: 

      Rephrase the sentence for clarity: "Retreatment with AGR1.135 and AGR1.137, but not with AGR1.131, substantially impaired CXCL12-mediated receptor nanoclustering.”

      The sentence has been rephrased for clarity.

      (76) Results - AGR1.135 and AGR1.137 incompletely abolish CXCR4-mediated responses in Jurkat cells: Clarify the sentence: "In contrast to the effect promoted by AMD3100, a binding-site antagonist of CXCR4..."

      The sentence has been modified for clarity.

      (77) Consider using "orthosteric" instead of "binding-site" antagonist.

      The term orthosteric is now used throughout to refer to a binding site antagonist.

      (78) Discussion: Use the term "in silico" only when necessary.

      We have carefully reviewed the use of “in silico” in the manuscript.

      (79) Discussion: Clarify the sentence: "...not affect neither CXCR2-mediated cell migration...". Confirm if "CXCL12" is intended.

      The sentence refers to the chemokine receptor CXCR2, which binds the chemokine CXCL2. To test the specificity of the compounds for the CXCL12/CXCR4 axis, we evaluated CXCL2-mediated cell migration. The results indicated that the CXCL2/CXCR2 axis was not affected by the negative allosteric modulators, whereas CXCL12-mediated cell migration was blocked. The sentence has been clarified in the new version of the manuscript.

      (80) Figure 4B: Bold the "B" in the figure label for consistency.

      The “B” in Fig. 4B has been bolded.

      Reviewer #2

      (1) Fig 2. The SPT data is sub-optimal in its presentation as well as analysis. Example images should be shown. The analysis and visualization of the data should be reconsidered for improvements. Graphs with several hundreds, in some conditions over 1000 tracks, per condition are very hard to compare. The same (randomly selected representative set) number of data points should be shown for better visualization. Also, more thorough analyses like MSD or autocorrelation functions are lacking - they would allow enhanced overall representation of the data.

      In agreement with the reviewer’s comment, we have modified the representation of Fig. 2. We have carefully read the paper by Lord and colleagues (Lord S. J. et al., J. Cell Biol. 2020) and have applied their recommendations for this type of data. We have also included, as supplementary material, representative videos of the TIRF-M experiments to allow readers to visualize the original images. Regarding the MSD analyses, they were performed to determine all D1-4 values. According to Manzo & García-Parajo (Manzo C. & García-Parajo M.F. Rep. Prog. Phys. 2015), because trajectories have a finite length, the MSD curve at large tlag has poor statistics and deviates from linearity. However, the diffusion coefficient (D1-4) can be estimated by fitting the short-tlag region of the MSD plot, which gives a more accurate picture of particle behavior. Accordingly, we show D1-4 values and not MSD data.

      Due to the space restrictions, it is very difficult to include all the figures generated, but, only for review purposes, we included in this point-by-point reply some representative plots of the MSD values as a function of the time from individual trajectories showing different types of motion obtained in our experiments (Author response image 7).

      Author response image 7.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP showing different types of motion: A) confined, B) Brownian/Free, C) direct transport of CXCR4-AcGFP particles diffusing at the cell membrane detected by SPT-TIRF in resting JKCD4 cells.
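
      The short-lag MSD fit described above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: it assumes that D1-4 denotes the slope of a straight line fitted to the first four points of a time-averaged 2D MSD curve, divided by 4, and every function name, the simulated track, and the parameter values (D = 0.05 µm²/s, 0.1 s per frame) are hypothetical.

```python
import math
import random


def time_averaged_msd(track, max_lag):
    """Time-averaged MSD of one 2D track (list of (x, y)) for lags 1..max_lag frames."""
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def linear_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def short_lag_diffusion(track, dt, n_points=4):
    """Estimate D from the first n_points of the MSD curve (2D: MSD = 4 D t + offset)."""
    msd = time_averaged_msd(track, n_points)
    lags = [dt * k for k in range(1, n_points + 1)]
    slope, _offset = linear_fit(lags, msd)
    return slope / 4.0


def simulate_brownian(n_steps, diff_coeff, dt, seed=0):
    """Hypothetical test data: a free 2D random walk with known D."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diff_coeff * dt)  # per-axis step standard deviation
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        track.append((x, y))
    return track


track = simulate_brownian(20000, diff_coeff=0.05, dt=0.1)
d_est = short_lag_diffusion(track, dt=0.1)
print(f"estimated D = {d_est:.4f} (simulated with D = 0.05)")
```

      Restricting the fit to the first few lags uses only the best-sampled part of the curve, which is the rationale given above for reporting D1-4 rather than full MSD plots.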

      Further analysis, such as classification based on particle motion, has not been included in this article. This classification uses the moment scaling spectrum (MSS), described by Ewers H. et al. 2005 PNAS, and requires particles with longer trajectories (>50 frames). Only for review purposes, we include a figure showing the percentage of the MSS-based particle motion classification for each condition. As expected, most long trajectories are confined, with a slight increase in the percentage upon CXCL12 stimulation in all conditions, except in cells treated with AGR1.137 (Author response image 8).

      Author response image 8.

      Effects of the negative allosteric modulators on the types of motion of CXCR4. Percentage of single trajectories with different types of motion, classified by MSS (DMSO: 58 particles in 59 cells on FN; 314 in 63 cells on FN+CXCL12; AGR1.131: 102 particles in 71 cells on FN; 258 in 69 cells on FN+CXCL12; AGR1.135: 86 particles in 70 cells on FN; 120 in 77 cells on FN+CXCL12; AGR1.137: 47 particles in 66 cells on FN; 74 in 64 cells on FN+CXCL12) n = 3.
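
      For readers unfamiliar with the moment scaling spectrum, the idea behind the classification can be sketched as follows. This is an illustrative implementation, not the authors' code: the moment orders (0 to 6), the maximum lag, and all function names are assumptions, and no classification thresholds are implied. The slope of the spectrum is close to 0.5 for free diffusion, lower for confined motion, and approaches 1 for directed motion.

```python
import math
import random


def linear_slope(x, y):
    """Slope of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


def mss_slope(track, max_lag=10, max_order=6):
    """Slope of the moment scaling spectrum of one 2D track (list of (x, y)).

    For each order nu, the nu-th moment of the displacement magnitude is
    assumed to scale as lag ** gamma_nu; the returned value is the slope of
    gamma_nu plotted against nu.
    """
    lags = list(range(1, max_lag + 1))
    log_lags = [math.log(lag) for lag in lags]
    # Displacement magnitudes for every lag, computed once.
    displacements = [
        [
            math.hypot(track[i + lag][0] - track[i][0],
                       track[i + lag][1] - track[i][1])
            for i in range(len(track) - lag)
        ]
        for lag in lags
    ]
    orders = list(range(max_order + 1))
    exponents = []
    for nu in orders:
        log_moments = [
            math.log(sum(d ** nu for d in disp) / len(disp))
            for disp in displacements
        ]
        exponents.append(linear_slope(log_lags, log_moments))
    return linear_slope(orders, exponents)


def random_walk(n_steps, sigma=1.0, seed=1):
    """Hypothetical test data: free 2D diffusion."""
    rng = random.Random(seed)
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        track.append((x, y))
    return track


directed = [(0.3 * i, 0.0) for i in range(60)]  # constant-velocity track
print("directed track:", round(mss_slope(directed), 3))
print("free diffusion:", round(mss_slope(random_walk(5000)), 3))
```

      The sketch does not handle completely immobile tracks, for which all displacements are zero and the logarithm is undefined. High-order moments are noisy for short tracks, which is consistent with the requirement for long trajectories stated above.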

      (2) Fig 3. The figure legends have inadequate information on concentrations and incubation times used, both for the compounds and other treatments like CXCL12 and forskolin. For the Western blot data, also the quantification should be added to the main figure. The compounds, particularly AGR1.137 seem to lead to augmented stimulation of pAKT and pERK. This should be discussed

      The Fig. 3 legend has been corrected in the revised manuscript. Fig. 3D now contains representative western blots and the densitometry evaluation of these experiments. As the reviewer indicates, the western blot shown also suggests augmented stimulation of pAKT and pERK in cells treated with AGR1.137. However, as shown in the densitometry analysis, no significant differences were noted between the data obtained with each compound. As a control for inhibition of CXCL12 stimulation, we have included a new Supplementary Fig. 4 showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK, as analyzed by western blot.

      (3) Fig. 4 immunofluorescence data on polarization as well as the flow chamber data lack the representative images of the data. The information on the source of the T cells is missing. Not clear if this experiment was done on bilayers or on static surfaces.

      Representative images for the data shown in Figure 4B have been added in the revised figure (Fig. 4C). The experiments in Fig. 4B were performed on static surfaces. As indicated in the material and methods section, primary T cell blasts were added to fibronectin-coated glass slides and were then stimulated or not with CXCL12 (5 min at 37 °C) before being fixed, permeabilized, and stained with phalloidin. Primary T cell blasts were generated from PBMCs isolated from buffy coats and activated in vitro with IL-2 and PHA, as indicated in the material and methods section.

      (4) The data largely lacks titration of different concentrations of the compounds. How were the effective concentration and treatment times determined? What happens at higher concentrations? It is important to show, for instance, if the CXCL12 binding gets inhibited at higher concentrations. Most experiments were performed with 50 µM, but HeLa cell data with 100 µM. Why and how was this determined?

      The revised version contains a new panel in Fig. 1B showing a more detailed kinetic analysis with different concentrations (1-100 µM) of the compounds in the migration experiments using Jurkat cells. We chose 50 µM for further studies as it was the concentration that inhibited 50-75% of ligand-induced cell migration.

      We have also included the effect of two doses of the compounds (10 and 50 µM) in the zebrafish model, as well as AMD3100 (1 and 10 µM) as a control (new Fig. 7D, E). Tumors were imaged within 2 hours of implantation and tumor-bearing embryos were treated with either vehicle (DMSO) alone, AGR1.131 or AGR1.137 at 10 and 50 µM, or AMD3100 at 1 and 10 µM for three days, followed by re-imaging.

      Regarding the amount of CXCL12 used in these experiments, with the exception of cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, the optimal concentration of CXCL12 employed in all other experiments was 50 nM. In the case of the directional cell migration assays, we used 100 nM to create the chemokine gradient in the device. These concentrations were optimized in previous work from our laboratory using these types of experiments. It should also be noted that in the experiments using lipid bilayers or TIRF-M, CXCL12 is used to coat the plates and it is therefore difficult to determine the real concentration retained on the surface after the washing steps performed prior to adding the cells.

      (5) The authors state that they could not detect direct binding of the compounds to CXCR4. It should be reported what approaches were tried and discussed why this was not possible.

      We attempted a fluorescence spectroscopy strategy to formally prove the ability of AGR1.135 to bind CXCR4, but this strategy failed because the compound has a yellow color that interfered with the determinations. We also tried a FRET strategy (see supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers in cells treated with AGR1.135; this effect was due to the yellow color of this compound, which interferes with FRET determinations. In the same assays, AGR1.137 did not modify FRET efficiency for CXCR4 homodimers and therefore we cannot assume that AGR1.137 binds to CXCR4. All these data have been considered in the revised discussion.

      (6) The proliferation data in Supplementary Figure 1 lacks controls that affect proliferation and indication of different cell cycle stages. What is the conclusion of this data? More information on the effects of the drug to cell viability would be important.

      Toxicity in Jurkat cells was first determined by propidium iodide incorporation. Some compounds (i.e., AGR1.103 and VSP3.1) were discarded from further analysis as they were toxic for cells. In a deeper analysis of cell toxicity, even if these compounds did not kill the cells, we checked whether they could alter the cell cycle of the cells. New Supplementary Fig. 2 includes a table (panel B) with the percentage of cells in each cell cycle phase, and no differences between any of the treatments tested were detected. 

      Nevertheless, to clarify this issue the revised version of the figure also includes H2O2 and staurosporine stimuli to induce cell death and cell cycle alterations as controls of these assays.

      (7) The flow data in Supplementary Figure 2 should be statistically analysed. 

      Bar graphs corresponding to the old Supplementary Fig. 2 (new Supplementary Fig. 3) are shown in Fig. 3B. We have also incorporated the corresponding statistical analysis to this figure. 

      (8) In general, the authors should revise the figure legends to ensure that critical details are added. 

      We have carefully revised all the figure legends in the new version of the manuscript.

      (9) Bar plots are very poor in showing the heterogeneity of the data. Individual data points should be shown whenever feasible. Superplot-type of representation is strongly advised (https://doi.org/10.1083/jcb.202001064).

      We have carefully read the paper by Lord and colleagues (Lord S. J. et al., J. Cell Biol. 2020) and have applied their recommendations to our TIRF-M data (see revised Fig. 2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      - The title may not reflect the key finding of the paper. It is well established in the field that the disaggregation process is sensitive to perturbations of the levels of the disaggregating factors.

      We have changed the title to better reflect the major finding of the work, the importance of the NEF during the initiation of disaggregation. The new title is: Early Steps of Protein Disaggregation by Hsp70 Chaperone and Class B J-Domain Proteins are Shaped by Hsp110.

      - Abstract:

      Please note that the phrases "stimulation is much limited with class A JDPs", "limited destabilization of the chaperone complex improves disaggregation", and "tuned proportion between the co-chaperones" are hard to understand. Only after having read the manuscript are the meanings of these phrases accessible.

      The phrases in the abstract were changed (page 1, lines 10-14).

      - The subheading "Sse1 improves aggregate modification by Hsp70" on p. 7 is unclear. What is measured is a decrease in aggregate size dependent on Hsp70-JDP as well as Sse1.

      The subheading was changed to the more precise “Sse1 leads to Hsp70-dependent reduction of aggregate size”.

      - The subheading "Biphasic effects of Sse1 on the Hsp70 disaggregation activity" does not describe the finding clearly; "Biphasic effects" is a term that is hard to understand.

      To avoid phrases that can be understood in many ways, we have changed the subheading to “Hormetic effects of Sse1 in Hsp70 disaggregation activity”.

      - p.5, last line. Hsp110 typo

      The typos have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The article emphasises multiple times the importance of stoichiometry between the (co-)chaperones. Most figures would benefit from an indication of the used stoichiometry (or all absolute concentrations) to support the points made about the stoichiometry, especially the figures showing titrations of Sse1, Sse1-2, and Sis1 (Fig. 3D, 3E, 4A-C, S2B, S5F, S6A-E).

      Information on protein concentrations has been included in all figure captions.

      (2) The manuscript includes a summary model. While this model is a plausible hypothesis of the mechanism of disaggregation by Hsp70, in particular when viewed with previous data (Wyszkowski et al., 2021), it focuses rather heavily on the potential remodeling of clients by Hsp70, which is not the primary focus of the data presented in this manuscript. More emphasis could be put on the JDP class/ functional specificity observed.

      The model has been changed according to the Reviewer’s comments to better reflect the findings presented in the manuscript (Figure 5).

      (3) The methods section is very brief. I recommend including additional details about reaction conditions (temperature, buffer compositions, protein concentrations) even when previously reported elsewhere to improve the readability of the manuscript. Details regarding the DLS experiments performed are missing.

      More detailed information on the experimental conditions has been added to the Methods section, as well as to figure legends.

      (4) Many experiments incorporate BLI to assess the effect of NEFs on the binding of the Hsp70 and JDP to aggregates. Although appropriate controls are included (no ATP, Hsp70, and JDP only), a control with only Hsp70 and the NEF would be useful to determine to which extent the NEF itself alters the thickness of the (Hsp70-bound) aggregate biolayer.

      The suggested controls were added (Figure 1—figure supplement 1 G) and discussed in the manuscript (page 5, lines 23-24).

      Reviewer #3 (Recommendations For The Authors):

      - The refolding assay makes use of Luciferase denatured in 5 M GdnHCl. These conditions lead to a spontaneous refolding yield of 20% (Figure 3C), which is very high and limits conclusions on the effect of Hsp110 but also JDPs on the refolding process. Typically this assay uses 6 M GdnHCl for Luciferase denaturation and under these conditions, spontaneous refolding of Luciferase is hardly observed (e.g. Laufen et al. PNAS 1999). The authors are therefore asked to repeat key experiments using altered (6M) GdnHCl concentrations.

      We based our experiments assessing luciferase refolding on the publication by Imamoglu et al. (2020), in which the authors, using 5 M GdnHCl for luciferase denaturation, demonstrated that spontaneous and chaperone-assisted luciferase refolding strongly depends on luciferase concentration. In this work, a similar degree of luciferase refolding was reported for the same final luciferase concentration (100 nM) as we used in our experiments (Figure 1—figure supplement 1D). As an additional control, we compared the effects of 5 M and 6 M of GdnHCl during denaturation on luciferase refolding under the same conditions (100 nM, 25 °C, 2 h) and we observed no significant differences (Author response image 1).

      Author response image 1.

      Chaperone-assisted folding of luciferase after denaturation at 5 M or 6 M GdnHCl. Luciferase was denatured in 5 M or 6 M GdnHCl according to the protocol in the Materials and Methods section. Luminescence was monitored alone or after incubation with Ssa1-Sis1 or Ssa1-Ydj1. Chaperones were used at 1 µM concentration. Luciferase activity was measured after 2 hours and normalized to the activity of the native protein. Error bars indicate SD from three repeats.

      - Figure 1B: The authors are asked to provide binding curves for Ssa1/Sse1 (no Sis1) and Sis1/Sse1 (no Ssa1) as controls. Particularly the latter combination is required as direct cooperation between Hsp110 and JDPs has been suggested in the literature (Mattoo et al., JBC 2013).

      We performed the suggested BLI experiment, and the results are presented in the new Figure 1—figure supplement 1 G (page 5, lines 23-24).

      - Figure 1B (and other figure parts showing BLI data): it is unclear how often the BLI experiments have been performed. This should be stated in the figure legend. Can the authors add SDs to the respective curves?

      We added detailed information about the number of replicates to the figure legends. SD bars were added to the BLI results shown in Figures 1-4, apart from the results of titrations, for which, for the sake of clarity, the three replicates are represented in the plots on the right (Figure 3D). Where fewer than three repeats are presented in the Supplementary Figures, the remaining repeats are provided in the Source Data file, as noted in the captions of the respective figures.

      - The observation that Hsp110 can interrupt Hsp70 interaction with JDPs is intriguing. Do the authors envision JDP displacement from the aggregate? If so this could be shown in BLI experiments by monitoring the release of fluorescently labeled Sis1 (similar to labeled Ssa1, Fig. S3C). Or will the released JDP immediately rebind to another binding site on the aggregate? The authors should at least discuss the diverse scenarios as they are relevant to the mechanism of protein disaggregation.

      The proposed experiment is challenging due to the transient nature of Sis1 binding to aggregates and the high background observed with the method using fluorescently labelled proteins. The possibility of chaperone re-binding after release by Hsp110, raised by the reviewer, has been introduced into the Discussion section (pages 12/13, lines 25-4). We speculate that Hsp110 might release an Hsp70 molecule as well as a JDP molecule that had been bound to the aggregate through Hsp70 (Figure 5).

      - Figure 2B: Ssa1/Sis1/Sse1 strongly decreases the size of Luciferase-GFP aggregates. Yet this activity only allows for limited refolding of aggregated Luciferase and the reaction stays largely dependent on Hsp104. How do the authors envision the role of the hexameric disaggregase in this process? Does it act exclusively on small-sized aggregates after Hsp110-dependent fragmentation?

      The question of Hsp104 activity on Hsp70-processed aggregates is indeed intriguing and we agree that it should have been discussed more thoroughly. We added to the manuscript the results of the reactivation of luciferase-GFP with and without Hsp104 to emphasize the role of Hsp104 in the recovery of active protein (Figure 2—figure supplement 1A) (page 7, lines 24-27). We propose that aggregate fragmentation by Hsp70-JDPB-Hsp110 increases the effective aggregate surface at which Hsp104 might become engaged. We do not think that Hsp104 acts only on small aggregates; it might simply be more effective when the number of exposed polypeptides is larger. In the cell, where Hsp104 binds to aggregates of various sizes, protein aggregates apparently also need to undergo such Hsp110-boosted pre-processing by Hsp70, based on the finding that Sse1 is not necessary for Hsp104 recruitment to aggregates, but it is required for Hsp104-dependent disaggregation (Kaimal et al., 2017). We have added a comment on this problem to the Discussion section (pages 11/12, lines 33-4).

      - Page 9: The authors state that the Sse1-2 variant is nearly as effective as Sse1 Wt in stimulating substrate dissociation and refer to published work (Polier et al., 2008). It is unclear how the variant should have Wt-like activity in triggering substrate release although its activity in catalyzing nucleotide exchange is reduced to 5% (both activities are coupled). The observation that high Sse1-2 concentrations do not inhibit protein disaggregation does not necessarily exclude the possibility that high Sse1 WT concentrations inhibit the reaction by overstimulating substrate release. The latter possibility should be considered by the authors and added to the discussion section.

      We agree with the Reviewer that the description of the Sse1-2 variant was misleading, as it lacked a key piece of information: according to the published data (Polier et al., 2008), a 10-fold higher concentration of the Sse1-2 variant was needed to reach a nucleotide-exchange activity similar to that of Sse1 WT. To avoid confusion, we have changed the text (page 9, lines 16-22, page 13, lines 26-28) as well as the model in Figure 5, to underline the importance of substrate release as the cause of the Hsp110-dependent inhibition.

      - While similar effects are observed for human class A and class B JDP co-chaperones, they are clearly less pronounced. A mechanistic explanation for the difference between yeast and human chaperones is currently missing and the authors are asked to elaborate on this aspect.

      There are indeed clear differences between the human and yeast systems, especially regarding the dependence on the NEF. Hsc70 has been reported to have a lower rate of ADP release (Dragovic et al., 2006) and thus might rely more on Hsp110 than its yeast ortholog. For the same reason, the strong Hsc70 stimulation by Hsp105 is also observed with class A JDP. We have added a comment on these effects in the Discussion section (page 12, lines 17-21).

      Minor points

      - Figure S1C (right): the disaggregation rate (%GFP/h) is somewhat misleading/confusing as a value of more than 150%/h is determined in the presence of the complete disaggregation system while only approx. 60% GFP is indeed refolded by the system (Figure S1C, left). Showing the rate as %GFP/min seems more rational.

      We changed the units according to the Reviewer’s comment (Figure 1—figure supplement 1A, C).

      - Figure S5B: Only a single data point is shown for Ssa1/Sis1/Sse1.

      We changed the figure to include datapoints from all three repeats (Figure 3—figure supplement 1 B).

      - There are several typos throughout the manuscript. A more careful proofreading is recommended

      We have corrected the typos.

      Reviewer #1 (Public Review):

      The experiments differ somewhat in regard to the aggregated protein used. For example, in Figure 1A, FFL is used with only limited reactivation (10% reactivated at the last timepoint and the curve is flattening), while in Figure 2B FFL-EGFP is used to monitor microscopically what appears to be complete disaggregation. Does FFL-EGFP behave the same as FFL in assays such as the one in Figure 1A or are there major differences that may impact how the data should be interpreted?

      We added the results of Luc-GFP reactivation (Figure 2—figure supplement 1 B) (discussed on page 7, lines 24-27 of the manuscript), which agree with the results obtained with luciferase as a substrate (Figure 1—figure supplement 1 B). They clearly show that the Ssa1-Sis1-Sse1-dependent decrease in aggregate size is not associated with the recovery of active protein.

      Reviewer #2 (Public Review):

      Experimental data concerning the class A JDPs should be interpreted with caution. These experiments show very small reactivation activities for luciferase in the range of 0-1% without the addition of Hsp104 and 0-15% with the addition of Hsp104. Moreover, since the assay is based on the recovery of luciferase activity, it conflates two chaperone activities, namely disaggregation and refolding. It is possible that the small degree of reactivation observed for the class A JDP reflects a minor subpopulation of the aggregated species that is particularly easy to disaggregate/refold and may thus not be representative of bulk behaviour.

      The disaggregation by the Hsp70 system can be enhanced by the addition of small heat shock proteins at the step of substrate aggregation (Rampelt et al., 2012). However, sHsps compete with Hsp70 for binding to the aggregate (Żwirowski et al., 2017) and for that reason we decided not to include sHsps in the experiments presented in the manuscript, as it would introduce another level of complexity. However, as a control, we performed the disaggregation assay with Hsp70 and Ydj1 using luciferase aggregates formed in the presence or absence of sHsp (Author response image 2). In 1 h, the Hsp70 system without Hsp104 yielded 5% of recovered luciferase activity and the system with Hsp104 yielded 23%, compared to the native protein. The impact of Sse1 on Ssa1-Ydj1 and Ssa1-Ydj1-Hsp104 was similar to that observed for luciferase aggregates formed without sHsps (Figure 1A, Figure 1—figure supplement 1 B). Furthermore, according to the Reviewer’s comment, we have changed Figure 5 to underscore the more prominent role of class A JDPs in the final protein folding than in disaggregation.

      Author response image 2.

      Disaggregation of heat-aggregated luciferase – impact of sHsps. Luciferase (2 μM) was denatured with (blue) or without (red) Hsp26 (20 μM) at 45 °C for 15 min in buffer A (Materials and Methods). Upon 100-fold dilution with buffer A, supplemented with 5 mM ATP, 2 mM DTT, 1.2 μM creatine kinase, 20 mM creatine phosphate, the chaperones indicated in the legend were added to a final concentration of 1 μM, except for Sse1, which was used at 0.1 μM. Shown is luciferase activity measured after 1 h of incubation at 25 °C, normalized to the activity of native luciferase.

      Reviewer #3 (Public Review):

      Enhanced recruitment of Hsp70 in the presence of Hsp110 was shown for amyloid fibrils before (Beton et al., EMBO J 2022) and should be acknowledged. 

      We have added the suggested citation with a respective comment (page 11, lines 20-21).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper details a study of endothelial cell vessel formation during zebrafish development. The results focus on the role of aquaporins, which mediate the flow of water across the cell membrane, leading to cell movement. The authors show that actin and water flow together drive endothelial cell migration and vessel formation. If any of these two elements are perturbed, there are observed defects in vessels. Overall, the paper significantly improves our understanding of cell migration during morphogenesis in organisms.

      Strengths:

      The data are extensive and are of high quality. There is a good amount of quantification with convincing statistical significance. The overall conclusion is justified given the evidence.

      Weaknesses:

      There are two weaknesses, which if addressed, would improve the paper.

      (1) The paper focuses on aquaporins, which, while mediating water flow, cannot drive directional water flow. If the osmotic engine model is correct, then ion channels such as NHE1 are the driving force for water flow. Indeed, this has been shown in previous studies. Moreover, NHE1 can drive water intake because the export of H+ leads to increased HCO3 due to the reaction between CO2+H2O, which increases the cytoplasmic osmolarity (see Li, Zhou and Sun, Frontiers in Cell Dev. Bio. 2021). If NHE cannot be easily perturbed in zebrafish, it might be of interest to perturb Cl channels such as SWELL1, which was recently shown to work together with NHE (see Zhang, et al, Nat. Comm. 2022).

      (2) In some places the discussion seems a little confusing where the text goes from hydrostatic pressure to osmotic gradient. It might improve the paper if some background is given. For example, mention water flow follows osmotic gradients, which will build up hydrostatic pressure. The osmotic gradients across the membrane are generated by active ion exchangers. This point is often confused in literature and somewhere in the intro, this could be made clearer.

      Reviewer #1 (Recommendations For The Authors):

      (1) The paper focuses on aquaporins, which, while mediating water flow, cannot drive directional water flow. If the osmotic engine model is correct, then ion channels such as NHE1 are the driving force for water flow. Indeed, this has been shown in previous studies. Moreover, NHE1 can drive water intake because the export of H+ leads to increased HCO3 due to the reaction between CO2+H2O, which increases the cytoplasmic osmolarity (see Li, Zhou and Sun, Frontiers in Cell Dev. Bio. 2021). If NHE cannot be easily perturbed in zebrafish, it might be of interest to perturb Cl channels such as SWELL1, which was recently shown to work together with NHE (see Zhang, et al, Nat. Comm. 2022).

      We thank Reviewer #1 for this very important comment and the suggestion to examine the function of ion channels in establishing an osmotic gradient to drive directional flow. We have taken on board the reviewer’s suggestion and examined the expression of NHE1 and SWELL1 in endothelial cells using published scRNAseq of 24 hpf ECs (Gurung et al, 2022, Sci. Rep.). We found that slc9a1a, slc9a6a, slc9a7, slc9a8, lrrc8aa and lrrc8ab are expressed in different endothelial subtypes. To examine the function of NHE1 and SWELL1 in endothelial cell migration, we used the pharmacological compounds 5-(N-ethyl-N-isopropyl)amiloride (EIPA) and DCPIB, respectively. While we were unable to observe an ISV phenotype after EIPA treatment at 5, 10 and 50 µM, we were able to observe impaired ISV formation after DCPIB treatment that was very similar to that observed in Aquaporin mutants. We were very encouraged by these results and proceeded to perform more detailed experiments whose results have yielded a new figure (Figure 6) and are described and discussed in lines 266 to 289 and 396 to 407, respectively, in the revised manuscript.

      (2) In some places the discussion seems a little confusing where the text goes from hydrostatic pressure to osmotic gradient. It might improve the paper if some background is given. For example, mention water flow follows osmotic gradients, which will build up hydrostatic pressure. The osmotic gradients across the membrane are generated by active ion exchangers. This point is often confused in literature and somewhere in the intro, this could be made clearer.

      Thank you for pointing out the deficiency in explaining how osmotic gradients drive water flow to build up hydrostatic pressure. We have clarified this in lines 50, 53 - 54 and 385.
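
      The relationship summarized above can be written compactly. The following is a textbook sketch and is not taken from the manuscript: it assumes an ideal semipermeable membrane (reflection coefficient of 1), dilute solutes obeying the van 't Hoff relation, and illustrative symbols ($L_p$ for hydraulic permeability, $A$ for membrane area, and $\Delta c$, $\Delta\Pi$, $\Delta P$ for the inside-minus-outside differences in osmolarity, osmotic pressure, and hydrostatic pressure).

```latex
% Water influx across the membrane (positive = into the cell)
J_w = L_p A \left( \Delta\Pi - \Delta P \right), \qquad
\Delta\Pi = R T \, \Delta c

% Steady state: hydrostatic pressure balances the osmotic gradient
J_w = 0 \;\Rightarrow\; \Delta P = R T \, \Delta c

% Illustrative numbers: \Delta c = 10\ \mathrm{mol\,m^{-3}} (10 mOsm), T = 298\ \mathrm{K}
\Delta P \approx 8.314 \times 298 \times 10\ \mathrm{Pa} \approx 25\ \mathrm{kPa}
```

      Under these assumptions, the osmolarity difference maintained by ion transport sets the direction of flow and the pressure that can build up, whereas water channels contribute to $L_p$ and therefore to the rate of flow.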

      The two recommendations listed above would improve the paper. They are however not mandatory. The paper would be acceptable with some clarifying rewrites. I am not an expert on zebrafish genetics, so it might be difficult to perturb ion channels in this model organism. Have the authors tried to perturb ion channels in these cells?

      We hope that our attempts at addressing Reviewer #1’s comments are satisfactory and sufficient to clarify the concerns outlined.

      Reviewer #2 (Public Review):

      Summary:

      Directional migration is an integral aspect of sprouting angiogenesis and requires a cell to change its shape and sense a chemotactic or growth factor stimulus. Kondrychyn I. et al. provide data that indicate a requirement for zebrafish aquaporins 1 and 8, in cellular water inflow and sprouting angiogenesis. Zebrafish mutants lacking aqp1a.1 and aqp8a.1 have significantly lower tip cell volume and migration velocity, which delays vascular development. Inhibition of actin formation and filopodia dynamics further aggravates this phenotype. The link between water inflow, hydrostatic pressure, and actin dynamics driving endothelial cell sprouting and migration during angiogenesis is highly novel.

      Strengths:

      The zebrafish genetics, microscopy imaging, and measurements performed are of very high quality. The study data and interpretations are very well-presented in this manuscript.

      Weaknesses:

      Some of the mechanobiology findings and interpretations could be strengthened by more advanced measurements and experimental manipulations. Also, a better comparison and integration of the authors' findings, with other previously published findings in mice and zebrafish would strengthen the paper.

      We thank Reviewer #2 for the critique that the paper can be strengthened by more advanced measurements and experimental manipulations. One of the technical challenges that we face is how to visualize and measure water flow directly in the zebrafish. We have therefore taken indirect approaches to assess water abundance in endothelial cells in vivo. One approach was to measure the diffusion of GEM nanoparticles in tip cell cytoplasm in wildtype and Aquaporin mutants, but results were inconclusive. The second was to measure the volume of tip cells, which should reflect water in/outflow. As the second approach produced clear and robust differences between wildtype ECs, ECs lacking Aqp1a.1 and Aqp8a.1 and ECs overexpressing Aqp1a.1 (revised Fig. 5), we decided to present these data in this manuscript.

      We have also taken Reviewer 2’s advice to better incorporate previously published data in our discussion (see below and lines 374 to 383 of the revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      I have a few comments that the authors may address to further improve their manuscript analysis, quality, and impact.

      Major comments:

      (1) Citation and discussion of published literature

      The authors have failed to cite and discuss recently published results on the role of aqp1a.1 and aqp8a.1 in ISV formation and caliber in zebrafish (Chen C et al. Cardiovascular Research 2024). That study showed a similar impairment of ISV formation when aqp1a.1 is absent but demonstrated a stronger phenotype on ISV morphology in the absence of aqp8a.1 than the current manuscript by Kondrychyn I et al. Furthermore, Chen C et al show an overall decrease in ISV diameter in single aquaporin mutants suggesting that the cell volume of all ECs in an ISV is affected equally. Given this published data, are ISV diameters affected in single and double mutants in the current study by Kondrychyn I et al? An overall effect on ISVs would suggest that aquaporin-mediated cell volume changes are not an inherent feature of endothelial tip cells. The authors need to analyse/compare and discuss all differences and similarities of their findings to what has been published recently.

      We apologise for having failed to cite and discuss the recently published paper by Chen et al. This has been corrected, and the paper is now discussed in lines 374 to 383.

      In the paper by Chen et al, the authors describe a role of Aqp1a.1 and Aqp8a.1 in regulating ISV diameter (ISV diameter was analysed at 48 hpf) but they did not examine the earlier stages of sprouting angiogenesis between 20 and 30 hpf, which is the focus of our study. We therefore cannot directly compare our ISV phenotypes with theirs. Nevertheless, we recognise that there are differences in ISV phenotypes from 2 dpf. For example, they did not observe incompletely formed or missing ISVs at 2 and 3 dpf, which we clearly observe in our study. This could be explained by differences in the mutations generated. In Chen et al., the sgRNA used targeted the end of exon 2, which resulted in the generation of a 169 amino acid truncated aqp1a.1 protein. In our approach, however, the sgRNA targeted exon 1 of the gene, which resulted in a truncated aqp1a.1 protein that is 76 amino acids long. As for the aqp8a.1 zebrafish mutant that we generated, our sgRNA targeted exon 1 of the gene, which resulted in a truncated protein that is 73 amino acids long. In Chen et al., the authors did not generate an aqp8a.1 mutant but instead used a crispant approach, which leads to genetic mosaicism and high experimental variability.

      Following the reviewer’s suggestion, we have now measured the diameters of arterial ISVs (aISVs) and venous ISVs (vISVs) in aqp1a.1<sup>-/-</sup>, aqp8a.1<sup>-/-</sup> and aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish. In our lab, we always make a distinction between aISVs and vISVs as their diameters are significantly different from each other. The results are in Fig S11A. While we corroborate a decrease in diameter in both aISVs and vISVs in single aqp1a.1<sup>-/-</sup> and double aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish, we observed a slight increase in diameter in both aISVs and vISVs in aqp8a.1<sup>-/-</sup> zebrafish at 2 dpf. We also measured the diameter of aISVs and vISVs in Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish at 2 dpf (Fig S11B) and, unlike in Chen et al., we could not detect a difference in the diameter between control and aqp1a.1- or aqp8a.1-overexpressing endothelial cells.

      We would also like to point out that, because ISVs are incompletely formed or are missing in aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish (Fig. 3G – L), blood flow is most likely altered in the zebrafish trunk of these mutants, and this can have a secondary effect on blood vessel calibre or diameter. In fact, we often observed wider ISVs adjacent to unperfused ISVs (Fig. 3J) as more blood flow enters the lumenized ISV. Therefore, to determine the cell autonomous function of Aquaporin in mediating cell volume changes in vessel diameter regulation, one would need to perform cell transplantation experiments where we would measure the volume of single aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> endothelial cells in wildtype embryos with normal blood flow. As this is beyond the scope of the present study, we have not done this experiment during the revision process.

      (2) Expression of aqp1a.1 and aqp8a.1

      The quantification shown in Figure 1G shows a relative abundance of expression between tip and stalk cells. However, it seems aqp8a.1 is almost never detected in most tip cells. The authors could show, in addition, the % of tip and stalk cells with detectable expression of the 2 aquaporins. It seems aqp8a.1 is really weakly or not expressed in the initial stages. Of course, the protein may have a different dynamic from the RNA.

      We would like to clarify that aqp8a.1 mRNA is not detected in tip cells of newly formed ISVs at 20 hpf. At 22 hpf, it is expressed in both tip cells (22 out of 23 tip cells analysed) and stalk cells of ISVs. This is clarified in lines 107 - 109. We also include below a graph showing that although aqp8a.1 mRNA is expressed in tip cells, its expression is higher in stalk cells.

      Author response image 1.

      Could the authors show endogenously expressed or tagged protein by antibody staining? The analysis of the Tg(fli1ep:aqp8a.1-mEmerald)rk31 zebrafish line is a good complement, but unfortunately, it does not reveal the localization of the endogenously expressed protein. Do the authors have any data supporting that the endogenously expressed aqp8a.1 protein is present in sprouting tip cells?

      We tested several antibodies against AQP1 (Alpha Diagnostic International, AQP11-A; ThermoFisher Scientific, MA1-20214; Alomone Labs, AQP-001) and AQP8 (Sigma Aldrich, SAB 1403559; Alpha Diagnostic International, AQP81-A; Alomone Labs, AQP-008) but unfortunately none worked. As such, we do not have data demonstrating endogenous expression and localisation of Aqp1a.1 and Aqp8a.1 proteins in endothelial cells.

      Could the authors perform F0 CRISPR/Cas9 mediated knockin of a small tag (i.e. HA epitope) in zebrafish and read the endogenous protein localization with anti-HA Ab?

      CRISPR/Cas9 mediated in-frame knock-in of a tag into a genomic locus is a technical challenge that our lab has not established. We therefore cannot do this experiment within the revision period.

      Given the double mutant phenotypic data shown, is aqp8a.1 expression upregulated and perhaps more important in aqp1a.1 mutants?

      In our analysis of aqp1a.1 homozygous zebrafish, there is a slight downregulation in aqp8a.1 expression (Fig. S5C). Because the loss of Aqp1a.1 leads to a stronger impairment in ISV formation than the loss of Aqp8a.1 (see Fig. S6F, G, I and J), we believe that Aqp1a.1 has a stronger function than Aqp8a.1 in EC migration during sprouting angiogenesis.

      Regarding the regulation of expression by the Vegfr inhibitor Ki8751, does this inhibitor affect Vegfr/ERK signalling in zebrafish and the sprouting of ISVs significantly?

      ki8751 has been demonstrated to inhibit ERK signalling in tip cells in the zebrafish by Costa et al., 2016 in Nature Cell Biology. In our experiments, treatment with 5 µM ki8751 for 6 hours from 20 hpf also inhibited sprouting of ISVs.

      The data presented suggest that tip cells overexpressing aqp1a.1-mEmerald (Figure 2C) need more than 6 times longer to migrate the same distance as tip cells expressing aqp8a.1-mEmerald (Figure 2D). How does this compare with cells expressing only Emerald? A similar time difference can be seen in Movie S1 and Movie S2. Is it just a coincidence? Could aqp8a.1, when expressed at similar levels to aqp1a.1, be more functional and induce faster cell migration? These experiments were interpreted only for the localization of the proteins, but not for the potential role of the overexpressed proteins on function. Chen C et al. Cardiovascular Research 2024 also has some Aqp overexpression data.

      The still images prepared for Fig. 2 C and D were selected to illustrate the localization of Aqp1a.1-mEmerald and Aqp8a.1-mEmerald at the leading edge of migrating tip cells. We did not notice that the tip cell overexpressing Aqp1a.1-mEmerald (Figure 2C) needed more than 6 times longer to migrate the same distance as the tip cell expressing Aqp8a.1-mEmerald (Figure 2D), which the reviewer astutely detected. To ascertain whether there is a difference in migration speed between Aqp1a.1-mEmerald and Aqp8a.1-mEmerald overexpressing endothelial cells, we measured tip cell migration velocity of three ISVs from Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish during the period of ISV formation (24 to 29 hpf) using the Manual Tracking plugin in Fiji. As shown in the graph, there is no significant difference in the migration speed of ECs overexpressing Aqp1a.1-mEmerald and Aqp8a.1-mEmerald, suggesting that Aqp8a.1-overexpressing cells migrate at a similar rate to Aqp1a.1-overexpressing cells. As we have not generated a Tg(fli1ep:mEmerald) zebrafish line, we are unable to determine whether endothelial cells migrate faster in Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish compared to endothelial cells expressing only mEmerald. As for the observation that tip cells overexpressing Aqp1a.1-mEmerald (Figure 2C) need more than 6 times longer to migrate the same distance as tip cells expressing Aqp8a.1-mEmerald, we can only surmise that it is coincidental that the images selected “showed” faster migration of one ISV from Tg(fli1ep:aqp8a.1-mEmerald) zebrafish. We do not know whether Aqp1a.1 and Aqp8a.1 are overexpressed to the same levels in Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish.

      We would also like to point out that when we analysed the lengths of ISVs at 28 hpf in aqp1a.1<sup>-/-</sup> and aqp8a.1<sup>-/-</sup> zebrafish, ISVs were shorter in aqp1a.1<sup>-/-</sup> zebrafish compared to aqp8a.1<sup>-/-</sup> zebrafish (Fig. S6 F to J). These results indicate that the loss of Aqp1a.1 function causes slower migration than the loss of Aqp8a.1 function, and suggest that Aqp1a.1 induces faster endothelial cell migration than Aqp8a.1.

      Author response image 2.

      The data on Aqps expression after the Notch inhibitor DBZ seems unnecessary, and is at the moment not properly discussed. It is also against what is set in the field. aqp8a.1 levels seem to increase only 24h after DBZ, not at 6h, and still authors conclude that Notch activation inhibits aqp8a.1 expression (Line 138-139). In the field, Notch is considered to be more active in stalk cells, where aqp8a.1 expression seems higher (not lower). Maybe the analysis of tip vs stalk cell markers in the scRNAseq data, and their correlation with Hes1/Hey1/Hey2 and aqp1 vs aqp8 mRNA levels will be more clear than just showing qRT-PCR data after DBZ.

      As our scRNAseq data did not include ECs from earlier in development, when ISVs are forming, we have analysed scRNAseq data of 24 hpf endothelial cells published by Gurung et al, 2022 in Scientific Reports during the revision of this manuscript. However, we are unable to detect separate clusters of tip and stalk cells. As such, we are unable to correlate hes1/hey1/hey2 expression (which would be higher in stalk cells) with that of aqp1a.1/aqp8a.1. Also, we have decided to remove the DBZ-treatment results from our manuscript as we agree with the two reviewers that they are unnecessary.

      The paper would also benefit from some more analysis and interpretation of available scRNAseq data in development/injury/disease/angiogenesis models (zebrafish, mice or humans) for the aquaporin genes characterized here. To potentially raise a broader interest at the start of the paper.

      We thank the reviewer for suggesting examining aquaporin genes in other angiogenesis/disease/regeneration models to expand the scope of aquaporin function. We will do this in future studies.

      (3) Role of aqp1a.1 and aqp8a.1 on cytoplasmic volume changes and related phenotypes

      In Figure 5 the authors show that Aqp1/Aqp8 mutant endothelial tip cells have a lower cytoplasmic volume than tip cells from wildtype fish. If aquaporin-mediated water inflow occurs locally at the leading edge of endothelial tip cells (Figure 2, line 314-318), why doesn't cytoplasmic volume expand specifically only at that location (as shown in immune cells by Boer et al. 2023)? Can the observed reduction in cytoplasmic volume simply be a side-effect of impaired filopodia formation (Figure 4F-I)?

      We believe that water influx not only expands filopodia but also the leading front of tip cells (see bracket region in Fig. 4D), where Aqp1a.1-mEmerald/Aqp8a.1-mEmerald accumulate (Fig. 2), to generate an elongated protrusion and forward expansion of the tip cell. The decrease in cytoplasmic volume observed in the aqp1a.1;aqp8a.1 double mutant zebrafish is a result of decreased formation of these elongated protrusions at the leading front of migrating tip cells as shown in Fig. 4E (compare to Fig. 4D), not from just a decrease in filopodia number. In fact, in the method used to quantify cell volume, mEmerald/EGFP localization is limited to the cytoplasm and does not label filopodia well (compare mEmerald/EGFP in green with membrane tagged-mCherry in Fig. 5A - C). The volume measured therefore reflects cytoplasmic volume of the tip cell, not filopodia volume.

      Do the authors have data on cytoplasmic volume changes of endothelial tip cells in latrunculin B treated fish? The images in Figures 6 A,B suggest that there is a difference in cell volume upon lat b treatment only.

      No, unfortunately we have not performed single cell labelling and measurement of tip cells in Latrunculin B-treated embryos. We can speculate that as there is a decrease in actin-driven membrane protrusions in this experiment, one would also expect a decrease in cell volume as the reviewer has observed.

      (4) Combined loss of aquaporins and actin-based force generation.

      Lines 331-332 " we show that hydrostatic pressure is the driving force for EC migration in the absence of actin-based force generation"....better leave it more open and stick to the data. The authors show that aquaporin-mediated water inflow partially compensates for the loss of actin-based force generation in cell migration. Not that it is the key driving/rescuing force in the absence of actin-based force.

      We have changed it to “we show that hydrostatic pressure can generate force for EC migration in the absence of actin-based force generation” in line 348.

      (5) Aquaporins and their role in EC proliferation

      In the study by Phng LK et al. 2013, the authors have shown that proliferation is not affected when actin polymerization or filopodia formation is inhibited. However, in the current manuscript by Kondrychyn I. et al. this has not been analysed carefully. In Movie S4 the authors indicate by arrows tip cells that fail to invade the zebrafish trunk demonstrating a severe defect of sprouting initiation in these mutants. Yet, when only looking at ISVs that reach the dorsal side in Movie S4, it appears that they are comprised of fewer EC nuclei/ISV than the ISVs in Movie S3. At the beginning of DLAV formation, most ISVs in control Movie S3 consist of 3-4 EC nuclei, while in double mutants Movie S4 it appears to be only 2-3 EC nuclei. At the end of the Movie S4, one ISV on the left side even appears to consist of only a single EC when touching the dorsal roof. The authors provide convincing data on how the absence of aquaporin channels affects sprouting initiation and migration speed, resulting in severe delay in ISV formation. However, the authors should also analyse EC proliferation, as it may also be affected in these mutants, and may also contribute to the observed phenotype. We know that effects on cell migration may indirectly change the number of cells and proliferation at the ISVs, but this has not been carefully analysed in this paper.

      We thank the reviewer for highlighting the lack of information on EC number and division in the aquaporin mutants. We have now quantified EC number in ISVs that are fully formed (i.e. connecting the DA or PCV to the DLAV) at 2 and 3 dpf and the results are displayed in Figure S10A and B. At 2 dpf, there is a slight but significant reduction in EC number in both aISVs and vISVs in aqp1a.1<sup>-/-</sup> zebrafish and an even greater reduction in the double aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish. No significant change in EC number was observed in aqp8a.1<sup>-/-</sup> zebrafish. EC number was also significantly decreased at 3 dpf for aqp1a.1<sup>-/-</sup>, aqp8a.1<sup>-/-</sup> and aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish. The decrease in EC number per ISV may therefore contribute to the observed phenotype.

      We have also quantified the number of cell divisions during sprouting angiogenesis (from 21 to 30 hpf) to assess whether the lack of Aquaporin function affects EC proliferation. This analysis shows that there is no significant difference in the number of mitotic events between aqp1a.1<sup>+/-</sup>; aqp8a.1<sup>+/-</sup> and aqp1a.1<sup>-/-</sup>;aqp8a.1<sup>-/-</sup> zebrafish (Figure S10 C), suggesting that the reduction in EC number is not caused by a decrease in EC proliferation.

      These new data are reported on lines 198 to 205 of the manuscript.

      Minor comments:

      - Figure 3K data seems not to be necessary and even partially misleading after seeing Figure 3E. Fig. 3E represents the true strength of the phenotype in the different mutants.

      Figure 3K has been removed from Figure 3.

      - Typo Figure 3L (VII should be VI).

      Thank you for spotting this typo. VII has been changed to VI.

      - Line 242: The word "required" is too strong because there is vessel formation without Aqps in endothelial cells.

      This has been changed to “ …Aqp1a.1 and Aqp8a.1 regulate sprouting angiogenesis…” (lines 238 - 239).

      - From Figure S2, the doublets cluster should be removed.

      We have performed a new analysis of 24 hpf, 34 hpf and 3 dpf endothelial cell scRNAseq data (the previous analysis did not include 24 hpf endothelial cells). The doublets cluster is not included in the UMAP analysis.

      - Better indicate the fluorescence markers/alleles/transgenes used for imaging in Figures 6A-D.

      The transgenic lines used for this experiment are now indicated in the figure (this figure is now Figure 7).

      Reviewer #3 (Public Review):

      Summary:

      Kondrychyn and colleagues describe the contribution of two Aquaporins Aqp1a.1 and Aqp8a.1 towards angiogenic sprouting in the zebrafish embryo. By whole-mount in situ hybridization, RNAscope, and scRNA-seq, they show that both genes are expressed in endothelial cells in partly overlapping spatiotemporal patterns. Pharmacological inhibition experiments indicate a requirement for VEGFR2 signaling (but not Notch) in transcriptional activation.

      To assess the role of both genes during vascular development the authors generate genetic mutations. While homozygous single mutants appear normal, aqp1a.1;aqp8a.1 double mutants exhibit defects in EC sprouting and ISV formation.

      At the cellular level, the aquaporin mutants display a reduction of filopodia in number and length. Furthermore, a reduction in cell volume is observed indicating a defect in water uptake.

      The authors conclude that polarized water uptake mediated by aquaporins is required for the initiation of endothelial sprouting and (tip) cell migration during ISV formation. They further propose that water influx increases hydrostatic pressure within the cells which may facilitate actin polymerization and formation of membrane protrusions.

      Strengths:

      The authors provide a detailed analysis of Aqp1a.1 and Aqp8a.1 during blood vessel formation in vivo, using zebrafish intersomitic vessels as a model. State-of-the-art imaging demonstrates an essential role for aquaporins in different aspects of endothelial cell activation and migration during angiogenesis.

      Weaknesses:

      With respect to the connection between Aqp1/8 and actin polymerization/filopodia formation, the evidence appears preliminary and the authors' interpretation is guided by evidence from other experimental systems.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1 H, J:

      The differential response of aqp1/-8 to ki8751 vs DBZ after 6h treatment is quite obvious. Why do the authors show the effect after 24h? The effect is more likely than not indirect.

      We agree with the reviewer and we have now removed 24 hour Ki8751 treatment and all DBZ treatments from Figure 1.

      Figure 2:

      According to the authors' model anterior localization of Aqp1 protein is critical. The authors perform transient injections to mosaically express Aqp fusion proteins using an endothelial (fli1) promoter. For the interpretation, it would be helpful to also show the mCherry-CAAX channel in separate panels. From the images, it is not possible to discern how many cells we are looking at. In particular the movie in panel D may show two cells at the tip of the sprout. A marker labelling cell-cell junctions would help. Furthermore, the authors are using a strong exogenous promoter, thus potentially overexpressing the fusion protein, which may lead to mislocalization. For Aqp1a.1 an antibody has been published to work in zebrafish (e.g. Kwong et al., Plos1, 2013).

      We would like to clarify that we generated transgenic lines - Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) - to visualize the localization of Aqp1a.1 and Aqp8a.1 in endothelial cells, and the images displayed in Fig. 2 are from the transgenic lines (not transient, mosaic expression).

      To aid visualization and interpretation, we have now added an mCherry-CAAX only channel to accompany the Aqp1a.1/Aqp8a.1-mEmerald channel in Fig. 2A and B. To discern how many cells there are in the ISVs at this stage, we have crossed Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish to TgKI(tjp1a-tdTomato)<sup>pd1224</sup> (Levic et al., 2021) to visualize ZO1 at cell-cell junctions. However, because tjp1a-tdTomato is expressed in all cell types, including the skin that lies just above the ISV, and the signal in ECs in ISVs is very weak at 22 to 25 hpf, it was very difficult to obtain good quality images that can properly delineate cell boundaries to determine the number of cells in the ISVs at this early stage. Instead, we have annotated endothelial cell boundaries based on more intense mCherry-CAAX fluorescence at cell-cell borders, and from the mosaic expression of mCherry-CAAX that is intrinsic to the Tg(kdrl:ras-mCherry)<sup>s916</sup> zebrafish line.

      In Fig. 2D, there are two endothelial cells in the ISV during the period shown but there is only 1 cell occupying the tip cell position i.e. there is one tip cell in this ISV. Unlike the mouse retina where it has been demonstrated that two endothelial cells can occupy the tip cell position side-by-side (Pelton et al., 2014), this is usually not observed in zebrafish ISVs. This is demonstrated in Movie S3, where it is clear that one nucleus (belonging to the tip cell) occupies the tip of the growing ISV. An accumulation of intracellular membranes is often observed in tip cells and may serve as a reservoir of membranes for the generation of membrane protrusions at the leading edge of tip cells.

      We agree that by generating transgenic Tg(fli1ep:aqp1a.1-mEmerald) and Tg(fli1ep:aqp8a.1-mEmerald) zebrafish, Aqp1a.1 and Aqp8a.1 are overexpressed, which may affect their localization. The eel anti-Aqp1a.1 antibody used in (Kwong et al., 2013) was a gift from Dr. Gordon Cramb, Univ. of St Andrews, Scotland and it was first published in 2001. This antibody is not available commercially. Instead, we have tried several other antibodies against AQP1 (Alpha Diagnostic International, AQP11-A; ThermoFisher Scientific, MA1-20214; Alomone Labs, AQP-001) and AQP8 (Sigma Aldrich, SAB 1403559; Alpha Diagnostic International, AQP81-A; Alomone Labs, AQP-008) but unfortunately none worked. As such, we cannot compare localization of Aqp1a.1-mEmerald and Aqp8a.1-mEmerald with the endogenous proteins.

      Figure 3:

      E: the quantification is difficult to read. Wouldn't it be better to set the y-axis in % of the DV axis? (see also Figure S6).

      We would like to show the absolute length of the ISVs, and to illustrate that the ISV length decreases from anterior to posterior of the zebrafish trunk. We have increased the size of Fig. 3E to enable easier reading of the bars.

      K: This quantification appears arbitrary.

      We have removed this panel from Figure 3.

      G-J: The magenta channel is difficult to see. Is the lifeact-mCherry mosaic? In panel J there appears to be a nucleus between the sprout and the DLAV. It would be helpful to crop the contralateral side of the image.

      No, the Tg(fli1:Lifeact-mCherry) line is not mosaic. The “missing” vessels are not due to mosaicism of the transgene but to truncated ISVs, which are a phenotype of the loss of Aquaporin function. We have changed the magenta channel to grey and hope that by doing so, the reviewer will be able to see the shape of the blood vessels more clearly. We would like to leave the contralateral side in the images, as it shows that the defective vessel is only on one side of the body. Furthermore, when we tried to remove it (by reducing the number of Z-stacks), the neighbouring ISV looked incomplete because the embryos were not mounted flat. To clarify what the nucleus between the sprout and the DLAV is, we have indicated that it is that of the contralateral ISV.

      L: I do not quite understand the significance of the different classes of phenotypes. Do the authors propose different morphogenetic events or contexts of how these differences come about?

      Here, we report the different types of ISV phenotypes that we observe in 3 dpf aqp1a.1<sup>-/-</sup>; aqp8a.1<sup>-/-</sup> zebrafish (Fig. 3 and Fig. S7). As demonstrated in Fig. 4, most of the phenotypes can be explained by the delayed emergence of tip cells from the dorsal aorta and slower tip cell migration. However, in some instances, we also observed retraction of tip cells (Movie S4) and failure of tip cells to emerge from the dorsal aorta or endothelial cell death (see attached figure on page 14), which can give rise to the Class II phenotype. In the dominant Class I phenotype (in contrast to Class II), secondary sprouting from the posterior cardinal vein is unaffected, and the secondary sprout migrates dorsally past the level of the horizontal myoseptum but cannot complete the formation of the vISV (it stops beneath the spinal cord). The Class III phenotype appears to result from a failure of the secondary sprout to fuse with the regressed primary ISV. In the Class IV phenotype, the ventral EC does not maintain a connection to the dorsal aorta. We did not examine in detail how Class III and IV phenotypes arise in this current study.

      Author response image 3.

      Figure 4:

      This figure nicely demonstrates the defects in cell behavior in aqp mutants.

      In panel F it would be helpful to show the single channels as well as the merge.

      We have now added single channels for PLCd1PH and Lifeact signal in panels F and G.

      In Figure 1 the authors argue that the reduction of Aqp1/8 by VEGFR2 inhibition may account for part of that phenotype. In turn, the aqp phenotype seems to resemble incomplete VEGFR2 inhibition. The authors should check whether expression Aqp1Emerald can partially rescue ki8751 inhibition.

      To address the reviewer’s comment, we have treated Tg(fli1ep:Aqp1-Emerald) embryos with ki8751 from 20 hpf for 6 hours but we were unable to observe a rescue in sprouting. It could be because VEGFR2 inhibition also affects other downstream signalling pathways that also control cell migration as well as proliferation.

      Based on previous studies (Loitto et al.; Papadopoulos et al.) the authors propose that also in ISVs aquaporin-mediated water influx may promote actin polymerization and thereby filopodia formation. However, while the effect on filopodia number and length is well demonstrated, the underlying cause is less clear. For example, filopodia formation could be affected by reduced cell polarization. This can be tested by using a transgenic golgi marker (Kwon et al., 2016).

      We have examined tip cell polarity of wildtype, aqp1a.1<sup>-/-</sup> and aqp8a.1<sup>-/-</sup> embryos at 24-26 hpf by analysing Golgi position relative to the nucleus. We were unable to analyze polarity in aqp1a.1<sup>rk28/rk28</sup>; aqp8a.1<sup>rk29/rk29</sup> embryos as they exist in an mCherry-containing transgenic zebrafish line (the Golgi marker is also tagged to mCherry). The results show that tip cell polarity is similar, if not more polarised, in aqp1a.1<sup>-/-</sup> and aqp8a.1<sup>-/-</sup> embryos when compared to wildtype embryos (Fig. S10D). This new data is discussed in lines 234 to 237.

      Figure 5:

      Panel D should be part of Figure 4.

      Panel 5D is now in panel J of Figure 4 and described in lines 231 and 235.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      People can perform a wide variety of different tasks, and a long-standing question in cognitive neuroscience is how the properties of different tasks are represented in the brain. The authors develop an interesting task that mixes two different sources of difficulty, and find that the brain appears to represent this mixture on a continuum, in the prefrontal areas involved in resolving task difficulty. While these results are interesting and in several ways compelling, they overlap with previous findings and rely on novel statistical analyses that may require further validation.

      Strengths

      1) The authors present an interesting and novel task for combining the contributions of stimulus-stimulus and stimulus-response conflict. While this mixture has been measured in the multi-source interference task (MSIT), this task provides a more graded mixture between these two sources of difficulty.

      2) The authors do a good job triangulating regions that encode conflict similarity, looking for the conjunction across several different measures of conflict encoding.

      3) The authors quantify several salient alternative hypotheses and systematically distinguish their core results from these alternatives.

      4) The question that the authors tackle is of central theoretical importance to cognitive control, and they make an interesting contribution to this question.

      We would like to thank the reviewer for the positive evaluation of our manuscript and the constructive comments and suggestions. Your feedback has been invaluable in our efforts to enhance the accessibility of our manuscript and strengthen our findings. In response to your suggestion, we reanalyzed our data using the approach proposed by Chen et al. (2017, NeuroImage) and applied stricter multiple comparison correction thresholds in our reporting. This reanalysis largely replicated our previous results, thereby reinforcing the robustness of our findings. We have also examined several alternative models, and the results supported the integration of the spatial Stroop and Simon conflicts within the cognitive space. In addition, we enriched the theoretical framework of our manuscript by connecting the cognitive space with other important theories such as the “Expected Value of Control” theory. We have incorporated your feedback, revisions and additional analyses into the manuscript, and we believe that these changes have significantly improved the quality of our work. We have provided detailed responses to your comments below.

      1) It's not entirely clear what the current task can measure that is not known from the MSIT, such as the additive influence of conflict sources in Fu et al. (2022), Science. More could be done to distinguish the benefits of this task from MSIT.

      We agree that the MSIT task incorporates Simon and Eriksen Flanker conflict tasks and can efficiently detect the additivity of conflict effects across orthogonal tasks. Like the MSIT, our task incorporates Simon with spatial Stroop conflicts and can test the same idea. For example, a previous study from our lab (Li et al., 2014) used the combined spatial Stroop-Simon condition with the arrows displayed on diagonal corners and found evidence for the additive hypothesis. However, the MSIT cannot be used to test whether/how different conflicts are parametrically represented in a low-dimensional space, a question that is important to address the debate of domain-general and domain-specific cognitive control.

      To this end, our current study adopted the spatial Stroop-Simon task for the unique purpose of parametrically modulating conflict similarity. As far as we know, there is no way to define the similarity between the combined Simon_Flanker conflict condition and the Simon/Flanker conditions in the MSIT. In contrast, with the spatial Stroop-Simon paradigm, we can define the similarity with the cosine of the angle difference across the two conditions in question.
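
      The cosine-based similarity described above can be sketched as follows. This is a minimal illustration rather than the analysis code: the angle assigned to each condition is an assumption (evenly spaced between a horizontal Simon axis and a vertical spatial Stroop axis), and the names `CONDITION_ANGLES_DEG` and `conflict_similarity` are hypothetical.

```python
import math

# Assumed polar angles (degrees) for the five conflict conditions. These are
# illustrative values only, evenly spaced between the Simon (horizontal) and
# spatial Stroop (vertical) axes.
CONDITION_ANGLES_DEG = {
    "Stroop": 90.0,
    "StHSmL": 67.5,
    "StMSmM": 45.0,
    "StLSmH": 22.5,
    "Simon": 0.0,
}


def conflict_similarity(cond_a: str, cond_b: str) -> float:
    """Cosine of the angular difference between two conflict conditions."""
    diff = CONDITION_ANGLES_DEG[cond_a] - CONDITION_ANGLES_DEG[cond_b]
    return math.cos(math.radians(diff))


# Similarity of every condition to the pure spatial Stroop condition.
for name in CONDITION_ANGLES_DEG:
    print(name, round(conflict_similarity("Stroop", name), 3))
```

      Under these assumed angles, identical conditions have a similarity of 1 and the pure spatial Stroop and pure Simon conditions have a similarity of approximately 0, with the compound conditions falling in between.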

      We have added the following text to the discussion to emphasize the difference between our paradigm and other studies.

      "The use of an experimental paradigm that permits parametric manipulation of conflict similarity provides a way to systematically investigate the organization of cognitive control, as well as its influence on adaptive behaviors. This approach extends traditional paradigms, such as the multi-source interference task (Fu et al., 2022), color Stroop-Simon task (Liu et al., 2010) and similar paradigms that do not afford a quantifiable metric of conflict source similarity."

      References:

      Li, Q., Nan, W., Wang, K., & Liu, X. (2014). Independent processing of stimulus-stimulus and stimulus-response conflicts. PloS One, 9(2), e89249.

      2) The evidence from this previous work for mixtures between different conflict sources make the framing of 'infinite possible types of conflict' feel like a strawman. The authors cite classic work (e.g., Kornblum et al., 1990) that develops a typology for conflict which is far from infinite, and I think few people would argue that every possible source of difficulty will have to be learned separately. Such an issue is addressed in theories like 'Expected Value of Control', where optimization of control policies can address unique combinations of task demands.

      The notion that there might be infinite conflicts arises when we consider the quantitative feature of cognitive control. If each combination of the Stroop-Simon combination is regarded as a conflict condition, there would be infinite combinations, and it is our major goal to investigate how these infinite conflict conditions are represented effectively in a space with finite dimensions. We agree that it is unnecessary to dissociate each of these conflict conditions into a unique conflict type, since they may not differ substantially. However, we argue that understanding variant conflicts within a purely categorical framework (e.g., Simon and Flanker conflict in MSIT) is insufficient, especially because it leads to dichotomic conclusions that do not capture how combinations of conflicts are organized in the brain, as our study addresses.

      There could be different perspectives on how our cognitive control system flexibly encodes and resolves multiple conflicts. The cognitive space assumption we hold provides a principle by which we can efficiently represent multiple conflicts in a lower-dimensional space. While the “Expected Value of Control” theory addresses when and how much cognitive control to apply based on control demand, the “cognitive space” view seeks to explain how the conflict, which defines cognitive control demand, is encoded in the brain. Thus, we argue that these two lines of work are different yet complementary. The geometry of the cognitive space of conflict can benefit the adjustment of cognitive control for upcoming conflicts. For example, our brain may evaluate the similarity/distance (and thus cost) between consecutive conflict conditions, and select the path with the best cost-benefit tradeoff to switch from one state to another. This idea is conceptually similar to a recent study by Grahek et al. (2022) demonstrating that more frequently switching states were encoded as closer together than less frequently switching states in a “drift-threshold” space.

      Nevertheless, Grahek et al. (2022) investigated how cognitive control changes within the same conflict based on the expected value of control theory, whereas our study aims to examine the organization of different conflicts.

      We have added the implications of the cognitive space view to the discussion to indicate how our findings may inform the EVC account and to clarify the difference between the two theories.

      “Previous researchers have proposed an “expected value of control (EVC)” theory, which posits that the brain can evaluate the cost and benefit associated with executing control for a demanding task, such as the conflict task, and specify the optimal control strength (Shenhav et al., 2013). For instance, Grahek et al. (2022) found that more frequently switching goals when doing a Stroop task were achieved by adjusting smaller control intensity. Our work complements the EVC theory by further investigating the neural representation of different conflict conditions and how these representations can be evaluated to facilitate conflict resolution. We found that different conflict conditions can be efficiently represented in a cognitive space encoded by the right dlPFC, and participants with stronger cognitive space representation have also adjusted their conflict control to a greater extent based on the conflict similarity (Fig 4C). The finding suggests that the cognitive space organization of conflicts guides cognitive control to adjust behavior. Previous studies have shown that participants may adopt different strategies to represent a task, with the model-based strategies benefitting goal-related behaviors more than the model-free strategies (Rmus et al., 2022). Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. 
The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition. On the other hand, without a cognitive space, there would be no measure of similarity between conflicts on different trials, hence limiting the ability of fast learning of cognitive control setting from similar trials.”

      Reference:

      Grahek, I., Leng, X., Fahey, M. P., Yee, D., & Shenhav, A. (2022). Empirical and Computational Evidence for Reconfiguration Costs During Within-Task Adjustments in Cognitive Control. CogSci.

      3) Wouldn't a region that represented each conflict source separately still show the same pattern of results? The degree of Stroop vs Simon conflict is perfectly negatively correlated across conditions, so wouldn't a region that just tracks Stroop conflict show these RSA patterns? The authors show that overall congruency is not represented in DLPFC (which is surprising), but they don't break it down by whether this is due to Stroop or Simon congruency (I'm not sure their task allows for this).

      To estimate the unique contributions of the spatial Stroop and Simon conflicts, we performed a model-comparison analysis. We constructed a Stroop-Only model and a Simon-Only model, with each conflict type projected onto the Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. We replaced the cognitive space-based conflict similarity regressor with the Stroop-Only and Simon-Only regressors and calculated the BIC of each resulting model. Results showed that the BIC was larger for the Stroop-Only (5377122) and Simon-Only (5377096) models than for the Cognitive-Space model (5377094). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a poorer model fit (BIC = 5377118) than the Cognitive-Space model. Considering that the pattern of conflict representations is more evident when the conflict is present (i.e., on incongruent trials) than when it is absent (i.e., on congruent trials), we also conducted the model comparison using the incongruent trials only. Results showed that the Stroop-Only (1344128), Simon-Only (1344120), and Stroop+Simon (1344157) models all showed higher BIC values than the Cognitive-Space model (1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. Therefore, we believe the cognitive space has incorporated both dimensions. We added these additional analyses and results to the revised manuscript.
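
      The BIC comparison above can be summarized with a short sketch. The BIC values are those reported in this response for the full data set; the function name `rank_by_bic` is hypothetical and the snippet only ranks the reported values, it does not refit any model.

```python
# BIC values for the full-data model comparison, copied from the response above.
BIC = {
    "Cognitive-Space": 5377094,
    "Stroop-Only": 5377122,
    "Simon-Only": 5377096,
    "Stroop+Simon": 5377118,
}


def rank_by_bic(bic):
    """Return (model, delta BIC vs. best) pairs sorted from lowest to highest BIC."""
    best = min(bic.values())
    return sorted(((m, v - best) for m, v in bic.items()), key=lambda p: p[1])


for model, delta in rank_by_bic(BIC):
    print(f"{model}: delta BIC = {delta}")
```

      With these values the Cognitive-Space model ranks first, with differences of 2, 24 and 28 BIC units relative to the Simon-Only, Stroop+Simon and Stroop-Only models, respectively.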

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 537127) or Domain-Specific (BIC = 537127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fitting (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”

      We reason that we did not observe an overall congruency effect in the RSA results because our definition of congruency here differed from traditional definitions (i.e., the contrast between incongruent and congruent conditions). In the congruency regressor of our RSA model, we defined representational similarity as 1 if calculated between two incongruent or two congruent trials, and 0 if calculated between an incongruent and a congruent trial. Thus, our congruency regressor reflects whether multivariate patterns differ between incongruent and congruent trials, rather than whether activity strengths differ. Indeed, we did observe the latter form of congruency effect, with stronger univariate activity in pre-SMA for incongruent versus congruent conditions. We have added this to Note S6 (“The multivariate representations of conflict type and orientation are different from the congruency effect”):

      “Neither did we observe a multivariate congruency effect (i.e., the pattern difference between incongruent and congruent conditions compared to that within each condition) in the right 8C or any other regions. Note the definition of congruency here differed from traditional definitions (i.e., contrast between activity strength of incongruent and congruent conditions), with which we found stronger univariate activities in pre-SMA for incongruent versus congruent conditions.”
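
      The multivariate congruency regressor defined above can be illustrated with a toy example. This is a minimal sketch: the function name `congruency_regressor` and the four example labels are hypothetical and are not taken from the analysis code.

```python
def congruency_regressor(is_incongruent):
    """Build the pairwise congruency regressor.

    Entry [i][j] is 1 when conditions i and j are both incongruent or both
    congruent, and 0 when one is incongruent and the other is congruent.
    """
    return [[1 if a == b else 0 for b in is_incongruent] for a in is_incongruent]


# Toy example with four conditions (labels are illustrative only).
labels = [True, True, False, False]  # True = incongruent, False = congruent
for row in congruency_regressor(labels):
    print(row)
```

      Because this regressor only codes whether two patterns share a congruency level, it tests for a pattern difference between congruency levels and is blind to differences in overall activation strength.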

      We could not determine whether the null effect of the congruency regressor was due to Stroop or Simon congruency alone, because congruency levels of the two types always covary. On all trials of the compound conditions (Conf 2-4), whenever the Stroop dimension was incongruent, the Simon dimension was also incongruent, and vice versa for the congruent condition. Thus, the contribution of spatial Stroop or Simon alone to the congruency effect could not be tested using compound conditions. Although we have pure spatial Stroop and Simon conditions, within-Stroop and within-Simon trial pairs constituted only 8% of cells in the representational similarity matrix. This was insufficient to determine whether the null congruency effect was due solely to Stroop or Simon.

      Overall, with the added analysis we found that the data in the right 8C area support conflict representations that are organized based on both Simon and spatial Stroop conflict. Although the current experimental design does not allow us to identify whether the null effect of the congruency regressor was driven by either conflict or both, we clarified that the congruency regressor did not test the conventional congruency effect and the null finding does not contradict previous research.

      Reference:

      Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat(37), 547-579.

      4) The authors use a novel form of RSA that concatenates patterns across conditions, runs and subjects into a giant RSA matrix, which is then used for linear mixed effects analysis. This appears to be necessary because conflict type and visual orientation are perfectly confounded within the subject (although, if I understand, the conflict type x congruence interaction wouldn't have the same concern about visual confounds, which shouldn't depend on congruence). This is an interesting approach but should be better justified, preferably with simulations validating the sensitivity and specificity of this method and comparing it to more standard methods.

      The confound exists for both the conflict type and the conflict type × congruence interaction in our design, since both incongruent and congruent conditions include stimuli from the full orientation space. For example, for the spatial Stroop type, the congruent condition could be either an up arrow at the top or a down arrow at the bottom. Similarly, the incongruent condition could be either an up arrow at the bottom or a down arrow at the top. Therefore, both the congruent and incongruent conditions are perfectly confounded with the orientation.

      We reanalyzed the data using the well-documented approach by Chen et al. (2017, NeuroImage), as suggested by the reviewer. The new analysis replicated our previously reported results (Fig. 4-5, S4-S7). As Chen et al. (2017) provided extensive simulations to validate this approach, we did not run any further simulations.

      5) A chief concern is that the same pattern contributes to many entries in the DV, which has been addressed in previous work using row-wise and column-wise random effects (Chen et al., 2017, Neuroimage). It would also be informative to know whether the results hold up to removing within-run similarity, which can bias similarity measures (Walther et al., 2016, Neuroimage).

      Thank you for the comment. In our revised manuscript, we followed your suggestion and adopted the approach proposed by Chen et al. (2017). Specifically, we included both the upper and lower triangle of the representational similarity matrix (excluding the diagonal). Moreover, we also removed all the within-subject similarity (thus also excluding the within-run similarity as suggested by Walther et al. (2016)) to minimize the bias of the potentially strong within-subject similarity. In addition, we added both the row-wise and column-wise random effects to capture the dependence of cells within each column and each row, respectively (Chen et al., 2017).
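
      The cell selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy matrix are hypothetical, and the sketch covers only the masking step, not the mixed-effects model or its random-effects structure.

```python
def cross_subject_cells(rsm, subject_of):
    """Return (row, col, value) for every cell whose row and column patterns
    come from different subjects. Both triangles are kept; the diagonal and
    all within-subject cells (hence all within-run cells) are dropped."""
    n = len(rsm)
    return [
        (i, j, rsm[i][j])
        for i in range(n)
        for j in range(n)
        if subject_of[i] != subject_of[j]
    ]


def n_cross_subject_cells(n_subjects, patterns_per_subject):
    """Number of retained cells for a balanced design."""
    total = n_subjects * patterns_per_subject
    return total * total - n_subjects * patterns_per_subject ** 2


# Toy RSM: 2 subjects x 2 patterns each (values are made up for illustration).
toy_rsm = [
    [1.0, 0.5, 0.2, 0.1],
    [0.5, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.4, 0.6, 1.0],
]
toy_subjects = [0, 0, 1, 1]
cells = cross_subject_cells(toy_rsm, toy_subjects)
print(len(cells))  # cells kept out of 16 in the toy example

# 35 participants with 40 patterns each (20 conditions x 2 runs).
print(n_cross_subject_cells(35, 40))
```

      The row and column indices returned for each retained cell are the quantities on which row-wise and column-wise random effects could be grouped.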

      Results from this approach largely replicated our previous results. The right 8C again showed significant conflict similarity representation, which was stronger in the incongruent than in the congruent condition and was positively correlated with behavioral performance. The orientation effect was also identified in visual (e.g., right V1) and oculomotor (e.g., left FEF) regions.

      We have revised the methodology and the results in the revised manuscript:

      "Representational similarity analysis (RSA).

      For each cortical region, we calculated the Pearson’s correlations between fMRI activity patterns for each run and each subject, yielding a 1400 (20 conditions × 2 runs × 35 participants) × 1400 RSM. The correlations were calculated in a cross-voxel manner using the fMRI activation maps obtained from GLM3 described in the previous section. We excluded within-subject cells from the RSM (thus also excluding the within-run similarity as suggested by Walther et al. (2016)), and the remaining cells were converted into a vector, which was then z-transformed and submitted to a linear mixed effect model as the dependent variable. The linear mixed effect model also included regressors of conflict similarity and orientation similarity. Importantly, conflict similarity was based on how Simon and spatial Stroop conflict are combined and hence was calculated by first rotating all subjects’ stimulus locations to the top-right and bottom-left quadrants, whereas orientation was calculated using original stimulus locations. As a result, the regressors representing conflict similarity and orientation similarity were de-correlated. Similarity between two conditions was measured as the cosine value of the angular difference. Other regressors included a target similarity regressor (i.e., whether the arrow directions were identical), a response similarity regressor (i.e., whether the correct responses were identical), a spatial Stroop distractor regressor (i.e., vertical distance between two stimulus locations), and a Simon distractor regressor (i.e., horizontal distance between two stimulus locations). Additionally, we also included a regressor denoting the similarity of Group (i.e., whether two conditions are within the same subject group, according to the stimulus-response mapping). We also added two regressors including ROI-mean fMRI activations for each condition of the pair to remove the possible uni-voxel influence on the RSM. A last term was the intercept.
To control the artefact due to dependence of the correlation pairs sharing the same subject, we included crossed random effects (i.e., row-wise and column-wise random effects) for the intercept, conflict similarity, orientation and the group factors (G. Chen et al., 2017)."

      Reference:

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. Neuroimage, 137, 188-200. doi:10.1016/j.neuroimage.2015.12.012

      6) Another concern is the extent to which across-subject similarity will only capture consistent patterns across people, making this analysis very similar to a traditional univariate analysis (and unlike the traditional use of RSA to capture subject-specific patterns).

      With proper normalization, we assume voxels across different subjects should show some consistent localizations, although individual differences can be high. J. Chen et al. (2017) have demonstrated that consistent multi-voxel activation patterns exist across individuals. Previous studies have also successfully applied cross-subject RSA (see review by Freund et al., 2021) and cross-subject decoding approaches (e.g., Jiang et al., 2016; Tusche et al., 2016), so we believe cross-subject RSA can capture distributed activation patterns shared at the group level. We added this argument in the revised manuscript:

      "Previous studies (e.g., J. Chen et al., 2017) have demonstrated that consistent multivoxel activation patterns exist across individuals, and successful applications of cross-subject RSA (see review by Freund, Etzel, et al., 2021) and cross-subject decoding approaches (Jiang et al., 2016; Tusche et al., 2016) have also been reported."

      In the revised manuscript, we also tested whether the representation in right 8C held for within-subject data. We reasoned that the conflict similarity effects identified by cross-subject RSA should be replicable in within-subject data, although the latter is not able to dissociate the conflict similarity effect from the orientation effect. We performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, 1-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, we believe that the within-subject data of right 8C probably showed similar conflict similarity modulation effects as the cross-subject data, although future research that orthogonalizes conflict type and orientation is needed to fully answer this question. We added this result to Note S7 of the revised manuscript.

      "Note S7. The cross-subject RSA captures similar effects with the within-subject RSA. Considering the variability in voxel-level functional localizations among individuals, one may question whether the cross-subject RSA results were biased by the consistent multi-voxel patterns across subjects, distinct from the more commonly utilized within-subject RSA. We reasoned that the cross-subject RSA should have captured similar effects as the within-subject RSA if we observe the conflict similarity effect in right 8C with the latter analysis. Therefore, we tested whether the representation in right 8C held for within-subject data. Specifically, we performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs (i.e., target versus response, and Stroop distractor versus Simon distractor) were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, 1-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, the within-subject data of right 8C may show similar conflict similarity modulation effects as the cross-subject data. Further research is needed to fully dissociate the representation of conflict and the representation of visual features such as orientation."

      Reference:

      Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115-125.

      Freund, M. C., Etzel, J. A., & Braver, T. S. (2021). Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences, 25(7), 622-638.

      Jiang, J., Summerfield, C., & Egner, T. (2016). Visual Prediction Error Spreads Across Object Features in Human Visual Cortex. Journal of Neuroscience, 36(50), 12746-12763.

      Tusche, A., Böckler, A., Kanske, P., Trautwein, F. M., & Singer, T. (2016). Decoding the Charitable Brain: Empathy, Perspective Taking, and Attention Shifts Differentially Predict Altruistic Giving. Journal of Neuroscience, 36(17), 4719-4732.

      7) Finally, the authors should confirm all their results are robust to less liberal methods of multiplicity correction. For univariate analysis, they should report the effects from the standard p < .001 cluster forming threshold for univariate analysis (or TFCE). For multivariate analyses, FDR can be quite liberal. The authors should consider whether their mixed-effects analyses allow for group-level randomization, and consider (relatively powerful) Max-Stat randomization tests (Nichols & Holmes, 2002, Hum Brain Mapp).

      In our revised manuscript, we have corrected the univariate results using the probabilistic TFCE (pTFCE) approach by Spisak et al. (2019). This approach estimates the conditional probability of cluster extent based on Bayes’ rule. Specifically, we applied pTFCE on our univariate results (i.e., the z-maps of our contrasts). This returned enhanced Z-score maps, which were then thresholded based on simulated cluster size thresholds using 3dClustSim. A cluster-forming threshold of p < .001 was employed. Results showed only the pre-SMA was activated in the incongruent > congruent contrast, and right IPS and right dmPFC were activated in the linear Simon modulation effect. Further tests also showed these regions were not correlated with the behavioral performance, uncorrected ps >.28. These results largely replicated our previous results. We have revised the method and results accordingly.

      Methods:

      "Results were corrected with the probabilistic threshold-free cluster enhancement (pTFCE) and then thresholded using the 3dClustSim function in AFNI (Cox & Hyde, 1997) with voxel-wise p < .001 and cluster-wise p < .05, both 1-tailed."

      Results:

      "In the fMRI analysis, we first replicated the classic congruency effect by searching for brain regions showing higher univariate activation in incongruent than congruent conditions (GLM1, see Methods). Consistent with the literature (Botvinick et al., 2004; Fu et al., 2022), this effect was observed in the pre-supplementary motor area (preSMA) (Fig. 3, Table S1). We then tested the encoding of conflict type as a cognitive space by identifying brain regions with activation levels parametrically covarying with the coordinates (i.e., axial angle relative to the horizontal axis) in the hypothesized cognitive space. As shown in Fig. 1B, change in the angle corresponds to change in spatial Stroop and Simon conflicts in opposite directions. Accordingly, we found the right inferior parietal sulcus (IPS) and the right dorsomedial prefrontal cortex (dmPFC) displayed positive correlation between fMRI activation and the Simon conflict (Fig. 3, Fig. S3, Table S1)."

      We appreciate the reviewer’s suggestion to apply the Max-Stat randomization tests (Nichols & Holmes, 2002) for the multivariate analyses. However, the representational similarity matrix was too large (1400×1400) to be tested with a balanced randomization approach (i.e., the Max-Stat) for two reasons: (1) running even 1000 iterations for all ROIs would take a very long time; and (2) the distribution generated from a typical number of randomizations (e.g., 5000 iterations) would probably be unbalanced, since it would not adequately represent the full range of samples that a complete randomization could generate. Instead, we adopted a very strict Bonferroni correction (p < 0.0001/360) when reporting the regression results from RSA. Notably, Chen et al. (2017) have shown that their approach can control the FDR at an acceptable level.
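
      For reference, the per-test threshold implied by this Bonferroni correction can be computed directly. This is a minimal sketch: the divisor of 360 is taken from the text above and is assumed here to be the number of ROIs tested, and the name `bonferroni_threshold` is hypothetical.

```python
ALPHA = 0.0001  # nominal threshold stated above
N_TESTS = 360   # divisor stated above; assumed here to be the number of ROIs tested


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"{threshold:.3e}")
```

      A regression result is reported as significant only if its p-value falls below this per-test threshold, which is on the order of 3 × 10^-7.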

      Reference:

      Spisák, T., Spisák, Z., Zunhammer, M., Bingel, U., Smith, S., Nichols, T., & Kincses, T. (2019). Probabilistic TFCE: A generalized combination of cluster size and voxel intensity to increase statistical power. NeuroImage, 185, 12-26.

      Chen, G., Taylor, P. A., Shin, Y.-W., Reynolds, R. C., & Cox, R. W. (2017). Untangling the relatedness among correlations, Part II: Inter-subject correlation group analysis through linear mixed-effects modeling. NeuroImage, 147, 825-840.

      Minor concerns:

      8) I appreciate the authors wanting to present the conditions in a theory-agnostic way, but the framing of 5 conflict types was confusing. I think framing the conditions as a mixture of 2 conflict types (Stroop and Simon) makes more sense, especially given the previous work on MSIT.

      We have renamed Types 1-5 as the spatial Stroop, StHSmL, StMSmM, StLSmH, and Simon conditions, respectively. H, L, and M indicate high, low and medium similarity with the corresponding conflict, respectively. This is also consistent with the naming in our previous work (Yang et al., 2021).

      Reference:

      Yang, G., Xu, H., Li, Z., Nan, W., Wu, H., Li, Q., & Liu, X. (2021). The congruency sequence effect is modulated by the similarity of conflicts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(10), 1705-1719.

      9) It would be helpful to have more scaffolding for the key conflict & orientation analyses. A schematic in the main text that outlines these contrasts would be very helpful (e.g. similar to S4).

      We have inserted Figure 7 in the revised manuscript. In this figure, we plotted the schematic of the difference between the conflict similarity and orientation regressors according to their cross-group representational similarity matrices.

      10) Figure 4D could be clearer, both in labeling and figure caption. 'Modeled similarity' could be relabelled to something more informative, like 'conflict type (or mixture) similarity'. Alternatively, it would be helpful to show a summary RDM for region r-8C. For example, breaking it down by just conflict type and congruence.

      We have relabeled the x-axis to “Conflict type similarity” and y-axis to “Neural similarity” for Figure 4D in the revised manuscript.

      We have also added a summary RSM figure in Fig. S5 to show the different similarity patterns between incongruent and congruent conditions.

      11) It may be helpful to connect your work to how people have discussed multiple forms of conflict monitoring and control with respect to target and distractor features e.g., Lindsay & Jacoby, 1994, JEP:HPP; Mante, Sussillo et al., 2013, Nature; Soutschek et al., 2015, JoCN; Jackson et al., 2021, Comm Bio; Ritz & Shenhav, 2022, bioRxiv

      We have added an analysis to examine how cognitive control modulates target and distractor representation. To this end, we selected the left V4, a visual region showing joint representation of target, Stroop distractor and Simon distractor, as the region of interest. We tested whether these representation strengths differed between incongruent and congruent conditions, finding the representation of target was stronger and representations of both distractors were weaker in the incongruent condition. This suggests that cognitive control modulates the stimuli in both directions. We added the results in Note S10 and Fig. S8, and also added discussion of it in “Methodological implications”.

      “Note S10. Cognitive control enhances target representation and suppresses distractor representation. Using the separability of confounding factors afforded by the cross-subject RSA, we examined how representations of targets and distractors are modulated by cognitive control. The key assumption is that exerting cognitive control may enhance target representation and suppress distractor representation. We hypothesized that stimuli are represented in visual areas, so we chose a visual ROI from the main RSA results showing joint representation of target, spatial Stroop distractor and Simon distractor (p < .005, 1-tail, uncorrected). Only the left V4 met this criterion. We then tested representations with models similar to the main text for incongruent-only trials, congruent-only trials, and the incongruent – congruent contrast. The contrast model additionally used interactions between congruency and the target, Stroop distractor and Simon distractor terms. Results showed that in the incongruent condition, when we employ more cognitive control, the target representation was enhanced (t(237990) = 2.59, p = .029, Bonferroni corrected) and both spatial Stroop (t(237990) = –4.18, p < .001, Bonferroni corrected) and Simon (t(237990) = –3.14, p = .005, Bonferroni corrected) distractor representations were suppressed (Fig. S8). These results are consistent with the idea that top-down control modulates the stimuli in both directions (Polk et al., 2008; Ritz & Shenhav, 2022).”

      Discussion:

      “Moreover, the cross-subject RSA provides high sensitivity to the variables of interest and the ability to separate confounding factors. For instance, in addition to dissociating conflict type from orientation, we dissociated target from response, and spatial Stroop distractor from Simon distractor. We further showed cognitive control can both enhance the target representation and suppress the distractor representation (Note S10, Fig. S8), which is in line with previous studies (Polk et al., 2008; Ritz & Shenhav, 2022)."

      12) For future work, I would recommend placing stimuli along the whole circumference, to orthogonalize Stroop and Simon conflict within-subject.

      We thank the reviewer for this highly helpful suggestion. Expanding the conflict conditions to a full conflict space and replicating our current results could provide stronger evidence for the cognitive space view.

      In the revised manuscript, we added this as a possible future design:

      “A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity."

      Reviewer #2:

      Summary, general appraisal

      This study examines the construct of "cognitive spaces" as they relate to neural coding schemes present in response conflict tasks. The authors utilize a novel paradigm, in which subjects must map the direction of a vertically oriented arrow to either a left or right response. Different types of conflict (spatial Stroop, Simon) are parametrically manipulated by varying the spatial location of the arrow (a task-irrelevant feature). The vertical eccentricity of the arrow either agrees or conflicts with the arrow's direction (spatial Stroop), while the horizontal eccentricity of the arrow agrees or conflicts with the side of the response (Simon). A neural coding model is postulated in which the stimuli are embedded in a cognitive space, organized by distances that depend only on the similarity of congruency types (i.e., where conditions with similar relative proportions of spatial-Stroop versus Simon congruency are represented with similar activity patterns). The authors conduct a behavioral and fMRI study to provide evidence for such a representational coding scheme. The behavioral findings replicate the authors' prior work in demonstrating that conflict-related cognitive control adjustments (the congruency sequence effect) show strong modulation as a function of the similarity between conflict types. With the fMRI neural activity data, the authors report univariate analyses that identified activation in left prefrontal and dorsomedial frontal cortex modulated by the amount of Stroop or Simon conflict present, and multivariate representational similarity analyses (RSA) that identified right lateral prefrontal activity encoding conflict similarity and correlated with the behavioral effects of conflict similarity.

      This study tackles an important question regarding how distinct types of conflict, which have been previously shown to elicit independent forms of cognitive control adjustments, might be encoded in the brain within a computationally efficient representational format. The ideas postulated by the authors are interesting ones and the utilized methods are rigorous.

      We would like to express our sincere appreciation for the reviewer’s positive evaluation of our manuscript and the constructive comments and suggestions. Through careful consideration of your feedback, we have endeavored to make our manuscript more accessible to readers and to further strengthen our findings. In response to your suggestion, we reanalyzed our data with the approach proposed by Chen et al. (2017, NeuroImage). This reanalysis largely replicated our previous results, reinforcing the validity of our findings. Additionally, we conducted tests with several alternative models and found that the cognitive space hypothesis best aligns with our observed data. We have incorporated these revisions and additional analyses into the manuscript, and we believe they have significantly enhanced its quality. We have provided detailed responses to your comments below.

      However, the study has critical limitations that are due to a lack of clarity regarding theoretical hypotheses, serious confounds in the experimental design, and a highly non-standard (and problematic) approach to RSA. Without addressing these issues it is hard to evaluate the contribution of the authors' findings to the computational cognitive neuroscience literature.

      1) The primary theoretical question and its implications are unclear. The paper would greatly benefit from more clearly specifying potential alternative hypotheses and discussing their implications. Consider, for example, the case of parallel conflict monitors. Say that these conflict monitors are separately tuned for Stroop and Simon conflict, and are located within adjacent patches of cortex that are both contained within a single cortical parcel (e.g., as defined by the Glasser atlas used by the authors for analyses). If RSA was conducted on the responses of such a parcel to this task, it seems highly likely that an activation similarity matrix would be observed that is quite similar (if not identical) to the hypothesized one displayed in Figure 1. Yet it would seem like the authors are arguing that the "cognitive space" representation is qualitatively and conceptually distinct from the "parallel monitor" coding scheme. Thus, it seems that the task and analytic approach is not sufficient to disambiguate these different types of coding schemes or neural architectures.

      The authors also discuss a fully domain-general conflict monitor, in which different forms of conflict are encoded within a single dimension. Yet this alternative hypothesis is also not explicitly tested nor discussed in detail. It seems that the experiment was designed to orthogonalize the "domain-general" model from the "cognitive space" model, by attempting to keep the overall conflict uniform across the different stimuli (i.e., in the design, the level of Stroop congruency parametrically trades off with the level of Simon congruency). But in the behavioral results (Fig. S1), the interference effects were found to peak when both Stroop and Simon congruency are present (i.e., Conf 3 and 4), suggesting that the "domain-general" model may not be orthogonal to the "cognitive space" model. One of the key advantages of RSA is that it provides the ability to explicitly formulate, test and compare different coding models to determine which best accounts for the pattern of data. Thus, it would seem critical for the authors to set up the design and analyses so that an explicit model comparison analysis could be conducted, contrasting the domain-general, domain-specific, and cognitive space accounts.

      We appreciate the reviewer pointing out the need to formally test alternative models. In the revised manuscript, we have added and compared a few alternative models, finding the Cognitive-Space model (the one with graded conflict similarity levels as we reported) provided the best fit to our data. Specifically, we tested the following five models against the Cognitive-Space model:

      (1) Domain-General model. This model treats each conflict type as equivalent, so each two conflict types only differ in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their effects indexed by the group-averaged RT in Experiment 2. Then the z-scored model vector was sign-flipped to reflect similarity instead of distance. This model showed non-significant conflict type effects (t(951989) = 0.92, p = .179) and poorer fit (BIC = 5377126) than the Cognitive-Space model (BIC = 5377094).

      (2) Domain-Specific model. This model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0. This model also showed non-significant effects (t(951989) = 0.84, p = .201) and poorer fit (BIC = 5377127) than the Cognitive-Space model.

      (3) Stroop-Only model. This model assumes that the right 8C only encodes the spatial Stroop conflict. We projected each conflict type to the Stroop (vertical) axis and calculated the similarity between any two conflict types as the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. This model also showed non-significant effects (t(951989) = 0.20, p = .423) and poorer fit (BIC = 5377122) than the Cognitive-Space model.

      (4) Simon-Only model. This model assumes that the right 8C only encodes the Simon conflict. We projected each conflict type to the Simon (horizontal) axis and calculated the similarity like the Stroop-Only model. This model showed significant effects (t(951989) = 4.19, p < .001) but still quantitatively poorer fit (BIC = 5377096) than the Cognitive-Space model.

      (5) Stroop+Simon model. This model assumes the spatial Stroop and Simon conflicts are encoded in parallel in the brain, similar to the "parallel monitor" hypothesis suggested by the reviewer. It includes both Stroop-Only and Simon-Only regressors. This model showed a non-significant effect for the Stroop regressor (t(951988) = 0.06, p = .478) and a significant effect for the Simon regressor (t(951988) = 3.30, p < .001), but poorer fit (BIC = 5377118) than the Cognitive-Space model.

      “Moreover, we replicated these results with only incongruent trials (i.e., when conflict is present), considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-general (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104).”

      In summary, these results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. We added the above results to the revised manuscript.
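
      To make the comparison across the five alternatives easier to read, the BIC values listed above (all trials) can be expressed as differences from the Cognitive-Space model. The sketch below only re-uses the numbers reported in this response; the dictionary and variable names are illustrative. Recall that BIC = k·ln(n) − 2·ln(L̂), so lower values indicate a better trade-off between fit and number of parameters.

```python
# BIC values copied from the model list above (all trials, right 8C).
reported_bic = {
    "Cognitive-Space": 5377094,
    "Domain-General": 5377126,
    "Domain-Specific": 5377127,
    "Stroop-Only": 5377122,
    "Simon-Only": 5377096,
    "Stroop+Simon": 5377118,
}

# Difference of each model from the Cognitive-Space model (positive = poorer fit).
reference = reported_bic["Cognitive-Space"]
delta_bic = {name: bic - reference for name, bic in reported_bic.items()}

# Print models from best to worst BIC.
for name, delta in sorted(delta_bic.items(), key=lambda item: item[1]):
    print(f"{name:16s} delta BIC = {delta:+d}")
```

      With these values the differences range from 2 (Simon-Only) to 33 (Domain-Specific) BIC units.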

      The above analysis approach was added to the Methods section “Model comparison and representational dimensionality”, and the results were added to the section “Multivariate patterns of the right dlPFC encodes the conflict similarity” in the revised manuscript.

      Methods:

      “Model comparison and representational dimensionality. To estimate if the right 8C specifically encodes the cognitive space, rather than the domain-general or domain-specific structures, we conducted two more RSAs. We replaced the cognitive space-based conflict similarity matrix in the RSA we reported above (hereafter referred to as the Cognitive-Space model) with one of the alternative model matrices, with all other regressors equal. The domain-general model treats each conflict type as equivalent, so any two conflict types only differ in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their congruency effects indexed by the group-averaged RT in Experiment 2. Then the z-scored model vector was sign-flipped to reflect similarity instead of distance. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0.

      Moreover, to examine if the cognitive space is driven solely by the Stroop or Simon conflicts, we tested a spatial Stroop-Only (hereafter referred to as “Stroop-Only”) and a Simon-Only model, with each conflict type projected onto the spatial Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. We also included a model assuming the Stroop and Simon dimensions are independently represented in the brain, adding up the Stroop-Only and Simon-Only regressors (hereafter referred to as the Stroop+Simon model). We conducted similar RSAs as reported above, replacing the original conflict similarity regressor with the Stroop-Only, Simon-Only, or both regressors (for the Stroop+Simon model), and then calculated their Bayesian information criteria (BICs).”
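
      As a rough illustration of how such model matrices could be built at the level of the five conflict types, consider the sketch below. It is not the authors' code. It assumes the five conflict types sit at angles of 90°, 67.5°, 45°, 22.5° and 0° from the horizontal (Simon) axis, that Cognitive-Space similarity is the cosine of the angular difference, that the projection onto each axis is the sine or cosine of the angle, and that the Jaccard index of two projections is the smaller divided by the larger (treating each projection as an interval starting at zero). All function and variable names are hypothetical, and the Domain-General matrix is omitted because it depends on the behavioral RT data.

```python
import numpy as np

# Assumed angles (degrees from the horizontal/Simon axis) for the five conflict
# types, ordered from spatial Stroop (90) to Simon (0).
angles = np.deg2rad([90.0, 67.5, 45.0, 22.5, 0.0])

# Cognitive-Space model: similarity is the cosine of the angular difference.
cognitive_space = np.cos(angles[:, None] - angles[None, :])

# Domain-Specific model: 1 within a conflict type, 0 across conflict types.
domain_specific = np.eye(len(angles))


def jaccard_1d(proj):
    """Jaccard index between intervals [0, p_i] and [0, p_j]: min / max.

    Assumption: when both projections are zero the index is set to 1.
    """
    lo = np.minimum.outer(proj, proj)
    hi = np.maximum.outer(proj, proj)
    sim = np.ones_like(hi)
    np.divide(lo, hi, out=sim, where=hi > 0)
    return sim


# Projections onto the Stroop (vertical) and Simon (horizontal) axes,
# rounded to remove floating-point residue such as cos(90 deg) ~ 6e-17.
stroop_only = jaccard_1d(np.round(np.sin(angles), 12))
simon_only = jaccard_1d(np.round(np.cos(angles), 12))

print(np.round(cognitive_space, 3))
print(np.round(stroop_only, 3))
print(np.round(simon_only, 3))
```

      In the actual RSA these 5×5 matrices would be expanded to the full trial-type-by-subject matrix and entered as regressors, with the Stroop+Simon model including both the Stroop-Only and Simon-Only regressors.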

      Results:

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 5377126) or Domain-Specific (BIC = 5377127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fit (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fit in the Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than in the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”

      Reference:

      Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat(37), 547-579.

      2a) Relatedly, the reasoning for the use of the term "cognitive space" is unclear. The mere presence of graded coding for two types of conflict seems to be a low bar for referring to neural activity patterns as encoding a "cognitive space". It is discussed that cognitive spaces/maps allow for flexibility through inference and generalization. But no links were made between these cognitive abilities and the observed representational structure.

      In the revised manuscript, we have clarified that we tested a specific prediction of the cognitive space hypothesis: the geometry of the cognitive space predicts that more similar conflict types will have more similar neural representations, leading to the CSE and RSA patterns tested in this study. These results add to the literature by providing empirical evidence on how different conflict types are encoded in the brain. We agree that this study is not a comprehensive test of the cognitive space hypothesis. Thus, in the revised manuscript we explicitly clarified that this study is a test of the geometry of the cognitive space hypothesis.

      Critically, the cognitive space view holds that the representations of different abstract information are organized continuously and that the representational geometry in the cognitive space is determined by the similarity among the represented information (Bellmund et al., 2018).

      "The present study aimed to test the geometry of cognitive space in conflict representation. Specifically, we hypothesize that different types of conflict are represented as points in a cognitive space. Importantly, the distance between the points, which reflects the geometry of the cognitive space, scales with the difference in the sources of the conflicts being represented by the points."

      We have also discussed the limitation of the results and stressed the need for more research to fully test the cognitive space hypothesis.

      “Additionally, our study is not a comprehensive test of the cognitive space hypothesis but aimed primarily to provide original evidence for the geometry of cognitive space in representing conflict information in cognitive control. Future research should examine other aspects of the cognitive space such as its dimensionality, its applicability to other conflict tasks such as the Eriksen Flanker task, and its relevance to other cognitive abilities, such as cognitive flexibility and learning.”

      2b) Additionally, no explicit tests of generality (e.g., via cross-condition generalization) were provided.

      To examine the generality of cognitive space across conditions, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model as reported in the main text (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001. We have added this analysis and result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section.

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001."
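
      The logic of this leave-one-level-out prediction can be sketched as follows. This is a simplified stand-in, not the authors' two-stage mixed-effect analysis: it fits an ordinary least-squares line (intercept and similarity slope) per subject on four similarity levels and predicts the held-out level. The function name, the cosine-coded similarity values, and the simulated data are all hypothetical.

```python
import numpy as np


def leave_one_level_out(similarity, cse):
    """Predict each similarity level's CSE from a fit to the remaining levels.

    similarity: array of shape (n_levels,)
    cse:        array of shape (n_subjects, n_levels)
    Returns predictions with the same shape as `cse`.
    """
    n_subjects, n_levels = cse.shape
    predicted = np.empty((n_subjects, n_levels), dtype=float)
    for held_out in range(n_levels):
        keep = np.arange(n_levels) != held_out
        # Design matrix: intercept plus similarity for the retained levels.
        design = np.column_stack([np.ones(keep.sum()), similarity[keep]])
        # One (intercept, slope) pair per subject; beta has shape (2, n_subjects).
        beta, *_ = np.linalg.lstsq(design, cse[:, keep].T, rcond=None)
        predicted[:, held_out] = beta[0] + beta[1] * similarity[held_out]
    return predicted


# Simulated example (hypothetical numbers): CSE grows linearly with similarity.
rng = np.random.default_rng(0)
similarity = np.cos(np.deg2rad([90.0, 67.5, 45.0, 22.5, 0.0]))
cse = 10 + 40 * similarity + rng.normal(0, 5, size=(20, 5))

predicted = leave_one_level_out(similarity, cse)
r = np.corrcoef(predicted.ravel(), cse.ravel())[0, 1]
print(f"correlation between predicted and observed CSE: r = {r:.2f}")
```

      A high correlation between predicted and observed values indicates that the relationship estimated from the retained levels extends to the level that was left out.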

      2c) Finally, although the design elicits strong CSE effects, it seems somewhat awkward to consider CSE behavioral patterns as a reflection of the kind of abilities supported by a cognitive map (if this is indeed the implication that was intended). In fact, CSE effects are well-modeled by simpler "model-free" associative learning processes, that do not require elaborate representations of abstract structures.

      We argue that the conflict similarity modulation of CSEs we observed cannot be explained by the “model-free” stimulus-driven associative learning process. This mainly refers to the feature integration account proposed by Hommel et al. (2004), which explains poorer performance in CI and IC trials (compared with CC and II trials) with the partial repetition cost caused by the breaking of stimulus-response binding. Although we cannot remove its influence on the within-type trials (similarity level 5, θ = 0°), it should not affect the cross-type trials (similarity levels 1-4, θ = 90°, 67.5°, 45° and 22.5°, respectively), because the CC, CI, IC, and II trials had equal probabilities of partially repeated and fully switched trials (see Author response image 1 for an example of trials across Conf 1 and Conf 3 conditions). Thus, feature integration cannot explain the gradual CSE decrease from similarity level 1 to 4, which sufficiently reproduces the full effect, as suggested by the leave-one-out prediction analysis mentioned above. We thus conclude that the similarity modulation of CSE cannot be explained by stimulus-driven associative learning.

      Author response image 1.

      Notably, however, our findings are aligned with an associative learning account of cognitive control (Abrahamse et al., 2016), which extends associative learning from the stimulus/response level to cognitive control. In other words, abstract cognitive control states can be learned and generalized like other sensorimotor features. This view explicitly proposes that “transfer occurs to the extent that two tasks overlap”, a hypothesis directly supported by our CSE results (see also Yang et al., 2021). Extending this, our fMRI results provide the neural basis of how cognitive control can generalize through a representation of cognitive space. The cognitive space view complements the associative learning account by providing a fundamental principle for the learning and generalization of control states. Given the widespread application of the CSE as an indicator of cognitive control generalization (Braem et al., 2014), we believe that it can be recognized as a kind of ability supported by the cognitive space. This was further supported by the brain-behavioral correlation: stronger encoding of cognitive space was associated with greater bias of trial-wise behavioral adjustment by the consecutive conflict similarity.

      We have incorporated these ideas into the discussion:

      “Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition.”

      References:

      Hommel, B., Proctor, R. W., & Vu, K. P. (2004). A feature-integration account of sequential effects in the Simon task. Psychological Research, 68(1), 1-17.

      Abrahamse, E., Braem, S., Notebaert, W., & Verguts, T. (2016). Grounding cognitive control in associative learning. Psychological Bulletin, 142(7), 693-728.

      Yang, G., Xu, H., Li, Z., Nan, W., Wu, H., Li, Q., & Liu, X. (2021). The congruency sequence effect is modulated by the similarity of conflicts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(10), 1705-1719.

      Braem, S., Abrahamse, E. L., Duthoo, W., & Notebaert, W. (2014). What determines the specificity of conflict adaptation? A review, critical analysis, and proposed synthesis. Frontiers in Psychology, 5, 1134.

      3) More generally, it seems problematic that Stroop and Simon conflict in the paradigm parametrically trade off against each other. A more powerful design would have de-confounded Stroop and Simon conflict so that each could be separately estimated via (potentially orthogonal) conflict axes. Additionally, incorporating more varied stimulus sets, locations, or responses might have enabled various tests of generality, as implied by a cognitive space account.

      We thank the reviewer for these valuable suggestions. We argue that the current design is adequate to test the prediction that more similar conflict types have more similar neural representations. That said, we agree that further examination using more powerful experimental designs is needed to fully test the cognitive space account of cognitive control. We also agree that employing more varied stimulus sets, locations and responses would further extend our findings. We have included this as a future research direction in the revised manuscript.

      We have revised our discussion about the limitation as:

      “A few limitations of this study need to be noted. To parametrically manipulate the conflict similarity levels, we adopted the spatial Stroop-Simon paradigm that enables parametrical combinations of spatial Stroop and Simon conflicts. However, since this paradigm is a two-alternative forced choice design, the behavioral CSE is not a pure measure of adjusted control but could be partly confounded by bottom-up factors such as feature integration (Hommel et al., 2004). Future studies may replicate our findings with a multiple-choice design (including more varied stimulus sets, locations and responses) with confound-free trial sequences (Braem et al., 2019). Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may make the five conflict types represented in a unidimensional space (e.g., a circle) embedded in a 2D space. Future studies may test the 2D cognitive space with fully independent conditions. A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.”

      4) Serious confounds in the design render the results difficult to interpret. As much prior neuroimaging and behavioral work has established, "conflict" per se is perniciously correlated with many conceptually different variables. Consequently, it is very difficult to distinguish these confounding variables within aggregate measures of neural activity like fMRI. For example, conflict is confounded with increased time-on-task with longer RT, as well as conflict-driven increases in coding of other task variables (e.g., task-set related coding; e.g., Ebitz et al. 2020 bioRxiv). Even when using much higher resolution invasive measures than fMRI (i.e., eCoG), researchers have rightly been wary of making strong conclusions about explicit encoding of conflict (Tang et al, 2019; eLife). As such, the researchers would do well to be quite cautious and conservative in their analytic approach and interpretation of results.

      We acknowledge the findings showing that encoding of conflicts may not be easily detected in the brain. However, recent studies have shown that the representational similarity analysis can effectively detect representations of conflict tasks (e.g., the color Stroop) using factorial designs (Freund et al., 2021a; 2021b).

      In our analysis, we were aware of the potential impact of time-on-task (e.g., RT) on univariate activation levels and subsequent RSA patterns. To address this issue, we added univariate fMRI activation levels as nuisance regressors to the RSA. To de-confound conflict from other factors such as the orientation of stimuli relative to the center of the screen, we also applied the cross-subject RSA approach. Furthermore, we were cautious about determining regions that encoded conflict control. We set three strict criteria: (1) regions must show a conflict similarity modulation effect; (2) regions must show higher representational strength in the incongruent condition compared with the congruent condition; and (3) regions must correlate with behavioral performance. With these criteria, we believe that the results we reported are already conservative. We would be happy to implement any additional criteria the reviewer recommends.

      Reference:

      Freund, M. C., Etzel, J. A., & Braver, T. S. (2021a). Neural Coding of Cognitive Control: The Representational Similarity Analysis Approach. Trends in Cognitive Sciences, 25(7), 622-638.

      Freund, M. C., Bugg, J. M., & Braver, T. S. (2021b). A Representational Similarity Analysis of Cognitive Control during Color-Word Stroop. Journal of Neuroscience, 41(35), 7388-7402.

      5) This issue is most critical in the interpretation of the fMRI results as reflecting encoding of conflict types. A key limitation of the design, that is acknowledged by the authors is that conflict is fully confounded within-subject by spatial orientation. Indeed, the limited set of stimulus-response mappings also cast doubt on the underlying factors that give rise to the CSE modulations observed by the authors in their behavioral results. The CSE modulations are so strong - going from a complete absence of current x previous trial-type interaction in the cos(90) case all the way to a complete elimination of any current trial conflict when the prior trial was incongruent in the cos(0) case - that they cause suspicion that they are actually driven by conflict-related control adjustments rather than sequential dependencies in the stimulus-response mappings that can be associatively learned.

      Unlike the fMRI data, we cannot tease apart the effects of conflict similarity and orientation in a similar manner as the cross-subject RSA for behavioral CSEs. However, we have a few reasons that the orientation and other bottom-up factors should not be the factors driving the similarity modulation effect.

      First, we did not find any correlation between the regions showing orientation effects and behavioral CSEs. This suggests that orientation does not directly contribute to the CSE modulation.

      Second, if the CSE modulation is purely driven by the association learning of the stimulus-response mapping, we should observe a stronger modulation effect after more extensive training. However, our results do not support this prediction. Using data from Experiment 1, we found that the modulation effect remained constant across the three sessions (see Note S3).

      “Note S3. Modulation of conflict similarity on behavioral CSEs does not change across time

      We tested whether the conflict similarity modulation of the CSE is susceptible to training. We collected the data of Experiment 1 across three sessions, so it is possible to examine whether the conflict similarity modulation effect changes across time. To this end, we entered conflict similarity, session, and their interaction into a linear mixed-effect model, in which session was set as a categorical variable. With a post-hoc analysis of variance (ANOVA), we calculated the statistical significance of the interaction term. This approach was applied to both the RT and ER. Results showed no interaction effect in either RT, F(2,1479) = 1.025, p = .359, or ER, F(2,1479) = 0.789, p = .455. This result suggests that the modulation effect does not change across time.”

      Third, the observed similarity modulation on the CSE, particularly for similarity levels 1-4, should not be attributed to stimulus-response associations, such as feature integration, as addressed in our response to comment 2.c.

      Finally, other bottom-up factors, such as the spatial location proximity did not drive the CSE modulation results, which we have addressed in the original manuscript in Note S2.

      "Note S2. Modulation of conflict similarity on behavioral CSEs cannot be explained by the physical proximity

      In our design, the conflict similarity might be confounded by the physical proximity between stimulus (i.e., the arrow) of two consecutive trials. That is, when arrows of the two trials appear at the same quadrant, a higher conflict similarity also indicates a higher physical proximity (Fig. 1A). Although the opposite is true if arrows of the two trials appear at different quadrants, it is possible the behavioral effects can be biased by the within quadrant trials. To examine if the physical distance has confounded the conflict similarity modulation effect, we conducted an additional analysis.

      We defined the physical angular difference across two trials as the difference of their polar angles relative to the origin. Therefore, the physical angular difference could vary from 0 to 180°. For each CSE condition (i.e., CC, CI, IC and II), we grouped the trials based on their physical angular distances, and then averaged trials with the same previous-by-current conflict type transition but different orders (e.g., StHSmL−StLSmH and StLSmH−StHSmL) within each subject. The data were submitted to a mixed-effect model with conflict similarity and physical proximity (i.e., the opposite of the physical angular difference) as fixed-effect predictors, and subject and CSE condition as random effects. Results showed significant conflict similarity modulation effects in both Experiment 1 (RT: β = 0.09 ± 0.01, t(7812) = 13.74, p < .001, ηp² = .025; ER: β = 0.09 ± 0.01, t(7812) = 7.66, p < .001, ηp² = .018) and Experiment 2 (RT: β = 0.21 ± 0.02, t(3956) = 9.88, p < .001, ηp² = .043; ER: β = 0.20 ± 0.03, t(4201) = 6.11, p < .001, ηp² = .038). Thus, the observed modulation of conflict similarity on behavioral CSEs cannot be explained by physical proximity."
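
      As an illustration only (not the analysis code used for Note S2), the physical angular difference described above can be sketched as follows. The function names are hypothetical, and treating "the opposite of the physical angular difference" as its negation is an assumption.

```python
def angular_difference(angle_a_deg, angle_b_deg):
    """Smallest absolute difference between two polar angles, in [0, 180] degrees."""
    diff = abs(angle_a_deg - angle_b_deg) % 360.0
    # Differences above 180 degrees wrap around the circle.
    return 360.0 - diff if diff > 180.0 else diff


def physical_proximity(angle_a_deg, angle_b_deg):
    """Proximity taken as the negated angular difference (an assumption)."""
    return -angular_difference(angle_a_deg, angle_b_deg)


if __name__ == "__main__":
    # Two stimuli on either side of the 0-degree axis are 20 degrees apart.
    print(angular_difference(10.0, 350.0))
```

      Wrapping at 180° keeps the measure symmetric, so the order of the two trials does not matter.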

      6) To their credit, the authors recognize this confound, and attempt to address it analytically through the use of a between-subject RSA approach. Yet the solution is itself problematic, because it doesn't actually deconfound conflict from orientation. In particular, the RSA model assumes that whatever components of neural activity encode orientation produce this encoding within the same voxel-level patterns of activity in each subject. If they do not (which is of course likely), then orthogonalization of these variables will be incomplete. Similar issues underlie the interpretation of target/response and distractor coding. Given these issues, perhaps zooming out to a larger spatial scale for the between-subject RSA might be warranted. Perhaps whole-brain at the voxel level with a high degree of smoothing, or even whole-brain at the parcel level (averaging per parcel). For this purpose, Schaefer atlas parcels might be more useful than Glasser, as they more strongly reflect functional divisions (e.g., motor strip is split into mouth/hand divisions; visual cortex is split into central/peripheral visual field divisions). Similarly, given the lateralization of stimuli, if a within-parcel RSA is going to be used, it seems quite sensible to pool voxels across hemispheres (so effectively using 180 parcels instead of 360).

      Doing RSA at the whole-brain level is an interesting idea. However, it does not allow the identification of specific brain regions representing the cognitive space. Additionally, increasing the spatial scale would include more voxels that are not involved in representing the information of interest and may increase the noise level of data. Given these concerns, we did not conduct the whole-brain level RSA.

      We agree that smoothing data can decrease cross-subject variance in voxel distribution and may increase the signal-to-noise ratio. We reanalyzed the results for the right 8C region using RSA on smoothed beta maps (6-mm FWHM Gaussian kernel). This yielded a significant conflict similarity effect, t(951989) = 5.55, p < .0001, replicating the results on unsmoothed data (t(951989) = 5.60, p < .0001). Therefore, we retained the results from unsmoothed data in the main text, and added the results based on smoothed data to the supplementary material (Note S9).

      “Note S9. The cross-subject pattern similarity is robust against individual differences

      Due to individual differences, the multivoxel patterns extracted from the same brain mask may not reflect exactly the same brain region for each subject. To reduce the influence of individual difference, we conducted the same cross-subject RSA using data smoothed with a 6-mm FWHM Gaussian kernel. Results showed a significant conflict similarity effect, t(951989) = 5.55, p < .0001, replicating the results on unsmoothed data (t(951989) = 5.60, p < .0001).”
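
      For readers less familiar with smoothing kernels, the sketch below converts a kernel's full width at half maximum (FWHM) into the Gaussian standard deviation using the standard relation sigma = FWHM / (2·sqrt(2·ln 2)). It is illustrative only; the function name is hypothetical and this is not part of the reported pipeline.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


if __name__ == "__main__":
    # A 6-mm FWHM kernel corresponds to a sigma of roughly 2.548 mm.
    print(round(fwhm_to_sigma(6.0), 3))
```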

      We also used the bilateral 8C area as a single mask and conducted the same RSA. We found a significant conflict type similarity effect, t(951989) = 4.36, p < .0001. However, the left 8C alone showed no such representation, t(951989) = 0.38, p = .351, consistent with the right lateralized representation of cognitive space we reported in Note S8. Therefore, we used ROIs from each hemisphere separately.

      “Note S8. The lateralization of conflict type representation

      We observed that the right 8C, but not the left 8C, represented the conflict type similarity. We therefore tested whether this representation is lateralized. We tested several regions of the left dlPFC, including the i6-8, 8Av, 8C, p9-46v, 46, 9-46d, and a9-46v (Freund, Bugg, et al., 2021). We found that none of these regions showed the representation of conflict type, all uncorrected ps > .35. These results indicate that the conflict type is specifically represented in the right dlPFC.”

      We have also discussed the lateralization in the manuscript:

      “In addition, we found no such representation in the left dlPFC (Note S8), indicating a possible lateralization. Previous studies showed that the left dlPFC was related to the expectancy-related attentional set up-regulation, while the right dlPFC was related to the online adjustment of control (Friehs et al., 2020; Vanderhasselt et al., 2009), which is consistent with our findings. Moreover, the right PFC also represents a composition of single rules (Reverberi et al., 2012), which may explain how the spatial Stroop and Simon types can be jointly encoded in a single space.”

      7) The strength of the results is difficult to interpret due to the non-standard analysis method. The use of a mixed-level modeling approach to summarize the empirical similarity matrix is an interesting idea, but nevertheless is highly non-standard within RSA neuroimaging methods. More importantly, the way in which it was implemented makes it potentially vulnerable to a high degree of inaccuracy or bias. In this case, this bias is likely to be overly optimistic (high false positive rate). No numerical or formal defense was provided for this mixed-level model approach. As a result, the use of this method seems quite problematic, as it renders the strength of the observed results difficult to interpret. Instead, the authors are encouraged to use a previously published method of conducting inference with between-subject RSA, such as the bootstrapping methods illustrated in Kragel et al. (2018; Nat Neurosci), or to adopt one of the Chen et al. methods mentioned above, which have been extensively explored in terms of statistical properties.

      In our revised manuscript, we have adopted the approach proposed by Chen et al. (2017). Specifically, we included both the upper and lower triangle of the representational similarity matrix (excluding the diagonal). Moreover, we removed all the within-subject similarity (thus also excluding the within-run similarity) to minimize the bias of the potentially strong within-subject similarity. Note that we also analyzed the within-subject data and found significant effects for the similarity modulation, though this effect cannot be attributed to the conflict similarity or orientation alone; we added this analysis in Note S7 (see below). In addition, we added both the row-wise and column-wise random effects to capture the dependence of cells within each column/row (Chen et al., 2017). We have revised the method part as:

      “We excluded within-subject cells from the RSM (thus also excluding the within-run similarity as suggested by Walther et al. (2016)), and the remaining cells were converted into a vector, which was then z-transformed and submitted to a linear mixed effect model as the dependent variable. The linear mixed effect model also included regressors of conflict similarity and orientation similarity. Importantly, conflict similarity was based on how Simon and spatial Stroop conflicts are combined and hence was calculated by first rotating all subjects’ stimulus locations to the top-right and bottom-left quadrants, whereas orientation was calculated using original stimulus locations. As a result, the regressors representing conflict similarity and orientation similarity were de-correlated. Similarity between two conditions was measured as the cosine value of the angular difference. Other regressors included a target similarity regressor (i.e., whether the arrow directions were identical); a response similarity regressor (i.e., whether the correct responses were identical); a spatial Stroop distractor regressor (i.e., vertical distance between two stimulus locations); and a Simon distractor regressor (i.e., horizontal distance between two stimulus locations). Additionally, we also included a regressor denoting the similarity of Group (i.e., whether two conditions are within the same subject group, according to the stimulus-response mapping). We also added two regressors including ROI-mean fMRI activations for each condition of the pair to remove the possible uni-voxel influence on the RSM. A last term was the intercept. To control the artefact due to dependence of the correlation pairs sharing the same subject, we included crossed random effects (i.e., row-wise and column-wise random effects) for the intercept, conflict similarity, orientation and the group factors (G. Chen et al., 2017).”
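
      The cell selection and the cosine-based similarity regressor described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the toy subject labels, and the toy angles are all hypothetical, and the mixed-effect model itself is not reproduced.

```python
import math


def cosine_similarity_of_angles(theta_a_deg, theta_b_deg):
    """Similarity of two conditions as the cosine of their angular difference."""
    return math.cos(math.radians(theta_a_deg - theta_b_deg))


def cross_subject_cells(subject_ids):
    """Index pairs (i, j) of RSM cells whose two conditions come from different subjects.

    Both triangles are kept; the diagonal drops out because a condition
    always shares a subject with itself.
    """
    n = len(subject_ids)
    return [(i, j) for i in range(n) for j in range(n)
            if subject_ids[i] != subject_ids[j]]


def build_similarity_regressor(angles_deg, subject_ids):
    """Vectorized model similarity for the retained cross-subject cells."""
    return [cosine_similarity_of_angles(angles_deg[i], angles_deg[j])
            for i, j in cross_subject_cells(subject_ids)]


if __name__ == "__main__":
    subjects = ["s1", "s1", "s2", "s2"]   # hypothetical labels, one per condition
    angles = [0.0, 90.0, 0.0, 90.0]       # hypothetical condition angles (degrees)
    print(len(cross_subject_cells(subjects)))
    print(build_similarity_regressor(angles, subjects))
```

      In the toy example, 8 of the 16 cells survive, because each subject's 2×2 within-subject block is discarded.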

      Results from this approach highly replicated our original results. Specifically, we found the right 8C again showed a strong conflict similarity effect, a higher representational strength in the incongruent condition compared to the congruent condition, and a significant correlation with the behavioral CSE. The orientation effect was also identified in the visual (e.g., right V1) and oculomotor (e.g., left FEF) regions.

      We revised the results accordingly:

      For the conflict type effect:

      “The first criterion revealed several cortical regions encoding the conflict similarity, including the Brodmann 8C area (a subregion of dlPFC (Glasser et al., 2016)) and a47r in the right hemisphere, and the superior frontal language (SFL) area, 6r, 7Am, 24dd, and ventromedial visual area 1 (VMV1) areas in the left hemisphere (Bonferroni corrected ps < 0.0001, one-tailed, Fig. 4A). We next tested whether these regions were related to cognitive control by comparing the strength of conflict similarity effect between incongruent and congruent conditions (criterion 2). Results revealed that the left SFL, left VMV1, and right 8C met this criterion, Bonferroni corrected ps < .05, one-tailed, suggesting that the representation of conflict type was strengthened when conflict was present (e.g., Fig. 4D). The inter-subject brain-behavioral correlation analysis (criterion 3) showed that the strength of conflict similarity effect on RSM scaled with the modulation of conflict similarity on the CSE (slope in Fig. S2C) in right 8C (r = .52, Bonferroni corrected p = .002, one-tailed, Fig. 4C, Table 1) but not in the left SFL and VMV1 (all Bonferroni corrected ps > .05, one-tailed).”

      For the orientation effect:

      “We observed increasing fMRI representational similarity between trials with more similar orientations of stimulus location in the occipital cortex, such as right V1, right V2, right V4, and right lateral occipital 2 (LO2) areas (Bonferroni corrected ps < 0.0001). We also found the same effect in the oculomotor related region, i.e., the left frontal eye field (FEF), and other regions including the right 5m, left 31pv and right parietal area F (PF) (Fig. 5A). Then we tested if any of these brain regions were related to the conflict representation by comparing their encoding strength between incongruent and congruent conditions. Results showed that the right V1, right V2, left FEF, and right PF encoded stronger orientation effect in the incongruent than the congruent condition, Bonferroni corrected ps < .05, one-tailed (Table 1, Fig. 5B). We then tested if any of these regions was related to the behavioral performance, and results showed that none of them positively correlated with the behavioral conflict similarity modulation effect, all uncorrected ps > .45, one-tailed. Thus all regions are consistent with the criterion 3.”

      “Note S7. The cross-subject RSA captures similar effects with the within-subject RSA

      Considering the variability in voxel-level functional localizations among individuals, one may question whether the cross-subject RSA results were biased by the consistent multi-voxel patterns across subjects, distinct from the more commonly utilized within-subject RSA. We reasoned that the cross-subject RSA should have captured similar effects as the within-subject RSA if we observe the conflict similarity effect in right 8C with the latter analysis. Therefore, we tested whether the representation in right 8C held for within-subject data. Specifically, we performed similar RSA for within-subject RSMs, excluding the within-run cells. We replaced the perfectly confounded factors of conflict similarity and orientation with a common factor called similarity_orientation. Other confounding factor pairs (i.e., target versus response, and Stroop distractor versus Simon distractor) were addressed similarly. Results showed a significant effect of similarity_orientation, t(13993) = 3.270, p = .0005, one-tailed. Given the specific representation of conflict similarity identified by the cross-subject RSA, the within-subject data of right 8C may show similar conflict similarity modulation effects as the cross-subject data. Further research is needed to fully dissociate the representation of conflict and the representation of visual features such as orientation.”

      8) Another potential source of bias is in treating the subject-level random effect coefficients (as predicted by the mixed-level model) as independent samples from a random variable (in the t-tests). The more standard method for inference would be to use test statistics derived from the mixed-model fixed effects, as those have degrees of freedom calculations that are calibrated based on statistical theory.

      In our revised manuscript, we reported the statistical p values calculated from the mixed-effect models. Note that because we used the Chen et al. (2017) method, which includes data from the symmetric matrix, we corrected the degrees of freedom and estimated the true p values based on the t statistics of model results. For the I versus C comparison results, we calculated the p values by combining I and C RSMs into a larger model and then adding the condition type, as well as the interaction between the regressors of interest (conflict similarity and orientation) and the condition type. We made the statistical inference based on the interaction effect.

      We have revised the corresponding methods as:

      “The statistical significance of these beta estimates was based on the outputs of the mixed-effect model estimated with the “fitlme” function in Matlab 2022a. Since symmetric cells from the RSM matrix were included in the mixed-effect model, we adjusted the t and p values with the true degree of freedom, which is half of the cells included minus the number of fixed regressors. Multiple comparison correction was applied with the Bonferroni approach across all cortical regions at the p < 0.0001 level. To test if the representation strengths are different between congruent and incongruent conditions, we also conducted the RSA using only congruent (RDM_C) and incongruent (RDM_I) trials separately. The contrast analysis was achieved by an additional model with both RDM_C and RDM_I included, adding the congruency and the interaction between conflict type (and orientation) and congruency as both fixed and random factors. The difference between incongruent and congruent representations was indicated by a significant interaction effect.”
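
      To make the degrees-of-freedom rule above concrete, here is a small sketch. It is not the Matlab code used in the study. The subject, condition, and regressor counts are hypothetical values chosen only because they happen to yield a df of 951989, and the p value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because the df is very large.

```python
import math


def true_degrees_of_freedom(n_cells_included, n_fixed_regressors):
    """Half of the included (symmetric) cells minus the number of fixed regressors."""
    return n_cells_included // 2 - n_fixed_regressors


def one_tailed_p_normal(t_value):
    """Upper-tail p value under a standard normal approximation to t."""
    return 0.5 * math.erfc(t_value / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical design: cross-subject cells from both triangles of the RSM.
    n_subjects, n_conditions, n_fixed = 35, 40, 11
    n_cells = n_subjects * (n_subjects - 1) * n_conditions ** 2
    print(true_degrees_of_freedom(n_cells, n_fixed))
    print(one_tailed_p_normal(5.60))
```

      With t = 5.60, the approximation gives a one-tailed p of about 1.1 × 10⁻⁸.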

      Reviewer #3:

      Yang and colleagues investigated whether information on two task-irrelevant features that induce response conflict is represented in a common cognitive space. To test this, the authors used a task that combines the spatial Stroop conflict and the Simon effect. This task reliably produces a beautiful graded congruency sequence effect (CSE), where the cost of congruency is reduced after incongruent trials. The authors measured fMRI to identify brain regions that represent the graded similarity of conflict types, the congruency of responses, and the visual features that induce conflicts.

      Using several theory-driven exclusion criteria, the authors identified the right dlPFC (right 8C), which shows 1) stronger encoding of graded similarity of conflicts in incongruent trials and 2) a positive correlation between the strength of conflict similarity type and the CSE on behavior. The dlPFC has been shown to be important for cognitive control tasks. As the dlPFC did not show a univariate parametric modulation based on the higher or lower component of one type of conflict (e.g., having more spatial Stroop conflict or less Simon conflict), it implies that dissimilarity of conflicts is represented by a linear increase or decrease of neural responses. Therefore, the similarity of conflict is represented in multivariate neural responses that combine two sources of conflict.

      The strength of the current approach lies in the clear effect of parametric modulation of conflict similarity across different conflict types. The authors employed a clever cross-subject RSA that counterbalanced and isolated the targeted effect of conflict similarity, decorrelating orientation similarity of stimulus positions that would otherwise be correlated with conflict similarity. A pattern of neural response seems to exist that maps different types of conflict, where each type is defined by the parametric gradation of the yoked spatial Stroop conflict and the Simon conflict on a similarity scale. The similarity of patterns increases in incongruent trials and is correlated with CSE modulation of behavior.

      We would like to thank the reviewer for the positive evaluation of our manuscript and for providing constructive comments. By addressing these comments, we believe that we have made our manuscript more accessible for the readers while also strengthening our findings. In particular, we have tested a few alternative models and confirmed that the cognitive space hypothesis best fits the data. We have also demonstrated the geometric properties of the cognitive space by examining the continuity and dimensionality of the space, further supporting our main arguments. We have incorporated revisions and additional analyses to the manuscript based on your feedback. Overall, we believe that these changes and additional analyses have significantly improved the manuscript. Please find our detailed responses below.

      However, several potential caveats need to be considered.

      1) One caveat to consider is that the main claim of recruitment of an organized "cognitive space" for conflict representation is solely supported by the exclusion criteria mentioned earlier. To further support the involvement of organized space in conflict representation, other pieces of evidence need to be considered. One approach could be to test the accuracy of out-of-sample predictions to examine the continuity of the space, as commonly done in studies on representational spaces of sensory information. Another possible approach could involve rigorously testing the geometric properties of space, rather than fitting RSM to all conflict types. For instance, in Fig 6, both the organized and domain-specific cognitive maps would similarly represent the similarity of conflict types expressed in Fig1c (as evident from the preserved order of conflict types). The RSM suggests a low-dimensional embedding of conflict similarity, but the underlying dimension remains unclear.

      Following the reviewer’s first suggestion, we conducted a leave-one-out prediction approach to examine the continuity of the cognitive space. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model as reported in the main text (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level at subject level. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001. We have added this analysis and result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section:

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001.”
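
      The logic of the leave-one-out prediction can be sketched as below. Note the simplification: the analysis described above uses a two-stage mixed-effect model, whereas this sketch swaps in an ordinary least-squares line fitted to a single subject's data. All function names and the toy values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_level_out(similarity, cse):
    """Predict the CSE at each similarity level from a fit to the other levels."""
    predictions = []
    for held_out in range(len(similarity)):
        train_x = [x for k, x in enumerate(similarity) if k != held_out]
        train_y = [y for k, y in enumerate(cse) if k != held_out]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(intercept + slope * similarity[held_out])
    return predictions


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    similarity = [0.0, 0.25, 0.5, 0.75, 1.0]   # hypothetical five similarity levels
    cse = [1.0, 1.6, 1.9, 2.6, 3.0]            # hypothetical CSE values for one subject
    predicted = leave_one_level_out(similarity, cse)
    print(pearson_r(predicted, cse))
```

      A high correlation between predicted and observed values indicates that each level's CSE can be interpolated or extrapolated from the others, which is what continuity of the space implies.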

      To estimate if the domain-specific model could explain the results we observed in right 8C, we conducted a model-comparison analysis. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0. This model showed non-significant effects (t(951989) = 0.84, p = .201) and poorer fit (BIC = 5377127) than the cognitive space model (t(951989) = 5.60, p = 1.1 × 10⁻⁸, BIC = 5377094). We also compared other alternative models and found the cognitive space model best fitted the data. We have included these results in the revised manuscript:

      “To examine if the right 8C specifically encodes the cognitive space rather than the domain-general or domain-specific organizations, we tested several additional models (see Methods). Model comparison showed a lower BIC in the Cognitive-Space model (BIC = 5377094) than the Domain-General (BIC = 5377127) or Domain-Specific (BIC = 5377127) models. Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D. We also tested if the observed conflict similarity effect was driven solely by spatial Stroop or Simon conflicts, and found larger BICs for the models only including the Stroop similarity (i.e., the Stroop-Only model, BIC = 5377122) or Simon similarity (i.e., the Simon-Only model, BIC = 5377096). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a worse model fitting (BIC = 5377118). Moreover, we replicated the results with only incongruent trials, considering that the pattern of conflict representations is more manifested when the conflict is present (i.e., on incongruent trials) than not (i.e., on congruent trials). We found a poorer fitting in Domain-General (BIC = 1344129), Domain-Specific (BIC = 1344129), Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. The more detailed model comparison results are listed in Table 2.”
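
      As a reading aid, the incongruent-trial BIC values quoted above can be ranked with a few lines of code. The BIC values are taken from the text; the dictionary and function names are hypothetical, and the sketch only tabulates differences rather than refitting any model.

```python
# BIC values for the incongruent-trial analysis, as quoted in the text above.
bic_incongruent_only = {
    "Cognitive-Space": 1344104,
    "Domain-General": 1344129,
    "Domain-Specific": 1344129,
    "Stroop-Only": 1344128,
    "Simon-Only": 1344120,
    "Stroop+Simon": 1344157,
}


def best_model(bic_by_model):
    """Name of the model with the lowest BIC (lower is better)."""
    return min(bic_by_model, key=bic_by_model.get)


def delta_bic(bic_by_model):
    """BIC of each model minus the BIC of the best model."""
    lowest = min(bic_by_model.values())
    return {name: value - lowest for name, value in bic_by_model.items()}


if __name__ == "__main__":
    print(best_model(bic_incongruent_only))
    for name, delta in sorted(delta_bic(bic_incongruent_only).items(), key=lambda kv: kv[1]):
        print(f"{name}: +{delta}")
```

      Differences of 16 or more BIC units separate every alternative from the Cognitive-Space model in this set.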

      We also estimated the dimensionality of the right 8C with the averaged RSM and found the dimensionality of the cognitive space was ~ 1.19, very close to a 1D space. This result is consistent with our experimental design, as the only manipulated variable is the angular distance between conflict types. We have added these results and the methods to the revised manuscript.

      Results:

      “Further analysis showed the dimensionality of the representation in the right 8C was 1.19, suggesting the cognitive space was close to 1D.”

      Methods:

      “To better capture the dimensionality of the representational space, we estimated its dimensionality using the participation ratio (Ito & Murray, 2023). Since we excluded the within-subject cells from the whole RSM, the whole RSM is an incomplete matrix and could not be used. To resolve this issue, we averaged the cells corresponding to each pair of conflict types to obtain an averaged 5×5 RSM, similar to the matrix shown in Fig. 1C. We then estimated the participation ratio using the formula:

      $$\mathrm{dim} = \frac{\left(\sum_{i=1}^{m} \lambda_i\right)^2}{\sum_{i=1}^{m} \lambda_i^2}$$

      where λ_i is the i-th eigenvalue of the RSM and m is the number of eigenvalues.”
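
      A minimal sketch of the participation ratio follows, assuming the standard definition (the squared sum of the eigenvalues divided by the sum of squared eigenvalues). To avoid an eigendecomposition, it uses the identities that the sum of eigenvalues equals the trace of the matrix and the sum of squared eigenvalues equals the trace of the matrix squared. The function name and the toy matrices are hypothetical, and this is not the code used for the reported value of 1.19.

```python
def participation_ratio(matrix):
    """Participation ratio of a square matrix given as a list of rows.

    sum(lambda_i)   = trace(A)
    sum(lambda_i^2) = trace(A @ A) = sum over i, j of A[i][j] * A[j][i]
    """
    m = len(matrix)
    trace = sum(matrix[i][i] for i in range(m))
    sum_squared_eigenvalues = sum(
        matrix[i][j] * matrix[j][i] for i in range(m) for j in range(m)
    )
    return trace ** 2 / sum_squared_eigenvalues


if __name__ == "__main__":
    # A rank-1 similarity matrix (all conditions identical) is one-dimensional.
    rank_one = [[1.0] * 3 for _ in range(3)]
    # An identity matrix (all conditions unrelated) uses every dimension equally.
    identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
    print(participation_ratio(rank_one), participation_ratio(identity))
```

      The two toy cases bracket the possible range: the ratio is 1 when a single eigenvalue carries all the variance and m when all m eigenvalues are equal.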

      2) Another important factor to consider is how learning within the confined task space, which always negatively correlates the two types of conflicts within each subject, may have influenced the current results. Is statistical dependence of conflict information necessary to use the organized cognitive space to represent conflicts from multiple sources? Answering this question would require a paradigm that can adjust multiple sources of conflicts parametrically and independently. Investigating such dependencies is crucial in order to better understand the adaptive utility of the observed cognitive space of conflict similarity.

      As the central goal of our design was to test the geometry of neural representations of conflict, we manipulated the conflict similarity. The anticorrelated Simon and spatial Stroop conflict aimed to make the overall magnitude of conflict similar among different conflict types. We agree that with the current design the likely cognitive space is not a full 2D space with Simon and spatial Stroop being two dimensions. Instead, the likely cognitive space is a subspace (e.g., a circle) embedded in the 2D space, due to the constraint of anticorrelated Simon and spatial Stroop conflict across conflict types. Nevertheless, the subspace can also be used to test the geometry that similar conflict types share similar neural representations.

      To test the full 2D cognitive space, a possible revision of our current design is to have multiple hybrid conditions (like Type 2-4) that cover the whole space. For instance, imagine arrow locations in the first quadrant space. We could have a 3×3 design with 9 conflict conditions, where their horizontal/vertical coordinates could be one of the combinations of 0, 0.5 and 1. This way, the spatial Stroop and Simon conditions would be independent of each other. Notably, however, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.

      We have added the above limitations and future designs to the revised manuscript.

      “Another limitation is that in our design, the spatial Stroop and Simon effects are highly anticorrelated. This constraint may make the five conflict types represented in a unidimensional space (e.g., a circle) embedded in a 2D space. Future studies may test the 2D cognitive space with fully independent conditions. A possible improvement to our current design would be to include left, right, up, and down arrows presented in a grid formation across four spatially separate quadrants, with each arrow mapped to its own response button. However, one potential confounding factor would be that these conditions have different levels of difficulty (i.e., different magnitude of conflict), which may affect the CSE results and their representational similarity.”

      Major comments:

      3) The RSM result (and the absence of univariate effect) seem to be a good first step to claim the use of cognitive space of conflict. Yet, the presence of an organized (unidimensional; Fig. 6) and continuous cognitive space should be further tested and backed up.

      We thank the reviewer for recognizing the methods and results of our current work. Indeed, the utilization of a parametric design and RSA to examine organization of neural representations is a widely embraced methodology in the field of cognitive neuroscience (e.g., Freund et al., 2021; Ritz et al., 2022). Our current study aimed primarily to provide original evidence for whether similar conflicts are represented similarly in the brain, which reflects the geometry of conflict representations (i.e., the structure of differences between conflict representations). We have used multiple criteria to back up the findings by showing the representation is sensitive to the presence of conflict and has behavioral relevance.

      We agree that the cognitive space account of cognitive control requires further validation. Therefore, in the revised manuscript, we have added several additional tests to strengthen the evidence supporting the organized cognitive space representation. Firstly, we tested five alternative models (Domain-General, Domain-Specific, Stroop-Only, Simon-Only and Stroop+Simon models), and found that the Cognitive-Space model fitted our data best. Secondly, we explicitly calculated the dimensionality of the representation and observed a low dimensionality (1.19D). We have added these results to the “Multivariate patterns of the right dlPFC encodes the conflict similarity” section in the revised manuscript (see also the response to Comment 1).

      Furthermore, we used data from Experiment 1 to demonstrate the continuity of the cognitive space by showing its ability to predict out-of-sample data. We have added this result to the “Conflict type similarity modulated behavioral congruency sequence effect (CSE)” section in the revised manuscript:

      “Moreover, to test the continuity and generalizability of the similarity modulation, we conducted a leave-one-out prediction analysis. We used the behavioral data from Experiment 1 for this test, due to its larger amount of data than Experiment 2. Specifically, we removed data from one of the five similarity levels (as illustrated by the θs in Fig. 1C) and used the remaining data to perform the same mixed-effect model (i.e., the two-stage analysis). This yielded one pair of beta coefficients including the similarity regressor and the intercept for each subject, with which we predicted the CSE for the removed similarity level for each subject. We repeated this process for each similarity level once. The predicted results were highly correlated with the original data, with r = .87 for the RT and r = .84 for the ER, ps < .001.”
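
      The leave-one-out procedure quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: it swaps the two-stage mixed-effect model for a plain per-subject least-squares fit, and the similarity values, the data, and the function name `leave_one_level_out` are all hypothetical.

```python
import numpy as np


def leave_one_level_out(similarity, cse):
    """Predict each similarity level's CSE from a fit on the other levels.

    similarity : (n_levels,) regressor value of each similarity level
    cse        : (n_subjects, n_levels) observed CSE per subject and level
    Returns an array of predictions with the same shape as `cse`.
    """
    similarity = np.asarray(similarity, dtype=float)
    cse = np.asarray(cse, dtype=float)
    predicted = np.empty_like(cse)
    for held_out in range(similarity.size):
        keep = np.arange(similarity.size) != held_out
        # Design matrix (intercept + similarity) for the remaining levels.
        design = np.column_stack([np.ones(keep.sum()), similarity[keep]])
        # One least-squares fit per subject; betas has shape (2, n_subjects).
        betas, *_ = np.linalg.lstsq(design, cse[:, keep].T, rcond=None)
        predicted[:, held_out] = betas[0] + betas[1] * similarity[held_out]
    return predicted


# Synthetic data only: 20 hypothetical subjects, 5 hypothetical similarity levels.
rng = np.random.default_rng(0)
similarity = np.cos(np.deg2rad([90.0, 67.5, 45.0, 22.5, 0.0]))
cse = 10.0 + 30.0 * similarity + rng.normal(0.0, 2.0, size=(20, 5))

predicted = leave_one_level_out(similarity, cse)
r = np.corrcoef(predicted.ravel(), cse.ravel())[0, 1]
print(f"correlation between predicted and observed CSE: r = {r:.2f}")
```

      The design choice worth noting is that each held-out level is predicted only from coefficients estimated without it, so a high correlation reflects interpolation or extrapolation along the similarity axis rather than in-sample fit.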

      References:

      Freund, M. C., Bugg, J. M., & Braver, T. S. (2021). A Representational Similarity Analysis of Cognitive Control during Color-Word Stroop. Journal of Neuroscience, 41(35), 7388-7402.

      Ritz, H., & Shenhav, A. (2022). Humans reconfigure target and distractor processing to address distinct task demands. bioRxiv. doi:10.1101/2021.09.08.459546

      4) Is the conflict similarity effect not driven by either coding of the weak to strong gradient of the spatial Stroop conflict or the Simon conflict? For example, would simply identifying brain regions that are selectively tuned to the Simon conflict continuously be enough to create a graded similarity in Fig. C?

      We recognize that our current design and analysis approach cannot fully exclude the possibility that the current results are driven solely by either Stroop or Simon conflicts, since their gradients are correlated with the conflict similarity gradient we defined. To estimate their unique contributions, we performed a model-comparison analysis. We constructed a Stroop-Only model and a Simon-Only model, with each conflict type projected onto the Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. By replacing the cognitive space-based conflict similarity regressor with the Stroop-Only and Simon-Only regressors, we calculated their BICs. Results showed that the BIC was larger for Stroop-Only (5377122) and Simon-Only (5377096) than for the cognitive space model (5377094). An additional Stroop+Simon model, including both Stroop-Only and Simon-Only regressors, also showed a poorer model fit (BIC = 5377118) than the cognitive space model.

      Moreover, we replicated the results with only incongruent trials. We found a poorer fitting in Stroop-Only (BIC = 1344128), Simon-Only (BIC = 1344120), and Stroop+Simon (BIC = 1344157) models than the Cognitive-Space model (BIC = 1344104). These results indicate that the right 8C encodes an integrated cognitive space for resolving Stroop and Simon conflicts. Therefore, we believe the cognitive space has incorporated both dimensions. We added these additional analyses and results to the revised manuscript (see also the response to the above Comment 1).
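
      For readers who want the margins rather than the raw values, the BICs reported in the two paragraphs above can be expressed as differences from the best-fitting model. The numbers below are copied from this response; the helper name `delta_bic` is ours and only illustrates the arithmetic.

```python
# BIC values as reported in this response (lower is better).
bic_all_trials = {
    "Cognitive-Space": 5377094,
    "Simon-Only": 5377096,
    "Stroop+Simon": 5377118,
    "Stroop-Only": 5377122,
}
bic_incongruent_only = {
    "Cognitive-Space": 1344104,
    "Simon-Only": 1344120,
    "Stroop-Only": 1344128,
    "Stroop+Simon": 1344157,
}


def delta_bic(bics):
    """Return each model's BIC minus the lowest BIC, ordered from best to worst."""
    best = min(bics.values())
    ranked = sorted(bics.items(), key=lambda item: item[1])
    return {name: value - best for name, value in ranked}


for label, bics in [("all trials", bic_all_trials),
                    ("incongruent trials only", bic_incongruent_only)]:
    print(label)
    for name, delta in delta_bic(bics).items():
        print(f"  {name:16s} delta BIC = {delta}")
```

      With all trials, the alternatives exceed the Cognitive-Space model by 2 (Simon-Only), 24 (Stroop+Simon) and 28 (Stroop-Only); with incongruent trials only, by 16 (Simon-Only), 24 (Stroop-Only) and 53 (Stroop+Simon).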

      5) Is encoding of conflict similarity in the unidimensional organized space driven by specific requirements of the task or is this a general control strategy? Specifically, is the recruitment of organized space something specific to the task that people are trained to work with stimuli that negatively correlate the spatial Stroop conflict and the Simon conflict?

      We argue that this encoding is a general control strategy. In our task design, we asked the participants to respond to the target arrow and ignore the location that appeared randomly for them. So, they were not trained to deal with the stimuli in any certain way. We also found the conflict similarity modulation on CSE did not change with more training (We added this result in Note S3), indicating that the cognitive space did not depend on strategies that could be learned through training.

      “Note S3. Modulation of conflict similarity on behavioral CSEs does not change across time.

      We tested if the conflict similarity modulation on the CSE is susceptible to training. We collected the data of Experiment 1 across three sessions, thus it is possible to examine if the conflict similarity modulation effect changes across time. To this end, we added conflict similarity, session and their interaction into a mixed-effect linear model, in which the session was set as a categorical variable. With a post-hoc analysis of variance (ANOVA), we calculated the statistical significance of the interaction term.

      This approach was applied to both the RT and ER. Results showed no interaction effect in either RT, F(2,1479) = 1.025, p = .359, or ER, F(2,1479) = 0.789, p = .455. This result suggests that the modulation effect does not change across time."

      Instead, the cognitive space should be determined by the intrinsic similarity structure of the task design. A previous study (Freitas et al., 2015) has found that the CSE across different versions of spatial Stroop and flanker tasks was stronger than that across either of the two conflicts and Simon. In their designs, the stimulus similarity was controlled at the same level, so the difference in CSE was only attributable to the similar dimensional overlap between Stroop and flanker tasks, in contrast to the Simon task. Furthermore, recent studies showed that the cognitive space generally exists to represent structured latent states (e.g., Vaidya et al., 2022), mental strategy cost (Grahek et al., 2022), and social hierarchies (Park et al., 2020). Therefore, we argue that cognitive space is likely a universal strategy that can be applied to different scenarios.

      We added this argument in the discussion:

      “Although the spatial orientation information in our design could be helpful to the construction of cognitive space, the cognitive space itself was independent of the stimulus-level representation of the task. We found the conflict similarity modulation on CSE did not change with more training (see Note S3), indicating that the cognitive space did not depend on strategies that could be learned through training. Instead, the cognitive space should be determined by the intrinsic similarity structure of the task design. For example, a previous study (Freitas et al, 2015) has found that the CSE across different versions of spatial Stroop and flanker tasks was stronger than that across either of the two conflicts and Simon. In their designs, the stimulus similarity was controlled at the same level, so the difference in CSE was only attributable to the similar dimensional overlap between Stroop and flanker tasks, in contrast to the Simon task. Furthermore, recent studies showed that the cognitive space generally exists to represent structured latent states (e.g., Vaidya et al., 2022), mental strategy cost (Grahek et al., 2022), and social hierarchies (Park et al., 2020). Therefore, cognitive space is likely a universal strategy that can be applied to different scenarios."

      Reference:

      Freitas, A. L., & Clark, S. L. (2015). Generality and specificity in cognitive control: conflict adaptation within and across selective-attention tasks but not across selective-attention and Simon tasks. Psychological Research, 79(1), 143-162.

      Vaidya, A. R., Jones, H. M., Castillo, J., & Badre, D. (2021). Neural representation of abstract task structure during generalization. Elife, 10, 1-26.

      Grahek, I., Leng, X., Fahey, M. P., Yee, D., & Shenhav, A. Empirical and Computational Evidence for Reconfiguration Costs During Within-Task Adjustments in Cognitive Control. CogSci.

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. doi:10.1016/j.neuron.2020.06.030

      6) The observed pattern seems to suggest that there is conflict similarity space that is defined by the combination of the conflict similarity (i.e., the strength of conflicts) and the sources of conflict (i.e., the Simon vs the spatial Stroop). What are the rational reasons to separate conflicts of different sources (beyond detecting incongruence)? And how are they used for better conflict resolutions?

      The necessity of separating conflicts of different sources lies in that the spatial Stroop and the Simon effects are resolved with different mechanisms. The behavioral congruency effects of a combined conflict from two different sources were shown to be the summation of the two conflict sources (Liu et al., 2010), suggesting that the conflicts are resolved independently. Moreover, previous studies have shown that different sources of conflict are resolved with different brain regions (Egner, 2008; Li et al., 2017), and at different processing stages (Wang et al., 2013). Therefore, when multiple sources of conflict occur simultaneously or sequentially, it should be more efficient to resolve the conflict by identifying the sources.

      We have added this argument to the revised manuscript:

      “The rationale behind defining conflict similarity based on combinations of different conflict sources, such as spatial-Stroop and Simon, stems from the evidence that these sources undergo independent processing (Egner, 2008; Li et al., 2014; Liu et al., 2010; Wang et al., 2014). Identifying these distinct sources is critical in efficiently resolving potentially infinite conflicts."

      Reference:

      Egner, T. (2008). Multiple conflict-driven control mechanisms in the human brain. Trends in Cognitive Sciences, 12(10), 374-380.

      Li, Q., Yang, G., Li, Z., Qi, Y., Cole, M. W., & Liu, X. (2017). Conflict detection and resolution rely on a combination of common and distinct cognitive control networks. Neuroscience and Biobehavioral Reviews, 83, 123-131.

      Wang, K., Li, Q., Zheng, Y., Wang, H., & Liu, X. (2014). Temporal and spectral profiles of stimulus-stimulus and stimulus-response conflict processing. NeuroImage, 89, 280-288.

      Liu, X., Park, Y., Gu, X., & Fan, J. (2010). Dimensional overlap accounts for independence and integration of stimulus-response compatibility effects. Attention, Perception, & Psychophysics, 72(6), 1710-1720.

      7) The congruency effect is larger in conflict type 2, 3, 4 consistently compared to conflict 1 and 5. Are these expected under the hypothesis of unified cognitive space of conflict similarity? Is the pattern of similarity modeled in RSA?

      Yes, this is expected. The spatial Stroop and Simon effects have been shown to be additive and independent (Li et al., 2014). Therefore, the congruency effects of conflict type 2, 3 and 4 would be the weighted sum of the spatial Stroop and Simon effects. The weights can be defined by the sine and cosine of the polar angle.

      For instance, in Type 2, wy = sin(67.5°) and wx = cos(67.5°). The sum of the two weight values (i.e., 1.31) is larger than 1, leading to a larger congruency effect than the pure spatial Stroop (Conf 1) and Simon (Conf 5) conditions.
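
      The weight arithmetic above is easy to check numerically. In the sketch below only the 67.5° angle for Type 2 comes from the text; the evenly spaced angles assigned to the other types, and the function name `conflict_weights`, are assumptions made for illustration.

```python
import math


def conflict_weights(polar_angle_deg):
    """Return (wy, wx): the spatial Stroop and Simon weights for a polar angle."""
    theta = math.radians(polar_angle_deg)
    return math.sin(theta), math.cos(theta)


# Type 2, as stated in the text.
wy, wx = conflict_weights(67.5)
type2_sum = wy + wx
print(f"Type 2: wy = {wy:.3f}, wx = {wx:.3f}, sum = {type2_sum:.2f}")

# Assumed angles for Types 1-5 (pure spatial Stroop at 90 degrees, pure Simon at 0).
assumed_angles = {1: 90.0, 2: 67.5, 3: 45.0, 4: 22.5, 5: 0.0}
weight_sums = {t: sum(conflict_weights(a)) for t, a in assumed_angles.items()}
for conflict_type, total in weight_sums.items():
    print(f"Type {conflict_type}: wy + wx = {total:.2f}")
```

      Under these assumed angles the summed weight exceeds 1 for the three hybrid types and equals 1 for the two pure types, which matches the direction of the behavioral pattern described above.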

      Note that this hypothesis underlies the Stroop+Simon model, which assumes the Stroop and Simon dimensions are independently represented in the brain and drive the behavior in an additive fashion. Moreover, the observed difference of behavioral congruency effects may have reflected the variance in the Domain-General model, which treats all conflict types as equivalent, with any two conflict types differing only in the magnitude of their conflict. Therefore, we did not model the behavioral congruency effects as a covariance regressor in the major RSA. Instead, we conducted a model comparison analysis by comparing these models and the Cognitive-Space model. Results showed worse model fitting of both the Domain-General and Stroop+Simon models. Specifically, the regressor of congruency effect difference in the Domain-General model was not significant (p = .575), which also suggests that the higher congruency effect in conflict type 2, 3 and 4 should not influence the Cognitive-Space model results. We have added these methods and results to the revised manuscript (see also our response to Comment 1):

      Methods:

      “Model comparison and representational dimensionality

      To estimate if the right 8C specifically encodes the cognitive space, rather than the domain-general or domain-specific structures, we conducted two more RSAs. We replaced the cognitive space-based conflict similarity matrix in the RSA we reported above (hereafter referred to as the Cognitive-Space model) with one of the alternative model matrices, with all other regressors equal. The domain-general model treats each conflict type as equivalent, so any two conflict types differ only in the magnitude of their conflict. Therefore, we defined the domain-general matrix as the difference in their congruency effects indexed by the group-averaged RT in Experiment 2. Then the z-scored model vector was sign-flipped to reflect similarity instead of distance. The domain-specific model treats each conflict type differently, so we used a diagonal matrix, with within-conflict type similarities being 1 and all cross-conflict type similarities being 0.

      Moreover, to examine if the cognitive space is driven solely by the Stroop or Simon conflicts, we tested a spatial Stroop-Only (hereafter referred to as “Stroop-Only”) and a Simon-Only model, with each conflict type projected onto the spatial Stroop (vertical) axis or Simon (horizontal) axis, respectively. The similarity between any two conflict types was defined using the Jaccard similarity index (Jaccard, 1901), that is, their intersection divided by their union. We also included a model assuming the Stroop and Simon dimensions are independently represented in the brain, adding up the Stroop-Only and Simon-Only regressors. We conducted similar RSAs as reported above, replacing the original conflict similarity regressor with the Stroop-Only, Simon-Only, or both regressors, and then calculated their Bayesian information criteria (BICs)."
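
      The alternative model matrices described in this Methods excerpt can be sketched in a few lines. Several choices here are assumptions rather than the authors' implementation: the polar angles of the five conflict types, reading each projection as an interval from 0 to its length (so that the Jaccard index becomes min/max), treating a zero-length projection as having similarity 0, and the congruency-effect values, which are invented for illustration.

```python
import numpy as np

# Assumed polar angles of conflict Types 1-5 (90 = pure spatial Stroop, 0 = pure Simon).
ANGLES_DEG = np.array([90.0, 67.5, 45.0, 22.5, 0.0])


def jaccard_model(projections, eps=1e-12):
    """Pairwise Jaccard similarity of intervals [0, p_i]: min(p_i, p_j) / max(p_i, p_j).

    Pairs whose union is (numerically) zero get similarity 0, a convention
    chosen here only to avoid dividing by zero.
    """
    p = np.abs(np.asarray(projections, dtype=float))
    intersection = np.minimum.outer(p, p)
    union = np.maximum.outer(p, p)
    similarity = np.zeros_like(union)
    np.divide(intersection, union, out=similarity, where=union > eps)
    return similarity


def domain_general_vector(congruency_effects):
    """Sign-flipped, z-scored absolute differences for all pairs of conflict types."""
    ce = np.asarray(congruency_effects, dtype=float)
    i, j = np.tril_indices(ce.size, k=-1)
    distance = np.abs(ce[i] - ce[j])
    z = (distance - distance.mean()) / distance.std()
    return -z  # flip the sign so that larger values mean more similar


theta = np.deg2rad(ANGLES_DEG)
stroop_only = jaccard_model(np.sin(theta))   # projection onto the vertical axis
simon_only = jaccard_model(np.cos(theta))    # projection onto the horizontal axis
domain_specific = np.eye(ANGLES_DEG.size)    # 1 within a type, 0 across types

# Hypothetical group-averaged congruency effects (ms), for illustration only.
domain_general = domain_general_vector([50.0, 65.0, 70.0, 66.0, 52.0])

print(np.round(stroop_only, 2))
```

      Each matrix (or vector) would then replace the cognitive space-based similarity regressor in turn, with all other regressors unchanged, before the BICs are compared.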

      Reference:

      Li, Q., Nan, W., Wang, K., & Liu, X. (2014). Independent processing of stimulus-stimulus and stimulus-response conflicts. PloS One, 9(2), e89249.

      8) Please clarify the observed patterns of CSE effects in relation to the hypothesis of common cognitive space of conflict. In particular, right 8C shows that the patterns become dissimilar in incongruent trials compared to congruent trials. How does this direction of the effect fit to the common unidimensional cognitive space account? And how does such a representation contribute to the CSE effects?

      The behavioral CSE patterns provide initial evidence for the cognitive space hypothesis. Previous studies have debated whether cognitive control relies on domain-general or domain-specific representations, with much evidence gathered from behavioral CSE patterns. A significant CSE across two conflict conditions typically suggests domain-general representations of cognitive control, while an absence of CSE suggests domain-specific representations. The cognitive space view proposes that conflict representations are neither purely domain-general nor purely domain-specific, but rather exist on a continuum. This view predicts that the CSE across two conflict conditions should depend on the representational distance between them within this cognitive space. Our finding that CSE values systematically vary with conflict similarity level supports this hypothesis. We have added this point in the discussion of the revised manuscript:

      “Previous research on this topic often adopted a binary manipulation of conflict (Braem et al., 2014) (i.e., each domain only has one conflict type) and gathered evidence for the domain-general/specific view with presence/absence of CSE, respectively. Here, we parametrically manipulated the similarity of conflict types and found that the CSE varies systematically with conflict similarity level, demonstrating that cognitive control is neither purely domain-general nor purely domain-specific, but can be reconciled as a cognitive space (Bellmund et al., 2018) (Fig. 6, middle).”

      Fig. 4D was plotted to show the steeper slope of the conflict similarity effect for incongruent versus congruent conditions. Note that the y-axis displays z-scored Pearson correlation values, so the grand mean of each condition was 0. The values for the first two similarity levels (levels 1 and 2) were lower for incongruent than congruent conditions, seemingly indicating lower average similarity. However, this was not the case. The five similarity levels contained different numbers of data points (see Fig. 1C), so levels 4 and 5 should be weighted more heavily than levels 1 and 2. When comparing the grand mean of raw Pearson correlation values, the incongruent condition (0.0053) showed a tendency toward higher similarity than the congruent condition (0.0040), t(475998) = 1.41, p = .079. We have also plotted another version of Fig. 4D in Fig. S5, in which the raw Pearson correlation values were used.

      The greater representation of conflict type in the incongruent condition compared to the congruent condition (as evidenced by a steeper slope) suggests that the conflict representation was driven by the incongruent condition. This is probably due to the stronger involvement of cognitive control in the incongruent condition than in the congruent condition, which in turn leads to more distinct patterns across different conflict types. This is consistent with the fact that the congruent condition is typically a baseline, where any conflict-related effects should be weaker.

      The representation of cognitive space may contribute to the CSE as a mental model. This model allows our brain to evaluate the cost and benefit associated with transitioning between different conflict conditions. When two consecutive trials are characterized by more similar conflict types, their representations in the cognitive space will be closer, resulting in a less costly transition. As a consequence, stronger CSEs are observed. We revised the corresponding discussion part as:

      “Similarly, we propose that cognitive space could serve as a mental model to assist fast learning and efficient organization of cognitive control settings. Specifically, the cognitive space representation may provide a principle for how our brain evaluates the expected cost of switching and the benefit of generalization between states and selects the path with the best cost-benefit tradeoff (Abrahamse et al., 2016; Shenhav et al., 2013). The proximity between two states in cognitive space could reflect both the expected cognitive demand required to transition and the useful mechanisms to adapt from. The closer the two conditions are in cognitive space, the lower the expected switching cost and the higher the generalizability when transitioning between them. With the organization of a cognitive space, a new conflict can be quickly assigned a location in the cognitive space, which will facilitate the development of cognitive control settings for this conflict by interpolating nearby conflicts and/or projecting the location to axes representing different cognitive control processes, thus leading to a stronger CSE when following a more similar conflict condition.”

      Minor comments:

      9) Some of the labels of figure axes are unclear (e.g., Fig4C) about what they represent.

      In Fig. 4C, the x-axis label is “neural representational strength”, which refers to the beta coefficient of the conflict type effect computed from the main RSA, denoting the strength of the conflict type representation in neural patterns. The y-axis label is “behavioral representational strength”, which refers to the beta coefficient obtained from the behavioral linear model using conflict similarity to predict the CSE in Experiment 2; it reflects how strongly the conflict similarity modulates the behavioral CSE. We apologize for any confusion from the brief axis labels. We have added expanded descriptions to the figure caption of Fig. 4C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      One concern is regarding the experimental task design. Currently, only subjective reports of interoceptive intensity are taken into account, the addition of objective behavioural measures would have given additional value to the study and its impact. 

      To address this comment, we calculated interoceptive accuracy during the cardiorespiratory perturbation (isoproterenol) task according to our previous methods (e.g., Khalsa et al 2009 Int J Psychophys, Khalsa et al, 2015 IJED, Khalsa et al 2020 Psychophys, Hassanpour et al, 2018 NPP, Teed et al 2022 JAMA Psych). Thus, we quantified interoceptive accuracy as the cross-correlation between heart rate and real-time cardiorespiratory perception; specifically, the zero-lag cross-correlation between the heart rate and dial rating time series, and the maximum cross-correlation between these time series while allowing for different temporal delays (or lags). As expected, we found a dose-related increase in interoceptive accuracy from the 0.5mcg moderate perturbation dose (for which neuroimaging maps were not included in the current study) to the 2.0mcg high perturbation dose: zero-lag cross-correlations of 0.25 and 0.61, maximum cross-correlations of 0.41 and 0.73, for 0.5mcg and 2.0mcg doses, respectively, when averaged across all participants in the current study. Looking more closely at just the 2.0mcg dose, there were no group differences in zero-lag cross-correlation (t(89) = -0.68, p = 0.50) or maximum cross-correlation (t(87) = -1.0, p = 0.32) (depicted below, panel A). Furthermore, there were no associations between either of these interoceptive accuracy measures and the magnitude of activation within bilateral dysgranular convergent regions (F(1) = 0.27 and 0.01, p = 0.61 and 0.91, for the main effect of percent signal change on max and zero-lag cross-correlations, respectively; depicted below, panel B). When considering the significant correlation between the right insula signal intensity and subjective dial ratings, this lack of association with interoceptive accuracy suggests that the right dysgranular convergent insula was preferentially tracking the magnitude estimation rather than accuracy facet of interoceptive awareness during cardiorespiratory perturbation.
      Notably, during the saline placebo infusion, there were no systematic changes in heart rate and thus no systematic change in dial rating, precluding the calculation of the cross-correlation as a measure of interoceptive accuracy.
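
      The two accuracy measures described above (zero-lag and maximum cross-correlation) can be illustrated with a short sketch. It is not the pipeline from the cited papers: the Pearson-based definition, the lag range, the lag unit (samples), the synthetic time series, and the function names are all assumptions.

```python
import numpy as np


def correlation_at_lag(heart_rate, dial_rating, lag):
    """Pearson r between heart rate and the dial rating shifted by `lag` samples.

    A positive lag pairs heart_rate[t] with dial_rating[t + lag], i.e. the
    rating is assumed to trail the heart-rate change.
    """
    hr = np.asarray(heart_rate, dtype=float)
    dial = np.asarray(dial_rating, dtype=float)
    n = hr.size
    if lag >= 0:
        x, y = hr[: n - lag], dial[lag:]
    else:
        x, y = hr[-lag:], dial[: n + lag]
    return float(np.corrcoef(x, y)[0, 1])


def interoceptive_accuracy(heart_rate, dial_rating, max_lag):
    """Return (zero-lag r, maximum r over the tested lags, lag of that maximum)."""
    by_lag = {lag: correlation_at_lag(heart_rate, dial_rating, lag)
              for lag in range(-max_lag, max_lag + 1)}
    best = max(by_lag, key=by_lag.get)
    return by_lag[0], by_lag[best], best


# Synthetic example: a transient heart-rate rise and a rating that trails it by 5 samples.
t = np.arange(120.0)
heart_rate = 70.0 + 25.0 * np.exp(-((t - 60.0) / 10.0) ** 2)
dial_rating = 10.0 * np.exp(-((t - 65.0) / 10.0) ** 2)

zero_lag_r, max_r, best_lag = interoceptive_accuracy(heart_rate, dial_rating, max_lag=20)
print(f"zero-lag r = {zero_lag_r:.2f}, max r = {max_r:.2f} at lag {best_lag}")
```

      The sketch also makes the saline point concrete: if either series is flat, its variance is zero and the correlation is undefined, so no accuracy score can be computed.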

      In reviewing these findings, we did not feel that the results add meaningful information to our interpretation of convergence, and accordingly we have chosen not to include it in the manuscript.

      Author response image 1.

      (A) Interoceptive accuracy during 2.0mcg isoproterenol perturbation, as measured by the maximum (left panel) and zero-lag (right panel) cross-correlation between the time series of heart rate and perceptual dial rating. There were no differences between groups. (B) There were no associations between interoceptive accuracy ratings and signal intensity within the convergence dysgranular insula during the Peak period of 2.0mcg perturbation. 

      This brings me to my second concern. The authors mostly refer to their own previous work, without highlighting other methods used in the field. Some tasks measure interoceptive accuracy or other behavioural outcomes, instead of merely subjective intensity. Expanding the scientific context would aid the understanding and integration of this study with the rest of the field. 

      Given our focus on the neural basis of bottom-up perturbations of interoception, we found it relevant to reference previous studies from our lab, as we built directly upon these previous findings to inform the hypotheses and design of the current experiment, but we appreciate the value of providing a broader view of the literature. To expand the contextual frame, we have cited two fMRI meta-analyses of cardiac and gastrointestinal interoception (line 101). There are few studies that have used comparable perturbation approaches during neuroimaging in clinical populations, although we have referenced an exemplar study from the respiratory domain by Harrison et al (2021) in the discussion (line 612). In considering this comment more carefully, we felt that expanding the context further to other task-based methods or behavioral outcomes would shift the focus beyond our emphasis on the insular cortex and top-down/bottom-up convergence, though we have previously discussed and integrated such approaches (e.g., Khalsa & Lapidus, 2016 Front Psych, Khalsa et al, 2018 Biol Psychiatry CNNI, Khalsa et al 2022, Curr Psych Rep).

      Lastly, the suggestions for future research lack substance compared to the richness of the discussion. I recommend a slight revision of the introduction/discussion. There is text in the discussion (explanatory or illuminating) which is better suited to the introduction. 

      When discussing our study limitations (beginning line 732), we offer numerous areas for future research, including different preprocessing pipelines and more sophisticated analysis techniques (such as multivariate pattern analysis) that would allow for individual-level inferences regarding convergent patterns of activation within the insula. In addition, we have revised the last sentence of our limitations paragraph (line 757) and have added more specificity regarding future approaches examining insular and whole-brain interoceptive signal flow.

      Reviewer 2:

      (1) The interpretation of the resting-state data is not quite as clear-cut as the task-based data - as presented currently, changes could potentially represent fluctuations over time rather than following interoception specifically. In contrast, much stronger conclusions can be drawn from the authors' task-based data. …I was also unsure about the interpretation of the resting state analysis (Figure 5), as there was no control condition without interoceptive tasks, meaning any change could represent a change over time that differed between groups and not necessarily a change from pre- to post-interoception. Relatedly I wondered if the authors had calculated the test-retest reliability of the resting state data (e.g. intraclass correlation coefficients for the whole-brain functional connectivity of convergent dysgranular insula subregions and left middle frontal gyrus before vs. after the tasks), as it would be generally useful for the field to know its stability.

      We have acknowledged the lack of a control condition in the isoproterenol task (note that the VIA task contained an exteroceptive trial that was included in the brain image contrast analysis). We have also provided further justification for our approach in both the Methods (see the first paragraph “fMRI resting state analysis” subsection) and Results (see the last paragraph of the “Convergence analysis” subsection). We cannot estimate test-retest reliability from the current dataset, given that we do not have resting state scans separated by a similar time frame without the performance of the interoceptive tasks in between (this is now clarified in line 346).

      (2) The transdiagnostic sample could be better characterised in terms of diagnostic information, and was almost entirely female; it is also unclear what the effect of psychotropic medications may have been on the results given the effects of (e.g.) serotonergic medication on the BOLD signal. …Table 1 would be substantially improved by a fuller clinical characterisation of the specific sample included in the analysis - the diagnostic acronyms included in the table caption are not used in the table itself at present and would be an excellent addition, describing, for example, the demographics and symptom scores of patients meeting criteria for MDD, GAD, and AN (and perhaps those meeting criteria for more than 1). Similarly, additional information about the specific medications patients (or controls?) were taking in this study would be welcome (given the potential influences of common medications (e.g. antidepressants) on neurovascular coupling). 

      We have expanded Table 1 to include more specific diagnostic information for the transdiagnostic ADE group (GAD, MDD, and/or AN, as well as other psychiatric diagnoses). We have also included medication use.  

      Finally, Figures 7c and 7d would be greatly improved by showing individual data points if possible, and there may be a typo in the caption 'The cardiac group reported higher cardiac intensity ratings in the ADE group'.

      We have adjusted Figure 7c and 7d to include individual data points, as we agree that this provides greater transparency to the data itself. We have also fixed the typo in the figure caption.

      (3) As the authors point out, there may have been task-specific preprocessing/analysis differences that influenced results, for example, due to physiological correction in one but not both tasks. Although I note this is mentioned in the limitations, it was not clear to me why physiological noise was removed from the ISO task and whether it would be possible to do the same in the VIA task, which could be important for the most robust comparison of the two. 

      In this study, we intentionally chose different task-specific preprocessing pipelines so we could ensure that our results were not simply due to new ways of handling the data. This would allow us to evaluate evidence of replicating the previous group-level findings of insular activation that informed the current approach and hypotheses. We agree that a harmonized approach is also merited, and in a subsequent project using this dataset, we have matched preprocessing pipelines for a connectivity-based analysis, to best facilitate comparison across tasks. We look forward to sharing those results with the scientific community in due time.

      Reviewer 3:

      Maybe I missed it (and my apologies in case I did), but there were a few instances where it was not entirely clear whether differential effects (say between groups or conditions) were compared directly, as would be required. One example is l. 459 ff: The authors report the interesting lateralisation effect for the two interoception tasks and say it was absent in the exteroceptive VIA task. As a reader, it would be great to know whether that finding (effect in one condition but not in the other) is meaningful, i.e. whether the direct comparison becomes statistically significant. … The same applies to later comparisons, for example, the correlations reported in l. 465 ff (do these differ from one another?) as well as the FC patterns reported in l. 476 ff - again, there is a specific increase in the ADE group (but not in the HC), but is this between-group difference statistically meaningful?

      Thank you for these questions. We have added greater detail in the Results section in order to increase clarity regarding which statistical comparisons support which conclusions. Generally, we limited our comparisons to the effect of group, as comparing ADE vs. HC individuals was of primary interest, and in some cases also the effect of hemisphere and epoch. However, we did not perform exhaustive comparisons for all measures, in the interest of keeping the focus of our multi-level multi-task analysis on the hypothesis-driven questions specifically related to convergence of top-down and bottom-up processing.

      Regarding the comment asking if we could compare the lateralization effect directly across task conditions (i.e., is there a greater difference between hemispheres in the ISO task compared to VIA?): unfortunately, directly comparing signal intensity across tasks is not possible because the isoproterenol infusion induces physiological changes that can cause some dose-related signal reduction (we have attempted to address this in the past, e.g., Hassanpour et al, 2018 HumBrMapp). Consequently, our conclusions about spatial localization of top-down and bottom-up convergence are limited to group-level comparisons based on binary activation.

      (2) A second 'major' relates to the intensity ratings (l. 530 ff). I found it very interesting that the ADE group reported higher cardiac, but lower exteroceptive intensity ratings during the VIA task. I understand the authors' approach to collapse within the ADE group, but it would be great to know which subgroup of patients drives this differential effect. It could be the case that the cardiac effect is predominantly present in the anxiety group, while the lower exteroceptive ratings are driven by the depression patients. Even if that were not the case, it would be highly instructive to understand the rating pattern within the anxiety group in greater detail. Do these patients 'just' selectively upregulate interoception, or is there even a perceived downregulation of exteroceptive signalling? 

      We have depicted these data below for reviewers’ reference, showing individual responses for each group (HC and ADE; panel A), as well as the ADE individuals separated by primary diagnosis (GAD = generalized anxiety disorder, n=24; AN = anorexia nervosa, n=16; MDD = major depressive disorder, n=6; panel B). When tested via linear regression, we found no differences in ratings across ADE subgroups (rating ~ subgroup * condition, F3 = 1.71, p = 0.16 for main effect of subgroup). However, several factors should be considered in interpreting this result: first, all subgroups are small, particularly the MDD sample. Second, while these diagnostic labels refer to the most prominent symptom expression of each patient, every clinical participant in the study had a co-morbid disorder. Therefore, it is not possible to isolate disorder-specific pathology from our multi-diagnostic sample, and for this reason we refrained from including the subgroup-specific data in the manuscript.

      Author response image 2.

      (A) Post-trial ratings during the Visceral Interoceptive attention task, for reference. This is also shown in Figure 7D. (B) The same post-trial ratings in (A), but with the ADE group separated by primary diagnoses. Importantly, although assigned to one diagnostic category on the basis of most prominent symptom expression, most patients had one or more comorbidities across disorders. GAD = Generalized Anxiety Disorder. MDD = major depressive disorder. AN = anorexia nervosa. HC = healthy comparison.

      l. 86: 'Conscious experience' of what, precisely? During the first round of reading, I was wondering about the extent to which consciousness as a general concept will play a role, which could be misleading. 

      We have changed it to “conscious experience of the inner body” in the text. The current study is limited in scope to the neurobiology of conscious perceptions of the inner body, not consciousness as a general phenomenon. We hope this distinction is now clear.

      l.115: Particularly given the focus on predictive processing, I was wondering whether the (slightly outdated) spotlight metaphor is really needed here. 

      While not perfect, we believe it is still valid to metaphorically reference goal-directed attention towards the body as an “attentional spotlight”. Given the concern, we have minimized the focus on this metaphor, and the sentence now reads as follows:

      “Extending beyond these model-based influences are goal-directed activities (also described previously as the ‘attentional spotlight’ effect (Brefczynski and DeYoe 1999)), whereby focusing voluntary attention towards certain environmental signals not only alters their conscious experience but selectively enhances neural activity in the responsive area of cortex.”

      l. 129 ff: The sentence has three instances of 'and' in it, most likely a typo. 

      We have fixed this in the text.

      l. 245: What do these ratings correspond to, i.e. what was the precise question/instruction? 

      The instructions for subjective ratings in each task are mentioned in the Methods (line 223 for ISO task, line 249 for the VIA task), and we have added more detail regarding the scale used to collect subjective intensity ratings.

      l. 322: Could you provide the equation of the LMEM in the main text? It would be interesting to know e.g. whether participants/patients were included as a random effect. 

      We have provided this equation in the Methods (line 326).

      l. 418 ff: I was confused about the statistical approach here. Why use separate t-tests instead of e.g. another LMEM which would adequately model task and condition factors? 

      We did not use t-tests, but instead used linear regression to look at differences in agranular PSC across groups, hemispheres, and epochs, as well as potential associations between PSC and trait measures. We have adjusted the wording in this Methods paragraph (line 418) to improve clarity.

      l. 425: As a general comment, it would be great to provide the underlying scripts openly through GitHub, OSF, ... 

      We agree with this comment, and our main analysis scripts have been posted on our OSF as an addition to the original preregistration of this work (https://osf.io/6nxa3/).

      l. 443: For consistency, please report the degrees of freedom for the X² test.

      l. 454: ... and the F statistic would require two degrees of freedom (only the second is reported).

      l. 523: The t value is reported without degrees of freedom here (but has them in other instances).

      l. 540: Typo ('were showed').

      We have reported degrees of freedom for all statistics.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is most likely going to be related to the contrast of the dots, as opposed to representing coherent motion energy, which is the actual target. These may well be linked (e.g., greater attention to the coherent motion task might increase SSVEP amplitude), but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted.

      We agree with the reviewer. The SSVEP amplitude of the target at the whole trial level indeed reflected the combined effect of the stimulus parameters (e.g., contrast of the moving dots) as well as attention. However, the time course of the target SSVEP amplitude within a trial, derived from the moving window analysis, reflected the temporal fluctuations of target processing, since the stimulus parameters remained the same during the trial. We now make this clearer in the revised manuscript.

      (2) Comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and probably reflect different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)?

      Traditionally, the SSVEP amplitude at the distractor frequency is used to quantify distractor processing. Because the target SSVEP amplitude is stronger than that of the distractor, the distractor SSVEP amplitude may be contaminated by the target SSVEP amplitude through spectral power leakage; see Figure S4 for a demonstration. For this reason, we introduced decoding accuracy as an index of distractor processing. The lack of correlation between the distractor SSVEP amplitude and the distractor decoding accuracy, although somewhat like comparing apples with oranges as the reviewer points out, shows that the two measures do not co-vary and that decoding accuracy is free from the influence of the distractor SSVEP amplitude, which is itself influenced by the target SSVEP amplitude. To address the apples-vs-oranges issue, the correlation was computed on normalized time series, in which a z-scored time series replaced the original so that the correlated variables are dimensionless. Regarding the relation between behavior and different levels of processing, we do not have the means to address this question, given that we cannot empirically separate the effects of stimulus parameters from those of attention.

      Reviewer 2:

      (1) Incomplete Evidence for Rhythmicity at 1 Hz: The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      We appreciate the reviewer’s insightful suggestion. In response, we tested different windowing parameters, e.g., a 0.1s sliding window with a 0.05s step size. Figure S5 demonstrates that the strength of both target and distractor processing fluctuates around ~1 Hz, both at the individual and group levels. Additionally, Figures S6(A) and S6(B) show that the relative phase between target and distractor processing time series exhibits a uniform distribution across subjects. In terms of the relation between relative phase and behavior, Figure S6(C) illustrates two representative cases: a high-performing subject with 84.34% task accuracy exhibited a relative phase of 0.9483π (closer to π), while a low-performing subject with 30.95% accuracy showed a phase of 0.29π (close to 0). At the group level, a significant positive correlation between relative phase and task performance was found (r = 0.6343, p = 0.0004), as shown in Figure S6(D). All these results, aligning closely with our original findings (0.5s window length and 0.25s step size), suggest that the conclusions are not dependent on windowing parameters. We discuss these results in the revised manuscript.
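
      The windowing check described above can be illustrated on synthetic data. The following is a minimal sketch, not the analysis code used in the study: the 10 Hz carrier, the 500 Hz sampling rate, the modulation depth, and the helper names `windowed_amplitude` and `peak_frequency` are all assumptions made for illustration. It estimates the amplitude of a frequency-tagged component in sliding windows and then asks where the spectrum of that amplitude time course peaks, for both window settings mentioned in the response.

```python
import numpy as np

def windowed_amplitude(signal, fs, freq, win_s, step_s):
    """Amplitude of the `freq` component in successive sliding windows."""
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    t = np.arange(win) / fs
    probe = np.exp(-2j * np.pi * freq * t)  # complex sinusoid at the tagged frequency
    starts = range(0, len(signal) - win + 1, step)
    # Project each window onto the probe; 2/win rescales to sinusoid amplitude.
    return np.array([2 * abs(np.dot(signal[s:s + win], probe)) / win for s in starts])

def peak_frequency(series, fs_series):
    """Frequency of the largest non-DC peak in the spectrum of a time course."""
    series = series - series.mean()
    spectrum = np.abs(np.fft.rfft(series))
    freqs = np.fft.rfftfreq(len(series), d=1 / fs_series)
    return freqs[1:][np.argmax(spectrum[1:])]

# Synthetic "SSVEP": hypothetical 10 Hz carrier whose amplitude waxes and wanes at 1 Hz.
fs = 500.0
t = np.arange(5000) / fs  # 10 s of data
carrier_hz, mod_hz = 10.0, 1.0
signal = (1 + 0.5 * np.sin(2 * np.pi * mod_hz * t)) * np.sin(2 * np.pi * carrier_hz * t)

peaks = {}
for win_s, step_s in [(0.5, 0.25), (0.1, 0.05)]:
    amp = windowed_amplitude(signal, fs, carrier_hz, win_s, step_s)
    # The amplitude time course is sampled once per step, i.e. at 1/step_s Hz.
    peaks[(win_s, step_s)] = peak_frequency(amp, 1 / step_s)
```

      In this toy case the spectral peak of the amplitude time course sits near the true modulation frequency for both window settings, which is the kind of invariance the response appeals to. A longer window smooths the envelope more strongly, so agreement across window lengths is informative only when the modulation is slow relative to the longest window.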

      To further validate our findings, we also employed the Hilbert transform to extract amplitude envelopes of the target and distractor signals on a time-point-by-time-point basis, providing a window-free estimate of signal strength (Figures R3 and R4). The results remain consistent with both the original findings and the new sliding window analyses (Figure S6). Specifically, Figure S7 reveals ~1 Hz fluctuations in target and distractor processing at both individual and group levels. Figures S8(A) and S8(B) confirm a uniform distribution of the relative phase across subjects. In Figure S8(C), the relative phase was 0.9567π for a high-performing subject (84.34% accuracy) and 0.2247π for a low-performing subject (28.57% accuracy). At the group level, a significant positive correlation was again observed between relative phase and task performance (r = 0.4020, p = 0.0376), as shown in Figure S8(D).
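
      A window-free envelope estimate of the kind described above can be sketched as follows. This is an illustrative example on synthetic data under stated assumptions (a hypothetical 10 Hz carrier with a 1 Hz amplitude modulation, and an FFT-based helper named `analytic_envelope`), not the authors' pipeline. In practice the signal would typically be band-pass filtered around the frequency of interest first, and a library routine such as `scipy.signal.hilbert` would normally be used to obtain the analytic signal.

```python
import numpy as np

def analytic_envelope(x):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                   # keep the DC term
    if n % 2 == 0:
        weights[n // 2] = 1.0          # keep the Nyquist term
        weights[1:n // 2] = 2.0        # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    # Negative frequencies are zeroed; the magnitude of the result is the envelope.
    return np.abs(np.fft.ifft(spectrum * weights))

fs = 500.0
t = np.arange(5000) / fs  # 10 s of data
true_envelope = 1 + 0.5 * np.sin(2 * np.pi * 1.0 * t)
narrowband = true_envelope * np.sin(2 * np.pi * 10.0 * t)

envelope = analytic_envelope(narrowband)
max_error = np.max(np.abs(envelope - true_envelope))
```

      Because the envelope is defined at every sample, its spectrum is not limited by a window length or step size, which is why this approach serves as a complement to the sliding-window estimate.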

      (2) No-Distractor Control Condition: The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      The lack of a no-distractor control condition is certainly a limitation and will be acknowledged as such in the revised manuscript. However, given that our decoding results are between two different classes of distractors, we are confident that they reflect distractor processing.

      (3) Decoding Near Chance Levels: The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      This is an important point. To test robustness, we have implemented a random permutation procedure in which trial labels were randomly shuffled to construct a null-hypothesis distribution for decoding accuracy. We then compared the decoding accuracy from the actual data to this distribution. Figure S9 shows the results based on 1,000 permutations. For each of the three pairwise classifications—pleasant vs. neutral, unpleasant vs. neutral, and pleasant vs. unpleasant—as well as the three-way classification, the actual decoding accuracies fall far outside the null-hypothesis distribution (p < 0.001), and the effect size in all four cases is extremely large. These findings indicate that the observed decoding accuracies are statistically significant and robust in terms of both statistical inference and effect size.
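
      The label-shuffling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the nearest-centroid classifier and the 5-fold split are stand-ins chosen for brevity (the classifier actually used in the study is not specified here), and the function names `cv_accuracy` and `permutation_test` are hypothetical.

```python
import numpy as np

def cv_accuracy(features, labels, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    folds = np.arange(len(labels)) % n_folds
    classes = np.unique(labels)
    correct = 0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # One centroid per class, estimated from training trials only.
        centroids = np.array(
            [features[train & (labels == c)].mean(axis=0) for c in classes]
        )
        dists = np.linalg.norm(
            features[test][:, None, :] - centroids[None, :, :], axis=2
        )
        correct += np.sum(classes[np.argmin(dists, axis=1)] == labels[test])
    return correct / len(labels)

def permutation_test(features, labels, n_perm=1000, seed=0):
    """Compare observed accuracy with a null built from shuffled trial labels."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(features, labels)
    null = np.array(
        [cv_accuracy(features, rng.permutation(labels)) for _ in range(n_perm)]
    )
    # Add-one correction: the smallest attainable p-value is 1 / (n_perm + 1).
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, p_value

# Synthetic two-class data: 200 trials, 20 features, small mean shift for class 1.
data_rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 100)
features = data_rng.normal(size=(200, 20)) + 0.3 * labels[:, None]

observed, null, p_value = permutation_test(features, labels)
```

      With 1,000 permutations the smallest p-value this estimator can return is 1/1001, so an observed accuracy that exceeds every shuffled accuracy is reported as just under 0.001.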

      (4) No Clear Correlation Between SSVEP and Behavior: Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.

      We feel that what the reviewer points out is in fact the main point of our study: it is not the target or distractor strength over the whole trial that matters for behavior, but their temporal relationship within the trial. This reveals a novel neuroscience principle that has not been reported in the past. We have stressed this point further in the revised manuscript.

      (5) Phase-analysis: phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).

      The time-resolved SSVEP amplitude is used to index the temporal dynamics of target processing whereas the time-resolved decoding accuracy is used to index the temporal dynamics of distractor processing. As such, they can be compared, using relative phase for example, to examine how temporal relations between the two types of processes impact behavior. This said, we do recognize the reviewer’s concern that these two processes are indexed by two different types of signals. We thus normalized each time course using z-scoring, making them dimensionless, and then computed the temporal relations between them.
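
      One way to compute a relative phase between two differently scaled time courses is sketched below. This is an assumption-laden illustration rather than the authors' method: the Fourier-based phase estimate, the folding of the phase difference into [0, π], the 4 Hz sampling of the time courses (one value per 0.25s step), and the helper names are all choices made for the example, and the two series are synthetic.

```python
import numpy as np

def zscore(x):
    """Remove the mean and divide by the standard deviation (dimensionless)."""
    return (x - x.mean()) / x.std()

def phase_at(x, fs, freq):
    """Phase (radians) of the Fourier component closest to `freq`."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return np.angle(spectrum[np.argmin(np.abs(freqs - freq))])

def relative_phase(x, y, fs, freq):
    """Absolute phase difference between x and y at `freq`, folded into [0, pi]."""
    diff = phase_at(zscore(x), fs, freq) - phase_at(zscore(y), fs, freq)
    return np.abs(np.angle(np.exp(1j * diff)))

# Two synthetic time courses sampled at 4 Hz for 10 s, in different units:
# an amplitude-like series and an accuracy-like series lagging by 0.9*pi at 1 Hz.
fs = 4.0
t = np.arange(40) / fs
target_amplitude = 3.0 + 2.0 * np.cos(2 * np.pi * 1.0 * t)
distractor_accuracy = 0.55 + 0.03 * np.cos(2 * np.pi * 1.0 * t - 0.9 * np.pi)

rel = relative_phase(target_amplitude, distractor_accuracy, fs, 1.0)
rel_in_pi = rel / np.pi
```

      In this sketch the phase estimate itself does not depend on the offset or scale of either series, so z-scoring mainly places the two time courses on a common footing for plotting and for correlation-based comparisons.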

      Appraisal of Aims and Conclusions:

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      Impact and Utility to the Field:

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Thanks for these comments and positive assessment of our work’s potential implications and impact. As indicated above, in the revision process, we have carried out a number of additional analyses, some suggested by the reviewers, and the results of the additional analyses, now included in the Supplementary Materials, served to further validate the main findings and strengthen our conclusions.

      Additional Context and Considerations:

      (1) The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.

      Indeed, leveraging fMRI data in EEG studies would be very beneficial, as has been demonstrated in our previous work. However, given that this study concerns the temporal relationship between target and distractor processing, we feel that fMRI data, which are known to have low temporal resolution, have limited potential to contribute. We will be exploring this rich dataset in other ways in the future, integrating the two modalities for insights that are not possible with either modality alone.

      Author response image 1.

      Applying moving window analysis (0.02s window duration and 0.01s step size) to a different EEG-fMRI dataset. (A) The amplitude time series of the 4.29 Hz component and the Fourier spectrum. (B) The group level Fourier spectrum. At both individual and group level, no 1 Hz modulation is observed, suggesting that the 1 Hz modulation observed in our data is not introduced by the artifact removal procedure.

      (2) In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      We have done extensive work in the area of simultaneous EEG-fMRI and have not encountered artifacts with a 1 Hz rhythmicity. Our scanner artifact removal procedure is highly standardized. As such, it stands to reason that if the 1 Hz rhythmicity observed here resulted from the artifact removal process, it should also be present in other datasets where the same preprocessing steps were implemented. We tested this using another EEG-fMRI dataset (Rajan et al., 2019). Author response image 1 shows that the EEG power time series of the new dataset does not have 1 Hz rhythmicity, whether at the individual level or at the group level, suggesting that the 1 Hz rhythmicity reported in the manuscript does not come from the removal of the scanner artifacts but instead reflects true rhythmic sampling of stimulus information. Also, the fact that the temporal relations between target processing and distractor processing at 1 Hz impact behavior is another indication that the 1 Hz rhythmicity is a neuroscientific effect, not an artifact.

      References

      Rajan, A., Siegel, S. N., Liu, Y., Bengson, J., Mangun, G. R., & Ding, M. (2019). Theta Oscillations Index Frontal Decision-Making and Mediate Reciprocal Frontal–Parietal Interactions in Willed Attention. Cerebral Cortex, 29(7), 2832–2843. https://doi.org/10.1093/cercor/bhy149